Evaluating the maximum likelihood method for detecting short-term variability of AGILE γ-ray sources

A. Bulgarelli; A. W. Chen; M. Tavani; F. Gianotti; M. Trifoglio; T. Contessi

doi:10.1051/0004-6361/201118023

Home

All issues

Volume 540 (April 2012)

A&A, 540 (2012) A79

Full HTML

Free Access

Issue		A&A Volume 540, April 2012


Article Number		A79
Number of page(s)		12
Section		Astronomical instrumentation
DOI		https://doi.org/10.1051/0004-6361/201118023
Published online		28 March 2012

A&A 540, A79 (2012)

Evaluating the maximum likelihood method for detecting short-term variability of AGILE γ-ray sources

A. Bulgarelli¹, A. W. Chen³, M. Tavani²^,4, F. Gianotti¹, M. Trifoglio¹ and T. Contessi³

¹ INAF/IASF–Bologna, via Gobetti 101, 40129 Bologna, Italy
e-mail: bulgarelli@iasfbo.inaf.it
² INAF/IASF–Roma, via del Fosso del Cavaliere 100, 00133 Roma, Italy
³ INAF/IASF – Milano, via E. Bassini 15, 20133 Milano, Italy
⁴ Dip. di Fisica, Univ. “Tor Vergata”, via della Ricerca Scientifica 1, 00133 Roma, Italy

Received: 5 September 2011
Accepted: 7 January 2012

Abstract

Context. The AGILE space mission (whose instrument is sensitive to the energy ranges 18–60 keV, and 30 MeV–50 GeV) has been operating since 2007. Assessing the statistical significance of the time variability of γ-ray sources above 100 MeV is a primary task of the AGILE data analysis. In particular, it is important to verify the instrument sensitivity in terms of Poisson modeling of the data background, and to determine the post-trial confidence of detections.

Aims. The goals of this work are: (i) to evaluate the distributions of the likelihood ratio test for both “empty” fields and regions of the Galactic plane, and (ii) to calculate the probability of false detections over multiple time intervals.

Methods. We describe in detail the techniques used to search for short-term variability in the AGILE γ-ray source database. We describe the binned maximum likelihood method used for the analysis of AGILE data, and the numerical simulations that support the characterization of the statistical analysis. We apply our method to both Galactic and extragalactic transients, and provide a few examples.

Results. After checking the reliability of the statistical description tested with the real AGILE data, we obtain the distribution of p-values for blind and specific source searches. We apply our results to the determination of the post-trial statistical significance of detections of transient γ-ray sources in terms of pre-trial values.

Conclusions. The results of our analysis allow a precise determination of the post-trial significance of γ-ray sources detected by AGILE.

Key words: gamma-rays: general / methods: statistical / methods: data analysis

© ESO, 2012

1. Introduction

The current generation of γ-ray space missions, AGILE and Fermi, are sensitive in the energy range above 100 MeV up to tens of hundreds of GeV. These missions are providing a wealth of data on a variety of γ-ray sources, both in our Galaxy and at extragalactic distances. Compared with the previous generation of γ-ray instruments (e.g., EGRET onboard the Compton Gamma-Ray Observatory), the AGILE and Fermi-LAT γ-ray imagers are based on silicon detectors with optimal spatial resolution and much improved background rejection. These characteristics allow positioning of accuracies of a few arcminutes to be achieved for intense sources and very large fields of view (FOV). Both AGILE and Fermi-LAT have FOVs of more than 100 degrees across, i.e., 2.5 sr, which is of great relevance to the statistical analysis of the γ-ray sources. On the basis of the current detector performances, it is crucial to perform a statistical analysis specialized to the specific modes of operation of the two γ-ray missions. In particular, the distinction between non-significant and significant steady/transient sources must be supported by a specific treatment of the satellite γ-ray data. The relatively large effective areas at γ-ray energies, and the very large FOVs, lead to the need for long exposures (measured as the product of effective area and the on-source duration). The statistical significance of γ-ray source detections must address the pre- vs. post-trial significance. As we see, the characteristics of the pointing and the specific search (whether a “blind” search or a search for a specific source) have a direct influence on the proper statistical treatment. In this paper, we investigate the statistical analysis of γ-ray sources detected by the AGILE mission.

The AGILE-GRID instrument for observations in the γ-ray has an energy range of 30 MeV–50 GeV (Tavani et al. 2009a). The AGILE data are down-linked approximately every 100 minutes and sent to the AGILE Data Center (ADC), which is part of the ASI Science Data Center (ASDC) for data reduction, scientific processing, and archiving. The ASDC then forwards the AGILE data to the AGILE team local sites where a quick look analysis is performed.

Owing to the low detection rate of events and the extent of the AGILE GRID point-spread function (PSF), statistical techniques such as the maximum likelihood method (or estimator) are required to analyze the AGILE point sources. Similar analysis techniques were applied previously to EGRET data (Mattox et al. 1996).

For steady sources and sources with high γ-ray flux, the significance of a detection increases as a function of the observation duration. This is not necessarily the case for variable sources with low flux: a short integration time (the integration period should be of the same timescale as the duration of the astronomical phenomena under study) may yield a low significance level because the observation is photon limited; for a longer integration time, the source may disappear entirely because the observation is background limited.

The main purpose of this work is the evaluation of the likelihood ratio test in the context of short timescale (1–2 days) flaring γ-ray sources. The likelihood ratio test is used to compare two ensembles of hypotheses, one in which a γ-ray source is present, and another (the null hypothesis) in which it is absent. The determination of the likelihood ratio distribution in the case of the null hypothesis is used to evaluate the occurrence of false positive detections.

The search for γ-ray transients (Galactic and extragalactic) detectable on timescales of 1–2 days is one of the major activities performed by the AGILE Collaboration¹.

The main goals of this work are: (i) to evaluate the distribution of the likelihood ratiotest statistic (T_s) for empty fields both outside of and within the Galactic plane; (ii) to evaluate the T_s distribution in the case of one flaring source with different combinations of parameters; and (iii) to calculate the confidence levels corresponding to the samefalse detection probability in multiple non-overlapping time periods.

We performedMonte Carlo simulations to determine the T_s distributions both in the presence and absence of a flaring source, which may differ from the theoretically predicted distribution for a number of reasons that wediscuss below. We compare these results with the AGILE data where feasible, and show how the formulation of the hypotheses influences the T_s distribution.

In Sect. 2, we present the AGILE -GRID maximum likelihood analysis method. In Sect. 3, we report a general description of theperformedMonte Carlo simulations. In Sect. 4, we characterize the T_s distribution of a simulated extragalactic empty field and compare it with a real observation case. Section 5 characterizes the T_s distribution for two simulated Galactic fields, a simple and a complex case; in the latter, we analyze the effectthat uncertainties in the analysis parameters produce on the T_s distribution. In Sect. 6, we describe the pre- and post-trial probability, and in Sect. 7 we consider multiple detections at the same sky position.

2. Analysis method

The likelihood ratio test is used to compare two ensembles of models, one of which is a subset of the other, each of which can be characterized by a set of parameters. In the most common case, one of the ensembles of models is the null hypothesis, while the other, of which the null hypothesis is a subset, is the alternative hypothesis, corresponding to, for example, the hypothesis of the existence of a source. In each case, the values of the parameters are foundby means of a maximum likelihood method or estimator that maximizes the likelihood of producing the data given in the ensemble of models. The application of likelihood to photon-counting experiments is described in (Cash 1979). Details of how the likelihood is calculatedin the context of γ-ray data analysis can be found in Mattox et al. (1996). The likelihood ratio is then simply the ratio of these two maximum likelihoods, and the test statistic T_s is defined as $T_{s} = - 2 \ln \frac{L_{0}}{L_{1}},$ $\begin{equation} T_{\rm s} = -2 \ln \frac{L_0}{L_1}, \label{eqn_TS} \end{equation}$ (1)where L₀ and L₁ are the maximum value of the likelihood function for the null hypothesis and the alternative hypothesis, respectively.

In the AGILE-GRID case, the data are binned into FITS count maps, while each model is a linear combination of isotropic and Galactic diffuse components of the γ-ray emission and point sources.The γ-ray exposure maps, and Galactic diffuse emission maps are then used to calculate the models. Among the parameters that may be varied to find the maximum likelihood are the coefficients of the diffuse and point source components.

To describe a single point source four parameters in general are used: the predicted source counts s_c, the spectral index s_si (the spectrum of each source is assumed to be a power law), and two parameters corresponding to the position of the source (s_l,s_b, in Galactic coordinates). It is possible to keep each parameter either free or fixed; a free parameter is allowed to vary to find the maximum likelihood. Possible combinations include: (1) only allowing the flux to vary, (2) allowing both the position and flux to vary, (3) allowing both the spectrum and flux to vary, or (4) leaving all four parameters to be free. Throughout this paper, we keep the spectral index fixed to 2.1; for this kind of source, the AGILE-GRID instrument has a PSF at 30° off-axis for E > 100 MeV of 2.1°, for E > 400 MeV of 1.1°, and for E > 1 GeV of 0.8°.

The two parameters that we use to describe the Galactic (diffuse) and isotropic γ-ray emission are: (i) g_gal, the coefficient of the Galactic diffuse emission model, and (ii) g_iso, the isotropic diffuse intensity. A value of g_gal ≤ 1 is expected if the Galactic diffuse emission model is correct. Values of g_iso between 1 and 15 × 10^-5 cm^-2 s^-1 sr^-1 are expected, depending on both pointing strategy and onboard and background filter rejection. These two parameters can be either left free or fixed independently. For short timescale variability (less then 3–4 days) of γ-ray sources, we usually first estimate these parameters with a longer timescale integration, and then fix them for the short timescale analysis, assuming that these components do not vary significantly on the shorter timescales. We note that for the AGILE data used in this analysis (based on the standard filter FM3.119), the isotropic component is dominated by instrumental charged particle background rather than by the extragalactic diffuse emission, in contrast to data from EGRET and Fermi-LAT.

The values of the parameters that maximize the likelihood are those describing the model in the ensemble that are most likely to reproduce the data.

The null hypothesis corresponds to the absence of the point source, while the alternative hypothesis corresponds to its presence. The null hypothesis, clearly, is a subset of the alternative hypothesis, corresponding to a source with zero flux. The Galactic diffuse and isotropic coefficients, as well as the parameters of other known point sources in the FOV, must be kept either fixed or free in the same manner when evaluating both the null and alternative hypothesis.

To limit the effect of systematic errors far from the position of the hypothesized source, the data bins evaluated (and their predicted values according to the models) are limited to an analysis region of radius 5° or 10° centered on the source position.

From Wilks’s theorem (Wilks 1938), the T_s distribution ϕ is expected to asymptotically follow $χ_{n - m}^{2}$ $\hbox{$\chi^{2}_{n-m}$}$ in the null hypothesis, where n − m is the number of additional parameters that are optimized in the alternative hypothesis. In the most simple case, n − m = 1 (e.g., in the case of the determination of the flux of a single source). This means that from Wilks’s theorem, T_s is expected to be asymptotically distributed as $χ_{1}^{2}$ $\hbox{$\chi^{2}_{1}$}$ in the null hypothesis. The expected departure of the distribution from $χ_{1}^{2}$ $\hbox{$\chi^{2}_{1}$}$ is of order (N)^− 1/2, where N is the number of samples. In our context, the number of samplesis the number of photons that carry information about all the parameters; these are all the photons in the analysis region. This is true regardless of the number thatis eventually estimated to come from the point source.

When there are multiple sources whose fluxes are allowed to vary, the following procedure,divided into two loops, is used to find the maximum T_s. In the first loop the sources are first sorted according to the hypothesized flux. One by one, the sources are added to the model, from highest to lowest flux. If the source flux is allowed to vary, then the maximum likelihood is found in both the presence and absence of the source. If the position is allowed to vary,the first fit is done at a fixed position and the resulting T_s is compared to a location confidence level threshold (t_lcl). If T_s > t_lcl, then T_s is again maximized with variable flux and position. We define T_ss to be a threshold for promoting the source to the second loop of the algorithm: if the final T_s is greater than T_ss, the source is considered significant and added to both the null and alternative hypotheses for the other sources. If not, it is considered undetected, and is set to zero flux for all subsequent analyses. For the sources over the T_ss threshold (with or without a location confidence level),the second loop is similar to the first loop, except that all of the sources marked significant in the first loop are contained in the models from the beginning. The sources are again evaluated one at a time from highest to lowest flux. The T_s of each source is again maximized, and set to its final value. The values of the parameters t_lcl and T_ss affect the behavior of the procedure, where t_lcl = 5.99147 corresponds to a 95% confidence level for two degrees of freedom.

The maximum likelihood estimator developed for AGILE constrains the flux of a source, and therefore the source counts s_c, to be either greater than or equal to zero. Since the ensemble of models considered is half of the theoretically possible number, the shape of the T_s distribution differs from that of Wilks’s theorem by being asymptotically distributed as $0.5 \times χ_{1}^{2}$ $\hbox{$0.5 \times \chi^{2}_{1}$}$ instead of $χ_{1}^{2}$ $\hbox{$\chi^{2}_{1}$}$ (Mattox et al. 1996).

To compare the data distribution of T_s produced by the AGILE analysis procedurewith that predicted by Wilks’s theorem, we performed a series ofMonte Carlo simulations of AGILE data. Each simulation of an analysis region and its subsequent maximum likelihood analysis constitutes a singletrial. The probability that the result of a trial in an empty field has T_s ≥ h (that is the complement of the cumulative distribution function) is $P (T_{s} \geq h) = \int_{h}^{+ \infty} ϕ (x) d x,$ $\begin{equation} P(T_{\rm s} \geq h) = \int_{h}^{+\infty} \varphi (x) {\rm d}x, \label{eq_A} \end{equation}$ (2)which is also called the p-value p = P(T_s ≥ h). This is the pre-set (pre-trial) type-1 error (a false positive, rejecting the null hypothesis when in fact it is true). Given a statistical distribution, a “p-value” assigned to a given value of a random variable is defined as the probability of obtaining that value or larger when the null hypothesis is true. This value may be interpreted as an “occurrence-rate”, that is, how many trials occur on average before obtaining a false detection at a level equal or greater to h.

2.1. Hypothesis formulation

In the context of the γ-ray transient analysis, the null hypothesis is defined as an analysis region containing only steady and known sources with no flaring sources present. We can translate this into the ensemble of models by keeping the flux of the flaring source fixed to zero, and thefluxes of steady sources fixed to their known fluxes. In the alternative hypothesis that a flaring source is present, the flux (and position if specified) of this source is allowed to be free and thefluxes of steady sources are fixed to their known fluxes. In this work, we restrict our analysis to hypotheses of single flaring sources, neglecting alternative hypotheses of two or more flaring sources in the same analysis region.

Additional knowledge of the source, e.g. from other wavelengths, can add useful additional constraints about the position of the source in the hypothesis formulation. In Sect. 5, we show that this additional knowledge can change the T_s distribution, thereby reduce the occurrence rate of false detection.

Table 1

Parameters and number of trials for the performed extragalactic and Galactic field simulations.

For the analysis of a flaring source, we consider two possible scenarios:

The flaring source is unknown: in this case, the position and fluxparameters are allowed to be free and optimized with respect tothe input data: the starting (l,b) position is usually the count peak found in the smoothed map. If the T_s is higher than a well-defined threshold, the alternative (flaring source) hypothesis is accepted, and a counterpart search may be performed.
The source is known and the alternative hypothesis is that this source is in a flaring state. This scenario can be further subdivided into two cases:
- (a)
  the mean flux of the source is below the background level ofthe sky region, providing only an upper limit over longintegrations;
- (b)
  the mean flux of the sourceis above the background level of the sky region and is detected over long integrations.
In both sub-cases, two types of analysis are possible: (i) the flux parameter is allowed to vary and the position kept fixed, (ii) the flux and position parameters are allowed to be free. If the position is allowed to vary, the starting position is usually the position of the source in steady state. Keeping the position fixed implies that the alternative hypothesis being tested is that the flare comes from the known source (e.g. from the behavior at other wavelengths or because other flares have been detected in the past), whereas allowing the position to vary allows the alternative hypothesis of including any flaring source within the analysis region. We may then, at the same time as we calculate the significance of the detection, verify whether the confidence contour of the source position is compatible with the source hypothesized to be responsible for the flare. If the position of the known source is outside the confidence region then the alternative hypothesis can be rejected; we can conclude that the known source is not responsible for the flare.

3. Monte Carlo simulations

Monte Carlo simulations of AGILEγ-raydata were used to characterize the maximum likelihood analysis procedure. Simulated data were generated using a background model (Galactic diffuse radiation model and isotropic background) and the AGILE-GRID instrument response functions (version I0023 of the calibration matrices for effective area, energy dispersion, and PSF).The energy range used is 100 MeV–50 GeV. The simulated observations were generated by adding Poisson-distributed deviates to each pixel. Each bin of the generated maps (counts, exposure, and Galactic emission maps) was analyzed exactly as flight data as described in Sect. 2. The bin size chosen for the simulations is 0.25°, the same size used in theAGILE-GRID daily monitoring.

The exposure level chosen for the simulations was a level equivalent to a mean value of a one-day pointing/two-day spinning AGILE observation mode.

In Table 1, we report the parameters and the number of trials for the performed simulations.

4. Extragalactic empty field

4.1. Monte Carlo simulation

We simulated an extragalactic empty field without flaring or steady sources. The simulation was performed with an AGILE FOV of 60°. Figure 1 shows the exposure map, and the bins used in this simulation to perform the trials. This is a typical two-day exposure map in spinning mode, but in this context the key point is the level of exposure, and not its shape. We analyzed positions within 50° of the center of the map to exclude area of low exposure, which we also do in every day sky monitoring.

We performed a maximum likelihood analysis at positions corresponding to every fifthdegree on the map. The spacing was chosen to ensure that the analyses would be independent of each another. The positionis kept fixed and the flux allowed to vary, implying that there is one additional parameter in the alternative hypothesis. We repeated the count map simulations and analysis for different values of the coefficient of the isotropic diffuse component (g_iso = 6 and 12) that were consistent with values found in real AGILE observations. Figure 2 shows the resulting T_s distribution (left panel) and the related p-value distribution (right panel). We fit this T_s distribution with the function: $κ^{'} (T_{s}) = {\begin{matrix} δ & if T_{s} < 1, \\ η χ_{N}^{2} (T_{s}) & otherwise . \end{matrix}$ $\begin{equation} \kappa'(T_{\rm s}) = \left\{ \begin{array}{ll} \delta &\mbox{ if }T_{\rm s}<1, \\ \eta \chi^2_{N} (T_{\rm s}) &\mbox{ otherwise. } \\ \end{array} \right. \label{eqn_fit1} \end{equation}$ (3)

In Table 2, we report the results of the fit for different background g_iso levels. The function δ in the first bin takes into account the constraint on the source counts (s_c ≥ 0) . In this simple case, the simulated distribution isclose to the expected value of $1 / 2 χ_{1}^{2}$ $\hbox{$1/2 \chi^2_1$}$ (see the η parameter in Table 2).

In Table 3, we report the results of the fit for different levels of exposure. Figure 1 shows the regions with different exposure levels.

Fig. 1

Extragalactic (b = 90°) exposure map used for the simulations(in units of cm² ssr) in Galactic coordinates. Notice the larger exposure near the celestial pole.The white circles are the positions of the trials, the green circles indicate the high and low chosen exposure regions.

Fig. 2

T_s distribution (left side) and p-distribution (right side) of a simulated empty field, with g_gal = 1 and g_iso = 6 (where these parameters are allowed to vary during the analysis), flux free, and position fixed. The bluecrosses are the calculated distribution, the black line is the best fit according to Eq. (3), the red dotted line is the $\frac{1}{2} χ_{1}^{2}$ $\hbox{$\frac{1}{2} \chi^2_1$}$ theoretical distribution, the green dashed line is the $χ_{1}^{2}$ $\hbox{$\chi^2_1$}$ theoretical distribution, and the cyan dotted-dashed line is the $\frac{1}{2} χ_{3}^{2}$ $\hbox{$\frac{1}{2} \chi^2_3$}$ distribution.

4.2. Real observation

We compared the simulated data shown in the last section to a real AGILE observation. The observation block chosen was OB7410², in which AGILE was pointed at the north Galactic pole for along exposure. For each day of the observation counts, exposure and gas maps were generated and analyzed. As in the Monte Carlo simulations, a maximum likelihood analysis was performed for a hypothetical source position at every fifthdegree to ensure the independence of each trial. The results are shown in Fig. 3. Taking into account the limits of the statistics collected from this real observation, real and simulated data are compatible at the 1σ error level.

Table 2

Best-fit parameters in the case of an empty field for simulated sky maps with bin size of 0°.25 for different values of g_iso, the isotropic emission coefficient.

5. Monte Carlo simulations of Galactic fields

We performed simulations of two regionsof the Galactic plane: a Galactic regionwith a low density of potential sources, and a complex Galactic region (the Cygnus region). These two regions represent two extremes of the AGILE analysis.

5.1. A simple Galactic region

We performed a simulation of a relatively simple Galactic region by assuming only that the Galactic diffuse and isotropic emissions do not originate in either steady or flaring sources. This calculation was aimed at evaluating the photon density function and the p-value distribution of the AGILE likelihood maximum estimator in the presence of the Galactic diffuse emission. We chose a region centered on (l, b) = (160, 0) (Galactic coordinates) with a one-day exposure level in pointing mode. The parameters used in the simulation are g_gal = 1 and g_iso = 3. During the analysis, the spectrum of any hypothetical source was kept fixed.

To analyze the flux of a source whose position is known, we fixed the position of the source, and allow the flux to vary in the alternative hypothesis. The resulting p-value distribution is shown as the brown histogram in Fig. 4. Using Eq. (3) we found that the best-fit solution is δ = 0.8600 ± 0.0003 and η = 0.4540 ± 0.0005, for N₁ = 1. However, owing to the systematic errors in the event reconstruction, there are cases in which the source is detectable but at a position farther away from the known position than a purely statistical analysis would predict. To handle these cases, we kept the position of the source free and we developed an analysis criterion, which we call ICL (Inside Confidence contour Level). In this criterion, if the contour level is present, T_s ≥ t_ICL (and we fix t_ICL = 9 for the AGILE analysis), and the position of the source under investigation is outside the contourlevel found by the maximum likelihood procedure, we reset T_s to 0. In principle, a contour always exists (even if it is not necessarily closed or connected), but sometimes our software fails to find it.The contour is always searched for at t_lcl.

We used the ICL criterion only in the case of a known source. The T_s distributions presented hereafter that are position-free and without an ICL criterion are used in the analysis of an unknown source. The reason for rejecting the event is that the technique is used to weed out detections that we are unsure are coincident with the source.

Fig. 3

Comparison betweenp-value distributions for simulated (blue) and real (green) empty extragalactic fields. The red dotted line is the $\frac{1}{2} χ_{1}^{2}$ $\hbox{$\frac{1}{2} \chi^2_1$}$ theoretical distribution, the green dashed line is the $χ_{1}^{2}$ $\hbox{$\chi^2_1$}$ theoretical distribution, the cyan dotted-dashed line is the $\frac{1}{2} χ_{3}^{2}$ $\hbox{$\frac{1}{2} \chi^2_3$}$ distribution.

Table 3

Best-fit parameters in the case of an empty field for simulated sky maps with bin size of 0°.25, g_iso = 6, for two levels of exposure.

We compared the results with those obtained when the ICL criterion was applied in the standard analysis (position left free). The p-value distributions are reported in Fig. 4 (the red histogram for the ICL criterion, the blue histogram without the ICL criterion,both with T_ss = 4, t_lcl = 5.99147). The blue and red histograms show a pronounced dip just above T_lcl with respect to the brown histogram because sources with T_s > T_lcl may increase their T_s during relocalization (shift to the right). We highlight the T_s distribution produced by the application of the ICL criterion because it is used by the AGILE automated quick-look analysis when searching for flares from known sources.

Fig. 4

The effect of the hypothesis formulation (trial selection). The p-value distribution depends on the constraint on the position of the source: a comparison of the different analysis methods (in particular, when the position of the source is either kept fixed or allowed to vary in the case of absence of a source at the (l, b) = (160, 0)° location. Histograms are the p-value distribution for an empty Galactic field when the null hypothesis is true. Blue and red histograms have the following parameters: T_ss = 4, t_lcl = 5.99147, flux and position of the source allowed to vary. The blue histogram contains all trials regardless of the calculated position, while the red histogram contains the trials that respect the ICL criterion. The brown histogram corresponds to the position of the source when it is kept fixed and flux free. The red dotted line is the $\frac{1}{2} χ_{1}^{2}$ $\hbox{$\frac{1}{2}\chi^2_1$}$ theoretical distribution, the green dashed line is the $χ_{1}^{2}$ $\hbox{$\chi^2_1$}$ theoretical distribution, and the cyan dotted-dashed line is the $\frac{1}{2} χ_{3}^{2}$ $\hbox{$\frac{1}{2} \chi^2_3$}$ distribution.

Fig. 5

The blue histogram is the T_s distribution for the empty Galactic region when the null hypothesis for a source at the position (l, b) = (160, 0)° is true, T_ss = 4, t_lcl = 5.99147, and both the flux and position of the source are allowed to vary. Black line is the best-fit function described in Eq. (4), N₂ = 5. The red dotted line is the $\frac{1}{2} χ_{1}^{2}$ $\hbox{$\frac{1}{2}\chi^2_1$}$ theoretical distribution, the green dashed line is the $χ_{1}^{2}$ $\hbox{$\chi^2_1$}$ theoretical distribution, and the cyan dotted-dashed line is the $\frac{1}{2} χ_{3}^{2}$ $\hbox{$\frac{1}{2} \chi^2_3$}$ distribution.

Figure 5 reports the T_s distribution when the null hypothesis is true, and is produced by the maximum likelihood analysis with the parameters: T_ss = 4 and t_lcl = 5.99147, and both the flux and position of a source at the (l, b) = (160, 0)° position are allowed to vary. The related p-value distribution is shown in Fig. 4 (blue histogram). The ICL criterion was not applied, i.e. no rejection was applied based on the compatibility of the location contour with the position of the source. Fitting thisdistribution with the function, $κ^{′′} (T_{s}) = {\begin{matrix} δ & if T_{s} < 1 \\ η_{1} χ_{N_{1}}^{2} (T_{s}) & if T_{s} \geq 1 and T_{s} \leq t_{lcl} \\ η_{2} χ_{N_{2}}^{2} (T_{s} - t_{lcl}) & otherwise \end{matrix}$ $\begin{equation} \kappa''(T_{\rm s}) = \left\{ \begin{array}{ll} \delta &\mbox{ if }T_{\rm s}<1 \\ \eta_1 \chi^2_{N_1}(T_{\rm s}) &\mbox{ if }T_{\rm s}\ge 1 ~{\rm and}~ T_{\rm s}\le t_{\rm lcl} \\ \eta_2 \chi^2_{N_2} (T_{\rm s} - t_{\rm lcl}) &\mbox{ otherwise } \\ \end{array} \right. \label{eqn-posfree} \end{equation}$ (4)

we found that N₁ = 1 (if T_s < T_ss, no optimization of the position takes place and therefore the only free parameter is the flux of the source), N₂ = 5, δ = 0.89 ± 4.5 × 10^-4, η₁ = 0.35 ± 5.1 × 10^-4, and η₂ = 3.96 × 10^-3 ± 3 × 10^-5. We used functions with N = 5 (d.o.f.) solely as an analytical approximation to the functional form produced by this process. The translation (T_s − t_lcl) is due to the switch between the fixed and free position regime(see Sect. 2).

Equation (4) is applicable to blind searches for unknown sources. When searching for flares from a known source (i.e. our hypothesis isthat there is a known source at the (l, b) = (160, 0)° position), the appropriate alternative hypothesis should exclude sources for which the known source position lies outside the location contour. Therefore, after applying the ICL rejection criterion to the 95% confidence level, we report the resulting p-value distribution in Fig. 4 (red line) compared with the blue histogram of the same figure, to which no selection criterion was applied. The effect of an appropriate hypothesis formulation is evident. The application of the ICL rejection reduced the number of degrees of freedom. After fitting this distribution with Eq. (4), we found that the histogram has a distribution between N₂ = 3 and N₂ = 4, owing to the ICL selection criterion.

The analytical expression that can be used with t_ICL = 9 where T₁ = 14 is given by $κ^{′′′} (T_{s}) = {\begin{matrix} δ & if T_{s} < 1 \\ η_{1} χ_{N_{1}}^{2} (T_{s}) & if T_{s} \geq 1 and T_{s} \leq t_{lcl} \\ η_{2} χ_{N_{2}}^{2} (T_{s} - t_{lcl}) & if T_{s} \geq t_{lcl} and T_{s} \leq t_{ICL} \\ η_{3} χ_{N_{3}}^{2} (T_{s}) & if T_{s} \geq t_{ICL} and T_{s} \leq T_{1} \\ η_{4} χ_{N_{4}}^{2} (T_{s}) & if T_{s} \geq T_{1} \end{matrix}$ $\begin{equation} \kappa'''(T_{\rm s}) = \left\{ \begin{array}{ll} \delta &\mbox{ if }T_{\rm s}<1 \\[2mm] \eta_1 \chi^2_{N_1}(T_{\rm s}) &\mbox{ if }T_{\rm s}\ge 1~ {\rm and}~ T_{\rm s}\le t_{\rm lcl} \\[2mm] \eta_2 \chi^2_{N_2} (T_{\rm s} - t_{\rm lcl}) &\mbox{ if }T_{\rm s}\ge t_{\rm lcl} ~{\rm and}~ T_{\rm s}\le t_{\rm ICL} \\[2mm] \eta_3 \chi^2_{N_3} (T_{\rm s}) &\mbox{ if }T_{\rm s}\ge t_{\rm ICL} ~{\rm and}~ T_{\rm s}\le T_1 \\[2mm] \eta_4 \chi^2_{N_4} (T_{\rm s}) &\mbox{ if }T_{\rm s}\ge T_1 \\ \end{array} \right. \label{eqn-ICL} \end{equation}$ (5)

for the parameters values N₁ = 1, N₂ = 5, N₃ = 5, N₄ = 1, δ = 0.89 ± 4.4 × 10^-4, η₁ = 0.34 ± 5.1 × 10^-4, η₂ = 4.5 × 10^-3 ± 5.7 × 10^-5, η₃ = 1.27 × 10^-2 ± 1.7 × 10^-4, and η₄ = 0.91 ± 3.3 × 10^-2. This expression approximates the expected behavior of the analysis, which shifts gradually from a source location algorithm with many free parameters near the threshold where the location contour is large, to an analysis more similar to a fixed-position analysis at high T_s where the location contour is small. Figure 6 reports the T_s distribution when the null hypothesis is true, that is produced by a maximum likelihood analysis with the parameters T_ss = 4, and t_lcl = 5.99147, when the flux and position are allowed to vary, after applying the ICL criterion; the red histogram corresponds to a region centered at (l, b) = (160, 0)° and the blue histogram is related to the Cygnus region. The black line is the best-fit solution reported in Eq. (5). The reported distributions correspond to the standard quick-look analysis of AGILE data forempty Galactic regions. We note that when the selection criterion is altered the expected p-values also change with respect to the $\frac{1}{2} χ_{3}^{2}$ $\hbox{$\frac{1}{2} \chi^2_3$}$ expected theoretical distribution (cyan dashed line).

Fig. 6

The effect of steady sources in the ensemble of models. The blue line is the PDF for Cygnus field when the null hypothesis for Cygnus X-3 is true, and the red line is the PDF for an empty Galactic field when the null hypothesis is true, with the parameters T_ss = 4, t_lcl = 5.99147, when the flux and position of source in the alternative hypothesis are allowed to vary, but g_gal and g_iso parameters are fixed, ICL criterion. Black line is the best-fit function described in Eq. (5). The red dotted line is the $\frac{1}{2} χ_{1}^{2}$ $\hbox{$\frac{1}{2} \chi^2_1$}$ theoretical distribution, the green dashed line is the $χ_{1}^{2}$ $\hbox{$\chi^2_1$}$ theoretical distribution, the cyan dotted-dashed line is the $\frac{1}{2} χ_{3}^{2}$ $\hbox{$\frac{1}{2} \chi^2_3$}$ distribution.

The analysis performed with the position of the source kept fixed yields lower p-values than the analysis for which the position is allowed to vary, either with (red histogram of Fig. 4) or without (blue histogram) the ICL criterion. However, the fraction of detections above a given T_S threshold in the presence of a real source is also lower when the source position is kept fixed, as shown by the histograms in Fig. 7. Table 4 reports the number of detections (in %) for some T_s thresholds.

Table 4

Percentages of detections for some T_s thresholds (with related p-values when the null hypothesis is true) when the null hypothesis is false with a source at the (l, b) = (160, 0)° position with a simulated flux of 180 × 10^-8 photons cm^-2 s^-1.

Fig. 7

The effect of the hypothesis formulation (trial selection). The histograms show the p-value distributions in the presence of a simulated source at the location of (l, b) = (160, 0)° with flux 180 × 10^-8 photons cm^-2 s^-1. Red histograms have the following parameters: T_ss = 4, t_lcl = 5.99147, flux and position of the source left free with ICL rejection criterion. Blue histograms have t_lcl = 5.99147, flux and position of the source left free without ICL rejection criterion. Brown histograms have the flux left free and the position of the source kept fixed.

5.2. A complex Galactic region: the case of the Cygnus field

We simulated observations of the Cygnus region both including and not including a source at the position of Cygnus X-3 to test the analysis procedure in a complex case with nearby point sources and Galactic diffuse emission. We chose this field because it is one of the most complex cases that our analysis procedure must address. Cygnus X-3 is a well-known microquasar (Giacconi et al. 1967), showing variable emission at all wavelengths, including repeated γ-ray flaring activity above 100 MeV as detected by AGILE (Tavani et al. 2009b). This target was chosen because it displays significant variability in the γ-ray energy range that is strongly correlated with other wavelengths. The list of simulated sources of the Cygnus region is reported in Table 5. In Fig. 8, we show a half-year year integration of AGILE data from the Cygnus region and a simulation of a comparable integration time using the same parameters used to simulate the short trials, demonstrating that the underlying model is sound.

The null hypothesis is that no γ-ray source coincident with Cygnus X-3 is present in the AGILE data, while the alternative hypothesis is that a source coincident with Cygnus X-3 emits γ-rays. The parameters of the other sources and the diffuse emission coefficients were all kept fixed.

Fig. 8

The binned counts map of the real (left side) and simulated (right side) Cygnus fieldforan integration time of 0.5 years (AGILE counts, E > 100 MeV). The real data is taken from July 2007 to October 2009; the map is centered on (l, b) = (78.75, 0)° in Galactic coordinates with a bin size of 0.1°.

Table 5

List of Cygnus region sources for E > 100 MeV.

If we fix the position of the source as already described in the previous section, using Eq. (3) we find the best-fit solution with δ = 0.65 ± 3.6 × 10^-4, η = 4.6 × 10^-1 ± 4.2 × 10^-4, N₁ = 1. The two η parameters of the cases of empty and complex Galactic fields are very similar. Therefore, in the Tables 6 and 8 we report a single value for the fixed position analysis that is valid in both cases.

Keeping the position of Cygnus X-3 free and fitting with Eq. (4), we find that N₁ = 1, N₂ = 5, δ = 0.85 ± 4.6 × 10^-4, η₁ = 0.46 ± 6.2 × 10^-5, and η₂ = 6.15 × 10^-3 ± 0.4 × 10^-5. This fitting is appropriate for blind searches of unknown sources in complex Galactic regions.

Keeping the position of Cygnus X-3 free, applying the ICL criterion, and fitting with Eq. (5), we found that N₁ = 1, N₂ = 5, N₃ = 5, N₄ = 1, δ = 0.84 ± 2 × 10^-4, η₁ = 0.49 ± 2.8 × 10^-4, η₂ = 6.7 × 10^-3 ± 3.2 × 10^-5, η₃ = 1.8 × 10^-2 ± 1.0 × 10^-5, and η₄ = 1.32 ± 1.9 × 10^-2. This expression approximates the expected behavior of the analysis, as already established in the case of an empty Galactic field (see Sect. 5.1). The reported distributions correspond to the standard quick-look analysis of AGILE data for complex Galactic regions.

Figure 6 compares the probability density function for the Cygnus field (see Table 5) with this empty Galactic field. As expected, the effect ofa more complexregion is an increase in the number of false detections.

With the performed Monte Carlo simulation, we determine the p-value function for the most common hypothesis formulations. On the basis of these simulation, we are able to establish the T_s level for each p-value and constrain the false occurrence rate. Table 6 reports the correspondence between p-value and T_s value for different methods of analysis, in addition to the theoretical reference for $χ_{1}^{2}$ $\hbox{$\chi^2_1$}$ , $1 / 2 χ_{1}^{2}$ $\hbox{$1/2 \chi^2_1$}$ , and $1 / 2 χ_{3}^{2}$ $\hbox{$ 1/2 \chi^2_3$}$ .

5.3. Deviation from the nominal distribution

In the following we report the effect of uncertainties in the analysis parameters on the shape of the distributions. The analyses were performed for the case of the complex Galactic region.

5.3.1. The effect of the t_lcl parameter

We performed additional maximum likelihood analyses using different values of the t_lcl parameter, including the case of t_lcl = 0. In the case of a blind search for unknown sources, the maximum likelihood estimator could be used as a source finder, instead of an hypothesis validator, by setting t_lcl = 0. In this case the optimization of the position was performed regardless of the T_s found in the first step with fixed positions. The resulting p-value distributions are shown in Fig. 9 (with t_lcl = 0 shown in green) including all trials without ICL rejection. As expected, higher t_lcl values correspond to lower p-values because the optimization of the position starts at higher T_s values in the first loop of the maximum likelihood procedure.

Table 6

Correspondence between p-value and T_s value for different methods of analysis.

Fig. 9

The p-value distributions change as a function of the t_lcl parameter. Simulations of the Cygnus region with no source present at Cygnus X-3 position were analyzed for different t_lcl values without applying the ICL rejection criterion. The flux and position of the hypothetical source at the Cygnus X-3 were allowed to vary, the g_gal and g_iso parameters were kept fixed, and T_ss = 4. Green histogram: t_lcl = 0; black histogram: t_lcl = 2.29575 (corresponds to a 68% confidence level for two degrees of freedom); blue histogram: t_lcl = 5.99147. The red dotted line is the $\frac{1}{2} χ_{1}^{2}$ $\hbox{$\frac{1}{2} \chi^2_1$}$ theoretical distribution, the green dashed line is the $χ_{1}^{2}$ $\hbox{$\chi^2_1$}$ theoretical distribution, and the cyan dotted-dashed line is the $\frac{1}{2} χ_{3}^{2}$ $\hbox{$\frac{1}{2} \chi^2_3$}$ distribution.

5.3.2. The effect of the radius of analysis

Figure 10 shows that no appreciable differences are produced by changing the radius of analysis. The comparison was performed for the case of t_lcl = 0 applying the ICL criterion, but should also be valid for analyses using the othercriteria.

Fig. 10

Comparison between different radii of analysis. The histograms are the p-value distributions for the Cygnus field when the null hypothesis for Cygnus X-3 is true with the following parameters: T_ss = 4, t_lcl = 0, flux and position of Cygnus X-3 left free, ICL rejection applied. Red histogram:radius of analysis = 10°; blue histogram: radius of analysis = 5°. The red dotted line is the $\frac{1}{2} χ_{1}^{2}$ $\hbox{$\frac{1}{2} \chi^2_1$}$ theoretical distribution, the green dashed line is the $χ_{1}^{2}$ $\hbox{$\chi^2_1$}$ theoretical distribution, and the cyan dotted-dashed line is the $\frac{1}{2} χ_{3}^{2}$ $\hbox{$\frac{1}{2} \chi^2_3$}$ distribution.

5.3.3. The effect of keeping g_gal and g_iso either free or fixed

In Fig. 11, we show the effect of keeping g_gal and g_iso parameters fixed (blue) and free (red): the p-values with free parameters are larger. This result is expected because fixing these parameters reduces the range of possible hypotheses explored. The comparison was performed for the case of t_lcl = 0 after applying the ICL criterion, but should also be valid for analyses using the other criteria.

Fig. 11

Comparison between g_gal and g_iso parameters free or fixed.The blue histogram is the p-distribution forthe Cygnus field when the null hypothesis for Cygnus X-3 is true with the following parameters: T_ss = 4, t_lcl = 0, flux and position of Cygnus X-3 left free, g_gal and g_iso parameters fixed.The red histogram has the same parameters but with g_gal and g_iso parameters left free. The red dotted line is the $\frac{1}{2} χ_{1}^{2}$ $\hbox{$\frac{1}{2} \chi^2_1$}$ theoretical distribution, the green dashed line is the $χ_{1}^{2}$ $\hbox{$\chi^2_1$}$ theoretical distribution, the cyan dotted-dashed line is the $\frac{1}{2} χ_{3}^{2}$ $\hbox{$\frac{1}{2} \chi^2_3$}$ distribution.

5.3.4. The effect of unmodeled point sources

We performed a Monte Carlo simulation in which the simulated data contain all of the sources listed in Table 5. This was followed by a maximum likelihood analysis with models containing only a point source at the location of Cygnus X-3, in order to evaluate the effect of nearby unmodeled point sources. The resulting p-value distribution is shown in Fig. 12, where the red histogram is the p-value distribution with the analysis models containing all the sources used in the simulation and the black histogram is the p-distribution with only the source at the Cygnus X-3 location. This illustrates that it is extremely important to model existing sources correctly.

Fig. 12

The effect of unmodeled sources. The histograms are the p-value distribution for the Cygnus field when the null hypothesis for Cygnus X-3 is true with the following parameters: T_ss = 4, t_lcl = 5.99147, flux and position of Cygnus X-3 left free, g_gal and g_iso parameters fixed. The red histogram shows the result when all the simulated sources are included in the models, while the black histogram shows the result when only the source at Cygnus X-3 position is included. The red dotted line is the $\frac{1}{2} χ_{1}^{2}$ $\hbox{$\frac{1}{2} \chi^2_1$}$ theoretical distribution, the green dashed line is the $χ_{1}^{2}$ $\hbox{$\chi^2_1$}$ theoretical distribution, the cyan dotted-dashed line is the $\frac{1}{2} χ_{3}^{2}$ $\hbox{$\frac{1}{2} \chi^2_3$}$ distribution.

5.3.5. The effect of errors in the diffuse emission: error estimation using g_gal

We performed a preliminary investigation of the effect of systematic errors in the diffuse emission model on the analysis results by fixing the g_gal parameter during the analysis at a value g_gal = 0.67, which is slightly different from the one used in simulating the data; g_iso = 7.7 is simulated but the g_iso parameter is allowed to vary during the analysis. Figure 13 reports the p-value distributions for the Cygnus field when the null hypothesis for Cygnus X-3 is true where the results of three analyses are compared; one with simulated g_gal = 0.67 (blue thick lines), one with simulated g_gal = 0.67 ∗ 1.1 (red thick lines), and one with simulated g_gal = 0.67 ∗ 0.89 (black thick lines). Table 7 reports the resulting calculated g_iso.

Table 7

Mean value of the calculated g_iso parameter.

As expected, g_iso is larger in value when g_gal is sufficiently small and vice versa. If g_gal is under-estimated, then the number of false detections when the source is absent increases (see the red lines of Fig. 13), because background photons are assigned to the source. If the g_gal parameter is over-estimated, then the number of false detections when the source is absent decreases (see black line of Fig. 13) because the diffuse model is already too high at the position of Cygnus X-3.

Fig. 13

The effect of a poor estimation of the Galactic diffuse emission. The histograms are the p- value distributions for the Cygnus field when the null hypothesis for Cygnus X-3 is true with the following parameters: T_ss = 4, t_lcl = 5.99147, flux and position of Cygnus X-3 left free. The true value of g_iso in the simulated data is 7.7, the value of g_gal used in the analysis is fixed to 0.67. Blue thin line: simulated g_gal = 0.67, g_iso fixed to 7.7; blue thick line: simulated g_gal = 0.67, g_isoleft free; red thick line: simulated g_gal = 0.67 ∗ 1.1, g_isoleft free; black thick line: simulated g_gal = 0.67 ∗ 0.89, g_isoleft free. The red dotted line is the $\frac{1}{2} χ_{1}^{2}$ $\hbox{$\frac{1}{2} \chi^2_1$}$ theoretical distribution, the green dashed line is the $χ_{1}^{2}$ $\hbox{$\chi^2_1$}$ theoretical distribution, the cyan dotted-dashed line is the $\frac{1}{2} χ_{3}^{2}$ $\hbox{$\frac{1}{2} \chi^2_3$}$ distribution.

Table 8

Post-trial significance expressed in Gaussian standard deviations (σ).

6. Pre-trials and post-trials significance

We have demonstrated how the Monte Carlo simulations can be used to characterize the T_s distributions produced by the AGILE-GRID maximum likelihood analysis procedure. In the end, we have derived the probability, or p-value, of finding a false positive detection (rejecting the null hypothesis when it is true) in a single observation.

In practice, for each region of the sky we performed many trials during the daily monitoring to search for transient γ-ray events. The probability of obtaining a single false detection over a large number of trials is therefore much higher than p. In the AGILE context, we performed two kinds of analysis for each analyzed map:

1.
a blind search for unknown sources, in which we searched formore than one source at a time with free positions;
2.
searches from a list of known sources, where we searched for more than one source at a time with fixed source positions.

We defined K = M·N to be the number of independent trials, where N is the number of maps and M is the number of unknown sources in the first case or the number of sources in the predefined list in the second case. If we have only one source in both cases M = 1.

Since the probability of not making a false positive error in a single trial is 1 − p, the probability of not making any false positive error is (1 − p)^K (type-I error), hence the probability of making at least one false positive error is π = 1 − (1 − p)^K. This is defined as the post-trial probability, which is also referred to as the experiment-wide error rate, while p is denoted as the pre-trial probability, or comparison-wise error rate. For an experiment-wide false positive rate of π, we constrained the comparison-wise error rate withthe Dune-Ŝidák correction p ≤ 1 − (1 − π)^1/K.

We considered a typical AGILE case in the context of Galactic γ-ray transients when the position of the source was allowed to vary and for a single source when both the flux and position were allowed to vary for each map (M = 1): usually we kept the position and flux of known sources fixed assuming that only one source is in a flaring state in our map (we were able to reduce the size of the map to accomplish this). If we searched for a single flaring source once every two days (N = 182 for one year of observations in AGILE spinning mode), or if we could accept one false detectionduring the year π ≤ 1/N, represented a threshold of T_S ≃ 17.9. If we could accept a false detection once every two years, the threshold was T_S ≃ 20.5.

Table 8 reports the post-trial significance expressed in Gaussian standard deviations for some values of K.

7. Probability of sky-position coincidence in non-overlapping time intervals

We generalized the analysis presented in the previous sections to calculate the probability of two or more detections of a flaring source performed in the same sky position in different independent time intervals. Multiple detections of a source can have a low probability of being consistent with the null hypothesis even when the individual detections are at a low level of T_s. To assess the statistical significance of our detections, we considered the post-trial probability of flare occurrence. We distinguished two cases:

1.
the case of a single flare episode originating from a specificsource within a given error box (that we define as “singleindependent occurrence” or “single post trial occurrence”);
2.
the case of repeated flaring episodes originating from a specific source with a given error box (that we call here “repeated post-trial flare occurrence”.

For each individual AGILE detection, we calculated the post-trial significance of the single independent occurrences, which did not take into account the history of repeated occurrences.

We then combined the history of the sky region and establish the probability of repeated flaring episodes from the same sky position. We calculated the post-trial significance for repeated flare occurrences at the source error-box position as follows.

When we performed one trial for each map (M = 1, we used a list with only one source; this meant that each independent time period was a single trial), the probability of having k or more detectionsover N maps at a specific site with a T_s statistic satisfying T_s ≥ h was given by $P (N,X \geq k) = 1 - \sum_{j = 0}^{k - 1} (\begin{matrix} N \\ j \end{matrix}) p^{j} (1 - p)^{N - j}$ $\hbox{$ P(N, X \ge k) = 1 - \sum_{j=0}^{k-1} \left(\begin{array}{c}N\\j\end{array}\right) p^j (1-p)^{N-j}$}$ , where p = p(h) is the p-value corresponding to the h value given by Eq. (2), $P (N,X = j) = (\begin{matrix} N \\ j \end{matrix}) p^{j} (1 - p)^{N - j}$ $\hbox{$P(N, X=j) = \left(\begin{array}{c}N\\j\end{array}\right) p^j (1-p)^{N-j}$}$ is the probability of exactly j detections in N maps and $P (N,X < k) = \sum_{j = 0}^{k - 1} (\begin{matrix} N \\ j \end{matrix}) p^{j} (1 - p)^{N - j}$ $\hbox{$P(N, X < k) = \sum_{j=0}^{k-1} \left(\begin{array}{c}N\\j\end{array}\right) p^j (1-p)^{N-j}$}$ is the probability of fewer than k detectionsat a specific position in N maps.

When we performed M trials at different positions of the N maps, where M is the number of known or unknown sources in a predefined list, the chance of having k or more detections above the level h in any of the sites with a T_s statistic satisfying T_s ≥ h is given by $P_{M} (N,X \geq k) = 1 - {(\sum_{j = 0}^{k - 1} (\begin{matrix} N \\ j \end{matrix}) p^{j} (1 - p)^{N - j})}^{M}$ $\hbox{$P_M(N, X \ge k) = 1 - \left( \sum_{j=0}^{k-1} \left(\begin{array}{c}N\\j\end{array}\right) p^j (1-p)^{N-j} \right)^M$}$ where P_M(N,X ≥ k) = 1 − P(N,X < k)^M = 1 − (1 − P(N,X ≥ k))^M.

The choice of p(h) depends on our assumptions: if the flare comes froma known source we use Eq. (5), if the flare comes froma previously unknown source we use Eq. (4) (the case without ICL criterion).

Let us consider a simple case in the context of Galactic γ-ray transients, using the p-value function in the case of a complex Galactic region with the position ofa single source left free. If we detect one flare per year at a specific position of the Galactic plane with T_s > 16, and we produce 182 maps per year (once every two days with an integration time of two days), then after the first year the global post-trial significance is 2.16σ, after the second year it is 3.31σ, and after the third year it is 4.16σ. Transient sources, as long as they recur, enable us to have more confidence in their detection as integration time increases.

It might seem thatthis approach adds a bias tothe global significance of a detection of a flaring source, because it may happen that some flares from nearby sources can be “counted” together due to the extension of the 95% contour confidence level. In doubtful cases only a more detailed analysis can exclude this.

8. Conclusions

We have performedextensiveMonte Carlo simulations to characterize the maximum likelihood ratio test for the AGILE-GRID instrument in the context of short timescale (1–2 days) flaring γ-ray sources, both in extragalactic and Galactic fields. In the case of Galactic fields, we have simulated both a simple (without steady sources) and a complex Galactic region. With these simulations, we have calibrated both the T_s distributions (pre-trial significance) and the related false occurrence rate.

After the introduction of the post-trial significance, we calculated the post-trial probabilities for both single and multiple occurrences. In particular, we calculated the probability of two or more detections of a flaring source at the same sky position in different independent time intervals. With this approach, we took into account the presence of many flaring episodes originating from the same sky region, by combining their histories and adding information that was not present in the single episode and in the post-trial evaluation. We have referred to this as “repeated post-trial flare occurrence”.

In this paper, we have provided a method for converting the T_s produced by any of the various methods made available by the AGILE analysis software, into a probability. This information

can be used by anyone who performs analysis onGRID data using the AGILE Guest Observer Program.

¹

Two independent γ-ray transient search systems have been developed. One system operates at INAF/IASF-Bologna (Bulgarelli et al. 2009), which is able to process the datawithin 1.5 h (from the last photon acquired in orbit to the alert generation), and to generate alerts via e-mail and SMS to the mobile phones of the AGILE Collaboration. A second system operates at the ADC (AGILE Data Center) in Frascati; it can react within an average time of three hours, and generates alerts via e-mail but with more accurate data processing.

²

See ASI Data Center web site, http://agile.asdc.asi.it/

Acknowledgments

The AGILE Mission is funded by the Italian Space Agency (ASI) with scientific and programmatic participation by the Italian Institute of Astrophysics (INAF) and the Italian Institute of Nuclear Physics (INFN). We acknowledge financial contribution from the agreement ASI-INAF I/009/10/0.

References

Bulgarelli, A., Trifoglio, M., Gianotti, F., Di Cocco, G., & Tavani, M. 2009, ASP Conf. Ser., 411, 362 [NASA ADS] [Google Scholar]
Cash, W. 1979, ApJ, 228, 939 [NASA ADS] [CrossRef] [Google Scholar]
Giacconi, R., Gorenstein, P., Gursky, H., & Waters, J. R. 1967, ApJ, 148, L119 [NASA ADS] [CrossRef] [Google Scholar]
Mattox, J. R., Bertsch, D. L., Chiang, J., et al. 1996, ApJ, 461, 39 [Google Scholar]
Pittori, C., Verrecchia, F., Chen, A. W., et al. 2009, A&A, 506, 1563 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Tavani, M., Barbiellini, G., Argan, A., et al. 2009a, A&A, 502, 995 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Tavani, M., Bulgarelli, A., Piano, G., et al. 2009b, Nature, 462, 620 [NASA ADS] [CrossRef] [PubMed] [Google Scholar]
Wilks, S. S. 1938, Ann. Math. Stat., 9, 60 [Google Scholar]

All Tables

Table 1

Parameters and number of trials for the performed extragalactic and Galactic field simulations.

In the text

Table 2

Best-fit parameters in the case of an empty field for simulated sky maps with bin size of 0°.25 for different values of g_iso, the isotropic emission coefficient.

In the text

Table 3

Best-fit parameters in the case of an empty field for simulated sky maps with bin size of 0°.25, g_iso = 6, for two levels of exposure.

In the text

Table 4

Percentages of detections for some T_s thresholds (with related p-values when the null hypothesis is true) when the null hypothesis is false with a source at the (l, b) = (160, 0)° position with a simulated flux of 180 × 10^-8 photons cm^-2 s^-1.

In the text

Table 5

List of Cygnus region sources for E > 100 MeV.

In the text

Table 6

Correspondence between p-value and T_s value for different methods of analysis.

In the text

Table 7

Mean value of the calculated g_iso parameter.

In the text

Table 8

Post-trial significance expressed in Gaussian standard deviations (σ).

In the text

All Figures

	Fig. 1 Extragalactic (b = 90°) exposure map used for the simulations(in units of cm² ssr) in Galactic coordinates. Notice the larger exposure near the celestial pole.The white circles are the positions of the trials, the green circles indicate the high and low chosen exposure regions.
In the text

Fig. 2

T_s distribution (left side) and p-distribution (right side) of a simulated empty field, with g_gal = 1 and g_iso = 6 (where these parameters are allowed to vary during the analysis), flux free, and position fixed. The bluecrosses are the calculated distribution, the black line is the best fit according to Eq. (3), the red dotted line is the $\frac{1}{2} χ_{1}^{2}$ $\hbox{$\frac{1}{2} \chi^2_1$}$ theoretical distribution, the green dashed line is the $χ_{1}^{2}$ $\hbox{$\chi^2_1$}$ theoretical distribution, and the cyan dotted-dashed line is the $\frac{1}{2} χ_{3}^{2}$ $\hbox{$\frac{1}{2} \chi^2_3$}$ distribution.

In the text

Fig. 3

Comparison betweenp-value distributions for simulated (blue) and real (green) empty extragalactic fields. The red dotted line is the $\frac{1}{2} χ_{1}^{2}$ $\hbox{$\frac{1}{2} \chi^2_1$}$ theoretical distribution, the green dashed line is the $χ_{1}^{2}$ $\hbox{$\chi^2_1$}$ theoretical distribution, the cyan dotted-dashed line is the $\frac{1}{2} χ_{3}^{2}$ $\hbox{$\frac{1}{2} \chi^2_3$}$ distribution.

In the text

Fig. 4

The effect of the hypothesis formulation (trial selection). The p-value distribution depends on the constraint on the position of the source: a comparison of the different analysis methods (in particular, when the position of the source is either kept fixed or allowed to vary in the case of absence of a source at the (l, b) = (160, 0)° location. Histograms are the p-value distribution for an empty Galactic field when the null hypothesis is true. Blue and red histograms have the following parameters: T_ss = 4, t_lcl = 5.99147, flux and position of the source allowed to vary. The blue histogram contains all trials regardless of the calculated position, while the red histogram contains the trials that respect the ICL criterion. The brown histogram corresponds to the position of the source when it is kept fixed and flux free. The red dotted line is the $\frac{1}{2} χ_{1}^{2}$ $\hbox{$\frac{1}{2}\chi^2_1$}$ theoretical distribution, the green dashed line is the $χ_{1}^{2}$ $\hbox{$\chi^2_1$}$ theoretical distribution, and the cyan dotted-dashed line is the $\frac{1}{2} χ_{3}^{2}$ $\hbox{$\frac{1}{2} \chi^2_3$}$ distribution.

In the text

Fig. 5

The blue histogram is the T_s distribution for the empty Galactic region when the null hypothesis for a source at the position (l, b) = (160, 0)° is true, T_ss = 4, t_lcl = 5.99147, and both the flux and position of the source are allowed to vary. Black line is the best-fit function described in Eq. (4), N₂ = 5. The red dotted line is the $\frac{1}{2} χ_{1}^{2}$ $\hbox{$\frac{1}{2}\chi^2_1$}$ theoretical distribution, the green dashed line is the $χ_{1}^{2}$ $\hbox{$\chi^2_1$}$ theoretical distribution, and the cyan dotted-dashed line is the $\frac{1}{2} χ_{3}^{2}$ $\hbox{$\frac{1}{2} \chi^2_3$}$ distribution.

In the text

Fig. 6

The effect of steady sources in the ensemble of models. The blue line is the PDF for Cygnus field when the null hypothesis for Cygnus X-3 is true, and the red line is the PDF for an empty Galactic field when the null hypothesis is true, with the parameters T_ss = 4, t_lcl = 5.99147, when the flux and position of source in the alternative hypothesis are allowed to vary, but g_gal and g_iso parameters are fixed, ICL criterion. Black line is the best-fit function described in Eq. (5). The red dotted line is the $\frac{1}{2} χ_{1}^{2}$ $\hbox{$\frac{1}{2} \chi^2_1$}$ theoretical distribution, the green dashed line is the $χ_{1}^{2}$ $\hbox{$\chi^2_1$}$ theoretical distribution, the cyan dotted-dashed line is the $\frac{1}{2} χ_{3}^{2}$ $\hbox{$\frac{1}{2} \chi^2_3$}$ distribution.

In the text

Fig. 7

The effect of the hypothesis formulation (trial selection). The histograms show the p-value distributions in the presence of a simulated source at the location of (l, b) = (160, 0)° with flux 180 × 10^-8 photons cm^-2 s^-1. Red histograms have the following parameters: T_ss = 4, t_lcl = 5.99147, flux and position of the source left free with ICL rejection criterion. Blue histograms have t_lcl = 5.99147, flux and position of the source left free without ICL rejection criterion. Brown histograms have the flux left free and the position of the source kept fixed.

In the text

	Fig. 8 The binned counts map of the real (left side) and simulated (right side) Cygnus fieldforan integration time of 0.5 years (AGILE counts, E > 100 MeV). The real data is taken from July 2007 to October 2009; the map is centered on (l, b) = (78.75, 0)° in Galactic coordinates with a bin size of 0.1°.
In the text

Fig. 9

The p-value distributions change as a function of the t_lcl parameter. Simulations of the Cygnus region with no source present at Cygnus X-3 position were analyzed for different t_lcl values without applying the ICL rejection criterion. The flux and position of the hypothetical source at the Cygnus X-3 were allowed to vary, the g_gal and g_iso parameters were kept fixed, and T_ss = 4. Green histogram: t_lcl = 0; black histogram: t_lcl = 2.29575 (corresponds to a 68% confidence level for two degrees of freedom); blue histogram: t_lcl = 5.99147. The red dotted line is the $\frac{1}{2} χ_{1}^{2}$ $\hbox{$\frac{1}{2} \chi^2_1$}$ theoretical distribution, the green dashed line is the $χ_{1}^{2}$ $\hbox{$\chi^2_1$}$ theoretical distribution, and the cyan dotted-dashed line is the $\frac{1}{2} χ_{3}^{2}$ $\hbox{$\frac{1}{2} \chi^2_3$}$ distribution.

In the text

Fig. 10

Comparison between different radii of analysis. The histograms are the p-value distributions for the Cygnus field when the null hypothesis for Cygnus X-3 is true with the following parameters: T_ss = 4, t_lcl = 0, flux and position of Cygnus X-3 left free, ICL rejection applied. Red histogram:radius of analysis = 10°; blue histogram: radius of analysis = 5°. The red dotted line is the $\frac{1}{2} χ_{1}^{2}$ $\hbox{$\frac{1}{2} \chi^2_1$}$ theoretical distribution, the green dashed line is the $χ_{1}^{2}$ $\hbox{$\chi^2_1$}$ theoretical distribution, and the cyan dotted-dashed line is the $\frac{1}{2} χ_{3}^{2}$ $\hbox{$\frac{1}{2} \chi^2_3$}$ distribution.

In the text

Fig. 11

Comparison between g_gal and g_iso parameters free or fixed.The blue histogram is the p-distribution forthe Cygnus field when the null hypothesis for Cygnus X-3 is true with the following parameters: T_ss = 4, t_lcl = 0, flux and position of Cygnus X-3 left free, g_gal and g_iso parameters fixed.The red histogram has the same parameters but with g_gal and g_iso parameters left free. The red dotted line is the $\frac{1}{2} χ_{1}^{2}$ $\hbox{$\frac{1}{2} \chi^2_1$}$ theoretical distribution, the green dashed line is the $χ_{1}^{2}$ $\hbox{$\chi^2_1$}$ theoretical distribution, the cyan dotted-dashed line is the $\frac{1}{2} χ_{3}^{2}$ $\hbox{$\frac{1}{2} \chi^2_3$}$ distribution.

In the text

Fig. 12

The effect of unmodeled sources. The histograms are the p-value distribution for the Cygnus field when the null hypothesis for Cygnus X-3 is true with the following parameters: T_ss = 4, t_lcl = 5.99147, flux and position of Cygnus X-3 left free, g_gal and g_iso parameters fixed. The red histogram shows the result when all the simulated sources are included in the models, while the black histogram shows the result when only the source at Cygnus X-3 position is included. The red dotted line is the $\frac{1}{2} χ_{1}^{2}$ $\hbox{$\frac{1}{2} \chi^2_1$}$ theoretical distribution, the green dashed line is the $χ_{1}^{2}$ $\hbox{$\chi^2_1$}$ theoretical distribution, the cyan dotted-dashed line is the $\frac{1}{2} χ_{3}^{2}$ $\hbox{$\frac{1}{2} \chi^2_3$}$ distribution.

In the text

Fig. 13

The effect of a poor estimation of the Galactic diffuse emission. The histograms are the p- value distributions for the Cygnus field when the null hypothesis for Cygnus X-3 is true with the following parameters: T_ss = 4, t_lcl = 5.99147, flux and position of Cygnus X-3 left free. The true value of g_iso in the simulated data is 7.7, the value of g_gal used in the analysis is fixed to 0.67. Blue thin line: simulated g_gal = 0.67, g_iso fixed to 7.7; blue thick line: simulated g_gal = 0.67, g_isoleft free; red thick line: simulated g_gal = 0.67 ∗ 1.1, g_isoleft free; black thick line: simulated g_gal = 0.67 ∗ 0.89, g_isoleft free. The red dotted line is the $\frac{1}{2} χ_{1}^{2}$ $\hbox{$\frac{1}{2} \chi^2_1$}$ theoretical distribution, the green dashed line is the $χ_{1}^{2}$ $\hbox{$\chi^2_1$}$ theoretical distribution, the cyan dotted-dashed line is the $\frac{1}{2} χ_{3}^{2}$ $\hbox{$\frac{1}{2} \chi^2_3$}$ distribution.

In the text

Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.

Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.

Initial download of the metrics may take a while.

[1] Bulgarelli, A., Trifoglio, M., Gianotti, F., Di Cocco, G., & Tavani, M. 2009, ASP Conf. Ser., 411, 362 [NASA ADS] [Google Scholar]

[2] Cash, W. 1979, ApJ, 228, 939 [NASA ADS] [CrossRef] [Google Scholar]

[3] Giacconi, R., Gorenstein, P., Gursky, H., & Waters, J. R. 1967, ApJ, 148, L119 [NASA ADS] [CrossRef] [Google Scholar]

[4] Mattox, J. R., Bertsch, D. L., Chiang, J., et al. 1996, ApJ, 461, 39 [Google Scholar]

[5] Pittori, C., Verrecchia, F., Chen, A. W., et al. 2009, A&A, 506, 1563 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[6] Tavani, M., Barbiellini, G., Argan, A., et al. 2009a, A&A, 502, 995 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[7] Tavani, M., Bulgarelli, A., Piano, G., et al. 2009b, Nature, 462, 620 [NASA ADS] [CrossRef] [PubMed] [Google Scholar]

[8] Wilks, S. S. 1938, Ann. Math. Stat., 9, 60 [Google Scholar]

Evaluating the maximum likelihood method for detecting short-term variability of AGILE γ-ray sources

1. Introduction

2. Analysis method

2.1. Hypothesis formulation

3. Monte Carlo simulations

4. Extragalactic empty field

4.1. Monte Carlo simulation

4.2. Real observation

5. Monte Carlo simulations of Galactic fields

5.1. A simple Galactic region

5.2. A complex Galactic region: the case of the Cygnus field

5.3. Deviation from the nominal distribution

5.3.1. The effect of the tlcl parameter

5.3.2. The effect of the radius of analysis

5.3.3. The effect of keeping ggal and giso either free or fixed

5.3.4. The effect of unmodeled point sources

5.3.5. The effect of errors in the diffuse emission: error estimation using ggal

6. Pre-trials and post-trials significance

7. Probability of sky-position coincidence in non-overlapping time intervals

8. Conclusions

Acknowledgments

References

All Tables

All Figures

5.3.1. The effect of the t_lcl parameter

5.3.3. The effect of keeping g_gal and g_iso either free or fixed

5.3.5. The effect of errors in the diffuse emission: error estimation using g_gal