Issue |
A&A
Volume 540, April 2012
|
|
---|---|---|
Article Number | A79 | |
Number of page(s) | 12 | |
Section | Astronomical instrumentation | |
DOI | https://doi.org/10.1051/0004-6361/201118023 | |
Published online | 28 March 2012 |
Evaluating the maximum likelihood method for detecting short-term variability of AGILE γ-ray sources
1 INAF/IASF–Bologna, via Gobetti 101, 40129 Bologna, Italy
e-mail: bulgarelli@iasfbo.inaf.it
2 INAF/IASF–Roma, via del Fosso del Cavaliere 100, 00133 Roma, Italy
3 INAF/IASF – Milano, via E. Bassini 15, 20133 Milano, Italy
4 Dip. di Fisica, Univ. “Tor Vergata”, via della Ricerca Scientifica 1, 00133 Roma, Italy
Received: 5 September 2011
Accepted: 7 January 2012
Context. The AGILE space mission (whose instrument is sensitive to the energy ranges 18–60 keV, and 30 MeV–50 GeV) has been operating since 2007. Assessing the statistical significance of the time variability of γ-ray sources above 100 MeV is a primary task of the AGILE data analysis. In particular, it is important to verify the instrument sensitivity in terms of Poisson modeling of the data background, and to determine the post-trial confidence of detections.
Aims. The goals of this work are: (i) to evaluate the distributions of the likelihood ratio test for both “empty” fields and regions of the Galactic plane, and (ii) to calculate the probability of false detections over multiple time intervals.
Methods. We describe in detail the techniques used to search for short-term variability in the AGILE γ-ray source database. We describe the binned maximum likelihood method used for the analysis of AGILE data, and the numerical simulations that support the characterization of the statistical analysis. We apply our method to both Galactic and extragalactic transients, and provide a few examples.
Results. After checking the reliability of the statistical description tested with the real AGILE data, we obtain the distribution of p-values for blind and specific source searches. We apply our results to the determination of the post-trial statistical significance of detections of transient γ-ray sources in terms of pre-trial values.
Conclusions. The results of our analysis allow a precise determination of the post-trial significance of γ-ray sources detected by AGILE.
Key words: gamma-rays: general / methods: statistical / methods: data analysis
© ESO, 2012
1. Introduction
The current generation of γ-ray space missions, AGILE and Fermi, are sensitive in the energy range above 100 MeV up to tens of hundreds of GeV. These missions are providing a wealth of data on a variety of γ-ray sources, both in our Galaxy and at extragalactic distances. Compared with the previous generation of γ-ray instruments (e.g., EGRET onboard the Compton Gamma-Ray Observatory), the AGILE and Fermi-LAT γ-ray imagers are based on silicon detectors with optimal spatial resolution and much improved background rejection. These characteristics allow positioning of accuracies of a few arcminutes to be achieved for intense sources and very large fields of view (FOV). Both AGILE and Fermi-LAT have FOVs of more than 100 degrees across, i.e., 2.5 sr, which is of great relevance to the statistical analysis of the γ-ray sources. On the basis of the current detector performances, it is crucial to perform a statistical analysis specialized to the specific modes of operation of the two γ-ray missions. In particular, the distinction between non-significant and significant steady/transient sources must be supported by a specific treatment of the satellite γ-ray data. The relatively large effective areas at γ-ray energies, and the very large FOVs, lead to the need for long exposures (measured as the product of effective area and the on-source duration). The statistical significance of γ-ray source detections must address the pre- vs. post-trial significance. As we see, the characteristics of the pointing and the specific search (whether a “blind” search or a search for a specific source) have a direct influence on the proper statistical treatment. In this paper, we investigate the statistical analysis of γ-ray sources detected by the AGILE mission.
The AGILE-GRID instrument for observations in the γ-ray has an energy range of 30 MeV–50 GeV (Tavani et al. 2009a). The AGILE data are down-linked approximately every 100 minutes and sent to the AGILE Data Center (ADC), which is part of the ASI Science Data Center (ASDC) for data reduction, scientific processing, and archiving. The ASDC then forwards the AGILE data to the AGILE team local sites where a quick look analysis is performed.
Owing to the low detection rate of events and the extent of the AGILE GRID point-spread function (PSF), statistical techniques such as the maximum likelihood method (or estimator) are required to analyze the AGILE point sources. Similar analysis techniques were applied previously to EGRET data (Mattox et al. 1996).
For steady sources and sources with high γ-ray flux, the significance of a detection increases as a function of the observation duration. This is not necessarily the case for variable sources with low flux: a short integration time (the integration period should be of the same timescale as the duration of the astronomical phenomena under study) may yield a low significance level because the observation is photon limited; for a longer integration time, the source may disappear entirely because the observation is background limited.
The main purpose of this work is the evaluation of the likelihood ratio test in the context of short timescale (1–2 days) flaring γ-ray sources. The likelihood ratio test is used to compare two ensembles of hypotheses, one in which a γ-ray source is present, and another (the null hypothesis) in which it is absent. The determination of the likelihood ratio distribution in the case of the null hypothesis is used to evaluate the occurrence of false positive detections.
The search for γ-ray transients (Galactic and extragalactic) detectable on timescales of 1–2 days is one of the major activities performed by the AGILE Collaboration1.
The main goals of this work are: (i) to evaluate the distribution of the likelihood ratiotest statistic (Ts) for empty fields both outside of and within the Galactic plane; (ii) to evaluate the Ts distribution in the case of one flaring source with different combinations of parameters; and (iii) to calculate the confidence levels corresponding to the samefalse detection probability in multiple non-overlapping time periods.
We performedMonte Carlo simulations to determine the Ts distributions both in the presence and absence of a flaring source, which may differ from the theoretically predicted distribution for a number of reasons that wediscuss below. We compare these results with the AGILE data where feasible, and show how the formulation of the hypotheses influences the Ts distribution.
In Sect. 2, we present the AGILE -GRID maximum likelihood analysis method. In Sect. 3, we report a general description of theperformedMonte Carlo simulations. In Sect. 4, we characterize the Ts distribution of a simulated extragalactic empty field and compare it with a real observation case. Section 5 characterizes the Ts distribution for two simulated Galactic fields, a simple and a complex case; in the latter, we analyze the effectthat uncertainties in the analysis parameters produce on the Ts distribution. In Sect. 6, we describe the pre- and post-trial probability, and in Sect. 7 we consider multiple detections at the same sky position.
2. Analysis method
The likelihood ratio test is used to compare two ensembles of models, one of which is a subset of the other, each of which can be characterized by a set of parameters. In the most common case, one of the ensembles of models is the null hypothesis, while the other, of which the null hypothesis is a subset, is the alternative hypothesis, corresponding to, for example, the hypothesis of the existence of a source. In each case, the values of the parameters are foundby means of a maximum likelihood method or estimator that maximizes the likelihood of producing the data given in the ensemble of models. The application of likelihood to photon-counting experiments is described in (Cash 1979). Details of how the likelihood is calculatedin the context of γ-ray data analysis can be found in Mattox et al. (1996). The likelihood ratio is then simply the ratio of these two maximum likelihoods, and the test statistic Ts is defined as (1)where L0 and L1 are the maximum value of the likelihood function for the null hypothesis and the alternative hypothesis, respectively.
In the AGILE-GRID case, the data are binned into FITS count maps, while each model is a linear combination of isotropic and Galactic diffuse components of the γ-ray emission and point sources.The γ-ray exposure maps, and Galactic diffuse emission maps are then used to calculate the models. Among the parameters that may be varied to find the maximum likelihood are the coefficients of the diffuse and point source components.
To describe a single point source four parameters in general are used: the predicted source counts sc, the spectral index ssi (the spectrum of each source is assumed to be a power law), and two parameters corresponding to the position of the source (sl,sb, in Galactic coordinates). It is possible to keep each parameter either free or fixed; a free parameter is allowed to vary to find the maximum likelihood. Possible combinations include: (1) only allowing the flux to vary, (2) allowing both the position and flux to vary, (3) allowing both the spectrum and flux to vary, or (4) leaving all four parameters to be free. Throughout this paper, we keep the spectral index fixed to 2.1; for this kind of source, the AGILE-GRID instrument has a PSF at 30° off-axis for E > 100 MeV of 2.1°, for E > 400 MeV of 1.1°, and for E > 1 GeV of 0.8°.
The two parameters that we use to describe the Galactic (diffuse) and isotropic γ-ray emission are: (i) ggal, the coefficient of the Galactic diffuse emission model, and (ii) giso, the isotropic diffuse intensity. A value of ggal ≤ 1 is expected if the Galactic diffuse emission model is correct. Values of giso between 1 and 15 × 10-5 cm-2 s-1 sr-1 are expected, depending on both pointing strategy and onboard and background filter rejection. These two parameters can be either left free or fixed independently. For short timescale variability (less then 3–4 days) of γ-ray sources, we usually first estimate these parameters with a longer timescale integration, and then fix them for the short timescale analysis, assuming that these components do not vary significantly on the shorter timescales. We note that for the AGILE data used in this analysis (based on the standard filter FM3.119), the isotropic component is dominated by instrumental charged particle background rather than by the extragalactic diffuse emission, in contrast to data from EGRET and Fermi-LAT.
The values of the parameters that maximize the likelihood are those describing the model in the ensemble that are most likely to reproduce the data.
The null hypothesis corresponds to the absence of the point source, while the alternative hypothesis corresponds to its presence. The null hypothesis, clearly, is a subset of the alternative hypothesis, corresponding to a source with zero flux. The Galactic diffuse and isotropic coefficients, as well as the parameters of other known point sources in the FOV, must be kept either fixed or free in the same manner when evaluating both the null and alternative hypothesis.
To limit the effect of systematic errors far from the position of the hypothesized source, the data bins evaluated (and their predicted values according to the models) are limited to an analysis region of radius 5° or 10° centered on the source position.
From Wilks’s theorem (Wilks 1938), the Ts distribution ϕ is expected to asymptotically follow in the null hypothesis, where n − m is the number of additional parameters that are optimized in the alternative hypothesis. In the most simple case, n − m = 1 (e.g., in the case of the determination of the flux of a single source). This means that from Wilks’s theorem, Ts is expected to be asymptotically distributed as
in the null hypothesis. The expected departure of the distribution from
is of order (N) − 1/2, where N is the number of samples. In our context, the number of samplesis the number of photons that carry information about all the parameters; these are all the photons in the analysis region. This is true regardless of the number thatis eventually estimated to come from the point source.
When there are multiple sources whose fluxes are allowed to vary, the following procedure,divided into two loops, is used to find the maximum Ts. In the first loop the sources are first sorted according to the hypothesized flux. One by one, the sources are added to the model, from highest to lowest flux. If the source flux is allowed to vary, then the maximum likelihood is found in both the presence and absence of the source. If the position is allowed to vary,the first fit is done at a fixed position and the resulting Ts is compared to a location confidence level threshold (tlcl). If Ts > tlcl, then Ts is again maximized with variable flux and position. We define Tss to be a threshold for promoting the source to the second loop of the algorithm: if the final Ts is greater than Tss, the source is considered significant and added to both the null and alternative hypotheses for the other sources. If not, it is considered undetected, and is set to zero flux for all subsequent analyses. For the sources over the Tss threshold (with or without a location confidence level),the second loop is similar to the first loop, except that all of the sources marked significant in the first loop are contained in the models from the beginning. The sources are again evaluated one at a time from highest to lowest flux. The Ts of each source is again maximized, and set to its final value. The values of the parameters tlcl and Tss affect the behavior of the procedure, where tlcl = 5.99147 corresponds to a 95% confidence level for two degrees of freedom.
The maximum likelihood estimator developed for AGILE constrains the flux of a source, and therefore the source counts sc, to be either greater than or equal to zero. Since the ensemble of models considered is half of the theoretically possible number, the shape of the Ts distribution differs from that of Wilks’s theorem by being asymptotically distributed as instead of
(Mattox et al. 1996).
To compare the data distribution of Ts produced by the AGILE analysis procedurewith that predicted by Wilks’s theorem, we performed a series ofMonte Carlo simulations of AGILE data. Each simulation of an analysis region and its subsequent maximum likelihood analysis constitutes a singletrial. The probability that the result of a trial in an empty field has Ts ≥ h (that is the complement of the cumulative distribution function) is (2)which is also called the p-value p = P(Ts ≥ h). This is the pre-set (pre-trial) type-1 error (a false positive, rejecting the null hypothesis when in fact it is true). Given a statistical distribution, a “p-value” assigned to a given value of a random variable is defined as the probability of obtaining that value or larger when the null hypothesis is true. This value may be interpreted as an “occurrence-rate”, that is, how many trials occur on average before obtaining a false detection at a level equal or greater to h.
2.1. Hypothesis formulation
In the context of the γ-ray transient analysis, the null hypothesis is defined as an analysis region containing only steady and known sources with no flaring sources present. We can translate this into the ensemble of models by keeping the flux of the flaring source fixed to zero, and thefluxes of steady sources fixed to their known fluxes. In the alternative hypothesis that a flaring source is present, the flux (and position if specified) of this source is allowed to be free and thefluxes of steady sources are fixed to their known fluxes. In this work, we restrict our analysis to hypotheses of single flaring sources, neglecting alternative hypotheses of two or more flaring sources in the same analysis region.
Additional knowledge of the source, e.g. from other wavelengths, can add useful additional constraints about the position of the source in the hypothesis formulation. In Sect. 5, we show that this additional knowledge can change the Ts distribution, thereby reduce the occurrence rate of false detection.
Parameters and number of trials for the performed extragalactic and Galactic field simulations.
For the analysis of a flaring source, we consider two possible scenarios:
-
The flaring source is unknown: in this case, the position and fluxparameters are allowed to be free and optimized with respect tothe input data: the starting (l,b) position is usually the count peak found in the smoothed map. If the Ts is higher than a well-defined threshold, the alternative (flaring source) hypothesis is accepted, and a counterpart search may be performed.
-
The source is known and the alternative hypothesis is that this source is in a flaring state. This scenario can be further subdivided into two cases:
-
(a)
the mean flux of the source is below the background level ofthe sky region, providing only an upper limit over longintegrations;
-
(b)
the mean flux of the sourceis above the background level of the sky region and is detected over long integrations.
-
(a)
3. Monte Carlo simulations
Monte Carlo simulations of AGILEγ-raydata were used to characterize the maximum likelihood analysis procedure. Simulated data were generated using a background model (Galactic diffuse radiation model and isotropic background) and the AGILE-GRID instrument response functions (version I0023 of the calibration matrices for effective area, energy dispersion, and PSF).The energy range used is 100 MeV–50 GeV. The simulated observations were generated by adding Poisson-distributed deviates to each pixel. Each bin of the generated maps (counts, exposure, and Galactic emission maps) was analyzed exactly as flight data as described in Sect. 2. The bin size chosen for the simulations is 0.25°, the same size used in theAGILE-GRID daily monitoring.
The exposure level chosen for the simulations was a level equivalent to a mean value of a one-day pointing/two-day spinning AGILE observation mode.
In Table 1, we report the parameters and the number of trials for the performed simulations.
4. Extragalactic empty field
4.1. Monte Carlo simulation
We simulated an extragalactic empty field without flaring or steady sources. The simulation was performed with an AGILE FOV of 60°. Figure 1 shows the exposure map, and the bins used in this simulation to perform the trials. This is a typical two-day exposure map in spinning mode, but in this context the key point is the level of exposure, and not its shape. We analyzed positions within 50° of the center of the map to exclude area of low exposure, which we also do in every day sky monitoring.
We performed a maximum likelihood analysis at positions corresponding to every fifthdegree on the map. The spacing was chosen to ensure that the analyses would be independent of each another. The positionis kept fixed and the flux allowed to vary, implying that there is one additional parameter in the alternative hypothesis. We repeated the count map simulations and analysis for different values of the coefficient of the isotropic diffuse component (giso = 6 and 12) that were consistent with values found in real AGILE observations. Figure 2 shows the resulting Ts distribution (left panel) and the related p-value distribution (right panel). We fit this Ts distribution with the function: (3)
In Table 2, we report the results of the fit for different background giso levels. The function δ in the first bin takes into account the constraint on the source counts (sc ≥ 0) . In this simple case, the simulated distribution isclose to the expected value of (see the η parameter in Table 2).
In Table 3, we report the results of the fit for different levels of exposure. Figure 1 shows the regions with different exposure levels.
![]() |
Fig. 1 Extragalactic (b = 90°) exposure map used for the simulations(in units of cm2 ssr) in Galactic coordinates. Notice the larger exposure near the celestial pole.The white circles are the positions of the trials, the green circles indicate the high and low chosen exposure regions. |
![]() |
Fig. 2 Ts distribution (left side) and p-distribution (right side) of a simulated empty field, with ggal = 1 and giso = 6 (where these parameters are allowed to vary during the analysis), flux free, and position fixed. The bluecrosses are the calculated distribution, the black line is the best fit according to Eq. (3), the red dotted line is the |
4.2. Real observation
We compared the simulated data shown in the last section to a real AGILE observation. The observation block chosen was OB74102, in which AGILE was pointed at the north Galactic pole for along exposure. For each day of the observation counts, exposure and gas maps were generated and analyzed. As in the Monte Carlo simulations, a maximum likelihood analysis was performed for a hypothetical source position at every fifthdegree to ensure the independence of each trial. The results are shown in Fig. 3. Taking into account the limits of the statistics collected from this real observation, real and simulated data are compatible at the 1σ error level.
Best-fit parameters in the case of an empty field for simulated sky maps with bin size of 0°.25 for different values of giso, the isotropic emission coefficient.
5. Monte Carlo simulations of Galactic fields
We performed simulations of two regionsof the Galactic plane: a Galactic regionwith a low density of potential sources, and a complex Galactic region (the Cygnus region). These two regions represent two extremes of the AGILE analysis.
5.1. A simple Galactic region
We performed a simulation of a relatively simple Galactic region by assuming only that the Galactic diffuse and isotropic emissions do not originate in either steady or flaring sources. This calculation was aimed at evaluating the photon density function and the p-value distribution of the AGILE likelihood maximum estimator in the presence of the Galactic diffuse emission. We chose a region centered on (l, b) = (160, 0) (Galactic coordinates) with a one-day exposure level in pointing mode. The parameters used in the simulation are ggal = 1 and giso = 3. During the analysis, the spectrum of any hypothetical source was kept fixed.
To analyze the flux of a source whose position is known, we fixed the position of the source, and allow the flux to vary in the alternative hypothesis. The resulting p-value distribution is shown as the brown histogram in Fig. 4. Using Eq. (3) we found that the best-fit solution is δ = 0.8600 ± 0.0003 and η = 0.4540 ± 0.0005, for N1 = 1. However, owing to the systematic errors in the event reconstruction, there are cases in which the source is detectable but at a position farther away from the known position than a purely statistical analysis would predict. To handle these cases, we kept the position of the source free and we developed an analysis criterion, which we call ICL (Inside Confidence contour Level). In this criterion, if the contour level is present, Ts ≥ tICL (and we fix tICL = 9 for the AGILE analysis), and the position of the source under investigation is outside the contourlevel found by the maximum likelihood procedure, we reset Ts to 0. In principle, a contour always exists (even if it is not necessarily closed or connected), but sometimes our software fails to find it.The contour is always searched for at tlcl.
We used the ICL criterion only in the case of a known source. The Ts distributions presented hereafter that are position-free and without an ICL criterion are used in the analysis of an unknown source. The reason for rejecting the event is that the technique is used to weed out detections that we are unsure are coincident with the source.
![]() |
Fig. 3 Comparison betweenp-value distributions for simulated (blue) and real (green) empty extragalactic fields. The red dotted line is the |
Best-fit parameters in the case of an empty field for simulated sky maps with bin size of 0°.25, giso = 6, for two levels of exposure.
We compared the results with those obtained when the ICL criterion was applied in the standard analysis (position left free). The p-value distributions are reported in Fig. 4 (the red histogram for the ICL criterion, the blue histogram without the ICL criterion,both with Tss = 4, tlcl = 5.99147). The blue and red histograms show a pronounced dip just above Tlcl with respect to the brown histogram because sources with Ts > Tlcl may increase their Ts during relocalization (shift to the right). We highlight the Ts distribution produced by the application of the ICL criterion because it is used by the AGILE automated quick-look analysis when searching for flares from known sources.
![]() |
Fig. 4 The effect of the hypothesis formulation (trial selection). The p-value distribution depends on the constraint on the position of the source: a comparison of the different analysis methods (in particular, when the position of the source is either kept fixed or allowed to vary in the case of absence of a source at the (l, b) = (160, 0)° location. Histograms are the p-value distribution for an empty Galactic field when the null hypothesis is true. Blue and red histograms have the following parameters: Tss = 4, tlcl = 5.99147, flux and position of the source allowed to vary. The blue histogram contains all trials regardless of the calculated position, while the red histogram contains the trials that respect the ICL criterion. The brown histogram corresponds to the position of the source when it is kept fixed and flux free. The red dotted line is the |
![]() |
Fig. 5 The blue histogram is the Ts distribution for the empty Galactic region when the null hypothesis for a source at the position (l, b) = (160, 0)° is true, Tss = 4, tlcl = 5.99147, and both the flux and position of the source are allowed to vary. Black line is the best-fit function described in Eq. (4), N2 = 5. The red dotted line is the |
Figure 5 reports the Ts distribution when the null hypothesis is true, and is produced by the maximum likelihood analysis with the parameters: Tss = 4 and tlcl = 5.99147, and both the flux and position of a source at the (l, b) = (160, 0)° position are allowed to vary. The related p-value distribution is shown in Fig. 4 (blue histogram). The ICL criterion was not applied, i.e. no rejection was applied based on the compatibility of the location contour with the position of the source. Fitting thisdistribution with the function, (4)
we found that N1 = 1 (if Ts < Tss, no optimization of the position takes place and therefore the only free parameter is the flux of the source), N2 = 5, δ = 0.89 ± 4.5 × 10-4, η1 = 0.35 ± 5.1 × 10-4, and η2 = 3.96 × 10-3 ± 3 × 10-5. We used functions with N = 5 (d.o.f.) solely as an analytical approximation to the functional form produced by this process. The translation (Ts − tlcl) is due to the switch between the fixed and free position regime(see Sect. 2).
Equation (4) is applicable to blind searches for unknown sources. When searching for flares from a known source (i.e. our hypothesis isthat there is a known source at the (l, b) = (160, 0)° position), the appropriate alternative hypothesis should exclude sources for which the known source position lies outside the location contour. Therefore, after applying the ICL rejection criterion to the 95% confidence level, we report the resulting p-value distribution in Fig. 4 (red line) compared with the blue histogram of the same figure, to which no selection criterion was applied. The effect of an appropriate hypothesis formulation is evident. The application of the ICL rejection reduced the number of degrees of freedom. After fitting this distribution with Eq. (4), we found that the histogram has a distribution between N2 = 3 and N2 = 4, owing to the ICL selection criterion.
The analytical expression that can be used with tICL = 9 where T1 = 14 is given by (5)
for the parameters values N1 = 1, N2 = 5, N3 = 5, N4 = 1, δ = 0.89 ± 4.4 × 10-4, η1 = 0.34 ± 5.1 × 10-4, η2 = 4.5 × 10-3 ± 5.7 × 10-5, η3 = 1.27 × 10-2 ± 1.7 × 10-4, and η4 = 0.91 ± 3.3 × 10-2. This expression approximates the expected behavior of the analysis, which shifts gradually from a source location algorithm with many free parameters near the threshold where the location contour is large, to an analysis more similar to a fixed-position analysis at high Ts where the location contour is small. Figure 6 reports the Ts distribution when the null hypothesis is true, that is produced by a maximum likelihood analysis with the parameters Tss = 4, and tlcl = 5.99147, when the flux and position are allowed to vary, after applying the ICL criterion; the red histogram corresponds to a region centered at (l, b) = (160, 0)° and the blue histogram is related to the Cygnus region. The black line is the best-fit solution reported in Eq. (5). The reported distributions correspond to the standard quick-look analysis of AGILE data forempty Galactic regions. We note that when the selection criterion is altered the expected p-values also change with respect to the expected theoretical distribution (cyan dashed line).
![]() |
Fig. 6 The effect of steady sources in the ensemble of models. The blue line is the PDF for Cygnus field when the null hypothesis for Cygnus X-3 is true, and the red line is the PDF for an empty Galactic field when the null hypothesis is true, with the parameters Tss = 4, tlcl = 5.99147, when the flux and position of source in the alternative hypothesis are allowed to vary, but ggal and giso parameters are fixed, ICL criterion. Black line is the best-fit function described in Eq. (5). The red dotted line is the |
The analysis performed with the position of the source kept fixed yields lower p-values than the analysis for which the position is allowed to vary, either with (red histogram of Fig. 4) or without (blue histogram) the ICL criterion. However, the fraction of detections above a given TS threshold in the presence of a real source is also lower when the source position is kept fixed, as shown by the histograms in Fig. 7. Table 4 reports the number of detections (in %) for some Ts thresholds.
Percentages of detections for some Ts thresholds (with related p-values when the null hypothesis is true) when the null hypothesis is false with a source at the (l, b) = (160, 0)° position with a simulated flux of 180 × 10-8 photons cm-2 s-1.
![]() |
Fig. 7 The effect of the hypothesis formulation (trial selection). The histograms show the p-value distributions in the presence of a simulated source at the location of (l, b) = (160, 0)° with flux 180 × 10-8 photons cm-2 s-1. Red histograms have the following parameters: Tss = 4, tlcl = 5.99147, flux and position of the source left free with ICL rejection criterion. Blue histograms have tlcl = 5.99147, flux and position of the source left free without ICL rejection criterion. Brown histograms have the flux left free and the position of the source kept fixed. |
5.2. A complex Galactic region: the case of the Cygnus field
We simulated observations of the Cygnus region both including and not including a source at the position of Cygnus X-3 to test the analysis procedure in a complex case with nearby point sources and Galactic diffuse emission. We chose this field because it is one of the most complex cases that our analysis procedure must address. Cygnus X-3 is a well-known microquasar (Giacconi et al. 1967), showing variable emission at all wavelengths, including repeated γ-ray flaring activity above 100 MeV as detected by AGILE (Tavani et al. 2009b). This target was chosen because it displays significant variability in the γ-ray energy range that is strongly correlated with other wavelengths. The list of simulated sources of the Cygnus region is reported in Table 5. In Fig. 8, we show a half-year year integration of AGILE data from the Cygnus region and a simulation of a comparable integration time using the same parameters used to simulate the short trials, demonstrating that the underlying model is sound.
The null hypothesis is that no γ-ray source coincident with Cygnus X-3 is present in the AGILE data, while the alternative hypothesis is that a source coincident with Cygnus X-3 emits γ-rays. The parameters of the other sources and the diffuse emission coefficients were all kept fixed.
![]() |
Fig. 8 The binned counts map of the real (left side) and simulated (right side) Cygnus fieldforan integration time of 0.5 years (AGILE counts, E > 100 MeV). The real data is taken from July 2007 to October 2009; the map is centered on (l, b) = (78.75, 0)° in Galactic coordinates with a bin size of 0.1°. |
List of Cygnus region sources for E > 100 MeV.
If we fix the position of the source as already described in the previous section, using Eq. (3) we find the best-fit solution with δ = 0.65 ± 3.6 × 10-4, η = 4.6 × 10-1 ± 4.2 × 10-4, N1 = 1. The two η parameters of the cases of empty and complex Galactic fields are very similar. Therefore, in the Tables 6 and 8 we report a single value for the fixed position analysis that is valid in both cases.
Keeping the position of Cygnus X-3 free and fitting with Eq. (4), we find that N1 = 1, N2 = 5, δ = 0.85 ± 4.6 × 10-4, η1 = 0.46 ± 6.2 × 10-5, and η2 = 6.15 × 10-3 ± 0.4 × 10-5. This fitting is appropriate for blind searches of unknown sources in complex Galactic regions.
Keeping the position of Cygnus X-3 free, applying the ICL criterion, and fitting with Eq. (5), we found that N1 = 1, N2 = 5, N3 = 5, N4 = 1, δ = 0.84 ± 2 × 10-4, η1 = 0.49 ± 2.8 × 10-4, η2 = 6.7 × 10-3 ± 3.2 × 10-5, η3 = 1.8 × 10-2 ± 1.0 × 10-5, and η4 = 1.32 ± 1.9 × 10-2. This expression approximates the expected behavior of the analysis, as already established in the case of an empty Galactic field (see Sect. 5.1). The reported distributions correspond to the standard quick-look analysis of AGILE data for complex Galactic regions.
Figure 6 compares the probability density function for the Cygnus field (see Table 5) with this empty Galactic field. As expected, the effect ofa more complexregion is an increase in the number of false detections.
With the performed Monte Carlo simulation, we determine the p-value function for the most common hypothesis formulations. On the basis of these simulation, we are able to establish the Ts level for each p-value and constrain the false occurrence rate. Table 6 reports the correspondence between p-value and Ts value for different methods of analysis, in addition to the theoretical reference for ,
, and
.
5.3. Deviation from the nominal distribution
In the following we report the effect of uncertainties in the analysis parameters on the shape of the distributions. The analyses were performed for the case of the complex Galactic region.
5.3.1. The effect of the tlcl parameter
We performed additional maximum likelihood analyses using different values of the tlcl parameter, including the case of tlcl = 0. In the case of a blind search for unknown sources, the maximum likelihood estimator could be used as a source finder, instead of an hypothesis validator, by setting tlcl = 0. In this case the optimization of the position was performed regardless of the Ts found in the first step with fixed positions. The resulting p-value distributions are shown in Fig. 9 (with tlcl = 0 shown in green) including all trials without ICL rejection. As expected, higher tlcl values correspond to lower p-values because the optimization of the position starts at higher Ts values in the first loop of the maximum likelihood procedure.
Correspondence between p-value and Ts value for different methods of analysis.
![]() |
Fig. 9 The p-value distributions change as a function of the tlcl parameter. Simulations of the Cygnus region with no source present at Cygnus X-3 position were analyzed for different tlcl values without applying the ICL rejection criterion. The flux and position of the hypothetical source at the Cygnus X-3 were allowed to vary, the ggal and giso parameters were kept fixed, and Tss = 4. Green histogram: tlcl = 0; black histogram: tlcl = 2.29575 (corresponds to a 68% confidence level for two degrees of freedom); blue histogram: tlcl = 5.99147. The red dotted line is the |
5.3.2. The effect of the radius of analysis
Figure 10 shows that no appreciable differences are produced by changing the radius of analysis. The comparison was performed for the case of tlcl = 0 applying the ICL criterion, but should also be valid for analyses using the othercriteria.
![]() |
Fig. 10 Comparison between different radii of analysis. The histograms are the p-value distributions for the Cygnus field when the null hypothesis for Cygnus X-3 is true with the following parameters: Tss = 4, tlcl = 0, flux and position of Cygnus X-3 left free, ICL rejection applied. Red histogram:radius of analysis = 10°; blue histogram: radius of analysis = 5°. The red dotted line is the |
5.3.3. The effect of keeping ggal and giso either free or fixed
In Fig. 11, we show the effect of keeping ggal and giso parameters fixed (blue) and free (red): the p-values with free parameters are larger. This result is expected because fixing these parameters reduces the range of possible hypotheses explored. The comparison was performed for the case of tlcl = 0 after applying the ICL criterion, but should also be valid for analyses using the other criteria.
![]() |
Fig. 11 Comparison between ggal and giso parameters free or fixed.The blue histogram is the p-distribution forthe Cygnus field when the null hypothesis for Cygnus X-3 is true with the following parameters: Tss = 4, tlcl = 0, flux and position of Cygnus X-3 left free, ggal and giso parameters fixed.The red histogram has the same parameters but with ggal and giso parameters left free. The red dotted line is the |
5.3.4. The effect of unmodeled point sources
We performed a Monte Carlo simulation in which the simulated data contain all of the sources listed in Table 5. This was followed by a maximum likelihood analysis with models containing only a point source at the location of Cygnus X-3, in order to evaluate the effect of nearby unmodeled point sources. The resulting p-value distribution is shown in Fig. 12, where the red histogram is the p-value distribution with the analysis models containing all the sources used in the simulation and the black histogram is the p-distribution with only the source at the Cygnus X-3 location. This illustrates that it is extremely important to model existing sources correctly.
![]() |
Fig. 12 The effect of unmodeled sources. The histograms are the p-value distribution for the Cygnus field when the null hypothesis for Cygnus X-3 is true with the following parameters: Tss = 4, tlcl = 5.99147, flux and position of Cygnus X-3 left free, ggal and giso parameters fixed. The red histogram shows the result when all the simulated sources are included in the models, while the black histogram shows the result when only the source at Cygnus X-3 position is included. The red dotted line is the |
5.3.5. The effect of errors in the diffuse emission: error estimation using ggal
We performed a preliminary investigation of the effect of systematic errors in the diffuse emission model on the analysis results by fixing the ggal parameter during the analysis at a value ggal = 0.67, which is slightly different from the one used in simulating the data; giso = 7.7 is simulated but the giso parameter is allowed to vary during the analysis. Figure 13 reports the p-value distributions for the Cygnus field when the null hypothesis for Cygnus X-3 is true where the results of three analyses are compared; one with simulated ggal = 0.67 (blue thick lines), one with simulated ggal = 0.67 ∗ 1.1 (red thick lines), and one with simulated ggal = 0.67 ∗ 0.89 (black thick lines). Table 7 reports the resulting calculated giso.
Mean value of the calculated giso parameter.
As expected, giso is larger in value when ggal is sufficiently small and vice versa. If ggal is under-estimated, then the number of false detections when the source is absent increases (see the red lines of Fig. 13), because background photons are assigned to the source. If the ggal parameter is over-estimated, then the number of false detections when the source is absent decreases (see black line of Fig. 13) because the diffuse model is already too high at the position of Cygnus X-3.
![]() |
Fig. 13 The effect of a poor estimation of the Galactic diffuse emission. The histograms are the p- value distributions for the Cygnus field when the null hypothesis for Cygnus X-3 is true with the following parameters: Tss = 4, tlcl = 5.99147, flux and position of Cygnus X-3 left free. The true value of giso in the simulated data is 7.7, the value of ggal used in the analysis is fixed to 0.67. Blue thin line: simulated ggal = 0.67, giso fixed to 7.7; blue thick line: simulated ggal = 0.67, gisoleft free; red thick line: simulated ggal = 0.67 ∗ 1.1, gisoleft free; black thick line: simulated ggal = 0.67 ∗ 0.89, gisoleft free. The red dotted line is the |
Post-trial significance expressed in Gaussian standard deviations (σ).
6. Pre-trials and post-trials significance
We have demonstrated how the Monte Carlo simulations can be used to characterize the Ts distributions produced by the AGILE-GRID maximum likelihood analysis procedure. In the end, we have derived the probability, or p-value, of finding a false positive detection (rejecting the null hypothesis when it is true) in a single observation.
In practice, for each region of the sky we performed many trials during the daily monitoring to search for transient γ-ray events. The probability of obtaining a single false detection over a large number of trials is therefore much higher than p. In the AGILE context, we performed two kinds of analysis for each analyzed map:
-
1.
a blind search for unknown sources, in which we searched formore than one source at a time with free positions;
-
2.
searches from a list of known sources, where we searched for more than one source at a time with fixed source positions.
We defined K = M·N to be the number of independent trials, where N is the number of maps and M is the number of unknown sources in the first case or the number of sources in the predefined list in the second case. If we have only one source in both cases M = 1.
Since the probability of not making a false positive error in a single trial is 1 − p, the probability of not making any false positive error is (1 − p)K (type-I error), hence the probability of making at least one false positive error is π = 1 − (1 − p)K. This is defined as the post-trial probability, which is also referred to as the experiment-wide error rate, while p is denoted as the pre-trial probability, or comparison-wise error rate. For an experiment-wide false positive rate of π, we constrained the comparison-wise error rate withthe Dune-Ŝidák correction p ≤ 1 − (1 − π)1/K.
We considered a typical AGILE case in the context of Galactic γ-ray transients when the position of the source was allowed to vary and for a single source when both the flux and position were allowed to vary for each map (M = 1): usually we kept the position and flux of known sources fixed assuming that only one source is in a flaring state in our map (we were able to reduce the size of the map to accomplish this). If we searched for a single flaring source once every two days (N = 182 for one year of observations in AGILE spinning mode), or if we could accept one false detectionduring the year π ≤ 1/N, represented a threshold of TS ≃ 17.9. If we could accept a false detection once every two years, the threshold was TS ≃ 20.5.
Table 8 reports the post-trial significance expressed in Gaussian standard deviations for some values of K.
7. Probability of sky-position coincidence in non-overlapping time intervals
We generalized the analysis presented in the previous sections to calculate the probability of two or more detections of a flaring source performed in the same sky position in different independent time intervals. Multiple detections of a source can have a low probability of being consistent with the null hypothesis even when the individual detections are at a low level of Ts. To assess the statistical significance of our detections, we considered the post-trial probability of flare occurrence. We distinguished two cases:
-
1.
the case of a single flare episode originating from a specificsource within a given error box (that we define as “singleindependent occurrence” or “single post trial occurrence”);
-
2.
the case of repeated flaring episodes originating from a specific source with a given error box (that we call here “repeated post-trial flare occurrence”.
For each individual AGILE detection, we calculated the post-trial significance of the single independent occurrences, which did not take into account the history of repeated occurrences.
We then combined the history of the sky region and establish the probability of repeated flaring episodes from the same sky position. We calculated the post-trial significance for repeated flare occurrences at the source error-box position as follows.
When we performed one trial for each map (M = 1, we used a list with only one source; this meant that each independent time period was a single trial), the probability of having k or more detectionsover N maps at a specific site with a Ts statistic satisfying Ts ≥ h was given by , where p = p(h) is the p-value corresponding to the h value given by Eq. (2),
is the probability of exactly j detections in N maps and
is the probability of fewer than k detectionsat a specific position in N maps.
When we performed M trials at different positions of the N maps, where M is the number of known or unknown sources in a predefined list, the chance of having k or more detections above the level h in any of the sites with a Ts statistic satisfying Ts ≥ h is given by where PM(N,X ≥ k) = 1 − P(N,X < k)M = 1 − (1 − P(N,X ≥ k))M.
The choice of p(h) depends on our assumptions: if the flare comes froma known source we use Eq. (5), if the flare comes froma previously unknown source we use Eq. (4) (the case without ICL criterion).
Let us consider a simple case in the context of Galactic γ-ray transients, using the p-value function in the case of a complex Galactic region with the position ofa single source left free. If we detect one flare per year at a specific position of the Galactic plane with Ts > 16, and we produce 182 maps per year (once every two days with an integration time of two days), then after the first year the global post-trial significance is 2.16σ, after the second year it is 3.31σ, and after the third year it is 4.16σ. Transient sources, as long as they recur, enable us to have more confidence in their detection as integration time increases.
It might seem thatthis approach adds a bias tothe global significance of a detection of a flaring source, because it may happen that some flares from nearby sources can be “counted” together due to the extension of the 95% contour confidence level. In doubtful cases only a more detailed analysis can exclude this.
8. Conclusions
We have performedextensiveMonte Carlo simulations to characterize the maximum likelihood ratio test for the AGILE-GRID instrument in the context of short timescale (1–2 days) flaring γ-ray sources, both in extragalactic and Galactic fields. In the case of Galactic fields, we have simulated both a simple (without steady sources) and a complex Galactic region. With these simulations, we have calibrated both the Ts distributions (pre-trial significance) and the related false occurrence rate.
After the introduction of the post-trial significance, we calculated the post-trial probabilities for both single and multiple occurrences. In particular, we calculated the probability of two or more detections of a flaring source at the same sky position in different independent time intervals. With this approach, we took into account the presence of many flaring episodes originating from the same sky region, by combining their histories and adding information that was not present in the single episode and in the post-trial evaluation. We have referred to this as “repeated post-trial flare occurrence”.
In this paper, we have provided a method for converting the Ts produced by any of the various methods made available by the AGILE analysis software, into a probability. This information
can be used by anyone who performs analysis onGRID data using the AGILE Guest Observer Program.
Two independent γ-ray transient search systems have been developed. One system operates at INAF/IASF-Bologna (Bulgarelli et al. 2009), which is able to process the datawithin 1.5 h (from the last photon acquired in orbit to the alert generation), and to generate alerts via e-mail and SMS to the mobile phones of the AGILE Collaboration. A second system operates at the ADC (AGILE Data Center) in Frascati; it can react within an average time of three hours, and generates alerts via e-mail but with more accurate data processing.
See ASI Data Center web site, http://agile.asdc.asi.it/
Acknowledgments
The AGILE Mission is funded by the Italian Space Agency (ASI) with scientific and programmatic participation by the Italian Institute of Astrophysics (INAF) and the Italian Institute of Nuclear Physics (INFN). We acknowledge financial contribution from the agreement ASI-INAF I/009/10/0.
References
- Bulgarelli, A., Trifoglio, M., Gianotti, F., Di Cocco, G., & Tavani, M. 2009, ASP Conf. Ser., 411, 362 [NASA ADS] [Google Scholar]
- Cash, W. 1979, ApJ, 228, 939 [NASA ADS] [CrossRef] [Google Scholar]
- Giacconi, R., Gorenstein, P., Gursky, H., & Waters, J. R. 1967, ApJ, 148, L119 [NASA ADS] [CrossRef] [Google Scholar]
- Mattox, J. R., Bertsch, D. L., Chiang, J., et al. 1996, ApJ, 461, 39 [Google Scholar]
- Pittori, C., Verrecchia, F., Chen, A. W., et al. 2009, A&A, 506, 1563 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
- Tavani, M., Barbiellini, G., Argan, A., et al. 2009a, A&A, 502, 995 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
- Tavani, M., Bulgarelli, A., Piano, G., et al. 2009b, Nature, 462, 620 [NASA ADS] [CrossRef] [PubMed] [Google Scholar]
- Wilks, S. S. 1938, Ann. Math. Stat., 9, 60 [Google Scholar]
All Tables
Parameters and number of trials for the performed extragalactic and Galactic field simulations.
Best-fit parameters in the case of an empty field for simulated sky maps with bin size of 0°.25 for different values of giso, the isotropic emission coefficient.
Best-fit parameters in the case of an empty field for simulated sky maps with bin size of 0°.25, giso = 6, for two levels of exposure.
Percentages of detections for some Ts thresholds (with related p-values when the null hypothesis is true) when the null hypothesis is false with a source at the (l, b) = (160, 0)° position with a simulated flux of 180 × 10-8 photons cm-2 s-1.
All Figures
![]() |
Fig. 1 Extragalactic (b = 90°) exposure map used for the simulations(in units of cm2 ssr) in Galactic coordinates. Notice the larger exposure near the celestial pole.The white circles are the positions of the trials, the green circles indicate the high and low chosen exposure regions. |
In the text |
![]() |
Fig. 2 Ts distribution (left side) and p-distribution (right side) of a simulated empty field, with ggal = 1 and giso = 6 (where these parameters are allowed to vary during the analysis), flux free, and position fixed. The bluecrosses are the calculated distribution, the black line is the best fit according to Eq. (3), the red dotted line is the |
In the text |
![]() |
Fig. 3 Comparison betweenp-value distributions for simulated (blue) and real (green) empty extragalactic fields. The red dotted line is the |
In the text |
![]() |
Fig. 4 The effect of the hypothesis formulation (trial selection). The p-value distribution depends on the constraint on the position of the source: a comparison of the different analysis methods (in particular, when the position of the source is either kept fixed or allowed to vary in the case of absence of a source at the (l, b) = (160, 0)° location. Histograms are the p-value distribution for an empty Galactic field when the null hypothesis is true. Blue and red histograms have the following parameters: Tss = 4, tlcl = 5.99147, flux and position of the source allowed to vary. The blue histogram contains all trials regardless of the calculated position, while the red histogram contains the trials that respect the ICL criterion. The brown histogram corresponds to the position of the source when it is kept fixed and flux free. The red dotted line is the |
In the text |
![]() |
Fig. 5 The blue histogram is the Ts distribution for the empty Galactic region when the null hypothesis for a source at the position (l, b) = (160, 0)° is true, Tss = 4, tlcl = 5.99147, and both the flux and position of the source are allowed to vary. Black line is the best-fit function described in Eq. (4), N2 = 5. The red dotted line is the |
In the text |
![]() |
Fig. 6 The effect of steady sources in the ensemble of models. The blue line is the PDF for Cygnus field when the null hypothesis for Cygnus X-3 is true, and the red line is the PDF for an empty Galactic field when the null hypothesis is true, with the parameters Tss = 4, tlcl = 5.99147, when the flux and position of source in the alternative hypothesis are allowed to vary, but ggal and giso parameters are fixed, ICL criterion. Black line is the best-fit function described in Eq. (5). The red dotted line is the |
In the text |
![]() |
Fig. 7 The effect of the hypothesis formulation (trial selection). The histograms show the p-value distributions in the presence of a simulated source at the location of (l, b) = (160, 0)° with flux 180 × 10-8 photons cm-2 s-1. Red histograms have the following parameters: Tss = 4, tlcl = 5.99147, flux and position of the source left free with ICL rejection criterion. Blue histograms have tlcl = 5.99147, flux and position of the source left free without ICL rejection criterion. Brown histograms have the flux left free and the position of the source kept fixed. |
In the text |
![]() |
Fig. 8 The binned counts map of the real (left side) and simulated (right side) Cygnus fieldforan integration time of 0.5 years (AGILE counts, E > 100 MeV). The real data is taken from July 2007 to October 2009; the map is centered on (l, b) = (78.75, 0)° in Galactic coordinates with a bin size of 0.1°. |
In the text |
![]() |
Fig. 9 The p-value distributions change as a function of the tlcl parameter. Simulations of the Cygnus region with no source present at Cygnus X-3 position were analyzed for different tlcl values without applying the ICL rejection criterion. The flux and position of the hypothetical source at the Cygnus X-3 were allowed to vary, the ggal and giso parameters were kept fixed, and Tss = 4. Green histogram: tlcl = 0; black histogram: tlcl = 2.29575 (corresponds to a 68% confidence level for two degrees of freedom); blue histogram: tlcl = 5.99147. The red dotted line is the |
In the text |
![]() |
Fig. 10 Comparison between different radii of analysis. The histograms are the p-value distributions for the Cygnus field when the null hypothesis for Cygnus X-3 is true with the following parameters: Tss = 4, tlcl = 0, flux and position of Cygnus X-3 left free, ICL rejection applied. Red histogram:radius of analysis = 10°; blue histogram: radius of analysis = 5°. The red dotted line is the |
In the text |
![]() |
Fig. 11 Comparison between ggal and giso parameters free or fixed.The blue histogram is the p-distribution forthe Cygnus field when the null hypothesis for Cygnus X-3 is true with the following parameters: Tss = 4, tlcl = 0, flux and position of Cygnus X-3 left free, ggal and giso parameters fixed.The red histogram has the same parameters but with ggal and giso parameters left free. The red dotted line is the |
In the text |
![]() |
Fig. 12 The effect of unmodeled sources. The histograms are the p-value distribution for the Cygnus field when the null hypothesis for Cygnus X-3 is true with the following parameters: Tss = 4, tlcl = 5.99147, flux and position of Cygnus X-3 left free, ggal and giso parameters fixed. The red histogram shows the result when all the simulated sources are included in the models, while the black histogram shows the result when only the source at Cygnus X-3 position is included. The red dotted line is the |
In the text |
![]() |
Fig. 13 The effect of a poor estimation of the Galactic diffuse emission. The histograms are the p- value distributions for the Cygnus field when the null hypothesis for Cygnus X-3 is true with the following parameters: Tss = 4, tlcl = 5.99147, flux and position of Cygnus X-3 left free. The true value of giso in the simulated data is 7.7, the value of ggal used in the analysis is fixed to 0.67. Blue thin line: simulated ggal = 0.67, giso fixed to 7.7; blue thick line: simulated ggal = 0.67, gisoleft free; red thick line: simulated ggal = 0.67 ∗ 1.1, gisoleft free; black thick line: simulated ggal = 0.67 ∗ 0.89, gisoleft free. The red dotted line is the |
In the text |
Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.
Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.
Initial download of the metrics may take a while.