γ-ray DBSCAN: a clustering algorithm applied to Fermi-LAT γ-ray data
I. Detection performances with real and simulated data
^{1} ISDC, University of Geneva, Chemin d’Ecogia 16, 1290 Versoix, Switzerland
email: andrea.tramacere@unige.ch
^{2} Politecnico di Milano, Piazza L. da Vinci 32, 20133 Milano, Italy
Received: 30 July 2012
Accepted: 30 September 2012
Context. The density-based spatial clustering of applications with noise (DBSCAN) is a topometric algorithm used to cluster spatial data that are affected by background noise. For the first time, we propose this method to detect sources in γ-ray astrophysical images obtained from the Fermi-LAT data, where each point corresponds to the arrival direction of a photon.
Aims. We investigate the detection performance of the γ-ray DBSCAN in terms of detection efficiency and rejection of spurious clusters.
Methods. We used a parametric approach, exploring a large volume of the γ-ray DBSCAN parameter space. By means of simulated data we statistically characterized the γ-ray DBSCAN, finding signatures that distinguish purely random fields from fields with sources. We defined a significance level for the detected clusters and successfully tested this significance with our simulated data. We applied the method to real data and found an excellent agreement with the results obtained with simulated data.
Results. We find that the γ-ray DBSCAN can be successfully used to detect clusters in γ-ray data. The significance returned by our algorithm is strongly correlated with that provided by the maximum likelihood analysis with standard Fermi-LAT software, and can be used to safely remove spurious clusters. The positional accuracy of the reconstructed cluster centroid is comparable to that returned by standard maximum likelihood analysis, allowing one to look for astrophysical counterparts in narrow regions, which minimizes the chance probability in the counterpart association.
Conclusions. We found that the γ-ray DBSCAN is a powerful tool for detecting clusters in γ-ray data. It can be used to look for both point-like and extended sources, and can potentially be applied to any astrophysical field related to detecting clusters in data. In a companion paper we will present the application of the γ-ray DBSCAN to the full Fermi-LAT sky, discussing the potential of the algorithm to discover new sources.
Key words: gamma rays: general / methods: statistical / methods: data analysis / methods: numerical
© ESO, 2013
1. Introduction
Modern γ-ray telescopes operating at energies above the MeV window provide event-resolved observational data. Each event (after the reconstruction process) is typically described by a tuple (i.e., an ordered list of elements) storing sky coordinates, arrival time, and energy. Discrete sources (either point-like or extended) are detected with various methods. Given the discrete topological nature of γ-ray images, methods based on cluster search, such as the minimum spanning tree (MST; Campana et al. 2007, 2012), have successfully been used. One of the main advantages of topometric methods compared to methods using spatial binning is that they minimize the impact of the poor energy-dependent point spread function (PSF), typical of γ-ray telescopes, and preserve the spatial information of each event. Moreover, these methods are able to detect sources composed of a small number of events, but they need to be fine-tuned to take the background properly into account. The problem of background rejection is the most penalizing feature of topometric methods. Therefore we here present for the first time a method based on the density-based spatial clustering of applications with noise (DBSCAN) algorithm (Ester et al. 1996). The DBSCAN is a topometric algorithm used to cluster spatial data that are affected by background noise. Compared to other topometric methods, it has the advantage of embedding the discrimination between signal (cluster) and background (noise) inside the algorithm itself, according to the local density of events within a typical scanning brush, i.e., within a given scanning area.
The aim of the present paper is to show the potential of the method, and its statistical characterization when applied to astrophysical γ-ray data. We applied this method to detect point-like sources in the Fermi-LAT data. We explored a large volume of the γ-ray DBSCAN parameter space by means of simulated data, and we provide a statistical characterization of the γ-ray DBSCAN, finding signatures that distinguish purely random fields from fields with sources. We defined a significance level for the detected clusters, and we successfully tested this significance with our simulated data. We applied the method to real Fermi-LAT γ-ray data and we found an excellent agreement with the results obtained with simulated data.
In a companion paper (Tramacere, in prep.), we will apply the method to the Fermi-LAT sky, investigating specific questions related to the Fermi-LAT response functions and showing its potential for discovering new sources, in particular small clusters located at high Galactic latitude, or clusters on the Galactic plane that are affected by a strong background.
The paper is organized as follows. In Sect. 2 we describe the logic of the DBSCAN method and present the algorithm implemented to analyze γ-ray data, the γ-ray DBSCAN. In Sect. 3 we discuss some caveats regarding the application of the γ-ray DBSCAN algorithm to γ-ray data. In Sect. 4 we study the statistical properties of the γ-ray DBSCAN detection, using a simulated test field with only noise, and five simulated test fields with noise plus point-like sources. In Sect. 5 we evaluate the detection performance of the method in terms of positional accuracy, cluster reconstruction, and rejection of spurious clusters. In Sect. 6 we investigate the significance of the clusters and describe our algorithmic implementation. In Sect. 7 we finally use our method with real Fermi-LAT data, investigating the detection performance, and comparing the γ-ray DBSCAN cluster significance to that returned by the maximum likelihood method with standard Fermi-LAT software^{1}. In Sect. 8, we present our conclusions and discuss future developments and applications.
2. The γ-ray DBSCAN algorithm
The DBSCAN (Ester et al. 1996) is a topometric algorithm used to cluster spatial data that are affected by background noise. Some modifications have been developed to adapt the original DBSCAN algorithm to our study. Our algorithm is mainly built upon the following criteria:

1. In a list of photons D, where each element p_{i} is a tuple storing positional sky coordinates, ρ(p_{k},p_{l}) is the angular distance between two photons p_{k} and p_{l}.

2. We iterate over the full photon list D. A seed cluster is built when a minimum number of photons K + 1 is enclosed within a circle of radius ε centered on p_{i}.

3. For each photon , we build the photon list by collecting all photons p_{k} that meet the condition ρ(p_{l},p_{k}) < ε, and .

4. For each photon , if the number of photons enclosed within a circle of radius ε centered on p_{j} is ≤K and , then p_{j} will be attached to the final photon list of the cluster without a recursive search for more neighbors; these points are defined as density-reachable.

5. For each photon , if the number of photons enclosed within a circle of radius ε centered on p_{j} is >K and , p_{j} is attached to the , and step 3 is repeated recursively.

6. When both conditions at steps 4 and 5 are false, the cluster C_{m} is built by joining the density-reachable events to those in the and in the lists.

7. The process starts again from step 1 searching for new clusters, skipping the events already flagged as noise or clusters, until all events in D are flagged as cluster, or noise, or density-reachable events.

8. At the end of the process the full photon list will be partitioned as follows: (1)
In this way high-density areas are classified as clusters (sources), whereas low-density areas are classified as noise (background). The recursive call of step 3 is not implemented in the original DBSCAN algorithm and represents a novelty. This new feature allows us to reconstruct clusters significantly larger than the ε radius, which makes it unlikely that a single cluster is fragmented into small satellite clusters. Moreover, it makes it possible to reconstruct extended structures, in particular extended sources, or filamentary structures in the background.
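The logic of the steps above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes planar Euclidean distances instead of angular distances on the sky, it counts the central photon among the K + 1 photons of a seed, and it replaces the recursive search with an explicit stack. The names `neighbors` and `density_cluster` are hypothetical.

```python
import math

NOISE = -1          # label for background photons
UNVISITED = None    # label for photons not yet examined


def neighbors(points, i, eps):
    """Indices of all points closer than eps to points[i] (i itself included)."""
    xi, yi = points[i]
    return [j for j, (xj, yj) in enumerate(points)
            if math.hypot(xi - xj, yi - yj) < eps]


def density_cluster(points, eps, k):
    """Label each point with a cluster ID (0, 1, ...) or NOISE.

    Assumption: a point is dense when more than k points (itself included)
    lie within eps; dense points are expanded, the others are only attached.
    """
    labels = [UNVISITED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seed = neighbors(points, i, eps)
        if len(seed) <= k:              # fewer than K + 1 photons: no seed here
            labels[i] = NOISE
            continue
        labels[i] = cluster_id
        stack = [j for j in seed if j != i]
        while stack:                    # iterative form of the recursive search
            j = stack.pop()
            if labels[j] == NOISE:      # density-reachable: attach, do not expand
                labels[j] = cluster_id
                continue
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            around_j = neighbors(points, j, eps)
            if len(around_j) > k:       # dense photon: keep growing the cluster
                stack.extend(around_j)
        cluster_id += 1
    return labels
```

Because dense photons keep feeding the stack, a cluster can grow well beyond a single circle of radius ε, which is the behavior the recursive step is meant to provide.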
After the clustering process, each photon in D will be described by a tuple storing the photon position (both in galactic and celestial coordinates), the photon class type (noise or cluster), and the ID of the cluster the photon belongs to. Each cluster C_{m} will be described by a tuple storing the position of the centroid with its positional error, the ellipse of the cluster containment, the cluster effective radius (r_{eff}), and the number of photons in the cluster (N_{p}). The ellipse of the cluster containment is defined by the major and minor semi-axes (σ_{x} and σ_{y}) and the inclination angle (σ_{alpha}) of the major semi-axis w.r.t. the latitudinal coordinate (b or Dec). To evaluate the ellipse axes we use the principal component analysis method (PCA; Jolliffe 1986). This method uses the eigenvalue decomposition of the covariance matrix of the two position arrays x and y. By definition, the square root of the first eigenvalue will correspond to σ_{x} and the second to σ_{y}. The axes represent the two orthogonal directions of maximum variance of the cluster. The effective radius is defined as . To find the centroid of the cluster and its uncertainty, we used a weighted average of the position of each photon in C_{m}, as follows:

We define the first-order centroid (C_{ave}) as the average of the position of each cluster photon: C_{ave} = (⟨x⟩, ⟨y⟩).

We define the weight array, according to the distance between p_{k} ∈ C_{m} and C_{ave}: w_{k} = 1/ρ(p_{k},C_{ave}).

The cluster centroid C_{ctr} will result from the average of the position of each cluster point weighted by w_{k}.

The centroid position uncertainty (pos_{err}) is determined by propagating the error on the weighted average of C_{ctr}. We have numerically verified that pos_{err} corresponds to a ≈95% positional uncertainty.
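The containment ellipse and the weighted centroid described above can be sketched as follows. This is only an illustration under stated assumptions: positions are treated as planar coordinates, the inclination angle is measured from the x axis rather than from the latitudinal coordinate, and the `floor` parameter (which prevents a division by zero when a photon coincides with C_{ave}) is our own addition. The function names are hypothetical, and the propagation of pos_{err} is not reproduced.

```python
import math


def containment_ellipse(xs, ys):
    """Semi-axes and orientation from the 2x2 covariance matrix of (x, y).

    Returns (sigma_major, sigma_minor, angle), where the sigmas are the square
    roots of the covariance eigenvalues and angle (radians) is the direction
    of the major axis measured from the x axis.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Closed-form eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]].
    half_trace = 0.5 * (sxx + syy)
    spread = math.hypot(0.5 * (sxx - syy), sxy)
    lam_major = half_trace + spread
    lam_minor = max(half_trace - spread, 0.0)   # guard against round-off
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return math.sqrt(lam_major), math.sqrt(lam_minor), angle


def weighted_centroid(xs, ys, floor=1e-6):
    """Centroid weighted by w_k = 1 / distance(p_k, C_ave)."""
    n = len(xs)
    cx, cy = sum(xs) / n, sum(ys) / n           # first-order centroid C_ave
    weights = [1.0 / max(math.hypot(x - cx, y - cy), floor)
               for x, y in zip(xs, ys)]
    wsum = sum(weights)
    return (sum(w * x for w, x in zip(weights, xs)) / wsum,
            sum(w * y for w, y in zip(weights, ys)) / wsum)
```

The inverse-distance weights give more importance to photons near the core of the cluster, so outlying photons pull the centroid less than in a plain average.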
3. Caveat on the application to γ-ray data
The application of clustering methods, such as the γ-ray DBSCAN, leads to practical difficulties that are mostly related to the instrument PSF and to gradients and/or structures in the background. To deal with these problems without biasing the detection results, we recommend applying some criteria that we discuss in the following.
We first comment on the PSF impact. The PSF imposes a limit on the capability of an instrument to resolve sources separated by a distance smaller than the PSF size. Sources with sizes smaller than the PSF are classified as point-like, otherwise they are classified as extended. Another complication is that the PSF often depends on the energy; in the case of Fermi-LAT, the 68% containment angle of the reconstructed incoming photon direction, for normal incidence photons, has a size of about 5 degrees at 100 MeV (Ackermann et al. 2012), and scales down to a few tenths of a degree above GeV energies^{2}. The size of the PSF is strongly connected to the size ε of the γ-ray DBSCAN scanning brush. Indeed, if ε is much smaller than the PSF size, there is a risk of losing clusters characterized by small N_{p}, or of fragmenting a cluster with large N_{p} into smaller fake satellite clusters. We stress that the formation of satellite clusters is a very rare event, thanks to our recursive DBSCAN implementation, which we explain in Sect. 2. In contrast, if ε is much larger than the PSF, it is likely that extended clusters are built that are contaminated by the background or by close sources.
A careful and self-consistent analysis of the effects of the energy dependence of the PSF, and in general of problems related to the Fermi-LAT response function, is beyond the scope of this paper, where we focus mostly on a statistical characterization of the method. These topics will be investigated in the companion paper (Tramacere, in prep.).
A second relevant problem is the inhomogeneity of the background, which affects the choice of both ε and K. If the background is homogeneous over the entire field, the best choice of a single pair of values of ε and K guarantees a safe rejection of the background. Indeed, values of ε and K such that the average density of photons within ε is significantly higher than the average density of the background photons make it unlikely that a cluster grows from a background fluctuation. Unfortunately, the γ-ray sky shows strong background gradients, in particular at low Galactic latitudes. To solve this problem, one could adapt the values of ε and K according to a local value of the background photon density. Since ε is strongly constrained by the PSF, one should mainly tune the value of K. The drawback is that as we increase the value of K to compensate for the background, we decrease the capability to detect clusters with small N_{p}. To overcome this difficulty, we adopted an alternative solution. We used a unique pair of values of ε and K for each field, where ε is mostly constrained by the PSF, and K by the field average background, and we take into account the background inhomogeneities by defining a significance level of the cluster, according to the signal-to-noise ratio (S/N; Li & Ma 1983), evaluated from the local background. This is explained in detail in Sect. 6. The capability to reject clusters according to a low significance level allows one to relax the constraint on ε and K, increasing the number of clusters detected, hence increasing the detection ratio, and at the same time allows one to reject spurious sources by means of the significance threshold. When the background is high, fluctuations in the background events can lead to densities comparable to those of weak sources; to make this possibility rare, we recommend applying a cut in energy.
To optimize the ratio between background and cluster events, in the following we use a threshold energy of 3 GeV, which mitigates the possible bias caused by background fluctuations.
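As a point of reference for the S/N mentioned above, the following sketch evaluates the widely used likelihood-ratio significance of Li & Ma (1983, their Eq. 17). Which of the Li & Ma estimators is adopted, and how the local background region is chosen, is described in Sect. 6 and is not reproduced here, so this is only an assumed illustration. The function name and argument names are hypothetical: `n_on` is the number of counts in the source region, `n_off` the number of counts in the background region, and `alpha` the ratio of the on-source to off-source exposure (or area).

```python
import math


def li_ma_significance(n_on, n_off, alpha):
    """Unsigned Li & Ma (1983) Eq. 17 significance, in units of sigma."""
    total = n_on + n_off
    if total == 0:
        return 0.0
    # Each term vanishes in the limit of zero counts, so log(0) is skipped.
    term_on = 0.0
    if n_on > 0:
        term_on = n_on * math.log((1.0 + alpha) / alpha * n_on / total)
    term_off = 0.0
    if n_off > 0:
        term_off = n_off * math.log((1.0 + alpha) * n_off / total)
    # max() protects against tiny negative values caused by round-off.
    return math.sqrt(2.0 * max(term_on + term_off, 0.0))
```

When the on-source counts equal the scaled background (n_on = alpha × n_off) the significance is zero, and it grows as the cluster counts exceed the local background expectation.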
Fig. 1 Photon map for the sky test field 1, with the result of the γ-ray DBSCAN detection for K = 5 and ε = 0.17 deg. The blue crosses refer to the simulated sources, the green boxes to 51 detected true clusters, and the red boxes to the 2 fake ones. The black dots represent the background events, the remaining colors indicate cluster events.

4. Statistical properties of the γ-ray DBSCAN clusters
4.1. The test fields
In this section we study the statistical properties of the clusters, looking for signatures that characterize random Poissonian fields and fields with point-like sources. To accomplish this task we compare results obtained for a test field with only noise (random test field) and the five test fields with noise plus point-like sources (sky test fields 1–5).
As sky test fields we use the same fields as in Campana et al. (2012). Each of these five sky fields covers a broad sky region with a galactic longitude extension of 80° < l < 170° and a galactic latitude extension of 40° < b < 65°. The γ-ray background was simulated using the standard gtobssim^{3} tool, developed by the Fermi-LAT collaboration, simulating both the Galactic and isotropic components for a two-year-long period, using a threshold energy of 3 GeV for a total of 9322 photons. To this photon list we added 70 simulated sources: for each source, the number of photons was chosen from a probability distribution given by a power law with exponent 2 from a minimum value of 4 up to 40 photons, joined to a constant tail up to 240 photons. The number of sources is similar to that reported in the Fermi-LAT Second Source Catalog (Nolan et al. 2012, 2FGL hereafter), in the same region of the sky. The source events are spatially distributed with a bivariate Gaussian probability density function (PDF) with σ^{sim} = 0.20 deg, centered at the source location. Five simulated test fields were generated, adding the simulated sources to the diffuse background. The only difference in the five realizations is the source location, randomly chosen to have different brightness contrast between sources and the background. The random test field covers the same area as the sky test fields and has a number of events equal to that of the sky test field 1 (background and sources), for a total of 11 044 events.
In Fig. 1 we show the photon map for the sky test field 1 and the result of the γ-ray DBSCAN detection for K = 5 and ε = 0.17 deg. We detect 51 true clusters, and only 2 fake ones. A cluster is defined as true if the position of the simulated source falls within a circle centered on the cluster centroid, with a radius equal to 2pos_{err}. We call the remaining clusters fake. In Fig. 2, we show a close-up of two true clusters. The black ellipses correspond to the ellipses of the cluster containment, and the purple and orange thick points represent the cluster points, while the black thick dots represent the background.
4.2. Test strategy
We investigated the statistical properties of the γ-ray DBSCAN clusters, in particular signatures that distinguish purely random fields from fields with point-like sources, and their dependence on K and ε. To systematically investigate a broad volume of the parameter space, we used a parametric approach. We set the range of ε in [0.10 ÷ 0.50] deg, with a step of 0.01 deg, and the range of K in [2 ÷ 15], with a step of 1. The total number of detection trials for each test field was 574. We collected the statistics of the trials and investigated the distribution of r_{eff} and N_{p}, and their connection with ε and K, respectively.
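The size of the explored grid follows directly from the quoted ranges: 41 values of ε times 14 values of K give 574 trials. A short sketch (variable names are ours) that enumerates the grid:

```python
# 41 values of eps from 0.10 to 0.50 deg in steps of 0.01 deg.
eps_values = [round(0.10 + 0.01 * i, 2) for i in range(41)]
# 14 integer values of K from 2 to 15.
k_values = list(range(2, 16))
# One detection trial per (K, eps) pair.
grid = [(k, eps) for k in k_values for eps in eps_values]
```

Building ε from an integer index and rounding avoids the accumulation of floating-point error that repeated addition of 0.01 would introduce.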
Fig. 2 Close-up of two true clusters reported in Fig. 1. The ellipses correspond to the ellipse of the cluster containment. The purple and orange points represent the cluster points, the black dots represent the background events, the blue crosses the position of the simulated sources, and green boxes the position of the cluster centroid.

4.3. Statistics of r_{eff} and connection with ε
We started by investigating the distribution of the log _{10}(r_{eff}) values for the random and the sky test field 1. The distribution for the detections collected over the full K-ε parameter space (top left panel of Fig. 3) shows a symmetric shape well fitted by a Gaussian distribution (lognormal w.r.t. r_{eff}), with a mean value of ⟨log _{10}(r_{eff})⟩ ≃ −0.45 (corresponding to ⟨r_{eff}⟩ ≃ 0.3 deg) and a dispersion of σ_{log 10(reff)} ≃ 0.23. The lognormal distribution provides a reasonable description of the empirical distributions also for individual pairs of (K, ε) values. An example is given in panel b of Fig. 3 for K = 3, ε = 0.3 deg, where the best fit values are ⟨log _{10}(r_{eff})⟩ ≃ −0.51, and σ_{log 10(reff)} ≃ 0.16. We now investigate the empirical distribution of log _{10}(r_{eff}) for fields with point-like sources. In the right panel of Fig. 3, we show the case of the sky test field 1. The distributions of log _{10}(r_{eff}) are still described by a normal distribution. For fake clusters (red dashed line) the best fit values of the mean (⟨log _{10}(r_{eff})⟩ ≃ −0.46) and of the dispersion (σ_{log 10reff} ≃ 0.24) are very similar to those found for the random test field. In contrast, the true cluster distribution (blue hatched histogram) peaks around the value of log _{10}(r_{eff}) ≃ −0.67, corresponding to r_{eff} ≃ 0.21 deg, very close to the value of the dispersion σ^{sim} = 0.20 deg, which was used to simulate the sources. Since the simulation parameter σ^{sim} reproduces the effect of the instrumental PSF, we observe that the typical size of the reconstructed clusters for non-random fields is constrained by the PSF, suggesting the empirical rule of setting ε to about the PSF size.
To investigate the connection between ε and the PSF more accurately, we analyzed the statistical properties of the quantity r_{eff}/ε as a function of ε. For each value of ε, we determined the median, and the two-sided 1σ confidence level (CL) interval around the median of the r_{eff}/ε distributions. In the left panel of Fig. 4 we plot the r_{eff}/ε median (blue solid circles) and 1σ CL region as a function of ε for the random field. We note that the r_{eff}/ε trend is slightly increasing with ε, and that the 1σ CL region is consistent with r_{eff}/ε = 1, but the upper boundary shows a systematic increase compared to the lower boundary for ε ≳ 0.30 deg. The trend for the true clusters in sky test field 1 (right panel of Fig. 4) shows a different behavior. The median of r_{eff}/ε (red solid circles) is slightly decreasing with ε, showing that for true clusters r_{eff} is not sensitive to the size of ε, being mostly constrained by the simulated PSF size. As expected, for the fake clusters (blue dashed line), the trend is almost identical to that of the clusters in the random field.
Fig. 3 Panel a) distribution of the values of log _{10}r_{eff} for the random field case, for the full parameter space (black line) and fit by means of Gaussian distribution (blue line). Panel b) the same as in the top panel, for K = 3 and ε = 0.3 deg. Panel c) distribution of log _{10}r_{eff} for the sky test field 1, for fake clusters (red solid line), and true clusters (blue solid line, hatched histogram). The dashed lines represent the Gaussian best fit. 

Fig. 4 Left panel: the r_{eff}/ε statistical distribution as a function of ε for the random field case. The blue solid circles represent the median, and the gray shaded area represents the 1σ confidence level region, for each value of ε. Right panel: the same as in the bottom left panel for the sky test field 1. The red solid circles represent the median of the true clusters case, and the grey area the 1σ confidence level region. The dashed line shows the 1σ confidence level region, for the fake clusters. 

Fig. 5 Left panels: the distribution of N_{p} for the random test field. Panel a) shows K = 2, ε = 0.20 deg (red solid boxes); the empty blue bar line represents a Poissonian best fit. Panel b) shows K = 2, ε = 0.30 deg (purple solid triangles). Panel c) shows the full K-ε parameter space; the solid black line represents a lognormal best fit. Right panels: panel d) shows the distribution of r_{eff}^{2} (black solid line), and its best fit by means of a lognormal distribution (red dashed line). Panel e) shows the N_{p} distribution for the fake clusters in the sky test field 1 (red solid circles), and the blue empty bars show a Poissonian best fit. Panel f) shows the N_{p} distribution for the true clusters in the sky test field 1 (blue hatched histogram), the lognormal best fit (red dashed line), and the Poissonian fit (solid black line).

Fig. 6 Panel b) the N_{p} statistical distribution as a function of K for the random field case. The blue solid circles represent the median and the gray shaded area represents the 1σ confidence level region around the median for each value of K. The dashed black line represents the N_{p} = K + 1 law. Panel e) same as in panel b) for the sky test field 1 case. Panel a) number of detected clusters for the random test field case (blue solid points) as a function of K, and best fit by means of a Poissonian survival function (red empty bars). Panel c) number of detected clusters for the sky test field 1 case (black solid points), for the fake clusters as a function of K, and best fit by means of a Poissonian survival function (red empty bars). Panel d) number of detected clusters for the sky test field 1 case (red solid boxes) for the true clusters as a function of K, and best fit by means of a Poissonian survival function (black empty boxes).

4.4. Statistics of N_{p} and connection with K
We now investigate the statistics of the distribution of the number of photons per cluster. For random fields we expect that the number of photons in a cluster follows a Poisson distribution. Indeed, for a generic two-dimensional Poisson process, the probability to observe a number of events (N(S) = j) enclosed by a surface S is given by
P(N(S) = j) = (λS)^{j} e^{−λS} / j! (2)
where λ is the average spatial density. Translating S in terms of ε^{2}, i.e., S = πε^{2}, we can rewrite
P(N(ε) = j) = (λπε^{2})^{j} e^{−λπε^{2}} / j! (3)
from which it follows that, given the values of K and ε, the probability to find a cluster will be given by
P(N(ε) > K) = 1 − Σ_{j = 0}^{K} (λπε^{2})^{j} e^{−λπε^{2}} / j! (4)
that is, the Poissonian survival function. Owing to the logic of the DBSCAN clustering process, the Poisson statistics cannot be extended from ε to r_{eff} for any value of ε. Indeed, a cluster is not a simple collection of points enclosed within a surface S; this holds only within the ε-sized circle, the seed of the cluster (C^{∗}). If we consider the annulus defined between ε and the cluster radius r_{clus}, not all points in the annulus will be cluster members, but only those that are at least density-reachable. This implies that we expect a deviation from the Poisson statistics when r_{eff} is significantly larger than ε, i.e., ε ≳ 0.3 deg (according to the analysis presented in the previous section). This expected deviation from the Poissonian statistics is confirmed by the plots in the left panels of Fig. 5. In panel a we show the distribution of N_{p} for K = 2 and ε = 0.20 deg. We note that the Poisson distribution (Eq. (3)) gives a reasonable description of the empirical distribution. In contrast, for ε = 0.30 deg (panel b) we observe that the Poisson distribution shows stronger deviations, in particular for K > 6. When we take into account the N_{p} distribution for the full parameter space (panel c), the Poissonian distribution fails to provide a reasonable description of the empirical distribution, whilst a lognormal distribution gives a good fit.
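The Poissonian survival function discussed above can be evaluated numerically with a few lines of code. The sketch below is an illustration, not the authors' code; it assumes that the mean number of events in the scanning circle is λπε^{2}, with the density expressed in events per square degree. The function names are hypothetical.

```python
import math


def poisson_pmf(j, mean):
    """Probability of observing exactly j events for a given Poisson mean."""
    return math.exp(-mean) * mean ** j / math.factorial(j)


def poisson_sf(k, mean):
    """Survival function: probability of observing more than k events."""
    return 1.0 - sum(poisson_pmf(j, mean) for j in range(k + 1))


def seed_probability(k, eps_deg, density):
    """Chance that a circle of radius eps_deg holds more than k random events.

    density is the average number of events per square degree (assumed
    uniform over the field).
    """
    mean = density * math.pi * eps_deg ** 2
    return poisson_sf(k, mean)
```

For a fixed background density the probability drops quickly as K grows and rises as ε grows, which is the trade-off between K and ε explored in the parameter scan.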
Fig. 7 Isolevel maps for D_{fake} (panel a)), D_{true} (panel b)), D_{eff} (panel c)), and Q (panel d)), for the sky test field 1. The white lines show isolevel = 0, the black lines show isolevel = 0.68, and the blue lines show isolevel = 0.95. 

The lognormal trend of N_{p} is consistent with the lognormal trend of the r_{eff} distribution. Since the number of photons in a cluster will be approximately , we can write the PDF of N_{p} (5)
To evaluate the distribution of r_{eff}^{2} we can use the standard transformation theory of random variables (RV; Papoulis 1965). It can be easily proved that for an RV X with a lognormal distribution,
f_{X}(x) = 1/(xσ√(2π)) exp[−(ln x − μ)^{2}/(2σ^{2})] (6)
the RV Y = X^{2} will follow a lognormal distribution given by
f_{Y}(y) = 1/(2yσ√(2π)) exp[−(ln y − 2μ)^{2}/(8σ^{2})] (7)
Indeed, our r_{eff}^{2} distribution, for the random field (panel d, Fig. 5), is fitted by a lognormal distribution peaking at ≃0.03 deg^{2}. Hence, according to Eq. (5) we expect that f(N_{p}) will also follow a lognormal distribution, when N_{p} is not ruled by Poissonian statistics.
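The statement that the square of a lognormal variable is again lognormal can be checked numerically: if ln X has mean μ and dispersion σ, then ln X^{2} = 2 ln X has mean 2μ and dispersion 2σ. The sketch below draws samples with illustrative parameters taken from the random-field fit quoted in Sect. 4.3, converted from base-10 to natural logarithms; the seed, the sample size, and the names are arbitrary choices of ours.

```python
import math
import random
import statistics


def log_moments(samples):
    """Mean and standard deviation of the natural logarithm of the samples."""
    logs = [math.log(s) for s in samples]
    return statistics.fmean(logs), statistics.stdev(logs)


rng = random.Random(0)                 # fixed seed for reproducibility
mu = -0.45 * math.log(10.0)            # <log10(r_eff)> = -0.45 in natural log
sigma = 0.23 * math.log(10.0)          # dispersion of 0.23 dex in natural log

x = [rng.lognormvariate(mu, sigma) for _ in range(20000)]
y = [value * value for value in x]     # Y = X^2

mean_x, std_x = log_moments(x)
mean_y, std_y = log_moments(y)
```

Comparing `mean_y` with `2 * mean_x` and `std_y` with `2 * std_x` shows that squaring only rescales the two lognormal parameters by a factor of two.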
We verified that the same statistical trends describe the real sky fields. Panels e and f in Fig. 5 show the statistical distribution of N_{p} for the sky test field 1. In agreement with the analysis for the random test field, we see that the fake clusters (ε = 0.30 deg, panel e in Fig. 5) are described by Poissonian statistics, whilst the true clusters (panel f in Fig. 5) are better described by a lognormal distribution (red dashed line) than by a Poissonian distribution (solid black line). We also observe that the lognormal law reasonably describes the empirical distribution only for values of N_{p} ≲ 50, but shows significant deviation in the tail, consistent with the statistics of our simulated source population.
To complete this statistical characterization, we investigated the distribution of the number of detected clusters as a function of the threshold K. According to Eq. (4), we expect that the number of detected clusters for a random field follows a Poisson survival distribution. Panel a of Fig. 6 confirms our hypothesis; indeed, the Poisson survival function provides a reasonable description of the empirical distribution. The same holds for fake clusters of the sky test field 1 (panel c of Fig. 6). In contrast, for true clusters (panel d of Fig. 6), the Poisson survival distribution is unable to reproduce the observed trend, consistent with the non-Poissonian statistics of the simulated clusters. Panels b and e of Fig. 6 show the 1σ CL region for N_{p} as a function of K. Both for random and sky field true clusters the lower boundary of the region is constrained by the equation y = K + 1, which is consistent with the γ-ray DBSCAN logic. In contrast, the upper boundary shows a different behavior. For the random field, the upper boundary deviates from the lower boundary, which is compatible with the fluctuations of the events around the ε circle, and ranges from about 8 to about 16. In contrast, for the true sky field the upper boundary is constrained by the statistics of the number of events in the simulated sources, and ranges from about 60 to 100.
5. Testing the detection performance with simulated γ-ray data
In this section we investigate the detection performance of the γ-ray DBSCAN. We first study how K and ε affect the detection efficiency and the spurious ratio. Then, we investigate the capability of the algorithm to reconstruct the simulated clusters, and the positional accuracy of the reconstructed centroids. We test the detection performance of the γ-ray DBSCAN, using as benchmark the five sky test fields used in the previous section, exploring the same parameter space.
5.1. Detection efficiency and spurious ratio as a function of K and ε
To investigate the detection performance of the γ-ray DBSCAN, we ran a γ-ray DBSCAN detection for each of the five sky test fields and for each pair of values K, ε. For each detection run, we built a cluster catalog. Starting from this, we built the corresponding candidate catalog. This is a list of sources built by taking into account two possible biases, the confusion and the multiple association. In detail:

A cluster is defined as true, i.e., with a possible counterpart, if the position of the simulated source falls within a circle centered on the cluster centroid, with a radius equal to 2pos_{err}.

Two or more true clusters are defined as confused if they have the same counterpart.

A true cluster has a multiple association if it has more than one counterpart.
We stress that the number of confused clusters is negligible: the average number of confused clusters per run is about 0.08, and no confused clusters are found for K > 4. Moreover, the average number of multiple associations per run is about 0.2.
The final candidate catalog will count a number of candidate sources N_{src}, each identified by a unique SRC_{ID}. The number of spurious sources will be N_{fake} = N_{src} − N_{true}. To characterize the performance, we define the following parameters:

the detection efficiency: (8) where N_{sim}(N_{p}sim. > K) is the number of simulated sources with a number of simulated events larger than K

the true detection ratio D_{true} = N_{true}/N_{src}

the spurious detection ratio D_{fake} = N_{fake}/N_{src}

the overall detection quality factor (Q), which takes into account the tradeoff between D_{eff} and D_{fake}, defined as (9)
The D_{eff} parameter shows the fraction of simulated clusters above the threshold N_{p}sim = K detected by the method after subtracting the fake ones. Hence, it does not provide an indication of the spurious contamination. For this reason we introduced the Q parameter, which rescales D_{eff} according to the ratio between the number of fake clusters and the number of found clusters N_{src}. We recall that according to the D_{eff} definition in Eq. (8), it is possible to obtain values of D_{eff} > 1.0. Consider a simulated cluster for which, for a given K and ε, the corresponding seed cluster has a size N^{∗} = N_{p}sim. = K. If there are no background events within the circle of radius ε, this cluster will be rejected. If one or more background events are contained within the circle of radius ε, i.e., N^{∗} > K, the cluster will be detected. In these cases we report a value of D_{eff} = 1.0. The same applies to Q.
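The association criterion and the two ratios that are fully specified in the text, D_{true} and D_{fake}, can be sketched as follows. This is a simplified illustration with hypothetical function names: distances are planar, every cluster is treated as one candidate source (the handling of confused clusters in the candidate catalog is not reproduced), and D_{eff} and Q are not computed.

```python
import math


def match_clusters(clusters, sources):
    """For each cluster, list the simulated sources within 2 * pos_err.

    clusters: list of (x, y, pos_err) tuples for the cluster centroids.
    sources:  list of (x, y) tuples for the simulated source positions.
    """
    matches = []
    for cx, cy, pos_err in clusters:
        matches.append([s for s, (sx, sy) in enumerate(sources)
                        if math.hypot(cx - sx, cy - sy) <= 2.0 * pos_err])
    return matches


def detection_ratios(matches):
    """Return (D_true, D_fake) with N_fake = N_src - N_true."""
    n_src = len(matches)
    if n_src == 0:
        raise ValueError("no candidate sources")
    n_true = sum(1 for matched in matches if matched)
    n_fake = n_src - n_true
    return n_true / n_src, n_fake / n_src


def multiple_associations(matches):
    """Indices of clusters that have more than one counterpart."""
    return [i for i, matched in enumerate(matches) if len(matched) > 1]
```

A cluster with an empty match list is counted as fake, so the two ratios always sum to one.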
Fig. 8 Panel a) red solid boxes show the mean positional error of the centroid for true clusters in sky test field 1 and the standard deviation (vertical error bar) vs. N_{p}. The clusters are binned in N_{p}, with the bin width indicated by the horizontal error bar. The black solid circles represent the corresponding trend for the distance between the cluster centroid and the simulated source position. Panel b) the distribution of the distance between the simulated source position and the cluster centroid, expressed in arcsec, for ε = 0.10 deg (black line), ε = 0.15 deg (blue line), and ε = 0.20 deg (red line). Panel c) the cumulative distributions corresponding to panel b).

Table 1 Summary of the detections obtained for all five sky fields for detections with a number of fake sources ≤6.
Fig. 9 Top panel: the average number of photons associated to each cluster, N_{p}, and its dispersion (vertical bar) vs. the number of photons simulated (N_{p} sim.). The red points refer to the sub parameter space ε = 0.15 deg, and the solid blue circles to the ε = 0.20 deg subspace. The solid green line represents the law N_{p} = N_{p} sim. The dashed lines represent the law N_{p} = N_{p} sim. ± 10. Bottom panel: the corresponding fractional deviation (N_{p} − N_{p} sim.)/N_{p} sim.

In Fig. 7 we summarize the detection runs for sky test field 1 for the full parameter space with K > 2. Panel a shows the isolevel map of the fake cluster detection ratio. The gradient in the isolevel map is quite sharp, and roughly half of the parameter space shows no fake clusters (white isolevel line). To better understand the impact of fake clusters, it is interesting to compare the D_{fake} isolevel map to the D_{true} isolevel map (panel b of Fig. 7). This map also shows a sharp gradient, and the region with D_{true} > 0.95 overlaps the D_{fake} = 0 region. These two maps clearly show the region of the parameter space where the algorithm has the best performance, but the D_{true} and D_{fake} ratios do not provide information on the ratio between the number of true detected clusters and the number of simulated clusters. This information is provided by the D_{eff} isolevel map (panel c of Fig. 7). To focus on the “effective” volume of the parameter space, we hide the region where D_{eff} < 0 with a white area. We note that the isolevel line D_{eff} = 0 and the isolevel lines in the maximum gradient area show a positive correlation between K and ε, meaning that an increased value of ε requires an increased value of K to obtain a better background rejection. To better evaluate the tradeoff between D_{true} and D_{fake}, we plot in panel d of Fig. 7 the isolevel map of Q. This plot shows that the area corresponding to Q > 0.95 is consistent with that found for D_{eff}. In Table 1 we report the D_{eff} values obtained for all five sky fields, for detections with a number of fake sources ≤6. The average number of true clusters ranges between 44 and 51, with the fake ones ranging between 1 and 3, and the average D_{eff} lies between 0.96 and 1.0. This is a very promising result.
5.2. Cluster reconstruction and positional accuracy
The positional accuracy of topometric methods is probably the most important feature of this class of algorithms. In Sect. 2, we described our weighting method to reconstruct the centroid of the cluster.
Panel a of Fig. 8 shows with red solid boxes the mean positional error of the cluster centroids and the standard deviation (vertical error bar) vs. N_{p}, for the true clusters of sky test field 1 with ε ≤ 30 deg. The clusters are binned in N_{p}, with the bin width indicated by the horizontal error bar. As expected, the uncertainty on the reconstructed cluster centroid scales as 1/√N_{p} (solid red line). The solid black circles represent the corresponding trend for the separation between the simulated cluster position and the reconstructed cluster centroid. For N_{p} ≳ 30, the separation is below 2′. In panel b of Fig. 8 we plot the distribution histogram of the angular separation between the position of the simulated source and the position of the cluster centroid. For the three cases of ε = 0.10 deg, ε = 0.15 deg, and ε = 0.20 deg, the positional error is below 1.5′ for 68% of the sample.
Fig. 10 Left panel: the distribution (blue line) of the square of the significance for the fake clusters in sky test field 1 for the full K, ε parameter space compared to a χ^{2} distribution with one degree of freedom. Right panel: the spurious ratio D_{fake} for S_{cls} > 4.0, the white line shows the isolevel D_{fake} = 0.0. 

In addition to positional accuracy, it is also important to understand the capability of the γray DBSCAN to reconstruct the simulated cluster in terms of number of photons. Indeed, this information gives an idea of the average number of background photons contaminating the reconstructed cluster. In the top left panel of Fig. 9, we show the scatter plot of N_{p} vs. the number of simulated events (N_{p} sim.). The solid points represent the average value of N_{p} for a given value of N_{p} sim., and the error bar corresponds to the standard deviation. The solid green line represents N_{p} = N_{p} sim., and the dashed upper and lower lines represent N_{p} = N_{p} sim. ± 10. For ε = 0.15 deg and ε = 0.20 deg, the scatter is bounded by the dashed lines, showing that the highest excess in N_{p} is about 10 photons, independently of N_{p} sim. For ε = 0.15 deg, the number of reconstructed photons systematically underestimates the simulated number, whilst the ε = 0.20 deg case does not show this bias. This effect is easier to appreciate in the bottom left panel of Fig. 9, where we show the fractional reconstruction error (N_{p} − N_{p} sim.)/N_{p} sim. vs. N_{p} sim. The solid green line represents zero error, and the dashed lines represent the ± 20% boundaries.
The bias on N_{p} for ε = 0.15 deg again shows the strong correlation between ε and the PSF radius. When ε is smaller than σ^{sim} (which in our simulations reproduces the PSF effect), the number of reconstructed events N_{p} is systematically smaller than N_{p} sim. In contrast, when the ε radius matches the PSF radius (ε = 0.20 deg), the bias disappears.
6. Cluster significance, background inhomogeneities, and rejection of spurious clusters
Even though we have identified the region of the K, ε parameter space where the detection efficiency is higher and the probability of detecting fake clusters is lower, in the application to real data it is mandatory to provide a significance level that expresses the probability that a cluster does not originate from a background fluctuation. We propose a method derived from the Li & Ma (1983) approach, based on the evaluation of the S/N. A significance method based on the S/N fits the γray DBSCAN implementation well, because the algorithm directly provides a partition of the photon list into cluster and noise events. Hence, for each cluster we can easily evaluate the S/N, knowing the exact nature of each event. The procedure to evaluate the significance is summarized by the following steps:
Fig. 11 Aitoff projection of the Fermi sky region. The purple boxes represent the γray DBSCAN sources (K = 8,ε = 0.21 deg). The green crosses are the 2FGL sources with TS > 16, the red crosses those with TS ≤ 16. There are no fake sources, and the γray DBSCAN finds all the sources with TS > 16, except one, enclosed by the red circle, whose center is positioned at the edge of the field. 


1. For each cluster we define an annular region with an inner radius r_{in} and an external radius r_{out}.

2. r_{in} is set to an initial value of r_{in} = 2r_{eff}, and is adaptively increased with a step of r_{in}/10 for a maximum of 10 trials until at least 95% of the cluster events are enclosed within r_{in}.

3. r_{out} is set to 3r_{in}.

4. We count all cluster events and the background events that are enclosed within the circle with radius r_{in} centered on the cluster centroid.

5. We determine the background level by rescaling the number of background events in r_{in} < r < r_{out} to a circle with radius r_{in}.

6. To evaluate possible gradients in the background, we select a region far enough from the cluster to properly sample the background level, and close enough to the cluster to measure a local background level. For this, we define the radius , and evaluate the average background level () in a circle of radius r_{in}, centered on each point in .

7. If no background points are found in , we set .

8. By comparing to , we evaluate the fraction of noise already resolved by the γray DBSCAN and evaluate the effective background level by correcting for .

9. We evaluate the significance according to the Likelihood Ratio Test (LRT) method proposed by Li & Ma (1983): (10)
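The core of this procedure can be sketched with the standard likelihood-ratio significance of Li & Ma (1983, their Eq. 17). The sketch below is an illustration under simplifying assumptions: the on-source counts are the events within r_{in}, the off-source counts are the background events in the annulus r_{in} < r < r_{out}, and α is the ratio of the two areas. It does not include the effective background correction of steps 6 to 8, and the function names are hypothetical.

```python
import math

def annulus_area_ratio(r_in, r_out):
    """Area of the circle of radius r_in divided by the annulus area r_in < r < r_out."""
    if not 0 < r_in < r_out:
        raise ValueError("require 0 < r_in < r_out")
    return r_in ** 2 / (r_out ** 2 - r_in ** 2)

def li_ma_significance(n_on, n_off, alpha):
    """Likelihood-ratio significance, Eq. 17 of Li & Ma (1983).

    n_on  : counts in the on-source region (here, the circle of radius r_in)
    n_off : counts in the background region (here, the annulus)
    alpha : on/off exposure ratio (here, the area ratio)
    The result is negative when the on-source counts are below the expectation.
    """
    if n_on <= 0 or n_off <= 0 or alpha <= 0:
        raise ValueError("counts and alpha must be positive")
    total = n_on + n_off
    term_on = n_on * math.log((1.0 + alpha) / alpha * n_on / total)
    term_off = n_off * math.log((1.0 + alpha) * n_off / total)
    # Guard against tiny negative values caused by floating-point rounding.
    s = math.sqrt(2.0 * max(0.0, term_on + term_off))
    return s if n_on >= alpha * n_off else -s

# With r_out = 3 r_in the annulus has 8 times the area of the inner circle.
alpha = annulus_area_ratio(0.2, 0.6)
# Hypothetical counts: 30 events within r_in and 80 background events in the annulus.
s_example = li_ma_significance(30, 80, alpha)
```

With r_{out} = 3r_{in}, the rescaling factor of step 5 is 1/8, so 80 annulus events correspond to an expected background of 10 events within r_{in}.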
Assuming that a cluster is due to a background fluctuation, the variable S_{cls}^{2} is expected to follow a chi-square distribution with one degree of freedom (χ(1)^{2}). In the left panel of Fig. 10, we plot the distribution of S_{cls}^{2} for the fake clusters in sky test field 1 (blue histogram), compared to a χ(1)^{2} distribution. The empirical distribution is well described by the expected χ(1)^{2} distribution, showing that the value of S_{cls} can be used as the “significance” of the detected cluster. A clear example of the power of S_{cls} in rejecting fake clusters is given by the plot in the right panel of Fig. 10, where we plot the D_{fake} ratio isolevel map, applying the selection S_{cls} > 4.0. The fake ratio is 0 for the parameter space with ε ≲ 0.25 deg. For 0.25 deg ≲ ε ≲ 0.35 deg, there are fluctuations showing D_{fake} ≲ 0.05. The fake ratio shows a significant increase only for ε ≳ 0.40 deg and K ≲ 8, but we stress that in this region of the parameter space ε is more than twice the PSF size; hence, this region should not be used in the detection with real data.
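If the square of the significance follows a χ^{2} distribution with one degree of freedom under the background-only hypothesis, a cut on S_{cls} translates directly into a chance probability. The sketch below uses the identity P(χ(1)^{2} > s^{2}) = erfc(s/√2); the function names are hypothetical, and the numbers are properties of the χ(1)^{2} distribution rather than results of this work.

```python
import math

def chi2_1dof_survival(x):
    """P(chi2 with 1 degree of freedom > x), for x >= 0."""
    if x < 0:
        raise ValueError("x must be non-negative")
    return math.erfc(math.sqrt(x / 2.0))

def chance_probability(s_cls):
    """Probability that a pure background fluctuation reaches at least s_cls."""
    return chi2_1dof_survival(s_cls ** 2)

p_cut_2 = chance_probability(2.0)  # about 4.6e-2
p_cut_4 = chance_probability(4.0)  # about 6.3e-5
```

The steep drop of the chance probability between the two cuts illustrates why a cut at S_{cls} > 4.0 is far more effective at removing background fluctuations than a cut at 2.0.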
7. Application to real FermiLAT data
The last step in our investigation of the γray DBSCAN is the application to real FermiLAT γray data. We selected the same region of the sky as was used for the simulated test field (80° < l < 170° and 40° < b < 65°) and extracted all the photons with energy E > 3 GeV. The photons are collected for the same time span as the 2FGL catalog. We repeated the detection test performed with simulated data (see Sects. 5 and 6), restricting the parameter space to 2 ≤ K ≤ 10 and 0.10 ≤ ε ≤ 0.30 deg.
To properly understand the detection performance, we need to take into account that the 2FGL catalog has been built using photons with an energy threshold of 100 MeV, whilst we used a value of 3 GeV. One possibility is to select sources with a reported flux higher than zero in the 3–10 GeV band flux column of the 2FGL. This flux-based selection is not the best way to study the detection performance of the γray DBSCAN, because the flux does not have an unambiguous relation with the significance of the detection for that energy threshold. A more reliable criterion is to select the sources according to the significance reported in the 2FGL. The 2FGL detection significance is given by √TS, where TS is the test statistic defined as TS = 2(log L(source) − log L(no source)), and L is the likelihood of the data given the model with or without a source present at a given position on the sky (Nolan et al. 2012). We applied a selection according to TS > 16 and refer to the corresponding source list (counting 35 sources) as 2FGL_{TS>16}.
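The test statistic and the catalog selection can be written out in a few lines. This is a minimal sketch: the log-likelihood values and the catalog entries are hypothetical placeholders, and the function names are not part of the Fermi Science Tools.

```python
import math

def ts_from_loglike(log_l_source, log_l_no_source):
    """TS = 2 (log L(source) - log L(no source))."""
    return 2.0 * (log_l_source - log_l_no_source)

def select_by_ts(catalog, ts_min=16.0):
    """Keep the names of the catalog entries with TS above ts_min.

    catalog : iterable of (name, ts) pairs
    """
    return [name for name, ts in catalog if ts > ts_min]

# Hypothetical log-likelihood values for a single source position.
ts = ts_from_loglike(-1000.0, -1012.5)
significance = math.sqrt(ts)

# Hypothetical catalog entries (name, TS in the band of interest).
selected = select_by_ts([("src_a", 25.0), ("src_b", 9.0), ("src_c", 16.0)])
```

A threshold of TS > 16 corresponds to √TS > 4, which makes the catalog selection directly comparable to a cut on a significance expressed in the same units.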
An example of the application of the γray DBSCAN to real FermiLAT data is given in Fig. 11, where we report an Aitoff projection in galactic coordinates of the analyzed γray sky region. The red crosses represent the 2FGL sources with TS < 16 in the 3–10 GeV band, and the green crosses represent those with TS ≥ 16. The purple boxes represent the γray DBSCAN sources found for K = 8, ε = 0.21 deg. For this choice of parameters, we find no fake sources, and we find all sources with TS > 16, except for one that is enclosed by the red circle and is positioned at the edge of the sky region, with a galactic latitude b = 64.85 deg. In Table 2 we summarize the detection performance for detections with a number of fake sources ≤4. The number of true clusters ranges between 34 and 35, out of the 35 present in the 2FGL_{TS>16}. The number of fake ones ranges between 1 and 4, and we obtain an average detection efficiency of D_{eff} = 0.94.
In Fig. 12 we compare the localization performance of the γray DBSCAN algorithm with that returned by the likelihood analysis implemented in the Fermi Science Tools. For each source in our 2FGL_{TS>16} list associated to one or more γray DBSCAN clusters, we plot the error on the position of the reconstructed cluster centroid and its standard deviation (represented by the error bar) vs. the 95% positional uncertainty reported in the 2FGL. We evaluate the 2FGL 95% positional uncertainty as , where σ_{95,max} and σ_{95,min} are the semimajor and semiminor axes of the 95% confidence source location region. The dashed red line represents a linear best fit with a slope of ≃0.99 and an intercept of ≃9.53, showing that the error on the position of the reconstructed cluster centroid, performed with a threshold of 3 GeV, is of the same order as the 95% positional uncertainty reported in the 2FGL catalog, performed above 100 MeV.
To test the reliability of the significance S_{cls} in rejecting spurious sources, we plot the D_{fake} and D_{eff} based on the 2FGL_{TS>16} catalog in Fig. 13. Panels a and b correspond to no selection on S_{cls}. Both the D_{fake} and the D_{eff} trends are very similar to the case of the simulated sky. If we apply a significance cut of S_{cls} > 2.0 (panels c and d), we observe that the spurious ratio is D_{fake} ≤ 0.05 for almost half of the parameter space (region to the right of the purple line). The more severe cut of S_{cls} > 4.0 (panels e and f) removes all fake clusters for ε ≲ 0.20 deg, except two for ε ≲ 0.15 deg. Only for ε ≳ 0.25 deg does the D_{fake} ratio show a significant increase, ranging from 0.05 up to ≃0.1. In agreement with our analysis on simulated data, the region of the parameter space where ε is comparable to the PSF size gives the best performance.
As an additional confirmation of the robustness of our significance, we plot S_{cls} vs. √TS in the left panel of Fig. 14. For each source in our 2FGL_{TS>16} list associated to one or more γray DBSCAN clusters, we plot the √TS in the 3–10 GeV band vs. the average value of S_{cls} and its standard deviation (represented by the error bar). The average value of S_{cls} and its standard deviation are evaluated from the list of all clusters associated to the same 2FGL source. The solid blue boxes represent the full K, ε parameter space case, and the red solid circles represent the ε = 0.10 deg case. The dashed black line represents a linear best fit. The slope of the linear fit is ≃0.5. The strong correlation in the scatter plots (r ≃ 0.98, for both data sets) shows that our significance implementation is consistent with the √TS reported in the 2FGL, and the slope of the linear fit suggests that .
Table 2 Summary of the detection performance for the real FermiLAT field for detections with a number of fake sources ≤4.
Fig. 12 Scatter plot of the positional error of the γray DBSCAN clusters vs. the positional error of the corresponding associated 2FGL_{TS>16} sources. For each 2FGL_{TS>16} source associated to one or more γray DBSCAN clusters, we plot the error on the position of the reconstructed cluster centroid and its standard deviation (represented by the error bar). The dashed red line represents a linear best fit with a slope of ≃ 0.99 and an intercept of ≃ 9.53. 

Fig. 13 D_{fake} (left panels) and D_{eff} (right panels) for the real sky detections using the 2FGL_{TS>16} catalog. Panels a), b) no cut on S_{cls} applied. Panels c), d) S_{cls} > 2.0. Panels e), f) S_{cls} > 4.0. 

Fig. 14 Left panel: scatter plot of S_{cls} vs. √TS. For each source in our 2FGL_{TS>16} list associated to one or more γray DBSCAN clusters, we plot the √TS in the 3–10 GeV band vs. the average value of S_{cls} and its standard deviation (represented by the error bar). Right panel: the distribution (blue line) of the square of the significance for the fake clusters in the FermiLAT real sky for the full K, ε parameter space compared to a χ^{2} distribution with one degree of freedom.

8. Conclusions
For the first time, we have used the DBSCAN to detect sources in γray astrophysical images. We implemented a new version of the DBSCAN, the γray DBSCAN, which is optimized for the application to γray astrophysical images with significant background noise. The novelty of our γray DBSCAN is a recursive call of the DBSCAN algorithm, which allows an excellent reconstruction of the cluster with an effective background rejection. We tested the algorithm with a sample of simulated γray FermiLAT fields to give a statistical characterization of the method and to benchmark the detection performance. The results obtained with the simulated γray data are summarized below.

The radius of the γray DBSCAN scanning brush ε has a strong correlation with the instrumental PSF radius. We find that the typical size of the reconstructed true cluster is on the order of the simulated PSF size σ^{sim}, and that the precision of the reconstructed centroid is on the order of .

The number of reconstructed events N_{p} follows Poissonian statistics in the random fields and for the fake clusters. In contrast, for true clusters, the statistics of N_{p} is ruled by that of the simulated sources.

The fractional error on the reconstructed event number is about 20% for N_{p}sim. ≲ 50, and is negligible for higher values, with best performance obtained when ε ≃ σ^{sim}.

We investigated the detection performance for a wide range of the K, ε parameter space and identified the region with the best performance in terms of detection efficiency, and spurious ratio.

We implemented an algorithm to estimate the S/N that is able to deal with local background inhomogeneities and contamination from nearby sources, and we successfully used the S/N estimate to determine the significance of the clusters, using the definition in Li & Ma (1983).

Our cluster significance, S_{cls}, for random clusters follows the χ(1)^{2} statistics and can be used to reject spurious sources. The chance to find spurious sources for S_{cls} > 4 is negligible. This means that our S_{cls} is a robust and reliable tool to reject spurious sources, and that χ(1)^{2} statistics can be used to evaluate the probability of a cluster to be spurious.
We successfully applied the γray DBSCAN to real FermiLAT data. We found an excellent agreement with results from the simulated fields. We tested our detection performance using the 2FGL source catalog with a TS > 16 cut. The results obtained with the real FermiLAT γray data are:

The error on the position of the reconstructed cluster centroid, performed with a threshold of 3 GeV, is on the same order as the 95% positional uncertainty reported in the 2FGL, performed above 100 MeV.

We tested the γray DBSCAN significance, finding that it is strongly correlated with the TS provided in the 2FGL. The significance cut allows one to safely remove spurious clusters.

The detection efficiency with real data is excellent: we are able to find all 35 sources with TS > 16.

When working with ε on the order of the instrumental PSF size, we obtain the best performance in terms of spurious rejection and detection efficiency.
In general, we find that the γray DBSCAN is a very powerful detection method to find clusters in γray images that correspond to real sources. It has the great advantage of dealing self-consistently with gradients in the background, providing an effective rejection of spurious clusters. Our implementation of the detection significance, in addition to the algorithm to evaluate local fluctuations in the background, allows one to apply a statistically significant selection, making the rejection of spurious sources even more effective.
In a companion paper (Tramacere, in prep.), we will apply the method to the FermiLAT sky, showing the potential to discover new sources, in particular small clusters located at high galactic latitude, or clusters on the Galactic plane that are affected by a strong background. We will also investigate how to include the energy dependence of the PSF into the γray DBSCAN algorithm, and how to improve the detection performance taking into account other FermiLAT calibration properties.
We remark that, since the γray DBSCAN also provides density maps, it can potentially be used to detect large-scale structures in the Galactic γray background, providing patterns to compare to the interstellar gas distribution. We also stress that the applications of this method are not limited to γray images: it can potentially be used for any application related to the detection of spatial and/or spatio-temporal clusters.
Acknowledgments
We are grateful to E. Massaro, R. Campana and E. Bernieri for helpful comments and for providing us the simulated test fields. We are grateful to G. Tosti for helpful comments. We thank the anonymous referee for providing us with constructive comments and useful suggestions.
References
Ackermann, M., Ajello, M., Albert, A., et al. (FermiLAT collaboration) 2012, ApJS, 203, 4
Campana, R., Massaro, E., Gasparrini, D., Cutini, S., & Tramacere, A. 2007, MNRAS, 383, 1166
Campana, R., Massaro, E., Bernieri, E., Tinebra, F., & Tosti, G. 2012, submitted
Ester, M., Kriegel, H., Sander, J., & Xu, X. 1996, in Proc. 2nd International Conference on Knowledge Discovery and Data Mining
Jolliffe, I. T. 1986, Principal component analysis
Li, T.P., & Ma, Y.Q. 1983, ApJ, 272, 317
Nolan, P. L., Abdo, A. A., Ackermann, M., et al. 2012, ApJS, 199, 31
Papoulis, A. 1965, Probability, Random Variables and Stochastic Processes