Issue | A&A Volume 602, June 2017 | |
---|---|---|
Article Number | A86 | |
Number of page(s) | 12 | |
Section | Catalogs and data | |
DOI | https://doi.org/10.1051/0004-6361/201629552 | |
Published online | 20 June 2017 |
Research and characterisation of blazar candidates among the Fermi/LAT 3FGL catalogue using multivariate classifications⋆
1 LUTH, Observatoire de Paris, PSL Research University, CNRS, Université Paris Diderot, 5 place Jules Janssen, 92190 Meudon, France
e-mail: julien.lefaucheur@obspm.fr
2 APC, AstroParticule et Cosmologie, Université Paris Diderot, CNRS/IN2P3, CEA/Irfu, Observatoire de Paris, Sorbonne Paris Cité, 10 rue Alice Domon et Léonie Duquet, 75205 Paris Cedex 13, France
e-mail: santiago.pita@apc.in2p3.fr
Received: 19 August 2016
Accepted: 15 February 2017
Context. In the recently published 3FGL catalogue, the Fermi/LAT collaboration reports the detection of γ-ray emission from 3034 sources obtained after four years of observations. The nature of 1010 of those sources is unknown, whereas 2023 have well-identified counterparts in other wavelengths. Most of the associated sources are labelled as blazars (1717/2023), but the BL Lac or FSRQ nature of 573 of these blazars is still undetermined.
Aims. The aim of this study was two-fold. First, to significantly increase the number of blazar candidates from a search among the large number of Fermi/LAT 3FGL unassociated sources (case A). Second, to determine the BL Lac or FSRQ nature of the blazar candidates, including those determined as such in this work and the blazar candidates of uncertain type (BCU) that are already present in the 3FGL catalogue (case B).
Methods. For this purpose, multivariate classifiers – boosted decision trees and multilayer perceptron neural networks – were trained using samples of labelled sources with no caution flag from the 3FGL catalogue and carefully chosen discriminant parameters. The decisions of the classifiers were combined in order to obtain a high level of source identification along with well controlled numbers of expected false associations. Specifically for case A, dedicated classifications were generated for high (|b| > 10°) and low (|b| ≤ 10°) galactic latitude sources; in addition, the application of classifiers to samples of sources with a caution flag was considered separately, and specific performance metrics were estimated.
Results. We obtained a sample of 595 blazar candidates (high and low galactic latitude) among the unassociated sources of the 3FGL catalogue. We also obtained a sample of 509 BL Lacs and 295 FSRQs from the blazar candidates cited above and the BCUs of the 3FGL catalogue. The number of expected false associations is given for different samples of candidates. It is, in particular, notably low (~9/425) for the sample of high-latitude blazar candidates from case A.
Key words: gamma rays: galaxies / galaxies: active / BL Lacertae objects: general / methods: statistical / catalogs
Full Tables 5 and 7 are only available at the CDS via anonymous ftp to cdsarc.u-strasbg.fr (130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/602/A86
© ESO, 2017
1. Introduction
The LAT telescope, on board the Fermi satellite, has been mapping the γ-ray sky (above 100 MeV) since 2008 with unprecedented angular resolution and sensitivity. In the recently published 3FGL catalogue (Acero et al. 2015), the Fermi/LAT collaboration reports the detection of γ-ray emission from 3034 sources above 4σ significance, obtained after four years of observations. Among these sources, 2025 have been associated with sources of well known types detected at other wavelengths. Most of them are active galactic nuclei (AGN) (1752), and, of particular interest here, blazars (1717), among which 660 are labelled as BL Lacertae objects (BL Lac), 484 as flat spectrum radio quasars (FSRQ) and 573 as blazars of undetermined type. The remaining fraction is composed of galactic sources, mainly pulsars (166) and also supernova remnants or pulsar wind nebulae (85). Nevertheless, one third of the 3FGL catalogue sources are still of unknown nature because of the lack of firmly identified counterparts at other wavelengths. It is likely that a significant fraction of these unassociated sources are blazars, considering the incompleteness of counterpart catalogues, the existence of γ-ray sources with multiple candidate associations due to the large localisation uncertainty of the Fermi/LAT, and also a deficit seen at low values (|b| ≤ 10°) in the latitude distribution of Fermi blazars (Acero et al. 2015).
The understanding of the blazar population and its evolution – for example the validity of the “blazar sequence” – and the determination of the extragalactic background light (EBL) are key topics in high-energy astrophysics (Sol et al. 2013) which are currently limited, observationally, by the small number of detected blazars. For this reason, several studies have addressed the question of the nature of the Fermi/LAT catalogues’ sources of unknown type. Two different approaches are generally used, one based on machine-learning classification methods and the other based on multiwavelength identifications or associations; both are described below.
The first approach is based on the exploitation of statistical differences imprinted in the Fermi/LAT catalogues, such as variability and spectral shape, between different populations of sources. Classifications are built with machine-learning algorithms, using given sets of discriminant parameters, to search for particular types of sources among the unassociated ones. Ackermann et al. (2012) identified AGN and pulsar candidates among the 630 unassociated sources of the 1FGL catalogue (Abdo et al. 2010b) with a classification built on the decisions of two individual classifiers based on random forest and logistic regression multivariate methods. They proposed a list of 221 AGN and 134 pulsar candidates. To search for possible dark matter candidates in the sample of the 269 unassociated sources located at high galactic latitude (|b| > 10°) of the 2FGL catalogue (Nolan et al. 2012), Mirabal et al. (2012) focused on the outliers of their own AGN and pulsar classifications built with a random forest method. They proposed a list of 216 AGN candidates. Hassan et al. (2013) then identified 235 possible BL Lac or FSRQ candidates among the 269 blazars of unknown type in the 2FGL catalogue by combining the decisions of two classifiers based on support vector machine and random forest methods. In another study, using a combination of neural network and random forest methods, and introducing new strongly-discriminant parameters, Doert & Errando (2014) identified a sample of 231 AGN candidates among 576 unassociated sources of the 2FGL catalogue. Recently, Saz Parkinson et al. (2016) applied a random forest and a logistic regression algorithm to identify pulsar and AGN candidates among the unassociated sources in the 3FGL catalogue. They proposed a list of 334 pulsar candidates and 559 AGN candidates. Finally, Chiaro et al. (2016) applied a neural network to identify BL Lacs and FSRQs among the blazar candidates of uncertain type (BCU) in the 3FGL catalogue. They obtained a list of 314 BL Lac candidates and 113 FSRQ candidates.
Table 1 Number of sources used to build each classifier and to derive its performance metrics.
The second approach consists of finding possible counterparts in different wavelength bands, beyond what was done by the Fermi/LAT collaboration for their public catalogues. Aside from determining the nature of the source, the better localisation of candidate counterparts simplifies more detailed identification efforts at other wavelengths. A first attempt by Massaro et al. (2011) used the assumption that blazars occupy a special position in the colour-colour diagram constructed with the first three filters of the WISE satellite (Wright et al. 2010). By building “blazar” regions with a selected sample of infrared blazars, and by comparing the distance of the unassociated sources in the colour space to these regions, one can identify candidates for blazar-like counterparts. This method has been improved several times and applied to the 2FGL catalogue (Massaro et al. 2012a,b, 2013b), thus providing lists of possible blazar counterparts. In Massaro et al. (2013b) the authors provide 149 infrared counterparts corresponding to 109 2FGL unassociated sources. There is, however, no estimate of the number of false associations, as the method is based only on a selected sample of blazars and does not consider the behaviour of other infrared source classes. Source contamination in searches for counterparts in infrared catalogues is illustrated in D’Abrusco et al. (2014). Other attempts have been made with non-parametric techniques, such as kernel density estimators, using additional information obtained in radio (Massaro et al. 2013a,c) or X-rays (Paggi et al. 2013), to identify potential blazar counterparts for a few tens of unassociated sources in the 2FGL catalogue. Finally, one can deal with unassociated sources individually, as done in the study of Acero et al. (2013) for a limited sample of sources, by combining multiwavelength observations and analysing the spectral energy distributions of the sources.
The aim of this study is two-fold. First, to significantly increase the number of γ-ray blazar candidates from a search among the large number of Fermi/LAT 3FGL unassociated sources (case A). Second, to determine the nature (BL Lac or FSRQ) of the blazar candidates, including those determined as such in this work and those labelled as BCU (blazar candidates of uncertain type) in the 3FGL catalogue (case B). For each case, classifiers based on two different machine-learning algorithms were built using only parameters from the 3FGL catalogue and combined in order to increase the overall performance. Specifically for case A, those classifiers were trained separately for low (|b| ≤ 10°) and high (|b| > 10°) galactic latitudes. Special attention was devoted to the estimation of their performance metrics. This paper is organised as follows. In Sect. 2 we describe the samples of sources used in this work, and also the selected sets of discriminant parameters. In Sect. 3 we present two selected machine-learning algorithms and their settings. In Sect. 4 we describe the training of the classifiers and performance evaluation. Results are then presented in Sect. 5 and discussed in Sect. 6.
2. Data samples and discriminant parameters
2.1. Data samples for classifier building
The aim of the first study (case A) is to identify blazar candidates among the unassociated sources of the 3FGL catalogue (Acero et al. 2015). Considering that at high galactic latitudes the unassociated sources are likely to be either blazars or pulsars (Mirabal et al. 2012), classifiers were built and tested using a sample of 1572 blazars (including BL Lacs, FSRQs and BCUs) and a sample of 134 pulsars, regardless of their galactic latitudes. On the other hand, as at low galactic latitudes the unassociated sources are likely to be blazars or any type of galactic sources, other classifiers were built and tested using the same sample of 1572 blazars and a sample of 183 galactic sources, corresponding to 134 pulsars, 34 pulsar wind nebulae (PWN) or supernova remnants (SNR), and also a few globular clusters and binaries. Only sources that have no caution flags in the 3FGL catalogue were considered. For each case, these samples were split into training and test samples (respectively 70% and 30%) following a procedure explained in Sect. 4.1. The test sample was used to determine the performance of the classifiers built with the training sample. In addition, a sample of identified or associated flagged sources was used only to estimate the performance of the classifiers specifically for flagged sources. The “high latitude” and “low latitude” classifiers were applied to unassociated sources with galactic latitudes |b| > 10° and |b| ≤ 10°, respectively. Numbers are summarised in Table 1.
The aim of the second study (case B) is to determine the nature (BL Lac or FSRQ) of blazar candidates in the 3FGL catalogue for which this information is not known. In this case classifiers were built and tested using a sample of 638 sources labelled as BL Lacs and a sample of 448 sources labelled as FSRQs in the 3FGL catalogue. Here also, only sources with no flag were considered to build and test the classifiers. As the flagged sources were few in number (22 and 36, for BL Lacs and FSRQs respectively), it was not possible to derive a reliable estimation of their performance when applied to flagged sources. For this reason, classifiers were applied only to the sample of blazar candidates of unknown type and with no flag (486 BCUs from the 3FGL catalogue and also the blazar candidates resulting from the case A study). Numbers are summarised in Table 1.
2.2. Discriminant parameters
To distinguish between blazars and other source classes (case A), two types of parameters appear to be particularly powerful. The first are those quantifying the variability of the sources, which is a distinguishing feature of blazars over month-long time scales. The second are spectral parameters, as blazar spectra are generally well fitted by a simple power law or a log parabola, whereas pulsars, for example, generally show a curved spectrum typically well fitted by a broken power law or a power law with an energy cut-off. With this in mind, we reviewed the available parameters in the 3FGL catalogue and also examined those already used in previous studies (Ackermann et al. 2012; Ferrara et al. 2012; Mirabal et al. 2012; Doert & Errando 2014; Saz Parkinson et al. 2016). We finally selected six discriminant parameters, considering individually the increase of separation power and the stability that they provide to the classifiers. Five of these parameters have been used in previous studies: the normalised curvature, defined as σc/σ where σc is the significance of the curvature and σ is the detection significance (Doert & Errando 2014); the normalised variability, given by the ratio between the variability index TS and σ (Doert & Errando 2014); and the hardness ratios HR23 and HR34 as well as their difference HR23−HR34 (Ackermann et al. 2012). We note that we chose to discard the hardness ratios HR12 and HR45 in our selection because of the lack of control over their discriminant power. Additionally, we introduced a new parameter, called λ, defined as the ratio between the spectral index of the preferred hypothesis and the spectral index of the power law hypothesis, called γ. Although an alternative hypothesis is preferred over a power law for only 17% of sources in the 3FGL catalogue, this fraction increases to 76% for pulsars while it is only 9% for blazars. The distribution of λ (when different from 1) shows an interesting separation power for blazars and pulsars, see for example Fig. 1a. A selection of scatter plots is shown in Fig. 1 for the selected set of discriminant parameters, considering the different source samples.
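As an illustration, the six case A parameters could be derived from catalogue quantities along the following lines. This is a minimal sketch: the container and its field names are hypothetical (they are not 3FGL column names), and the hardness ratio formula HR_ij = (F_j − F_i)/(F_j + F_i) on band energy fluxes is an assumption, since the corresponding definition is not reproduced in the text.

```python
from dataclasses import dataclass


@dataclass
class CatalogueRow:
    """Hypothetical container; field names are illustrative only."""
    detection_significance: float   # sigma
    curvature_significance: float   # sigma_c
    variability_index: float        # variability TS
    index_preferred: float          # spectral index of the preferred hypothesis
    index_power_law: float          # spectral index of the power-law hypothesis
    energy_flux_bands: tuple        # energy fluxes in bands 1..5


def hardness_ratio(flux_i, flux_j):
    # Assumed definition: HR_ij = (F_j - F_i) / (F_j + F_i)
    total = flux_i + flux_j
    return (flux_j - flux_i) / total if total != 0 else 0.0


def discriminant_parameters(row):
    """Return the six case A parameters as a dictionary."""
    f = row.energy_flux_bands
    hr23 = hardness_ratio(f[1], f[2])   # bands 2 and 3 (0-based indexing)
    hr34 = hardness_ratio(f[2], f[3])   # bands 3 and 4
    return {
        "norm_curvature": row.curvature_significance / row.detection_significance,
        "norm_variability": row.variability_index / row.detection_significance,
        "HR23": hr23,
        "HR34": hr34,
        "HR23-HR34": hr23 - hr34,
        "lambda": row.index_preferred / row.index_power_law,
    }


# Invented values, for illustration only.
row = CatalogueRow(10.0, 2.0, 50.0, 2.0, 2.0, (1.0, 2.0, 2.0, 1.0, 0.5))
params = discriminant_parameters(row)
```

For a source whose preferred hypothesis is the power law itself, both indices coincide and λ equals 1, which is why the text restricts the λ distribution to values different from 1.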
Fig. 1 Scatter plots for selected couples of discriminant parameters. Blazars are represented with blue circles, sources belonging to our Galaxy (except pulsars) with downward-pointing red triangles, pulsars with upward-pointing green triangles and unassociated sources with black dots.
Fig. 2 Scatter plots for selected couples of discriminant parameters. BL Lacs are represented with blue squares, FSRQs with red circles and blazars of uncertain type with black dots.
The selection of a set of BL Lac/FSRQ discriminant parameters (case B) follows a similar approach. The photon index γ, the pivot energy Ep (which is somewhat correlated to the position of the high energy peak) and the normalised variability were selected (Fig. 2 shows three scatter plots illustrating strong separation power). It is indeed shown in the Fermi/LAT 3LAC catalogue (Ackermann et al. 2015) that FSRQs tend to have softer spectra than BL Lacs, that their high energy peaks tend to be located at lower energies, and that they tend to show stronger variability. These parameters were also used in a similar study applied to the 2FGL catalogue by Hassan et al. (2013). The set of six parameters selected above for the search of blazar candidates among the unassociated 3FGL sources was also investigated. In addition to the normalised variability, which was already selected, the hardness ratios HR23 and HR34 were also chosen. The other parameters were discarded, as they showed poor BL Lac/FSRQ separation power.
3. Binary classifications based on machine-learning algorithms
For this work, several machine-learning algorithms were tested in order to identify blazar candidates among the 3FGL unassociated sources (case A) and also to determine the BL Lac or FSRQ nature of blazars of unknown type (case B). Using the Toolkit for Multivariate Data Analysis (TMVA) package (Hoecker et al. 2007) it quickly appeared that, for a given set of discriminant parameters, methods based on random forests, neural networks, support vector machines and boosted decision trees could reach comparable performance with very little tuning. The choice was made to use two of these methods corresponding to different philosophies, the boosted decision trees (BDT) and a multilayer perceptron (MLP) neural network. In order to reduce the false association rate, the decisions of both classifiers were combined, then used to tag a source only if both classifiers agree on its nature.
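The effect of requiring both classifiers to agree can be illustrated with a short sketch. It assumes that, for each classifier, a higher score means "more signal-like"; the scores and cutoffs below are invented for the example.

```python
def combined_tag(score_bdt, score_mlp, cutoff_bdt, cutoff_mlp):
    """Tag a source as a candidate only if both classifiers accept it.

    Assumption: a higher score means "more signal-like" for both classifiers.
    """
    return score_bdt > cutoff_bdt and score_mlp > cutoff_mlp


def positive_rates(is_signal, tagged):
    """True and false positive rates of a list of boolean decisions."""
    n_sig = sum(1 for s in is_signal if s)
    n_bkg = len(is_signal) - n_sig
    tp = sum(1 for s, t in zip(is_signal, tagged) if s and t)
    fp = sum(1 for s, t in zip(is_signal, tagged) if not s and t)
    return tp / n_sig, fp / n_bkg


# Toy labelled sample: (BDT score, MLP score, is_signal); values are invented.
sample = [
    (0.9, 0.8, True), (0.7, 0.2, True), (0.6, 0.9, True), (0.8, 0.7, True),
    (0.6, 0.1, False), (0.2, 0.7, False), (0.1, 0.2, False), (0.7, 0.6, False),
]
cut_bdt, cut_mlp = 0.5, 0.5  # hypothetical cutoffs

truth = [s for _, _, s in sample]
bdt_only = [b > cut_bdt for b, _, _ in sample]
both = [combined_tag(b, m, cut_bdt, cut_mlp) for b, m, _ in sample]

tpr_bdt, fpr_bdt = positive_rates(truth, bdt_only)
tpr_both, fpr_both = positive_rates(truth, both)
```

In this toy sample the agreement requirement lowers the false positive rate at the cost of some true positives, which is the trade-off described in the text.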
The BDT machine-learning algorithm is based on decision trees, a classifier structured on repeated yes/no decisions designed to separate “positive” and “negative” classes of events. Thereby, the phase space of the discriminant parameters is split into two different regions. The boosting algorithm, here AdaBoost (Freund & Schapire 1996), generates a forest of weak decision trees and combines them to provide a final strong decision. At each step, misclassified events are given an increasing weight. Then, the generation of the following tree is done with these weighted events allowing the tree to become specialised on these difficult cases. At the end of the boosting phase, new events can be processed by the forest of trees. All decisions are then combined to give a weighted response according to the specialisation of the trees. Preliminary tests, performed in order to assess the stability of BDT classifiers, have shown that similar performance was reached for a large range of BDT settings. Considering this we decided to use the same settings for both case A and case B studies, with values relatively close to those of the TMVA BDT default. Thus, a large forest of short trees (ntrees = 400, depth = 3) was generated with a learning rate of 0.2. The learning algorithm differs slightly from the original AdaBoost: before the generation of a decision tree, during the boosting phase, the events of the training samples are selected n times according to a given probability following a Poisson law of parameter 0.8 (UseBaggedBoost = true, BaggedSampleFraction = 0.8).
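The reweighting idea behind the boosting phase can be sketched in a few lines. This is a simplified, textbook discrete AdaBoost with a shrinkage factor, using depth-1 trees (stumps) instead of the depth-3 trees quoted above and omitting the bagged resampling; it is not the TMVA implementation, and all function names are hypothetical.

```python
import math


def stump_predict(stump, x):
    """A stump is (feature index, threshold, polarity); it returns +1 or -1."""
    feature, threshold, polarity = stump
    return polarity if x[feature] > threshold else -polarity


def best_stump(X, y, weights):
    """Exhaustive search for the stump with the lowest weighted error."""
    best, best_err = None, float("inf")
    for feature in range(len(X[0])):
        for threshold in sorted({x[feature] for x in X}):
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                err = sum(w for x, label, w in zip(X, y, weights)
                          if stump_predict(stump, x) != label)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err


def adaboost(X, y, n_trees=50, learning_rate=0.2):
    """Discrete AdaBoost on labels +1/-1; returns a list of (alpha, stump)."""
    weights = [1.0 / len(X)] * len(X)
    forest = []
    for _ in range(n_trees):
        stump, err = best_stump(X, y, weights)
        err = min(max(err, 1e-12), 1.0 - 1e-12)  # avoid log(0)
        alpha = learning_rate * 0.5 * math.log((1.0 - err) / err)
        forest.append((alpha, stump))
        # Misclassified events gain weight, correctly classified ones lose weight.
        weights = [w * math.exp(-alpha * label * stump_predict(stump, x))
                   for x, label, w in zip(X, y, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return forest


def forest_score(forest, x):
    """Weighted vote of all stumps, normalised to the range [-1, 1]."""
    norm = sum(alpha for alpha, _ in forest) or 1.0
    return sum(alpha * stump_predict(stump, x) for alpha, stump in forest) / norm


# Toy one-parameter sample (invented values): label -1 below 0.5, +1 above.
X = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
y = [-1, -1, -1, 1, 1, 1]
forest = adaboost(X, y, n_trees=10)
```

The score returned by `forest_score` plays the role of the classifier response ζ on which a cutoff is later applied.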
Neural network methods are based on artificial neurons. It is possible to linearly separate two populations of events by building a binary classifier with a single neuron. The latter is composed of as many inputs as there are discriminant parameters and one output describing the nature of the events. A weight is associated with each input. Inside the neuron, using the weights, a linear combination of the discriminant parameters is formed and then used as input for a transfer function which gives the output value of the neuron. The values of the weights that minimise the rate of misclassified events can then be found by using a feedback process with the training sample. Once this phase is finished, unknown events can be classified. To tackle more complex problems, with non-linear separations between classes of events, a possible solution is to use a multilayer perceptron neural network (Rumelhart et al. 1986). The latter is composed of at least one layer of neurons, called a hidden layer, located between the input layer (made of as many single-input neurons as there are discriminant parameters) and the output layer (made of a single neuron). Additionally, each neuron is allowed to have direct connections with only the neurons of the following layer. The same procedure used for a single neuron is followed to adjust the weights. As for the BDT classifiers, we found that similar performance can be reached with a large range of MLP settings. We decided to use the same settings for both case A and B studies. We set the MLP architecture to a single hidden layer composed of Nvar + 10 neurons and we used the back-propagation algorithm to find the minimum of the error function. Following the suggestion of Hoecker et al. (2007), the input variables were normalised between −1 and +1 for the neural network. Finally, as the positive and negative samples have different sizes, we normalised the events in order to have samples with identical sizes (NormMode = EqualNumEvents).
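The preprocessing and architecture described above can be sketched as follows. This is illustrative only: it assumes that Nvar is the number of discriminant parameters, uses tanh activations (the transfer function is not specified in the text), and implements class balancing as per-event weights, which is one possible reading of the equal-size normalisation. Training by back-propagation is not shown.

```python
import math
import random


def scale_to_unit_range(values):
    """Linearly map a list of values onto [-1, +1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


def equal_class_weights(labels):
    """Per-event weights such that both classes carry the same total weight."""
    n_pos = sum(1 for label in labels if label)
    n_neg = len(labels) - n_pos
    return [0.5 / n_pos if label else 0.5 / n_neg for label in labels]


def init_mlp(n_var, rng):
    """Random weights for one hidden layer of n_var + 10 neurons and one output."""
    n_hidden = n_var + 10
    hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_var + 1)]  # last entry: bias
              for _ in range(n_hidden)]
    output = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]  # last entry: bias
    return hidden, output


def mlp_forward(x, hidden, output):
    """Forward pass; tanh activations are an assumption of this sketch."""
    h = [math.tanh(sum(w * xi for w, xi in zip(ws[:-1], x)) + ws[-1])
         for ws in hidden]
    return math.tanh(sum(w * hj for w, hj in zip(output[:-1], h)) + output[-1])


rng = random.Random(1)
hidden, output = init_mlp(n_var=6, rng=rng)
response = mlp_forward([0.2, -0.4, 1.0, -1.0, 0.0, 0.5], hidden, output)
weights = equal_class_weights([True, True, True, False])
```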
4. Training of classifiers and performance evaluation
4.1. Splitting labelled sources into training and test samples
A standard practice to build and evaluate a classifier is to create training and test samples by randomly selecting, for example, 70% and 30% of each group of labelled events (here identified or associated sources). The training sample is used to build the classifier, and the test sample to determine the performance metrics. This random split is generally a good choice. However, in this work we had to deal with small data sets (sometimes composed of subsets of sources, e.g. the galactic source sample for the case A study with 134 pulsars, 34 SNR or PWN, and 15 other galactic sources). As shown by Brain & Webb (1999), this implies that the variance of classifiers corresponding to different randomly selected training subsamples is likely to be large, leading to performance which could be significantly mis-estimated.
To minimise such a mis-estimation, we characterised the average performance of the BDT and MLP classifiers (with respect to a large number of random splits of labelled sources into training and test samples), and selected a single split which provides a pair of classifiers with performance as close as possible to this average behaviour. To do that, we performed 100 iterations of the following sequence:
1. random split of the labelled samples into training (70%) and test (30%) samples;
2. training of the BDT and MLP methods using the same training sample;
3. performance evaluation for BDT and MLP using the same test sample.
The receiver operating characteristic (ROC) curves obtained for these 100 splittings are shown for case A (low and high galactic latitude) and B studies in Fig. 3. In each case, the training/test split which provides the performance closest to the average behaviour was selected (χ2 minimisation).
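The selection of a representative split can be sketched as below. This is a hedged illustration: the split is stratified by class label, each ROC curve is assumed to be sampled as true positive rates on a common grid of false positive rates, and the distance to the average is an unweighted sum of squared deviations, which stands in for the χ2 whose exact form is not given in the text.

```python
import random


def stratified_split(labels, train_fraction=0.7, rng=random):
    """Randomly split each class separately into training and test indices."""
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, label in enumerate(labels) if label == cls]
        rng.shuffle(idx)
        n_train = round(train_fraction * len(idx))
        train += idx[:n_train]
        test += idx[n_train:]
    return sorted(train), sorted(test)


def closest_to_average(curves):
    """Index of the curve with the smallest squared deviation from the mean curve.

    Each curve is a list of true positive rates sampled on a common grid of
    false positive rates (an assumption of this sketch).
    """
    n = len(curves)
    mean = [sum(c[k] for c in curves) / n for k in range(len(curves[0]))]
    dist = [sum((v - m) ** 2 for v, m in zip(c, mean)) for c in curves]
    return dist.index(min(dist))


# Toy labels and toy ROC curves (invented values).
labels = ["blazar"] * 10 + ["pulsar"] * 10
train_idx, test_idx = stratified_split(labels, rng=random.Random(0))

roc_curves = [[0.80, 0.90], [0.60, 0.70], [0.70, 0.80]]
selected = closest_to_average(roc_curves)
```

In a full procedure, one would call `stratified_split` 100 times, train both classifiers on each training sample, store the resulting test ROC curves, and keep the split returned by `closest_to_average`.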
Fig. 3 ROC curves corresponding to 100 random splittings of the samples of labelled sources (with no flag) used for classifier building. Performance for sources with no flag was estimated using the test samples (green curves). Specific performance for flagged sources was estimated using the samples of labelled sources with a flag (red curves). The left and right columns show respectively the results for the BDT and MLP classifiers. Results for case A high galactic latitude, case A low galactic latitude and case B are shown in rows a), b), and c), respectively.
Table 2 Performance summary for the classifications of cases A and B estimated on the test samples.
4.2. Cutoff determination on the training sample
The procedure explained above provided a pair of BDT and MLP classifiers for each study, along with a training/test split of the sample of labelled sources. To determine the optimal cutoff (ζ⋆) in the distribution of the score (ζ) generated by each classifier, we used a ten-fold cross validation method on the training sample, following the sequence:
1. splitting of the training sample in ten equal-size subsamples;
2. training of the BDT and MLP classifiers on nine subsamples and application on the remaining tenth;
3. iteration over the ten subsamples until all the subsamples were tested;
4. building of the BDT and MLP ROC curves on all the ten tested subsamples of the training sample;
5. determination of the BDT and MLP cutoff values (ζ⋆) according to the criterion defined below.
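The cross-validation sequence above can be sketched generically. In this minimal illustration the folds are drawn at random without stratification, and the "classifier" is a toy midpoint rule standing in for the BDT or MLP; the function names are hypothetical.

```python
import random


def k_fold_indices(n_events, k=10, seed=0):
    """Shuffle event indices and deal them into k folds of near-equal size."""
    idx = list(range(n_events))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def out_of_fold_scores(X, y, train, predict, k=10, seed=0):
    """Score every event with a model trained on the other k - 1 folds."""
    scores = [None] * len(X)
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(X)) if i not in held]
        model = train([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i in held_out:
            scores[i] = predict(model, X[i])
    return scores


# Toy stand-in for a classifier: cut at the midpoint between the class means.
def train_midpoint(X, y):
    pos = [x for x, label in zip(X, y) if label]
    neg = [x for x, label in zip(X, y) if not label]
    return 0.5 * (sum(pos) / len(pos) + sum(neg) / len(neg))


def predict_midpoint(model, x):
    return x - model


X = [i / 20 for i in range(20)]
y = [i >= 10 for i in range(20)]
folds = k_fold_indices(len(X))
scores = out_of_fold_scores(X, y, train_midpoint, predict_midpoint)
```

The resulting out-of-fold scores cover the whole training sample, so a single ROC curve per classifier can be built from them.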
A single criterion which ensures a low rate of false positives along with a relatively high rate of true positives was used for all the studies (case A and B). Our choice was to consider as cutoffs the values of ζ⋆ which provide, for each classifier (BDT and MLP), a false positive rate of 10%. Consequently, for case B, two different cutoff values were obtained for the search of BL Lacs or FSRQs among blazars (subsequently referred to as the BL Lacs against FSRQs or the FSRQs against BL Lacs studies, respectively). All the cutoffs are summarised in Table 2.
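A cutoff corresponding to a fixed false positive rate can be read off the score distribution of the negative class. The sketch below assumes that sources are accepted when their score is strictly above the cutoff and that higher scores are more signal-like; the scores are invented.

```python
def cutoff_for_fpr(background_scores, target_fpr=0.10):
    """Cutoff such that at most target_fpr of background scores lie strictly above it."""
    ordered = sorted(background_scores, reverse=True)
    # Number of background events allowed above the cutoff.
    n_allowed = min(int(target_fpr * len(ordered)), len(ordered) - 1)
    return ordered[n_allowed]


def fraction_above(scores, cutoff):
    """Fraction of scores strictly above the cutoff."""
    return sum(1 for s in scores if s > cutoff) / len(scores)


# Toy out-of-fold scores (invented values).
background = [i / 20 for i in range(20)]
signal = [0.6, 0.7, 0.8, 0.9, 0.95, 0.99, 0.88, 0.86, 0.5, 0.97]

cutoff = cutoff_for_fpr(background)
fpr = fraction_above(background, cutoff)
tpr = fraction_above(signal, cutoff)
```

With 20 background scores, two are allowed above the cutoff, so the cutoff is the third-largest background score; the true positive rate then follows from the signal scores above that value.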
4.3. Performance metrics
Fig. 4 ROC curves for classifiers used in case A high galactic latitude a), case A low galactic latitude b) and case B c). In each case the left column shows the ROC curves (green for BDT, blue for MLP) obtained with the training sample using the ten-fold cross-validation method, as explained in Sect. 4.2. The right column shows the ROC curves obtained when applying classifiers to the test sample (solid lines) or to the sample of labelled sources with a flag (dashed lines).
Table 3 Performance summary for the classifications of case A estimated on the flagged source samples.
Figure 4 shows the ROC curves for each study and each classifier, first determined using the corresponding training sample (with the ten-fold cross-validation method presented in Sect. 4.2) and then using the test sample. For the performance metrics evaluation, classifiers were applied to the test samples. We then used the cutoffs described above. Combining the outputs of the BDT and MLP classifiers, we obtained a true positive rate of 95.6% and a false positive rate of 7.3% for the blazars against pulsars study (case A, to be applied to high galactic latitude sources). For the blazars against galactic sources study (case A, to be applied to low galactic latitude sources) we obtained slightly lower performance, with a true positive rate of 87.1% and a false positive rate of 9.1%. This loss of classifier performance is exclusively due to the inclusion of all galactic sources (in addition to pulsars) to the initial training sample. For case B, similar performances were obtained for the BL Lacs against FSRQs and the FSRQs against BL Lacs studies, with true positive rates of 83.9% and 84.4% and false positive rates of 8.9% and 10.9%, respectively. All the true and false positive rates for the BDT and MLP classifiers (individual and combined) are summarised in Table 2.
As shown in Figs. 4a and b (also visible in Figs. 3a and b), the ROC curves obtained when applying the case A classifiers to a sample of flagged sources are significantly different from those obtained with the test sample. Consequently, considering the cutoff values obtained in Sect. 4.2, we used the samples of flagged sources to determine specific performance metrics for these categories of sources. Combining the outputs of the BDT and MLP classifiers, we obtained true and false positive rates of 88.9% and 18.8% respectively for the blazars against pulsars study, and true and false positive rates of 81.4% and 28.0% respectively for the blazars against galactic sources study. In both cases, as compared to the performance obtained on non-flagged sources, the true positive rates are slightly reduced (by ~7%) whereas the false positive rates increase significantly, by a factor of ~2.6–3. The performance metrics for flagged sources are summarised in Table 3.
5. Results
The results presented below were obtained by combining the BDT and MLP decisions for each study.
For the blazars against pulsars study (case A, to be applied to high galactic latitude sources), the classifiers were applied to 531 unassociated sources (422 not flagged, 109 flagged) with |b| > 10°. This results in 425 blazar candidates (345 not flagged, 80 flagged) with the number of false associations estimated at ~9.3 (4.8 not flagged and 4.5 flagged). For the blazars against galactic sources study (case A, to be applied to low galactic latitude sources), the classifiers were applied to 416 unassociated sources (169 not flagged, 247 flagged) with |b| ≤ 10°. This results in 72 blazar candidates among the 169 unassociated sources with no flag, with the number of false associations estimated to be approximately nine. In addition we obtained 98 blazar candidates among the 247 unassociated sources with a flag, but this sample is dominated by false associations, which are estimated to be ~54. Results are summarised in Table 4 and a short sample of sources is shown in Table 5.
Table 4 Summary of results obtained when applying the classifiers to the high and low galactic latitude unassociated sources.
For case B, the classifiers were applied to 903 blazar candidates (only sources with no flag were considered), 486 being labelled as BCU in the 3FGL catalogue and 417 being labelled as blazar candidates in our case A study. From this we obtained a list of 509 BL Lac candidates with an estimated number of ~29 false associations and a list of 295 FSRQ candidates with an estimated number of ~70 false associations, hence leaving 99 blazars with uncertain type. Details are given in Table 6 and a short sample of sources is shown in Table 7.
The lists of blazar candidates obtained in this work are available at https://unidgamma.in2p3.fr in FITS format.
6. Discussion and conclusions
The work presented here continues previous studies (Ackermann et al. 2012; Mirabal et al. 2012; Doert & Errando 2014; Hassan et al. 2013) which used machine-learning algorithms based on parameters from different Fermi/LAT catalogues (1FGL, 2FGL) to address the question of the nature (blazar or other) of unassociated sources or the nature (BL Lac or FSRQ) of blazars whose type is undetermined. The specificity of this work, beyond the fact that it deals with the recently published 3FGL catalogue, is that it shows how the performance of classifiers differs for flagged or non-flagged sources, and it provides for each list of candidates an estimation of the number of false associations.
This study provides a list of 497 blazar candidates, with an expected number of false associations ~18 (not including the 98 low galactic latitude flagged candidates, for which we expect a high number of false associations). This represents a substantial contribution to the knowledge of the γ-ray emitting blazars population, and complements the population of 1559 blazars in the 3LAC catalogue.
Table 5 Example illustrating the structure of tables provided in https://unidgamma.in2p3.fr as an output of the case A study.
Similarly to our case A study, Saz Parkinson et al. (2016) in a recently published paper tackle the question of the nature of the 3FGL unassociated sources. Their work is based on a combination of a random forest and a logistic regression method, trained using a set of nine discriminant parameters to separate samples of well identified blazars and pulsars. Among their selected parameters five have a corresponding parameter in our study with the same physical content, but in our case two were corrected to reduce the flux dependency of their separation power (the curvature significance and the variability index, both normalised by the detection significance following a prescription of Doert & Errando (2014)); their remaining parameters (HR12 and HR45) were discarded in our work as it is likely that they introduce biases in the performance of classifiers, especially when applied to low flux sources (see Sect. 2.2). In addition, we note that the effect on the classifier behaviour of flagged sources (present in their training and test sample or in the sample of unassociated sources) was not taken into account. Also, the sample of sources labelled as SNR or PWN was not used for their classifier training while this kind of source represents a non-negligible fraction of galactic sources. Applied to the set of unassociated sources, their classifiers give a list of 559 blazar candidates, with no indication of the expected number of false associations. Setting aside the sample of low galactic latitude sources (dominated by flagged sources, potentially including SNR and PWN), they have 481 sources with galactic latitude |b| > 10°, 444 (~90%) being also in our corresponding sample of 497 candidates. We note however that the difference (~10%) is much higher than our expected number of false associations, which is ~18.
Concerning the list of BL Lac and FSRQ candidates resulting from this work (case B), we note a clear dominance of BL Lacs, which represent ~63%. This is close to the BL Lac dominance already observed in the 3LAC catalogue (~59%)¹⁴. Putting together the lists of BL Lacs and FSRQs from the 3LAC catalogue and from this work, we obtain a sample of 1113 BL Lacs and 709 FSRQs. BL Lacs then represent ~61% of this combined sample.
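The combined fraction quoted above follows directly from the two counts. A minimal Python sketch of the arithmetic, in which the function name `class_fraction` is purely illustrative and not part of the tooling used in this work:

```python
def class_fraction(n_class, n_other):
    """Fraction of a two-class sample that belongs to the first class."""
    return n_class / (n_class + n_other)


# Counts quoted in the text: 3LAC sources plus the candidates from this work.
n_bll, n_fsrq = 1113, 709
bll_fraction = class_fraction(n_bll, n_fsrq)
print(f"BL Lac fraction: {100 * bll_fraction:.0f}%")
```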
Summary of results obtained when applying the classifiers to the BCUs and to the blazar candidates from case A.
In addition, an interesting comparison can be made between the population of BL Lacs and FSRQs of the 3LAC catalogue and the population of our BL Lac and FSRQ candidates in terms of the position of their synchrotron peak frequencies. For that, we considered only our candidates which were initially labelled as BCUs because only those have available information about the synchrotron peak frequencies in the 3FGL catalogue.
Results of the BL Lac/FSRQ classifications (case B) for the first ten BCUs with no flag.
Using the HSP, ISP and LSP definitions of Ackermann et al. (2015), corresponding respectively to high-synchrotron-peaked, intermediate-synchrotron-peaked and low-synchrotron-peaked, we note that our population of BL Lac candidates is dominated by HSP (46% HSP), as is the case for the BL Lacs in the 3LAC catalogue (43% HSP). Similarly, our population of FSRQ candidates and the FSRQs in the 3LAC catalogue are both dominated by LSP (78% and 88%, respectively).
The lists of BL Lac and FSRQ candidates resulting from this work can be compared to those recently obtained by Chiaro et al. (2016). Using a single classifier (MLP) built only from variability features to separate BL Lacs and FSRQs, they obtained interesting performance for BL Lacs, with true and false positive rates of ~84% and ~5% (compared to ~84% and ~9% in our case B study). For FSRQs they obtained true and false positive rates of ~69% and ~12% (~84% and ~11% in our study). Applied to the BCUs in the 3FGL catalogue, their classifier provides a list of 314 BL Lac and 113 FSRQ candidates. The comparison with our corresponding 295 BL Lac candidates shows good agreement, as ~91% of our candidates are also seen as BL Lacs by Chiaro et al. (2016), ~6% are still undetermined and only ~3% obtain an FSRQ label. This ~3% represents approximately nine sources, which is close to our expected number of false associations, ~13. A poorer agreement is found for the FSRQ candidates. Among our 146 FSRQ candidates, only 92 are seen as FSRQs by Chiaro et al. (2016), while 23 are seen as BL Lacs and 31 remain of undetermined type. Interestingly, considering the distribution of the normalised variability, which carries the information on temporal variability in our case B study, the 23 sources for which we do not find agreement with Chiaro et al. (2016) are located in a region corresponding to the overlap between BL Lacs and FSRQs. However, considering different combinations of our selected spectral parameters, these 23 sources clearly appear more likely to be FSRQs than BL Lacs. This illustrates the value of taking spectral parameters into account for BL Lac/FSRQ separation.
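The agreement figures in this comparison can be checked with a few lines of arithmetic. The sketch below is illustrative only: the helper `breakdown` is a hypothetical name, and the inputs are the counts and percentages quoted in the paragraph above.

```python
def breakdown(counts):
    """Convert a dict of label -> count into label -> percentage of the total."""
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


# FSRQ candidates of this work, split by the label given in Chiaro et al. (2016).
fsrq_counts = {"FSRQ": 92, "BL Lac": 23, "undetermined": 31}
fsrq_pct = breakdown(fsrq_counts)

# BL Lac candidates: ~3% of the 295 candidates obtain an FSRQ label elsewhere.
n_bll_disagree = 0.03 * 295

for label, pct in fsrq_pct.items():
    print(f"{label}: {pct:.0f}%")
print(f"BL Lac candidates labelled FSRQ elsewhere: ~{n_bll_disagree:.0f}")
```

The three FSRQ counts sum to the 146 candidates quoted in the text, so the breakdown is exhaustive.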
Finally, an interesting validation of the quality of our results is provided by a recent campaign of spectroscopic observations performed by Álvarez Crespo et al. (2016a and 2016b). They measured with different telescopes the optical spectra of 60 γ-ray blazar candidates selected on the basis of their IR colours or their low radio frequency spectra and belonging to different Fermi/LAT catalogues (principally BCUs or potential counterparts for unassociated sources). Their list contains five unassociated sources and 26 BCUs with no flag in the 3FGL catalogue. Our case B study found a high-confidence classification for 27 out of these 31 sources as BL Lacs or FSRQs, 25 of which are spectroscopically confirmed by Álvarez Crespo et al. (2016a and 2016b). We note that one of the two remaining sources is WISE J014935.28+860115.4, which shows an optical spectrum dominated by the host galaxy and is identified as BL Lac/galaxy by Álvarez Crespo et al. (2016a). The other is WISEA J122127.20-062847.8, and is not clearly established as the correct counterpart of the γ-ray source 3FGL J1221.5–0632 (Álvarez Crespo et al. 2016b; Massaro et al. 2013b).
This study contributes significantly to increasing and better constraining the sample of γ-ray blazars, based on the γ-ray detections performed by Fermi/LAT in four years of observation. We expect that it will trigger multiwavelength follow-ups to verify the proposed associations. Additionally, the blazar candidate samples might be of particular interest for contemporary very high energy γ-ray experiments using the imaging atmospheric Cherenkov technique such as H.E.S.S., MAGIC, and VERITAS, and later for the next generation of arrays currently under construction by the Cherenkov Telescope Array (CTA) Consortium. At present, population studies of very high energy blazars are indeed limited by the small number of detected sources (~70), which is strongly dominated by BL Lac objects.
The flags in the 3FGL catalogue indicate that a possible problem arose during the analysis of the γ-ray sources (Acero et al. 2015).
We use the definition of hardness ratio given in Ackermann et al. (2012) which is , where Φi is the integral flux in the energy band i and ⟨ Ei ⟩ is the mean energy of the band.
When a source is not detected in one of the five energy bands provided in the 3FGL catalogue, a 2σ upper limit is used by Acero et al. (2015) instead of a flux measurement, leading to a shift in the hardness ratio determination. This is the case in particular for the parameters HR12 and especially HR45, which have the largest fractions of upper limits.
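The shift caused by upper limits can be illustrated numerically. The following Python sketch is a hedged illustration rather than the computation used in this work: it assumes the common (hard − soft)/(hard + soft) form of the hardness ratio applied to the energy fluxes Φi⟨Ei⟩, and all numerical values are hypothetical, in arbitrary units.

```python
def hardness_ratio(flux_lo, mean_e_lo, flux_hi, mean_e_hi):
    """Hardness ratio between two bands from integral fluxes and mean energies.

    ASSUMPTION: (hard - soft) / (hard + soft) on the energy fluxes flux * <E>.
    """
    soft = flux_lo * mean_e_lo
    hard = flux_hi * mean_e_hi
    return (hard - soft) / (hard + soft)


# Hypothetical values in arbitrary units (not taken from the 3FGL catalogue).
flux_4, mean_e_4 = 2.0e-10, 5.0
true_flux_5, mean_e_5 = 1.0e-11, 30.0
upper_limit_5 = 4.0e-11  # stands in for a 2-sigma upper limit on the band-5 flux

hr_true = hardness_ratio(flux_4, mean_e_4, true_flux_5, mean_e_5)
hr_with_ul = hardness_ratio(flux_4, mean_e_4, upper_limit_5, mean_e_5)
print(f"HR45 with the true flux:   {hr_true:+.2f}")
print(f"HR45 with the upper limit: {hr_with_ul:+.2f}")
```

Under this assumed convention, replacing an undetected high-energy flux by its upper limit pushes the ratio towards harder values, which is the kind of bias that affects faint sources most.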
A ROC curve illustrates the performance of a classifier as its score threshold varies, representing the true positive rate against the false positive rate (Fawcett 2006).
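As a minimal sketch of this definition, the Python function below sweeps the score threshold over a set of classifier outputs and records the false and true positive rates at each step. The function name, scores and labels are hypothetical and serve only to illustrate the construction; this is not the TMVA implementation.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs.

    The threshold is lowered from the highest to the lowest score; a source is
    counted as positive when its score is at or above the threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing selected
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Hypothetical classifier scores; label 1 = signal class, 0 = background class.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
for fpr, tpr in curve:
    print(f"FPR = {fpr:.2f}, TPR = {tpr:.2f}")
```

Choosing a working point on such a curve amounts to fixing the score threshold, trading a higher true positive rate against a higher false positive rate.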
It is not straightforward to compare the list of candidates provided by studies based on different Fermi/LAT catalogues (1FGL, 2FGL, 3FGL), the number of unassociated sources and the number of blazars with undetermined type being significantly different from one catalogue to another. To keep track of previous works we will indicate in Table 5 those of our candidates that have been proposed elsewhere.
We also note the continuously increasing fraction of BL Lacs among the blazars of known type in the EGRET/Fermi-LAT energy range. From ~25% in the Third EGRET Catalog (Hartman et al. 1999), it has increased to ~50%, ~56% and ~59% in the 1LAC (Abdo et al. 2010a), 2LAC (Ackermann et al. 2011) and 3LAC (Ackermann et al. 2015) catalogues, respectively. Such an evolution probably reflects the sensitivity improvement, especially in the GeV domain, first when passing from EGRET to Fermi-LAT and second due to the evolution of the analysis methods used in the different Fermi-LAT AGN catalogues.
Acknowledgments
We thank Catherine Boisson (LUTH, Observatoire de Paris) and Arache Djannati-Ataï (APC, IN2P3/CNRS) for useful discussions at different levels of this work. We also thank the referee for useful comments and suggestions that led to improvements in the manuscript. We acknowledge the financial support of the APC laboratory. This study used TMVA¹⁵ (Hoecker et al. 2007), an open-source toolkit for multivariate data analysis. STILTS¹⁶ (Taylor 2006) was used to manipulate tabular data and to cross-match catalogues.
References
- Abdo, A. A., Ackermann, M., Ajello, M., et al. 2010a, ApJ, 715, 429
- Abdo, A. A., Ackermann, M., Ajello, M., et al. 2010b, ApJS, 188, 405
- Acero, F., Donato, D., Ojha, R., et al. 2013, ApJ, 779, 133
- Acero, F., Ackermann, M., Ajello, M., et al. 2015, ApJS, 218, 23
- Ackermann, M., Ajello, M., Allafort, A., et al. 2011, ApJ, 743, 171
- Ackermann, M., Ajello, M., Allafort, A., et al. 2012, ApJ, 753, 83
- Ackermann, M., Ajello, M., Atwood, W. B., et al. 2015, ApJ, 810, 14
- Álvarez Crespo, N., Masetti, N., Ricci, F., et al. 2016a, AJ, 151, 32
- Álvarez Crespo, N., Massaro, F., Milisavljevic, D., et al. 2016b, AJ, 151, 95
- Brain, D., & Webb, J. I. 1999, in Proc. 4th Australian Knowledge Acquisition Workshop, Sydney, NSW, 117
- Chiaro, G., Salvetti, D., La Mura, G., et al. 2016, MNRAS, 462, 3180
- D’Abrusco, R., Massaro, F., Paggi, A., et al. 2014, ApJS, 215, 14
- Doert, M., & Errando, M. 2014, ApJ, 782, 41
- Fawcett, T. 2006, Pattern Recogn. Lett., 27, 861
- Ferrara, E. C., Ojha, R., Monzani, M. E., & Omodei, N. 2012, in 2012 Fermi & Jansky Proc. - eConf C1111101
- Freund, Y., & Schapire, R. E. 1996, in Machine Learning, Proc. 13th Conf., 148
- Hartman, R. C., Bertsch, D. L., Bloom, S. D., et al. 1999, ApJS, 123, 79
- Hassan, T., Mirabal, N., Contreras, J. L., & Oya, I. 2013, MNRAS, 428, 220
- Hoecker, A., Speckmayer, P., Stelzer, J., et al. 2007, ArXiv e-prints [arXiv:physics/0703039]
- Massaro, F., D’Abrusco, R., Ajello, M., Grindlay, J. E., & Smith, H. A. 2011, ApJ, 740, L48
- Massaro, F., D’Abrusco, R., Tosti, G., et al. 2012a, ApJ, 750, 138
- Massaro, F., D’Abrusco, R., Tosti, G., et al. 2012b, ApJ, 752, 61
- Massaro, F., D’Abrusco, R., Giroletti, M., et al. 2013a, ApJS, 207, 4
- Massaro, F., D’Abrusco, R., Paggi, A., et al. 2013b, ApJS, 206, 13
- Massaro, F., D’Abrusco, R., Paggi, A., et al. 2013c, ApJS, 209, 10
- Mirabal, N., Frías-Martinez, V., Hassan, T., & Frías-Martinez, E. 2012, MNRAS, 424
- Nolan, P. L., Abdo, A. A., Ackermann, M., et al. 2012, ApJS, 199, 31
- Paggi, A., Massaro, F., D’Abrusco, R., et al. 2013, ApJS, 209, 9
- Rumelhart, D. E., Hinton, G. E., & Williams, R. J. 1986, Nature, 323, 533
- Saz Parkinson, P. M., Xu, H., Yu, P. L. H., et al. 2016, ApJ, 820, 8
- Sol, H., Zech, A., Boisson, C., et al. 2013, Astropart. Phys., 43, 215
- Taylor, M. B. 2006, in Astronomical Data Analysis Software and Systems XV, eds. C. Gabriel, C. Arviset, D. Ponz, & S. Enrique, ASP Conf. Ser., 351, 666
- Wright, E. L., Eisenhardt, P. R. M., Mainzer, A. K., et al. 2010, AJ, 140, 1868
All Figures
Fig. 1 Scatter plots for selected pairs of discriminant parameters. Blazars are represented with blue circles, sources belonging to our Galaxy (except pulsars) with downward-pointing red triangles, pulsars with upward-pointing green triangles and unassociated sources with black dots.
Fig. 2 Scatter plots for selected pairs of discriminant parameters. BL Lacs are represented with blue squares, FSRQs with red circles and blazars of uncertain type with black dots.
Fig. 3 ROC curves corresponding to 100 random splittings of the samples of labelled sources (with no flag) used for classifier building. Performance for sources with no flag was estimated using the test samples (green curves). Specific performance for flagged sources was estimated using the samples of labelled sources with a flag (red curves). The left and right columns show the results for the BDT and MLP classifiers, respectively. Results for case A high galactic latitude, case A low galactic latitude and case B are shown in rows a), b), and c), respectively.
Fig. 4 ROC curves for classifiers used in case A high galactic latitude a), case A low galactic latitude b) and case B c). In each case the left column shows the ROC curves (green for BDT, blue for MLP) obtained with the training sample using the ten-fold cross-validation method, as explained in Sect. 4.2. The right column shows the ROC curves obtained when applying classifiers to the test sample (solid lines) or to the sample of labelled sources with flag (dashed lines).