A&A 431, 1177-1187 (2005)
B. Vollmer1,2 - E. Davoust3 - P. Dubois1 - F. Genova1 - F. Ochsenbein1 - W. van Driel4
1 - CDS, Observatoire astronomique de Strasbourg, UMR 7550, 11 rue de l'Université, 67000 Strasbourg, France
2 - Max-Planck-Institut für Radioastronomie, Auf dem Hügel 69, 53121 Bonn, Germany
3 - UMR 5572, Observatoire Midi-Pyrénées, 14 avenue E. Belin, 31400 Toulouse, France
4 - Observatoire de Paris, Section de Meudon, GEPI, CNRS UMR 8111 and Université Paris 7, 5 place Jules Janssen, 92195 Meudon Cedex, France
Received 30 March 2004 / Accepted 19 October 2004
A new tool to extract cross-identifications and radio continuum spectra from radio catalogues contained in the VIZIER database of the CDS is presented. The code can handle radio surveys at different frequencies with different resolutions. It has been applied to 22 survey catalogues at 11 different frequencies containing a total of 3.5 million sources, which resulted in over 700 000 independent radio cross-identifications and 67 000 independent radio spectra with more than two frequency points. A validation of the code has been performed using independent radio cross-correlations from the literature. The mean error of the determined spectral index is 0.3. The code produces an output of variable format that can easily be adapted to the purpose of the user.
Key words: astronomical data bases: miscellaneous - radio continuum: general
The total number of records in radio-source catalogues has increased dramatically in the last two decades. Three major increases are due to R. Dixon's "Master Source List" in the seventies (for the first version see Dixon 1970; for an error list see Andernach 1989), to the 87GB (Gregory & Condon 1991), GB6 (Gregory et al. 1996) and PMN (Wright et al. 1994, 1996; Griffith et al. 1994, 1995) surveys in 1991, and to the release of NVSS (Condon et al. 1998), FIRST (White et al. 1998) and WENSS (Rengelink et al. 1997) in 1996/1997 (see also Andernach 1999). To study the nature of these celestial objects detected at radio frequencies and to take full advantage of the huge amount of information contained in the catalogues, one has to know their spectral energy distribution (SED) over as large a frequency range as available. Since each catalogue (except Dixon's master source list) contains information at a single frequency, cross-identification between different catalogues is essential for the study of these sources.
Radio source cross-identifications in the centimeter to meter wavelength domain are particularly difficult to obtain, because the underlying radio surveys can have huge differences in sensitivity and/or spatial resolution. Since the resolution depends on the observed frequency and the telescope diameter, low frequency surveys made with a small single-dish telescope can have resolutions of up to tens of arcmins, while high-frequency observations with a large single dish telescope or an interferometer can have resolutions of a few arcsecs to tens of arcsecs.
On the other hand, the cross-identification of radio sources at different frequencies is made easier by using the fact that, in the vast majority of sources, the SED has a power-law distribution. The radiation mechanism is either synchrotron emission from relativistic electrons gyrating in a magnetic field, or emission of hot thermal electrons. Synchrotron emission produces a power law spectrum with a possible cut-off or reversal of the spectral index at low frequencies due to self-absorption or comptonization. The spectrum of thermal electrons is flat, at least in the optically thin domain. Over the frequency range in which the majority of radio surveys were made, the spectra are thus well defined by a power law, i.e. as a straight line in the (flux density)- (frequency) plot commonly used in radio continuum astronomy.
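As an illustration of the power-law description, the following minimal Python sketch computes a two-point spectral index and uses it to predict a flux density at another frequency. It assumes the sign convention $S_\nu \propto \nu^{\alpha}$ (so that steep synchrotron spectra have negative indices); the function names `spectral_index` and `flux_at` and the example values are hypothetical and are not taken from SPECFIND.

```python
import math

def spectral_index(s1, nu1, s2, nu2):
    """Two-point spectral index alpha, assuming S is proportional to nu**alpha."""
    return math.log(s1 / s2) / math.log(nu1 / nu2)

def flux_at(nu, s_ref, nu_ref, alpha):
    """Flux density at frequency nu for a pure power law through (nu_ref, s_ref)."""
    return s_ref * (nu / nu_ref) ** alpha

# Hypothetical source: 100 mJy at 325 MHz and 30 mJy at 1400 MHz.
alpha = spectral_index(100.0, 325.0, 30.0, 1400.0)
```

In the log-log plane, `alpha` is simply the slope of the straight line joining the two measurements.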
The cross-identification procedure assigns radio sources at different frequencies to one physical object. In this way accurate radio positions can be determined for these objects using the high frequency observations, their radiation processes can be studied, and a search for specific objects (data mining) becomes possible.
The VIZIER database (Ochsenbein et al. 2000) at the Centre de Données astronomiques de Strasbourg (CDS) contains approximately 500 catalogues with radio data. Of these 500, about 70 catalogues are from systematic surveys (for a list of the major surveys see Andernach 1999). Using VIZIER, only a cone search (where a central position and a radius are used) is possible on all, or on a subset of, these catalogues. This procedure gives a list of all radio sources within the search region without any cross-identification.
Cross-identifications of radio sources within SIMBAD (Wenger et al. 2000) are essentially made on bibliographic grounds. Sources are only merged when a newly published radio catalogue gives alternate names, which were already known to SIMBAD. Thus there is a clear need to establish links between the radio catalogues (cross-identifications) and to include the sources in SIMBAD.
There are two main kinds of radio surveys: (i) single dish; and (ii) interferometric. Since interferometers cannot detect extended structures larger than the size corresponding to the angular resolution of the shortest baselines (ten/tens of arcmin for compact configurations), only structures of this size, or smaller, can be detected in sources of large extent. This makes the identification and unique spectral index determination complex for very extended sources, if they were observed with both single-dish and interferometer telescopes.
In addition, the observations can be divided into two subclasses: (i) systematic surveys; and (ii) surveys of a given source sample (e.g. SN remnants, H II regions, etc.). While the first category is more suitable for integration into VIZIER, the second one is easier to feed into SIMBAD. Out of the 500 catalogues listed in VIZIER when choosing "radio", about 220 contain independent observations. Of these, about 80 represent systematic surveys and about 140 surveys of source samples.
The characteristics of radio surveys that have to be taken into account when searching for cross-identifications are
The catalogues can have different formats. For a given source one may find positions, position errors, a name, the peak and integrated flux densities and their errors, the major/minor axis, the position angle and various flags (confusion, extended source, warnings, etc.). The use of unified content descriptors (UCDs), which is the classification scheme in which all the astronomical parameters accessible in VIZIER are stored, will make the data access much easier in the future.
There are three categories of cross-identification between radio sources:
This paper is structured in the following way: Sect. 2 describes the method used to make the catalogue tables uniform, which is required because the catalogue table entries often differ. The code algorithm is discussed in Sect. 3, followed by a discussion on the detailed code structure (Sect. 4). The code performance is presented in Sect. 5 and the results are shown in Sect. 6. The code is validated by the comparison of our spectral indices with independent estimates from the literature (Sect. 7). The summary and conclusions are given in Sect. 8.
The main difficulty in treating simultaneously different radio catalogue tables lies in the variety of their table structures. All tables have entries for the name, coordinates, and the flux density of the sources. The name and coordinates can be in different epochs (B1950 or J2000). Other possible entries are:
SPECFIND uses its own standard for the catalogue entries. All coordinates are in J2000 and the source names are chosen to be in accordance with the SIMBAD nomenclature. Table 1 shows the list of SPECFIND catalogue entries.
Table 1: SPECFIND catalogue standard.
The integrated flux densities are taken directly from the radio catalogues, except for the GB6, 87GB and MIYUN catalogues, which give only peak flux densities. In the GB6 and 87GB catalogues we take the peak flux density as the integrated flux density for sources smaller than 1.1 times the beamsize (3.5 arcmin), and calculate the integrated flux density as
Table 2: Definitions of the flux density error when not taken directly from the catalogues.
The flags are based on those taken from the catalogues. For the moment, however, SPECFIND does not make use of the different flags.
SPECFIND is a hierarchical code. It classifies a source j as parent, sibling or child with respect to a given source i at different stages, where stages 2 and 3 are refinements of stage 1.
While the parents and children at the same frequency are not used for the moment, the parents and children at different frequencies are taken together with the siblings to determine the radio spectrum.
The proximity search is done using the treecode-method routines written by J. Barnes (see e.g. Barnes & Hut 1986). Since this code is well documented, we will describe neither the method nor the routines. We adapted these routines to spherical geometry.
The angular distance between two sources, which are located at
is calculated in spherical geometry:
Since the treecode works in plane geometry, the polar caps ( ) have to be treated separately. The rest of the sky is divided into equal RA slices, which ensures an approximately equal number of sources per slice. For the next neighbour search within each slice, sources within a somewhat larger field than the slice are used. The RA offset between this field and the slice is taken to be three times the largest source extent of all catalogues.
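The slicing in right ascension with an overlapping search field can be sketched as follows. This is a minimal Python illustration, not the SPECFIND routine (which is written in C): the function names are hypothetical, the default of 12 slices is the value used later for the full data set, and the margin is left as a caller-supplied RA offset in degrees.

```python
def slice_index(ra_deg, n_slices=12):
    """Index of the equal-width RA slice that contains ra_deg."""
    width = 360.0 / n_slices
    return int((ra_deg % 360.0) // width)

def in_search_field(ra_deg, k, margin_deg, n_slices=12):
    """True if ra_deg lies in slice k widened by margin_deg on both sides."""
    width = 360.0 / n_slices
    centre = (k + 0.5) * width
    # Smallest RA difference, taking the wrap at 0/360 deg into account.
    d = abs((ra_deg - centre + 180.0) % 360.0 - 180.0)
    return d <= 0.5 * width + margin_deg
```

With such a widened field, a source close to a slice boundary still sees its neighbours on the other side of the boundary, including across RA = 0.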
Since the largest extent of all sources is , the maximum overlap is . At a declination of , a linear separation in RA of corresponds to an angular separation of . Proximity searches within both polar regions were also performed without using the treecode ( calculations), which showed no change of the result with respect to the treecode.
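The angular separation entering the proximity search can be evaluated, for instance, with the haversine form of the great-circle distance, which remains numerically stable for the small separations of interest. The Python sketch below is an illustration only; the function name `angular_separation` is hypothetical and the haversine form is our choice for the example, not necessarily the expression used in SPECFIND.

```python
import math

def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine form); inputs in degrees."""
    a1, d1, a2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((d2 - d1) / 2) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((a2 - a1) / 2) ** 2)
    # Clamp against rounding before taking the arcsine.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))
```

For example, an RA offset of 1 deg corresponds to a separation of 1 deg on the equator but only about 0.5 deg at a declination of 60 deg, which is why RA intervals near the poles need separate treatment.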
To check if source j is located within the ellipsoidal source extent of source i, the ARC projection is used for all catalogues. This projection, which preserves angular distances, is used for Schmidt plates (to first order) and in mapping with single-dish radio telescopes. Although the SIN projection is applied for interferometric maps, we used the ARC projection for these as well, since (i) the majority of our catalogues are single-dish measurements; (ii) the largest source extents are found in single-dish observations because of their larger beamsize; and (iii) the ARC and SIN projections are similar. An additional search is included for sources that are located around RA = 00h 00m 00s.
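A test of whether one source falls within the elliptical extent of another can be sketched with the ARC (zenithal equidistant) projection as follows. This Python illustration rests on assumptions that are not stated in the text: the major and minor axes are treated as full widths, the position angle is measured from north through east, and the function names are hypothetical. The expression is not meant for nearly antipodal positions.

```python
import math

def arc_project(ra, dec, ra0, dec0):
    """ARC-projected offsets (east, north) in degrees of (ra, dec) from (ra0, dec0)."""
    a, d, a0, d0 = map(math.radians, (ra, dec, ra0, dec0))
    da = a - a0
    cos_t = math.sin(d) * math.sin(d0) + math.cos(d) * math.cos(d0) * math.cos(da)
    theta = math.acos(max(-1.0, min(1.0, cos_t)))  # angular distance from centre
    # ARC keeps the radial distance equal to theta; guard the theta -> 0 limit.
    scale = theta / math.sin(theta) if theta > 1e-12 else 1.0
    x = scale * math.cos(d) * math.sin(da)
    y = scale * (math.sin(d) * math.cos(d0) - math.cos(d) * math.sin(d0) * math.cos(da))
    return math.degrees(x), math.degrees(y)

def inside_extent(ra, dec, ra0, dec0, major, minor, pa_deg):
    """True if (ra, dec) lies inside the ellipse of the source centred at (ra0, dec0)."""
    x, y = arc_project(ra, dec, ra0, dec0)
    pa = math.radians(pa_deg)
    u = x * math.sin(pa) + y * math.cos(pa)   # offset along the major axis
    v = x * math.cos(pa) - y * math.sin(pa)   # offset along the minor axis
    return (u / (0.5 * major)) ** 2 + (v / (0.5 * minor)) ** 2 <= 1.0
```

Because the offsets are computed from sines and cosines of the RA difference, this form handles pairs that straddle RA = 0 without a special case.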
This routine takes into account the flux densities of the sources at the same frequency. Since the sources are frequently point-like, i.e. their size fitted from the map equals the beamsize/resolution of the antenna used (unresolved sources), a source observed with a given beamsize is classified as a parent of another source observed with a smaller beamsize at the same frequency. Consequently, a source observed with a given beamsize is classified as a child of another source observed with a larger beamsize at the same frequency. A source i can have only one parent but several children from a different survey at the same frequency. Some or all children may be the same physical object - this can be verified with the help of the flux densities measured at the same frequency. The routine makes the following classification: a parent has a larger flux density and a larger extent/resolution, a sibling has an equal flux density within the errors and an equal extent/resolution within 25%, a child has a smaller flux density and a smaller extent/resolution.
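The same-frequency classification described above can be sketched in Python as follows. The function name `classify` is hypothetical, and two details are assumptions made for this example only: flux densities count as equal when they differ by no more than the sum of the two errors, and the 25% tolerance on the extent is taken relative to the larger of the two sizes.

```python
def classify(flux_i, err_i, size_i, flux_j, err_j, size_j, size_tol=0.25):
    """Relation of source j to source i at the same frequency.

    Returns 'parent', 'sibling', 'child', or None if no rule applies.
    """
    same_flux = abs(flux_j - flux_i) <= err_i + err_j
    same_size = abs(size_j - size_i) <= size_tol * max(size_i, size_j)
    if same_flux and same_size:
        return "sibling"
    if flux_j > flux_i and size_j > size_i:
        return "parent"   # larger flux density and larger extent/resolution
    if flux_j < flux_i and size_j < size_i:
        return "child"    # smaller flux density and smaller extent/resolution
    return None
```

A pair such as a faint compact source next to a bright, even more compact one matches none of the three rules and is left unclassified in this sketch.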
Once the sources are joined in the way described in Sect. 4.2, the family dependences are verified, i.e. for a given source cross-checks are performed. These checks are performed for all source entries;
This is the most important routine and thus the heart of SPECFIND. It uses the method of least absolute deviation to make a linear fit in the $\log \nu$-$\log S_\nu$ plane, where $\nu$ is the frequency and $S_\nu$ is the flux density at frequency $\nu$. This method is more robust against outlying points in a spectrum than a standard least-squares ($\chi^2$) fit (see Press et al. 2002).
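The robustness of a least-absolute-deviation fit can be illustrated with a short Python sketch. It is not the routine from Press et al. (2002) nor the SPECFIND implementation: it is unweighted and finds the optimum by exhaustive search over point pairs, relying on the fact that a least-absolute-deviation line can always be chosen to pass through two data points. This is practical here because a radio spectrum contains only a handful of frequencies. The function names and the test data are hypothetical.

```python
import math
from itertools import combinations

def lad_line(x, y):
    """Least-absolute-deviation line y = a + b*x by search over all point pairs."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # vertical pair, no finite slope
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        cost = sum(abs(yk - (a + b * xk)) for xk, yk in zip(x, y))
        if best is None or cost < best[0]:
            best = (cost, a, b)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]

def fit_spectrum(freqs_mhz, fluxes_mjy):
    """Return (spectral index, flux density extrapolated to 1 MHz) from a log-log fit."""
    x = [math.log10(f) for f in freqs_mhz]
    y = [math.log10(s) for s in fluxes_mjy]
    a, b = lad_line(x, y)
    return b, 10 ** a

# Synthetic power law with index -0.8 and one point that is too bright by a factor 3.
freqs = [74.0, 325.0, 408.0, 1400.0, 4850.0]
fluxes = [1000.0 * (f / 1000.0) ** -0.8 for f in freqs]
fluxes[2] *= 3.0
alpha, _ = fit_spectrum(freqs, fluxes)
```

In this synthetic case the fitted index stays at the input value of -0.8 despite the outlier, whereas a least-squares fit would be pulled towards the discrepant point.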
For this algorithm, the best way to find a maximum number of spectra without too high a risk of misidentifications is to set all flux density errors that are smaller than 20% of the flux density to this 20% value, and to multiply the flux density errors by a factor of 1.5. In this way all catalogues have approximately the same relative error. Moreover, these relatively large errors can compensate for a moderate flattening of the spectral index at low frequencies due to the synchrotron turnover or at high frequencies due to an increasing fraction of thermal emission.
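The error adjustment can be written in one line. The Python sketch below assumes that the factor 1.5 is applied to every error after the 20% floor has been imposed; the text leaves the order open, so this reading and the function name `adjusted_error` are assumptions.

```python
def adjusted_error(flux, err, floor=0.20, scale=1.5):
    """Flux density error after imposing a relative floor and a global scale factor."""
    return scale * max(err, floor * flux)
```

Under this reading, a 100 mJy source with a catalogued error of 5 mJy is assigned an error of 30 mJy, while a catalogued error of 40 mJy becomes 60 mJy.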
The structure of the spectrum-finding algorithm is the following: for a given set of sources for which all family relations were determined, their flux measurements at different frequencies are grouped together into an array and sorted by frequency. If the number of different frequencies is greater than two, the spectrum-finding algorithm passes through the following steps:
In order to investigate the efficiency of the spectrum-finding algorithm, we inspected by eye the data of the sources for which no spectrum could be found. We optimised the code to find a maximum number of radio spectra with a relatively small number of misidentifications (see also Sect. 7).
In order to avoid using points that are too close to each other in frequency, and therefore not independent, the frequency intervals between the different points of the spectrum are checked. The routine calculates the frequency intervals and determines the ratio between the second largest and the largest frequency interval. If this ratio is smaller than 0.02, the spectrum is rejected.
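The frequency-spacing test can be sketched as follows in Python. The sketch assumes that the intervals are differences in linear frequency (MHz) between neighbouring points; the text does not specify whether linear or logarithmic intervals are used, and the function name `spacing_ok` is hypothetical.

```python
def spacing_ok(freqs_mhz, min_ratio=0.02):
    """True if the second largest frequency gap is at least min_ratio of the largest."""
    f = sorted(set(freqs_mhz))
    if len(f) < 3:
        return False  # fewer than two gaps: no spectrum with three independent points
    gaps = sorted((b - a for a, b in zip(f, f[1:])), reverse=True)
    return gaps[1] / gaps[0] >= min_ratio
```

For example, the set 325, 327 and 1400 MHz is rejected (gap ratio of about 0.002), whereas 325, 1400 and 4850 MHz passes (ratio of about 0.31).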
Then, in order to avoid ambiguous radio sources of a given frequency, which are attributed to two distinct physical objects, the "center of mass" coordinates are calculated for both objects, where the inverse of the survey resolution is used for the "mass". The ambiguous source is then attributed solely to the object whose "center of mass" position is nearest to the source position.
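The attribution of an ambiguous source can be sketched in Python as follows. The sketch uses a flat-sky approximation (a weighted mean of RA and Dec, with RA offsets scaled by the cosine of the declination) and ignores the wrap at RA = 0, both of which are simplifications made for this example; the function names and the coordinates are hypothetical.

```python
import math

def centroid(members):
    """Weighted mean position; members are (ra_deg, dec_deg, resolution) tuples.

    The weight ("mass") of each member is the inverse of its survey resolution.
    """
    w = [1.0 / res for _, _, res in members]
    wsum = sum(w)
    ra = sum(wi * m[0] for wi, m in zip(w, members)) / wsum
    dec = sum(wi * m[1] for wi, m in zip(w, members)) / wsum
    return ra, dec

def nearest_object(src, objects):
    """Index of the object whose centroid lies nearest to src = (ra_deg, dec_deg)."""
    def dist(c):
        dra = (src[0] - c[0]) * math.cos(math.radians(src[1]))
        return math.hypot(dra, src[1] - c[1])
    return min(range(len(objects)), key=lambda k: dist(centroid(objects[k])))

# Two candidate objects, each with two members (resolutions in arcsec).
obj_a = [(10.000, 20.000, 45.0), (10.002, 20.001, 5.0)]
obj_b = [(10.020, 20.010, 45.0), (10.018, 20.011, 54.0)]
owner = nearest_object((10.004, 20.002), [obj_a, obj_b])
```

Weighting by the inverse resolution lets the high-resolution measurements, which have the most accurate positions, dominate the centroid.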
This routine ensures that if a source j fits the spectrum determined for source i (where source i is included), then source i also appears in the spectrum of source j. In this way it is ensured that a radio source belongs to only one single physical object.
In practice the spectrum-finding algorithm is too efficient. Through chance alignments, sources that were observed with a large beamsize or which have a large extent are sometimes identified as physically belonging together (via the radio spectrum). Thus it happens that, while a source j fits the spectrum determined for source i, source i is not included in the spectrum of source j. There may be several reasons for this: (i) the spectrum of source i is erroneous, (ii) the spectrum of source j is erroneous, (iii) the spectral index varies with frequency, (iv) the real errors on the flux densities of one of the sources are larger than . The task to make all spectra consistent is quite complicated, because all sources are interconnected via their siblings, which are interconnected via their own siblings, etc.
In order to decide which spectrum to take, in case of inconsistency between two spectra, the following scheme is applied:
let $\alpha_i$ and $\alpha_j$ be the spectral indices of sources i and j.
In a final step the siblings of a given source are compared to the siblings of all its siblings. If there are fewer than three common siblings, both spectral indices are non-zero, and the numbers of siblings are different, the spectrum of the source with the smallest number of siblings is removed (i.e. the siblings at a different frequency than the source are removed). If this is not the case, both sources are complemented with the missing siblings.
At the end of the data processing for one subfield (RA slices and polar caps), the results are written in an ASCII file. Since all necessary information for all sources is stored in the code, the output format can be chosen freely and adapted to the user's purpose. For the moment we create two principal outputs: (i) a file with the information necessary to plot spectra; and (ii) a file that can be used as input for SIMBAD.
On a PC with 512 MB RAM and a frequency of 1.4 GHz, the whole data processing of all 3.5 million sources can be performed in less than 3 h, less than one quarter of which is needed to read the input files and to write the final ASCII files. The proximity search in one single slice is performed in 5-10 min. This relatively long time is due to the spherical geometry.
We have based our selection of radio source catalogues on a list of major surveys of discrete radio sources (see Table 1 of Andernach 1999). For the moment we have included the 22 largest of the 66 cited radio catalogues (Table 3). These catalogues were downloaded from the VIZIER database.
Table 3: SPECFIND catalogue entries.
The 3C and 3CR catalogues were added for historical reasons. The GB6 and 87GB catalogues are not independent. In fact 87GB is based on a subset of the data used for GB6, which is more sensitive. The total number of sources is 3 488 352. The NVSS catalogue provides 50% of these entries. For this number of sources the sky is divided into 12 RA slices and the 2 polar caps ( ).
After completion of the data processing, SPECFIND found 757 894 independent associations, i.e. sources with at least one parent, sibling, or child. The number of independent spectra is 66 866, of which more than 90% include an NVSS point. SPECFIND identified 5 objects with a spectral index . Only pulsars and relic sources in galaxy clusters show such steep spectra. The inspection of these objects by eye showed that the objects, including a FIRST source, are extended (jet/lobe structure) and/or confused. The ensemble of 66 866 independent spectra forms the basis for our further analysis. The percentage of sources for which spectra could be identified by SPECFIND is listed in Col. 7 of Table 3.
The highest percentage of sources with detected spectra is 72% of all sources for the JVAS and B2 surveys. This is expected for the JVAS, because it is a survey of sources selected on the basis of their flat spectral index. The large surveys produce fewer spectra, because the other surveys do not cover the same region of the sky as deeply as the former. In the SUMSS survey only 1.8% of the sources have spectra, because there is a lack of deep enough radio surveys in the southern hemisphere.
The distribution of the 66 866 spectral indices is shown in Fig. 1. The distribution peaks at a spectral index of about -0.9, which is consistent with previous works (Vigotti et al. 1989; Kulkarni et al. 1990; Zhang et al. 2003). There is a wing towards positive spectral indices, which is most probably caused by sources with a flat spectrum due to thermal electrons.
The number of sources as a function of the number of independent frequency points in the radio spectra for independent sources is shown in Fig. 2. About $10^4$ sources have 5 points and about 50 sources have 8 points. The number of sources decreases approximately exponentially between 3 and 6 points and falls off more rapidly for an even larger number of points.
The spectral index as a function of the flux density at 325 MHz (49 cm; WENSS) is shown in Fig. 3. As already seen in Fig. 1, most of the spectral indices have values around -0.9. Since the NVSS and WENSS surveys are by far the deepest surveys that cover the largest area on the sky (the whole northern hemisphere), the largest number of associations is found for sources in these two catalogues. Zhang et al. (2003) have correlated them and found 185 800 corresponding sources, which represents the maximum number of sources for which a spectrum can be found. We found only half of them, because we require at least 3 points for the spectrum.
The straight, almost horizontal edge of the distribution in the left part of the plot (marked as (a) in Fig. 3) is due to a selection effect. For these low flux density sources with a steep spectrum, SPECFIND found a source at 20 cm (NVSS) and 50 cm (WENSS), but none at 6 cm (GB6, BWE, 87GB, MITG, PMN), where the sensitivity of the surveys (20 mJy) is insufficient. At higher flux densities ($S_{325} > 300$ mJy) sources from other catalogues with lower sensitivities are found. Most of the sources have spectral indices < -0.6 (Fig. 1). This translates into a limiting sensitivity of 100 mJy at 325 MHz. The number of WENSS sources with flux densities that exceed this value is 68 732, close to the number of identified radio spectra. The vertical edge in the lower left part of the plot (marked as (b) in Fig. 3) is mainly due to the limiting flux density of the B3 survey.
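The limiting sensitivity quoted above follows from extrapolating the 6 cm survey limit along a power law. The short Python check below assumes a frequency of 4850 MHz for the 6 cm surveys and the convention $S_\nu \propto \nu^{\alpha}$; the function name `extrapolate` is hypothetical.

```python
def extrapolate(s_ref_mjy, nu_ref_mhz, nu_mhz, alpha):
    """Flux density at nu_mhz for a power law through (nu_ref_mhz, s_ref_mjy)."""
    return s_ref_mjy * (nu_mhz / nu_ref_mhz) ** alpha

# A 20 mJy limit at 4850 MHz, carried to 325 MHz with a spectral index of -0.6.
limit_325 = extrapolate(20.0, 4850.0, 325.0, -0.6)
print(round(limit_325))  # 101
```

A source with a spectral index of -0.6 must therefore be brighter than about 100 mJy at 325 MHz to be detected at 6 cm, and steeper sources need to be brighter still.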
If one wants to increase significantly the number of independent spectra with respect to the existing catalogues, a deep and extended survey at an independent wavelength, preferentially smaller than 10 cm, is needed.
In order to validate our algorithm, we compared the spectral indices determined by SPECFIND with those given in various catalogues. The radio catalogues in the VIZIER database that include spectral indices are listed in Table 5; for the PMNS catalogue we calculated the spectral index from the data given in that catalogue. The columns are: (1) catalogue name; (2) the frequency used for the cross-identification in SPECFIND; (3) other frequencies for the determination of the spectral index; (4) number of sources with spectral index. The catalogues B3, MITG, PKS, PMNS-S, and FA87 use only two frequencies for the determination of the spectral index, whereas the other catalogues use more than three frequencies. The percentage of sources for which a spectrum could be identified is listed in Col. 2 of Table 6. The high percentage of radio spectra obtained validates the SPECFIND spectrum identification algorithm. For the direct comparison of spectral indices we only chose indices for which both methods used approximately the same frequency interval.
Figure 1: Number distribution of spectral indices.
Figure 2: The number of sources as a function of the number of independent points in the radio spectra.
Figure 3: The spectral index as a function of the flux density at 325 MHz.
Only one pulsar spectrum could be identified by SPECFIND (Table 4). This is due to the small flux densities and steep spectral indices of the pulsar population listed in PULSARS.
A comparison between the spectral indices given in these catalogues and those found by SPECFIND is shown in Figs. 4-11.
In general, the scatter is smaller for smaller spectral indices, because there is no change in the spectral index in the frequency interval of interest. The increase of the scatter for large spectral indices (flatter spectra) in the AGN-QSO, FA87 and 1Jy catalogues (Figs. 6, 8 and 9) is due to the fact that these are radio sources whose spectra peak around 5 GHz. The determination of their spectral index therefore depends critically on the frequency interval used. These intervals are more precisely known for the other catalogues, which consequently show a smaller scatter.
The standard deviations of the difference between the spectral indices determined by SPECFIND and those determined in the catalogues (Table 5) are listed in Table 6. The best agreement (standard deviation 0.10) between the two is found for the B3 catalogue, and the worst (0.36) for the AGN-QSO catalogue. The low consistency of spectral indices in the AGN-QSO catalogue may be due to the small frequency range used (4850/2700 MHz), to variability, or to both. In general, the standard deviation is 0.3. Thus we conclude that the spectral indices determined by SPECFIND have an error of about 0.3.
Table 4: Pulsar spectrum identified by SPECFIND.
Figure 4: MITG spectral index versus SPECFIND spectral index.
Figure 5: PMN-S spectral index versus SPECFIND spectral index.
Figure 6: AGN-QSO spectral index versus SPECFIND spectral index.
Figure 7: PKS spectral index versus SPECFIND spectral index.
Figure 8: FA87 spectral index versus SPECFIND spectral index.
Figure 9: 1Jy spectral index versus SPECFIND spectral index.
Figure 10: USSR spectral index versus SPECFIND spectral index.
Figure 11: B3 spectral index versus SPECFIND spectral index.
Table 5: Catalogues with spectral indices.
Table 6: Standard deviation of the spectral index difference.
SPECFIND is a very efficient tool to identify radio spectra using radio catalogues of different formats. SPECFIND can handle radio surveys of very different resolutions and sensitivities. It has been applied to 22 survey catalogues at 11 different frequencies containing a total of 3.5 million sources, leading to more than 700 000 independent radio cross-identifications and 67 000 independent radio spectra with more than two independent frequencies. The code was tested and its results validated by a comparison between the spectral indices found by SPECFIND and those determined by other authors. The determined spectral indices have an error of about 0.3. Negative spectral indices have smaller errors, while the error of positive spectral indices can be larger, mainly because of the occurrence of a peak in the spectrum. The code is quite rapid (less than 3 h running time on a standard PC for 3.5 million sources) and, since it is written in C, it can be run on virtually all PCs with at least 512 MB RAM. It produces an output of variable format that can be adapted easily to the purpose of the user. A special output to enter the cross-identifications into SIMBAD has been developed. The code has been optimised to find a maximum number of spectra with a relatively small number of misidentifications. It thus represents a promising tool to extract radio spectra from a large sample of radio catalogues, like those stored in VIZIER. Although at present only 22 catalogue entries can be accessed by SPECFIND, this number will be increased in the future. The advantage of this procedure is that radio spectra and cross-identifications are established in advance, and can be stored in a "master list" within VIZIER, allowing fast and efficient access. We expect to make the cross-identifications available soon via the VIZIER database.
We would like to thank H. Andernach for very helpful discussions and O. V. Verkhodanov for his comment on ambiguous sources and "centers of mass".