A&A 482, 483-498 (2008)
DOI: 10.1051/0004-6361:20079222

Cluster analyses of gigahertz-peaked spectrum sources with self-organising maps

I. Torniainen$^1$ - M. Tornikoski$^1$ - M. Turunen$^1$ - M. Lainela$^2$ - A. Lähteenmäki$^1$ - T. Hovatta$^1$ - M. G. Mingaliev$^3$ - M. F. Aller$^4$ - H. D. Aller$^4$

1 - TKK Helsinki University of Technology, Metsähovi Radio Observatory, Metsähovintie 114, 02540 Kylmälä, Finland
2 - Tuorla Observatory, University of Turku, Väisäläntie 20, 21500 Piikkiö, Finland
3 - Special Astrophysical Observatory, Russian Academy of Sciences, Nizhnij Arkhyz, Karachaevo-Cherkesia, 369167 Russia
4 - Department of Astronomy, University of Michigan, Ann Arbor, MI, 48109, USA

Received 9 December 2007 / Accepted 24 January 2008

Context. Gigahertz-peaked spectrum (GPS) sources and high frequency peakers (HFPs) are among the smallest of active galactic nuclei currently believed to represent the earliest phases in the evolution of extragalactic radio sources. Recently there has been evidence of contamination by other types of radio sources among the GPS and HFP samples, but the confirmed GPS sources or HFPs also seem to form a very heterogeneous population.
Aims. We study the statistical clustering of the GPS sources and the HFPs by taking as many source parameters as possible to find homogeneous groups among the sources. We expect the clustering to give us insight into the physical parameters that play a role in different source populations.
Methods. We have collected a sample of 206 GPS sources and HFPs from the literature and gathered a massive database of various source properties, such as the redshift, the size, the polarization, the magnitudes, and the properties of the radio continuum. To visualize and to cluster these multidimensional data we used self-organising maps (SOM), which are neural networks trained by an unsupervised algorithm. We have classified the sources with an auxiliary classification to trace the locations of different types of radio continuum spectra on the map.
Results. The sources form distinctive clusters on the map, which is supported by the accordant organisation of the non-numerical parameters not used in the analysis, such as the radio morphology and the optical identification. Our results confirm that the blazars contaminating the GPS and the HFP samples are physically different from the genuine GPS sources and HFPs, and they should be excluded from the samples. The genuine GPS sources form various clusters, which indicates the existence of different subpopulations, besides the expected galaxy-quasar dualism.

Key words: galaxies: active - galaxies: quasars: general - radio continuum: galaxies

1 Introduction

The gigahertz-peaked spectrum (GPS) sources and the high frequency peakers (HFPs) are a heterogeneous group of compact (linear size LS < 1 kpc) extragalactic objects. They are active galactic nuclei (AGN) with convex radio continuum spectra, peaking at GHz frequencies (GPS sources) or higher (HFPs). They can be divided into two types: galaxies and quasars. The galaxy-type sources are found at lower redshifts (0.1 < z < 1, O'Dea et al. 1996) and are less variable than the quasar-type sources. They also have lower turnover frequencies than the GPS quasars and exhibit symmetric VLBI morphologies, whereas the quasars usually have complex or core-jet morphologies (Stanghellini et al. 1997, 2001).

The currently favoured view is that the galaxy-type GPS sources and HFPs are intrinsically small due to their young age (e.g., Phillips & Mutel 1980,1982; Polatidis & Conway 2003) and that quasar-type sources are large-scale radio sources which appear small due to a projection effect (e.g., Stanghellini 2003). VLBI observations have revealed extended emission around some GPS sources - both galaxies and quasars. The galaxy-type sources may be explained by recurrent activity of the nucleus (e.g. Baum et al. 1990; Stanghellini et al. 1990). This was, nonetheless, found probable for only one source out of a sample of six GPS sources associated with extended emission (Stanghellini et al. 2005). The quasar-type core-jet sources associated with extended emission are most likely truly extended sources at such high redshifts that most of the large-scale structures are below detection limits (Stanghellini et al. 2005). However, there are also quasar-type GPS sources with symmetric VLBI morphology, and they may be intrinsically small and young sources.

Another explanation for the small size of the galaxy-type sources, i.e. confinement by a dense ambient medium, has been suggested by, e.g., Baum et al. (1990) and O'Dea et al. (1991). In this scenario, the source is old and has remained small in size due to external pressure that prevents the radio lobes from growing beyond the galactic center. Recently, there have not been any studies supporting this view.

In our previous papers we searched for high peaking GPS sources and studied some of the known ones. In the first paper (Tornikoski et al. 2001, hereafter Paper I) we identified several new southern high peaking sources and found variability in some of the known ones, and in the second paper (Torniainen et al. 2005, hereafter Paper II) we widened our study to the northern hemisphere and found mild to extreme variability in all the known GPS, CSS, or HFP sources monitored at Metsähovi. The change in the shape of the spectrum differed from source to source: a minority of sources maintained their convex shape and peak frequency independent of the state of activity, but for most of the sources the shape changed from flat to inverted as the activity increased. The majority of the sources in Paper II were quasars, and it was obvious that the quasar-type GPS samples were severely contaminated by blazars. This inspired us to study the possible contamination of GPS galaxies. For the third paper (Torniainen et al. 2007, hereafter Paper III), we collected a sample of 96 galaxy-type GPS sources and gathered as much radio data for them as possible. After studying their overall radio spectra, and the spectral and variability indices, we found that the GPS classification is well-grounded for only a third of the sources. For another third there are not enough data for firm conclusions, and the remaining third were flat- or inverted-spectrum sources.

Recently, Labiano et al. (2007a) also produced a new master list of GPS sources, from which some sources exhibiting strong radio variability had been excluded. This new list does not, however, include the findings of Papers II and III.

It has also been confirmed by other studies (e.g. Orienti et al. 2007; Tinti et al. 2005) that there are different populations among the GPS sources and the HFPs identified in the literature, some of them having a truly constant convex spectral shape and some of them having only a temporarily inverted spectrum.

Most of the GPS samples in the literature have been selected by combining datapoints from different catalogues originating from different epochs and picking out sources with a convex radio continuum spectrum. This has been done without paying much attention to the effect of variability when using non-simultaneous datapoints. Single epoch multifrequency observations, or observations spanning only a couple of years, have also been used. These approaches have proved to generate very heterogeneous samples since a peaked spectrum can be caused by several different effects. In terms of sample contamination, the most severe cases are variable flat-spectrum sources observed when one flaring component dominates the spectrum and creates a temporarily inverted spectrum that lasts for months or even years. There are also consistently convex-spectrum sources with high variability as well as sources with virtually no flux density variations. There are galaxies and quasars, compact symmetric objects (CSOs), and core-jet or complex VLBI morphologies. Some sources are detected in the X-rays or the $\gamma$-rays or both, some remain undetected. For the majority of the sources, there is a very limited number of continuum observations in the radio band as well as in other parts of the spectrum, not to mention information on other source properties, e.g., the emission lines, the size, and the column densities.

There seems to be no clear and simple common factor present in our sample of genuine GPS sources (Paper II; Paper III): there are both quasars and galaxies, CSOs and core-jets, and variable and non-variable sources. Intrigued by this variety of objects classified as GPS sources, and craving for clarity about their physical nature, we wanted to take as many parameters as possible into account and run neural clustering analyses for a complete sample of GPS sources.

A self-organising map (SOM) is an unsupervised neural network, used, for example, for visualization of multidimensional data, classification and clustering. The algorithm tries to place the objects on a multidimensional map so that the Euclidean distance between the parameter vectors of similar objects is minimized. In astronomy, neural networks have been used mainly for classification of objects (e.g., Brett et al. 2004; Miller & Coe 1996; Rajaniemi & Mähönen 2002).

In Sect. 2, we present the sample and the collection and processing of the data. In Sect. 3, the SOM analyses are presented in detail. The results are presented and discussed in Sect. 4, and the conclusions are given in Sect. 5.

2 Sample and data

We collected a sample of 206 GPS and HFP sources for a detailed study of the underlying populations of various kinds of sources, both among all the GPS sources identified in the literature and the genuine GPS sources with constant shape and peak of the spectrum.

The sources in the sample and their references are listed in Table 1. Some sources from these papers were left out since there were no coordinates for them or it was likely that there would not be a sufficient number of parameters for them to be used in the analyses.

Table 1: The sources and their classification.

The parameters used in the analyses are listed together with their references in Table 2. The data for each source are available in electronic form at the CDS. Some of the parameters are described in more detail below.

2.1 Source size

The linear sizes have been collected from numerous references and they have been obtained by various instruments or VLBI networks and at various frequencies. Therefore the values are not perfectly comparable, but rather give some guidelines of the source size. In addition, in the original papers the sizes were calculated with very different values of cosmological parameters and hence were not comparable as such. Thus we recalculated the linear sizes with the latest estimates of cosmological parameters ($H_0 = 71$ km s$^{-1}$ Mpc$^{-1}$, $\Omega _{\rm M} = 0.27, \Omega_{\rm vac} = 0.73$) using the javascript calculator created by Edward L. Wright (Wright 2006).

For some sources, several different sizes were given in the literature. We selected the largest size obtained with VLBI, since when studying the compactness of the source, the largest observed size gives the most significant information. If there were sizes obtained with both VLBI and the interplanetary scintillation (IPS) method (Jeyakumar et al. 2000), we selected the VLBI size for conformity, but accepted IPS sizes when there was no other information available. The scale of the sizes varied substantially, so we used the logarithm of the size in the analyses.
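The conversion from an angular size to a linear size can be sketched as follows. This is only a minimal illustration for a flat universe with the cosmological parameters quoted above; it is not the calculator of Wright (2006), it neglects the radiation density, and the function names and the midpoint integration are our own assumptions.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def angular_diameter_distance_mpc(z, h0=71.0, omega_m=0.27, omega_vac=0.73, steps=1000):
    """Angular diameter distance in Mpc, assuming a flat universe
    (omega_m + omega_vac = 1) and neglecting radiation."""
    # Midpoint-rule integration of dz / E(z) from 0 to z.
    dz = z / steps
    integral = 0.0
    for k in range(steps):
        zk = (k + 0.5) * dz
        integral += dz / math.sqrt(omega_m * (1.0 + zk) ** 3 + omega_vac)
    comoving_mpc = (C_KM_S / h0) * integral
    return comoving_mpc / (1.0 + z)


def linear_size_kpc(theta_mas, z):
    """Projected linear size in kpc for an angular size given in milliarcseconds."""
    theta_rad = theta_mas * 1e-3 / 3600.0 * math.pi / 180.0
    return theta_rad * angular_diameter_distance_mpc(z) * 1e3


def log_size(theta_mas, z):
    """Logarithm of the linear size, the quantity passed on to the analyses."""
    return math.log10(linear_size_kpc(theta_mas, z))
```

At z = 1 this sketch gives a scale of roughly 8 pc per milliarcsecond, so a 10 mas source would correspond to about 0.08 kpc.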

2.2 Magnitudes

The Gunn system r and i magnitudes from Stanghellini et al. (1993) were converted to Cousins R and I magnitudes using equations given by Schombert et al. (1990). For the sources with only one of the Gunn magnitudes, the Cousins magnitude was estimated using the mean value of r-R or i-I of the sources in Stanghellini et al. (1993). The brightest magnitude was chosen if there were several values for one source, except when a fainter value was simultaneous with the I magnitude observation.

2.3 Radio spectrum parameters

When calculating the parameters of the radio continuum spectra, we used the method developed for Paper III. To overcome the drawbacks of a varying number of data points with unevenly sampled frequency coverage, the frequency range 0.05-360 GHz was divided into logarithmically equidistant intervals, each as wide as the fractional interval between 8 GHz and 10 GHz. The data in each interval were bound to the logarithmic centre of the interval. The median of the flux density and the fractional variability index (Var $_{\Delta S} = (S_{\max} - S_{\min})/S_{\min}$) were calculated for each of these data bins.
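The binning described above can be illustrated with the following sketch. It assumes that the bins start exactly at 0.05 GHz, that the input is a list of (frequency in GHz, flux density in Jy) pairs, and that all flux densities are positive; the function and variable names are hypothetical.

```python
import math
from statistics import median

BIN_WIDTH = math.log10(10.0 / 8.0)  # bin width in dex, same as the 8-10 GHz interval
LOG_START = math.log10(0.05)        # assumed lower edge of the first bin (GHz)


def bin_spectrum(points):
    """Return a list of (bin centre in GHz, median flux density, Var_dS),
    one entry per non-empty bin, sorted by frequency."""
    bins = {}
    for freq, flux in points:
        if not 0.05 <= freq <= 360.0:
            continue  # outside the frequency range considered
        k = int((math.log10(freq) - LOG_START) // BIN_WIDTH)
        bins.setdefault(k, []).append(flux)

    result = []
    for k in sorted(bins):
        fluxes = bins[k]
        # Logarithmic centre of bin k.
        centre = 10.0 ** (LOG_START + (k + 0.5) * BIN_WIDTH)
        # Fractional variability index (S_max - S_min) / S_min.
        var = (max(fluxes) - min(fluxes)) / min(fluxes)
        result.append((centre, median(fluxes), var))
    return result
```

With this choice of bin edges, observations at 4.8 GHz and 5.0 GHz fall into the same bin, whereas an observation at 22 GHz falls into a separate one.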

We also calculated three other variability indices which included the error estimates for the flux densities, but chose to use the above-mentioned quantity because the errors were not available for all the data in the CATS database, and thus the values would not have been consistent for all the sources and the data bins. The results were substantially the same regardless of the choice of the variability index.

The median flux density of each bin was used to model the shape of the spectra by fitting the following equation from Kovalev et al. (2000), rearranged by Dallacasa et al. (2000),

\begin{displaymath}\log S = a - \sqrt{b^2+ (c \log \nu - d)^2},
\end{displaymath} (1)

where S is the flux density at frequency $\nu$, and a, b, c, and d are the fit parameters.
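As an illustration of how Eq. (1) behaves, the sketch below evaluates the model and its analytic maximum: the square root is smallest when $c \log \nu = d$, so the turnover lies at $\log \nu = d/c$ with $\log S = a - \vert b \vert$. The parameter values in the usage note are hypothetical, and the fitting procedure itself is not reproduced here.

```python
import math


def log_flux(log_nu, a, b, c, d):
    """Model of Eq. (1): log S as a function of log nu."""
    return a - math.sqrt(b * b + (c * log_nu - d) ** 2)


def peak(a, b, c, d):
    """Return (log nu, log S) at the maximum of Eq. (1).

    The square-root term is minimised where c * log_nu - d = 0.
    """
    return d / c, a - abs(b)
```

For example, with the illustrative values a = 1, b = 0.3, c = 1 and d = 0.7, `peak` returns a turnover at $\log \nu = 0.7$ with $\log S = 0.7$.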

In general, Eq. (1) models the spectra in this sample very well, but since it does not model any physical properties, we made adjustments to the fits of some sources, as described below.

Some of the sources in our sample have abundant data on both sides of the turnover, so that the flux density peak and the turnover frequency are easy to determine and there are no difficulties in interpreting the fits. However, there are a number of sources for which the fits from Eq. (1) do not represent the slope of the data accurately, and for those sources, picked out by visual examination, logarithmic linear fits were also applied. In some of these cases the flat top of the peak was not used for the linear fit in order to better model the declining slope.

The majority of the sources have insufficient data for determining both spectral indices reliably, or even at all. For some sources the optically thin part of the spectrum is available and the spectral index $\alpha$ is calculated either with Eq. (1) or linearly. Some sources show only a very wide round top of the spectrum, and it is not feasible to fit spectral indices to this kind of data. Using Eq. (1), the slopes are calculated from extrapolated values as follows. When applicable, frequencies 0.1 MHz and 1 MHz are selected to represent the optically thick, and 100 GHz and 1000 GHz the optically thin part of the spectrum. These values, selected far from the turnover to make sure that the slope has levelled out, are substituted into Eq. (1) to get the respective modelled flux density values, and the basic formula for the slope in the logarithmic scales is used to derive the spectral indices: $\alpha = (\log S_2 - \log S_1)/(\log \nu_2 - \log \nu_1)$. The source spectra with the applicable fits are presented in Fig. 7.
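The extrapolation procedure can be sketched as below. We assume here that $\nu$ in Eq. (1) is expressed in GHz, which is not stated explicitly in the text, and the fit parameters used in the usage note are hypothetical. Far from the turnover the modelled slopes approach $+c$ and $-c$.

```python
import math


def log_flux(log_nu, a, b, c, d):
    """Model of Eq. (1): log S as a function of log nu."""
    return a - math.sqrt(b * b + (c * log_nu - d) ** 2)


def spectral_index(nu1, nu2, a, b, c, d):
    """Slope (log S2 - log S1) / (log nu2 - log nu1) of the modelled spectrum
    between two frequencies (assumed to be in GHz)."""
    l1, l2 = math.log10(nu1), math.log10(nu2)
    return (log_flux(l2, a, b, c, d) - log_flux(l1, a, b, c, d)) / (l2 - l1)


def extrapolated_indices(a, b, c, d):
    """Spectral indices below and above the turnover from the
    extrapolation frequencies quoted in the text."""
    alpha_below = spectral_index(1e-4, 1e-3, a, b, c, d)    # 0.1 MHz and 1 MHz
    alpha_above = spectral_index(100.0, 1000.0, a, b, c, d)  # 100 GHz and 1000 GHz
    return alpha_below, alpha_above
```

With the illustrative parameters a = 1, b = 0.3, c = 0.8 and d = 0.4, the two indices come out close to +0.8 and -0.8, respectively.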

In determining the peak frequency, we used the value derived from the fit when it was applicable. For some sources the spectrum was so flat that the fit yielded peak frequencies far beyond the frequency range of the data. In those cases the peak frequency was determined visually or omitted, depending on the shape of the spectrum. For several cases the peak frequency from the fit matched the start of the declining part of the spectrum but there was no clear information on the rising part of the spectrum. In these cases we considered the peak frequency from the fit an upper (in some cases a lower) limit of the possible turnover.

The rest frame peak frequency was calculated and for the sources with no redshift information available we used a generic value of z = 1, which is close to the median value of our sample (0.93).

From the spectral indices, the turnover frequency, and the variability in the radio frequencies we derived other quantities to describe the shape of the radio spectrum. The symmetry of the spectrum was calculated as $\alpha_{\rm below} / (- \alpha_{\rm above})$. The curvature of the spectrum is defined as the change in the spectral indices over the spectrum, i.e. $\alpha_{\rm below} - \alpha_{\rm above}$. The width of the spectrum, FWHM in decades of frequency, was calculated from the fitted function by taking the difference of the frequencies below and above the turnover where the flux density was half of the highest value.
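The derived shape parameters can be illustrated with the sketch below. The half-maximum condition applied to Eq. (1) gives $\sqrt{b^2 + u^2} = \vert b \vert + \log 2$ with $u = c \log \nu - d$, so the full width is $2u/\vert c \vert$ decades. The rest frame conversion $\nu_{\rm rest} = \nu_{\rm obs}(1+z)$ is the standard relation and is our assumption of what was applied; the function name and the returned keys are hypothetical.

```python
import math


def shape_parameters(alpha_below, alpha_above, b, c, nu_peak_obs, z=1.0):
    """Derived quantities describing the shape of a spectrum fitted with Eq. (1).

    z defaults to the generic value used for sources without a redshift.
    """
    half = math.log10(2.0)
    # Solve sqrt(b^2 + u^2) = |b| + log10(2) for u = c*log(nu) - d.
    u_half = math.sqrt(half * (2.0 * abs(b) + half))
    return {
        "symmetry": alpha_below / (-alpha_above),
        "curvature": alpha_below - alpha_above,
        "fwhm_decades": 2.0 * u_half / abs(c),
        "nu_peak_rest": nu_peak_obs * (1.0 + z),
    }
```

A perfectly symmetric spectrum therefore has a symmetry of 1, and a larger value of b or a smaller value of c gives a wider peak.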

Because the amount of data varied from source to source and the data were not distributed evenly in the optically thick and thin parts of the spectrum, we wanted to put more weight on the spectrum parameters of sources with abundant radio data, and less on the sources with sparse data that probably did not describe the true shape of the spectrum. In order to calculate weighting factors for the spectrum indices $\alpha_{\rm below}$ and $\alpha_{\rm above}$, the number of empty and non-empty fitting data bins in the corresponding frequency intervals were defined as:
n = N(``non-empty data bins'') and
m = N(``empty data bins'').

Now the weighting factor $q'_{\rm b}$ for $\alpha_{\rm below}$ can be given as:

\begin{displaymath}q'_{\rm b} = 1 - [n + (\pi/4)*m]^{-1},
\end{displaymath} (2)

where the factor $\pi$/4 is used to reduce the significance of empty data bins. The final weighting factor $q_{\rm b}$ is then generated by normalizing $q'_{\rm b}$ to the interval [0, 1] with respect to all the processed $\alpha_{\rm below}$ values. $q_{\rm a}$ for $\alpha_{\rm above}$ is calculated in an identical manner.

Weighting factors for the spectrum index derivatives, curvature, symmetry, and FWHM, were then approximated with the geometric mean $\sqrt{q_{\rm b} q_{\rm a}}$ where applicable. These factors were then applied to the SOM training using a weighting mask matrix.
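The weighting scheme can be sketched as follows. Eq. (2) and the geometric mean are taken from the text, whereas the min-max rescaling used for the normalization to [0, 1] is our assumption, since the exact normalization is not specified. The sketch also assumes that n and m are not both zero.

```python
import math


def raw_weight(n, m):
    """Eq. (2): n non-empty and m empty data bins in the frequency interval."""
    return 1.0 - 1.0 / (n + (math.pi / 4.0) * m)


def normalise(values):
    """Rescale raw weights to [0, 1] over all sources (assumed min-max scaling)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def derived_weight(q_b, q_a):
    """Geometric mean used for curvature, symmetry, and FWHM."""
    return math.sqrt(q_b * q_a)
```

For instance, `normalise([raw_weight(1, 0), raw_weight(4, 0), raw_weight(2, 0)])` assigns weight 0 to the sparsest spectrum and weight 1 to the best-sampled one.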

The source power at 5 GHz was calculated using the median flux density of the databin around 5 GHz. The luminosity distance was calculated with the fundamental formula from e.g., Altschuler (1989), and the generic redshift of z = 1 was used for sources without redshift information.

2.4 Auxiliary classification

In addition to calculating numerical parameters to be used in the analyses, we used the radio spectral parameters for classifying the spectra. The spectral classes and their criteria follow the approach of Paper III:

This classification is used to trace the locations of different types of spectra on the maps.

Table 2: Parameters and their references used in the analyses.

3 Analyses

A self-organising map (Kohonen 2001) is a neural network, which can be used for cluster analyses, visualization of multidimensional data, and classification. We have chosen to use this method for its intuitive way of visualizing multidimensional data, and its ability to analyse incomplete data matrices. One of its other benefits is that the network is trained in an unsupervised manner, i.e. there is no user input on the classification. Therefore the clustering is not biased by any antecedent results.

A SOM consists of neurons, which are organized in an N-dimensional grid, usually N = 2 for the most convenient visualization. In the 2-dimensional case, the lattice of neurons can be hexagonal or rectangular, and the lattice can be folded into a cylindrical or toroidal shape. In this paper, a simple flat 2-dimensional hexagonal grid of neurons was chosen.

In each neuron i, there is a randomly initialized weight vector $\vec{w}_i$ of D dimensions. The input data are also considered to consist of vectors, input vectors $\vec{x}$, of D dimensions. Each input vector represents an observation of the input data and each dimension represents a parameter of the observation. Thus, in this paper, each input vector is a single GPS source, and each component of this vector represents one property of the source.

The map is trained by taking one input vector and comparing it with all the weight vectors to find the best-matching unit (BMU) c, the neuron whose weight vector $\vec{w}_c$ is closest (usually in Euclidean distance) to the input vector:

\begin{displaymath}\vert\vert\vec{x} - \vec{w}_c\vert\vert = \min_i \vert\vert\vec{x} - \vec{w}_i\vert\vert.
\end{displaymath} (3)

The weight vectors of the BMU and its topological neighbours are then updated to resemble the input vector even more:

\begin{displaymath}\vec{w}_i(t+1) = \vec{w}_i(t) + a(t) h_{ci}(r(t))[\vec{x}(t) - \vec{w}_i(t)],
\end{displaymath} (4)

where t denotes time (training step), a(t) is the learning rate, and $h_{ci}(r(t))$ is the neighbourhood function, which depends on the neighbourhood radius r(t). The learning rate and the neighbourhood radius typically decrease with time, so that the amount of change and the number of affected neurons decrease as more training steps are completed. The learning rate decreases from 1 to 0, usually following a function that is inversely proportional to time. The neighbourhood radius is usually large at the beginning, allowing the map to adapt more rapidly, and becomes smaller as the training progresses, so that the map is finally fine-tuned to the delicate details of the input data.
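A single sequential training step, i.e. the BMU search of Eq. (3) followed by the update of Eq. (4), can be sketched as follows. The Gaussian form of the neighbourhood function and the plain list representation of the map are assumptions made for illustration only; this is not the SOM Toolbox implementation.

```python
import math


def find_bmu(weights, x):
    """Index of the neuron whose weight vector is closest to x (Eq. 3)."""
    def dist2(w):
        return sum((xi - wi) ** 2 for xi, wi in zip(x, w))
    return min(range(len(weights)), key=lambda i: dist2(weights[i]))


def sequential_update(weights, positions, x, learning_rate, radius):
    """Apply Eq. (4) in place for one input vector and return the BMU index.

    weights[i] is the weight vector of neuron i and positions[i] its grid
    coordinates; a Gaussian neighbourhood function is assumed.
    """
    c = find_bmu(weights, x)
    for i, w in enumerate(weights):
        grid_d2 = sum((p - q) ** 2 for p, q in zip(positions[i], positions[c]))
        h = math.exp(-grid_d2 / (2.0 * radius ** 2))
        weights[i] = [wi + learning_rate * h * (xi - wi) for wi, xi in zip(w, x)]
    return c
```

The BMU itself has h = 1 and moves the furthest towards the input vector, while neurons further away on the grid are adjusted progressively less.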

This comparison and updating can be done in two different ways: using either sequential or batch training. In sequential training the comparison is done by taking one input vector at a time and updating the map before proceeding to the next input vector. We have used batch training, in which all the input data are gone through once before updating the weight vectors with the weighted averages of the samples.
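One epoch of batch training can be sketched as below: every input vector is first assigned to its BMU, and each weight vector is then replaced by the neighbourhood-weighted average of the inputs. Missing parameter values are represented by `None` and skipped, which is our assumption of how an incomplete data matrix can be handled; the Gaussian neighbourhood and all names are likewise illustrative, and the per-parameter weighting mask is omitted.

```python
import math


def masked_dist2(x, w):
    """Squared Euclidean distance using only the known components of x."""
    return sum((xi - wi) ** 2 for xi, wi in zip(x, w) if xi is not None)


def batch_epoch(weights, positions, data, radius):
    """Return the new weight vectors after one pass over all input vectors."""
    # Step 1: find the BMU of every input vector with the current weights.
    bmus = [
        min(range(len(weights)), key=lambda i: masked_dist2(x, weights[i]))
        for x in data
    ]
    # Step 2: replace each weight vector by a neighbourhood-weighted average.
    new_weights = []
    for i, w in enumerate(weights):
        num = [0.0] * len(w)
        den = [0.0] * len(w)
        for x, c in zip(data, bmus):
            grid_d2 = sum((p - q) ** 2 for p, q in zip(positions[i], positions[c]))
            h = math.exp(-grid_d2 / (2.0 * radius ** 2))
            for k, xk in enumerate(x):
                if xk is not None:
                    num[k] += h * xk
                    den[k] += h
        # Keep the old component if no input contributed to it.
        new_weights.append(
            [n / d if d > 0.0 else wk for n, d, wk in zip(num, den, w)]
        )
    return new_weights
```

Because the weights are updated only after all the inputs have been processed, the result of an epoch does not depend on the order in which the sources are presented.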

When the training is completed, the map will have formed a representation of the observations by adjusting its vectors according to observed variables. In practice, this allows us to locate the sources by their properties on the map and thus clustering similar sources together. The neurons, with the sources they harbour, can be divided into clusters by different clustering methods.

We have used SOM Toolbox version 2.0 for Matlab. The grid size and the topology were optimized to the data by the software. We chose to use the centroid method for the clustering, because it creates clusters by calculating the centroid of the whole cluster instead of creating chains between similar sources like the linkage methods do. We also tested clustering using cluster averages, the neighbourhood function, and the Ward method, but the results did not differ substantially.
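The idea of centroid-based clustering can be illustrated with a simple agglomerative sketch, in which the two clusters with the closest centroids are merged until the requested number of clusters remains. This is a generic illustration under our own assumptions and does not reproduce the exact procedure of the SOM Toolbox.

```python
def centroid_clustering(vectors, n_clusters):
    """Merge the two clusters with the closest centroids until n_clusters remain.

    Returns a list of clusters, each a list of indices into `vectors`.
    """
    dim = len(vectors[0])
    clusters = [[i] for i in range(len(vectors))]

    def centroid(members):
        return [sum(vectors[i][k] for i in members) / len(members) for k in range(dim)]

    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            ca = centroid(clusters[a])
            for b in range(a + 1, len(clusters)):
                cb = centroid(clusters[b])
                d2 = sum((p - q) ** 2 for p, q in zip(ca, cb))
                if best is None or d2 < best[0]:
                    best = (d2, a, b)
        _, a, b = best
        # Merge cluster b into cluster a; a < b, so index a stays valid.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters
```

In the present context the vectors would be the weight vectors of the trained neurons, so that neighbouring neurons with similar weights end up in the same cluster.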

4 Results and discussion

The maps are presented in Figs. 2-6, only available in electronic form via http://www.aanda.org.

The most important tools in interpreting the results of the analyses are the U-matrix and the component planes. The U-matrix is a representation of the average neighbourhood distances of each neuron, with an additional hexagon between every neighbouring neuron to illustrate the distance between the pair. If the data were clearly divided into different clusters, the U-matrix would show clear light (red) borders, representing large distances, between the neurons which belong to different clusters. Dark-coloured (blue) areas represent groups of similar neurons, where the differences between the neighbours are small. Component planes show the projection of the value of each parameter on the map grid, i.e. each plane can be thought of as a contour map describing the location of low and high values of the parameter.
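The average neighbourhood distance shown in the U-matrix can be sketched as follows. For simplicity the sketch assumes a rectangular grid with four neighbours per neuron instead of the hexagonal grid used in this paper, it omits the additional cells drawn between neighbouring neurons, and it requires a map with at least two neurons.

```python
import math


def u_matrix(weights, n_rows, n_cols):
    """Average distance between each neuron and its grid neighbours.

    weights[r][c] is the weight vector of the neuron in row r, column c.
    """
    def dist(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

    result = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            distances = []
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    distances.append(dist(weights[r][c], weights[rr][cc]))
            row.append(sum(distances) / len(distances))
        result.append(row)
    return result
```

Large values mark neurons lying on a border between dissimilar groups, and small values mark the interior of a group of similar neurons.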

Because of the space needed for text in the neurons, the separate maps, i.e. all except the combined U-matrix and the component plane view, have been rotated 90$^\circ$ counterclockwise. In the discussion below, we use the coordinates of the separate maps, so when referring to the upper left corner of a map, the corresponding area in the combined U-matrix and component plane plot is in the upper right corner.

Below, the maps are presented and discussed in a general manner and only some specific clusters of interest are discussed in more detail.

4.1 Map of all sources and all parameters

When all sources in the sample and all the collected parameters are used, the maps presented in Figs. 2-6 are produced. Combining the information of the auxiliary classification (Fig. 3), VLBI morphology (Fig. 4) and the optical identification (Fig. 5), we can see that the map can be divided roughly into four quarters. The two upper quarters are populated with quasar-type sources, the upper right with gps-type spectra and the upper left with other types of spectra. In the lower part of the map there are mostly galaxy-type sources; again the right side is dominated by sources with gps-type spectra while the left side has types f, f/s, c, and n. Each of these quarters can be divided into several clusters of similar sources, and their typical properties can be studied in the plot of the U-matrix and the component planes (Fig. 2).

As they are not numerical quantities, neither the optical identification, VLBI morphology, nor the auxiliary classification were used by the algorithm, and therefore the formation of groups of sources similar in these properties is likely to reflect some deeper similarities between the sources. (Because the auxiliary classification has been done using the parameters of the radio spectrum, the classes are linked to different parts of the map according to the values of the radio parameters. However, the clear division of classes on the map is an indication that we have chosen to use quantities that really seem to reflect some essential properties of the sources.)

Below, we first give an overview of the four quarters and then present some individual clusters in more detail.

The upper left quarter of the map consists of flat- and convex-spectrum sources and sources with an inverted spectrum during outbursts. There is pronounced variability in the sources in this part of the map; there are quasars, low polarization quasars (LPQs), highly polarized quasars (HPQs), and, in particular, on the outermost edge and the upper corner, BL Lac objects (BLOs). This upper left quarter contains sources with typical blazar properties, i.e. these sources have been misidentified as GPS sources because of the temporary GPS shape of their spectra during radio flares. This is not surprising, as it has been noticed earlier (e.g., 20; Tinti et al. 2005; Paper II) that some previously classified GPS sources show blazar-like behaviour, and, indeed, they have proved to be blazars on closer inspection.

In this ``blazar quarter'' there are also two gps sources (B0528-250, B1758+388), which seem to be misclassified by the auxiliary classification. Their radio continuum spectra show GPS-like characteristics but this may be due to lack of flux density monitoring, which would likely reveal more blazar-like behaviour. In Papers II and III we have shown that monitoring must be continued for several years in order to find out the true nature of a source. Neither of these sources has been monitored for more than $\sim$3 years near the turnover, so there is not yet compelling evidence that these sources would maintain their GPS-type spectrum at all levels of activity.

The lower parts of the left side of the map are mostly populated by galaxies with unknown VLBI morphologies. These sources are characterized by low redshifts and rather low radio powers at 5 GHz. Besides the radio data we have calculated, there is very little information available on the other properties of these sources. Typical spectrum types in this area are f, f/s, and n.

The right side of the map contains sources with confirmed gps-type spectra. The uppermost third of the right side is populated by high redshift quasars, which have high radio powers. The sources near the vertical mid-line of the map are larger in size and have flatter spectra and higher variability than the sources in the right corner. The sources on the uppermost rows are mostly unresolved by VLBI; however, there are two CSOs in the right corner and a group of core-jets a bit lower in the middle.

Below these quasar-type sources, in the middle of the right side, there is a bundle of galaxies, mostly with CSO morphologies. These low-redshift sources have rather low radio powers, especially on the right edge, where the sizes of the sources are also small. The bottom right of the map consists of galaxies with mostly unknown VLBI morphologies, and, in addition to gps-type spectra, there are also s, f/s, and n types of spectra. The sizes of these sources are quite large near the vertical mid-line of the map and smaller near the corner. Variability is low in the bottom rows of the map, but this may be due to the small number of observations, which indicates a lack of proper monitoring. The curvature of the spectrum is high in the bottom right corner of the map, and the spectral index above the turnover is steep in the area which extends nearly to the mid-line. The spectral index below the peak is high in the corner, but declines rapidly towards the vertical mid-line, which is due to the lack of the optically thick part of the spectrum in the auxiliary classes present in the area.

4.1.1 Some notes on individual clusters

In this paper we mainly concentrate on the outcome of the SOM analysis and the general trends we observe in the GPS source subpopulations. A more detailed analysis of the physical properties of the various subpopulations will be the topic of a subsequent paper.

In Fig. 6 the similar neurons have been clustered together by the centroid method. We have analyzed the map using different numbers of clusters, and have chosen to present the clustering with $N_{\rm clusters} = 16$, because the clusters seem to represent well the different areas in the map, and the division of sources into clusters does not seem to be too coarse or too fine-tuned. Depending on the number of clusters, some single sources may switch clusters. The cluster memberships cannot be considered definitive but rather suggestive for individual sources. The map is plotted in Fig. 6 together with the cluster numbering generated by the algorithm. The sources in each cluster have been listed in Table 3 and a summary of the properties of the cluster is presented in Table 4. Here we present some of the most interesting clusters and their properties, together with some preliminary interpretations of the nature of the related sources.

Table 3: Clustering of the sources ($N_{\rm clust} = 16$).

The cluster number 5 in the lower right corner of the map contains sources with CSO, CD, and unknown morphologies. There is also one core-jet object. There are altogether 26 sources, out of which 17 are galaxies, 5 are quasars, and 4 sources with empty fields or with no information. The component planes show steep spectral indices and narrow FWHM in the outermost corner, whereas, when going leftwards, the sizes of the sources increase, and the turnover frequencies decrease, which can also be seen from the auxiliary classification of the spectra. In the upper and right edges of the cluster there are gps-type spectra, whereas in the lower left part of the cluster there are sources with s and f/s types of spectra.

Table 4: Summary of the properties of sources in each cluster.

The sizes of six sources in this cluster are known; the median value (0.93 kpc) is greater than the median of all the sources in the sample (0.11 kpc). The rest frame turnover frequency has a decreasing gradient towards the lower left part of the cluster; the median of the cluster is 1.92 GHz, whereas the median of all the sources is 4.91 GHz. It is also lower than in other clusters with gps-type spectra, except for the clusters number 7 and 8. However, the value is affected by undefinable turnover frequencies of the sources with s and f/s types of spectra, and therefore only represents the $\nu_{\rm peak}$ of the gps-type sources in the cluster.

There are at least three sources with some evidence of young age: B0703+468, a CSO quasar, generally suggested to be a young source by Stanghellini et al. (2005); B1225+368, low radiative age found by Murgia et al. (1999); and B2201+098 with a lower limit of kinematic age of less than 1000 years found by Gugliucci et al. (2005). None of the sources have been reported to have related extended emission around them. This cluster could represent young sources, with the youngest in the upper and right parts of the cluster and the older, possibly CSS sources of which the turnover frequency has already decreased below the observed frequencies, in the lower left part of the cluster.

The cluster 7 harbours galaxy-type sources with mostly gps spectra and low turnover frequencies (median 0.64 GHz), small sizes (median 0.25 kpc), and CSO, CD or CT morphologies. These sources have rather high variability: the median of the highest variability index is 1.73, observed at a frequency of $\sim$8 GHz. Yet at least the source B1031+567 has been found, along with six other CSOs, to have extremely stable flux densities in this frequency band on timescales ranging from one week to ten months (Fassnacht & Taylor 2001). The radio powers are intermediate (median $\log~(P_{5~\rm GHz}) = 26.5$) in this cluster. Out of the total 11 sources, there are at least four sources with kinematic age estimates, ranging from $\sim$380 yr to $\sim$3000 yr (e.g., Giroletti et al. 2003; Polatidis & Conway 2003), and two sources with other hints of young age (Murgia et al. 1999). Therefore, it is likely that the sources in this cluster represent galaxy-type symmetric radio sources in their youth. However, the low turnover frequency, in fact the lowest of all clusters, contradicts the evolutionary scheme in which the turnover of a new-born radio source is at high frequencies and decreases as the source grows larger. At least there does not seem to be any cluster of sources clearly representing the next phase of evolution of these sources, which is not totally surprising, as one of the key selection criteria of the GPS classification has been a turnover frequency of 0.5 GHz. However, there are sources with s and f/s types of spectra, especially in the clusters 5 and 2, for which the turnover frequency has not been determined, and therefore they do not contribute to the median turnover frequency of their clusters.

The borders of this cluster do not change when the number of the clusters is changed. This provides additional support that the sources form a homogeneous population that is not likely to mix or merge with the neighbouring clusters. The unexpected combination of low turnover frequency, small linear size and confirmed very young age of the sources in this cluster may require reformulation of the views on what is the cause of the turnover.
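The stability check described above can be expressed as a comparison of two label assignments. The sketch below is only an illustration under the assumption that each source (or neuron) carries one integer cluster label per clustering run; the function name `cluster_is_stable` and the toy label lists are hypothetical.

```python
def cluster_is_stable(labels_a, labels_b, cluster_id):
    """Check whether one cluster keeps exactly the same members.

    labels_a, labels_b : cluster labels of the same items from two runs
                         with different numbers of clusters.
    cluster_id         : label of the cluster of interest in labels_a.
    Returns True if its members form one complete cluster in labels_b.
    """
    members = {i for i, lab in enumerate(labels_a) if lab == cluster_id}
    # Labels that these members received in the second run.
    ids_b = {labels_b[i] for i in members}
    if len(ids_b) != 1:
        # Empty cluster, or the cluster was split between several groups.
        return False
    other_id = next(iter(ids_b))
    members_b = {i for i, lab in enumerate(labels_b) if lab == other_id}
    # Unchanged borders: nothing was lost and nothing was absorbed.
    return members == members_b


# Toy example: the first cluster survives, the second is merged with a third.
run_fine = [1, 1, 2, 2, 3, 3]
run_coarse = [5, 5, 7, 7, 7, 7]
first_is_stable = cluster_is_stable(run_fine, run_coarse, 1)
second_is_stable = cluster_is_stable(run_fine, run_coarse, 2)
```

The numbering of the clusters is allowed to differ between the runs; only the membership is compared.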

The cluster 10 consists of gps sources and sources with c and n types of spectra. This cluster also stays well defined when the value of $N_{\rm clust}$ is changed. There are both galaxy- and quasar-type sources. The median redshift of this cluster is 1.62, as there are no redshifts available for any of the galaxies. The rest frame turnover frequencies are high (median 9.9 GHz) and the sizes of the two sources with known LS are small. The radio powers are high in this cluster, with a median $\log~(P_{5~\rm GHz}) = 27.4$. Morphologically these sources are CSOs and CDs, and there are two unresolved sources. These sources could also be young sources, but different from the sources in the cluster 7, as these have substantially higher turnover frequencies and radio powers.

The cluster 6 on the right edge of the map consists of low-redshift galaxies with gps-type spectra and CSO morphologies. The sizes of the sources are small (median 0.027 kpc) and their spectra are narrow (median FWHM 1.0 decades of frequency), and have their turnovers at rather high frequencies (median 6.5 GHz). For B1404+286, there is a kinematic age estimate of 100-200 years obtained by Polatidis & Conway (2003) who also have found an upper limit for the expansion velocity for B1934-638 in the cluster. Two of the sources have been suggested to exhibit recurrent activity.

Figure 1: Size vs. rest frame turnover frequency plotted for each cluster, for all the sources and for sources with auxiliary gps classification. The clusters 4 and 13 did not have any sources with information on both the size and the turnover frequency. When confirmed by statistics, the slope of the anticorrelation is plotted. The solid line depicts the anticorrelation without the outlier in the plots of the cluster 14, all the sources, and gps sources. The dashed line shows the effect of the outlier.

The main differences between this cluster and the clusters 7 and 10 are the very low redshift, the small size, the low radio power (the lowest of all clusters), the high column density of neutral hydrogen (although the cluster 10 does not have any column density information to compare to), and the high spectral curvature of the sources in this cluster. The median turnover frequency (6.5 GHz) is substantially higher than in the cluster 7.

The cluster number 11 is populated by a mixture of gps quasars and galaxies. Most of the sources have core-jet morphologies, but there is also one CSO and one unresolved object. Three of the core-jet sources exhibit high variability so that their auxiliary classification is gps,v. For the rest of the sources, variability is not pronounced (median of the maximum variability index without gps,v sources is 1.2) even though there is a median of 35 observations from which the variability index has been calculated. The sizes of both the variable and the non-variable sources are small (median 0.037 kpc) and the turnover frequencies high (median 8.3 GHz). One of the variable sources (B1607+268) has been estimated to be only $\sim$2200 yr old (Nagai et al. 2006), and one source has extended emission around it, while three others have not shown related extended emission (Stanghellini et al. 2005). The spectrum of one source has been successfully fitted to a free-free absorption model (Kameno et al. 2003).

The cluster 16 in the upper right corner of the map has a mixture of CSOs and unresolved VLBI morphologies. The sources are quasars except for two sources which are galaxies. The sources have gps-type spectra except for three sources with gps,v and one with c spectra. The sources in this cluster are characterized by extremely high rest frame turnover frequencies (median 19.8 GHz), high curvature (median 1.9), and high column densities $N_{\rm H,X}$ and $N_{\rm HI}$, although the number of column density measurements is quite low (6 and 2, respectively). There are four sources associated with free-free absorption (FFA) in the literature (e.g., Kameno et al. 2003; Bicknell et al. 1997), for two sources there are kinematic age estimates of $\sim$180 yr and $\sim$900 yr (Polatidis & Conway 2003), and there is also the prototype recurrent source B0108+388, which is the only source with information on extended emission in the cluster. The median curvature of the spectra is 1.9, which is close to the characteristic FFA curvature value of $\sim$2 (the exact value depends on the homogeneity or clumpiness of the absorbing medium, Bicknell et al. 1997). This cluster may represent free-free absorbed sources.

There does not seem to be any cluster clearly hosting a population of frustrated sources. The number of column density measurements is rather low, and there are no other possible indicators of the density of the environments of the sources. Therefore the possibility cannot be ruled out completely, but it seems unlikely that any of the current clusters represent frustrated sources.

4.2 Size - turnover anticorrelation

As mentioned in Sect. 2.1, the linear size LS information is not of uniform quality. However, we believe the values are accurate enough to study the linear size - turnover frequency anticorrelation. It was discovered by Fanti et al. (1990) for CSS sources and confirmed later by O'Dea & Baum (1997) for a combined sample of CSS and GPS sources. O'Dea & Baum (1997) found a correlation of $\nu_{\rm peak} \propto LS^{-0.65}$.

The sizes and turnover frequencies (in the source frame) for each cluster have been plotted in Fig. 1. There is also a size-$\nu_{\rm peak}$ plot of all the sources and of sources with gps classification. Pearson correlation tests were performed to study whether the logarithms of the variables were linearly correlated, and a model was plotted for the correlating clusters. For most of the clusters, there were not enough size and turnover frequency data to allow the determination of the correlation, but the clusters 11 and 16 show statistically significant anticorrelation between the variables. Also for the cluster 14 the anticorrelation is confirmed, if the outlier (B0201+113 with a size of 27.3 kpc obtained with the IPS method by Jeyakumar et al. 2000) is excluded.
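A test of this kind can be sketched as follows. The example assumes that the model line is an ordinary least-squares fit in log-log space, which is our illustrative choice here and not a statement of the exact fitting procedure; the function name `size_turnover_fit` and the synthetic data, generated around a power law with an exponent of -0.65, are hypothetical.

```python
import numpy as np
from scipy import stats


def size_turnover_fit(size_kpc, nu_peak_ghz):
    """Pearson test and straight-line fit for log(size) vs log(nu_peak).

    Returns the correlation coefficient, its two-sided p-value, and the
    slope and intercept of the least-squares line in log-log space.
    """
    x = np.log10(np.asarray(size_kpc, dtype=float))
    y = np.log10(np.asarray(nu_peak_ghz, dtype=float))
    r, p_value = stats.pearsonr(x, y)
    fit = stats.linregress(x, y)
    return r, p_value, fit.slope, fit.intercept


# Synthetic data only: sizes between 0.01 and 10 kpc, and turnover
# frequencies scattered by 0.1 dex around nu_peak = 3 * LS**(-0.65).
rng = np.random.default_rng(1)
size = 10 ** rng.uniform(-2, 1, size=30)
nu_peak = 3.0 * size ** -0.65 * 10 ** rng.normal(0.0, 0.1, size=30)

r, p_value, slope, intercept = size_turnover_fit(size, nu_peak)
```

A negative `r` with a small `p_value` indicates a significant anticorrelation, and `slope` is then the exponent of the power law relating the turnover frequency to the linear size.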

The anticorrelation is also valid when the entire sample and all the gps sources are studied. When the outlier B0201+113 is included, the slope of the entire sample is the same as in the sample of O'Dea & Baum (1997), but when excluding the outlier, our slope becomes steeper (-0.75). The same value is obtained when considering only the sources with gps type spectra. For the clusters 11 and 14 the slope is steeper (-0.86 and -0.81, respectively), and for the cluster 16 the slope is flatter (-0.72). However, the variations may not be intrinsic but due to the low number of data points and the incoherence of the VLBI measurements.
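The sensitivity of the slope to a single point can be illustrated with a small numerical sketch. All values below are synthetic and hypothetical: the points follow an exact power law with an exponent of -0.75, and one artificial outlier with a large size and a comparatively high turnover frequency is appended. The least-squares fit in log-log space and the function name `loglog_slope` are assumptions made for the illustration.

```python
import numpy as np


def loglog_slope(size_kpc, nu_peak_ghz):
    """Least-squares slope of log10(nu_peak) against log10(size)."""
    x = np.log10(np.asarray(size_kpc, dtype=float))
    y = np.log10(np.asarray(nu_peak_ghz, dtype=float))
    slope, _intercept = np.polyfit(x, y, 1)
    return slope


# Synthetic sources lying exactly on nu_peak = 2 * LS**(-0.75).
size = np.array([0.01, 0.03, 0.1, 0.3, 1.0, 3.0])
nu_peak = 2.0 * size ** -0.75

# One artificial outlier: large size, turnover well above the trend.
size_with_outlier = np.append(size, 30.0)
nu_with_outlier = np.append(nu_peak, 5.0)

slope_clean = loglog_slope(size, nu_peak)
slope_with_outlier = loglog_slope(size_with_outlier, nu_with_outlier)
```

Because the outlier lies above the trend at the large-size end, `slope_with_outlier` is less negative than `slope_clean`, i.e. the fitted relation becomes flatter, which is the same direction of change as described above for the entire sample.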

The upper left end of the $LS-\nu_{\rm peak}$ distribution is populated by blazars of the clusters 12 and 15, and high peaking sources of the gps clusters 11 and 16. The small size and the high turnover frequency in blazars are due to a small viewing angle and relativistic beaming, whereas at least for four sources in the cluster 16 this explanation is unlikely because the sources are CSOs, i.e. they have large viewing angles. Therefore, the continuous distribution of data in the $LS-\nu_{\rm peak}$ plot does not necessarily imply that the sources are just scaled versions of each other or connected by evolution.

5 Conclusions

We have collected a sample of 206 GPS sources and HFPs presented in the literature, and gathered various parameters for them. We have analyzed the sample with self-organising neural networks using centroid clustering analyses. The method and the parameters we have used seem to describe the essence of the sources rather well, as the distributions of the VLBI morphology and the optical identification are consistent with the clustering structure, even though they were not used in the analyses.

Our results confirm the contamination of GPS samples by small, beamed blazar-type sources. Over a quarter of the cluster map is populated by variable flat-spectrum quasars and sources with inverted spectra during outbursts. These sources should be excluded from GPS samples, and the results of the GPS studies in which they have been included should be reconsidered.

Sources with confirmed gigahertz-peaked spectrum form different clusters, and it seems likely that there are various different populations of GPS sources in addition to the quasar - galaxy dualism.

Our analyses produce a cluster of very young (confirmed by kinematic age estimates) galaxy CSOs with rather low radio powers and low intrinsic turnover frequencies, which is in contradiction with the view that the youngest sources would have the highest turnover frequencies. There is also a cluster, consisting of a mixture of CSO and CD quasars and galaxies with high peak frequencies and high radio powers, which also could be young sources but of different type than in the above-mentioned cluster. We have also identified a cluster that may represent free-free absorbed sources as well as a cluster of quasars and galaxies with mostly core-jet morphologies and consistent GPS-type spectra.

We have confirmed the size-turnover frequency anticorrelation presented in the literature, with a somewhat steeper correlation factor of -0.75. However, the factor is identical for the entire sample and the sample where only the genuine GPS sources have been included. This cannot be interpreted as a sign of any common evolution or that all the sources would be simply scaled versions of each other. A substantial fraction of the sources in the high turnover - small size end of the distribution are blazars, foreshortened by small viewing angles and having boosted emission, i.e. the mechanism connecting the small size and the high turnover frequency is different from that in truly small and possibly young sources.

The authors made use of the database CATS (Verkhodanov et al. 1997) of the Special Astrophysical Observatory. The authors acknowledge the support of the Academy of Finland to the Metsähovi observing projects. UMRAO is supported in part by funds from the NSF and by funds from the University of Michigan Department of Astronomy. This research made use of the NASA/IPAC Extragalactic Database (NED), which is operated by the Jet Propulsion Laboratory, California Institute of Technology, under contract with the National Aeronautics and Space Administration.



Online Material

Figure 2: U-matrix and the component planes of all parameters when all the sources in the sample are analyzed.

Figure 3: Sources marked by their auxiliary classification on the grid of SOM.

Figure 4: VLBI morphology of the sources, cso = compact symmetric object (cyan), cj = core-jet object (yellow), cd = compact double (light green), cx = complex (blue), unres = unresolved (red), unknown = no observations (pink), ct = compact triple (light blue), s = stellar (orange), gl = gravitational lens (violet), ln = linear (purple). The number in parentheses gives the number of sources with the related morphology, and the size of the dot the total number of sources in the neuron.

Figure 5: Sources marked by their optical identification on the grid of SOM.

Figure 6: Source names and the cluster numbers on a map of the clusters ($N_{\rm clust} = 16$).

Figure 7: Radio spectra of sources with the fitted curves. The median values of each data bin are marked with red colour.


Copyright ESO 2008