
A&A 482, 483-498 (2008)
DOI: 10.1051/0004-6361:20079222

Cluster analyses of gigahertz-peaked spectrum sources with self-organising maps

I. Torniainen1 - M. Tornikoski1 - M. Turunen1 - M. Lainela2 - A. Lähteenmäki1 - T. Hovatta1 - M. G. Mingaliev3 - M. F. Aller4 - H. D. Aller4


1 - TKK Helsinki University of Technology, Metsähovi Radio Observatory, Metsähovintie 114, 02540 Kylmälä, Finland
2 - Tuorla Observatory, University of Turku, Väisäläntie 20, 21500 Piikkiö, Finland
3 - Special Astrophysical Observatory, Russian Academy of Sciences, Nizhnij Arkhyz, Karachaevo-Cherkesia, 369167 Russia
4 - Department of Astronomy, University of Michigan, Ann Arbor, MI, 48109, USA

Received 9 December 2007 / Accepted 24 January 2008

Abstract
Context. Gigahertz-peaked spectrum (GPS) sources and high frequency peakers (HFPs) are among the smallest of active galactic nuclei currently believed to represent the earliest phases in the evolution of extragalactic radio sources. Recently there has been evidence of contamination by other types of radio sources among the GPS and HFP samples, but the confirmed GPS sources or HFPs also seem to form a very heterogeneous population.
Aims. We study the statistical clustering of the GPS sources and the HFPs by taking as many source parameters as possible to find homogeneous groups among the sources. We expect the clustering to give us insight into the physical parameters that play a role in different source populations.
Methods. We have collected a sample of 206 GPS sources and HFPs from the literature and gathered a massive database of various source properties, such as the redshift, the size, the polarization, the magnitudes, and the properties of the radio continuum. To visualize and to cluster these multidimensional data we used self-organising maps (SOM), which are neural networks trained by an unsupervised algorithm. We have classified the sources with an auxiliary classification to trace the locations of different types of radio continuum spectra on the map.
Results. The sources form distinctive clusters on the map, which is supported by the accordant organisation of the non-numerical parameters not used in the analysis, such as the radio morphology and the optical identification. Our results confirm that the blazars contaminating the GPS and the HFP samples are physically different from the genuine GPS sources and HFPs, and they should be excluded from the samples. The genuine GPS sources form various clusters, which indicates the existence of different subpopulations, besides the expected galaxy-quasar dualism.

Key words: galaxies: active - galaxies: quasars: general - radio continuum: galaxies

1 Introduction

The gigahertz-peaked spectrum (GPS) sources and the high frequency peakers (HFPs) are a heterogeneous group of compact (linear size LS < 1 kpc) extragalactic objects. They are active galactic nuclei (AGN) with convex radio continuum spectra, peaking at GHz frequencies (GPS sources) or higher (HFPs). They can be divided into two types: galaxies and quasars. The galaxy-type sources are found at lower redshifts (0.1 < z < 1; O'Dea et al. 1996) and are less variable than the quasar-type sources. They also have lower turnover frequencies than the GPS quasars and exhibit symmetric VLBI morphologies, whereas the quasars usually have complex or core-jet morphologies (Stanghellini et al. 1997, 2001).

The currently favoured view is that the galaxy-type GPS sources and HFPs are intrinsically small due to their young age (e.g., Phillips & Mutel 1980,1982; Polatidis & Conway 2003) and that quasar-type sources are large-scale radio sources which appear small due to a projection effect (e.g., Stanghellini 2003). VLBI observations have revealed extended emission around some GPS sources - both galaxies and quasars. The galaxy-type sources may be explained by recurrent activity of the nucleus (e.g. Baum et al. 1990; Stanghellini et al. 1990). This was, nonetheless, found probable for only one source out of a sample of six GPS sources associated with extended emission (Stanghellini et al. 2005). The quasar-type core-jet sources associated with extended emission are most likely truly extended sources at such high redshifts that most of the large-scale structures are below detection limits (Stanghellini et al. 2005). However, there are also quasar-type GPS sources with symmetric VLBI morphology, and they may be intrinsically small and young sources.

Another explanation for the small size of the galaxy-type sources, i.e. confinement by a dense ambient medium, has been suggested by, e.g., Baum et al. (1990) and O'Dea et al. (1991). In this scenario, the source is old and has remained small in size due to external pressure that prevents the radio lobes from growing beyond the galactic center. Recently, there have not been any studies supporting this view.

In our previous papers we searched for high-peaking GPS sources and studied some of the known ones. In the first paper (Tornikoski et al. 2001, hereafter Paper I) we identified several new southern high-peaking sources and found variability in some of the known ones, and in the second paper (Torniainen et al. 2005, hereafter Paper II) we widened our study to the northern hemisphere and found mild to extreme variability in all the known GPS, CSS, or HFP sources monitored at Metsähovi. The change in the shape of the spectrum differed from source to source: a minority of sources maintained their convex shape and peak frequency independent of the state of activity, but for most of the sources the shape changed from flat to inverted as the activity increased. The majority of the sources in Paper II were quasars, and it was obvious that the quasar-type GPS samples were severely contaminated by blazars. This inspired us to study the possible contamination of GPS galaxies. For the third paper (Torniainen et al. 2007, hereafter Paper III), we collected a sample of 96 galaxy-type GPS sources and gathered as much radio data for them as possible. After studying their overall radio spectra, and the spectral and variability indices, we found that the GPS classification is well-grounded for only a third of the sources. For another third there are not enough data for firm conclusions, and the remaining third were flat- or inverted-spectrum sources.

Recently, Labiano et al. (2007a) also produced a new master list of GPS sources, from which some sources exhibiting strong radio variability had been excluded. This new list does not, however, include the findings of Papers II and III.

It has also been confirmed by other studies (e.g. Orienti et al. 2007; Tinti et al. 2005) that there are different populations among the GPS sources and the HFPs identified in the literature, some of them having truly constant convex spectral shape and some of them having only temporarily inverted shape of the spectrum.

Most of the GPS samples in the literature have been selected by combining data points from different catalogues originating from different epochs and picking out sources with a convex radio continuum spectrum. This has been done without paying much attention to the effect of variability when using non-simultaneous data points. Single-epoch multifrequency observations or observations spanning only a couple of years have also been used. These approaches have proved to generate very heterogeneous samples, since a peaked spectrum can be caused by several different effects. In terms of sample contamination, the most severe cases are variable flat-spectrum sources observed when one flaring component dominates the spectrum and creates a temporarily inverted spectrum that can last for months or years. There are also consistently convex-spectrum sources with high variability as well as sources with virtually no flux density variations. There are galaxies and quasars, compact symmetric objects (CSOs), and core-jet or complex VLBI morphologies. Some sources are detected in the X-rays or the $\gamma$-rays or both, some remain undetected. For the majority of the sources, there is a very limited number of continuum observations in the radio band as well as in other parts of the spectrum, not to mention information on other source properties, e.g., the emission lines, the size, and the column densities.

There seems to be no clear and simple common factor present in our sample of genuine GPS sources (Paper II; Paper III): there are both quasars and galaxies, CSOs and core-jets, and variable and non-variable sources. Intrigued by this variety of objects classified as GPS sources, and craving for clarity about their physical nature, we wanted to take as many parameters as possible into account and run neural clustering analyses for a complete sample of GPS sources.

A self-organising map (SOM) is an unsupervised neural network used, for example, for visualization of multidimensional data, classification, and clustering. The algorithm tries to place the objects on a multidimensional map so that the Euclidean distance between the parameter vectors of similar objects is minimized. In astronomy, neural networks have been used mainly for classification of objects (e.g., Brett et al. 2004; Miller & Coe 1996; Rajaniemi & Mähönen 2002).

In Sect. 2, we present the sample and the collection and processing of the data. In Sect. 3, the SOM analyses are presented in detail. The results are presented and discussed in Sect. 4, and the conclusions are given in Sect. 5.

  
2 Sample and data

We collected a sample of 206 GPS and HFP sources for a detailed study of the underlying populations of various kinds of sources, both among all the GPS sources identified in the literature and the genuine GPS sources with constant shape and peak of the spectrum.

The sources in the sample and their references are listed in Table 1. Some sources from these papers were left out since there were no coordinates for them or it was likely that there would not be a sufficient number of parameters for them to be used in the analyses.


 

 
Table 1: The sources and their classification.

Source ID Ref. Origcl Ref. Aux. Class Cluster
(1) (2) (3) (4) (5) (6) (7)
B0000+212 GAL 13 HFP 2 gps 6
B0002+051 QSO 13 HFP 2 gps 10
B0018+729 GAL 16 GPS 16 s 1
B0019-000 GAL 16 GPS 16 n 1
B0022-423 GAL 5 GPS 5 gps 5
B0026+346 GAL 7 GPS 7 gps 7
B0034+078 GAL 13 HFP 2 gps 10
B0039+230 EF 7 GPS 7 f 13
B0048-097 BLO 20 GPS 20 idb 12
B0105-122 GAL 15 GPS 15 n 5
B0108+388 GAL 16 GPS 16 gps 16
B0113+241 EF 13 HFP 2 f 12
B0116+319 GAL 12 GPS 12 gps 7
B0144+209 EF 7 GPS 7 gps 5
B0153+744 LPQ 6 GPS 11 c,v 14
B0159+839 QSO 5 GPS 5 f 3
B0201+113 QSO 7 GPS 7 gps 14
B0204-306 GAL 15 GPS 15 n 2
B0207-224 GAL 15 GPS 15 n 5
B0208+040 GAL 15 GPS 15 f/s 5
B0215+015 HPQ 21 HFP 2 idb 14
B0218+357 BLO 7 GPS 7 f,v 14
B0237-233 QSO 17 GPS 17 gps 14
B0238-084 GAL 5 GPS 5 idb 1
B0240-217 GAL 15 GPS 15 f/s 1
B0248+430 LPQ 17 GPS 17 f,v 14
B0316+162 GAL 16 GPS 16 f/s 2
B0320+053 GAL 15 GPS 15 f/s 1
B0326+349 QSO 3 HFP 2 f 3
B0332-403 HPQ 20 GPS 20 f,v 15
B0354+231 QSO 13 HFP 2 f,v 12
B0359-294 GAL 15 GPS 15 f/s 2
B0400+258 QSO 7 GPS 7 f,v 15
B0404+768 GAL 16 GPS 16 gps 14
B0405-280 GAL 15 GPS 15 n 2
B0405-395 GAL 15 GPS 15 s 2
B0424+328 GAL 13 HFP 2 gps 6
B0428+205 GAL 16 GPS 16 gps 7
B0431-026 GAL 15 GPS 15 f/s 4
B0437-454 QSO 5 GPS 5 f 12
B0439-337 GAL 15 GPS 15 gps 2
B0454-088 GAL 15 GPS 15 f/s 2
B0454-234 HPQ 20 GPS 20 f,v 15
B0457+024 QSO 17 GPS 17 gps 16
B0500+019 GAL 15 GPS 15 gps 11
B0507+179 QSO 10 GPS 5 f 1
B0516+087 EF 13 HFP 2 f 12
B0528+134 LPQ,BLO 16 GPS 16 idb 14
B0528-250 QSO 5 GPS 5 gps 15
B0537-441 BLO 20 GPS 20 f/s,v 15
B0552+398 QSO 24 GPS 17 gps,v 11
B0554-026 GAL 16 GPS 16 n 2
B0602+780 GAL 16 GPS 16 n 2
B0621+446 BLO 2 HFP 2 f 3
B0633+595 EF 13 HFP 2 f 15
B0636+680 QSO 2 HFP 2 gps 16
B0642+449 LPQ 13 HFP 2 gps,v 16
B0646+600 QSO 2 HFP 2 f,v 12
B0651+410 GAL 2 HFP 2 c 1
B0700+470 GAL 14 GPS 14 f/s 4
B0703+468 QSO 1 GPS 16 gps 5
B0706+460   14 GPS 14 gps,v 5
B0710+439 GAL 16 GPS 16 gps 16
B0711+356 QSO 7 GPS 7 gps,v 10
B0718+374 QSO 13 HFP 2 gps 10
B0738+313 QSO 17 GPS 17 c 14
B0741-063     GPS 18 gps 5
B0742+103 GAL 17 GPS 17 gps 14
B0743-006 QSO 8 GPS 17 gps 11
B0802+103 QSO 7 GPS 7 s 5
B0858-279 QSO 5 GPS 5 c 15
B0902+490 QSO 5 GPS 5 f/s 9
B0904+039 GAL 23 GPS 7 n 2
B0910+151 GAL 15 GPS 15 n 1
B0914+114 GAL 16 GPS 16 s 5
B0923+392 QSO 2 HFP 2 gps,v 11
B0930+493 QSO 14 GPS 14 c 9
B0941-080 GAL 17 GPS 17 f/s 1
B1013+054 QSO 13 HFP 2 f 12
B1031+567 GAL 16 GPS 16 gps 7
B1039+811 LPQ 10 GPS 9 f,v 12
B1042-269 GAL 15 GPS 15 n 2
B1043+066 QSO 2 HFP 2 c 10
B1054+004 GAL 15 GPS 15 f/s 4
B1057-797 QSO 20 GPS 20 c 9
B1100+223 EF 5 GPS 5 n 10
B1107+109 GAL 15 GPS 15 n 4
B1107-187 GAL 15 GPS 15 n 1
B1117+146 GAL 17 GPS 17 f/s 7
B1118-056 EF 5 GPS 5 f 12
B1120-274 GAL 15 GPS 15 n 8
B1127-145 QSO 17 GPS 17 c,v 14
B1132-000 GAL 15 GPS 15 f/s 4
B1133+432 EF 23 GPS 14 gps 5
B1143-245 QSO 17 GPS 17 gps 16
B1144+352 GAL 14 GPS 14 f 1
B1144+542 QSO 14 GPS 14 f 15
B1146+531 QSO 2 HFP 2 f 15
B1148-171 QSO 7 GPS 7 f 12
B1200+045 GAL 15 GPS 15 f 4
B1225+368 QSO 14 GPS 14 gps 5
B1245-197 QSO 17 GPS 17 c 14
B1323+321 GAL 16 GPS 16 gps 7
B1323+799 GAL,QSO 10 GPS 9 f 9
B1324+574 GAL 14 GPS 14 gps 8
B1333+459 QSO 2 HFP 2 c 15
B1334-127 HPQ 20 GPS 20 c,v 15
B1343-300 GAL 15 GPS 15 s 2
B1345+125 GAL 16 GPS 16 c 7
B1347-218 GAL 15 GPS 15 n 4
B1349+027 GAL 15 GPS 15 s 4
B1349-439 BLO 20 GPS 20 c 12
B1350+113 GAL 15 GPS 15 f/s 5
B1354-174 GAL 7 GPS 7 f 9
B1355+441 GAL 14 GPS 14 gps 11
B1357+769 QSO 10 GPS 9 f 12
B1358+624 GAL 16 GPS 16 gps 8
B1404+286 GAL 16 GPS 16; 2 gps,v 6
B1410+138 EF 13 HFP 2 c 2
B1422+231 QSO 13 HFP 2 f 14
B1427+109 QSO 2 HFP 2 gps 16
B1433-040 GAL 16 GPS 16 n 5
B1442+101 QSO 17 GPS 17 n 14
B1444-339 GAL 15 GPS 15 n 2
B1455+080 BLO 13 HFP 2 f,v 12
B1502+036 QSO 2 HFP 2 c 3
B1503-091 GAL 15 GPS 15 n 5
B1509+054 GAL 13 HFP 2 gps 6
B1518+046 QSO 17 GPS 17 gps 5
B1519-273 BLO 10 GPS 5 c,v 15
B1526+670 QSO 2 HFP 2 gps,v 16
B1540-077 GAL 15 GPS 15 n 1
B1543+005 GAL 15 GPS 15 n 14
B1545-120 GAL 15 GPS 15 n 5
B1548-302     GPS 7 n 4
B1553-062 GAL 15 GPS 15 f/s 2
B1557-004 GAL 15 GPS 15 n 5
B1600+335 GAL 16 GPS 16 c 3
B1601+112 BLO 13 HFP 2 f,v 12
B1601-222 GAL 15 GPS 15 gps 6
B1604+315 GAL 16 GPS 16 c 9
B1607+268 GAL 16 GPS 16 gps,v 11
B1614+051 QSO 2 HFP 2 gps 16
B1622+665 GAL 2 HFP 2 gps 6
B1638+124 GAL 15 GPS 15 c 7
B1645+635 QSO 2 HFP 2 f 13
B1646+028 GAL 15 GPS 15 f/s 5
B1714+193 QSO 13 HFP 2 f 4
B1726+769 QSO 10 GPS 9 f 4
B1732+094 GAL 15 GPS 15 gps 6
B1734+508 GAL 2 HFP 2 c 10
B1749+096 BLO 2 HFP 2 idb 12
B1751+278 GAL 5 GPS 5 n 2
B1758+388 QSO 2 HFP 2 gps 15
B1803+784 BLO 10 GPS 9 f,v 12
B1807+170 BLO 13 HFP 2 f 12
B1824+271 GAL 23 GPS 5 s,v 1
B1839+389 QSO 2 HFP 2 c 15
B1843+356 GAL 16 GPS 16 gps,v 16
B1848+283 QSO 2 HFP 2 gps 16
B1851+488 QSO 5 GPS 5 f 12
B1853+376 GAL 13 HFP 2 c 10
B1934-638 GAL 5 GPS 5 gps 6
B1936-155 HPQ 19 GPS 19 f,v 15
B1954-388 HPQ 20 GPS 20 c 12
B2000-330 QSO 5 GPS 5 c 16
B2007+777 BLO 10 GPS 9 f,v 12
B2008-068 GAL 16 GPS 16 gps 5
B2008-159 LPQ 22 GPS 19 gps,v 15
B2019+050 GAL 13 HFP 2 gps 11
B2021+614 GAL 6 GPS 11 gps 7
B2022+171 LPQ 13 HFP 2 gps,v 14
B2050+364 GAL 5 GPS 5 c 2
B2053-201 GAL 7 GPS 7 s 3
B2055+055 GAL 15 GPS 15 f/s 5
B2059+034 QSO 2 HFP 2 f 12
B2112+283 EF 13 HFP 2 c 15
B2121+053 QSO 2 HFP 2 idb 15
B2121-014 GAL 15 GPS 15 s 2
B2126-158 QSO 17 GPS 17 gps 16
B2126-185 QSO 7 GPS 7 n 4
B2128+048 GAL 17 GPS 17 gps 10
B2128-123 LPQ 20 GPS 20 c 15
B2134+004 LPQ 17 GPS 17; 2 gps 11
B2136+141 LPQ 20 GPS 20 c 14
B2149+056 GAL 15 GPS 15 gps 11
B2153-119 S 4 GPS 7 n 4
B2154-183 QSO 7 GPS 7 n 5
B2201+098 GAL 13 HFP 2 c 5
B2205+166 QSO 13 HFP 2 f 12
B2209+236 QSO 13 HFP 2 f,v 12
B2210+016 GAL 17 GPS 17 f/s 7
B2236+124 QSO 5 GPS 5 f 12
B2254+024 QSO 2 HFP 2 f 13
B2254-204 BLO 20 GPS 20 c 9
B2255-282 LPQ 20 GPS 20 idb 15
B2318+049 QSO 2 HFP 2 c 12
B2322-040 GAL 16 GPS 16 gps 5
B2323+790 GAL 5 GPS 5 n,v 2
B2327+335 QSO 2 HFP 2 c 12
B2333-528 GAL 5 GPS 5 f/s 5
B2337+264 GAL 5 GPS 5 gps 16
B2337-063 GAL 15 GPS 15 f/s 5
B2342+821 QSO 16 GPS 16 gps 8
B2352+495 GAL 17 GPS 17 gps 7
B2353+816 BLO 10 GPS 9 f 4

Notes for the columns:
(1) Source name in B1950 coordinates;
(2) Optical identification: BLO = BL Lac object, EF = empty field, GAL = galaxy,
HPQ = high polarization quasar, LPQ = low polarization quasar, S = stellar, QSO = quasar;
(3) Reference for Col. 2: 1 = Augusto et al. (2006), 2 = Dallacasa et al. (2000);
3 = Dallacasa et al. (2002); 4 = de Vries et al. (1995); 5 = de Vries et al. (1997);
6 = Impey et al. (1991); 7 = Jeyakumar et al. (2000); 8 = Labiano et al. (2007a);
9 = Mingaliev et al. (2001); 10 = NED; 11 = O'Dea et al. (1991); 12 = O'Dea et al. (2005);
13 = Orienti et al. (2006a); 14 = Snellen et al. (1995); 15 = Snellen et al. (2002a);
16 = Stanghellini et al. (1993); 17 = Stanghellini et al. (1998); 18 = Steppe et al. (1995);
19 = Tornikoski et al. (2000); 20 = Tornikoski et al. (2001); 21 = Véron-Cetty & Véron (2006);
22 = Wills et al. (1992); 23 = Xiang et al. (2005); 24 = Xiang et al. (2006);
(4) Original GPS/HFP classification of the source;
(5) Reference for Col. 4, as in Col. 3;
(6) Auxiliary spectrum classification: gps = genuine gigahertz-peaked spectrum,
n = not enough data for GPS identification, s = steep spectrum, f = flat spectrum,
f/s = flat at low frequencies, steep at high frequencies, c = convex spectrum,
idb = inverted during bursts, v = variability of ${\rm Var}_{\Delta S} > 3$;
(7) Number of the cluster in which the source is located

The parameters used in the analyses are listed together with their references in Table 2. The data for each source are available in electronic form at the CDS. Some of the parameters are described in more detail below.

  
2.1 Source size

The linear sizes have been collected from numerous references, and they have been obtained with various instruments or VLBI networks and at various frequencies. Therefore the values are not perfectly comparable, but rather give some guidelines on the source size. In addition, in the original papers the sizes were calculated with very different values of cosmological parameters and hence were not comparable as such. Thus we recalculated the linear sizes with the latest estimates of cosmological parameters ($H_0 = 71$ km s$^{-1}$ Mpc$^{-1}$, $\Omega _{\rm M} = 0.27$, $\Omega_{\rm vac} = 0.73$) using the JavaScript calculator created by Edward L. Wright (Wright 2006).

For some sources, several different sizes were given in the literature. We selected the largest size obtained with VLBI, since when studying the compactness of the source, the largest observed size gives the most significant information. If there were sizes obtained with both VLBI and the interplanetary scintillation (IPS) method (Jeyakumar et al. 2000), we selected the VLBI size for consistency, but accepted IPS sizes when there was no other information available. The scale of the sizes varied substantially, so we used the logarithm of the size in the analyses.
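The conversion from an angular size to a linear size can be sketched as follows. This is a minimal illustration and not the calculator used for the paper: it assumes a flat universe with the parameter values quoted above, neglects radiation, and integrates the comoving distance numerically with Simpson's rule. The function names and the example angular size are hypothetical.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s

def angular_diameter_distance_mpc(z, h0=71.0, omega_m=0.27, omega_vac=0.73, steps=1000):
    """Angular diameter distance in Mpc, assuming a flat universe without radiation."""
    def inv_e(zp):
        return 1.0 / math.sqrt(omega_m * (1.0 + zp) ** 3 + omega_vac)
    # Composite Simpson's rule for the comoving distance integral (steps must be even).
    h = z / steps
    total = inv_e(0.0) + inv_e(z)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * inv_e(k * h)
    comoving = (C_KM_S / h0) * total * h / 3.0
    return comoving / (1.0 + z)

def linear_size_kpc(theta_mas, z, **cosmo):
    """Projected linear size in kpc for an angular size theta_mas in milliarcseconds."""
    theta_rad = theta_mas * 1e-3 / 3600.0 * math.pi / 180.0
    return angular_diameter_distance_mpc(z, **cosmo) * 1e3 * theta_rad

# Hypothetical example: a 10 mas source at z = 1; the logarithm is what enters the analyses.
log_size = math.log10(linear_size_kpc(10.0, 1.0))
```

Because radiation is neglected and the integration is approximate, small differences from the values given by a full cosmology calculator are to be expected.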

2.2 Magnitudes

The Gunn system r and i magnitudes from Stanghellini et al. (1993) were converted to Cousins R and I magnitudes using equations given by Schombert et al. (1990). For the sources with only one of the Gunn magnitudes, the Cousins magnitude was estimated using the mean value of r-R or i-I of the sources in Stanghellini et al. (1993). The brightest magnitude was chosen if there were several values for one source, except when a fainter value was simultaneous with the I magnitude observation.

2.3 Radio spectrum parameters

When calculating the parameters of the radio continuum spectra, we used the method developed for Paper III. To overcome the drawbacks of a varying number of data points with unevenly sampled frequency coverage, the frequency range from 0.05 to 360 GHz was divided into logarithmically equidistant intervals, each as wide as the fractional interval between 8 GHz and 10 GHz. The data in each interval were assigned to the logarithmic centre of the interval. The median of the flux density and the fractional variability index, ${\rm Var}_{\Delta S} = (S_{\max} - S_{\min})/S_{\min}$, were calculated for each of these data bins.
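The binning described above can be illustrated with a short sketch. It is not the code used for Paper III: the anchoring of the first bin edge at 0.05 GHz and the function name `bin_spectrum` are assumptions made here for illustration.

```python
import math
from statistics import median

F_MIN_GHZ, F_MAX_GHZ = 0.05, 360.0
BIN_WIDTH_DEX = math.log10(10.0 / 8.0)  # same fractional width as 8 GHz to 10 GHz

def bin_spectrum(points):
    """points: iterable of (frequency_GHz, flux_density) pairs.
    Returns {bin_index: (centre_GHz, median_flux, var_index)}."""
    bins = {}
    for freq, flux in points:
        if not (F_MIN_GHZ <= freq <= F_MAX_GHZ):
            continue  # outside the frequency range covered by the bins
        # Assumption: the first bin starts exactly at F_MIN_GHZ.
        k = int((math.log10(freq) - math.log10(F_MIN_GHZ)) // BIN_WIDTH_DEX)
        bins.setdefault(k, []).append(flux)
    result = {}
    for k, fluxes in bins.items():
        centre = F_MIN_GHZ * 10 ** ((k + 0.5) * BIN_WIDTH_DEX)  # logarithmic centre
        var_index = (max(fluxes) - min(fluxes)) / min(fluxes)   # (Smax - Smin) / Smin
        result[k] = (centre, median(fluxes), var_index)
    return result
```

A bin containing a single measurement has a variability index of zero by construction, which is one reason why the number of observations is carried along as a separate parameter.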

We also calculated three other variability indices which included the error estimates for the flux densities, but chose to use the above-mentioned quantity because the errors were not available for all the data in the CATS database, and thus the values would not have been consistent for all the sources and the data bins. The results were substantially the same regardless of the choice of the variability index.

The median flux density of each bin was used to model the shape of the spectra by fitting the following equation from Kovalev et al. (2000), rearranged by Dallacasa et al. (2000),

\begin{displaymath}
\log S = a - \sqrt{b^2 + (c \log \nu - d)^2},
\end{displaymath} (1)

where S is the flux density at frequency $\nu$, and a, b, c, and d are the fit parameters.

In general, Eq. (1) models the spectra in this sample very well, but since it does not model any physical properties, we made adjustments to the fits of some sources, as described below.

Some of the sources in our sample have abundant data on both sides of the turnover, so that the flux density peak and the turnover frequency are easy to determine and there are no difficulties in interpreting the fits. However, there are a number of sources for which the fits from Eq. (1) do not represent the slope of the data accurately, and for those sources, picked out by visual inspection, logarithmic linear fits were also applied. In some of these cases the flat top of the peak was not used for the linear fit in order to better model the declining slope.

The majority of the sources have insufficient data for determining both spectral indices reliably, or even at all. For some sources the optically thin part of the spectrum is available and the spectral index $\alpha$ is calculated either with Eq. (1) or linearly. Some sources show only a very wide round top of the spectrum, and it is not feasible to fit spectral indices to this kind of data. Using Eq. (1), the slopes are calculated from extrapolated values as follows. When applicable, frequencies 0.1 MHz and 1 MHz are selected to represent the optically thick part, and 100 GHz and 1000 GHz the optically thin part of the spectrum. These values, selected far from the turnover to make sure that the slope has levelled out, are substituted into Eq. (1) to get the respective modelled flux density values, and the basic formula for the slope in logarithmic scales is used to derive the spectral indices: $\alpha = (\log S_2 - \log S_1)/(\log \nu_2 - \log \nu_1)$. The source spectra with the applicable fits are presented in Fig. 7.
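The extrapolation procedure can be sketched as below. The fit parameters are hypothetical values chosen only for illustration, and the frequency is assumed to be expressed in GHz in the fit, which the text does not state explicitly.

```python
import math

def log_flux(log_nu, a, b, c, d):
    """Eq. (1): log S as a function of log nu."""
    return a - math.sqrt(b ** 2 + (c * log_nu - d) ** 2)

def extrapolated_index(nu1, nu2, a, b, c, d):
    """Two-point spectral index between nu1 and nu2 (same frequency unit as the fit)."""
    l1, l2 = math.log10(nu1), math.log10(nu2)
    return (log_flux(l2, a, b, c, d) - log_flux(l1, a, b, c, d)) / (l2 - l1)

# Hypothetical fit parameters, with nu in GHz (an assumption of this sketch).
params = dict(a=1.0, b=0.3, c=1.0, d=0.0)
alpha_below = extrapolated_index(1e-4, 1e-3, **params)     # 0.1 MHz and 1 MHz
alpha_above = extrapolated_index(100.0, 1000.0, **params)  # 100 GHz and 1000 GHz
```

Far from the turnover the square root in Eq. (1) tends to $\vert c \log \nu - d\vert$, so the two extrapolated indices approach $+c$ and $-c$, which is why the sampling frequencies are chosen well away from the peak.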

In determining the peak frequency, we used the value derived from the fit when it was applicable. For some sources the spectrum was so flat that the fit yielded peak frequencies far beyond the frequency range of the data. In those cases the peak frequency was determined visually or omitted, depending on the shape of the spectrum. In several cases the peak frequency from the fit matched the start of the declining part of the spectrum but there was no clear information on the rising part of the spectrum. In these cases we considered the peak frequency from the fit an upper (in some cases a lower) limit of the possible turnover.

The rest frame peak frequency was calculated and for the sources with no redshift information available we used a generic value of z = 1, which is close to the median value of our sample (0.93).

From the spectral indices, the turnover frequency, and the variability in the radio frequencies we derived other quantities to describe the shape of the radio spectrum. The symmetry of the spectrum was calculated as $\alpha_{\rm below} / (- \alpha_{\rm above})$. The curvature of the spectrum is defined as the change in the spectral indices over the spectrum, i.e. $\alpha_{\rm below} - \alpha_{\rm above}$. The width of the spectrum, FWHM in decades of frequency, was calculated from the fitted function by taking the difference of the frequencies below and above the turnover where the flux density was half of the highest value.
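The derived shape parameters can be written out as a short sketch. It assumes that the shape is described by Eq. (1), for which the peak lies at $c \log \nu = d$ and the half-maximum points satisfy $\sqrt{b^2 + u^2} = \vert b\vert + \log 2$ with $u = c \log \nu - d$. The rest-frame conversion uses the standard relation $\nu_{\rm rest} = \nu_{\rm obs}(1+z)$. All function names are hypothetical.

```python
import math

def peak_frequency(c, d):
    """Frequency at which Eq. (1) reaches its maximum: c * log(nu) - d = 0."""
    return 10 ** (d / c)

def rest_frame_peak(nu_peak_obs, z=None, default_z=1.0):
    """Rest-frame peak frequency; uses the generic z = 1 when no redshift is known."""
    return nu_peak_obs * (1.0 + (default_z if z is None else z))

def symmetry(alpha_below, alpha_above):
    return alpha_below / (-alpha_above)

def curvature(alpha_below, alpha_above):
    return alpha_below - alpha_above

def fwhm_decades(b, c):
    """Width of Eq. (1) in decades of frequency at half of the peak flux density."""
    half = math.log10(2.0)  # drop in log S corresponding to a factor of two
    return 2.0 * math.sqrt(half * (2.0 * abs(b) + half)) / abs(c)
```

For sources where linear fits replaced Eq. (1), the width would instead have to be read off the fitted function numerically; that case is not covered here.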

Because the amount of data varied from source to source and the data were not distributed evenly in the optically thick and thin parts of the spectrum, we wanted to put more weight on the spectrum parameters of sources with abundant radio data, and less on the sources with sparse data that probably did not describe the true shape of the spectrum. In order to calculate weighting factors for the spectral indices $\alpha_{\rm below}$ and $\alpha_{\rm above}$, the numbers of non-empty and empty fitting data bins in the corresponding frequency intervals were defined as:
n = N("non-empty data bins") and
m = N("empty data bins").

Now the weighting factor $q'_{\rm b}$ for $\alpha_{\rm below}$ can be given as:

\begin{displaymath}q'_{\rm b} = 1 - [n + (\pi/4)\,m]^{-1},
\end{displaymath} (2)

where the factor $\pi/4$ is used to reduce the significance of empty data bins. The final weighting factor $q_{\rm b}$ is then generated by normalizing $q'_{\rm b}$ to the interval [0, 1] with respect to all the processed $\alpha_{\rm below}$ values. The factor $q_{\rm a}$ for $\alpha_{\rm above}$ is calculated in an identical manner.

Weighting factors for the quantities derived from the spectral indices (curvature, symmetry, and FWHM) were then approximated with the geometric mean $\sqrt{q_{\rm b} q_{\rm a}}$ where applicable. These factors were then applied to the SOM training using a weighting mask matrix.
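A minimal sketch of the weighting factors follows. Eq. (2) and the geometric mean are taken from the text; the min-max scaling used for the normalization to [0, 1] is an assumption, since the exact normalization is not specified, and the function names are hypothetical.

```python
import math

def raw_weight(n, m):
    """Eq. (2) for n non-empty and m empty data bins (assumes n + m > 0)."""
    return 1.0 - 1.0 / (n + (math.pi / 4.0) * m)

def normalise(values):
    """Min-max scaling to [0, 1]; an assumed reading of the normalization step."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]  # all sources equally well sampled
    return [(v - lo) / (hi - lo) for v in values]

def derived_weight(q_b, q_a):
    """Geometric mean used for curvature, symmetry, and FWHM."""
    return math.sqrt(q_b * q_a)
```

With this reading, the source with the sparsest coverage receives a weight of zero for the corresponding spectral index and the best-sampled source a weight of one.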

The source power at 5 GHz was calculated using the median flux density of the data bin around 5 GHz. The luminosity distance was calculated with the fundamental formula from, e.g., Altschuler (1989), and the generic redshift of z = 1 was used for sources without redshift information.

  
2.4 Auxiliary classification

In addition to calculating numerical parameters to be used in the analyses, we used the radio spectral parameters for classifying the spectra. The spectral classes and their criteria follow the approach of Paper III; the class labels are listed in the notes to Table 1 (Col. 6).

This classification is used to trace the locations of different types of spectra on the maps.


   
Table 2: Parameters and their references used in the analyses.
Parameter Label on the SOM References
Optical identification ID Tornikoski et al. (2001); O'Dea et al. (2005); Stanghellini et al. (1993); Snellen et al. (2002a); Orienti et al. (2006a); Hewitt & Burbidge (1993); Jeyakumar et al. (2000); Stanghellini et al. (1998); Impey et al. (1991); de Vries et al. (1997); NED; Xiang et al. (2005); Augusto et al. (2006); Xiang et al. (2006); Snellen et al. (1995); Tornikoski et al. (2000); Wills et al. (1992); Dallacasa et al. (2000)
Redshift z O'Dea et al. (1996); Hewitt & Burbidge (1993); Jeyakumar et al. (2000); Véron-Cetty & Véron (2003); de Vries et al. (1997); NED; de Vries et al. (2007); Xiang et al. (2006); O'Dea et al. (2005); Orienti et al. (2006b); Stanghellini et al. (1993); Xiang et al. (2005); Labiano et al. (2007b); Stanghellini et al. (1998); Impey & Tapia (1990); Impey et al. (1991); Snellen et al. (1995); Tinti & de Zotti (2006); Dallacasa et al. (2000)
Size Size_kpc O'Dea & Baum (1997); Xiang et al. (2006); Dallacasa et al. (1998); Xiang et al. (2005); Orienti et al. (2006a); Jeyakumar et al. (2000); Augusto et al. (2006); Gupta et al. (2006); Labiano et al. (2007b); Stanghellini et al. (2001); Best et al. (1999); Gurvits et al. (1999)
Power at 5 GHz P_5 GHz Calculated as described in the text
Optical polarization p_opt Fugmann & Meisenheimer (1988); Impey & Tapia (1990); Visvanathan & Wills (1998); Impey et al. (1991); Marcha et al. (1996); Wills et al. (1992); O'Dea (1998)
Radio polarization p_radio Ricci et al. (2004); Zukowski et al. (1999); Steppe et al. (1995); Aller et al. (2003); Homan & Lister (2006)
B magnitude   Siebert et al. (1998); Véron-Cetty & Véron (2003); Labiano et al. (2007b); Dallacasa et al. (2002)
V magnitude   Véron-Cetty & Véron (2003); Dallacasa et al. (2002); O'Dea et al. (1991); Wills et al. (1992); Hewitt & Burbidge (1993); Barvainis et al. (2005); Impey & Tapia (1990); Labiano et al. (2007b); Impey et al. (1991)
R magnitude   O'Dea et al. (1996); Labiano et al. (2007b); Véron-Cetty & Véron (2003); Dallacasa et al. (2002); Stanghellini et al. (1993); O'Dea et al. (1991); Tinti & de Zotti (2006)
I magnitude   de Vries et al. (1995); Xiang et al. (2005); Guainazzi et al. (2006); Stanghellini et al. (1993); Dallacasa et al. (2002); de Vries et al. (2000)
V-R colour V-R Dallacasa et al. (2002), and calculated values
B-V colour B-V Véron-Cetty & Véron (2003), and calculated values
U-B colour U-B Véron-Cetty & Véron (2003), and calculated values
VLBI morphology - Xiang et al. (2006); O'Dea et al. (1991); Dallacasa et al. (1998); Lister et al. (2002); Xiang et al. (2005); Orienti et al. (2006a); Augusto et al. (2006); Jeyakumar et al. (2000); Gugliucci et al. (2005); Fey & Charlot (1997)
R-I colour R-I O'Dea et al. (1996,1991)
Hydrogen column density in X-rays N_H_X Siemiginowska et al. (2003); Siebert et al. (1998); Guainazzi et al. (2006); Bloom et al. (1999); Vink et al. (2006); Elvis et al. (1994)
Power law slope in X-rays Gamma Siemiginowska et al. (2003); Siebert et al. (1998); Guainazzi et al. (2006); Vink et al. (2006)
Hydrogen column density 21 cm N_H_21 Gupta et al. (2006); Orienti et al. (2006b); Pihlström et al. (2003)
O-E colour O-E Snellen et al. (2002b)
Variability index vi Calculated as described in the text
Number of observations N_vi Calculated as described in the text
Rest frame turnover frequency nu_peak,rest Calculated as described in the text
Optically thick spectral index alpha_b Calculated as described in the text
Optically thin spectral index alpha_a Calculated as described in the text
Curvature Curvature Calculated as described in the text
Symmetry Symmetry Calculated as described in the text
Width of the spectrum FWHM Calculated as described in the text

  
3 Analyses

A self-organising map (Kohonen 2001) is a neural network that can be used for cluster analyses, visualization of multidimensional data, and classification. We have chosen to use this method for its intuitive way of visualizing multidimensional data, and its ability to analyse incomplete data matrices. One of its other benefits is that the network is trained in an unsupervised manner, i.e. there is no user input on the classification. Therefore the clustering is not biased by any antecedent results.

A SOM consists of neurons, which are organized in an N-dimensional grid, usually with N = 2 for the most convenient visualization. In the 2-dimensional case, the lattice of neurons can be hexagonal or rectangular, and the lattice can be folded into a cylindrical or toroidal shape. In this paper, a simple flat 2-dimensional hexagonal grid of neurons was chosen.

In each neuron i, there is a randomly initialized weight vector $\vec{m}_i$ of D dimensions. The input data are also considered to consist of vectors, input vectors $\vec{x}$, of D dimensions. Each input vector represents an observation of the input data and each dimension represents a parameter of the observation. Thus, in this paper, each input vector is a single GPS source, and each component of this vector represents one property of the source.

The map is trained by taking one input vector and comparing it with all the weight vectors to find the best-matching unit (BMU) c, the neuron whose weight vector $\vec{m}_c$ is closest (usually in Euclidean distance) to the input vector:

\begin{displaymath}\Vert\vec{x} - \vec{m}_c\Vert = \min_i \Vert\vec{x} - \vec{m}_i\Vert.
\end{displaymath} (3)

The weight vectors of the BMU and its topological neighbours are then updated to resemble the input vector even more:

\begin{displaymath}\vec{m}_i(t+1) = \vec{m}_i(t) + a(t) h_{ci}(r(t))[\vec{x}(t) - \vec{m}_i(t)],
\end{displaymath} (4)

where t denotes time (training step), a(t) is the learning rate, and $h_{ci}(r(t))$ is the neighbourhood function, which depends on the neighbourhood radius r(t). The learning rate and the neighbourhood radius typically decrease with time, so that the amount of change and the number of affected neurons decrease as more training steps are completed. The learning rate decreases from 1 to 0, usually following a function that is inversely proportional to time. The neighbourhood radius is usually large at the beginning, allowing the map to adapt more rapidly, and becomes smaller as the training progresses, so that the map is finally fine-tuned to the delicate details of the input data.
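One training step, i.e. the BMU search of Eq. (3) followed by the update of Eq. (4), can be sketched as follows. This is an illustration and not the SOM Toolbox implementation: the Gaussian form of the neighbourhood function, the simple one-dimensional row of neurons, and all names are assumptions of the sketch.

```python
import math

def best_matching_unit(x, weights):
    """Index c minimising the Euclidean distance ||x - m_i|| (Eq. 3)."""
    return min(range(len(weights)), key=lambda i: math.dist(x, weights[i]))

def gaussian_neighbourhood(grid, c, i, radius):
    """Assumed Gaussian h_ci: equal to 1 at the BMU, decreasing with grid distance."""
    return math.exp(-math.dist(grid[c], grid[i]) ** 2 / (2.0 * radius ** 2))

def sequential_step(x, weights, grid, rate, radius):
    """Apply Eq. (4) in place for one input vector x and return the BMU index."""
    c = best_matching_unit(x, weights)
    for i, m in enumerate(weights):
        h = gaussian_neighbourhood(grid, c, i, radius)
        weights[i] = [mk + rate * h * (xk - mk) for mk, xk in zip(m, x)]
    return c

grid = [(0, 0), (1, 0), (2, 0)]                 # neuron positions on the map
weights = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]  # weight vectors m_i
bmu = sequential_step([0.9, 1.2], weights, grid, rate=0.5, radius=1.0)
```

In a full training run, `rate` and `radius` would be decreased between steps as described above.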

This comparison and updating can be done in two different ways: using either sequential or batch training. In sequential training, one input vector is taken at a time and the map is updated before proceeding to the next input vector. We have used batch training, in which all the input data are presented once before the weight vectors are updated with the weighted averages of the samples.
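One batch epoch can be sketched as below. This is an illustration under assumptions rather than the routine used in this work: it uses the common form of the batch rule, in which each weight vector is replaced by the mean of all input vectors weighted by the neighbourhood function of their BMUs, and it assumes a Gaussian neighbourhood; the function name `batch_epoch` and its arguments are hypothetical.

```python
import math


def batch_epoch(weights, positions, data, r):
    """Return new weight vectors after presenting all input vectors once."""
    # Step 1: find the BMU of every input vector with the weights held fixed.
    bmus = [
        min(range(len(weights)), key=lambda i: math.dist(weights[i], x))
        for x in data
    ]
    # Step 2: replace each weight vector by a neighbourhood-weighted average.
    new_weights = []
    for i, w in enumerate(weights):
        num = [0.0] * len(w)
        den = 0.0
        for x, c in zip(data, bmus):
            d2 = sum((p - q) ** 2 for p, q in zip(positions[c], positions[i]))
            h = math.exp(-d2 / (2.0 * r * r))  # assumed Gaussian neighbourhood
            den += h
            for k, xk in enumerate(x):
                num[k] += h * xk
        # Keep the old vector if no sample carries any weight for this neuron.
        new_weights.append([n / den for n in num] if den > 0 else list(w))
    return new_weights
```

Unlike the sequential rule, the result of one epoch does not depend on the order in which the input vectors are presented.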

When the training is completed, the map will have formed a representation of the observations by adjusting its vectors according to the observed variables. In practice, this allows us to locate the sources on the map by their properties and thus to cluster similar sources together. The neurons, with the sources they harbour, can be divided into clusters by different clustering methods.

We have used SOM Toolbox version 2.0 for Matlab. The grid size and the topology were optimized for the data by the software. We chose to use the centroid method for the clustering, because it creates clusters by calculating the centroid of the whole cluster instead of creating chains between similar sources as the linkage methods do. We also tested clustering using cluster averages, the neighbourhood function, and the Ward method, but the results did not differ substantially.
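The idea behind the centroid method can be illustrated with a short sketch. It is a generic agglomerative scheme written for this illustration, not the SOM Toolbox routine, and it may differ from it in details (for example in how ties or map neighbourhoods are handled); the function name `centroid_clustering` is hypothetical.

```python
import math


def centroid_clustering(vectors, n_clusters):
    """Merge the two clusters with the closest centroids until n_clusters remain."""
    clusters = [{"members": [i], "centroid": list(v)} for i, v in enumerate(vectors)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = math.dist(clusters[a]["centroid"], clusters[b]["centroid"])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        ca, cb = clusters[a], clusters[b]
        na, nb = len(ca["members"]), len(cb["members"])
        merged = {
            "members": ca["members"] + cb["members"],
            # The new centroid is the size-weighted mean of the two old ones.
            "centroid": [
                (na * p + nb * q) / (na + nb)
                for p, q in zip(ca["centroid"], cb["centroid"])
            ],
        }
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return [sorted(c["members"]) for c in clusters]
```

Here `vectors` would be the trained weight vectors of the neurons, and `n_clusters` the chosen number of clusters.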

  
4 Results and discussion

The maps are presented in Figs. 2-6, only available in electronic form via http://www.aanda.org.

The most important tools in interpreting the results of the analyses are the U-matrix and the component planes. The U-matrix is a representation of the average neighbourhood distances of each neuron, with an additional hexagon between every pair of neighbouring neurons to illustrate the distance between the pair. If the data were clearly divided into different clusters, the U-matrix would show clear light (red) borders, representing large distances, between the neurons which belong to different clusters. Dark-coloured (blue) areas represent groups of similar neurons, where the differences between the neighbours are small. Component planes show the projection of the value of each parameter on the map grid, i.e. each of them can be thought of as a contour map describing the location of low and high values of the parameter.
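The average neighbourhood distance shown in the U-matrix can be sketched as follows. For brevity the example assumes a rectangular lattice with four neighbours per neuron, whereas the maps in this paper are hexagonal, and it omits the additional cells drawn between neighbouring neurons; the function name `u_matrix` is hypothetical.

```python
import math


def u_matrix(weights, n_rows, n_cols):
    """Average distance from each neuron's weight vector to its grid neighbours.

    weights[r][c] is the weight vector of the neuron in row r, column c.
    """
    u = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    dists.append(math.dist(weights[r][c], weights[rr][cc]))
            # A single-neuron map has no neighbours; leave its value at zero.
            u[r][c] = sum(dists) / len(dists) if dists else 0.0
    return u
```

Large values mark neurons that sit on a border between dissimilar groups, and small values mark the interior of a group of similar neurons.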

Because of the space needed for text in the neurons, the separate maps, i.e. all except the combined U-matrix and the component plane view, have been rotated 90$^\circ$ counterclockwise. In the discussion below, we use the coordinates of the separate maps, so when referring to the upper left corner of a map, the corresponding area in the combined U-matrix and component plane plot is in the upper right corner.

Below, the maps are presented and discussed in a general manner and only some specific clusters of interest are discussed in more detail.

4.1 Map of all sources and all parameters

When all sources in the sample and all the collected parameters are used, the maps presented in Figs. 2-6 are produced. Combining the information of the auxiliary classification (Fig. 3), VLBI morphology (Fig. 4) and the optical identification (Fig. 5), we can see that the map can be divided roughly into four quarters. The two upper quarters are populated with quasar-type sources, the upper right with gps-type spectra and the upper left with other types of spectra. In the lower part of the map there are mostly galaxy-type sources; again the right side is dominated by sources with gps-type spectra while the left side has types f, f/s, c, and n. Each of these quarters can be divided into several clusters of similar sources, and their typical properties can be studied in the plot of the U-matrix and the component planes (Fig. 2).

As they are not numerical quantities, neither the optical identification, the VLBI morphology, nor the auxiliary classification was used by the algorithm, and therefore the formation of groups of sources similar in these properties is likely to reflect some deeper similarities between the sources. (Because the auxiliary classification has been done using the parameters of the radio spectrum, the classes are linked to different parts of the map according to the values of the radio parameters. However, the clear division of classes on the map is an indication that we have chosen to use quantities that really seem to reflect some essential properties of the sources.)

Below, we first give an overview of the four quarters and then present some individual clusters in more detail.

The upper left quarter of the map consists of flat- and convex-spectrum sources and sources with inverted spectrum during outbursts. There is pronounced variability in the sources in this part of the map; there are quasars, low polarization quasars (LPQs), highly polarized quasars (HPQs), and, in particular on the outermost edge and in the upper corner, BL Lac objects (BLOs). This upper left quarter contains sources with typical blazar properties, i.e. these sources have been misidentified as GPS sources because of the temporary GPS shape of their spectra during radio flares. This is not surprising, as it has been noticed earlier (e.g., 20; Tinti et al. 2005; Paper II) that some previously classified GPS sources show blazar-like behaviour and, indeed, have proved to be blazars on closer inspection.

In this ``blazar quarter'' there are also two gps sources (B0528-250, B1758+388), which seem to be misclassified by the auxiliary classification. Their radio continuum spectra show GPS-like characteristics, but this may be due to a lack of flux density monitoring, which would likely reveal more blazar-like behaviour. In Papers II and III we have shown that monitoring must be continued for several years in order to find out the true nature of a source. Neither of these sources has been monitored for more than $\sim$3 years near the turnover, so there is not yet compelling evidence that these sources would maintain their GPS-type spectrum at all levels of activity.

The lower parts of the left side of the map are mostly populated by galaxies with unknown VLBI morphologies. These sources are characterized by low redshifts and rather low radio powers at 5 GHz. Besides the radio data we have calculated, there is very little information available on the other properties of these sources. Typical spectrum types in this area are f, f/s, and n.

The right side of the map contains sources with confirmed gps-type spectra. The uppermost third of the right side is populated by high redshift quasars, which have high radio powers. The sources near the vertical mid-line of the map are larger in size and have flatter spectra and higher variability than the sources in the right corner. The sources on the uppermost rows are mostly unresolved by VLBI; however, there are two CSOs in the right corner and a group of core-jets a bit lower in the middle.

Below these quasar-type sources, in the middle of the right side, there is a bundle of galaxies, mostly with CSO morphologies. These low-redshift sources have rather low radio powers, especially on the right edge, where the sizes of the sources are also small. The bottom right of the map consists of galaxies with mostly unknown VLBI morphologies, and, in addition to gps-type spectra, there are also s, f/s, and n types of spectra. The sizes of these sources are quite large near the vertical mid-line of the map and smaller near the corner. Variability is low in the bottom rows of the map, but this may be due to the small number of observations, which indicates a lack of proper monitoring. The curvature of the spectrum is high in the bottom right corner of the map, and the spectral index above the turnover is steep in the area which extends nearly to the mid-line. The spectral index below the peak is high in the corner, but declines rapidly towards the vertical mid-line, which is due to the lack of the optically thick part of the spectrum in the auxiliary classes present in the area.

4.1.1 Some notes on individual clusters

In this paper we mainly concentrate on the outcome of the SOM analysis and the general trends we observe in the GPS source subpopulations. A more detailed analysis of the physical properties of the various subpopulations will be the topic of a subsequent paper.

In Fig. 6 the similar neurons have been clustered together by the centroid method. We have analyzed the map using different numbers of clusters, and have chosen to present the clustering with $N_{\rm clust} = 16$, because the clusters seem to represent well the different areas in the map, and the division of sources into clusters does not seem to be too coarse or too fine-tuned. Depending on the number of clusters, some single sources may switch clusters. The cluster memberships should therefore be considered suggestive rather than definitive for individual sources. The map is plotted in Fig. 6 together with the cluster numbering generated by the algorithm. The sources in each cluster have been listed in Table 3 and a summary of the properties of the cluster is presented in Table 4. Here we present some of the most interesting clusters and their properties, together with some preliminary interpretations of the nature of the related sources.

 

 
Table 3: Clustering of the sources ($N_{\rm clust} = 16$).
Cluster Sources
1 B0018+729, B0019-000, B0238-084, B0240-217, B0320+053, B0507+179, B0651+410, B0910+151, B0941-080, B1107-187, B1144+352, B1540-077, B1824+271
2 B0204-306, B0316+162, B0359-294, B0405-280, B0405-395, B0439-337, B0454-088, B0554-026, B0602+780, B0904+039, B1042-269, B1343-300, B1410+138, B1444-339, B1553-062, B1751+278, B2050+364, B2121-014, B2323+790
3 B0159+839, B0326+349, B0621+446, B1502+036, B1600+335, B2053-201
4 B0431-026, B0700+470, B1054+004, B1107+109, B1132-000, B1200+045, B1347-218, B1349+027, B1548-302, B1714+193, B1726+769, B2126-185, B2153-119, B2353+816
5 B0022-423, B0105-122, B0144+209, B0207-224, B0208+040, B0703+468, B0706+460, B0741-063, B0802+103, B0914+114, B1133+432, B1225+368, B1350+113, B1433-040, B1503-091, B1518+046/7, B1545-120, B1557-004, B1646+028, B2008-068, B2055+055, B2154-183, B2201+098, B2322-040, B2333-528, B2337-063
6 B0000+212, B0424+328, B1404+286, B1509+054, B1601-222, B1622+665, B1732+094, B1934-638
7 B0026+346, B0116+319, B0428+205, B1031+567, B1117+146, B1323+321, B1345+125, B1638+124, B2021+614, B2210+016, B2352+495
8 B1120-274, B1324+574, B1358+624, B2342+821
9 B0902+490, B0930+493, B1057-797, B1323+799, B1354-174, B1604+315, B2254-204
10 B0002+051, B0034+078, B0711+356, B0718+374, B1043+066, B1100+223, B1734+508, B1853+376, B2128+048
11 B0500+019, B0552+398, B0743-006, B0923+392, B1355+441, B1607+268, B2019+050, B2134+004, B2149+056
12 B0048-097, B0113+241, B0354+231, B0437-454, B0516+087, B0646+600, B1013+054, B1039+811, B1118-056, B1148-171, B1349-439, B1357+769, B1455+080, B1601+112, B1749+096, B1803+784, B1807+170, B1851+488, B1954-388, B2007+777, B2059+034, B2205+166, B2209+236, B2236+124, B2318+049, B2327+335
13 B0039+230, B1645+635, B2254+024
14 B0153+744, B0201+113, B0215+015, B0218+357, B0237-233, B0248+430, B0404+768, B0528+134, B0738+313, B0742+103, B1127-145, B1245-197, B1422+231, B1442+101, B1543+005, B2022+171, B2136+141
15 B0332-403, B0400+258, B0454-234, B0528-250, B0537-441, B0633+595, B0858-279, B1144+542, B1146+531, B1333+459, B1334-127, B1519-273, B1758+388, B1839+389, B1936-155, B2008-159, B2112+283, B2121+053, B2128-123, B2255-282
16 B0108+388, B0457+024, B0636+680, B0642+449, B0710+439, B1143-245, B1427+109, B1526+670, B1614+051, B1843+356, B1848+283, B2000-330, B2126-158, B2337+264


Cluster 5 in the lower right corner of the map contains sources with CSO, CD, and unknown morphologies. There is also one core-jet object. There are altogether 26 sources, of which 17 are galaxies, 5 are quasars, and 4 are sources with empty fields or with no information. The component planes show steep spectral indices and narrow FWHM in the outermost corner, whereas, when going leftwards, the sizes of the sources increase and the turnover frequencies decrease, which can also be seen from the auxiliary classification of the spectra. In the upper and right edges of the cluster there are gps-type spectra, whereas in the lower left part of the cluster there are sources with s and f/s types of spectra.


   
Table 4: Summary of the properties of sources in each cluster.
Cluster z Size [kpc] lg (P_5GHz) pol_opt [%]
number N med min max N med min max N med min max N med min max
1 12 0.267 0.005 0.821 5 0.09 0.00 0.27 13 26.23 22.86 26.87 2 0.55 0.47 0.63
2 16 0.692 0.235 1.195 5 0.39 0.11 2.35 19 26.80 25.58 27.84 0 - - -
3 4 0.456 0.156 1.100 2 40.32 0.48 80.16 6 26.57 25.75 27.81 0 - - -
4 10 0.680 0.530 1.344 1 0.01 0.01 0.01 14 26.81 26.51 27.43 0 - - -
5 18 1.339 0.178 1.980 6 0.93 0.24 39.07 26 27.32 24.94 28.02 0 - - -
6 8 0.192 0.077 0.735 7 0.03 0.01 0.30 8 25.83 24.89 27.09 1 0.61 0.61 0.61
7 11 0.362 0.060 1.150 10 0.25 0.04 1.12 11 26.54 25.07 27.62 4 1.18 0.30 1.44
8 3 0.650 0.431 0.735 2 1.17 0.39 1.95 4 26.89 26.77 27.30 0 - - -
9 4 2.630 1.970 3.147 3 0.02 0.01 0.03 7 27.83 27.15 28.57 1 9.30 9.30 9.30
10 5 1.620 0.990 1.900 2 0.14 0.04 0.24 9 27.37 26.92 27.89 1 1.00 1.00 1.00
11 7 0.740 0.473 2.365 8 0.04 0.01 0.35 9 27.26 26.98 29.07 6 2.21 0.40 7.80
12 16 0.847 0.050 1.809 5 0.01 0.01 0.05 26 27.12 24.55 27.75 6 9.75 6.00 21.50
13 2 2.230 2.081 2.379 1 0.06 0.06 0.06 3 27.77 27.42 27.95 1 1.67 1.67 1.67
14 17 1.715 0.556 3.626 14 0.16 0.02 27.27 17 28.23 26.89 29.01 6 0.71 0.30 2.66
15 18 1.644 0.501 3.095 9 0.01 0.01 0.04 20 27.94 27.20 28.49 9 10.60 1.90 27.10
16 12 2.790 0.518 3.773 9 0.06 0.01 0.25 14 28.25 27.03 28.89 2 1.57 0.87 2.27
Cluster pol_radio [%] V-R B-V U-B
number N med min max N med min max N med min max N med min max
1 3 0.58 0.04 3.40 3 1.50 0.91 1.50 3 1.07 0.00 1.34 1 0.61 0.61 0.61
2 0 - - - 3 1.10 0.60 1.40 2 -0.25 -0.40 -0.10 0 - - -
3 1 35.20 35.20 35.20 1 0.00 0.00 0.00 2 0.35 0.20 0.49 1 -0.53 -0.53 -0.53
4 0 - - - 0 - - - 0 - - - 0 - - -
5 5 3.20 1.20 7.10 2 0.73 0.21 1.25 3 0.25 0.10 1.40 1 -0.84 -0.84 -0.84
6 1 0.70 0.70 0.70 4 1.02 0.93 1.40 2 0.65 0.52 0.78 1 -0.05 -0.05 -0.05
7 6 0.17 0.00 1.21 4 1.80 -0.10 2.50 3 0.00 -0.50 0.00 0 - - -
8 1 1.64 1.64 1.64 1 0.30 0.30 0.30 2 0.00 0.00 0.00 0 - - -
9 1 3.80 3.80 3.80 0 - - - 1 0.60 0.60 0.60 0 - - -
10 1 1.09 1.09 1.09 1 0.41 0.41 0.41 1 0.35 0.35 0.35 1 -1.11 -1.11 -1.11
11 3 2.20 1.00 2.73 3 -0.38 -0.80 -0.30 3 0.30 0.06 1.00 2 -0.63 -0.94 -0.31
12 4 3.31 0.70 3.57 4 0.47 0.36 0.49 9 0.58 0.00 1.49 5 -0.63 -0.84 -0.47
13 0 - - - 0 - - - 1 0.08 0.08 0.08 0 - - -
14 5 3.82 2.20 5.49 6 0.91 0.00 1.50 11 0.20 -1.00 0.80 4 -0.61 -0.70 -0.37
15 11 3.10 1.43 7.10 2 0.72 0.03 1.40 12 -0.40 -2.80 0.58 3 -0.55 -0.89 -0.48
16 5 2.60 0.44 4.00 8 0.17 -4.90 1.70 8 -0.28 -1.20 1.60 1 1.70 1.70 1.70
Cluster R-I $\nu_{\rm peak,rest}$ [GHz] $\alpha_{\rm b}$ $\alpha_{\rm a}$
number N med min max N med min max N med min max N med min max
1 3 1.00 0.40 1.50 9 0.91 0.26 2.10 8 0.24 0.02 0.50 10 -0.74 -0.84 1.00
2 2 0.40 0.10 0.70 14 1.55 0.74 8.87 11 0.48 0.11 0.79 19 -0.74 -1.00 -0.47
3 1 0.50 0.50 0.50 2 7.94 1.53 14.35 4 0.23 0.06 0.31 4 -0.42 -0.61 -0.19
4 0 - - - 9 0.85 0.30 1.85 8 0.16 0.00 1.00 11 -0.70 -1.04 -0.44
5 3 0.50 0.20 0.60 21 1.92 0.52 8.49 15 0.64 0.20 1.50 25 -1.17 -1.83 1.00
6 1 0.20 0.20 0.20 8 6.48 0.83 13.04 8 1.19 0.92 1.98 8 -0.94 -1.30 -0.31
7 3 0.40 0.30 0.60 11 0.64 0.35 5.30 9 0.56 0.24 0.77 11 -0.58 -0.90 1.00
8 1 0.30 0.30 0.30 4 1.13 0.74 2.09 3 1.16 1.10 1.22 4 -0.79 -0.84 -0.66
9 1 1.40 1.40 1.40 6 7.01 4.62 44.39 5 0.40 0.35 0.51 7 -0.51 -0.84 -0.08
10 1 1.50 1.50 1.50 9 9.86 1.58 14.00 9 0.63 0.36 0.76 9 -0.63 -0.85 -0.36
11 1 0.60 0.60 0.60 9 8.30 1.52 31.15 9 0.80 0.66 1.11 9 -0.86 -1.11 -0.66
12 0 - - - 15 11.29 3.42 90.90 16 0.18 0.04 0.53 18 -0.18 -0.53 1.00
13 0 - - - 1 5.70 5.70 5.70 3 0.44 0.05 1.00 1 -0.35 -0.35 -0.35
14 1 1.70 1.70 1.70 13 5.48 0.66 32.09 13 0.45 0.19 1.15 15 -0.62 -1.07 -0.23
15 0 - - - 18 20.47 6.81 52.61 15 0.46 0.21 0.70 16 -0.47 -0.62 -0.20
16 2 0.98 0.76 1.20 14 19.83 3.65 66.65 14 1.10 0.51 2.65 14 -0.95 -2.12 -0.43
Cluster Symmetry Curvature FWHM Variability index
number N med min max N med min max N med min max N med min max
1 6 -0.56 -0.82 -0.37 6 0.99 0.85 1.34 1 2.55 2.54 2.54 13 1.14 0.39 7.36
2 14 -0.36 -0.82 0.00 14 1.00 0.56 1.58 3 1.27 0.98 1.43 19 0.16 0.05 1.53
3 2 -0.19 -0.37 0.00 2 0.65 0.62 0.68 2 2.21 2.16 2.27 6 1.21 0.36 1.92
4 6 -0.43 -0.96 -0.36 6 0.92 0.44 1.12 0 - - - 14 0.36 0.17 1.69
5 18 -0.59 -1.49 0.08 18 1.69 1.17 3.33 11 1.01 0.73 1.22 26 0.35 0.11 2.35
6 8 0.14 0.01 1.40 8 2.14 1.43 2.62 8 1.02 0.78 1.35 8 0.79 0.35 3.66
7 9 0.00 -0.34 0.25 9 1.15 0.79 1.48 9 1.59 1.18 1.96 11 1.73 0.60 4.04
8 3 0.43 0.38 0.44 3 1.93 1.76 2.01 3 1.02 0.92 1.21 4 0.79 0.44 1.26
9 5 0.00 -0.38 0.00 5 1.01 0.70 1.24 4 1.49 1.34 1.73 7 0.62 0.26 1.71
10 9 0.00 -0.48 0.16 9 1.22 0.72 1.52 8 1.24 1.04 1.67 9 0.84 0.38 3.07
11 9 0.01 -0.29 0.05 9 1.76 1.31 2.23 9 1.25 0.92 1.37 9 2.13 0.58 8.16
12 12 0.00 0.00 0.13 12 0.37 0.04 1.05 4 3.54 2.21 5.59 26 2.97 1.00 18.16
13 1 0.09 0.09 0.09 1 0.79 0.79 0.79 0 - - - 3 2.52 1.86 2.93
14 13 -0.17 -0.75 0.28 13 1.12 0.47 2.12 10 1.46 1.10 2.04 17 2.49 0.95 22.75
15 15 0.00 -0.35 0.44 15 0.92 0.62 1.24 10 1.59 1.18 2.09 20 3.14 1.27 14.11
16 14 0.12 -1.19 1.59 14 1.93 1.25 3.70 14 1.18 0.84 1.50 14 1.75 0.64 4.15
         
Cluster N_vi N_H_X [$10^{22}$ cm$^{-2}$] Gamma N_HI [$10^{20}$ cm$^{-2}$]
number N med min max N med min max N med min max N med min max
1 13 9.00 2.00 133.00 1 0.10 0.10 0.10 1 2.81 2.81 2.81 2 1.14 1.00 1.27
2 19 4.00 2.00 8.00 0 - - - 0 - - - 2 6.98 6.27 7.69
3 6 6.50 2.00 24.00 0 - - - 0 - - - 0 - - -
4 14 6.00 3.00 8.00 0 - - - 0 - - - 0 - - -
5 26 5.00 2.00 27.00 0 - - - 0 - - - 2 1.03 0.64 1.42
6 8 11.00 3.00 17.00 0 - - - 0 - - - 5 3.98 1.99 26.91
7 11 38.00 4.00 155.00 3 0.66 0.50 1.00 1 1.43 1.43 1.43 8 2.20 0.38 12.20
8 4 6.00 2.00 16.00 1 3.00 3.00 3.00 1 1.24 1.24 1.24 2 1.37 0.75 1.99
9 7 7.00 2.00 9.00 0 - - - 0 - - - 0 - - -
10 9 10.00 2.00 17.00 1 0.30 0.30 0.30 1 1.50 1.50 1.50 0 - - -
11 9 60.00 17.00 386.00 2 0.30 0.10 0.50 2 1.90 1.62 2.18 4 11.78 0.99 35.42
12 26 10.50 2.00 220.00 2 0.04 0.02 0.06 2 2.67 2.15 3.18 0 - - -
13 3 9.00 9.00 14.00 0 - - - 0 - - - 0 - - -
14 17 112.00 9.00 1845.00 3 0.33 0.21 3.60 3 1.47 1.24 1.56 4 1.83 0.99 2.66
15 20 26.00 7.00 164.00 3 0.04 0.03 0.07 3 2.20 1.91 2.44 0 - - -
16 14 9.00 2.00 114.00 6 1.83 0.44 57.00 3 1.75 1.59 3.32 2 45.40 11.00 79.80
Cluster O-E                        
number N med min max                        
1 1 -5.25 -5.25 -5.25                        
2 0 - - -                        
3 1 0.96 0.96 0.96                        
4 1 1.15 1.15 1.15                        
5 0 - - -                        
6 3 2.08 1.64 2.44                        
7 4 2.24 0.67 2.73                        
8 1 2.19 2.19 2.19                        
9 1 0.41 0.41 0.41                        
10 2 0.53 0.40 0.65                        
11 3 0.85 0.71 2.32                        
12 3 1.29 0.34 1.35                        
13 3 0.72 0.43 2.01                        
14 5 2.05 1.09 2.96                        
15 4 0.87 0.54 1.58                        
16 3 1.21 0.41 1.45                        

The sizes of six sources in this cluster are known; the median value (0.93 kpc) is greater than the median of all the sources in the sample (0.11 kpc). The rest frame turnover frequency has a decreasing gradient towards the lower left part of the cluster; the median of the cluster is 1.92 GHz, whereas the median of all the sources is 4.91 GHz. It is also lower than in other clusters with gps-type spectra, except for the clusters number 7 and 8. However, the value is affected by undefinable turnover frequencies of the sources with s and f/s types of spectra, and therefore only represents the $\nu_{\rm peak}$ of the gps-type sources in the cluster.

There are at least three sources with some evidence of young age: B0703+468, a CSO quasar, generally suggested to be a young source by Stanghellini et al. (2005); B1225+368, low radiative age found by Murgia et al. (1999); and B2201+098 with a lower limit of kinematic age of less than 1000 years found by Gugliucci et al. (2005). None of the sources have been reported to have related extended emission around them. This cluster could represent young sources, with the youngest in the upper and right parts of the cluster and the older, possibly CSS sources of which the turnover frequency has already decreased below the observed frequencies, in the lower left part of the cluster.

Cluster 7 harbours galaxy-type sources with mostly gps spectra and low turnover frequencies (median 0.64 GHz), small sizes (median 0.25 kpc), and CSO, CD or CT morphologies. These sources have rather high variability; the median of the highest variability index is 1.73, which is observed at the frequency of $\sim$8 GHz. Yet at least the source B1031+567 has been found, along with six other CSOs, to have extremely stable flux densities at this frequency band on timescales ranging from one week to ten months (Fassnacht & Taylor 2001). The radio powers are intermediate (median $\log~(P_{5~\rm GHz}) = 26.5$) in this cluster. Out of the total of 11 sources, at least four have kinematic age estimates, ranging from $\sim$380 yr to $\sim$3000 yr (e.g., Giroletti et al. 2003; Polatidis & Conway 2003), and two have other hints of young age (Murgia et al. 1999). Therefore, it is likely that the sources in this cluster represent galaxy-type symmetric radio sources in their youth. However, the low turnover frequency, in fact the lowest of all clusters, contradicts the evolutionary scheme in which the turnover of a new-born radio source is at high frequencies and decreases as the source grows larger. There does not seem to be any cluster of sources clearly representing the next phase of evolution of these sources, which is not totally surprising, as one of the key selection criteria of the GPS classification has been a turnover frequency of 0.5 GHz. However, there are sources with s and f/s types of spectra, especially in clusters 5 and 2, for which the turnover frequency has not been determined, and which therefore do not contribute to the median turnover frequency of their clusters.

The borders of this cluster do not change when the number of the clusters is changed. This provides additional support that the sources form a homogeneous population that is not likely to mix or merge with the neighbouring clusters. The unexpected combination of low turnover frequency, small linear size and confirmed very young age of the sources in this cluster may require reformulation of the views on what is the cause of the turnover.

Cluster 10 consists of gps sources and sources with c and n types of spectra. This cluster also stays well defined when the value of $N_{\rm clust}$ is changed. There are both galaxy- and quasar-type sources. The median redshift of this cluster is 1.62, as there are no redshifts available for any of the galaxies. The rest frame turnover frequencies are high (median 9.9 GHz) and the sizes of the two sources with known LS are small. The radio powers are high in this cluster, with a median $\log~(P_{5~\rm GHz})$ = 27.4. Morphologically these sources are CSOs and CDs, and there are two unresolved sources. These sources could also be young sources, but different from the sources in cluster 7, as these have substantially higher turnover frequencies and radio powers.

Cluster 6 on the right edge of the map consists of low-redshift galaxies with gps-type spectra and CSO morphologies. The sizes of the sources are small (median 0.027 kpc), and their spectra are narrow (median FWHM 1.0 decades of frequency) and have their turnovers at rather high frequencies (median 6.5 GHz). For B1404+286, there is a kinematic age estimate of 100-200 years obtained by Polatidis & Conway (2003), who have also found an upper limit for the expansion velocity of B1934-638 in the cluster. Two of the sources have been suggested to exhibit recurrent activity.


Figure 1: Size vs. rest frame turnover frequency plotted for each cluster, for all the sources and for sources with auxiliary gps classification. The clusters 4 and 13 did not have any sources with information on both the size and the turnover frequency. When confirmed by statistics, the slope of the anticorrelation is plotted. The solid line depicts the anticorrelation without the outlier in the plots of the cluster 14, all the sources, and gps sources. The dashed line shows the effect of the outlier.

The main differences between cluster 6 and the clusters 7 and 10 are the very low redshift, the small size, the low radio power (the lowest of all clusters), the high column density of neutral hydrogen (although the cluster 10 does not have any column density information to compare to), and the high spectral curvature of the sources in cluster 6. The median turnover frequency (6.5 GHz) is substantially higher than in the cluster 7.

Cluster 11 is populated by a mixture of gps quasars and galaxies. Most of the sources have core-jet morphologies, but there is also one CSO and one unresolved object. Three of the core-jet sources exhibit high variability so that their auxiliary classification is gps,v. For the rest of the sources, variability is not pronounced (median of the maximum variability index without gps,v sources is 1.2) even though there is a median of 35 observations from which the variability index has been calculated. The sizes of both the variable and the non-variable sources are small (median 0.037 kpc) and the turnover frequencies high (median 8.3 GHz). One of the variable sources (B1607+268) has been estimated to be only $\sim$2200 yr old (Nagai et al. 2006), and one source has extended emission around it, while three others have not shown related extended emission (Stanghellini et al. 2005). The spectrum of one source has been successfully fitted to a free-free absorption model (Kameno et al. 2003).

Cluster 16 in the upper right corner of the map has a mixture of CSOs and unresolved VLBI morphologies. The sources are quasars except for two sources which are galaxies. The sources have gps-type spectra except for three sources with gps,v spectra and one with a c spectrum. The sources in this cluster are characterized by extremely high rest frame turnover frequencies (median 19.8 GHz), high curvature (median 1.9), and high column densities $N_{\rm H,X}$ and $N_{\rm HI}$, although the number of column density measurements is quite low (6 and 2, respectively). There are four sources associated with free-free absorption (FFA) in the literature (e.g., Kameno et al. 2003; Bicknell et al. 1997), for two sources there are kinematic age estimates of $\sim$180 yr and $\sim$900 yr (Polatidis & Conway 2003), and there is also the prototype recurrent source B0108+388, which is the only source with information on extended emission in the cluster. The median curvature of the spectra is 1.9, which is close to the characteristic FFA curvature value of $\sim$2 (the exact value depends on the homogeneity or clumpiness of the absorbing medium, Bicknell et al. 1997). This cluster may represent free-free absorbed sources.

There does not seem to be any cluster clearly hosting a population of frustrated sources. The number of column density measurements is rather low, and there are no other possible indicators of the density of the environments of the sources. Therefore the possibility cannot be ruled out completely, but it seems unlikely that any of the current clusters represent frustrated sources.

4.2 Size - turnover anticorrelation

As mentioned in Sect. 2.1, the linear size LS information is not of uniform quality. However, we believe the values are accurate enough to study the linear size - turnover frequency anticorrelation. The anticorrelation was discovered by Fanti et al. (1990) for CSS sources and confirmed later by O'Dea & Baum (1997) for a combined sample of CSS and GPS sources. O'Dea & Baum (1997) found a correlation of $\nu_{\rm peak} \propto LS^{-0.65}$.
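As a worked illustration of what this exponent implies (simple arithmetic on the quoted relation, not a result taken from the cited papers), consider two sources, labelled 1 and 2 here for the purpose of the example, that follow the relation and differ in linear size by a factor of ten:

\begin{displaymath}\frac{\nu_{\rm peak,2}}{\nu_{\rm peak,1}} = \left(\frac{LS_2}{LS_1}\right)^{-0.65}, \qquad \frac{LS_2}{LS_1} = 10 \;\Rightarrow\; \frac{\nu_{\rm peak,2}}{\nu_{\rm peak,1}} = 10^{-0.65} \approx 0.22.
\end{displaymath}

In other words, a tenfold increase in size corresponds to a turnover frequency lower by a factor of roughly 4.5.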

The sizes and turnover frequencies (in the source frame) for each cluster have been plotted in Fig. 1. There is also a size-$\nu_{\rm peak}$ plot of all the sources and of sources with gps classification. Pearson correlation tests were performed to study whether the logarithms of the variables were linearly correlated, and a model was plotted for the correlating clusters. For most of the clusters, there were not enough size and turnover frequency data to allow the determination of the correlation, but the clusters 11 and 16 show statistically significant anticorrelation between the variables. Also for the cluster 14 the anticorrelation is confirmed, if the outlier (B0201+113 with a size of 27.3 kpc obtained with the IPS method by Jeyakumar et al. 2000) is excluded.

The anticorrelation is also valid when the entire sample and all the gps sources are studied. When the outlier B0201+113 is included, the slope of the entire sample is the same as in the sample of O'Dea & Baum (1997), but when the outlier is excluded, our slope becomes steeper (-0.75). The same value is obtained when considering only the sources with gps-type spectra. For the clusters 11 and 14 the slope is steeper (-0.86 and -0.81, respectively), and for the cluster 16 the slope is flatter (-0.72); however, the variations may not be intrinsic but may instead be due to the low number of data points and the non-uniform quality of the VLBI measurements.
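The kind of calculation behind these slopes can be sketched as follows: a Pearson correlation coefficient and an ordinary least-squares slope computed on the logarithms of size and turnover frequency. This is an illustration under assumptions, not the analysis code used for this paper; the function name `log_log_fit` is hypothetical, the input values are made up so that they follow an exact power law, and no significance test is included.

```python
import math


def log_log_fit(sizes, freqs):
    """Pearson r, slope and intercept of log10(freq) against log10(size)."""
    xs = [math.log10(s) for s in sizes]
    ys = [math.log10(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx               # exponent of the power law
    intercept = my - slope * mx     # log10 of the normalisation
    return r, slope, intercept


# Made-up sizes [kpc] and turnover frequencies [GHz] on an exact power law.
sizes = [0.01, 0.1, 1.0, 10.0]
freqs = [2.0 * s ** -0.65 for s in sizes]
r, slope, intercept = log_log_fit(sizes, freqs)
print(f"r = {r:.3f}, slope = {slope:.3f}")
```

With real measurements the points scatter around the line, so that r lies between -1 and 0 for an anticorrelation and a single outlier at large size can change the fitted slope noticeably.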

The upper left end of the $LS-\nu_{\rm peak}$ distribution is populated by blazars of the clusters 12 and 15, and high peaking sources of the gps clusters 11 and 16. The small size and the high turnover frequency in blazars are due to a small viewing angle and relativistic beaming, whereas at least for four sources in the cluster 16 this explanation is unlikely because the sources are CSOs, i.e. they have large viewing angles. Therefore, the continuous distribution of data in the $LS-\nu_{\rm peak}$ plot does not necessarily imply that the sources are just scaled versions of each other or connected by evolution.

  
5 Conclusions

We have collected a sample of 206 GPS sources and HFPs presented in the literature, and collected various parameters for them. We have analyzed the sample with self-organising neural networks using centroid clustering analyses. The method and the parameters we have used seem to describe the essence of the sources rather well, as the distributions of the VLBI morphology and the optical identification are consistent with the clustering structure, even though they were not used in the analyses.

Our results confirm the contamination of GPS samples by small, beamed blazar-type sources. Over a quarter of the cluster map is populated by variable flat-spectrum quasars and sources with inverted spectra during outbursts. These sources should be excluded from GPS samples, and the results of the GPS studies in which they have been included should be reconsidered.

Sources with confirmed gigahertz-peaked spectrum form different clusters, and it seems likely that there are various different populations of GPS sources in addition to the quasar - galaxy dualism.

Our analyses produce a cluster of very young (confirmed by kinematic age estimates) galaxy CSOs with rather low radio powers and low intrinsic turnover frequencies, which is in contradiction with the view that the youngest sources would have the highest turnover frequencies. There is also a cluster, consisting of a mixture of CSO and CD quasars and galaxies with high peak frequencies and high radio powers, which also could be young sources but of different type than in the above-mentioned cluster. We have also identified a cluster that may represent free-free absorbed sources as well as a cluster of quasars and galaxies with mostly core-jet morphologies and consistent GPS-type spectra.

We have confirmed the size-turnover frequency anticorrelation presented in the literature, with a somewhat steeper correlation factor of -0.75. However, the factor is identical for the entire sample and for the sample in which only the genuine GPS sources have been included. The anticorrelation therefore cannot be interpreted as a sign of any common evolution or as evidence that all the sources are simply scaled versions of each other. A substantial fraction of the sources in the high-turnover, small-size end of the distribution are blazars, foreshortened by small viewing angles and having boosted emission, i.e. the mechanism connecting the small size and the high turnover frequency is different from that in truly small and possibly young sources.
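The comparison between the full sample and a subsample can be sketched as a straight-line fit in log-log space. This is a hedged illustration only: it assumes that the quoted factor of -0.75 corresponds to the slope of $\log \nu_{\rm peak}$ against $\log LS$, the function name `fit_loglog_slope` is hypothetical, and the data are synthetic values generated with that slope, not the sources of this paper.

```python
import numpy as np

def fit_loglog_slope(size_kpc, nu_peak_ghz):
    """Least-squares fit of log10(nu_peak) = intercept + slope * log10(size)."""
    x = np.log10(np.asarray(size_kpc, dtype=float))
    y = np.log10(np.asarray(nu_peak_ghz, dtype=float))
    slope, intercept = np.polyfit(x, y, 1)
    return slope, intercept

# Synthetic, illustrative data only (assumed slope -0.75, 0.1 dex scatter).
rng = np.random.default_rng(0)
size_kpc = 10.0 ** rng.uniform(-3.0, 1.0, size=200)
scatter = rng.normal(0.0, 0.1, size=200)
nu_peak_ghz = 10.0 ** (-0.2 - 0.75 * np.log10(size_kpc) + scatter)

# Fit the whole synthetic sample, then an arbitrary subsample standing in
# for a "genuine GPS only" selection (the mask is hypothetical).
slope_all, _ = fit_loglog_slope(size_kpc, nu_peak_ghz)
mask = rng.random(200) < 0.5
slope_sub, _ = fit_loglog_slope(size_kpc[mask], nu_peak_ghz[mask])
```

Obtaining similar slopes for a sample and its subsample, as in this toy case, says nothing by itself about the physical mechanism behind the relation, which is the point made above for the blazar-dominated end of the distribution.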

Acknowledgements
The authors made use of the database CATS (Verkhodanov et al. 1997) of the Special Astrophysical Observatory. The authors acknowledge the support of the Academy of Finland to the Metsähovi observing projects. UMRAO is supported in part by funds from the NSF and by funds from the University of Michigan Department of Astronomy. This research made use of the NASA/IPAC Extragalactic Database (NED), which is operated by the Jet Propulsion Laboratory, California Institute of Technology, under contract with the National Aeronautics and Space Administration.

References

 

  
6 Online Material


  \begin{figure}
\par\includegraphics[angle=90,width=16.4cm,clip]{9222fig2.eps}\end{figure} Figure 2: U-matrix and the component planes of all parameters when all the sources in the sample are analyzed.


  \begin{figure}
\par\includegraphics[angle=90,width=16.4cm,clip]{9222fig3.eps}\end{figure} Figure 3: Sources marked by their auxiliary classification on the grid of SOM.


  \begin{figure}
\par\includegraphics[angle=90,width=16cm,clip]{9222fig4.eps}\end{figure} Figure 4: VLBI morphology of the sources: cso = compact symmetric object (cyan), cj = core-jet object (yellow), cd = compact double (light green), cx = complex (blue), unres = unresolved (red), unknown = no observations (pink), ct = compact triple (light blue), s = stellar (orange), gl = gravitational lens (violet), ln = linear (purple). The number in parentheses gives the number of sources with the related morphology, and the size of the dot indicates the total number of sources in the neuron.


  \begin{figure}
\par\includegraphics[angle=90,width=16.4cm,clip]{9222fig5.eps}\end{figure} Figure 5: Sources marked by their optical identification on the grid of SOM.


  \begin{figure}
\par\includegraphics[angle=90,width=16.4cm,clip]{9222fig6.eps}\end{figure} Figure 6: Source names and the cluster numbers on a map of the clusters ($N_{\rm clust} = 16$).


  \begin{figure}
\par\includegraphics[width=17cm]{9222fi7a.eps} \end{figure} Figure 7: Radio spectra of sources with the fitted curves. The median values of each data bin are marked with red colour.


  \begin{figure}
\par\includegraphics[width=17cm]{9222fi7b.eps} \end{figure} Figure 7: continued.


  \begin{figure}
\par\includegraphics[width=17cm]{9222fi7c.eps} \end{figure} Figure 7: continued.


  \begin{figure}
\par\includegraphics[width=17cm]{9222fi7d.eps} \end{figure} Figure 7: continued.


  \begin{figure}
\par\includegraphics[width=17cm]{9222fi7e.eps} \end{figure} Figure 7: continued.


  \begin{figure}
\par\includegraphics[width=17cm]{9222fi7f.eps} \end{figure} Figure 7: continued.


  \begin{figure}
\par\includegraphics[width=17cm]{9222fi7g.eps} \end{figure} Figure 7: continued.


  \begin{figure}
\par\includegraphics[width=17cm]{9222fi7h.eps} \end{figure} Figure 7: continued.


  \begin{figure}
\par\includegraphics[width=17cm]{9222fi7i.eps} \end{figure} Figure 7: continued.


  \begin{figure}
\par\includegraphics[width=17cm]{9222fi7j.eps} \end{figure} Figure 7: continued.


  \begin{figure}
\par\includegraphics[width=17cm]{9222fi7k.eps} \end{figure} Figure 7: continued.


  \begin{figure}
\par\includegraphics[width=17cm]{9222fi7l.eps} \end{figure} Figure 7: continued.


  \begin{figure}
\par\includegraphics[width=17cm]{9222fi7m.eps} \end{figure} Figure 7: continued.



Copyright ESO 2008