A&A 472, 131-140 (2007)
DOI: 10.1051/0004-6361:20066945

Globular clusters of the Local Group - statistical classification[*]

Tanuka Chattopadhyay1 - Asis Kumar Chattopadhyay2


1 - Shibpur Dinobundhoo College, 412/1 G.T. Road (South), Howrah 711102, India,
Visiting Associate, Inter-University Center for Astronomy and Astrophysics, Post Bag 4, Ganeshkhind, Pune 411007, India
2 - Department of Statistics, Calcutta University, 35 Ballygunge Circular Road, Calcutta 700019, India

Received 15 December 2006 / Accepted 10 April 2007

Abstract
Aims. To find an objective classification of the globular clusters in our Galaxy, M 31, and LMC.
Methods. A new method of Cluster Analysis (CA)was carried out and the set of parameters for this method was selected through an objective process, Principal Component Analysis (PCA). Robustness of the classification was established using bootstrap samples.
Results. In every case they exhibit multi-population structure instead of bimodality as is found in many spirals and giant elliptical galaxies. The kinematics of MW and M 31 GCs are examined in support of these sub-populations in the cluster system. It is found that for MW and M 31 GCs a disc, inner halo, and outer halo populations of GCs are more likely to exist than only the disc and halo populations of GCs in MW as concluded by Zinn (1985, ApJ, 293, 424). This supports the existence of three populations more firmly explained by Zinn (1993, Globular Cluster-Galaxy Connection, 38) and Mackey & Gilmore (2004, MNRAS, 355, 504) whereas only two populations are found for LMC GCs. The new multivariate analysis increases the importance of the inclusion of many parameters while at the same time it eliminates less significant parameters and helps to enunciate a unique theory of galaxy formation.

Key words: Galaxy: kinematics and dynamics - galaxy: globular clusters: general - galaxies: Local Group - methode: statistical

   
1 Introduction

Globular clusters have long been considered as the unique tool for studying galaxy formation and evolution, but the first step is to study the formation process of the globular clusters (GCs) themselves. There are various theories regarding the formation of GCs. Among them (1) gaseous merger, (2) in situ GC formation, and (3) tidal stripping are important. According to merger theory, GCs in gE and cD galaxies were created by the gaseous merger of two progenitor spiral galaxies (Ashman & Zepf 1992; Zepf & Ashman 1993) so that there are two populations of GCs. One is the metal-poor GCs of the progenitor spirals and other is the metal-rich GCs formed in the collision of high velocity gas. One of the most noteworthy success of this model is the discovery of the protoglobular clusters in the currently merging galaxies (Whitmore et al. 1993; Schweizer et al. 1996). In the merger model the high SN galaxies said to be created by the merger of several normal SN galaxies. But in practice the GC systems of high SN galaxies do not have more metal-rich GCs but instead have more metal-poor ones. This means that GC metallicity gradients are steeper in high SN galaxies contrary to the prediction of Ashman & Zepf (1992). According to the most favourable theory (Forbes et al. 1997) GCs are considered to be formed in situ star formation episodes during a collapse process. In the first episode a chaotic merging of many small gaseous subunits (Searle & Zinn 1978; Katz 1992) occurs. During this phase a small fraction of the gas turns into stars and most of them reside in GCs, i.e. the ratio of stars in GCs to field stars is large (Forbes et al. 1997). So, the GCs formed in this phase are metal-poor. In the second phase, the stars enrich the medium and now field stars form in large numbers (due to more efficient cooling at high metallicity) and the GCs which form in this phase are metal-rich. In the third phase the remaining gas settles as a galactic disc at the centre of the galaxy. Spiral galaxies may be considered as an extreme case of this process. Here there is almost no pre-galaxy star formation. So the GCs which form in the second phase are metal-poor compared to the metal-rich GCs in the second phase for elliptical galaxies. Spirals then go on to form a prominent disc and associated GCs in the third phase of collapse. But the main difficulty of this process is the lack of a detailed mechanism for creating distinct phases of GCs formation from a single halo collapse. Also, this analysis is intended to find the bimodality of a single parameter which is colour.

In the present study we carry out a multivariate analysis, which is likely to be more appropriate in a multivariate set up. Early attempts to analyse the characteristics of the Galaxy and GCs by statistical methods were carried out by Brosche (1973), Peterson & King (1975), Brosche & Lentes (1984), Eigenson & Yatsyk (1989), Djorgovski (1991), Covino & Fracassini (1993). Some correlations of the slope of GCs present day mass function (PDMF) with other parameters were studied by Capaccioli et al. (1991) who found that two or at most three significant parameters determine the PDMF slope. This problem was also discussed by Djorgovski (1991). Several correlations among GCs metallicity and galaxy parameters were studied by van den Bergh (1975), Brodie & Huchra (1991), Forbes et al. (1996), Harris (1991), Djorgovski (1995). Djorgovski (1995) uses core radius, velocity dispersion, central surface brightness, and mass-to-light ratio to define a Fundamental Plane for the GCs of elliptical galaxies. These correlations provide rough distance indicators for GCs. In an earlier work Covino & Fracassini (1993) carried out a Principal Component Analysis (PCA) followed by Cluster Analysis (CA) for more than one parameter at a time to study the clustering nature of GCs in different galaxies in the Local Group. Their analysis suffers from several difficulties. First in their study the parameter set is not same for all galaxies. Also the number of parameters is very large (6 for M 31, 8 for LMC, and 10 for Milky Way) so that it is difficult to study the clustering nature in finer details and their interrelations. Finally, the number of clusters, which they selected in an ad hoc manner, is the eighth hierarchy stage. Also the sample size is small.

In the present problem we have first used PCA to search for the optimum set of parameters which gives the maximum variation for the globular clusters in Milky Way, M 31, and LMC. This method reduces the number of parameters to be selected for CA and hence serves the purpose of PCA. Then we took that optimum set of parameters and applied a new method of CA (Sugar & James 2003) which finds the optimum number of groups of GCs instead of choosing group in an ad hoc manner. Also in our case the sample size of Milky Way is almost double. This helps to sort out the optimum number of clusters and optimum set of parameters so that a more efficient theory of galaxy formation can be developed on the basis of this multivariate analysis. In this connection it is also important to mention that under the given set up a partitioning algorithm for cluster analysis is more appropriate than a hierarchical algorithm since the set up is not nested. The new method for finding the optimum number of clusters is based on partitioning algorithm.

In the present paper we closely follow the approach of Covino & Fracassini (1993) in order to classify GCs. The new aspects of our study are as follows:

1.
We incorporate the recent catalogue of Milky Way GCs (Harris 1996) which contains data on 147 GCs which is almost double the sample size (74) used by Covino & Fracassini (1993). Also we used the recent catalogues of M 31 GCs (Barmby et al. 2002) and LMC GCs (Mackey & Gilmore 2003) which include the structural parameters together with photometric parameters unlike the sample used by Covino & Fracassini (1993). Also the core radii values used for LMC GCs were measured using HST which gives more accurate measurements than the previous values.

2.
We use PCA to search for the optimum set of parameters giving maximum variation for all the GCs in Milky Way, M 31, and LMC instead of using the method for comparative study between the GCs among the galaxies.

3.
We used that optimum set of parameters for CA instead of using different sets of parameters for different galaxies in an ad hoc manner and this process increases the consistency of the study.

4.
We used a method of CA (Hartigan 1975) which is based on a partitioning algorithm and then applied a new technique to find the optimum number of groups, not a number selected in an ad hoc manner as the eighth hierarchy stage is selected in the paper by Covino & Fracassini (1993). We have taken different bootstrap samples generated from the original sample to test the robustness of the results of the analyses.
In the following sections we discuss the methods, sample sets and finally the results obtained from the analysis.

2 Method

In order to study the underlying nature of the data under consideration we have to start from the correlation matrix because Principal Component Analysis is based on this correlation or covariance matrix. Although a scatter plot is an essential first step in studying the the association between two variables, it is often useful to quantify the strength of the association by calculating a summary index. One commonly used measure is the correlation coefficient (Pearson's correlation coefficient) denoted by r or rxy, which measures the strength of linear correlation between the values of two parameters x and y.

In Principal Component Analysis (Chattopadhyay & Chattopadhyay 2006) we are interested in discovering which parameters in a data set form coherent subgroups that are relatively independent of one another. The specific aim of the analysis is to reduce a large number of parameters to a smaller number while retaining maximum spread among experimental units. The analysis therefore helps us to determine the optimum set of parameters causing the overall variations in the nature of GCs. PCA has been discussed in detail in Appendix A.

Cluster analysis is the art of finding groups in data. Over the last forty years different algorithms and computer programs have been developed for CA. The choice of a clustering algorithm depends both on the type of data available and on the particular purpose. Generally clustering algorithms can be divided into two principal types viz. partitioning and hierarchical methods.

A partitioning method constructs K clusters i.e. it classifies the data into K groups which together satisfy the requirement of a partition such that each group must contain at least one object and each object must belong to exactly one group. So there are at most as many groups as there are objects (K <=n). Two different clusters cannot have any object in common and the K groups together add up to the full data set. Partitioning methods are applied if one wants to classify the objects into K clusters where K is fixed (which should be selected optimally). The aim is usually to uncover a structure that is already present in the data. The K- means method of (MacQueen 1967) is probably the most widely applied partitioning clustering technique.

Hierarchical algorithms do not construct single partition with K clusters but they deal with all values of K in the same run. The partition with K=1 is a part of the output (all objects are together in the same cluster) and also the situation with K=n (each object forms a separate cluster). In between all values of K=2,3,... n-1 are covered in a kind of gradual transition. The only difference between K=r and K=r+1 is that one of the r clusters splits in order to obtain r+1 clusters or two of the (r+1) clusters combined to yield r clusters. Under this method either we start with K=n and move hierarchically step-by-step, where at each step two clusters are merged, depending on similarity until only one is left i.e. K=1 (agglomerative) or the reverse, i.e. start with K=1 and move step-by-step, where at each step one cluster is divided into two- (depending on dissimilarity) until K=n (divisive). Most of the previous works (Covino & Fracassini 1993) were done on the basis of hierarchical clustering. But we feel that for the problem under consideration the partitioning method is more applicable because (a) A partitioning method tries to select best clustering with K groups which is not the goal of hierarchical method. (b) A hierarchical method can never repair what was done in previous steps. (c) Partitioning methods are designed to group items rather than variables into a collection of K clusters. (d) Since a matrix of distances (similarities) does not have to be determined and the basic data do not have to be stored during the computer run partitioning methods can be applied to much larger data sets. For K- means algorithm (Hartigan 1975) the optimum value of K can be obtained in different ways.

By using this algorithm we first determined the structures of sub populations (clusters) for varying numbers of clusters taking K=2, 3, 4 etc. For each such cluster formation we computed the values of a distance measure $d_{K} = (1/p) {\rm min}_{x} E[( x_{K} -
c_{K} )'(x_{K} - c_{K})]$ which is defined as the distance of the xK vector (values of the parameters) from the centre cK (which is estimated as the mean value) p is the order of the xK vector. Then the algorithm for determining the optimum number of clusters is as follows (Sugar & James 2003). Let us denote by d'K the estimate of dK at the Kth point. Then d'K is the minimum achievable distortion associated with fitting K centres to the data. A natural way of choosing the number of clusters is to plot d'K versus Kand look for the resulting distortion curve (Fig. 5). This curve is always monotonic decreasing. Initially one would expect much smaller drops for K greater than the true number of clusters because past this point adding more centres simply partitions within groups rather than between groups. According to Sugar & James (2003), for a large number of items the distortion curve when transformed to an appropriate negative power (p/2), will exhibit a sharp "jump'' (if we plot K versus transformed d'K). Then we calculated the jumps in the transformed distortion as  $J_{K} = ({d'_K}^{-{\bf {p/2}}} - d_{K-1}'^{\bf -{p/2}}$).

The optimum number of clusters is the value of K associated with the largest jump. The largest jump can be determined by plotting JK against K and the highest peak will correspond to the largest jump (Figs. 67 and 17).

3 Data set

Our analysis is based on three samples of GCs in Milky Way, M 31, and LMC which have appropriate photometric and structural parameter values.
Sample 1. This consists of 135 GCs taken from the catalogue of Harris (1996) which have nonzero values of all the parameters used for CA. The parameters used for PCA are distance from the galactic centre ( $R_{\rm gc}$), absolute visual magnitude (MV), colour (B-V), concentration parameter (c), core radius ($R_{\rm c}$), central surface brightness ($\mu_{V}$), radial velocity ($V_{\rm r}$), metallicity ([Fe/H]), and horizontal branch ratio (HBR). We are mainly concentrating on parameters which are intrinsic and independent in nature.
Sample 2. This consists of 35 GCs of the catalogue of Barmby et al. (2002). The parameters used are $R_{\rm gc}$, MV, b(B-V), [Fe/H], c, $R_{\rm c}$, Vr, and $\mu_{V}$. We have converted the visual magnitudes from Barmby et al. (2000) to absolute visual magnitudes using a distance of 770 kpc (Sparke & Gallagher 2000) for M 31.
Sample 3. This consists of 23 GCs used in the paper by Mackey & Gilmore (2003). The parameters used are  $R_{\rm gc}$, MV, (B-V), c, $R_{\rm c}$, Vr, [Fe/H], and $\mu_{V}$. We have converted the visual magnitudes from Mackey & Gilmore (2003) to absolute visual magnitudes using a distance of 49 kpc (Sparke & Gallagher 2000) for LMC.

In Tables 35, and 7 we have mentioned the elements of the correlation matrices corresponding to Samples 1, 2, and 3 respectively.

After finding the different groups by cluster analysis it is necessary to study the properties of the groups identified in terms of their ages along with other parameters. Again we have used B-V as one of the parameters in PCA. As ages and B-Vvalues are subjected to measurement errors it is worthwhile to identify the nature of errors included in the data.

Table 1: Errors and extinctions in ages and colours of the GCs in Milky Way (MW) and LMC.

Table 2: Error analysis for the GCs of MW and LMC.

Table 3: Correlation matrix for the parameters of Sample 1.

For Milky Way we considered 43 GCs and for LMC we considered 23 GCs. Their ages and corresponding errors can be obtained from Chaboyer et al. (1992) and Mackey & Gilmore (2003) respectively and are listed in Table 1. The means and standard deviations of these errors are listed in Table 2. The means and standard deviations (SD) of extinctions in colour E(B-V) for Milky Way (MW) and LMC GCs are also listed in Table 2. It is interesting to note that most of the error and extinction distributions are Gaussian as indicated in Table 2 and Figs. 1-4 respectively. These show that the ages and colours of Milky Way and LMC GCs are consistent with respect to the measurement errors and extinctions. To study errors corresponding to ages for MW GCs we excluded some of the outliers and the fit is good which is evident from the Anderson Darling (AD) statistic. In Appendix B we have discussed Quantile Quantile Plot and Anderson Darling Statistic which have been used for fitting of Normal Distribution.


  \begin{figure}
\par\includegraphics[width=7cm,clip]{6945fig1.eps}
\end{figure} Figure 1: QQ Normal plot for positive errors in ages of LMC GCs.
Open with DEXTER

From the above findings it may be inferred that since the error distributions are Gaussian (symmetric), the errors are supposed to be averaged out in final analysis and results are not likely to be affected by them.

4 Results and discussions

4.1 Principal component analysis


  \begin{figure}
\par\includegraphics[width=7cm,clip]{6945fig2.eps}
\end{figure} Figure 2: QQ Normal plot for extinctions of LMC GCs.
Open with DEXTER


  \begin{figure}
\par\includegraphics[width=7cm,clip]{6945fig3.eps}
\end{figure} Figure 3: QQ Normal plot for errors in ages of MW GCs.
Open with DEXTER


  \begin{figure}
\par\includegraphics[width=7cm,clip]{6945fig4.eps}
\end{figure} Figure 4: QQ Normal plot for extinctions of MW GCs.
Open with DEXTER

We begin with a minimal number of parameters (selected by the trial and error method) and search for principal components giving the maximum percentage of total variation. In this respect we can say that we included many parameters like central surface brightness, colour and radial velocities, but they do not give maximum variation in PCA. Some of the parameter sets are given for comparison in Table 4 (e.g. S1( $R_{\rm gc}, M_{V}, R_{\rm c}$), S2( $R_{\rm gc}, M_{V}, c$), S3( $R_{\rm gc}$, $B-V, V_{\rm r}$), S4( $R_{\rm gc}, M_{V}, \mu_{V}$), S5( $R_{\rm gc}, M_{V}, c, R_{\rm c}$), S6( $M_V, c, [{\rm Fe/H}]$), S7( $c, R_{\rm c}, [{\rm Fe/H}]$), S8( $HBR, [{\rm Fe/H}],
c$). Only the parameter set S7( $c, R_{\rm c}, [{\rm Fe/H}]$) has maximum variation (85.7 percent) as seen from last column of Table 4 with two principal components having eigen values greater than or nearly equal to 1. PCA analysis for Sample 2 and Sample 3 are listed in Tables 6 and 8 respectively. This also shows that the parameter sets S7 ( $[{\rm Fe/H}], c, R_{\rm c}$) and S6 ( $[{\rm Fe/H}], c, R_{\rm c}$) in Tables 6 and 8 respectively give maximum variation with two principal components with eigen values greater than or nearly equal to 1 (94.4 percent and 96.1 percent in last columns). The group means are given in Tables 9 to 13.

Table 4: Results of PCA for Sample 1.

Table 5: Correlation matrix for the parameters of Sample 2.

Table 6: Results of PCA for Sample 2.

Table 7: Correlation matrix for the parameters of Sample 3.

It is found that the statistical dimensionality for the GCs in all the galaxies in the Local Group is two which involves the parameters core radius ($R_{\rm c}$), concentration parameter (c) and metallicity. This is minimum but gives a total variation as high as 96 percent. This is maximum compared to all previous analyses (Kormendy & Djorgovski 1989; Djorgovski & de Carvalho 1990; Santiago & Djorgovski 1993; De Carvalho & Djorgovski 1992). This is the goal of PCA.

4.2 Cluster analysis

We found in the previous analysis that the maximum variation among the GCs in the Milky Way, M 31, and, LMC is due to the parameter set ( $[{\rm Fe/H}], c, R_{\rm c}$). The metallicity values for GCs in M 31 are available for 35 clusters. So the sample size is slightly reduced for M 31 in our case compared to Covino & Fracassini (1993). The results for CA for Sample 1 are shown in Table 9 and Figs. 5 and 6 respectively. The jumps are at 4 and 6 respectively. To test the robustness of the classification we took several bootstrap samples generated from the original sample. In most of the situations it was found both from the jumps and from the ( $[{\rm Fe/H}], R_{\rm c}$) plots that the proper number of groups (clusters) should be 3. We took this decision because of the following reasons.

1.
From the plots of cluster vs jumps in most of the situations we observed that there was a maximum peak corresponding to 3 clusters (Fig. 7).

2.
From the ( $[{\rm Fe/H}], R_{\rm c}$) plots we also found if we choose the optimum number as 3 then the physical classification is quite clear whereas if we choose 4 or more clusters as optimum then the classification is rather messy. These features are presented in Figs. 13 and 8 respectively for the original Sample 1.

3.
For our conclusion that the optimum number of clusters is 3 we have deviated slightly from the original algorithm (Sugar & James 2003) because it is a well known fact that cluster analysis is an exploratory data analytic technique and it depends heavily on proper physical explanation.
In support of the above discussion we have presented some of our findings related to bootstrap samples in Table 10 and in Figs. 8-12 respectively. In the tables we have mentioned the mean values for ( $[{\rm Fe/H}], c, R_{\rm c}$) and the cluster points at which we have found peaks. Hence we have taken the optimum number of subgroups statistically and physically as three. Cluster 1, with high metallicities, low core radii, Cluster 2, with the lowest metallicity, low core radii, and Cluster 3 with still low metallicity, and high core radii. These are also reflected in the cluster means listed for these groups in the above tables. So we have carried out CA with K=3 and the group means for all the parameters are shown in Table 11. The ( $[{\rm Fe/H}], R_{\rm c}$),( $[{\rm Fe/H}],
c$), and ( $[{\rm Fe/H}], c, R_{\rm c}$) plots are shown in Figs. 13-15 respectively. The clustering is very prominent in Fig. 15 implying the fact that three dimensional parametric classification is more authentic than the two dimensional one as the classification is not clearly seen e.g. in ( $[{\rm Fe/H}],
c$) plane.
  \begin{figure}
\par\includegraphics[width=7.5cm,clip]{6945fig5.eps}
\end{figure} Figure 5: xy diagram for the number of clusters (K) and corresponding distortions (d'K) for Sample 1.
Open with DEXTER

Table 8: Results of PCA for Sample 3.

Table 9: The group means for the parameters of the GCs of Sample 1.


  \begin{figure}
\par\includegraphics[width=7.5cm,clip]{6945fig6.eps}
\end{figure} Figure 6: xy diagram for the number of clusters (K) and jumps for Sample 1.
Open with DEXTER


  \begin{figure}
\par\includegraphics[width=6.5cm,clip]{6945fig7.eps}
\end{figure} Figure 7: xy diagram for the number of clusters (K) and jumps for Bootstrap Samples, dash for second, solid line for fourth and dotted line for third Bootstrap Sample respectively.
Open with DEXTER

Now the kinematics of MW GCs are studied to examine the physical consistency of the classification following Zinn (1985). The basic assumption is that the rotational velocity of each sub group in the classification is constant. For this it is necessary to know the distances of GCs from the galactic centre and the results depend slightly on the values adopted for $R_{\odot}$ and the velocity of the LSR about the galactic centre ($v_{\odot}$) which are taken as 8.2 kpc and 220 km s-1 for the present situation. The analysis follows the method of Frenk & White (1980). The values of rotational velocities ( $v_{\rm rot}$) and velocity dispersion ( $\sigma_{\rm los}$) are listed in Table 11 for each group. Also the group means for heights from the galactic plane (|z|), distances from the galactic centre ( $R_{\rm gc}$), metallicities, concentration parameters (c), core radii, and central surface brightness ($\mu_{V}$) are listed for these sub groups. It is found that for $[{\rm Fe/H}]> -0.8$ the cluster group has substantial rotation for $R_{\rm gc} <4.4$ kpc (Cluster 1) and for $[{\rm Fe/H}] <-0.8$ and GCs have less rotation for $R_{\rm gc} > 4.4$ kpc. This supports the analysis by Zinn (1985). The innermost group (Cluster 1) has substantial rotational velocity ($\sim$124 km s-1), highly metal rich ($\sim$-0.64) having smaller core radii ($\sim$2.71 pc).

  \begin{figure}
\par\includegraphics[width=8cm,clip]{6945fig8.eps}
\end{figure} Figure 8: [Fe/H] vs. $R_{\rm c}$ (pc) diagram for Sample 1. The suffixes indicate the cluster number.
Open with DEXTER


  \begin{figure}
\par\includegraphics[width=8cm,clip]{6945fig9.eps}
\end{figure} Figure 9: [Fe/H] vs. $R_{\rm c}$ (pc) diagram for first Bootstrap Sample 1. The suffixes indicate the cluster number.
Open with DEXTER


  \begin{figure}
\par\includegraphics[width=8cm,clip]{6945fig10.eps}
\end{figure} Figure 10: [Fe/H] vs. $R_{\rm c}$ (pc) diagram for second Bootstrap Sample 1. The suffixes indicate the cluster number.
Open with DEXTER

They are concentrated near the galactic disc ( $\vert z\vert \sim 1.73$ kpc) and close to the Galactic centre ( $R_{\rm gc}\sim 4.19$ kpc). The velocity dispersion is comparatively smaller. But outside the inner region there are two groups instead of one as found by Zinn (1985). One group (Cluster 2) has very low metallicity, far from the Galactic disc, comparatively low rotational velocity ($\sim$5 km s-1) with moderate core radii. The other group (Cluster 3) has low metallicity, high core radii, small rotation ($\sim$20 km s-1), highest velocity dispersion ( $\sigma_{\rm los}\sim 131$ km s-1), farthest from the Galactic centre ( $R_{\rm gc}\sim 31.69$ kpc) and concentrated farthest from the Galactic disc ( $\vert z\vert \sim
18.81$ kpc). So they may be associated with GCs of the outer halo. The ages of the MW GCs used are from Chaboyer (1992) and the ages vs metallicities for Clusters 1, 2, and 3 are shown in Fig. 16. The mean ages for these groups are also listed in Table 11. The GCs of the outer halo are younger compared to those of inner halo and their age- metallicity scatter plot shows no correlation. The GCs in the inner halo (Cluster 2) are the oldest population ($\sim$ 1014.83 yr) and the age- metallicity diagram shows a correlation with considerable scatter. The ages of all the GCs in each cluster are not known. The GCs whose ages are available from Chaboyer (1992) are used. So these diagrams suffer from complicity and firm conclusion. Also the rotational velocities calculated for these groups (Clusters 2 and 3) show that GCs in the inner and outer halo have substantially smaller rotation and higher velocity dispersion. All these facts are consistent with the work carried out by Zinn (1993) and subsequently by many authors (van den Bergh 1993; Lynden-Bell & Lynden-Bell 1995; Silk & Wyse 1993). The present analysis differs from the former work in the sense that the classification is done in an objective and more scientific way on the basis of multiple parameters at a time instead of taking color or metallicity or horizontal branch ratio parameter one at a time and making revisions every time e.g. in the works of Zinn (1985) and Zinn (1993) where there are two and three sub populations respectively. So the present analysis selects the optimum, unique group and helps to enunciate unique theory for galaxy formation. Also the choice of the optimum set of parameters for CA was done in an objective way through PCA. So from the present classification it can be concluded that initially there was a halo in which metal poor GCs formed (inner halo). Then the medium got enriched from the evolving stars and a disc of GCs formed which are comparatively metal rich. The GCs of the outer halo might have been accreted from the neighbouring galaxies through tidal accretion. This is concluded from the kinematic properties, absence of age-metallicity correlation and metallicity gradient etc in Cluster 3. This is also suggested by Mackey & Gilmore (2004) on the basis of HB morphology of the Galactic halo GCs and those in the neighbouring dwarf galaxies.
  \begin{figure}
\par\includegraphics[width=8cm,clip]{6945fig11.eps}
\end{figure} Figure 11: [Fe/H] vs. $R_{\rm c}$ (pc) diagram for third Bootstrap Sample 1. The suffixes indicate the cluster number.
Open with DEXTER

Table 10: The group means for the parameters of the GCs of Bootstrap samples.


  \begin{figure}
\par\includegraphics[width=8cm,clip]{6945fig12.eps}
\end{figure} Figure 12: [Fe/H] vs. $R_{\rm c}$ (pc) diagram for fourth Bootstrap Sample 1. The suffixes indicate the cluster number.
Open with DEXTER


  \begin{figure}
\par\includegraphics[width=8cm,clip]{6945fig13.eps}
\end{figure} Figure 13: [Fe/H] vs. $R_{\rm c}$ (pc) diagram for final cluster analysis result for Sample 1. The suffixes indicate the cluster number.
Open with DEXTER


  \begin{figure}
\par\includegraphics[width=8cm,clip]{6945fig14.eps}
\end{figure} Figure 14: [Fe/H] vs. c diagram for final cluster analysis result for Sample 1. The suffixes indicate the cluster number.
Open with DEXTER


  \begin{figure}
\par\includegraphics[width=8cm,clip]{6945fig15.eps}
\end{figure} Figure 15: [Fe/H], $R_{\rm c}$ (pc), c diagram for final cluster analysis result for Sample 1. The suffixes indicate the cluster number
Open with DEXTER

Table 11: The group means for the parameters of the GCs of Sample 1 in the final analysis.


  \begin{figure}
\par\includegraphics[width=7cm,clip]{6945fig16.eps}
\end{figure} Figure 16: Scatter diagram of age in logarithmic scale (yr) vs. metallicity with available values of ages in clusters 1, 2 and 3 respectively as a result of cluster analysis for Sample 1.
Open with DEXTER

Table 12: The group means for the parameters of the GCs of M 31.


  \begin{figure}
\par\includegraphics[width=7.5cm,clip]{6945fig17.eps}
\end{figure} Figure 17: xy diagram for the number of clusters (K) and jumps for Sample 2.
Open with DEXTER

The CA for M 31 GCs are shown in Table 12 and Fig. 17 respectively. The optimum number in this case is also three like those in MW. Cluster 1 has high metallicity, very low core radii and close to the galactic centre. Cluster 2 has minimum metallicity, comparatively lower core radii, and is very far from the centre. Cluster 3 has comparatively low metallicity, maximum core radii. For calculating the rotational velocities of these groups the function

\begin{displaymath}v = v_{\rm sys}+v_{\rm rot}\sin~(\phi)\end{displaymath}

(Perett et al. 2002) is fitted to the radial velocities of the GCs of M 31 where $\phi$ is the position angle taken from Barmby et al. (2000), $v_{\rm sys}$ is the mean velocity of the M 31 cluster system (de Vaucouleurs et al. 1991) taken as $ -300 \pm4 $ km s-1. They are shown in Table 12. It is found that Cluster 1 and Cluster 3 have comparable rotational velocities while Cluster 2 has somewhat lower rotational velocity. Also Cluster 1 is the most metal rich component of the system and closest to the galactic centre. So it can be associated with the disc part like MW Cluster 1. On the other hand Cluster 2 is the most metal poor component and farthest from the galactic centre. So it can be associated with the outer halo like MW Cluster 3 and Cluster 3 of M 31 can be associated with the inner halo like MW Cluster 2. Also from the ( $[{\rm Fe/H}], R_{\rm c}$) diagram (Fig. 18) it is clear that the groups are very similar to those of MW GCs (Fig. 13). So it can be concluded that formation history of MW and M 31 are more or less similar.


  \begin{figure}
\par\includegraphics[width=8cm,clip]{6945fig18.eps}
\end{figure} Figure 18: [Fe/H], $R_{\rm c}$ (pc) diagram in the cluster analysis for Sample 2. The suffixes indicate the cluster number.
Open with DEXTER

Table 13: The group means for the parameters of the GCs of LMC.

The CA for LMC GCs shows that there are aparantly three groups (Table 13).The third group containing only 3 GCs has more or less similar characteristics as first group (Cluster 1). The GCs in these two groups are almost coeval ($\sim$108 yr). Also the mean metallicities ($\sim$-0.37 and $\sim$-0.34) of these groups and mean distances ($\sim$3 kpc) from the galactic centre are similar. Only the core radii differ. Almost similar features have been found for several other bootstrap samples. As a result we may consider them as a single group. The number of GCs in the second group (Cluster 2) is also very small (4) but this may be due to lack of data points. Since metallicity values are available only for 23 GCs of LMC (Mackey & Gilmore 2003) the sample size has been reduced from that one (39) used by Covino & Fracassini (1993). Now it is a well known fact that metallicity has good correlation with colour (B-V). So if the CA is carried out with ( $c,R_{\rm c}$ and B-V) with the Covino & Fracassini (1993) sample as (B-V) is available for 39 GCs in that sample, then the classification (Fig. 20) shows a good concentration in Cluster 2. On the basis of the above discussion it may be inferred that the actual number of clusters in LMC is likely to be two instead of three as in the cases of MW and M 31. But this feature is not directly reflected through CA due to small sample size. The group means are shown in Table 13. The metallicity vs core radii diagram is also shown in Fig. 19. The ages of the LMC GCs used are from Mackey & Gilmore (2003). The mean ages of the groups are also listed in the table. It is seen that outer GCs (Cluster 1) are younger and more metal rich than those of inner GCs (Cluster 2) which exhibits reverse nature as compared with that of MW and M 31 GCs. Also the outer GCs of LMC are much younger (of the order of Myr) than those of MW GCs. These indicate that the formation history of LMC may be different from that of MW or M 31. This will be clear if a spectroscopic study of the GCs can be carried out and the sample size is increased.


  \begin{figure}
\par\includegraphics[width=8cm,clip]{6945fig19.eps}
\end{figure} Figure 19: [Fe/H], $R_{\rm c}$(pc) diagram in the cluster analysis for Sample 3. The suffixes indicate the cluster number.
Open with DEXTER


  \begin{figure}
\par\includegraphics[width=7.5cm,clip]{6945fig20.eps}
\end{figure} Figure 20: B-V, $R_{\rm c}$ (pc) diagram in the cluster analysis for sample of LMC GCs of Covino & Fracassini (1993). The suffixes indicate the cluster number.
Open with DEXTER

5 Conclusions

We have applied multivariate analysis for the reclassification of the globular clusters of our Galaxy, M 31, and LMC. First a Principal Component Analysis is performed to search for the optimum set of parameters giving the maximum over all variation among the GCs in these galaxies. It is found that metallicity, concentration parameter and core radius are the parameters responsible for maximum variation in the GCs of Milky Way and M 31. The statistical dimensionality is two in every situation which is less than those of elliptical galaxies (Santiago & Djorgovski 1992). This procedure is completely different from the method of studying two point correlation or studying the properties of GCs with respect to a single parameter like colour or HB morphology as done by previous authors (Zinn 1985, 1993; Mackey & Gilmore 2004). As the present set up is multivariate, it is quite likely to carry out the analysis by multivariate methods. Also there are various parameters responsible for the variation among GCs. It is better to select the optimum set giving maximum variation. This reduces the difficulty in handling large number of parameters simultaneously and drawing any physical conclusions while at the same time restores the significant parameters responsible for maximum variation. This is the goal of PCA. In the present situation the optimum set does not include HB morphology parameter which is HBR but includes chemical composition and morphological parameters instead (Table 4, S7, and S8). So it is more scientific to perform classification on the basis of these significant parameters selected objectively instead of taking any parameter in a subjective way. Then Cluster Analysis is carried out with respect to this optimum set. Here two and three clusters are found in case of MW, M 31, and LMC GCs. The robustness of the classification is tested by taking a few bootstrap samples generated from the original one. The classification differs from that by Zinn (1985) and is in agreement with Zinn (1993) and Mackey & Gilmore (2004). For MW and M 31 three clusters, disc, inner halo, and outer halo GCs are found. The kinematic properties, age metallicity diagram, metallicity gradients studied for these groups also support the true nature of the classification. The analogous behaviour of the GCs with MW GCs in different parametric planes shows that they have a similar nature as those of MW GCs. Perrett et al. (2002) have calculated the kinematic properties of some 200 GCs in M 31 and have found two groups, disc and halo GCs with metallicities peaked at -0.5 and -1.41 respectively. In the present analysis, three groups have been found with mean metallicities -0.43, -1.41, and -1.51 respectively. The existence of the third group with minimum metallicity is analogous in properties with inner halo GCs of MW. For LMC GCs two groups have been found instead of three though the conclusion is not very firm due to the small size of the sample. The evolution history is likely to differ from those of MW and M 31. As there is controversy in the formation of GCs in disc (Schommer et al. 1992) or in pressure supported halo (van den Bergh 2004) so more clear picture will come out from the spectroscopic study of GCs with a larger sample size.

Acknowledgements
One of the the authors (Tanuka Chattopadhyay) wishes to thank University Grants Commission (UGC), India for awarding her a major research project for the work.The authors are grateful to Prof. Rajaram Nityananda for his useful suggestions. The authors are also thankful to Prof. Duncan Forbes for his careful reading and important suggestions. The authors are grateful to the referee for his careful reading and valuable suggestions to improve the quality of the work.

References

 

  
Online Material

Appendix A: Appendix on principal component analysis

The purpose of the principal component analysis is to reduce the complexity of multivariate data by transforming the data into the principal components space and choosing the first n principal components that explain most of the variation in the original variables.

Let the components xi of a random vector $x=(x_1, x_2, \dots,
x_p)'$ are measured in the same or comparable units. Also assume that the variances of the variables do not vary too much. Let $\Sigma$ denote the covariance matrix of the random vector  $x. \
\Sigma$ is assumed to be at least positive semi definite rank $r(\leq p)$. All the eigen values of $\Sigma$ are real and non negative. Let $\lambda_1 \geq \lambda_2 \dots \geq \lambda_p \geq
0$ be the ordered eigen value of $\Sigma$. Then there exists an orthogonal matrix $G^{p\times p}= (g^{p\times 1}_1 \ g^{p\times
1}_2 \dots g^{p\times 1}_p), \ GG' = I_p$ such that $G' \Sigma \
G=D_{\lambda}$, $ D_{\lambda}= {\rm Diag} (\lambda_1, \lambda_2, \dots,
\lambda_p)$

Consider the transformation

y = G'x (A.1)

Then

\begin{displaymath}{\rm Cov}~(y) = D_{\lambda}.
\end{displaymath} (A.2)

The vectors $g_1, g_2, \dots, g_p$ of the matrix G are called eigen vectors corresponding to the eigen values $\lambda_1, \dots,
\lambda_p$. It can be shown that
                        $\displaystyle \lambda_1$ = $\displaystyle g'_1 \Sigma g_1 = {\rm DM}{\max}_{a:a'a=1} a'\Sigma a$  
$\displaystyle \lambda_2$ = $\displaystyle g'_2 \Sigma g_2 = {\rm DM}{\max}_{b:b'b=1, b'g_1=0}
b'\Sigma b$  
  $\textstyle \vdots$    

and so on.

From the above result it follows that for any set of orthogonal vectors hi, h'i hi = 1 and for any $k\leq p$

\begin{displaymath}\lambda_1+\lambda_2+\dots+\lambda_k = {\rm DM}{\Sigma}^k_{i=1}g'_i
\Sigma g_i \geq {\rm DM}{\Sigma}^k_{i=1} h'_i \Sigma h_i \end{displaymath}

where by DM we denote a diagonal matrix.

For this reason, the component

y1 = g'1 x (A.3)

is called the first principal component.

y2 = g'2 x (A.4)

is called the second principal component and so on.

Here Var( $y_i)=\lambda_i$, Because $\lambda_1\geq \lambda_2 \dots
\lambda_p$, y1 has the largest variance $\lambda_1, y_2$ has the second largest variance $\lambda_2$ and so on. Since

\begin{displaymath}\lambda_1+\lambda_2+\dots+\lambda_p = tr \Sigma =
{\rm DM}{\Sigma}^p_{i=1} \sigma_{ii}
\end{displaymath} (A.5)

the sum of the variances of p principal components is the same as the sum of the variances of the original variables $x_i \dots x_p$. Thus the components with smaller variances could be ignored without significantly affecting the total variance and thereby reducing the number of variables from p to say $k\leq p$.

Many criteria have been suggested by different authors for deciding how many principal components to retain. Some of these criteria are as follows:

1.
Include just enough components to explain some arbitrary amount (say 90%) of the variance.

2.
Exclude those principal components with eigen values below the average. For principal components calculated from the correlation matrix, this criteria excludes components with eigen values less than 1. This criterion has been used in the present paper.

3.
Use of the screeplot technique.

Appendix B: Appendix on q-q plot and anderson darling test

Quantile-Quantile Plot

The quantile-quantile (q-q) plot is a graphical technique for determining if two data sets come from populations with a common distribution. A q-q plot is a plot of the quantiles of the first data set against the quantiles of the second data set. By a quantile, we mean the fraction (or percent) of points below the given value. That is, the 0.3 (or 30$\%$) quantile is the point at which 30$\%$ of the data fall below and 70$\%$ fall above that value.

A 45-degree reference line is also plotted. If the two sets come from a population with the same distribution, the points should fall approximately along this reference line. The greater the departure from this reference line, the greater the evidence for the conclusion that the two data sets have come from populations with different distributions.

In order to fit a theoretical distribution to a data set we take the data set as the first sample and observations from the theoretical distribution under consideration (in our case Normal) as the second sample.

Anderson-Darling test

The Anderson-Darling test (Stephens 1974) is used to test if a sample of data came from a population with a specific distribution. It is a modification of the Kolmogorov-Smirnov (K-S) test and gives more weight to the tails than does the K-S test. The K-S test is distribution free in the sense that the critical values do not depend on the specific distribution being tested. The Anderson-Darling test makes use of the specific distribution in calculating critical values. This has the advantage of allowing a more sensitive test and the disadvantage that critical values must be calculated for each distribution.

The Anderson-Darling test is an alternative to the chi-square and Kolmogorov-Smirnov goodness-of-fit tests.

Definition: The Anderson-Darling test is defined as:
H0: The data follow a specified distribution.
Ha: The data do not follow the specified distribution.
Test Statistic: The Anderson-Darling test statistic is defined as

A2 = -N-S

where

\begin{displaymath}S ={\Sigma}^N_{i=1}((2i-1)/N)[{\rm ln}~(F(Y_{i})+{\rm ln}~(1-F(Y_{N+1-i}))]\end{displaymath}

F is the cumulative distribution function of the specified distribution. Note that the Yi are the ordered data.

Significance Level: Critical Region: The critical values for the Anderson-Darling test are dependent on the specific distribution that is being tested. Tabulated values and formulas have been published (Stephens 1974, 1976, 1977, 1979) for a few specific distributions (normal, lognormal, exponential, Weibull, logistic, extreme value type 1). The test is a one-sided test and the hypothesis that the distribution is of a specific form is rejected if the test statistic, A, is greater than the critical value.



Copyright ESO 2007