EDP Sciences
Issue
A&A
Volume 510, February 2010
Article Number A78
Number of page(s) 7
Section Galactic structure, stellar clusters, and populations
DOI https://doi.org/10.1051/0004-6361/200912886
Published online 16 February 2010
A&A 510, A78 (2010)

Cluster radius and sampling radius in the determination of cluster membership probabilities

N. Sánchez - B. Vicente - E. J. Alfaro

Instituto de Astrofísica de Andalucía, CSIC, Apdo. 3004, 18080 Granada, Spain

Received 14 July 2009 / Accepted 24 November 2009

Abstract
We analyze the dependence of the membership probabilities obtained from kinematical variables on the radius of the field of view around open clusters (the sampling radius, $R_{\rm s}$). From simulated data, we show that optimal discrimination between cluster members and non-members is achieved when the sampling radius is very close to the cluster radius. At higher $R_{\rm s}$ values, more field stars tend to be erroneously assigned as cluster members. From real data of two open clusters (NGC 2323 and NGC 2311), we infer that the number of identified cluster members always increases with increasing $R_{\rm s}$. However, there is a threshold value $R_{\rm s,opt} $ above which the identified cluster members are severely contaminated by field stars and the effectiveness of membership determination is relatively small. This optimal sampling radius is $\simeq$14 arcmin for NGC 2323 and $\simeq$13 arcmin for NGC 2311. We discuss the reasons for this behavior and the relationship between cluster radius and optimal sampling radius. We suggest that, independently of the method used to estimate membership probabilities, several tests using different sampling radius should be performed to evaluate possible biases.

Key words: methods: data analysis - open clusters and associations: general - open clusters and associations: individual: NGC 2311 - open clusters and associations: individual: NGC 2323

1 Introduction

Large astrometric catalogues derived from surveys covering very wide areas of the sky are enabling systematic searches for new star systems (see, for example, Myullyari et al. 2003; López-Corredoira et al. 1998; Hoogerwerf & Aguilar 1999; Caballero & Dinis 2008; Kazakevich & Orlov 2002; Zhao et al. 2009, and references therein). The search process is based on the detection of clearly defined structures in subsets of phase space. Peaks in both spatial density and proper motion are indicative of star clusters; peaks detectable only in the proper motion distributions suggest the existence of moving groups, whereas more spread-out and less dense velocity-position correlated structures could be associated with stellar streams. Once these structures have been detected, the next step is to identify possible members of the star system. In the particular case of open clusters, the most commonly used procedure for selecting possible cluster members is the algorithm designed by Sanders (1971). This algorithm is based on an earlier model proposed by Vasilevskis et al. (1958) for the proper motion distribution. The model assumes that cluster members and field stars are distributed according to circular and elliptical bivariate normal distributions, respectively. Sanders' algorithm, or some variation or refinement of it, has been and is still being widely used to estimate cluster memberships either as the only method or as part of a more complete treatment that includes, for example, spatial and/or photometric criteria. Some representative references are Wu et al. (2002), Jilinski et al. (2003), Balaguer-Núñez et al. (2004), Dias et al. (2006), Kraus & Hillenbrand (2007), and Wiramihardja et al. (2009).

With the advent of large catalogues and databases available via the internet, and of future surveys such as the forthcoming Gaia mission of ESA, interest in developing and applying fully automated techniques is growing within the astronomical community. However, special care must be taken to avoid obtaining biased results. In this work, we show that the results obtained when using Sanders' algorithm depend significantly on the choice of the size of the field of view surrounding the cluster. So, once a possible open cluster is detected, it is natural to ask which area of the sky should be sampled to obtain the most reliable membership determinations. It is equally important to ask about the robustness of the methodology used, i.e., how does the solution change when the sampled area is varied? Here we explore these subjects by using both simulated and real data. In Sect. 2, we briefly present the method used to determine memberships and describe the simulations that we performed to analyze the expected behavior. The results of applying Sanders' algorithm to the simulated data are discussed in Sect. 3. After this, in Sect. 4 we use real astrometric data of two open clusters (NGC 2323 and NGC 2311) to evaluate the performance of the algorithm. We discuss strategies to estimate the optimal sampling radius, i.e., the maximum radius beyond which the identified cluster members are expected to be severely contaminated by field stars. The main results of the present work are summarized in Sect. 5.

2 Description of the method

2.1 Membership determination

The key point of the membership discrimination method is the assumption that the distribution of observed proper motions ($\mu_x$, $\mu_y$) can be described by means of two bivariate normal distributions, one circular for the cluster and one elliptical for the field (Vasilevskis et al. 1958). We define $\Phi_{\rm c}$ and $\Phi_{\rm f}$ to be the cluster and field probability density functions, respectively. Then,

\begin{displaymath}\Phi_{\rm c}(\mu_x,\mu_y) = \frac{1}{2\pi\sigma_{\rm c}^2}
\exp \left\{ -\frac{1}{2} \left[
\left( \frac{\mu_x - \mu_{x,c}}{\sigma_{\rm c}} \right)^2 +
\left( \frac{\mu_y - \mu_{y,c}}{\sigma_{\rm c}} \right)^2
\right] \right\}
\end{displaymath} (1)

and
\begin{displaymath}\Phi_{\rm f}(\mu_x,\mu_y) = \frac{1}{2\pi\sigma_{x,{\rm f}}\sigma_{y,{\rm f}}\sqrt{1-\rho^2}}
\exp \left\{ -\frac{1}{2(1-\rho^2)} \left[
\left( \frac{\mu_x - \mu_{x,{\rm f}}}{\sigma_{x,{\rm f}}} \right)^2 +
\left( \frac{\mu_y - \mu_{y,{\rm f}}}{\sigma_{y,{\rm f}}} \right)^2
- 2\rho \left( \frac{\mu_x - \mu_{x,{\rm f}}}{\sigma_{x,{\rm f}}} \right)
\left( \frac{\mu_y - \mu_{y,{\rm f}}}{\sigma_{y,{\rm f}}} \right)
\right] \right\} ,
\end{displaymath} (2)

where $(\mu_{x,c},\mu_{y,c})$ is the cluster distribution centroid with standard deviation $\sigma_{\rm c}$, $(\mu_{x,{\rm f}},\mu_{y,{\rm f}})$ is the field centroid with standard deviations $\sigma_{x,{\rm f}}$ and $\sigma_{y,{\rm f}}$, and $\rho$ is the correlation coefficient of field stars. The probability density function for the whole sample is simply

\begin{displaymath}\Phi(\mu_x,\mu_y) = n_{\rm c} \Phi_{\rm c}(\mu_x,\mu_y) +
n_{\rm f} \Phi_{\rm f}(\mu_x,\mu_y) ,
\end{displaymath} (3)

$n_{\rm c}$ and $n_{\rm f}$ being the normalized numbers of cluster and field stars, respectively. To obtain the unknown parameters (centroids, standard deviations, numbers of members and non-members), an iterative procedure is used by applying the maximum likelihood principle (Sanders 1971). Here we use the algorithm proposed by Cabrera-Caño & Alfaro (1985), which first detects and removes outliers that can produce unrealistic solutions, and then uses a more robust and efficient iterative procedure for the model parameter estimation. Once these parameters are known, the membership probability of the ith star can be calculated directly as

\begin{displaymath}p(i) = \frac{n_{\rm c} \Phi_{\rm c}(i)}{\Phi(i)} .
\end{displaymath} (4)
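As an illustration of Eqs. (1)-(4), the following minimal Python sketch evaluates the two densities and the resulting membership probability for a single star, assuming the model parameters have already been fitted. The function names are hypothetical, and the sketch assumes $n_{\rm f} = 1 - n_{\rm c}$ (normalized numbers). In the usage lines, the centroids and dispersions are the NGC 2323 values quoted in Sect. 4.1, whereas $\rho = 0$ and $n_{\rm c} = 0.4$ are assumptions made only for the example.

```python
import math


def phi_cluster(mx, my, mxc, myc, sc):
    """Circular bivariate normal density, Eq. (1)."""
    z = ((mx - mxc) / sc) ** 2 + ((my - myc) / sc) ** 2
    return math.exp(-0.5 * z) / (2.0 * math.pi * sc ** 2)


def phi_field(mx, my, mxf, myf, sxf, syf, rho):
    """Elliptical bivariate normal density with correlation rho, Eq. (2)."""
    u = (mx - mxf) / sxf
    v = (my - myf) / syf
    q = 1.0 - rho ** 2
    z = (u * u + v * v - 2.0 * rho * u * v) / q
    return math.exp(-0.5 * z) / (2.0 * math.pi * sxf * syf * math.sqrt(q))


def membership_probability(mx, my, n_c, cluster, field):
    """Eq. (4), with the total density of Eq. (3) and n_f = 1 - n_c (assumed)."""
    pc = n_c * phi_cluster(mx, my, *cluster)
    pf = (1.0 - n_c) * phi_field(mx, my, *field)
    return pc / (pc + pf)


# Illustrative parameters (mas/yr); rho = 0 and n_c = 0.4 are assumptions.
cluster_params = (1.09, 1.13, 1.01)
field_params = (0.77, -2.54, 6.41, 5.84, 0.0)
p_centre = membership_probability(1.09, 1.13, 0.4, cluster_params, field_params)
p_far = membership_probability(15.0, -15.0, 0.4, cluster_params, field_params)
```

A star at the cluster centroid receives a probability close to 1, while a star far out in the field tail receives a probability close to 0.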

2.2 Simulations

We consider a cluster with a given radius $R_{\rm c}$. We define ``cluster radius'' as the radius of the smallest circle that can completely enclose its stars. In real situations, $R_{\rm c}$ is an unknown quantity that has to be estimated a posteriori, but here its value is known and remains constant throughout each simulation. The total number of stars belonging to the cluster is denoted by $N_{\rm c,max}$ and the number of field stars lying within exactly the same sky area as the cluster is $N_{\rm f,cri}$. The independent variable is the radius of the field encircling the cluster. This radius might represent the radius of the field in which the observations are made or the field around the cluster extracted from an astrometric catalogue. We call this variable the sampling radius $R_{\rm s}$, which can be larger or smaller than the cluster radius $R_{\rm c}$.

The numbers of cluster stars and field stars to be simulated are represented by $N_{\rm c,sim}$ and $N_{\rm f,sim}$, respectively. Obviously, the numbers of cluster stars and field stars within the field of view depend on the size of this field, that is, both $N_{\rm c,sim}$ and $N_{\rm f,sim}$ are functions of $R_{\rm s}$. If the field stars are distributed nearly uniformly in space, then $N_{\rm f,sim}$ should increase with the sampling radius as

\begin{displaymath}
N_{\rm f,sim} (R_{\rm s}) = N_{\rm f,cri} \left(R_{\rm s}/R_{\rm c} \right)^2 .
\end{displaymath} (5)

The rate at which $N_{\rm c,sim}$ increases with $R_{\rm s}$ depends instead on the radial profile of the surface density of cluster stars ($\Sigma_{\rm c,sim}$). For simplicity, we assume that the surface density at radius $r$ is given by (Caballero 2008)

\begin{displaymath}
\Sigma_{\rm c,sim} (r) = \frac{\delta N_{\rm c,max}}{2\pi R_{\rm c}^\delta}
r^{\delta - 2} ,
\end{displaymath} (6)

where the index $\delta \leq 2$. For the extreme case $\delta =2$, we have $\Sigma_{\rm c,sim} = N_{\rm c,max} /(\pi R_{\rm c}^2) = {\rm const}$. Integrating Eq. (6), we obtain the number of cluster stars within a given sampling radius (for $R_{\rm s} \leq R_{\rm c}$),

\begin{displaymath}
N_{\rm c,sim} (R_{\rm s}) = N_{\rm c,max} \left( R_{\rm s}/R_{\rm c} \right) ^\delta.
\end{displaymath} (7)
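For completeness, the integration leading to Eq. (7) can be written out explicitly by summing the surface density of Eq. (6) over concentric rings of area $2\pi r\,{\rm d}r$:

\begin{displaymath}
N_{\rm c,sim} (R_{\rm s}) = \int_0^{R_{\rm s}} 2\pi r \, \Sigma_{\rm c,sim}(r) \, {\rm d}r
= \frac{\delta N_{\rm c,max}}{R_{\rm c}^\delta} \int_0^{R_{\rm s}} r^{\delta - 1} \, {\rm d}r
= N_{\rm c,max} \left( R_{\rm s}/R_{\rm c} \right)^\delta .
\end{displaymath}

The last step requires $\delta > 0$, since otherwise the integral diverges at $r=0$; setting $R_{\rm s}=R_{\rm c}$ recovers the total number of cluster stars $N_{\rm c,max}$.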

Values $\delta \leq 0$ make no sense (the integral of Eq. (6) diverges at the origin), so this approach is limited to the range $0 < \delta \leq 2$. The role of the parameter $\delta$ is to control how rapidly $N_{\rm c,sim}$ increases as $R_{\rm s}$ increases. Thus, we do not need to know the exact functional form as long as we are able to simulate either completely flat ($\delta =2$) or extremely peaked ($\delta \simeq 0$) density profiles.
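The scalings of Eqs. (5) and (7) can be sketched in a few lines of Python. The function names are hypothetical, and the cap applied beyond the cluster radius is an assumption of this sketch, based on the fact that the cluster contains $N_{\rm c,max}$ stars in total.

```python
def n_field(rs_over_rc, n_f_cri):
    """Expected number of field stars within R_s, Eq. (5)."""
    return n_f_cri * rs_over_rc ** 2


def n_cluster(rs_over_rc, n_c_max, delta):
    """Expected number of cluster stars within R_s, Eq. (7).

    For R_s > R_c the count is capped at n_c_max (assumption of this
    sketch: no cluster stars lie beyond the cluster radius).
    """
    if not 0.0 < delta <= 2.0:
        raise ValueError("delta must satisfy 0 < delta <= 2")
    return n_c_max * min(rs_over_rc, 1.0) ** delta


# Parameters of the example simulation: N_c,max = N_f,cri = 200, R_s/R_c = 1.1
total = round(n_field(1.1, 200)) + round(n_cluster(1.1, 200, 2.0))
```

With these inputs the sketch gives 242 field stars and 200 cluster stars, i.e., the 442 stars quoted for the simulation shown in Fig. 1.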

To perform the simulations, we distribute $N_{\rm f,sim}$ field stars and $N_{\rm c,sim}$ cluster stars according to bivariate Gaussian distributions in the proper motion space $(\mu_x,\mu_y)$. The routine ``gasdev'' from the Numerical Recipes package (Press et al. 1992) is used to generate normally distributed random numbers. The fields are centered on (0,0) with standard deviations of $\sigma_{x,{\rm f}} = \sigma_{y,{\rm f}} = \sigma_{\rm f}$. The tests performed using elliptical (rather than circular) distributions for the field stars yielded essentially the same results and trends. The clusters are centered on $(\mu_{x,{\rm c}},\mu_{y,{\rm c}})$ and have standard deviations $\sigma_{x,{\rm c}} = \sigma_{y,{\rm c}} = \sigma_{\rm c}$. Thus, for a given sampling radius $R_{\rm s}$ and according to Eq. (5), we randomly generate $N_{\rm f,sim}$ field stars that follow a bivariate normal distribution in the proper motion space. For the cluster, we generate $N_{\rm c,sim}$ stars according to Eq. (7) when $R_{\rm s} \leq R_{\rm c}$, and we generate $N_{\rm c,sim} = N_{\rm c,max}={\rm const.}$ stars when $R_{\rm s} \geq R_{\rm c}$. The three free parameters, excluding those describing the Gaussians, are the total number of stars in the cluster ($N_{\rm c,max}$), the number of field stars within the cluster area ($N_{\rm f,cri}$), and the cluster star density profile ($\delta$). For each set of parameters, we performed 100 simulations and calculated both the average values of the studied quantities and their corresponding standard deviations.
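A single realization of this procedure might be sketched as follows. This is not the original code: Python's `random.gauss` is used here as a stand-in for the ``gasdev'' routine, the function name `simulate_sample` is hypothetical, and rounding the star counts to the nearest integer is an assumption of the sketch.

```python
import random


def simulate_sample(rs_over_rc, n_c_max, n_f_cri, delta,
                    cluster_center, sigma_c, sigma_f, rng):
    """Draw proper motions for one simulated field of radius R_s.

    Returns (cluster, field), two lists of (mu_x, mu_y) tuples.
    """
    # Star counts from Eqs. (5) and (7); rounding is an assumption.
    n_f = round(n_f_cri * rs_over_rc ** 2)
    n_c = round(n_c_max * min(rs_over_rc, 1.0) ** delta)
    # Field stars: circular Gaussian centered on (0, 0).
    field = [(rng.gauss(0.0, sigma_f), rng.gauss(0.0, sigma_f))
             for _ in range(n_f)]
    # Cluster stars: circular Gaussian centered on the cluster centroid.
    cx, cy = cluster_center
    cluster = [(rng.gauss(cx, sigma_c), rng.gauss(cy, sigma_c))
               for _ in range(n_c)]
    return cluster, field


# One realization with the parameters of the example in Fig. 1.
cluster, field = simulate_sample(1.1, 200, 200, 2.0, (1.0, 0.0),
                                 1.0, 5.0, random.Random(1))
```

Repeating the call 100 times with different seeds, and averaging the quantities of interest, would mimic the ensemble statistics described above.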

3 Results from simulations

For each simulation, we calculated cluster membership probabilities using the method described in Sect. 2.1. We performed several simulations by varying the input parameters (the number of stars in both the cluster and the field, the centroid distance in the proper motion space, and standard deviations) within reasonable ranges. Apart from minor differences, such as the error bars being larger when cluster and field distributions are more similar, all the results and trends remained essentially identical to those described in this section. We begin by showing how the algorithm works. In Fig. 1, we

\begin{figure}
\par\includegraphics[width=17cm,clip]{12886fg1.eps}
\end{figure} Figure 1:

Proper motion for the stars of a random simulation with $N_{\rm c,max}=N_{\rm f,cri}=200$, $\delta =2$, and $R_{\rm s}/R_{\rm c}=1.1$ (see text for details of the meaning of each of these quantities). Left panel shows the distribution for all the 442 simulated stars. Red circles are the field stars centered on (0,0) with $\sigma _{\rm f}=5$ and blue circles are the 200 cluster stars centered on (1,0) with $\sigma _{\rm c}=1$. Right panel is a magnification of the central region in which we have marked with circles the stars whose resulting cluster membership probabilities are higher than 0.5 according to the algorithm used.

can see an example of a simulation of a cluster of 200 stars, which has been adequately sampled with $R_{\rm s} = 1.1 R_{\rm c}$. The right panel clearly shows the occasional but inevitable ``failures'' of the method. First, cluster stars in the tails of their own distribution may not be recognized as members. Second, field stars that by chance fall within the cluster distribution in proper motion space may be selected as probable members.

What would happen if we select a larger field? To address this point, we calculated membership probabilities as a function of the sampling radius. Here we consider as cluster members stars with membership probabilities $\geq$0.5 in a Bayesian sense. We performed several tests of different selection criteria. As expected, the number of assigned members depends on the selection criterion used, although the main results and trends presented here remain unchanged. Figure 2 shows

\begin{figure}
\par\resizebox{9cm}{!}{\includegraphics{12886fg2.eps}}
\end{figure} Figure 2:

Calculated number of field and cluster stars as a function of the sampling radius in units of the cluster radius, $R_{\rm s}/R_{\rm c}$, for simulations with the same set of parameters as Fig. 1. a) Simulation with peaked density profile ( $\delta =0.5$), assigned members are indicated by squares connected by lines. b) Simulation with flat density profile ($\delta =2$), members are indicated by circles connected by lines. Assigned field stars are indicated by vertical bars connected by lines, the length of the bars indicating one standard deviation. The real numbers of simulated stars are shown by dashed lines (cluster) and dotted lines (field).

the number of stars classified as members (which we denote by $N_{\rm c}$) or non-members ($N_{\rm f}$) by the algorithm as a function of the sampling radius. In these particular simulations, the number of assigned members $N_{\rm c}$ is always higher than the true number of cluster stars. Most of the cluster stars are well identified but, as mentioned before, field stars falling within the cluster distribution are also considered as members. For the same reason, the number of field stars is always lower than expected. For $R_{\rm s} < R_{\rm c}$ (subsampled cluster), $N_{\rm c}$ increases with $R_{\rm s}$ because the number of cluster stars in the sample increases as $R_{\rm s}$ increases. The rate at which this occurs depends on the cluster density profile, which for simulations with $\delta =2$ in Fig. 2 is exactly the same as for the field (homogeneous distribution). For $R_{\rm s} \geq R_{\rm c}$, we observe a change in the behavior of $N_{\rm c}$. In this case, we do not include new cluster stars in the sample as $R_{\rm s}$ increases, and $N_{\rm c}$ increases slightly because of the new field stars that are erroneously classified as possible members. On the other hand, the number of field stars always increases at a rate roughly proportional to $R_{\rm s}^2$. It is easy to see that, in general, the fraction of cluster stars (shown in Fig. 3) should
\begin{figure}
\par\resizebox{9cm}{!}{\includegraphics{12886fg3.eps}}
\end{figure} Figure 3:

Calculated fraction of cluster stars as a function of the sampling radius for the same simulations as in Fig. 2. The real (simulated) values are shown by dashed lines.

be a decreasing function of $R_{\rm s}$ for any cluster with $\delta < 2$. Only in the extreme case of homogeneous clusters does the fraction of cluster stars remain constant with $R_{\rm s}$ for $R_{\rm s} < R_{\rm c}$.
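This trend follows directly from Eqs. (5) and (7). For the true (simulated) numbers and $R_{\rm s} \leq R_{\rm c}$, the fraction of cluster stars can be written as

\begin{displaymath}
\frac{N_{\rm c,sim}}{N_{\rm c,sim}+N_{\rm f,sim}} =
\left[ 1 + \frac{N_{\rm f,cri}}{N_{\rm c,max}}
\left( \frac{R_{\rm s}}{R_{\rm c}} \right)^{2-\delta} \right]^{-1} ,
\end{displaymath}

which decreases with $R_{\rm s}$ whenever $\delta < 2$ and reduces to the constant $N_{\rm c,max}/(N_{\rm c,max}+N_{\rm f,cri})$ for $\delta = 2$. For $R_{\rm s} > R_{\rm c}$ the number of cluster stars is fixed at $N_{\rm c,max}$, so the exponent $2-\delta$ is replaced by 2 and the fraction decreases for any $\delta$.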

Figures 2 and 3 show the number of stars classified as members, although we do not know whether this classification is indeed reliable. To quantify the correctness of the result, we define the matching fraction of the cluster $M_{\rm c}$ to be the net proportion of cluster stars that are well classified. If $N_{\rm ok}$ is the total number of cluster stars correctly classified as members minus the number of cluster stars incorrectly classified as non-members, then $M_{\rm c}=N_{\rm ok}/N_{\rm c,max}$. The value of $M_{\rm c}$ can be a negative number if the number of misclassifications is higher than the number of correct classifications and $M_{\rm c}$ is exactly 1 only when the algorithm classifies correctly all the stars of the cluster. In Fig. 4, we

\begin{figure}
\par\resizebox{9cm}{!}{\includegraphics{12886fg4.eps}}
\end{figure} Figure 4:

Matching fraction of the cluster (see text) as a function of the sampling radius for the same simulations as in Fig. 2. The error bars are of the order of the symbol sizes but are not shown for clarity.

see that the highest $M_{\rm c}$ value occurs precisely when the sampling radius equals the cluster radius. At smaller sampling radii, the matching fraction of the cluster obviously decreases because the cluster is being subsampled. Interestingly, the matching fraction is also smaller at $R_{\rm s} > R_{\rm c}$, but the reason in this case is that more field stars are being erroneously assigned to the cluster as $R_{\rm s}$ increases. The most robust classification is achieved when the sampling radius is very close to the cluster radius, although, as expected, even in this case the matching fraction does not reach its maximum value $M_{\rm c}=1$. However, the matching fraction is relatively high ( $M_{\rm c}=0.83$) at $R_{\rm s}=R_{\rm c}$ and decreases slowly to 0.71 at $R_{\rm s}=1.5~R_{\rm c}$. Moreover, the behaviors of $N_{\rm c}$ and $N_{\rm f}$ with $R_{\rm s}$ are very similar to those expected (Figs. 2 and 3). This is because both cluster and field stars were assumed to have perfect normal distributions and, therefore, both populations can be well detected by the algorithm since it assumes the same kind of underlying distribution. When using real data, the situation becomes more complex, as discussed in the next section.

4 Results using real data

We use the CdC-SF Catalogue (Vicente et al. 2010), an astrometric catalogue with a mean precision in proper motion of 2.0 mas/yr (1.2 mas/yr for reliable measurements, typically for stars with V < 14). Given the position of a known open cluster, we extract circular fields of varying radius centered on it and then we calculate membership probabilities by using the same algorithm as in Sect. 3. Here we analyze two open clusters that are included in the area covered by this catalogue: NGC 2323 (M 50) and NGC 2311. To minimize the influence of possible outliers on our results, we restrict the sample to $\vert\mu \vert \leq 20$ mas/yr. The number of probable members $N_{\rm c}$, i.e., stars with membership probabilities higher than 0.5, is shown in Fig. 5 as

\begin{figure}
\par\resizebox{9cm}{!}{\includegraphics{12886fg5.eps}}
\end{figure} Figure 5:

Number of cluster stars $N_{\rm c}$ as a function of the sampling radius $R_{\rm s}$ in arcmin for the open clusters NGC 2323 (squares connected by lines) and NGC 2311 (circles connected by lines).

a function of the sampling radius. For both clusters, $N_{\rm c}$ always increases with increasing $R_{\rm s}$ and there are no relatively flat regions analogous to those observed in Fig. 2 for $R_{\rm s} > R_{\rm c}$. Without previous knowledge of the approximate value of the cluster radius, how can we determine the most reliable result? This is not a trivial question given the large uncertainties involved in the estimation or definition of the cluster radius (see discussion in Sect. 4.2). For example, the radius of the total extent of NGC 2323 estimated by different authors varies widely: 10 arcmin (Claria et al. 1998), 16.7 arcmin (Nilakshi et al. 2002), 15 arcmin (Kalirai et al. 2003), 22.2 arcmin (Kharchenko et al. 2005), 17 arcmin (Sharma et al. 2006, using their own optical data), or 22 arcmin (Sharma et al. 2006, using 2MASS data). Our calculations yield $N_{\rm c}=198$ probable members in a field of radius $R_{\rm s}=17$ arcmin, but this number increases to $N_{\rm c}=336$ for $R_{\rm s}=22$ arcmin. This means that there could be more than 100 undetected members if we use $R_{\rm s}=17$ arcmin and the cluster radius is actually $R_{\rm c}=22$ arcmin. Conversely, there are more than 100 spurious members if we use $R_{\rm s}=22$ arcmin and $R_{\rm c}=17$ arcmin. The fraction of cluster members is shown in Fig. 6.
\begin{figure}
\par\resizebox{9cm}{!}{\includegraphics{12886fg6.eps}}
\end{figure} Figure 6:

Fraction of cluster stars as a function of the sampling radius for NGC 2323 (squares connected by lines) and NGC 2311 (circles connected by lines). Vertical arrows indicate the optimal sampling radii (see text).

The trend in which $N_{\rm c}/(N_{\rm c}+N_{\rm f})$ decreases with $R_{\rm s}$ is qualitatively consistent with the expected behavior (Fig. 3). However, there is an $R_{\rm s}$ value above which the fraction of members increases as $R_{\rm s}$ increases and, as mentioned in the previous section, this behavior is only possible if $N_{\rm c}$ increases faster than $N_{\rm f}$ does (i.e., at a rate higher than $\sim$ $R_{\rm s}^2$). The only way that this could happen is if the algorithm introduces many spurious members as $R_{\rm s}$ increases. In other words, there is a critical $R_{\rm s}$ value above which a significant number of spurious members are erroneously included as part of the cluster (see also Piatti et al. 2009). Here we call this critical value the optimal sampling radius, $R_{\rm s,opt}$, and we do not recommend using a sampling radius larger than this value. From Fig. 6, we obtain $R_{\rm s,opt} \simeq 14$ arcmin for NGC 2323 and $R_{\rm s,opt} \simeq 13$ arcmin for NGC 2311, but we emphasize that these values are valid for the data that we use and, in principle, they cannot be extrapolated to other data sets.
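One simple way to automate this criterion, sketched here under the assumption that the fraction curve is smooth enough for a point-to-point comparison, is to scan the curve and return the last radius before the fraction starts to rise. The function name is hypothetical and the numbers in the usage lines are invented for illustration; they are not read from Fig. 6. With noisy data, some smoothing or a tolerance would probably be needed.

```python
def optimal_sampling_radius(radii, fractions):
    """Return the radius at which the member fraction stops decreasing.

    radii and fractions are equal-length sequences ordered by increasing
    radius. Returns None if the fraction never increases.
    """
    if len(radii) != len(fractions):
        raise ValueError("radii and fractions must have the same length")
    for i in range(len(fractions) - 1):
        if fractions[i + 1] > fractions[i]:
            return radii[i]
    return None


# Synthetic curve (arbitrary units): decreasing, then rising again.
example_radii = [1, 2, 3, 4, 5, 6]
example_fractions = [0.55, 0.48, 0.42, 0.39, 0.41, 0.45]
r_opt = optimal_sampling_radius(example_radii, example_fractions)
```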

The main reason behind the behavior observed in Fig. 6 is the disagreement between the assumed and the ``true'' underlying distributions of proper motion of field stars. A circular normal bivariate function is a good representation of the cluster probability density function (PDF), the standard deviation being the result of observational errors that prevent the intrinsic velocity dispersion of the cluster from being completely resolved. However, it is known that an elliptical normal bivariate function is not always the most reliable model for the field PDF (see discussions on this subject in Cabrera-Caño & Alfaro 1990; Balaguer-Núñez et al. 2004; Uribe & Brieva 1994; Sánchez & Alfaro 2009; Griv et al. 2009). The combination of several factors, such as galactic differential rotation or peculiar motions, may affect the field star distribution, which usually tends to exhibit non-Gaussian tails. Non-parametric models, which make no a priori assumptions about the cluster or field star distributions, were introduced and used to overcome this problem (cf. Cabrera-Caño & Alfaro 1990; Chen et al. 1997). We note that the classical parametric and non-parametric methods agree reasonably well with each other only for nearly Gaussian field distributions (see Fig. 5 in Sánchez & Alfaro 2009). When the number of field stars increases and the algorithm tries to fit a Gaussian function to the PDF, the fit tends to produce a wider and flatter function. As a consequence, the membership probabilities (defined as the ratio of the cluster to the total proper motion distribution function) increase and the number of assigned members therefore also increases. This effect is magnified when the cluster distribution becomes ``contaminated'' by many field stars, because the standard deviation of the cluster then tends to increase, with a consequent increase in the number of spurious members. The standard deviations estimated for the two clusters being studied are shown in Fig. 7.

\begin{figure}
\par\resizebox{9cm}{!}{\includegraphics{12886fg7.eps}}
\end{figure} Figure 7:

Estimated standard deviations as a function of the sampling radius for the clusters NGC 2323 (squares connected by lines) and NGC 2311 (circles connected by lines). The bars indicate the uncertainties obtained from bootstrapping.

The error bars were estimated using bootstrap techniques: the calculation is repeated for a series of 100 random resamplings of the data and the standard deviation of the obtained set of values is taken as the associated uncertainty. The standard deviations remain nearly constant ($\sigma_{\rm c} \simeq 1.4{-}1.6$ for NGC 2311 and $\sigma_{\rm c} \simeq 0.9{-}1.0$ for NGC 2323) in the region in which $R_{\rm s} \lesssim R_{\rm s,opt}$ (see also Fig. 6). This is the expected behavior because, in principle, $\sigma_{\rm c}$ should not depend on the sample size. However, above the optimal sampling radius, we can see a gradual increase in $\sigma_{\rm c}$ because of the effect mentioned previously.
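The bootstrap estimate described above can be sketched generically as follows. This is a simplified illustration, not the original implementation: the function name `bootstrap_std` is hypothetical, resampling with replacement is assumed, and in the usage lines the full membership fit is replaced by a plain one-dimensional standard deviation applied to synthetic Gaussian data.

```python
import random
import statistics


def bootstrap_std(values, statistic, n_resamples=100, rng=None):
    """Bootstrap uncertainty of statistic(values).

    The data are resampled with replacement n_resamples times, the
    statistic is recomputed each time, and the standard deviation of
    the resulting set of values is returned.
    """
    if rng is None:
        rng = random.Random()
    n = len(values)
    estimates = []
    for _ in range(n_resamples):
        resample = [values[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    return statistics.stdev(estimates)


# Synthetic data standing in for the proper motions of cluster stars.
rng = random.Random(42)
data = [rng.gauss(0.0, 1.0) for _ in range(400)]
sigma_err = bootstrap_std(data, statistics.pstdev, rng=rng)
```

In an actual application, `statistic` would wrap the complete parameter estimation so that each resampling yields a new value of $\sigma_{\rm c}$.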

4.1 Effectiveness of membership determination

It is not possible in practice to quantify the degree of correlation between identified and true cluster members, such as the matching fraction in Fig. 4. Instead, we can use the concept of effectiveness of membership determination, which is defined to be (Wu et al. 2002; Tian et al. 1998)

\begin{displaymath}
E=1-
\frac{N \sum_{i=1}^N \{ p(i) \left[ 1-p(i) \right]
\}}{\sum_{i=1}^N p(i) \sum_{i=1}^N \left[ 1-p(i) \right]} \ \ ,
\end{displaymath} (8)

where p(i) is the membership probability of the ith star and N is the sample size. This index measures the effectiveness of the membership determination by measuring the separation between field and cluster populations in the probability histogram. The higher the index E, the more effective the membership determination. The maximum E value is obtained when there are two perfectly separated populations of $N_{\rm c}$ stars with membership probabilities p(i)=1 and $N_{\rm f}$ stars with p(i)=0. Figure 8
\begin{figure}
\par\resizebox{9cm}{!}{\includegraphics{12886fg8.eps}}
\end{figure} Figure 8:

Effectiveness of membership determination (see Eq. (8)) as a function of the sampling radius for the open cluster NGC 2323 (open squares connected by solid lines) and for simulations using parameter values corresponding to those obtained for NGC 2323 (dashed lines).

shows E for the open cluster NGC 2323 as a function of the sampling radius. For the sake of comparison, we also show the result for simulations using the same parameters obtained for NGC 2323. Our most reliable estimation for this cluster ($R_{\rm s}=R_{\rm s,opt}=14$ arcmin) yielded the following proper motion parameters (in mas/yr): $\mu_{x,{\rm c}}=1.09$, $\mu_{y,{\rm c}}=1.13$, $\sigma_{x,{\rm c}}=\sigma_{y,{\rm c}}=1.01$, $\mu_{x,{\rm f}}=+0.77$, $\mu_{y,{\rm f}}=-2.54$, $\sigma_{x,{\rm f}}=6.41$, and $\sigma_{y,{\rm f}}=5.84$. According to the result shown in Fig. 9 (next section), we assume $R_{\rm c}=20$ arcmin and $\delta=1.7$ for the cluster. In addition, we choose $N_{\rm c,max}=250$ and $N_{\rm f,cri}=500$ to obtain the measured values $N_{\rm c}=147$ and $N_{\rm f}=231$ at $R_{\rm s}=14$ arcmin. The superimposed dashed lines in Fig. 8 are the average values (and their standard deviations) for these simulations. The simulated E value remains fairly constant (within the uncertainties) as $R_{\rm s}$ increases up to the value $R_{\rm s} \simeq R_{\rm c} = 20$ arcmin, beyond which it decreases at a relatively rapid rate. For NGC 2323, we see that E begins to decrease more rapidly as $R_{\rm s}$ increases just beyond $R_{\rm s} \simeq R_{\rm s,opt} = 14$ arcmin. The clearest separations between cluster and field stars and the closest agreement with the simulations are achieved in the range $10 \lesssim R_{\rm s} \lesssim 14$ arcmin.
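Equation (8) can be evaluated directly from a list of membership probabilities. The following minimal sketch (the function name is hypothetical) also makes the two limiting cases explicit: perfectly separated populations give $E=1$, whereas a sample in which every star has $p(i)=0.5$ gives $E=0$.

```python
def effectiveness(probabilities):
    """Effectiveness of membership determination, Eq. (8)."""
    n = len(probabilities)
    sum_p = sum(probabilities)
    sum_q = sum(1.0 - p for p in probabilities)
    if sum_p == 0.0 or sum_q == 0.0:
        # Eq. (8) is undefined if all stars are certain members
        # or all are certain non-members.
        raise ValueError("E is undefined when all p(i) are 0 or all are 1")
    overlap = sum(p * (1.0 - p) for p in probabilities)
    return 1.0 - n * overlap / (sum_p * sum_q)


e_perfect = effectiveness([1.0, 1.0, 0.0, 0.0, 0.0])  # fully separated
e_ambiguous = effectiveness([0.5] * 10)               # no separation
```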

4.2 Cluster radius and optimal sampling radius

When using only kinematical criteria, the sample size can substantially alter the results obtained (the memberships and the remaining properties derived from them). Thus, the strategy of choosing a field large enough to be sure of covering more than the entire cluster must be applied carefully, especially in dense star fields. According to our simulations (Sect. 3), the most robust membership estimation is achieved when $R_{\rm s} \simeq R_{\rm c}$. This would seem an obvious result, given that for $R_{\rm s} < R_{\rm c}$ the cluster is subsampled, whereas for $R_{\rm s} > R_{\rm c}$ the probability of contamination by field stars is increased. This illustrates that it is important to know the cluster radius reliably before estimating memberships. It is difficult to determine the radius of a cluster precisely because the definition of radius is itself ambiguous, since star clusters have no clearly defined natural boundaries. In this work, we have used the usual definition of $R_{\rm c}$, which is the radius of the circle containing all the cluster members. Most ``geometric'' definitions tend to overestimate the true size, especially for irregularly shaped clusters (Schmeja & Klessen 2006). But this is not the main problem, which is instead that the independent estimations of cluster radii available in the literature usually differ significantly. Angular sizes listed in catalogues such as Webda were compiled from older references (e.g., Lynga 1987) in which most of the apparent diameters were estimated from visual inspection. According to Webda, for NGC 2323 $R_{\rm c} = 7$ arcmin, whereas Sharma et al. (2006) estimate $R_{\rm c} \sim 20$ arcmin. As mentioned above, it is usual practice to choose a field larger than the apparent area covered by the cluster (taken from the literature) to estimate membership probabilities. However, at least when applying Sanders' method, assigned members will be spread throughout the selected area because of contamination by field stars. It is probably not coincidental that this is true, for example, for probable members in the Dias catalogue (Dias et al. 2002). How reliable are memberships derived from proper motions? It depends on the ``true'' $R_{\rm c}$ values. Thus, again, a reliable assessment of membership should use some robust estimation of the radius.

A commonly used procedure for determining (or defining) the cluster radius is based on the analysis of the projected radial density profile. Usually, some particular analytical function (for example, a King-like model) is fitted to the density profile and the cluster radius is extracted from this fit. A systematic determination of cluster sizes based on objective and uniform estimations of radial density profiles was performed by Kharchenko et al. (2004). One limitation of this method is the sensitivity of the fit to small variations in the distribution of stars, especially for poorly populated open clusters. The most reliable fits are obtained by using only cluster members, but we are then affected again by the problem of membership determination.

\begin{figure}
\par\resizebox{9cm}{!}{\includegraphics{12886fg9.eps}}
\end{figure} Figure 9:

Radial density profiles for the cluster NGC 2323 calculated for the cases $R_{\rm s}=14$ arcmin (solid circles) and $R_{\rm s}=25$ arcmin (open circles). Lines show the best fits to functions of the form $\sim r^{\delta -2}$ (see Eq. (6)). The solid line represents the case $R_{\rm s}=14$ arcmin for which $\delta \simeq 1.7$, and the dashed line corresponds to $R_{\rm s}=25$ arcmin for which $\delta \simeq 1.2$.

As an example, Fig. 9 compares the density profiles obtained for the open cluster NGC 2323 for two different sampling radii, $R_{\rm s}=R_{\rm s,opt}=14$ arcmin and $R_{\rm s}=25$ arcmin. According to our results (Sect. 4), our most reliable estimation is achieved when $R_{\rm s}=R_{\rm s,opt}$. In this case, the least squares fit to a power law function yields a cluster radius in the range $\sim$20-25 arcmin. However, if we consider a sample of size $R_{\rm s}=25$ arcmin, the contamination by field stars tends to cause an overestimation of the star density, and both the index of the power law and the estimated cluster radius change significantly (see Fig. 9). But the main drawback of this method is that simple analytical fits are not always a good representation of the star distribution in open clusters (Sánchez & Alfaro 2009). The radius defined by fitting a density profile may be useful in analyzing and comparing the properties of several clusters systematically, but great care must be taken when using these model-dependent definitions to estimate the ``true'' cluster radius. The point at which the fitted star density equals the background (or drops to zero) does not even necessarily agree with the outer boundary of an open cluster. In principle, new-born stars in a young cluster should be spread throughout the region that is collapsing to form the cluster. At a certain distance from the high density peak in the molecular cloud, the required conditions are no longer fulfilled and the star formation efficiency may decrease abruptly. So, a radial star density distribution that decays smoothly to $R_{\rm c}$ may not always be suitable, especially for compact and/or very young star clusters. Moreover, if the clusters exhibit some degree of substructure, this type of procedure yields totally unrealistic results (Sánchez & Alfaro 2009). 
Young embedded clusters often show hierarchical structure (Elmegreen 2009), so that these methods cannot in principle be applied to embedded clusters but only to centrally concentrated open clusters.
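
As a rough illustration of the kind of fit discussed above, the following Python sketch bins a set of projected radii into annuli and fits a power law of the form $\sim r^{\delta -2}$ by least squares in log-log space. The function names, the binning, and the synthetic test data are assumptions made for this example only; it is not the procedure used to produce Fig. 9.

```python
import numpy as np

def radial_density_profile(r, r_max, n_bins=10):
    """Surface density (stars per unit area) in concentric annuli."""
    edges = np.linspace(0.0, r_max, n_bins + 1)
    counts, _ = np.histogram(r, bins=edges)
    areas = np.pi * (edges[1:] ** 2 - edges[:-1] ** 2)
    centers = 0.5 * (edges[1:] + edges[:-1])
    return centers, counts / areas

def fit_delta(centers, density):
    """Least-squares fit of log(density) = (delta - 2) * log(r) + const."""
    ok = density > 0  # empty annuli cannot be used in log space
    slope, _ = np.polyfit(np.log(centers[ok]), np.log(density[ok]), 1)
    return slope + 2.0

# Synthetic cluster (assumed values): N(<r) ~ r^delta,
# which corresponds to a surface density ~ r^(delta - 2).
rng = np.random.default_rng(0)
delta_true, r_c, n_stars = 1.5, 1.0, 5000
r = r_c * rng.random(n_stars) ** (1.0 / delta_true)

centers, density = radial_density_profile(r, r_c)
delta_fit = fit_delta(centers, density)
print(f"delta_true = {delta_true}, delta_fit = {delta_fit:.2f}")
```

Repeating such a fit on the members assigned for different values of $R_{\rm s}$ is one simple way to see how strongly the fitted index depends on the sample.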

Obviously, any reliable estimation of the cluster radius ultimately depends on the membership determination. Field star contamination may affect the determination of $R_{\rm c}$, and what we have demonstrated in this work is that this contamination can become a significant problem if not taken into consideration. Furthermore, even if cluster and field populations were well separated, the estimated radius would depend on the limiting magnitude if, for instance, there were mass segregation. This kind of problem is particularly relevant to the development of automated techniques, in which it is necessary to establish objective criteria when determining the size of the sample to be processed. What we propose here is to apply any suggested method to several sample sizes $R_{\rm s}$ and analyze the behavior obtained. It is difficult to establish simple rules for evaluating this behavior because the results will depend directly on both the membership determination algorithm and the input data. However, for the method considered in this work, which is based on two Gaussian populations, the basic procedure can be outlined as follows:

1. An upper limit to $R_{\rm s}$ can be estimated by fitting the spatial star density to, for example, a King profile. The estimated tidal radius (or, to be conservative, twice its value) may be considered an upper limit to the optimal sampling radius and would define the range of $R_{\rm s}$ values to be scanned.
2. For each $R_{\rm s}$ value, cluster memberships and all the relevant quantities (numbers of cluster stars and field stars, centroids with their standard deviations, effectiveness of membership determination) have to be estimated.
3. The next step is to plot the number of cluster members $N_{\rm c}$ as a function of the sampling radius $R_{\rm s}$. If the membership determination works reasonably well, meaning that it presents little contamination by field stars, then we would observe a behavior similar to that seen in Fig. 2: $N_{\rm c}$ increases as $R_{\rm s}$ increases until some point (when $R_{\rm s}=R_{\rm c}$) and then $N_{\rm c}$ remains approximately constant for higher $R_{\rm s}$ values (or increases at a much slower rate). In this way, we can estimate the cluster size directly from the data and the membership criteria without making any additional assumptions. The optimal sampling radius at which we achieve the most reliable membership estimation is precisely $R_{\rm s,opt}=R_{\rm c}$ (Fig. 4).
4. If the parametric model does not adequately describe the real data and/or if the internal noise does not have simple properties, then the behavior of the estimated parameters with respect to $R_{\rm s}$ would differ from that expected. If this were the case, we should plot the fraction of members $N_{\rm c}/(N_{\rm c}+N_{\rm f})$ versus $R_{\rm s}$, where we would identify the optimal sampling radius $R_{\rm s,opt}$ with the minimum in this plot (Fig. 6). In the absence of more accurate information, this value should correspond to the radius at which the membership classification is the most reliable (for this method in a given astrometric catalogue).
5. Our experience indicates that the properties derived from the Sanders' method tend to exhibit noise and it is not always easy to identify precisely the position of specific features (such as the minimum in the $N_{\rm c}/(N_{\rm c}+N_{\rm f})$ versus $R_{\rm s}$ plot). Some complementary strategies may be useful in identifying or confirming the optimal sampling radius. First, one can consider the variation in the proper motion standard deviation with radius. The dispersion in the cluster proper motions should exhibit a change of slope at a radius close to the optimal sampling radius (Fig. 7). Secondly, the maximal effectiveness of membership determination should be reached around $R_{\rm s,opt}$ (Fig. 8).

The strategy proposed in this work, i.e., to estimate and analyze cluster memberships as a function of $R_{\rm s}$, should in principle allow us to identify the optimal sampling radius. However, we emphasize that it may not always be possible (or at least not always unambiguous) to determine $R_{\rm s,opt}$ in the way described above. For instance, for very peaked cluster density profiles the change in $N_{\rm c}$ at $R_{\rm s}=R_{\rm c}$ may not be significant enough to be easily detected (e.g., Fig. 2a). In spite of this, it seems appropriate and useful to perform these tests before further analysis.
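
The scan over $R_{\rm s}$ outlined in steps 2-4 can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the membership step is a plain expectation-maximization fit of two circular Gaussians in proper motion space (a simplification of the circular plus elliptical model of the Sanders' method), a star is counted as a member when its probability exceeds 0.5, and the synthetic data loosely follow the parameter values quoted in the caption of Fig. 1. All function and variable names are hypothetical.

```python
import numpy as np

def gauss2d(pm, mu, sigma):
    """Circular bivariate normal density evaluated at each proper motion."""
    d2 = np.sum((pm - mu) ** 2, axis=1)
    return np.exp(-0.5 * d2 / sigma**2) / (2.0 * np.pi * sigma**2)

def membership_probabilities(pm, n_iter=200):
    """EM fit of a two-Gaussian mixture; returns P(cluster) for each star.

    Assumption: both populations are circular Gaussians (simplified model).
    """
    sigma_all = pm.std()
    mu_c, s_c = np.median(pm, axis=0), 0.25 * sigma_all  # narrow component
    mu_f, s_f = pm.mean(axis=0), sigma_all               # broad component
    frac = 0.5
    for _ in range(n_iter):
        # E-step: probability that each star belongs to the narrow component
        phi_c = frac * gauss2d(pm, mu_c, s_c)
        phi_f = (1.0 - frac) * gauss2d(pm, mu_f, s_f)
        p = phi_c / (phi_c + phi_f + 1e-300)
        q = 1.0 - p
        # M-step: update mixing fraction, centroids and dispersions
        frac = p.mean()
        mu_c = (p[:, None] * pm).sum(axis=0) / (p.sum() + 1e-12)
        mu_f = (q[:, None] * pm).sum(axis=0) / (q.sum() + 1e-12)
        d2_c = np.sum((pm - mu_c) ** 2, axis=1)
        d2_f = np.sum((pm - mu_f) ** 2, axis=1)
        s_c = max(np.sqrt((p * d2_c).sum() / (2.0 * p.sum() + 1e-12)), 1e-3)
        s_f = max(np.sqrt((q * d2_f).sum() / (2.0 * q.sum() + 1e-12)), 1e-3)
    # The cluster is identified with the narrower of the two components
    return p if s_c <= s_f else 1.0 - p

def scan_sampling_radius(r, pm, radii, p_min=0.5):
    """For each R_s return (R_s, N_c, N_f, N_c / (N_c + N_f))."""
    rows = []
    for r_s in radii:
        sel = r <= r_s
        p = membership_probabilities(pm[sel])
        n_c = int(np.sum(p > p_min))
        n_f = int(sel.sum()) - n_c
        rows.append((float(r_s), n_c, n_f, n_c / (n_c + n_f)))
    return rows

# Synthetic data (assumed values): 200 cluster stars inside R_c = 1 and a
# uniform field with about 200 stars per unit of cluster area.
rng = np.random.default_rng(1)
r_c, r_max, n_cl = 1.0, 3.0, 200
n_fd = int(200 * (r_max / r_c) ** 2)
r = np.concatenate([r_c * np.sqrt(rng.random(n_cl)),
                    r_max * np.sqrt(rng.random(n_fd))])
pm = np.vstack([rng.normal([1.0, 0.0], 1.0, (n_cl, 2)),
                rng.normal([0.0, 0.0], 5.0, (n_fd, 2))])

radii = np.arange(0.5, 3.01, 0.5)
rows = scan_sampling_radius(r, pm, radii)
for r_s, n_c, n_f, frac in rows:
    print(f"R_s/R_c = {r_s:.1f}  N_c = {n_c:4d}  N_f = {n_f:4d}  fraction = {frac:.3f}")
```

In such an idealized case, $N_{\rm c}$ would be expected to flatten near $R_{\rm s}=R_{\rm c}$ (step 3). With real data, where the Gaussian model may be a poor description, the tabulated fraction $N_{\rm c}/(N_{\rm c}+N_{\rm f})$ would instead be inspected for a minimum (step 4).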

5 Conclusions

We have evaluated the performance of the commonly used Sanders' method (Sanders 1971; Vasilevskis et al. 1958; Cabrera-Caño & Alfaro 1985) for determining star cluster memberships. In general, the results depend on the radius of the field containing the sampled cluster (the sampling radius, $R_{\rm s}$). The main reason for this dependence is the difference between the assumed Gaussian and the true underlying proper motion distributions. The contamination of cluster members by field stars increases as the sampling radius increases. The rate at which this effect occurs depends on the intrinsic characteristics of the data set. There is a threshold value of $R_{\rm s}$ above which the identified cluster members are highly contaminated by field stars and the effectiveness of membership determination is relatively small. Thus, care must be taken when applying the Sanders' method (by itself or as part of a more extensive procedure) especially when we do not have reliable information about the true cluster radius and/or when the sampling radius is larger than the cluster radius. If this type of effect is not taken into consideration in automated data analysis then significant biases may arise in the derived cluster parameters. The optimal sampling radius can be estimated by plotting the number of cluster members and/or the fraction of members as a function of the sampling radius. Moreover, this type of analysis can also be used as an objective procedure that can be applied systematically to determine cluster radii.

Acknowledgements
We thank the referee for his/her comments which improved this paper. We acknowledge financial support from MICINN of Spain through grant AYA2007-64052 and from Consejería de Educación y Ciencia (Junta de Andalucía) through TIC-101 and TIC-4075. N.S. is supported by a post-doctoral JAE-Doc (CSIC) contract. E.J.A. acknowledges financial support from the Spanish MICINN under the Consolider-Ingenio 2010 Program grant CSD2006-00070: ``First Science with the GTC''.

Footnotes

[*] Webda: http://www.univie.ac.at/webda

All Figures

  \begin{figure}
\par\includegraphics[width=17cm,clip]{12886fg1.eps}
\end{figure} Figure 1:

Proper motion for the stars of a random simulation with $N_{\rm c,max}=N_{\rm f,cri}=200$, $\delta =2$, and $R_{\rm s}/R_{\rm c}=1.1$ (see text for details of the meaning of each of these quantities). Left panel shows the distribution for all the 442 simulated stars. Red circles are the field stars centered on (0,0) with $\sigma _{\rm f}=5$ and blue circles are the 200 cluster stars centered on (1,0) with $\sigma _{\rm c}=1$. Right panel is a magnification of the central region in which we have marked with circles the stars whose resulting cluster membership probabilities are higher than 0.5 according to the algorithm used.

  \begin{figure}
\par\resizebox{9cm}{!}{\includegraphics{12886fg2.eps}}
\end{figure} Figure 2:

Calculated number of field and cluster stars as a function of the sampling radius in units of the cluster radius, $R_{\rm s}/R_{\rm c}$, for simulations with the same set of parameters as Fig. 1. a) Simulation with peaked density profile ($\delta =0.5$), assigned members are indicated by squares connected by lines. b) Simulation with flat density profile ($\delta =2$), members are indicated by circles connected by lines. Assigned field stars are indicated by vertical bars connected by lines, the length of the bars indicating one standard deviation. The real numbers of simulated stars are shown by dashed lines (cluster) and dotted lines (field).

  \begin{figure}
\par\resizebox{9cm}{!}{\includegraphics{12886fg3.eps}}
\end{figure} Figure 3:

Calculated fraction of cluster stars as a function of the sampling radius for the same simulations as in Fig. 2. The real (simulated) values are shown by dashed lines.

  \begin{figure}
\par\resizebox{9cm}{!}{\includegraphics{12886fg4.eps}}
\end{figure} Figure 4:

Matching fraction of the cluster (see text) as a function of the sampling radius for the same simulations as in Fig. 2. The error bars are of the order of the symbol sizes but are not shown for clarity.

  \begin{figure}
\par\resizebox{9cm}{!}{\includegraphics{12886fg5.eps}}
\end{figure} Figure 5:

Number of cluster stars $N_{\rm c}$ as a function of the sampling radius $R_{\rm s}$ in arcmin for the open clusters NGC 2323 (squares connected by lines) and NGC 2311 (circles connected by lines).

  \begin{figure}
\par\resizebox{9cm}{!}{\includegraphics{12886fg6.eps}}
\end{figure} Figure 6:

Fraction of cluster stars as a function of the sampling radius for NGC 2323 (squares connected by lines) and NGC 2311 (circles connected by lines). Vertical arrows indicate the optimal sampling radii (see text).

  \begin{figure}
\par\resizebox{9cm}{!}{\includegraphics{12886fg7.eps}}
\end{figure} Figure 7:

Estimated standard deviations as a function of the sampling radius for the clusters NGC 2323 (squares connected by lines) and NGC 2311 (circles connected by lines). The bars indicate the uncertainties obtained from bootstrapping.

  \begin{figure}
\par\resizebox{9cm}{!}{\includegraphics{12886fg8.eps}}
\end{figure} Figure 8:

Effectiveness of membership determination (see Eq. (8)) as a function of the sampling radius for the open cluster NGC 2323 (open squares connected by solid lines) and for simulations using parameter values corresponding to those obtained for NGC 2323 (dashed lines).



Copyright ESO 2010
