EDP Sciences
Free Access
Issue
A&A
Volume 608, December 2017
Article Number A13
Number of page(s) 13
Section Galactic structure, stellar clusters and populations
DOI https://doi.org/10.1051/0004-6361/201630193
Published online 29 November 2017

© ESO, 2017

1. Introduction

Embedded star clusters offer one of the best opportunities to understand star formation. Within these astrophysical laboratories hundreds of stars are formed in volumes below 1pc3. Overall, it is estimated that 80–90% of young stellar objects (YSOs) form in embedded clusters (Lada & Lada 2003; although the exact fraction is somewhat sensitive to the definition of a cluster, see Bressert et al. 2010). It is now established that the fate of these objects is directly linked to the evolution of the molecular gas, which is responsible for most of the gravitational potential that binds the stars in the cluster. Therefore, it is important to study these objects in their early phases, before they are subject to infant mortality (Lada et al. 1984; Geyer & Burkert 2001).

Operationally, clusters are often defined and identified as groups of stars whose stellar surface density exceeds that of field stars of the same physical type (see, e.g., Wilking & Lada 1985; Lada et al. 1991; Carpenter 2000)1. In this respect, embedded clusters pose special problems because they are buried in dust and gas, and therefore are often not even visible in optical images; moreover, their shape is often elongated and clumpy, reflecting the initial structure of the dense molecular gas (Gutermuth et al. 2005; Megeath et al. 2016). However, even with infrared observations, discovering deeply embedded clusters and measuring their basic parameters (such as surface density and size) can still be a challenge since the associated dust extinguishes both the cluster members and the field stars behind the cloud (see, e.g., Román-Zúñiga et al. 2008). In fact, typical optical or near-infrared observations stellar fields around molecular clouds show underdensities at the location of the clouds because of the effects of extinction on the density of background stars. In such a situation the observed cluster stellar surface density can be comparable or even less than the unobscured field stellar surface density (see Fig. 1).

thumbnail Fig. 1

A one-dimensional sketch figure that illustrates the difficulties encountered in detecting embedded clusters. The bottom black line shows the extinction profile of a molecular cloud. As a result of the extinction, background stars in the field show an underdensity. Even if an embedded cluster is present, the total observed star surface density does not reveal it, making it virtually impossible to detect the cluster without the use of the technique described in this paper.

Open with DEXTER

Different authors have used different techniques to take into account the effects of extinction in the identification and characterization of star clusters. Carpenter (2000) built density maps in the direction of nearby molecular clouds using the 2MASS Point Source Catalog, and corrected the effect of dust extinction using publicly available CO maps (converted into infrared extinction with a constant X-factor). In his technique, the extinction correction is only applied to field stars: the cluster stellar density is obtained by subtracting an extinction-corrected model of the field stellar density to the observed stellar density. Moreover, the use of CO maps and the largely uncertain X-factor represent a primary source of error.

An opposite approach has been adopted by Cambrésy et al. (2013), who studied star clusters embedded in the Rosette nebula using 2MASS data. In this case, the local extinction was determined directly from the stellar near-infrared colors, and a correction for the effects of extinction was applied to each star. In practice, as noted by the authors themselves, the correction applied would only be correct for background stars, while it is used for all stars (including the foreground, which should not be corrected, and embedded ones, which should be partially corrected). Things are further complicated because molecular clouds are known to have steep gradients in their column densities, and these, if undetected (because of resolution effects, which is surely the case of the Rosette cloud at the 2MASS resolution), would introduce further biases in the extinction map and in the correction to the cluster richness.

Cambrésy et al. (2006) use yet another technique (similar to Cambrésy et al. 2002) to study the properties of IC 348 in Perseus. The basic idea is that two independent measurements of extinction in molecular clouds are available, one from the star color excess (here applied to HK color), and one from the star number counts. In absence of contaminants, both measurements should agree. The presence of a cluster, however, will affect both the color excess measurement (in a way that depends on the intrinsic color of the cluster members) and the star count method (in a way that depends on the location of the cluster within the cloud). The difference of the two extinction estimates is a proxy for the cluster density. The various assumptions made when applying this technique (in particular, the degree of embedding of the cluster within the cloud) need to be resolved by calibrating it using independent measurements of cluster richness: this clearly limits its application.

In this paper we present a new methodology to identify and characterize clusters in wide-field or all sky, multi-band, infrared imaging surveys. The method is based on the production of extinction-corrected maps of stellar surface density, and takes advantage of the different H-band luminosity functions of embedded clusters with respect to that of the background population. Additionally, in contrast to the methods described above, it is able to correct for cluster members unidentified because of extinction in a way that takes into account the position of the cluster within the cloud along the line of sight. The technique is based on a rigorous mathematical framework; this provides a clear advantage, in that we can perform a detailed statistical analysis of the method. But the detailed mathematical derivation of this method might not be of as much interest to astronomers as its implementation. However, we stress that those readers not interested in the detailed mathematical aspects of the derivation can still benefit from the method proposed here, because its application is simple and straightforward: Eq. (30), with optionally a second expression for the noise estimate, Eq. (31). We also provide a pseudo-code for it in the appendix.

This paper is organized as follows. In Sect. 2 we present the general framework and discuss the standard technique generally employed to identify clusters. In Sect. 3 we present our new method and we discuss its practical implementation in Sect. 4. Simple numerical tests used to validate the method are described in Sect. 5, while the results of an application to the Orion molecular complex are discussed in Sect. 6. Finally, we summarize the results obtained in this paper in Sect. 7.

2. The standard technique

Traditionally, star clusters are identified as overdensities in the distribution of stars. Although many different options are available to estimate the local angular density of stars σ (see, e.g., von Hoerner 1963; Casertano & Hut 1985; Gutermuth et al. 2009), we consider in this paper the simple “moving average” estimate2(1)Here n runs on the individual stars, { xn } are the locations of the various stars, and W is a window function, normalized to unity: (2)So, by construction, W has a unit of one over area. As a simple example, suppose that W is taken to be a top-hat function: (3)In this case, the sum of Eq. (1)runs just over the Nr stars within an angular distance r from the point of the estimate, and the observed estimates reduces to (4)For a constant density of stars, this quantity is unbiased, in the sense that the mean (ensemble average) E of is just the true density of stars: (5)and the associated variance is (6)This equation shows that the error associated with the measured star density decreases as : therefore, if it is known that the density of stars is constant, to determine its value it is sensible to use relatively large window functions W.

More generally, if the star density is variable, the average measured density can be shown to be a convolution of the true density with the window function W: (7)and the associated two point correlation function is (8)Equation (7)shows that W sets the scale for the resolution of the density map; similarly, Eq. (8)shows that points close enough in the density map will have correlated noise. Therefore, if one aims at finding changes in the star density (such as a star cluster), the window function should have a scale that is comparable or smaller than the typical size of the variations of the star density. However, this is in tension with the noise properties of Eq. (8), because a small window function implies a large noise.

In order to be able to detect a star cluster, the measured density of stars at the location of the cluster must differ from the average density much more than the standard deviation of . Hence, the quantity sets the sensitivity of the cluster finding algorithm. In general the true density σ = σfield + σcluster is the sum of the field density and of the cluster density, and in some cases σclusterσfield. In these conditions the error is dominated by the shot-noise due to the field star population. In reality, many other effects can prevent the discovery and characterization of star clusters:

  • extinction by dark nebulæ, which reduces the surface density ofbackground sources;

  • the galactic structure, which induces smooth large-scale variations;

  • differences in the sensitivity across a mosaic image due to changes in the observational conditions;

  • other systematical observational effects, such halos produced by bright objects within the image.

Extinction correction.

Among the effects listed above, the first one is particularly important for young clusters, since these objects tend to be deeply embedded and thus can escape a detection; additionally, detected clusters are plagued by large uncertainties in their physical properties (number of stars, mass, and size). For this reason, Cambrésy et al. (2013) developed a technique to perform a simple correction of the results produced by Eq. (1). They noted that the density of background stars observed through a molecular cloud decreases by a factor 10αA, where α is the exponential slope of the number counts and A is the extinction, both in the band used for the observation. Therefore, to account for the undetected stars one can just multiply the local density estimate by 10αA(x), where A(x) is the local estimate extinction (i.e., the extinction derived from an extinction map at the location of the x).

The problem with this approach is that it uses the same correction factor for foreground stars, embedded objects, and background stars, which generally results in an overestimate of the local corrected star density. Additionally, the same correction is applied to YSOs and field stars, which however have number count distributions very different. This leaves a large uncertainty on the corrected and, ultimately, on the characterization of each cluster.

3. Maximum-likelihood approach

3.1. Constant extinction, no weighting

As discussed in the previous section, the error associated to is often dominated by shot noise due to the field star population. However, as mentioned, YSOs have photometric properties that differ significantly from those of field stars: the former have a H-band luminosity function (hereafter HLF) that can be approximated by a Gaussian distribution (Muench et al. 2002), while the latter have a HLF that follows, up to H ~18 mag, an exponential with slope α ~ 0.33 (see, e.g., Meingast et al. 2016). This suggests that we might use suitable cuts in the H-band to reduce the number of field stars, while essentially retaining all YSOs, thus gaining in terms of noise. However, this naive procedure is difficult to apply, since both YSOs and field stars are also affected by extinction, which would change the shape and the peaks of the respective HLFs. Therefore, we are forced to use a more systematic approach.

Let us first model the observed HLF for a set of angularly close stars (i.e., in a patch of the sky). We suppose that the true, unextinguished HLF can be described as a mixture of L different HLFs, corresponding to different stellar populations (in the situation considered later on in this paper we will use just two components, one for the field stars and one for the YSOs, but deeper observations might require the inclusion of a third component corresponding to galaxies and following exponential number counts with a slope of 0.6; see Hubble 1926). The observed HLF differs from the true one because of extinction and because of incompleteness in the faint end; additionally, we will also have photometric errors, but generally these will be small compared to the typical width of the various HLF components, and for the sake of simplicity are therefore neglected here3. We can thus write finally the observed HLF, measured in units of stars per area per magnitude bin, as (9)where A is the H-band extinction4, c(m) is the completeness function (i.e. the probability to detect a star with apparent magnitude m), pi(m) is the probability distribution for the ith component of the HLF, and σi is a coefficient that indicates the predominance of the ith component in the HLF. In order to give σi a simple interpretation, it is useful to take pi suitably normalized. In particular, we assume that (10)With this choice, σi can be identified as the observed angular density of stars for the ith component in absence of extinction.

Our goal is to infer the parameters { σi } from a set of observed H-band magnitudes { mn } in a patch of the sky of area S; we will then repeat this operation in different directions in order to build maps of densities for the various components. This will allow us to identify overdensities in each component, such as in the YSO one. To this purpose, we use Eq. (9)to write the log-likelihood following the prescriptions of Lombardi et al. (2013). The entire analysis is done in the magnitude space, by using as the probability distribution density for the magnitudes: (11)This likelihood can be used in Bayes’ theorem to infer a posterior probability for the densities { σi }. More simply, we can just find the most likely parameters { σi } using a maximum likelihood approach. To this purpose, we consider (12)The maximum likelihood solution is given by the set of densities σi that maximize or, equivalently, lnℒ, i.e. by the zero of Eq. (12); that is, we need to solve (13)Unfortunately, the solution of this equation for L > 1 cannot be provided in analytic form, and must be obtained by numerical methods. We will discuss below in Sect. 3.3 simple ways to find it.

We can obtain an estimate of the errors associated to the measurements σi from the Fisher information matrix (see, e.g., Feigelson & Babu 2012), that we recall is defined as (14)The Fisher information matrix is related to the minimum covariance matrix that can be attained by an unbiased estimator, as provided by the Cramér-Rao bound: (15)Since the maximum-likelihood estimator is asymptotically efficient (i.e. it attains the Cramér-Rao bound when the sample size tends to infinity) and the resulting errors on tend to a multivariate Gaussian distribution, it is interesting to obtain an analytic result for the information matrix.

The Fisher information matrix can be readily evaluated from our data using Eq. (12) of Lombardi et al. (2013): (16)This relation is interesting from several points of view. First, note that the Fisher matrix contains elements outside the diagonal, unless the probability distributions for the various components do not overlap, i.e. have non-intersecting support: this would basically mean that we could immediately tell to which component belongs a star from its unextinguished magnitude. Second, note that all elements of the matrix are non-negative and therefore, in case of L = 2 components, Cov [σ] ≃ I-1 will have non-positive elements off-diagonal: in other words, the measurements of the two densities σ1 and σ2 will be affected by a negative correlation. This is expected and corresponds to a classification error: if p1 overlaps with p2, we cannot uniquely associate stars to each different population and in general an overestimate of one population is done at the expenses of an underestimate of the other population.

It is instructive to consider the form of the information matrix in the special case where L = 1. When just one component is used, then I is a scalar and it reduces to (17)where E [N] = is the average number of stars observed in the area S. Its inverse, I-1, is therefore E [N] /σ2, as expected from a simple Poisson statistics.

With L ≥ 2, in principle we can encounter cases where the Fisher information matrix is singular. Given the analytic form of I, this happens in particular when the two components have the exact same probability distributions within the support of c(m), i.e. c(m)pi(m) = c(m)pj(m): in this case, the corresponding rows (or columns) of I are identical. In such a situation, clearly, it is virtually impossible to classify objects as one of the two degenerate components, and therefore the uncertainty on the respective densities σi and σj are infinite.

For completeness, we also report the expected maximum value of the log-likelihood (18)and the associated variance (19)These equations can be used to verify that the chosen model (9)can account well for the data.

3.2. Differential extinction and spatial weight

So far, we have assumed that all stars are subject to the same extinction A; moreover, we have not weighted stars depending on their angular position as done in Eq. (1). In this section we intend to remove these two limitations and consider the full problem.

The simpler approach to include the effects of differential extinction is to consider the joint density in magnitudes and positions. In this framework, we can rewrite Eq. (11)for log-likelihood as (20)In this equation the quantity σ(m,x′) represents the predicted density of stars with magnitude m at the location x. Similarly to Eq. (9), we write this quantity as a mixture of different densities, corresponding to different stellar populations: (21)where σi(x′) represents the density of stars of class i at the sky position x. In order to proceed, we need to model these densities. A simple and consistent way of doing this, is to suppose that lnσi(x′) can be written as the weighted sum of two terms: one, lnσ(x), associated to the density at the point of interest x (the point where we intend to evaluate the local densities of stellar populations); and one, lnτi(x′), which parametrizes the local changes of the densities. As a result, we write (22)The function ω describes the spatial correlation between densities at different positions and plays a central role in identifying which stars contribute to the density estimate at x.

We can now insert Eqs. (22)and (21)in Eq. (20)and find the maximum likelihood solution over σi(x). Calling AnA(xn) the extinction suffered by the nth star, we find (23)We now make an important assumption. Because of the form of the parametrization (22), a solution for the maximum likelihood can only be found if we specify the functional form of the functions τi(x′). However, these functions are truly unknown, since they parametrize the local changes of the various densities, which in turn depend on the local density map. Using a statistical approach, however, it is natural to assume that these functions have the same distribution of σi(x), the quantity we are interested in. As a result, if we take an average of Eq. (23), terms such as σi(x′) /σi(x) cancel out: (24)Before proceeding, it is useful to consider the solution of the maximum likelihood approach in the simple case where there is a single population of stars and where the extinction vanishes, A(x′) = 0. We find in this case (25)where we have used the normalization property (10). The solution of this equation is immediately found as (26)where (27)We have therefore recovered Eq. (1), with the correct normalization (2)for the weight W.

In the general case, the maximum-likelihood solution of Eq. (24)must be obtained numerically. The errors associated to the solutions can be estimated using the Fisher matrix. In our case, we can obtain the Fisher matrix from (28)As before, we can replace x to x in all arguments of σi, with the justification that the statistical properties of σ are invariant upon translation. We will discuss in the next section useful expressions to evaluate this quantity in practical cases.

Table 1

Different two-dimensional spatial functions ω(x) and corresponding weight functions W(x).

3.3. Implementation

The algorithm proposed in this paper is essentially the search of the solutions of Eq. (24)as a function of the star densities { σi } corresponding to the various components or star populations. The same procedure must be applied to different patches of the sky, so that maps of the star densities can be obtained. These, in turn, will allow us to identify and characterize star clusters, and in particular embedded ones.

Although the practical implementation of the algorithm follows this schema, a number of technical and theoretical aspects must be correctly addressed in order to optimize the detection and make the technique efficient.

First, we note that a simple way to obtain the (positive) solutions of Eq. (24)is through the use of a recursive formula. To this purpose, multiply both members of this equation by σi(x), and solve for this quantity, thus obtaining the expression (29)Unfortunately, we are unable to use this equation because we only know the extinction at the locations of the stars AnA(xn): this prevents us from evaluating the integral over dx2 in the denominator. We can, however, move the denominator inside the sum, and evaluate the second integral by replacing A(x) with A(xn), the extinction at the direction of each star. Additionally, using the same argument that has been employed in Eq. (23), that is the similarity between σ and τ, it is convenient to replace σi(x) with σi(xn) in the numerator. This procedure leads to the iteration (30)Equation (30)is the solution proposed in this paper to estimate the local density of stars. As indicated by the left arrow symbol, we can obtain the set of values { σi } by starting with some arbitrary (positive) values for these quantities, and then by calculating updated values of σi by applying Eq. (30). The convergence is usually reached within a few tens of iterations.

Note that Eq. (30)has a simple interpretation. Let us ignore for a moment the weight W, i.e. let us assume that all stars have the same weight. The sum in Eq. (30)is carried out over all stars in the patch of the sky, but each star is counted only partially (i.e., contributes with a term between zero and unity in the sum): precisely, each star contributes by the computed probability that the star be associated with the ith component. The way this probability is computed is actually a simple application of Bayes’ theorem, where pi(mnAn) plays the role of the likelihood, σi(xn) is proportional to the prior that the star is of class i, and the sum over j in the denominator is proportional to the evidence. The result of the sum of these terms is divided by the by the result of the integral: this is a correcting factor that takes into account the fact that, because of extinction and incompleteness, we will miss a fraction of stars. Note also that Eq. (30)can be also considered as a special case of a K-means soft clustering algorithm where the only unknown quantities are the classes σi (see MacKay 2003).

Before proceeding, it is useful to recall the hypotheses of this algorithm and its strengths. First, we assume that we have some knowledge of the HLF for the various populations of stars that are likely to be present in the field. In practice, we will generally use two probabilities, one for the field stars, and one for the YSOs. Second, we assume that we have measured the extinction An of each star: note that this is not the average extinction at the location of the star, which might be very different because of perspective effects: for example, a foreground star in front of a cloud would have An ≃ 0, while the average extinction would be significant. This way, the algorithm can directly account for foreground contamination: foreground stars will not be corrected in their counts, since the integral in the denominator of Eq. (30)will evaluate to unity. Similarly, stars within molecular clouds will be corrected only for the amount of material that is really in front of them.

Finally, we stress that the iterative procedure proposed here only find positive solutions for the values σi. Although reasonable, nevertheless this choice inevitably introduces biases in the results: for example, in a region where no YSO is present, because of errors we will still measure small positive values for the density of YSOs. However, numerical tests have shown that the bias amount is limited; moreover, a reduction of the bias is associated to a large increase in the scatter. Therefore, we will force the σi to be positive and use Eq. (30)for the solution.

The uncertainties associated to Eq. (30)can be computed from the Fisher matrix. For practical applications, it is convenient to rewrite Eq. (28)by replacing the integrals over dm and d2x with a sum over the observed objects. This leads to the approximated Fisher matrix expression (31)In this equation, we take the spatial function ω to be normalized such that (32)that is, in terms of W, (33)Table 1 reports a few common choices for the spatial function ω(x) and the corresponding weight function W(x), both correctly normalized. As usual, the covariance matrix associated with the measurements of the densities σi can be computed from the inverse of the Fisher matrix, I-1.

4. Simulations

Table 2

Summaries of the results of simulations.

In order to test our method and verify its robustness we have performed a set of simulations. We have considered a small patch of the sky with two different stellar populations: field stars, with exponentially increasing number counts p1(m) ∝ 10αm, with α =0.33 mag-1; and YSOs, with Gaussian number counts p2(m) ∝ exp(−(mm0)2/ 2s2), with m0 =12 mag and s =1.65 mag. We have distributed both populations randomly in the small patch of the sky considered and we have added to each star a random extinction drawn from the probability distribution pA(A) ∝ A-2, in the range A ∈ [0.1,2.0]mag. This choice is meant to simulate the effects of differential extinction for objects within molecular clouds. Finally, we have imagined that our observations are complete up to 15 mag, and that no stars can be observed beyond that magnitude: in other words, we have modeled the completeness function as a Heaviside function c(m) = H(15−m). This way, our final catalog contains, for each star, the angular position in the sky, the H-band magnitude, and the measured extinction. Note that the parameters used here are chosen to simulate a real situation corresponding to the sample application of Sect. 5, that is the Orion molecular cloud complex observed with 2MASS.

We have used these data in our algorithm, represented by Eqs. (30)and (31). As weight function W(x) we have used a Gaussian, and we have chosen the angular units so that (34)This choice guarantees that the effective area of the weight function is unity, i.e. the effective number of stars used for the analysis, in absence of extinction, would be just σ1 + σ2. In reality, the presence of the extinction reduces this number by a factor that depends on the relative ratio between σ1 and σ2 (typically, by ~ 20%).

thumbnail Fig. 2

Distributions of measured densities in a simulation with σ1 = 5 and σ2 = 20 (histogram), together with the predicted Gaussian distribution obtained according to the Fisher matrix evaluated from Eq. (31). The excess of small values of is due to the constraint that imposed by the algorithm.

Open with DEXTER

thumbnail Fig. 3

Distribution of measured total density in a simulation with σ1 = 5 and σ2 = 20, together with the predicted Gaussian distribution (derived from the Fisher matrix). The distribution is essentially unbiased; moreover, because of the anticorrelation between and , the total density has significantly less scatter than both and .

Open with DEXTER

thumbnail Fig. 4

Distributions of measured densities in a simulation with σ1 = 20 and σ2 = 5.

Open with DEXTER

We have then performed different simulations, with various choices for σ1 and σ2, to verify the ability of the algorithm to recover the input densities. Figures 25 show the observed distributions of and , together with the predicted ones (Gaussian distributions centered on the true values of σ1 and σ2, with the variances predicted from the Fisher matrix I). In general, we can note a few points:

  • When one of the input densities is below~ 7, there is a clear excess of the corresponding quantity for small measured values. This is a consequence of the fact that the proposed algorithm only returns positive values for the densities.

  • Except for the point above, the predicted distributions reproduce very well the measured data. The agreement is particularly good when the input densities are large.

  • Overall, the total density σ1 + σ2 is better constrained than the individual densities σ1,2.

Figures 6 and 7 show the biases and the errors on the measured densities for σ1 = 10 and σ2 varying in the interval [0,20]. We can see that there is a measurable bias for σ2 < 5, while the results are essentially unbiased above this value. Correspondingly, we observe in the same range σ2 ∈ [0,5] a measured scatter in measured densities that is significantly smaller than what predicted from the Fisher matrix. For larger values of the input density the error estimate gets closer to the measured errors, but still remains slightly above. This is actually good, because implies a small overestimate of the error which will make the entire method more robust for cluster detections (that is, it will decrease the number of false positive). In summary, all these simulations confirm that the method works and that the error estimate is quite accurate.

5. Sample application: Orion

thumbnail Fig. 5

Distributions of measured densities in a simulation with σ1 = 10 and σ2 = 10.

Open with DEXTER

thumbnail Fig. 6

Biases in the measured densities for σ1 = 10 and σ2 ∈ [0,20]. Note how the bias essentially vanishes for σ2 > 5.

Open with DEXTER

thumbnail Fig. 7

Scatter on and for σ1 = 10 and σ2 ∈ [0,20] (dots), together with the predictions obtained from Eq. (31)(lines).

Open with DEXTER

thumbnail Fig. 8

Violin plot showing the distribution of measured densities for σ1 = 10 and σ2 ∈ [0,20]. Each elongated structure corresponds to a different value of σ2; its width is proportional to the distribution of measured values of , i.e. effectively it is a histogram displayed vertically. The small red dashes indicate the average values.

Open with DEXTER

We have applied the method proposed in this paper to 2MASS data of the Orion A and B molecular cloud complexes. The regions selected ensure that we can verify the reliability of the algorithm proposed here using some of the best studied objects in the sky. In particular, the populations of embedded clusters for both clouds have been the targets of extensive observational campaigns using ground-based, near-infrared Lada et al. 1991) and millimeter-wave (Lane et al. 2016) surveys as well as space-based, mid-infrared Spitzer Space Telescope (Megeath et al. 2012, 2016), and Chandra X-ray (Prisinzano et al. 2008) surveys. Additionally, the distance of these clouds ensures that the 2MASS H-band data are, in absence of extinction, complete for YSOs: that is, the cluster HLF at the distance of Orion is essentially entirely within the 2MASS H-band limiting magnitude (~15 mag).

thumbnail Fig. 9

Violin plot showing the distribution of measured densities for σ1 = 10 and σ2 ∈ [0,20]. The small blue dashes indicate the average values.

Open with DEXTER

thumbnail Fig. 10

Results of the cluster finding algorithm in Orion A using the 2MASS Point Source Catalog. The red contours shows all surface density detections 3σ above the background, while the black contour corresponds to AK =0.3 mag. When displayed in Adobe Acrobat, it is possible to hide thelines, the blackcontours, the red contours corresponding to the, the red dots representing the, and the clusters’.

Open with DEXTER

Table 3

YSO clusters identified in Orion A and B.

Figures 10 and 11 show the density of fiducial YSOs measured in Orion A and B. These maps have been produced by our pipeline together with Nicer (Lombardi & Alves 2001) and Nicest (Lombardi 2009) extinction maps from the 2MASS Point Source Catalogue (see Lombardi et al. 2011, for details on the data selection and extinction map construction). The algorithm has been run in conditions similar to the simulations described above: that is, we have used two different stellar populations, one associated with the background and characterized by exponential number counts, and one associated with the YSOs and characterized by Gaussian number counts (with parameters consistent with the HLF of Muench et al. 2002). Using an angularly close control field we also measured the distribution of intrinsic colors of stars and the shape of the completeness function: the latter has been modeled using a complementary error function erfc, as described in Appendix A.2. This choice makes it possible to use the entire 2MASS catalogue without any magnitude cut (which would increase the noise in the final data products). The maps produced have a pixel size of 1.5 arcmin and a weight function Wω in turn proportional to a Gaussian with FWHM =3 arcmin. We have used a relatively large beam in these maps to increase the sensitivity of our algorithm and to minimize the effects of the biases shown in the simulations described in Sect. 4, while still retaining in most situations the ability to distinguish different clusters (i.e., avoid confusion effects at the distance of Orion).

Since we have at our disposal the covariance map of these measurements, we have assessed the reliability of each density peak in these figures. The red contours in the figures show the areas (larger than 2 pixels) where the local YSO density exceeds 1.5 stars/pixel, corresponding approximately to a signal-to-noise ratio larger than 3.5: note how some regions in black in Fig. 10 do not reach the threshold because of the large error associated with them (mostly due to the high extinction values there).

Table 3 shows the YSO clusters identified in the Orion A and B areas using our simple prescription, together with the most relevant parameters. In some cases we clearly see that angularly close clusters appear as a single contour in our maps: the simple procedure used here to define clusters, the relatively coarse resolution used, and the cluster morphology itself prevent us from deblending some close objects. An extreme case of this situation might be the integral shaped filament (ISF) cluster, where the limitations due to angular resolution would make it difficult to resolve and separate smaller clusters if they exist in such a very populous region. We note that the ISF cluster encompasses M42, the Trapezium and ONC clusters as well as an extended population of YSOs along the ISF.

thumbnail Fig. 11

Results of the cluster finding algorithm in Orion B using the 2MASS Point Source Catalog (see caption of Fig. 10 for the legend). Note that NGC 2023 is below the detection threshold and appears only as a weak smudge in this image. When displayed in Adobe Acrobat, it is possible to hide the, the, the, the red dots representing the, or the.

Open with DEXTER

thumbnail Fig. 12

A Gaussian-kernel smoothed density map of the Megeath et al. (2016) YSO list in Orion A (top) to compare with our density map (bottom).

Open with DEXTER

thumbnail Fig. 13

A Gaussian-kernel smoothed density map of the Megeath et al. (2016) YSO list in Orion B (left) to compare with our density map (right).

Open with DEXTER

Table 4

Correspondence between the clusters identified in this work, marked with the subscript 1, and the ones identified from a smoothed version of the YSO catalog (Megeath et al. 2016).

thumbnail Fig. 14

Distribution of PYSO, the probability that our algorithm assigns to each star to be a YSO, for the Megeath et al. (2016) YSO candidates and for all other objects in the 2MASS point source catalog.

Open with DEXTER

thumbnail Fig. 15

Low-resolution version of the YSO density in Orion. The grey area at ≃ 205° is the Orion OB Ib association, while the lighter area to the right (around ≃ 201°) is OB Ia, containing the 25 Ori cluster (the grey spot at ≃ 201° and b ≃ −18°).

Open with DEXTER

The radius R reported in the table corresponds to the radius of a circle that would occupy the same area as the identified cluster, i.e. to the connected region of the sky where the inferred density of YSOs exceeds the background by 3σ. At the estimated distance of Orion, 413 pc (Reid et al. 2009), 1′ corresponds to 0.12 pc: therefore, the clusters identified have radii spanning from ~2.4 pc to ~0.15 pc.

The well known clusters in these clouds are correctly identified by our procedure. It is interesting to compare Table 3 with clusters identified independently using much more secure data. Among the ones at our disposal, the recent catalog of embedded YSOs obtained by Megeath et al. (2016) using the Spitzer Space Telescope and the Chandra observatory is probably the most secure and complete: we will therefore focus on this catalog. Since our definition of a cluster is based on an somewhat arbitrary parameters (signal-to-noise threshold, minimum number of pixels, no correction for the stars missed at the boundaries), and since different, more-or-less, arbitrary parameters are also used by Megeath et al. (2016), we find it more appropriate and fair to make a comparison after we first homogenize the data. Specifically, we take Megeath et al. (2016) YSO list and we make out of it a density map using a Gaussian kernel of the same size of the one used for our map. Figures 12 and 13 show the results obtained for Orion A and B, which clearly compares very well with our own maps, derived purely from the 2MASS point source catalog. The most relevant structures are present in both maps and have very similar shapes; the only differences are the noise present in our maps (however at a relatively low level), and the limited coverage of the Spitzer derived density maps.

The qualitative similarity of these maps can be quantified if compare clusters identified in both maps using the same criteria. Table 4 shows a list of clusters identified in the smoothed Megeath et al. (2016) maps using a fix density threshold (set to 1.5 stars/pixel). In this table we compare the number of Spitzer YSOs with the number of YSOs predicted from the integral of the σYSOs over the area of each cluster as defined from Megeath et al. density map, together with the computed 1σ error. It is clear that in almost all cases we find an excellent agreement, although in many cases our estimates are slightly larger than the ones by Megeath et al. We can speculate that this is due to the presence of class III YSO, which likely would be missed by Spitzer. Indeed, a comparison of the two panels of Fig. 12 shows that the bottom panel, corresponding to our density map, has spatially more extended clusters than the top panel, corresponding to Megeath et al. density map.

As discussed earlier on, our algorithm is a statistical one and works best when it is applied to a sizeable number of stars. However, we can also push it and associate to each single star a probability of being a YSO: to this purpose, for the nth star we can compute (35)Note how this quantity resembles the term within the outer sum of Eq. (30). Figure 14 shows the distribution of PYSO (that is, the distribution in the probabilities assigned to each object to be a YSO) for the Megeath et al. (2016) YSO candidates and for all the other objects. It is clear how all the other objects have PYSO that is concentrated around zero, while the YSO candidates have a much broader distribution that extends to unity. For these latter objects the distribution, in addition to a substantial peak at PYSO = 1, shows a long tail up to small values of PYSO: this is not unexpected, since our identification is only statistical (and therefore we cannot identify unambiguously YSOs). Note also how the relatively low values of PYSO for some genuine YSOs in our algorithm are compensated by the small tail in the distribution of field stars (this works because there are many more field stars than YSOs, a fact that is accounted for in the algorithm).

Sensitivity to the distributed population.

Recently, Kubiak et al. 2017, have identified a rich and well-defined stellar population of about 2000 objects, mostly M stars without extinction or infrared-excesses, as the low-mass counterpart to the Orion OB Ib subgroup (the Orion belt population). This low-mass population is not obviously clustered but instead appears to be distributed across ~ 10 square degrees and the authors speculate that it could represent the evolved counterpart of a Orion nebula-like cluster. While more data is needed to test this scenario, it is clear that much can be learned about the origin of stellar OB associations and the dispersal of clusters into the Galactic field if one is able to trace in a robust manner the distribution of the slightly older and more expanded populations surrounding star-forming molecular clouds.

We now investigate the extent to which the technique proposed here is suitable for detection of looser, more expanded distributions of young stars, in particular the low-mass counterpart to the Orion OB association presented in Kubiak et al. (2017). For this purpose, we have built a lower resolution map of the region, employing a FWHM of 30 arcmin.

Figure 15 shows that, surprisingly, we are well able to recover the stellar over-density of the Ori Ib population, and for the first time, the stellar over-density of the Ori Ia population. These over-densities are likely to be created by low-mass stars as 2MASS is still sensitive to the peak of the IMF for the putative distance and age of these subgroups. An analysis of the substructure seen in the distributed population visible in Fig. 15 above the noise pattern is beyond the scope of this paper, but will best addressed once Gaia parallaxes are generally available. Of relevance for this paper is that the ability of the method to trace the dispersed population from existing all sky data opens a new window on the unsolved problem of the origins of OB association and cluster dispersal in to the field.

6. Conclusions

The following items summarize the main results presented in this paper:

  • We have developed a new method to discover and characterizedeeply embedded star clusters.

  • The method is able to statistically classify objects as field stars or YSOs and corrects for the effects of differential extinction.

  • We have provided expressions for the covariance of the inferred densities and we have validated both the method and the analytic expression for the covariance with a set of simple but realistic simulations.

  • We have applied the new method to 2MASS point sources observed in the Orion A and B and we have shown that we can identify and characterize well protostellar clusters in these regions, as well as detect much looser associations such as the OB 1a and 1b subgroups.

Finally, we note that the method proposed here can be easily extended to multi-band observations by using suitable probability distributions pi(m) for various populations as a function of the magnitude vector m. Its implementation would be essentially identical, with minor modifications to take into account the different effects of extinction on different bands. The natural use of such an extension would be in the context of techniques such as the one proposed by Juvela & Montillaud (2016) which are able to recover the extinction from a complex analysis of multi-band data.


1

In this paper we focus on the optical identification of a cluster, and we explicitly ignore issues such as the gravitational stability of overdensities of stars. Therefore, we will call “cluster” any overdensity, irrespective of its boundness.

2

Throughout this paper we will use “hats” to denote measured quantities. Hence, σ(x) is the true star density at the angular position x, while is the measured star density at the same position.

3

Photometric errors can be easily included in Eq. (9)by replacing pi(mA) there with the convolution of this function with the photometric uncertainty. In presence of different uncertainties for each star, one will need to specialize Eq. (9)to each star. A similar technique can be used to include the effects of errors on the extinction measurements.

4

In this section we will take the extinction to be identical for a set of angularly close stars; we will then relax this assumption and use the individual extinction in the direction of each star.

Acknowledgments

This research has made use of the 2MASS archive, provided by NASA/IPAC Infrared Science Archive, which is operated by the Jet Propulsion Laboratory, California Institute of Technology, under contract with the National Aeronautics and Space Administration. Additionally, this research has made use of the SIMBAD database, operated at CDS, Strasbourg, France.

References

Appendix A: Implementation

In this section we consider a few details and analytical expressions useful to evaluate some of the expressions needed to implement the method proposed here.

Appendix A.1: Algorithm

The algorithm proposed in this paper is essentially an iterative application of Eq. (30)to the set of densities { σi(x) }. A few notes, however, are necessary to better implement the algorithm:

  • The method assumes that the extinction values in the H band are available for each star used. In our implementation these values are obtained through the use of the Nicer algorithm. Other techniques can be used, as long as the extinction is referred to the single star and is not an average extinction at the location of the star. In our implementation, this requirement limits the method to stars for which, in addition to the H-band measurements, have at least another band photometry.

  • Since the various densities are spatially variable, one needs to repeat the iteration of Eq. (30)to each point in the map.

  • Equation (30)itself needs to be iterated a few times (typically around ten, in some cases a little more) before reaching a good convergence. This generally suggests the use of simple analytical models for the various functions involved in this equation (the probability distributions pi(m) and the completeness function c(m), discussed below).

Appendix A.2: H-band probability distributions and completeness function

In this paper we have considered two useful models for the H-band luminosity function: the exponential distribution and the normal (Gaussian) one.

We parametrize the exponential distribution as (A.1)where k is a normalization constant that will be found lather on and β = αln10.

The normal distribution is parametrized as (A.2)where again k is a normalization constant to be investigated later.

Finally, we model the completeness function c(m) in terms of the complementary error function erfc. This can be justified from an empirical point of view since the error function has a shape that resembles the completeness function. It is also reasonable from a statistical point of view if the photometric errors are Gaussian: in this case, one can suppose that the probability that an object be detected is an integral over a relevant part of the Gaussian. Specifically, we assume for c(m) the following functional form: (A.3)where mc is the 50% completeness limit and σc sets the width of the completeness function. Figure A.1 shows that our simple model for the control field HLF, the product of p(m)c(m) with p(m) following an exponential distribution and c(m) described in terms of an erfc, reproduces well the data.

thumbnail Fig. A.1

Histogram of H-band magnitudes of stars in a control field region, together with its best fit (an exponential with a complementary error function truncation, see the text).

Open with DEXTER

Appendix A.3: Statistical errors and completeness function

As noted above, statistical errors can be included in our algorithm by convolving the probability distributions pi with appropriate kernels. We expect to have two main sources for statistical errors: photometric errors in the H-band magnitude measurements and errors in the extinction measurements. While the formers are relatively simple to characterize (typically, they will be provided in the star catalogue), the latter are a combination of different sources of errors: the scatter in the intrinsic color of sources and the photometric errors in each band used. If extinctions are derived using the Nicer algorithm, as assumed here, then the variance associated to each extinction measurement can be computed from the expression (A.4)where, following the notation of Lombardi & Alves (2001), we have called k is the reddening vector and C the combined covariance matrix of the observed colors (including both the intrinsic scatter in the color of unextinguished stars and the individual photometric errors).

Looking again at Eq. (30), i.e. the main equation representing our method, we see that in reality the H-band magnitudes and the associated extinction measurements typically enter the problem through the combination mA; moreover, while in the numerator the combination is mnAn and involves therefore measurements of both m and A for individual stars, in the denominator the combination is mAn, with m integration variable. Therefore, in presence of errors these two parts of the expression must be computed separately.

Let us consider first the denominator. Since there the only argument of pi with associated errors is An, the measured extinction for the nth star, we just need to convolve the magnitude distribution with a Gaussian kernel with variance provided by Eq. (A.4).

In the case of the numerator the situation is slightly more complicated, because there we find the combination mnAn. Since An is computed from the observed magnitude of each star (including of course the H-band magnitude), mn and An are correlated. A simple calculation shows that the variance associated to the combination mnAn is (A.5)In this expression the last term is related to the correlation between A and m and is equal to , the square of the photometric error on the H-band magnitude. The factors bJH and bHK are quantities that can be computed from Eq. (12) of Lombardi & Alves (2001), while kH is the reddening vector for the H band, i.e. the ratio AH/A.

Appendix A.4: Convolution integrals

The convolution of the luminosity functions (A.1)and (A.2)with Gaussian kernels representing the statistical errors take simple forms. Calling the variance of the convolution kernel, given depending on the term by Eq. (A.4)or by Eq. (A.5), and using the tilde to represent the convolved H-band luminosity functions, we find for the exponential distribution (A.6)Similarly, for the Gaussian model we find (A.7)

Appendix A.5: Completeness integrals

Finally, we need to compute the integral in the denominator of Eq. (30), involving the HLF p(m) and the completeness function c(m). A simple change of variable casts this integral (similar to a convolution) into a more convenient form: (A.8)\newpage\noindentIn this equation we have called K(A) the result of the integral, retaining its explicit dependence on A. We now consider the derivative K of K: (A.9)and note that, since c is the error function, c is a Gaussian function. Since Eq. (A.9)is also (essentially) a convolution, this allows us to compute K′(A) using formulae similar to the ones of the previous section: as a result, we see that K′(A) remains either an exponential or a Gaussian, depending on the function form of p(m).

Finally, we need to integrate back K′(A) to obtain K. The result we obtain from this procedure is, in case of an exponential distribution, (A.10)For the Gaussian distribution the result contains again the error function: (A.11)If statistical errors have to be taken into account, these expressions simply change into (A.12)and (A.13)

All Tables

Table 1

Different two-dimensional spatial functions ω(x) and corresponding weight functions W(x).

Table 2

Summaries of the results of simulations.

Table 3

YSO clusters identified in Orion A and B.

Table 4

Correspondence between the clusters identified in this work, marked with the subscript 1, and the ones identified from a smoothed version of the YSO catalog (Megeath et al. 2016).

All Figures

thumbnail Fig. 1

A one-dimensional sketch figure that illustrates the difficulties encountered in detecting embedded clusters. The bottom black line shows the extinction profile of a molecular cloud. As a result of the extinction, background stars in the field show an underdensity. Even if an embedded cluster is present, the total observed star surface density does not reveal it, making it virtually impossible to detect the cluster without the use of the technique described in this paper.

Open with DEXTER
In the text
thumbnail Fig. 2

Distributions of measured densities in a simulation with σ1 = 5 and σ2 = 20 (histogram), together with the predicted Gaussian distribution obtained according to the Fisher matrix evaluated from Eq. (31). The excess of small values of is due to the constraint that imposed by the algorithm.

Open with DEXTER
In the text
thumbnail Fig. 3

Distribution of measured total density in a simulation with σ1 = 5 and σ2 = 20, together with the predicted Gaussian distribution (derived from the Fisher matrix). The distribution is essentially unbiased; moreover, because of the anticorrelation between and , the total density has significantly less scatter than both and .

Open with DEXTER
In the text
thumbnail Fig. 4

Distributions of measured densities in a simulation with σ1 = 20 and σ2 = 5.

Open with DEXTER
In the text
thumbnail Fig. 5

Distributions of measured densities in a simulation with σ1 = 10 and σ2 = 10.

Open with DEXTER
In the text
thumbnail Fig. 6

Biases in the measured densities for σ1 = 10 and σ2 ∈ [0,20]. Note how the bias essentially vanishes for σ2 > 5.

Open with DEXTER
In the text
thumbnail Fig. 7

Scatter on and for σ1 = 10 and σ2 ∈ [0,20] (dots), together with the predictions obtained from Eq. (31)(lines).

Open with DEXTER
In the text
thumbnail Fig. 8

Violin plot showing the distribution of measured densities for σ1 = 10 and σ2 ∈ [0,20]. Each elongated structure corresponds to a different value of σ2; its width is proportional to the distribution of measured values of , i.e. effectively it is a histogram displayed vertically. The small red dashes indicate the average values.

Open with DEXTER
In the text
thumbnail Fig. 9

Violin plot showing the distribution of measured densities for σ1 = 10 and σ2 ∈ [0,20]. The small blue dashes indicate the average values.

Open with DEXTER
In the text
thumbnail Fig. 10

Results of the cluster finding algorithm in Orion A using the 2MASS Point Source Catalog. The red contours shows all surface density detections 3σ above the background, while the black contour corresponds to AK =0.3 mag. When displayed in Adobe Acrobat, it is possible to hide thelines, the blackcontours, the red contours corresponding to the, the red dots representing the, and the clusters’.

Open with DEXTER
In the text
thumbnail Fig. 11

Results of the cluster finding algorithm in Orion B using the 2MASS Point Source Catalog (see caption of Fig. 10 for the legend). Note that NGC 2023 is below the detection threshold and appears only as a weak smudge in this image. When displayed in Adobe Acrobat, it is possible to hide the, the, the, the red dots representing the, or the.

Open with DEXTER
In the text
thumbnail Fig. 12

A Gaussian-kernel smoothed density map of the Megeath et al. (2016) YSO list in Orion A (top) to compare with our density map (bottom).

Open with DEXTER
In the text
thumbnail Fig. 13

A Gaussian-kernel smoothed density map of the Megeath et al. (2016) YSO list in Orion B (left) to compare with our density map (right).

Open with DEXTER
In the text
thumbnail Fig. 14

Distribution of PYSO, the probability that our algorithm assigns to each star to be a YSO, for the Megeath et al. (2016) YSO candidates and for all other objects in the 2MASS point source catalog.

Open with DEXTER
In the text
thumbnail Fig. 15

Low-resolution version of the YSO density in Orion. The grey area at ≃ 205° is the Orion OB Ib association, while the lighter area to the right (around ≃ 201°) is OB Ia, containing the 25 Ori cluster (the grey spot at ≃ 201° and b ≃ −18°).

Open with DEXTER
In the text
thumbnail Fig. A.1

Histogram of H-band magnitudes of stars in a control field region, together with its best fit (an exponential with a complementary error function truncation, see the text).

Open with DEXTER
In the text

Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.

Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.

Initial download of the metrics may take a while.