Issue |
A&A
Volume 620, December 2018
|
|
---|---|---|
Article Number | A27 | |
Number of page(s) | 20 | |
Section | Galactic structure, stellar clusters and populations | |
DOI | https://doi.org/10.1051/0004-6361/201833042 | |
Published online | 26 November 2018 |
Multiplicity and clustering in Taurus star forming region
II. From ultra-wide pairs to dense NESTs⋆
1
Univ. Grenoble Alpes, CNRS, IPAG, 38000 Grenoble, France
e-mail: isabelle.joncour@univ-grenoble-alpes.fr
2
Department of Astronomy, University of Maryland, 20742 College Park, MD, USA
3
Astronomy Department, University of California, 94720-3411 Berkeley, CA, USA
Received:
18
March
2018
Accepted:
31
August
2018
Context. Multiplicity and clustering of young pre-main sequence stars appear as critical clues to understand and constrain the star formation process. Taurus is the archetypical example of the most quiescent star forming regions that may still retain primeval signatures of star formation.
Aims. This work identifies local overdense stellar structures as a critical scale between wide pairs and loose groups in Taurus.
Methods. Using the density-based spatial clustering of applications with noise (dbscan) algorithm, and setting its free parameters based on the one-point correlation function and the k-nearest neighbor statistics, we have extracted reliably overdense structures from the sky-projected spatial distribution of stars.
Results. Nearly half of the entire stellar population in Taurus is found to be concentrated in 20 very dense, tiny and prolate regions called NESTs (for Nested Elementary STructures). They are regularly spaced (≈2 pc) and mainly oriented along the principal gas filaments axes. Each NEST contains between four and 23 stars. Inside NESTs, the surface density of stars may be as high as 2500 pc−2 and the mean value is 340 pc−2. Nearly half (11) of these NESTs contain about 75% of the class 0 and I objects. The balance between Class I, II, and, III fraction within the NESTs suggests that they may be ordered as an evolutionary temporal scheme, some of them getting infertile with time, while other still giving birth to young stars. We have inferred that only 20% of stars in Taurus do not belong to any kind of stellar groups (either multiple system, ultra wide pairs or NESTs). The mass-size relation for stellar NESTs is very close to the Bonnor–Ebert expectation. The range in mass is about the same as that of dense molecular cores. The distribution in size is bimodal peaking at 12.5 and 50 kAU and the distribution of the number of YSOs in NESTs as a function of size exhibits two regimes.
Conclusions. We propose that the NESTs in their two size regimes represent the spatial imprints of stellar distribution at birth as they may have emerged within few millions years from their natal cloud either from a single core or from a chain of cores. We have identified them as the preferred sites of star formation in Taurus. These NESTs are the regions of highest stellar density and intermediate spatial scale structures between ultra-wide pairs and loose groups.
Key words: methods: statistical / binaries: visual / open clusters and associations: individual: Taurus / ISM: structure / stars: formation
Table A.1 is only available at the CDS via anonymous ftp to cdsarc.u-strasbg.fr (130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/620/A27
© ESO 2018
Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1. Introduction
The Taurus star-forming region is the archetype of a quiescent and sparse star distribution associated to a low surface stellar density (ρ ∼ 0.8 pc−2, ∼5 deg−2), spread over an area of ≈ 420 pc2 on the sky for the central part in Taurus. A high-multiplicity fraction and loose spatial clustering are key features in this region. On the clustering side, it is now well established that the stellar population of Taurus is distributed in a few groups of 1.5–2.5 pc size (Jones & Herbig 1979; Gomez et al. 1993; Kirk & Myers 2011). On the multiplicity side, the companion frequency of young stars in Taurus is generally twice that of field stars (Duchêne & Kraus 2013). Moreover the presence of most probably coeval ultra-wide binaries appears to be a crucial outcome of the star formation process in that region (Joncour et al. 2017; hereafter Paper I), a conclusion that also applies to the Perseus region for the embedded young stellar objects (YSOs; Sadavoy & Stahler 2017). The higher order multiplicity property of these ultra-wide pairs (UWPs) in Taurus suggests a close link between the spatial scales associated with multiple systems and with UWPs, ≲1 kAU and ≲60 kAU respectively. Indeed, the UWPs population appears to extend the traditional multiple system regime by almost two orders in magnitude in spatial scale. In Paper I, we have proposed that this property illustrates the remaining imprints of a molecular core or clump fragmentation cascade scenario that gave birth to young stars. Many of the UWPs in Taurus are made of two multiple systems, and are therefore “small stellar groups” of up to five members. However, the method used to identify UWPs was based on the mutual nearest neighbor property, which could only identify pairs. Higher order multiplicity (groupings of three or more separate systems) were thus not addressed in Paper I. We now wish to detect local stellar groups (overdensities) of any order, and thus need to use a different methodology. We have adopted a clustering algorithm, which is appropriate for this task.
In continuity with Paper I, the present study aims at analyzing spatial structures within the Taurus stellar population at intermediate length scales. In order to reach this objective, we propose a new methodology based on bottom-up approach using local physical properties to uncover explicitly spatial structures at this scale with a high level of confidence. The paper is organized as follow: in Sect. 2, we describe the methodology that allows the identification with high reliability of the new spatial structures that we called NESTs (Nested Elementary STructures) described in the Sect. 3. In Sect. 4, we characterize those structures and highlight their fundamental properties, while in Sect. 5, we discuss further properties and the nature of the NESTs and give our conclusions in Sect. 6.
2. Method
2.1. Data
We used the same Taurus catalog as described in Paper I. It contains 338 members of Taurus taken from the full census of members down to 0.02 M⊙ (Luhman et al. 2010) that we complement by the multiplicity information, with a high degree (67%) of completeness. Following the seminal work of Larson (1995), we aim at distinguishing clustering from multiplicity. To this end, we grouped together all stars that are separated by less than 1 kAU to form a single entity and simply called them “multiple systems”. While this threshold is somewhat arbitrary, we selected it in Paper I based on two complementary arguments. First of all, it is the lower threshold that defines by consensus wide binaries with separation larger than 1 kAU (Reipurth & Mikkola 2012; Tokovinin 2017). Second, this threshold is close to the beam (seven arcsec) of the Spitzer-MIPS instrument, which is used in the classification of young stellar objects, particularly the most embedded ones. Therefore, the census of neighbor stars is completely independent of the Class of the Taurus members for all separations larger than this threshold.
We note that Kraus et al. (2017) have presented a more recent catalog that includes a new distributed population of diskless stars. The newly identified members are distributed over a broader footprint and belong primarily to the dispersed stellar population, which is typically older than the “classical” stellar population of Taurus. Therefore leaving this population aside does not significantly affect our conclusions surrounding local overdense stellar regions and we retain the catalog from Paper I for our analysis.
2.2. Main approaches to identify stellar subgroups
The typical tools that are used to study the stellar spatial distribution may be subdivided in two main categories. The first focuses on spatial analyses in a global sense, aiming at characterizing the sources distribution as a whole (e.g., the two-point correlation function is used to evaluate the degree and the regimes of clustering, and the one-point correlation function to probe the binary regime range). The second approach aims at extracting (sub-)structures as topological entities using clustering algorithms, to further characterize them and derive their geometrical and physical properties. This task of finding clusters in a set of points has a long history in the field of applied mathematics and computer science. But intuitively, clusters can be seen simply as regions of enhanced stellar density with respect to their surroundings. Despite this simple definition, decades of dedicated and still on-going work show that there exist several methods to identify clusters (see Appendix C for a short review and references therein). Each of these methods has its pros and cons, and the optimal algorithm depends heavily on the type of data at hand and on the scientific goals.
The global distribution analysis and the substructures identification are complementary studies and, in some cases, the same tool can be used in both approaches. In particular, this is the case of the Minimum Spanning Tree (MST; Kruskal 1956; Prim 1957; Dijkstra 1960; Zahn 1971). This method is a graph procedure that connects all points in an ensemble through a simple path such that the total distance length of its edges is minimal. The method then generates clusters by deleting the longest individual segments of the MST above a heuristically chosen threshold, such that clusters are defined as the remaining connected subgraphs. In the field of astronomy, the MST method was first applied in extragalactic astrophysics by Barrow et al. (1985). Over the last decade, it is widely used in the context of young star clusters to identify subgroups of young stars, although alternate probabilistic techniques have been proposed recently (e.g., Kuhn et al. 2014).
Gutermuth et al. (2009) have proposed an empirical procedure to identify a critical MST length threshold above which the MST edges may be removed to identify clusters (plus usually a user-based condition on the minimum objects that must be located in the subgroups to be considered as such). This technique was notably used by Kirk & Myers (2011) to study the stellar groups properties in four star-forming regions (Taurus, Lupus III, Cha I, and IC 348).
The MST method can also be used to define useful parameters to quantify the degree of sub-structuring, such as the Q ratio of the average branches length of the MST over the mean separation of all pairs of stars (Cartwright & Whitworth 2004; Schmeja et al. 2008; Wright et al. 2014; Parker 2014), or to investigate mass segregation (Allison et al. 2009). A comparison and discussion on different methods assessing a global measure of the sub-structuring degree and the mass segregation can be found in Schmeja (2011), Küpper et al. (2011), and Parker & Goodwin (2015). The strength of the MST method is that it is a non-parametric technique, but its major drawbacks are inherent to all single linkage techniques, namely the artificial chaining effect, and the impossibility of handling noise and outliers.
2.3. dbscan
In this work, we have chosen to identify spatial features at a given scale using local density properties and a connectivity rule to link together adjacent stars having similar stellar density neighborhood. The spatial structures thus identified are defined as connected “density cluster”, following previous graph-principled work (Hartigan 1975). To efficiently identify density clusters, we have chosen the non-parametric, one-level dbscan (Density-Based Spatial Clustering Applications with Noise) clustering algorithm developed in the Knowledge Discovery Database field (Ester et al. 1996, see Appendix B). dbscan has several advantages over other clustering algorithms (see Appendix C): the partition of stars within density components is unique, it allows the detection of clusters of arbitrary shape and size at a global scale from local requirements, and it is the only clustering algorithm that explicitly labels noise and outliers. While other clustering algorithms induce a complete partitioning of the ensemble, dbscan proposes partial clustering. Thus, unlike single linkage and MST algorithms, it is resistant to noise and outliers. The key algorithmic idea of dbscan is to incrementally group stars, provided that (1) they are direct or indirect neighbors, and (2) the stellar density neighborhood of neighbors satisfied the selected criteria. dbscan directly searches for connected dense regions in space separated by local stellar density drops.
In dbscan, the “density cluster” Cϵ, Nmin sets are determined by the choice of two local free parameters: a distance ϵ and a number of stars Nmin. Stars are grouped together if they satisfy two conditions: (1) all stars within a set Cϵ, Nmin have a minimum number of stars Nmin within a radius ϵ, (2) each star within a set Cϵ, Nmin is connected to any other star of the same set by a sequence of neighbors separated at most by a distance ϵ. With these properties, a density cluster Cϵ, Nmin is said to be maximal among connected sets, i.e., Cϵ, Nmin is not contained in any larger density cluster defined at the same local ϵ level. As we explain next, the values of ϵ and Nmin that we use in this study have been determined based on the confidence level that local structures are distinct from random fluctuations.
2.4. Selecting the algorithm parameters
The clustering property in Taurus is traditionally characterized using the two-point correlation function (TPCF, Gomez et al. 1993; Larson 1995; Simon 1997; Gladwin et al. 1999; Hartmann 2002; Kraus & Hillenbrand 2008). This method reveals an “elbow” in the 4–40 kAU range, which is interpreted either as the signature of the Jeans instability in cool dense molecular cores (Larson 1995) or as an indication for a quasi-constant surface density of pairs produced by random motions that smooth out primordial stellar lumps (Kraus & Hillenbrand 2008). Either way, the TPCF does not yield a unique plausible value for ϵ and we thus have sought another approach.
In Paper I, we have introduced the one point correlation function defined as the probability of having a closest star located at r from any chosen star at random. It provides a local analysis over all the range of r. This function differs and complements the two-point correlation function, the latter being the probability of having pairs separated by r. At small r, these two functions describe a same spatial property. At larger r, the two-point correlation function gives an “integrated” view of the spatial characterization of the spatial distribution. We defined
as an estimator of the one-point correlation function, where wT is the distribution of nearest neighbor distances in Taurus, and wR the same distribution for a random distribution with the same mean surface density.
This Ψ function, which encapsulates the coarsest trends in the stellar spatial distribution of star, reveals three different spatial regimes in Taurus (clustering, inhibition and dispersion). The clustering regime extends over all distances associated with an excess of stars that have a nearest neighbor less than rc = 0.1° ≈0.24 pc (≈50 kAU) over a random distribution, as seen in Fig. 1 (adapted from Fig. 4 in Paper I). This introduces a natural benchmark for the local scale around which the value of ϵ has to be set.
![]() |
Fig. 1. One-point correlation function Ψ (blue symbols and blue dot-dashed line fit) and estimated pair correlation function g (gray symbols). This figure is reproduced from Paper I in order to highlight the choice of the ϵ parameter as rc, the intersection point between the horizontal line (random expectation) and the Ψ function. This value indicates the scale at which we want to investigate local overdense structures. |
We have endeavored to set the two free parameters ϵ and Nmin of the dbscan algorithm by requiring that local overdense features are detected at a high level of significance (α = 99.85%, i.e., three σ, if the distribution was to be normal) above random expectation. In Paper I, we have derived the theoretical cumulative distribution 𝒲k(r) of the k-nearest neighbor distribution in the case of a 2D random distribution:
where Γ is the Gamma function, ρ = 5 stars deg−2 is the mean stellar surface density of Taurus, and r is the distance to the k-nearest neighbor. The value of 𝒲k(r) as described in Eq. (2), gives the probability for a random star to have a k-nearest neighbor located at a distance of r or less. In turn, α = 1 − 𝒲k(r) represents the degree of significance of an over-density defined by k+1 stars. Setting rc = 0.1° as the relevant local length scale, we see in Fig. 2 that the probability of having a first companion (i.e., two stars within rc) is fairly high, i.e., 𝒲1(rc)≈0.15. Similarly, the probability of having 2 companions within rc is ≈0.11. We must therefore consider the third nearest neighbor cumulative distribution. With Nmin = 3 + 1 = 4, we must set ϵ = 0.112°, very close to the previously identified length scale benchmark rc, to achieve the required confidence level. In summary, the two free parameters of the dbscan agorithm have been set based on the nearest neighbor statistics analysis, with a requirement to identify local spatial features with a 99.85% level of significance above random expectations.
![]() |
Fig. 2. Cumulative distribution of the first nearest neighbor (W1(r), dashed blue), second nearest neighbor (W2(r), dot-dashed green) and the third nearest neighbor (W3(r), solid black) in a random spatial distribution. The solid red vertical line is the rc value. |
3. NESTs detection
3.1. A new type of structures in Taurus
With the parameters ϵ and Nmin set to the values defined in the previous section, we ran dbscan on the Taurus catalog. We have identified 20 distinct stellar overdense structures, which we have termed Nested Elementary STructures (NESTs). Each NEST shelters between four stars – the minimum number of stars imposed by the method – and 23 stars, with a mean (resp. median) number of eight (resp. six) stars (see Table 1, Fig. 3 and Appendix A). Considered as an ensemble, the NESTs contain nearly half of the entire stellar population in Taurus. Eighteen of these NESTs are located in the three principal gas filaments of the Taurus molecular cloud.
Properties of the NESTs.
![]() |
Fig. 3. Spatial distribution of the 20 NESTs identified in this study. The barycenter of each NEST is indicated by a colored plus mark and all the members are colored the same way. Star not pertaining to any NEST are shown as black dots. The MST defined by the NEST is shown with solid gray lines. NEST are numbered based on their increasing right ascension order (see Table 1). The Planck dust emission (217 GHz) is mapped as the background. |
To estimate the size of each NEST, we have first defined its convex hull as the smallest convex set of points that contains all of its members. The polygonal window drawn from this set of points provides an estimate of the minimal area A enclosing all systems within the NEST. We have then evaluated the typical radius of each NEST as RH = (A/π)1/2; they range from ≈5 to ≈80 kAU. The average of the first nearest neighbor separation (1-NNS) within all the NESTs, r1, is about 20 kAU (i.e., 0.1 pc). Thus NESTs appear as a new, intermediate, type of structure, on similar physical scales to UWPs but containing more elements, yet significantly smaller and denser than the already identified loose groups.
3.2. Reliability of the NESTs
Algorithms designed to identify overdense substructures within an ensemble of objects can sometimes produce spurious groupings and incorrectly split large groups into smaller sub-units. To test the reliability of these substructures, we have first verified their separability (i.e., how distinct NESTs really are from one another) and their degree of dilution (i.e., how significantly does the overdensity stands out relative to the stellar population outside the NESTs).
To test the separability between NESTs, we have compared the internal spacing of stars belonging to a single NEST to their distance to any other NESTs. The evaluation of this separability for each NEST is performed by introducing a separability ratio αN = DN/rN, defined as the ratio of the nearest distance (DN) from a star in a given NEST to a star belonging to any other NEST over the mean 1-NNS within the NEST (rN). Large values of the ratio indicate well-separated NESTs. All but 2 NESTs have αN ≳ 3, with a median value of ≈11. In other words, stars in a NEST are on average one order of magnitude closer to other members of the same NEST than to stars in other NESTs. NESTs number 10 and 12 are the least separated NESTs, raising the possibility that they are two substructures within a larger one.
In a second test, we have evaluated the degree of dilution of the NESTs relative to the more dispersed population, that is the stellar population outside the NESTs. Specifically, for a given NEST, we have first computed the nearest distance d* between a NEST member and a star of the dispersed population and we have defined the dilution parameter βN = d*/rN. Smaller values of βN indicate that it is harder to distinguish a NEST from the surrounding population, as the overdensity of the NEST becomes increasingly marginal. Values for βN range from ≈1 to ≈13, with a median of 4. Therefore, NESTs are not in complete isolation but rather immersed in the more dispersed population.
As Fig. 4 shows, the αN and βN quantities are positively correlated (p-value of 10−2 based on nonparametric Spearman rank correlation test). Looking at the symbol size that is proportional to the mean 1-NNS, we note that the quantities rN and αN on the one hand, and rN and βN on the other hand, appear significantly anti-correlated (p-value of 10−3 for both quantities). This indicates that the more compact a NEST is, the further it is from the nearest NEST and the more its members are separated from the dispersed population. Taking these correlations together, we conclude that there is a connection between the internal local density of the NEST and the local density of the immediate stellar neighborhood. Consequently, the NESTs and the dispersed population are somehow connected and must be interpreted in a comprehensive model.
![]() |
Fig. 4. Dilution factor βN as a function of the inter-NEST separation ratio αN. The size of the marks is indicative of the mean first nearest neighbor rN. These ratios are markers of the reliability of the NESTs. Values higher than 2 (dashed black lines) indicate that a NEST is well separated both from the other NESTs and from the stellar population outside the NESTs. |
3.3. Robustness of the NESTs.
We now focus on the NESTs robustness. Since their detection is predicated on setting the two local parameters ϵ and Nmin, we must explore whether changing the values of these parameters affect the results of our analysis. The results of these tests are illustrated in Fig. 5. In a first test, we have varied Nmin while keeping ϵ at its fiducial value. Increasing this parameter by one (Nmin = 5) automatically eliminates all NESTs that contain only four stellar systems. However, it does not affect the detection of all others (see top panel of Fig. 5). Conversely, all fiducial NESTs were retrieved if Nmin is decreased to 3, with the addition of three new features, one located just south of NEST number 17 and the two others in the immediate vicinity of two big NESTs (numbers 2 and 14). Because they contain only three systems, these newer NESTs have a slightly higher probability of occurring from random fluctuations, and we place a 97.9% confidence level on their physical nature.
![]() |
Fig. 5. Robustness tests of the detection of NESTs in the three main filaments of Taurus. The fiducial NESTs are indicated by black plus symbols and labeled in each panel. Each panel displays alternate sets of NESTs resulting from varying the selecting criteria. Top panel: the NESTs identified if Nmin is reduced to three (large red filled squares) or increased to five (small white filled squares). Middle and bottom panels: ϵ has been increased and decreased by 10 % and 50 % percent, respectively. In both panels, small orange squares and large dark blue squares represents the results of increasing and decreasing ϵ, respectively. |
We have then evaluated the effect of the local radius ϵ by varying its value by ±10% and ±50% (see central and bottom panels of Fig. 5) while keeping Nmin = 4. This range of variations is associated with the full recovery of the multiscale stellar structure in Taurus (Joncour et al., in prep.). A decrease of ten pourcent of ϵ have had no effect on the NEST identification, while an equivalent increase lead to the identification of a new feature (containing four stars) along filament number 1, south of NESTs numbers 17 and 18, at the same location where a new feature appeared when decreasing Nmin. Furthermore, the same ten pourcent increase on ϵ has driven the merging of the two NESTs numbers 10 and 12 into a single larger NEST. This is a consequence of both NESTs having the lowest values of αN (see Table 1) indicating that they are only marginally separated. Increasing ϵ by 50% did not further alter the set of identified NESTs as no other pair of NESTs are similarly poorly separated. However, decreasing ϵ by 50%, down to ∼30kAU (∼0.15 pc), has resulted in only eight NESTs being identified which corresponds to the most compact “cores” of the fiducial NESTs. The confidence level for these eight detections above random is extremely high due to their high surface density, reaching 99.997%.
In conclusion, while the exact number and detailed properties of NESTs are dependent on the parameters Nmin and ϵ, the detection of stellar NESTs in Taurus, as well as their gross properties, are robust results. In the following analysis, we have retained the 20 fiducial NESTs identified in Sect. 3.1. These are embedded in a more dispersed or hierarchically structured population on larger scales.
3.4. NESTs in the context of previously identified groups in Taurus
The identification of loose stellar groups in Taurus is a topic that has a rich history (e.g., Gomez et al. 1993; Kirk & Myers 2011). It is natural to compare the newly identified NESTs to these groups. In Table 1, we indicate to which group identified by Gomez et al. (1993), Kirk & Myers (2011) each NEST belongs. Five NESTs are located at the center of loose groups identified by Kirk & Myers (2011) in a one-to-one correspondence. On the other hand, the remaining 3 loose groups of Kirk & Myers (2011) are in fact substructured, with two or three NESTs in each. Furthermore, our analysis has revealed seven new substructures, each containing between 4 and 6 stars. These substructures could not have been identified by Kirk & Myers (2011), as these authors have used an arbitrary threshold of 10 stars per group. Nonetheless, based on our analysis, we believe that these small NESTs are physical coherent structures, given our 99.85% significance level above random spatial fluctuations.
4. NESTs properties
4.1. NESTs stellar content
The NESTs contain nearly half of the Taurus stellar population, yet their total projected surface area is a small fraction of the star-forming region. Focusing first on the three central main filaments within the cloud, which appear as the main sites of ongoing star formation in Taurus, the projected area of the convex hull formed by all stars they contain is 33 deg2 while the total projected area of all NESTs located within the three main filaments is only 0.27 deg2. Thus, the NESTs cover less than 1% of the projected area of the central filaments. Expanding this analysis to the whole cloud, including the northern and southern stellar components illustrated in Fig. 3, shows that the NESTs cover less than 0.1% of the cloud’s projected area (0.32 deg2 out of 202 deg2).
Consequently, half of the stellar population in Taurus is concentrated in tiny, high density pockets of stars, with stellar densities that range from 50 to 2500 stars pc−2, with a median density that is ≈100 times higher than the average density in the whole cloud. This range of stellar surface densities places all NESTs above the median surface densities in nearby star-forming regions (Bressert et al. 2010), including Ophiuchus and Orion. Indeed, the densest NESTs are close to the maximum found in these star forming regions, although it must be noted that their analysis is unable to probe densities exceeding ≈1000 stars pc−2.
4.2. NESTs spatial distribution and geometry
To further study the nature of NESTS, we have computed their “ellipsoid hull” which are the ellipsoid of minimal area such that all given stars within the NEST lie just inside or on the boundary of the ellipsoid. These ellipses are characterized by their semi-major and semi-minor axis (a, b), eccentricity (e), centroid (αC, βC) and major axis position angle (PA). These quantities are listed in Table 1 and allows other studies of NESTs, such as the cumulative distribution of their position angles, the spatial distribution of their centroids, and derivation of their most probable intrinsic 3D geometrical properties; all 3 topics that will be described in the following.
As is evident in Fig. 6, all but two of the NESTs appear tightly concentrated along the three main filaments of the Taurus molecular cloud. Considering both the location and elongated geometry of the NESTs, we have noticed in Fig. 6 that the NESTs appear preferentially oriented along the gas filaments. Beyond this visual inspection, we wanted to go further and present a quantitative argument. To study the relative alignment of the NESTs, we have used the orientation of the local magnetic field traced by linear polarization measurement of background stars as reference. Based on observations, gas filaments in molecular cloud run rather perpendicular to that local direction (Chapman et al. 2011; Planck Collaboration Int. XXXII 2016) even if in denser environments, the orientation may be either parallel or perpendicular (Li et al. 2013; Zhang et al. 2014), although a much more complicated picture has been recently obtained in massive star forming regions (Koch et al. 2018). These findings may be understood in the framework of a recent theoretical work showing that these two configurations (i.e., at low gas column density, magnetic field tends to be orthogonal to the density gradients, while it tends to be parallel to them at high gas column density), are shown to be the two preferred modes that a turbulent magnetized gas found at equilibrium (Soler & Hennebelle 2017). Taurus is well known to be a low density environment, and as such most of the magnetic fields are perpendicular to the main filaments, probably fed by the gas along the striations (Palmeirim et al. 2013). Figure 7 reveals quantitatively that the position angle of the NESTs is indeed preferentially oriented perpendicular to the local magnetic field, along the filaments. In comparison, Ménard & Duchêne (2004) have shown that the symmetry axis of individual YSO disks in Taurus is randomly oriented relative to the magnetic field, while the major axis of dense cores is intermediate between these two populations, neither completely random nor almost always perpendicular to the magnetic field (Lee & Myers 1999; Ménard & Duchêne 2004). This study shows quantitatively the very close connection of the NESTs and the gas filament.
![]() |
Fig. 6. Convex hull (dashed line) and spanning ellipsoid hull fitted on the NEST. The colored horizontal line at the bottom of each NEST scales as 0.1 deg. |
![]() |
Fig. 7. Cumulative distribution function of the difference in position angle between the local magnetic field and the NESTs semi-major axis (dashed-dotted blue line). The long dashed light blue histogram corresponds to the subset of NESTs whose semi-major axis is more than twice as large as their semi-minor axis and, thus, whose position angle is best defined. The green solid and red hatched histograms are the corresponding distribution for the symmetry axis of YSO disks and dense cores, respectively (Ménard & Duchêne 2004). The black dotted line is the function expected for a randomly oriented sample. |
We have then studied the spacings between NESTs by themselves using the MST built from the set of the NESTs centroids (see Fig. 3). We have found that the median length of a segment of the MST is 2.3 pc. However, given the distribution of NESTs along the filaments, the distribution of segment lengths is skewed by those that connect NESTs located in different filaments. Nonetheless, the median length of the segments in the northern two filaments is 1.9–2.1 pc (with a dispersion of ≈0.8 pc), i.e., not significantly smaller. On the other hand, the three NESTs identified in filament 3 are separated by 0.7 and 1.1 pc, respectively, significantly closer to one another. If the NESTs 10 and 12, which are located in that southernmost filament, instead constitute a single, larger NEST (see Sect. 3.3), the NESTs are distributed rather evenly along filaments, with a typical spacing of ≈2 pc.
We now move on analyzing the intrinsic geometry of the NESTs. The apparent elongation of the NESTs informs their three dimensional structure, but projection effects must be taken into account to infer the latter. The simplest three dimensional ellipsoids shapes, such that two of three axis lengths are equal, are either prolate (one major axis la, two same minor ones lb) or oblate (two same major axis la, one minor one lb). The aspect ratio q is then defined as q = lb/la. If one considers an ensemble of identical spheroids that are randomly oriented in 3D space, the relationship between the intrinsic 3D aspect ratio q of these spheroids and the expected value of their projected 2D aspect ratio qp = b/a may be estimated (Myers et al. 1991) as:
The expected cumulative function of the projected aspect ratio qp (Eq. (3)) for randomly oriented 3D oblate and prolate ellipsoids are plotted in Fig. 8 along with the empirical cumulative distribution of the projected aspect ratio qp of all 20 NESTs. It is extremely rare for oblate ellipsoids to project into a high aspect ratio structure, as this can only happen if they are observed exactly perpendicularly to their main plane. Therefore, the projected ratio qp associated to a distribution of oblate ellipsoids spans the 0.5 − 1 range, excluding an oblateness hypothesis for the NESTs. The NESTs are then most probably prolate. From the Fig. 8, we see indeed that at small values of qp, less than 0.35 (i.e., q less than ∼0.25, highly elongated structures), the observed blue points associated with the NESTs follow the prolate curve. But at higher values, it deviates from a prolate randomly oriented distribution, with half of the NESTs having a value of qp around 0.4, its maximal value being 0.7 (q around 0.6). This results suggest that the 3D distribution of the prolate structures are not randomly oriented. It is what we expect for the NESTs structures being aligned with the gas filaments. It is thus tempting to assume that the NESTs are forming inside and along the gas filaments, keeping through time their pristine prolate structure despite the dynamical effects (gas expulsion, and dynamical star interaction).
![]() |
Fig. 8. Cumulative distribution of the projected axial ratio qp (Eq. (3)) for a population of randomly oriented oblate (black) and prolate (red) ellipsoids for a fixed intrinsic (3D) axial ratio q. The blue solid circles mark the cumulative distribution of the 2D ratio qp of the NESTs. |
4.3. Mass and size distribution of NESTs
We now study the distribution of the NESTs mass based on sum of the mass of the objects as given in full catalog of Paper I. It should be noted here that the reported mass of a NEST is evaluated by the sum of the primary masses of individual object, as given in Paper I, as masses for stellar companions are incomplete at this stage. It is then a lower mass estimation for the mass of the NESTs. The distribution of the NESTs’ mass is shown in Fig. 9 as ΦN = dN/dlogm, where N is the total number of the NESTs per logarithmic mass bin and m the mass of the NESTs. It is broad and heavy-tailed, ranging from ≲1 M⊙ to ≳10 M⊙. In the logarithm mass space, a normal distribution provides an acceptable fit with a mean and standard deviation of and σ(logm)=0.33 ± 0.05, respectively. Conversely, the high-mass end (m ≥ 2 M⊙) of the distribution is well reproduced by a power law (logΦN ∝ mΓ) of index Γ = −0.50 ± 0.1. We note that this index is smaller than the Salpeter reference slope (ΓS = −1.35) observed in stellar mass functions (Bastian et al. 2010), but in reasonable agreement with the slope reported for CO clumps (e.g., Γ = −0.65 and −0.85; Kramer et al. 1996; Heithausen et al. 1998) and more recently for massive clumps (Γ = −0.32, albeit at higher masses than NESTs; Liu et al. 2012). This analysis and this connection made with the gas clump should however be taken with caution due to the low numbers statistics, as indicated by the fairly large uncertainties reported in Fig. 9.
![]() |
Fig. 9. Mass Function of NESTs (dashed histogram). The graphic shows the mass probability density function (ΦN = dN/dlogm) as a function of logm, where N is the total number of the NESTs and m the mass of the NESTs. For comparison, we report also the mass functions of the H13CO+ cores (Onishi et al. 2002, light purple) and the near-infrared extinction dust cores (Schmalzl et al. 2010, light green). The red curve and black dashed curve indicate respectively the power law dN/dlogm ∝ m−Γ fit for the high mass NESTs distribution (red points) and normal fits to the whole observed mass distribution expressed in logarithm of the mass. |
Contrary to the mass distribution, the size distribution of NESTs is clearly bimodal (see left panel of Fig. 10). The two peaks occur at ≈12.5 kAU and ≈50 kAU. Moreover, we have found that the number of YSOs inside each NEST is dependent on the NEST size, possibly revealing two distinct regimes, as illustrated in Fig. 11. All the smallest NESTs (≲30 kAU) contain 4–6 systems, whereas larger NESTs are characterized by a steady increase in the number of members with the NEST’s size.
![]() |
Fig. 10. Size distribution of NESTs (which exhibits two peaks, at 12.5 and 50 kAU, respectively; left panel), H13CO+ cores (Onishi et al. 2002; central panel) and near-infrared extinction dust cores (Schmalzl et al. 2010; right panel). |
![]() |
Fig. 11. Number of YSOs in NESTs as a function of their size. |
While we have found no correlation between the NESTs’ radius and their stellar density, there is a positive correlation with their total stellar mass (see Fig. 12). A Pearson correlation test indicates that this correlation is highly significant (10−5 false alarm probability). To quantify this correlation, we perform a power law (m ∝ rγ) Deming fit that takes into account uncertainties on both independent quantities. Uncertainties on the radius of the NESTs are taken to be the difference between the radius of the convex hull (RH) and the geometrical radius computed from the minimum spanning ellipsoid fit (). The uncertainty on the total mass is dominated by uncertainties on the individual stellar masses, which may be as high as 50% (Kirk & Myers 2011). We adopt this conservative estimate in our analysis. From the Deming fit, we derive γ = 0.94 ± 0.21, i.e., a nearly linear correlation. Thus the mass-radius relationship for the NESTs is markedly shallower than that of star clusters (for which the power law exponent is about 5/3, Pfalzner et al. 2016, and references therein) and of prestellar clumps and cores in the simulations (Lee & Hennebelle 2016, and references therein). On the other hand, the nearly linear behavior is reminiscent of the correlation expected for isothermal critical Bonnor–Ebert spheres, the assumed conditions setting the onset of gravitational collapse against thermal support. In that situation, the power law exponent is predicted to be exactly unity. This is consistent with observed core properties, as summarized in the work of Motte et al. (2018, see their Fig. 7). As initially reported in Motte et al. (2001) and confirmed by Könyves et al. (2015), the protostellar cores that are dominated by gravity are close to the linear relation m ∝ r, unlike the cores dominated by turbulence which have the power law m ∝ r2.
![]() |
Fig. 12. Mass-radius relation for NESTs (black symbols labeled by NEST number). The solid black line is the best-fitting power law fit to all data points while dashed lines represent lines of constant column density (3 × 1022 cm−2: red line; 3 × 1020 cm−2: blue line). |
5. Discussion
5.1. NESTs: Preferred sites of star formation
Comparing the population of YSOs located inside and outside of NESTs can give us information on the physical nature and origin of NESTs. Most importantly, Class I sources are the most embedded, hence likely the youngest, YSOs in Taurus. Sites of ongoing star formation are therefore expected to host a high fraction of Class I sources. Table 2 summarizes the classification of all objects inside and outside of NESTs.
Classification of YSOs inside (IN) and outside (OUT) the NESTs.
The relative proportions of Class II and, III sources inside the NESTs is not significantly different from that of the more dispersed population, despite a slightly lower proportion of Class III sources. On the other hand, the NESTs host a proportion of Class I sources in Taurus that is ≈3 times higher than the stellar population outside of NESTs (18.5 and 5.9%, respectively). A standard Pearson χ2 statistical test indicates that this difference is significant at the 99.9% confidence level, in other words Class I sources are preferentially found within NESTs.
Besides the fact that almost 75% of all Taurus Class I sources belong to NESTs, they are concentrated in just 11 of those. This high concentration reveals that these NESTs represent regions the most presently active in Taurus. The remaining nine NESTs contain only Class II and III objects suggesting that nearly half of the NESTs are getting infertile while the other half has experienced recent star formation. Indeed, the ratio of the number of Class I versus Class II sources and the ratio of Class II to Class III sources within NESTs suggest an evolutionary temporal scheme (see Fig. 13). In particular, the proportion of Class I objects decreases as the proportion of Class III objects increases. When the fraction of Class III objects inside a NEST reaches 60%, there are no associated Class I objects, suggesting that these NESTs are the oldest. Conversely, NESTs with a high proportion of Class I objects (80%) have no Class III objects. A NEST with such a YSOs content is thus amongst the youngest.
![]() |
Fig. 13. Fraction of Class I (Black), II (Red) and, III (Green) objects within each NEST. From right to left, the NESTs are ordered by decreasing fraction of Class I (or Class II if there is no Class I object), which could correspond to an evolutionary sequence from the younger to the older. |
Past studies in Taurus (e.g., Luhman et al. 2010) show that Class III sources are more dispersed than Class II themselves being more dispersed than Class I objects. In our catalog, the mean 1-NNS of all Class I, II and III objects are 0.15, 0.34 and 0.51 pc, respectively (see Table 3). This has been interpreted in the past as a consequence of the least evolved objects being preferentially concentrated near the gas filaments and the more evolved ones being much more widely dispersed.
Mean 1-NNS of class I, II and III for YSOs located within NESTs, outside the NESTs and for all of them.
The identification of NESTs as physical substructures within Taurus prompts us to revisit this finding. For all classes, objects located outside the NESTs have larger mean 1-NNS than the overall population and these mean 1-NNSs show a marked increase from Class I to Class III. This is in line with the previous interpretation of a dispersed population of Class III sources throughout the cloud. However, the mean 1-NNS for objects located within NESTs shows a qualitatively different behavior. Strikingly, while there is also a slight marginal increase of the 1-NNS from Class I to Class III objects, all these values are consistent with a single value for all the 3 Classes given their associated uncertainties (see Table 3). This shows that, inside the NESTs, neither Class II nor Class III sources are more widely spread out than Class I sources, suggesting that the spacing between objects of a given class within NESTs does not depend on evolutionary stage.
In an effort to search for signs of the expansion of NESTs, we searched for any correlation between the population and proportion of Class I, II and, III and their radius or surface density but found none. This suggests that there is no detectable dynamical expansion of NESTs at this scale, i.e., NESTs remain tightly packed as they age. In that respect, the observed NESTs may be pristine spatial imprints of the stellar distribution at birth. Most interestingly, the spatial distribution of the lowest mass YSOs appears different at a significant level than that of the most massive ones. Specifically, we have found that the majority of the former are found outside the NESTs, in the more distributed population. This aspect will be developed in more details in a forthcoming paper.
5.2. NESTs as intermediate spatial scale structures
With a typical size of 0.1–0.25 pc, the NESTs are smaller structures than the the loose groups identified in the past in Taurus (e.g., Kirk & Myers 2011) and are more comparable to the size of the largest UWPs, which contain more than half of all sources in Taurus. By construction, traditional multiple systems are much smaller than NESTs. In Paper I, we have shown that the majority of UWPs have high order multiplicity, especially the closest ones with separation in the 1–10 kAU range. Characterizing the connection between structures on these different spatial scales is important to assess whether they arise from a unique process, whereby NESTs and UWPs could be pristine imprints of a fragmentation cascade scenario.
To probe how these different scales of stellar groups are related, from the multiple scale (5–1000 AU) to the UWPs scale (1–60 kAU) up to the NESTs scale (15–155 kAU), we estimated the fraction of YSOs that belong in each stellar group (see Fig. 14). The vast majority of YSOs are either members of multiples, UWPs or NESTs, leaving a small fraction less than 20% of single isolated stars. Only 20% of multiples are not included in groups of higher hierarchy (i.e., either within UWPs or NESTs) whereas nearly half of the UWPs are inside NESTs. Only ∼20% of the whole YSO population (single and multiples) belong neither to an UWP nor to a NEST. They are isolated since they are located at the peripheries of the stellar groups and on average 4.7 times further away (0.56 pc, 115 kAU for the median) from their first nearest companion compared to the other YSOs that belong either to UWPs or NESTs (0.12 pc or 25 kAU for the median). Although this isolated population contains proportionally more Class III objects and less Class II and I objects, this difference does not appear statistically significant when performing the Pearson’s contingency test.
![]() |
Fig. 14. Breakdown of the Taurus population in terms of belonging to multiple systems (separation ≤1 kAU), UWPs and NESTs. The pie charts are designed to account for the fact that objects can belong to more than one category. Only ∼20% of Taurus members are isolated single stars. |
We propose to outline the connection between the different length scales through the estimate of their surface stellar density as a function of their mean radius r. For UWPs, the latter is estimated as half their separation, and the stellar surface density ρ* is then estimated as the total number of stars n* within the UWPs divided by the area, i.e., ρ* = n*/(πr2). For NESTs, loose groups (as defined in Kirk & Myers 2011), the three main filaments region and the whole Taurus cloud, the projected area is estimated from the surface A of their associated convex hull containing each n* stars. We fit an ellipsoid to the convex hull and derive the semi major and minor axes (respectively a and b). We have defined the mean geometrical radius r as and the associated stellar surface density is ρ* = n*/A.
The surface stellar density of these spatial features, shown in Fig. 15, reveals two regimes: one is associated with UWPs (ρ* ∝ r−2), and the second with loose groups and spatial features on larger scales (ρ* ∝ r−1.2). The former is related to cascade fragmentation (Paper I), while the latter is associated with a clustering regime. NESTs appear as intermediate structures, with smaller (resp. larger) ones following the relationship observed for UWPs (resp.loose groups and larger structures).
![]() |
Fig. 15. Surface stellar density on the scale of UWPs (red dots), NESTs (light blue dots), loose groups (dark blue dots) as defined in Kirk & Myers (2011), the 3 main filaments as a whole and the whole Taurus cloud (black dots). This plot outlines two main free scale regimes, respectively associated to UWPs (red line) and the various structures on larger scales (blue line). |
5.3. Origin of NESTs
The close connection between the NESTs and the gas along the Taurus filaments raises the question of their origin (see Fig. 6). We have distinguished, as generally accepted, the more dense cores with typical density of 104 − 105 cm−3 and length scale of 0.05 − 0.1 pc from larger diffuse gas clumps with typical density of 103 − 104 cm−3. Zooming in the L1495 Taurus region and gathering data from NIR dust extinction and C18O clumps (Schmalzl et al. 2010; Hacar et al. 2013) and dense N2H+ and H13CO+ cores (Hacar et al. 2013; Onishi et al. 2002), we indeed note that the large scale structure of filament as outlined by the C18O gas tracer is closely associated with the location of both NESTs and molecular cores (see Fig. 16). We note also that there are no NEST in regions where dense cold molecular cores are the most present and clustered (B216, B218, B210, B10). The exceptions are at the edges of both N2H+ B213 clustered cores regions (NESTs numbers 3 and 4) and for the largest NEST (NEST number 2) located in the B7 region, where there are few dense N2H+ cores in the central part. These three situations may illustrate different evolutionary states for the formation of stars in those regions, the latter being the most evolved.
![]() |
Fig. 16. Stellar NESTs 1–6 and dense molecular cores in L1495 Taurus complex. Orange and green triangles indicate C18O and N2H+ cores (Hacar et al. 2013); purple triangles mark H13CO+ cores (Onishi et al. 2002); finally, red triangles represent NIR dust extinction cores (Schmalzl et al. 2010; we note that in their work, observations are limited to the east portion of the region, no NIR data are reported within B211, B10 and B7 regions). Based on the data published in (Kenyon et al. 2008; Table 1), the center of obscured Barnard regions are plotted as black plus marks. The YSOs are plotted as solid (resp. empty) blue circle when they are inside (resp. outside) the NESTs. The NESTs’ limits are shown as blue ellipsoids. We note that the L1495 complex shelter the NESTS 2–5 along the filament. The NESTs 1 and 6 are located outside the region of molecular gas studies. |
The range of surface density in NESTs is close to that of dense cores in molecular clouds with typical H2 surface density of 1020–1022 g cm−2 (see Fig. 12). Moreover, we have gathered mass information on NIR dust extinction clumps (Schmalzl et al. 2010) and H13CO+ cores (Onishi et al. 2002). While a KS test indicates that the two associated core mass distributions differ from one another (at the 99.8% confidence level), the mass distribution of NESTs is indistinguishable with either (p-value of the same parent distribution of 0.8 and 0.1 for NIR or H13CO+ cores, respectively).
These clues tend to suggest that NESTs are the direct descendants of cores, in the sense that each NEST could be connected on a one-to-one basis to a dense core that subsequently collapses and fragments. Indeed, observations showed multiple and wide pair Class 0 or I objects within a single core (Pineda et al. 2015; Sadavoy & Stahler 2017), indicating that several objects may form within a single core. However, while this relation is plausible for the small NESTs having 4–6 stars, this scenario appears less credible to explain the formation of the richer NESTs sheltering up to more than 20 stars from one single core. Indeed, to form these richer NESTs would require denser cores. Since such cores are not observed in Taurus (Hacar et al. 2013; Onishi et al. 2002), a scenario implying aggregation of cores is necessary to provide an explanation for the richer NESTs.
The observed bimodality in the distribution of NEST sizes and the two regimes associated with their size-star number relation suggest there are two mechanisms at work in the formation of NESTs, allowing us to produce a coherent framework for all NESTs. The first peak in the distribution of size appears in the range of the H13CO+ dense core size distribution (Onishi et al. 2002), whereas the second peak is far beyond that range, nearly 4 times bigger, but this peak is also 4 times smaller than the average size of the less dense gas cores as traced by near-infrared extinction maps (Schmalzl et al. 2010). Besides, the mass distribution of the NESTs spans the same range of both mass cores.
It is therefore tempting to associate the first regime of NESTs (those with few members) with the fragmentation of a single core and the second regime (the richer ones) with a clustering of a few cores in a chain, as it is outlined in the work of Tafalla & Hacar (2015). Indeed these authors report that the distances to the nearest neighbor among the N2H+ cores in L1495 mostly lie below 0.2 pc, 41 kAU, exactly the range of the second peak. They also find that these dense cores tend to cluster in linear groups of three cores on average, which they call chains. The elongated geometry of the NESTs would also be compatible with this scenario, either due to the prolate nature of the cores (Myers et al. 1991) or the linear fragmentation of the filament given the chain of cores as observed by Tafalla & Hacar (2015). The process of hierarchical fragmentation from the cloud to protostars has been recently outlined in the Perseus complex based on the different gas tracers, in particular two different paths for cloud fragmentation (isolated cores and cores in filaments, Pokhrel et al. 2018). We propose that the present study outlines the stellar structures resulting from this hierarchical fragmentation. Viewing the NESTs as the intermediate spatial structures between the UWPs and the loose groups, part of the UWPs should also result from this clustering scenario of cores as proposed by Tokovinin (2017). Furthermore, the power law fit at the high mass range of the NESTs Γ = −0.5 is much shallower than the power law found for dense cores in Taurus, as Sadavoy et al. (2010; respectively Schmalzl et al. 2010) found Γ = −1.22 (resp. Γ = −1.2) for the starless dense cores (resp. NIR extinction clumps) more massive than 2 M⊙. But close to the the power law associated with larger CO clumps (Γ = −0.65 (resp. Γ = −0.85), Kramer et al. (1996); Heithausen et al. 1998) that may breakdown in smaller pieces to form spatial substructures such as cores in a later stage of their evolution.
5.4. Links to star formation models
Young stars form on a wide range of scales out of their molecular parental cloud, producing aggregates, groups, clusters and distributed population with various degrees of clumpiness and stellar density. But whether these star forming regions form as a result of a slow collapse or contraction scenarios of clumps at quasi-equilibrium over several free-fall dynamical times (e.g., Krumholz et al. 2006; Tan et al. 2006; Farias et al. 2017) or they collapse more quickly (intermediate or short timescales) on the order of 1–2 dynamical time (e.g., Ballesteros-Paredes et al. 1999; Elmegreen et al. 2000; Hartmann et al. 2001; Bate et al. 2003) mediated either by large-scale accretion flows along filaments (Myers 2009; Smilgys & Bonnell 2017) and/or a global collapse (Hartmann & Burkert 2007; Vazquez-Semadeni et al. 2017) remains an open question.
The possible diagnostics to discriminate between these models include the morphologies of subclusters, the degree of clustering and hierarchy, and the spatial distribution of stars. We intend to develop in our next paper in this series the analysis of the subclusters hierarchy in Taurus to expand in that direction. For now, focusing on the densest part of Taurus, i.e., NESTs, may already prove informative. In the slow type models, the morphology of subclusters tend to be rather spherically symmetric (Tan et al. 2006) in contrast with the rapid type model in which elongated structures dominate (Bate et al. 2003). Thus, the highly elongated NESTs we found tend to favor the rapid scenario. This is in agreement with Hartmann et al. (2001), who have made the assumption of a rapid formation and dissipation of molecular clouds in a few dynamical times to explain the small age spread (1–3 Myr) of the Taurus members and the absence of the Post T Tauri stars. However in all scenarios, the most recent works show that the subclusters that form at local scale merge afterwards within larger ones that themselves merge to finally join a large central cluster fueled with gas and stars. But if the mixed stellar populations we observed within NESTs were to be due to the merging of previous physically unrelated sub-units, we do not expect to see an evolution of the status of the population inside the NESTS as we have noted. An important caveat is that most of these models, and specially the slow type models, have been developed to simulate massive and dense star forming regions, and cannot be directly compared to the Taurus region. To our knowledge, no specific model to explain the formation and the persistence of the NESTs in Taurus has been proposed.
To conclude this section, we suggest that the NESTs are pristine imprints of stellar formation and that they are representative of two fragmentation scenarios. The first one is associated with the fragmentation of a single dense core that could give birth to a small number of stellar systems (4–6), while the second one corresponds to the fragmentation of a filament in a few (2–5) cores closely spatially associated, and thus spatially clustered. These structures may remain mostly unchanged for at least a few millions years.
6. Conclusion
Building on the stellar spatial distribution in the Taurus star-forming region, we have used the density based clustering algorithm dbscan to identify local overdense subclusters. We have set the two free parameters of the algorithm based on the one-point correlation function and the cumulative function of the k-nearest neighbor separations, in order to reach a 99.8% significance level of detection above random. We found 20 stellar structures that we dub NESTs (Nested Elementary STructures). We have then performed a set of statistical studies to derive their main properties.
Our work has shows that about 45% of the whole stellar population in Taurus is concentrated in 20 small and most probably prolate (median eccentricity of 0.9) NESTs that are regularly spaced (≈2 pc) and mainly oriented along the principal gas filaments axes. Five of these NESTs are associated with the densest parts of eight previously identified loose groups (Gomez et al. 1993; Kirk & Myers 2011). Each of the 3 remaining loose groups are composed by two or three stellar NESTs and seven of the NESTs were not identified so far. The stellar NESTs contain between 4 and 23 stars each, the median being 5–6. Inside the NESTs, the surface density of stars may be as high as 2500 pc−2 (while the mean surface density of the Orion Nebula Cluster is about 1,000 YSOS/pc2, Bressert et al. 2010). The mean value is of the order of 340 pc−2, which agrees well with the initial substructures density (350 pc−2) required to reproduce the present-day binary properties in Taurus when starting from a universal initial binary distribution according to Nbody numerical simulations (Marks & Kroupa 2012).
Although the proportion of Class II and, Class III objects inside the NESTs is not significantly different from that of the more dispersed population, the NESTs as a whole contain 3/4 of the class I objects. Studying them individually, only half (11) of these NESTs contain those class 0 or I objects, showing that they are the sites of privileged star formation. The balance between Class I, II and, III fraction within the NESTs suggests that they may be ordered as an evolutionary temporal scheme, some of them getting infertile with time, while other still giving birth to young stars. The fact that Class III objects are still present in such a relatively tight environment suggests that they may get old and remain in the same environment for few millions years. Furthermore, the higher mass stars of the population are equally found inside and outside the NESTs, but the great majority (60–80%) of their lowest mass counterparts are found outside the NESTs. This specific point will be developed in a forthcoming paper. The total mass of stellar NESTs ranges from half a solar mass up to 10 M⊙, with a median mass of 3 M⊙. The size distribution of the NESTs is bimodal with one peak at 12.5 kAU and the second at 50 kAU.
We have identified the preferred sites of star formation in Taurus as the densest stellar groups of the region. Each NEST may be the individual stellar outcome of the gravitational collapse of a cloud that fragments to give groups of stars within few millions years. Cloud fragmentation being prior to the formation of the YSOs, it could lead to the observable clustering of dense cores and then stars. These NESTs are intermediate structures between the ultra-wide pairs and the loose groups, and provide an explanation for the elbow observed in the two-point correlation function and mark the transition between the multiplicity regime and larger scale structures. The study of the relationship between the different scales of stellar groups, NESTs, UWPs and multiple systems, reveals that they are inter-connected, with only 20% of stars truly single and isolated.
We propose that the youngest NESTs are the spatial imprints of the stellar distribution as they may have emerged from their natal cloud at a scale that have been overlooked up to now in Taurus. Using DR2 GAIA release should provide invaluable information both on distance to probe the 3D structure and on kinematics to further probe the dynamical status of these features and their origin. We also intend in future work to analyze other star forming regions (IC348, Orion, Chamaleon, ρ Ophiucus, ...) to compare the stellar substructures to highlight similarities and differences.
Acknowledgments
We gratefully thank the anonymous referee for the careful reading and helpful suggestions to improve the paper. We also thank Lee Mundy for helpful discussions and Marc Pound for reading the draft of this article. This work was funded by the European Commission through the EU H2020 Framework Program for Research and Innovation H2020-COMPET-2015-RIA-687528.
References
- Allison, R. J., Goodwin, S. P., Parker, R. J., et al. 2009, MNRAS, 395, 1449 [Google Scholar]
- Ballesteros-Paredes, J., Hartmann, L., & Vázquez-Semadeni, E. 1999, ApJ, 527, 285 [NASA ADS] [CrossRef] [Google Scholar]
- Barrow, J. D., Bhavsar, S. P., & Sonoda, D. H. 1985, MNRAS, 216, 17 [NASA ADS] [CrossRef] [MathSciNet] [Google Scholar]
- Bastian, N., Covey, K. R., & Meyer, M. R. 2010, ARA&A, 48, 339 [NASA ADS] [CrossRef] [Google Scholar]
- Bate, M. R., Bonnell, I. A., & Bromm, V. 2003, MNRAS, 339, 577 [NASA ADS] [CrossRef] [Google Scholar]
- Bressert, E., Bastian, N., Gutermuth, R., et al. 2010, MNRAS, 409, L54 [NASA ADS] [Google Scholar]
- Cartwright, A., & Whitworth, A. P. 2004, MNRAS, 348, 589 [NASA ADS] [CrossRef] [Google Scholar]
- Chapman, N. L., Goldsmith, P. F., Pineda, J. L., et al. 2011, ApJ, 741, 21 [NASA ADS] [CrossRef] [Google Scholar]
- Dijkstra, E. W. 1960, Some Theorems On Spanning Subtrees of a Graph : CWI Technical Report Stichting Mathematisch Centrum. [Google Scholar]
- Duchêne, G., & Kraus, A. 2013, ARA&A, 51, 269 [Google Scholar]
- Elmegreen, B. G., Efremov, Y., Pudritz, R. E., & Zinnecker, H. 2000, Protostars and Planets IV (Tucson: University of Arizona Press), 179 [Google Scholar]
- Ester, M., Kriegel, H., Sander, J., & Xu, X. 1996, Proc. Second International Conf. on Knowledge Discovery and Data Mining (Portland, OR: AAAI Press), 226 [Google Scholar]
- Everitt, B. S., Landau, S., Leese, M., & Stahl, D. 2011, Cluster Analysis, 5th edn. (New York, NY: John Wiley & Sons) [CrossRef] [Google Scholar]
- Farias, J. P., Tan, J. C., & Chatterjee, S. 2017, ApJ, 838, 116 [NASA ADS] [CrossRef] [Google Scholar]
- Filippone, M., Camastra, F., Masulli, F., & Rovetta, S. 2008, Pattern Recogn., 41, 176 [CrossRef] [Google Scholar]
- Fraley, C., & Raftery, A. E. 1998, Comput. J., 41, 578 [CrossRef] [Google Scholar]
- Fraley, C., & Raftery, A. E. 2002, J. Am. Stat. Assoc., 97, 611 [Google Scholar]
- Fraley, C., & Raftery, A. E. 2007, J. Classif., 24, 155 [CrossRef] [Google Scholar]
- Gladwin, P. P., Kitsionas, S., Boffin, H. M. J., & Whitworth, A. P. 1999, MNRAS, 302, 305 [NASA ADS] [CrossRef] [Google Scholar]
- Gomez, M., Hartmann, L., Kenyon, S. J., & Hewett, R. 1993, AJ, 105, 1927 [NASA ADS] [CrossRef] [Google Scholar]
- Gower, J. C., & Ross, G. J. S. 1969, Appl. Stat., 18, 54 [CrossRef] [Google Scholar]
- Gupta, M. R., & Chen, Y. 2011, Signal Process., 4, 223 [Google Scholar]
- Gutermuth, R. A., Megeath, S. T., Myers, P. C., et al. 2009, ApJS, 184, 18 [NASA ADS] [CrossRef] [Google Scholar]
- Hacar, A., Tafalla, M., Kauffmann, J., & Kovács, A. 2013, A&A, 554, A55 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
- Hartigan, J. A. 1975, Clustering Algorithms, 99th edn. (New York, NY: John Wiley & Sons) [Google Scholar]
- Hartmann, L. 2002, ApJ, 578, 914 [NASA ADS] [CrossRef] [Google Scholar]
- Hartmann, L., & Burkert, A. 2007, ApJ, 654, 988 [NASA ADS] [CrossRef] [Google Scholar]
- Hartmann, L., Ballesteros-Paredes, J., & Bergin, E. A. 2001, ApJ, 562, 852 [NASA ADS] [CrossRef] [Google Scholar]
- Heithausen, A., Bensch, F., Stutzki, J., Falgarone, E., & Panis, J. F. 1998, A&A, 331, L65 [Google Scholar]
- Imai, H., & Inaba, M. 1995, in Proc. of the Third International Congress on Industrial and Applied Mathematics : ICIAM, 1st edn. (New York: John Wiley & Sons), 124 [Google Scholar]
- Inaba, M., Katoh, N., & Imai, H. 1994, in Proceedings of the Tenth Annual Symp. on Computational Geometry (New York: ACM), 332 [Google Scholar]
- Jain, A. K. 2010, Pattern Recogn. Lett., 31, 651 [Google Scholar]
- Joncour, I., Duchêne, G., & Moraux, E. 2017, A&A, 599, A14 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
- Jones, B. F., & Herbig, G. H. 1979, AJ, 84, 1872 [NASA ADS] [CrossRef] [Google Scholar]
- Kanungo, T., Mount, D., Netanyahu, N., et al. 2002, IEEE Trans. Pattern Anal. Mach. Intell., 24, 881 [CrossRef] [Google Scholar]
- Kaufman, L., & Rousseeuw, P. J. 2008, Finding Groups in Data: An Introduction to Cluster Analysis (New York: John Wiley & Sons) [Google Scholar]
- Kenyon, S. J., Gómez, M., & Whitney, B. A. 2008, in Low Mass Star Formation in the Taurus-Auriga Clouds, ed. B. Reipurth, 405 [Google Scholar]
- Kirk, H., & Myers, P. C. 2011, ApJ, 727, 64 [NASA ADS] [CrossRef] [Google Scholar]
- Koch, P. M., Tang, Y.-W., Ho, P. T. P., et al. 2018, ApJ, 855, 39 [NASA ADS] [CrossRef] [Google Scholar]
- Könyves, V., André, P., Men’shchikov, A., et al. 2015, A&A, 584, A91 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
- Kramer, C., Stutzki, J., & Winnewisser, G. 1996, A&A, 307, 915 [Google Scholar]
- Kraus, A. L., Herczeg, G. J., Rizzuto, A. C., et al. 2017, ApJ, 838, 150 [NASA ADS] [CrossRef] [Google Scholar]
- Kraus, A. L., & Hillenbrand, L. A. 2008, ApJ, 686, L111 [NASA ADS] [CrossRef] [Google Scholar]
- Krumholz, M. R., Matzner, C. D., & McKee, C. F. 2006, ApJ, 653, 361 [NASA ADS] [CrossRef] [Google Scholar]
- Kruskal, J. B. 1956, Proc. Amer. Math. Soc., 7, 48 [CrossRef] [MathSciNet] [Google Scholar]
- Kuhn, M. A., Feigelson, E. D., Getman, K. V., et al. 2014, ApJ, 787, 107 [Google Scholar]
- Küpper, A. H. W., Maschberger, T., Kroupa, P., & Baumgardt, H. 2011, MNRAS, 417, 2300 [NASA ADS] [CrossRef] [Google Scholar]
- Larson, R. B. 1995, MNRAS, 272, 213 [NASA ADS] [Google Scholar]
- Lee, C. W., & Myers, P. C. 1999, ApJS, 123, 233 [NASA ADS] [CrossRef] [Google Scholar]
- Lee, Y.-N., & Hennebelle, P. 2016, A&A, 591, A30 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
- Li, H.-B., Fang, M., Henning, T., & Kainulainen, J. 2013, MNRAS, 436, 3707 [NASA ADS] [CrossRef] [Google Scholar]
- Liu, T., Wu, Y., & Zhang, H. 2012, ApJS, 202, 4 [NASA ADS] [CrossRef] [Google Scholar]
- Luhman, K. L., Allen, P. R., Espaillat, C., Hartmann, L., & Calvet, N. 2010, ApJS, 186, 111 [NASA ADS] [CrossRef] [Google Scholar]
- Marks, M., & Kroupa, P. 2012, A&A, 543, A8 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
- Ménard, F., & Duchêne, G. 2004, A&A, 425, 973 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
- Mirkin, B. 2005, Clustering For Data Mining: A Data Recovery Approach (London: Chapman & Hall/CRC) [CrossRef] [Google Scholar]
- Motte, F., André, P., Ward-Thompson, D., & Bontemps, S. 2001, A&A, 372, L41 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
- Motte, F., Bontemps, S., & Louvet, F. 2018, ARA&A, in press, DOI: 10.1146/annurev-astro-091916-055235 [Google Scholar]
- Murtagh, F., & Contreras, P. 2012, Rev. Data Min. Knowl. Discovery, 2, 86 [CrossRef] [Google Scholar]
- Myers, P. C. 2009, ApJ, 700, 1609 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
- Myers, P. C., Fuller, G. A., Goodman, A. A., & Benson, P. J. 1991, ApJ, 376, 561 [NASA ADS] [CrossRef] [Google Scholar]
- Onishi, T., Mizuno, A., Kawamura, A., Tachihara, K., & Fukui, Y. 2002, ApJ, 575, 950 [NASA ADS] [CrossRef] [Google Scholar]
- Palmeirim, P., André, P., Kirk, J., et al. 2013, A&A, 550, A38 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
- Parker, R. J. 2014, MNRAS, 445, 4037 [NASA ADS] [CrossRef] [Google Scholar]
- Parker, R. J., & Goodwin, S. P. 2015, MNRAS, 449, 3381 [NASA ADS] [CrossRef] [Google Scholar]
- Pfalzner, S., Kirk, H., Sills, A., et al. 2016, A&A, 586, A68 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
- Pineda, J. E., Offner, S. S. R., Parker, R. J., et al. 2015, Nature, 518, 213 [NASA ADS] [CrossRef] [Google Scholar]
- Planck Collaboration Int. XXXII (Adam, R., et al.) 2016, A&A, 586, A135 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
- Pokhrel, R., Myers, P. C., Dunham, M. M., et al. 2018, ApJ, 853, 5 [NASA ADS] [CrossRef] [Google Scholar]
- Prim, R. C. 1957, Bell Syst. Tech. J., 36, 1389 [Google Scholar]
- Reipurth, B., & Mikkola, S. 2012, Nature, 492, 221 [Google Scholar]
- Sadavoy, S. I., Di Francesco, J., Bontemps, S., et al. 2010, ApJ, 710, 1247 [NASA ADS] [CrossRef] [Google Scholar]
- Sadavoy, S. I., & Stahler, S. W. 2017, MNRAS, 469, 3881 [NASA ADS] [CrossRef] [Google Scholar]
- Schmalzl, M., Kainulainen, J., Quanz, S. P., et al. 2010, ApJ, 725, 1327 [NASA ADS] [CrossRef] [Google Scholar]
- Schmeja, S. 2011, Astron. Nachr., 332, 172 [NASA ADS] [CrossRef] [Google Scholar]
- Schmeja, S., Kumar, M. S. N., & Ferreira, B. 2008, MNRAS, 389, 1209 [NASA ADS] [CrossRef] [Google Scholar]
- Scott, D. W. 1992, Multivariate Density Estimation: Theory, Practice, and Visualization (New York: John Wiley & Sons) [Google Scholar]
- Silverman, B. W. 1986, Density Estimation for Statistics and Data Analysis (London: Chapman & Hall/CRC) [Google Scholar]
- Simon, M. 1997, ApJ, 482, L81 [NASA ADS] [CrossRef] [Google Scholar]
- Smilgys, R., & Bonnell, I. A. 2017, MNRAS, 472, 4982 [NASA ADS] [CrossRef] [Google Scholar]
- Soler, J. D., & Hennebelle, P. 2017, A&A, 607, A2 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
- Tafalla, M., & Hacar, A. 2015, A&A, 574, A104 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
- Tan, P.-N., Steinbach, M., & Kumar, V. 2005, Introduction to Data Mining, 1st edn. (Boston: Addison-Wesley) [Google Scholar]
- Tan, J. C., Krumholz, M. R., & McKee, C. F. 2006, ApJ, 641, L121 [NASA ADS] [CrossRef] [Google Scholar]
- Tokovinin, A. 2017, MNRAS, 468, 3461 [NASA ADS] [CrossRef] [Google Scholar]
- Vazquez-Semadeni, E., Gonzalez-Samaniego, A., & Colin, P. 2017, MNRAS, 467, 1313 [Google Scholar]
- Wright, N. J., Parker, R. J., Goodwin, S. P., & Drake, J. J. 2014, MNRAS, 438, 639 [NASA ADS] [CrossRef] [Google Scholar]
- Xu, R., & Wunsch, D. 2005, IEEE Trans. Neural Networks, 16, 645 [Google Scholar]
- Zahn, C. T. 1971, IEEE Trans. Comput., 20, 68 [CrossRef] [Google Scholar]
- Zhang, Q., Qiu, K., Girart, J. M., et al. 2014, ApJ, 792, 116 [NASA ADS] [CrossRef] [Google Scholar]
Appendix A: Catalogs of stars in NESTs
Table A.1 lists all Taurus members that are located within NESTs. Details on the multiplicity of each systems (separations and references) can be found in Table C.1 of Paper I.
Members of Taurus belonging to NESTs
Appendix B: DBSCAN algorithm
In this appendix, we define more formally the concepts and vocabulary of dbscan (Density-Based Spatial Clustering Applications with Noise) clustering method, along with complementary comments relevant to our analysis. Some terms are taken from the work of Ester et al. (1996).
The n objects of the dataset are assumed to be embedded in a d-dimensional space ℛ ⊂ ℛd. In our analysis of stellar clustering, we will deal with a projected Euclidean space of dimension d = 2.
B.1. Basic concepts of DBSCAN
The ϵ-neighborhood of an object i ∈ ℛd is defined as the subset {ϵ}i of ℛd such that :
where d(i, j) is the distance between objects i and j. We now introduce two intermediate definitions. First, two objects i and j of the dataset are said to be ϵ direct-neighbors, if their distance d(i, j) is less or equal to ϵ, in other words, if j belongs to the ϵ-neighborhood of i and vice-versa:
Secondly, two objects i and j of the dataset are said to be indirect ϵ-neighbors, if we can find a chain {l1, …, ln} of ϵ-direct-neighbors in ℛd connecting i and j. So the indirect ϵ-neighbors condition reads as:
Describing our dataset in terms of vertices (nodes) associated with the objects linked together if d(i, j)≤ϵ, the direct- and indirect ϵ-neighbors conditions reflect the definition of a path within the graph framework.
The ϵ-neighborhood population of a point i ∈ ℛd is then defined as the number of objects in ℛd found within the ϵ-neighborhood of i :
where Card stands for the cardinality.
To define clusters, a minimal local neighborhood population threshold is introduced to identify objects of the dataset having at least the same ϵ-neighborhood population or higher. An object i is said to be a core cluster object if its ϵ-neighborhood is as populated or even more, than the given local neighborhood threshold:
An object j of the dataset is said to be (ϵ, Nmin) directly density-reachable from i if they are ϵ-direct-neighbors and if i is a core object:
Similarly, a point j is said to be (ϵ, Nmin) density-reachable from i, if i and j are ϵ-indirect-neighbors through a chain l1, …, ln − 1 of core objects. The density-reachability property reads as:
Once an arbitrary core cluster object - called the seed - is identified, the density-reachability property allows to retrieve all the objects of the dataset belonging to the same cluster. A cluster is then defined as the set of points that are density-reachable from a seed. If an object i is density-reachable from a seed j but does not fulfill the core object condition (Eq. (B.5)), then it is said to be a border cluster object :
Given ϵ and , the dbscan algorithm proceeds in two steps: (1) pick one object from the dataset, check if it is a seed (i.e., satisfying Eq. (B.5)) and if it is not, pick another one until it satisfies the core object condition; and (2) retrieve all the objects in the dataset which are density-reachable from the seed. We note that the core points set in the final clusters obtained at the end of the algorithmic scan does not depend on the chosen seed, since the core objects of a given cluster are density reachable from each other. A cluster C is defined by the following conditions:
If an object of the dataset is not ()-density-reachable from any core objects then it is marked as an outlier, or noise, with respect to that property. Noise and outliers are then the set of objects that do not belong to any cluster in the dataset. Based on the above definition and within a graph framework, dbscan clusters are in fact the maximal connected component sets {C} such that ∀(i, j)∈{C}, Ni ≥ Nmin and there is a path between i and j. This definition of clusters is therefore connected to the more intuitive idea of density-contour clusters outlined by Hartigan (1975).
B.2. Discussion on dbscan
B.2.1. Complexity
In the general case, the complexity of hierarchical clustering techniques is 𝒪(n3), which makes them too slow for large data sets. However, but for the single and complete linkage clustering, this is optimally reduced to 𝒪(n2). The dbscan clustering algorithm visits each point of the database possibly multiple times (e.g., as candidates to different clusters). Without the use of an accelerating index structure, the run time complexity is 𝒪(n2). This can be reduced to 𝒪(nlogn) with a spatial indexing using a R-tree.
B.2.2. Minimal size of a cluster
There are two free parameters in the dbscan algorithm: ϵ, which defines the size of the neighborhood, or “window”, that is investigated around each object, and , the minimal local neighborhood population threshold, which eventually corresponds to a local density threshold since the population number is defined within a ϵ-neighborhood.
Since, is the minimum number of objects that must exist in the ϵ-neighborhood to obtain a core point, it is thus also the minimum size of a cluster (Ester et al. 1996). At least, this minimum cluster is composed by one core point and
border points. According to the definition of a border point (Eq. (B.8)), it may happen that a point is a border point for several different clusters. Then the labeling of this ambiguous point to a specific cluster depends arbitrarily on the order to which the points are processed during the algorithmic scan. By convention, it is automatically labeled to the first cluster it meets as border point. The algorithm is one of “exclusive” type, as opposed to non-exclusive clustering algorithm for which one point may belong to multiple clusters. If this point appears to be at a later stage also a border point for another cluster and if it is one of the possible smallest cluster, composed then a priori by
border points, this cluster would have consequently in fact less members than
local population threshold. We note, however, that this fact does not introduce inconsistency with the definition of the density-reachable based cluster (B.9) although it highlights the order-dependence of the algorithm. It also outlines two different roles of border points that may vary from being a real frontier that distinguishes a cluster from noise or other clusters from being a bridge between clusters.
B.2.3. Minimum distance between two clusters
The two parameters and the ϵ-neighborhood length play a key role in identifying clusters. If no seed point is found, the whole dataset is considered as noise. Conversely, if a first seed core point is found within the dataset and if all the the points are density-reachable from the seed, we obtain a single cluster containing all objects. The algorithmic process will give rise to a second cluster when only a subset of all points are associated with a core point and a second core point that is not density reachable from the first seed is found in the set. It is then worthwhile to investigate the two conditions at which the points are not density reachable. The first one states that a point is never density reachable from a seed point if it is not its ϵ-indirect neighbor. The second point outlines the fact that a point is never density reachable from a border point, even if this point is in the ϵ-neighborhood of the border point.
So there are two basic situations that stop the ongoing growth of a cluster. One is based on the ϵ-neighborhood length threshold since the algorithm excludes as candidate members all the points that are not ϵ-indirect neighbors of the seed (first condition above). The second is based on parameter which is the parameter that is indirectly used in order to decide whether a cluster should be increased at a certain stage of the algorithm. This parameter describes in fact the minimum number of reachable points that a given point must have in its ϵ-neighborhood not to be a border point. A significant local drop of ϵ-neighborhood density within the dataset therefore stops the growth process (second condition quoted above). In other words, the cluster growth stops when no more path is found from a core point, when the edges of a cluster are surrounded by less dense regions. Either way, the first conditions implies that two clusters are separated by a distance of at least ϵ to discriminate them from being one single cluster. However, this threshold value is only true when the distance between clusters is estimated from core points. This separability distance may indeed be shorter when border points are implied in this inter-distance evaluation (second condition). When this occurs, the two clusters are marginally separated due to a localized drop in density. Choosing a shorter ϵ-neighborhood length to scan the dataset would lead to identifying these border points as noise and would separate more clearly the two clusters.
B.2.4. Range of the number of star components
Given the ϵ-neighborhood length parameter, the N objects of the dataset are partitioned into a set {O} grouping data that are noise or outliers with 0 ≤ NO = Card({O}) ≤ N and NC number of clusters. When no seed cluster is found, the condition Eq. (B.5) is never fulfilled, no cluster is detected and NO = N and NC = 0. Excluding the pathological cases implying “ambiguous points”, the theoretical maximum value of the number of the clusters NC that can be found to partition the dataset is obtained when the neighborhood of each core point is composed by () border points. The total population of neighborhood is then exactly equals to the minimal value
. Then, the maximal range of the number of clusters is given by
where E is the floor function.
Appendix C: Discussion on clustering algorithms
This appendix is intended to review the different types of clustering algorithms in order to provide the context for our choice of the one level dbscan algorithm as the optimal procedure to detect local overdense spatial structures.
Clustering is the process of examining a collection of objects embedded in a d-dimensional space to group closest objects together automatically into natural clusters, while distinguishing groups from each other. Although this task may sometimes be easily done by visual inspection in a 2-dimensional space, it is not straightforward to formalize the process on general grounds. The fundamental challenge is to conceptually and quantitatively define the meaning of close, hence to define from heterogeneous variables a proximity (or distance) function between objects and to define a meaningful cluster membership criteria to assign (or not) objects in clusters. This is why many clustering algorithms have been developed over the last five decades (for general reviews, see Kaufman & Rousseeuw 2008; Tan et al. 2005; Xu & Wunsch 2005; Jain 2010; Everitt et al. 2011; Murtagh & Contreras 2012) and the choice for a given study must be guided by the nature of the available datas and scientific objectives.
C.1. Hierarchical clustering
Standard hierarchical clustering methods group data over a variety of scales by creating a cluster binary tree, or dendrogram. The tree is not a single set of clusters, but rather a multilevel hierarchy, where clusters at one level are joined as clusters at the next level. It is left to the user to decide and quantitatively define the scale of clustering (i.e., the cut-off level) that is “most appropriate.” Of course, this definition is a matter of debate since there is no general law, so at best the criteria used are heuristic, and ultimately will depend on the clustering purpose of the user.
A traditional class of hierarchical clustering algorithms are the so-called agglomerative, or bottom-up, algorithms. These non-parametric algorithms start from initial set of clusters (each composed of a single object) and iteratively build nested families of larger clusters through successive mergers of cluster pairs, using a heuristic proximity function criteria, until one single cluster containing all objects in the distribution is obtained.
The other traditional class of clustering algorithms are the non-parametric divisive hierarchical clustering algorithms, which essentially are the reverse of the agglomerative algorithms. That is to say that such algorithms start from the entire set and, step by step, divide the set in subsets until single objects are obtained. This approach is fundamentally related to the previous one, only being divisive instead of agglomerative. However, since the divisive approach is somewhat more complicated to implement (for a review on spectral partitioning, see Filippone et al. 2008), here we focus on the agglomerative algorithms.
In agglomerative algorithms a similarity (i.e., proximity) or dissimilarity (i.e., distance) function is defined heuristically at each merging step as described by the linkage method being employed, which is selected by the user prior to implementing the algorithm. In this approach, six most widely employed linkage methods are generally used, depending on the distance criteria chosen to proceed to the merging of clusters. Firstly the “single Link” method has a merging criteria based on the distance between the closest single element pair of two clusters in the set. Secondly, the “complete link” method is based on the distance between the furthest element pair of two clusters in the set. Thirdly, the “average link” method is based on the distance between the each element pair of two clusters in the set. The other two methods, “centroid” and “median” links are respectively based on the distance between the centroids of two clusters and the median distance between the each element pair of two clusters in the set. Although similar to the average link method, the median link method is less sensitive to outliers. Finally, the sixth method is the “Wards” link based on the value of the sum of squared deviations between the projected centroid of two clusters in the set.
In all these bottom-upx < methods, the cluster pair in the set with the smallest distance are then merged. The strength of these non-parametric methods is that no assumptions are made about the structure of data. However, they suffer from a significant drawback: since each step of the agglomerative process is built upon the previous one, and no backtracking is permitted, there is a loss of data such that the nested hierarchical structure is built-in and not the result of an open process free of reconfiguring the structure of data at each step. In other words, the hierarchical clustering algorithm is sensitive to possibly erroneous previous cluster merging, since, once assigned, objects’ cluster memberships are not permitted to change (i.e., the assignments at each step are permanent).
Another significant drawback of these agglomerative methods is their lack of robustness, especially when the distribution contains noise and outliers. Generally speaking, this leads spurious additional clusters being identified or the blurring of distinct clusters. It also tends to produce clusters of a particular shape. For example, in the single linkage method, the merging of two clusters relies only on a local merging criteria based on their most proximate members. Thus, only the closest parts of the two clusters are considered whereas the overall structure of the clusters and their more distant parts are not taken into account. Since clusters are merged on the criteria of the closest element pair of two clusters in a set, it may happen that in a noisy distribution elements are successively merged in such a way that two obvious distinct clusters are merged together (due to few sparse singletons being close to each other), even though many of the elements in each cluster are very distant from each other. This tendency to combine elements linked by a series of close intermediate elements gives rise to long chains (this is known as the chaining effect). On the other hand, in the complete linkage methods, the merging of two clusters at one step is based on the proximity of their furthest members. Thus, it tends to choosing at each step, the merger of the pair of clusters that gives rise to a final cluster that has the smallest diameter. And since the diameter of the merged clusters is minimized at each iteration, the method has a tendency to give rise to compact “globular” (dense and circular or spherical) clusters. This criterion for merging in complete link methods is non-local in the sense that the entire distribution of members in a cluster can influence the choice of the merging. A single element located far away from the majority of the members of one cluster increases significantly the final diameter of two merged clusters which, in turn, may lead to a major change in the final clustering. Thus, the complete-link methods are rather sensitive to outliers, as the single-link methods are sensitive to the noise.
Closely related to the agglomerative linkage methods, the Minimum Spanning Tree (MST) method is an alternate algorithm to build clusters from a given length threshold. When considering a set of spatial points as a complete graph, in which the points are vertices and edges are the euclidean distance between vertices, the MST is the subgraph in which any two vertices are connected by exactly one simple path (i.e., a connected graph without cycles, by definition a tree) such that the total distance length of its edges is minimal. In this framework, clusters are defined as the remaining connected graphs when deleting the largest lengths of MST above a heuristically chosen threshold (inconsistent edges, Zahn 1971), or the critical length (Gutermuth et al. 2009), such that clusters are then the remaining connected subgraphs (i.e., there is a path between any two points of the subgraph). The same clusters are obtained from the linkage methods agglomerative algorithms employ by cutting the dendrogram at the same critical length, so that each connected component forms a cluster. Indeed, it is well established that the same information required to build the MST of a set of points is contained within the dendrogram generated by the linkage methods (see e.g., Gower & Ross 1969). Therefore, the same drawbacks (high sensitivity to noise and outliers that blur cluster structure) affect both the MST and linkage methods.
C.2. Partitioning clustering
Iterative relocation algorithms use an iterative control strategy to optimize the partition of distribution into clusters. The number of clusters is usually an input parameter for these algorithms, in other words some a priori domain assumption knowledge is required (for a review, see Mirkin 2005). When the number of clusters is given, there are two statistical approaches in order to partition the data in these clusters:
Non-parametric iterative relation algorithms (K-means, K-median) are based on the assumption that clusters correspond to different modes of the probability density, in other words, they are associated with the values that appear more often in a data set. Once an initial partition is given, each cluster is associated with its representative, e.g., its barycenter, centroid, mean point, or by one of the objects of the cluster located near its center. The goal of this class of algorithm is to assign each object to its closest representative and iteratively process to minimize the objective function, as for example the distance of elements to the their closest representatives summed over the over whole set; the iterative process stops when convergence on representatives is reached. For example, the K-means algorithm (Kanungo et al. 2002), a variance-based function, can be shown to minimize within-cluster distance while maximizing between-cluster distance. It should be noted that the assignment of each object to a cluster implies that the induced partition is equivalent to a Voronoi diagram (Inaba et al. 1994; Imai et al. 1995). Thus, the shape of clusters found by a partitioning algorithm is always convex which is a restrictive bias of the method.
On the other hand, parametric iterative relation algorithms (mixture models) are based on the assumption that each cluster is represented by a set of parametric probability density coming from a same family. For example, either a traditional Gaussian distribution, or an “isothermal ellipsoid” (Kuhn et al. 2014), would be employed as a component mixture model method. The “parametric” term refers to the choice of function representing a cluster. Unlike the previous methods, the number of the clusters is required as a given input prior the clustering.
For this class of iterative relocation algorithms, the goal is to estimate (1) the number (and type of the geometrical distribution in the case of parametric methods) of clusters (structure identification), and (2) the parameters of the distributions (weight, variance and mean for each probability distribution associated with a cluster). Broadly speaking, partitioning cluster analysis algorithms are then based on two distinct phases: first is a model fitting phase, whereby the number and the geometrical type of component are a priori chosen, and second is a model validation phase, whereby the set of data are assessed to each cluster according to some cluster validity criterion and to one “optimal” partition hypothesis selected through a maximization technique of an objective function. In most cases, the expectation-Maximization (EM) technique of the (log-)likelihood function is used. The EM technique is able to propose the best local solution in order to smoothly distribute data in each cluster and to constrain the parameters of each probability distribution associated with a cluster. For example, the K-means approach is a special case of EM clustering applied to a mixture of Gaussians when all variances are equal (Gupta & Chen 2011). This is why the K-means method leads to globular shaped clusters.
Iterative relocation methods have several drawbacks: the number of clusters or components have to be set prior to implementation. They are sensitive to first initialization of the partition. A guess of the number of clusters can possibly be inferred based on the Akaike Information Criterion (AIC) such as used in the work of Kuhn et al. (2014). The AIC is a probabilistic indicator that somehow evaluates the gain of introducing some supplementary cluster, and then all the associated free parameters, to improve the model clustering fit evaluated through the global maximum (log-)likelihood function at the expense of model parsimony. Although it is a quantitative criteria, the derived clusters are biased toward convex (i.e., globular) shaped clusters and most importantly, they do not identify arbitrarily sized clusters since they are biased to equipartition. The notion of noisy data is also not considered, in other words, it is a termination requirement that every single object has to be assigned to a cluster.
More sophisticated parametric approaches have been developed to deal with the structure identification based on the optimization of the Bayesian Information Criteria that allows the choice of the “best” model based on direct models comparison, models that are computed simultaneously, each combining different weight of clustering methods (hierarchical and EM partitioning) and various number of components (Fraley & Raftery 1998, 2002, 2007). But even these models do not address the issue of the possible skewness of data distribution in which a single, skewed (or elongated) distribution is described by multiple normal distributions. Moreover, even combining hierarchical and partitional clustering, since there is no explicit notion of noise, the “best” model will be chosen based on complete partition of data even if data are sub-structured, which may be in fact a serious obstacle in determining locally small but significant over-densities.
For these reasons, the iterative relocation partitioning clustering algorithms do not appear adapted to the objective of our study, especially in view of the fact that these algorithms result in a single complete partition of stars into a unique set of disjoint clusters, which may not suit our perception of young star clusters topology that is most of the time complex, highly inhomogeneous and sub-structured. In summary, given a point-like distribution, the iterative relocation algorithms imposed on by the parametric cluster model, and by the implicit complete partitioning of, prevent the identification of non-globular, irregularly shaped and local small-scale substructures. Hence, partitioning algorithms are ruled out here, and the goal of identifying substructures as local over-densities drives us to the third class of partitioning clustering methods, the density based techniques.
C.3. Density based clustering
Density clustering algorithms, such as dbscan (Ester et al. 1996), partition a distribution into clusters based on a density criterion. Essentially, they are the formalization of the assumption that clusters are high-density regions surrounded by low-density regions (Hartigan 1975). These algorithms borrow (1) some aspects of the non-parametric and parametric iterative partition cluster methods, and (2) some aspects of MST (single linkage) hierarchical clustering methods. For the former, density clustering algorithms are a non-parametric method, as there is no a priori given function associated with cluster postulating a specific structure for discrete distributions. However, the notion of density probability is not completely ignored since there is an estimation of the local density from a kernel density estimator associated with a given smoothing bandwidth that reduces the complexity. The non-parametric estimation of probability density functions is based on the concept that the value of a density function at a continuity point can be estimated using the sample observations that fall within a small region around that point, known as the Parzen kernel class of density estimates. For a comprehensive review of kernel non-parametric density estimation and the crucial issue of the smoothing parameter (bandwidth) choice see for example Silverman (1986) and Scott (1992). In the case of the dbscan algorithm, a given circular and uniform kernel acts as a given smoothing bandwidth (ϵ-neighborhood for dbscan) kernel.
For dbscan, clusters are also defined as the maximal connected subgraphs remaining at a cutting scale length threshold, but these subgraphs are composed by selected instances (thus allowing an incomplete partitioning). A significant advantage of using such density based clustering algorithms is that the cluster membership criterion is based on a simple density requirement that allows noisy data to be filtered out, whereas this cannot be done using hierarchical and iterative relocation based partition based algorithms. The dbscan algorithmic implementation appears particularly well suited to our natural perception of star clusters and the study of their structure.
All Tables
Mean 1-NNS of class I, II and III for YSOs located within NESTs, outside the NESTs and for all of them.
All Figures
![]() |
Fig. 1. One-point correlation function Ψ (blue symbols and blue dot-dashed line fit) and estimated pair correlation function g (gray symbols). This figure is reproduced from Paper I in order to highlight the choice of the ϵ parameter as rc, the intersection point between the horizontal line (random expectation) and the Ψ function. This value indicates the scale at which we want to investigate local overdense structures. |
In the text |
![]() |
Fig. 2. Cumulative distribution of the first nearest neighbor (W1(r), dashed blue), second nearest neighbor (W2(r), dot-dashed green) and the third nearest neighbor (W3(r), solid black) in a random spatial distribution. The solid red vertical line is the rc value. |
In the text |
![]() |
Fig. 3. Spatial distribution of the 20 NESTs identified in this study. The barycenter of each NEST is indicated by a colored plus mark and all the members are colored the same way. Star not pertaining to any NEST are shown as black dots. The MST defined by the NEST is shown with solid gray lines. NEST are numbered based on their increasing right ascension order (see Table 1). The Planck dust emission (217 GHz) is mapped as the background. |
In the text |
![]() |
Fig. 4. Dilution factor βN as a function of the inter-NEST separation ratio αN. The size of the marks is indicative of the mean first nearest neighbor rN. These ratios are markers of the reliability of the NESTs. Values higher than 2 (dashed black lines) indicate that a NEST is well separated both from the other NESTs and from the stellar population outside the NESTs. |
In the text |
![]() |
Fig. 5. Robustness tests of the detection of NESTs in the three main filaments of Taurus. The fiducial NESTs are indicated by black plus symbols and labeled in each panel. Each panel displays alternate sets of NESTs resulting from varying the selecting criteria. Top panel: the NESTs identified if Nmin is reduced to three (large red filled squares) or increased to five (small white filled squares). Middle and bottom panels: ϵ has been increased and decreased by 10 % and 50 % percent, respectively. In both panels, small orange squares and large dark blue squares represents the results of increasing and decreasing ϵ, respectively. |
In the text |
![]() |
Fig. 6. Convex hull (dashed line) and spanning ellipsoid hull fitted on the NEST. The colored horizontal line at the bottom of each NEST scales as 0.1 deg. |
In the text |
![]() |
Fig. 7. Cumulative distribution function of the difference in position angle between the local magnetic field and the NESTs semi-major axis (dashed-dotted blue line). The long dashed light blue histogram corresponds to the subset of NESTs whose semi-major axis is more than twice as large as their semi-minor axis and, thus, whose position angle is best defined. The green solid and red hatched histograms are the corresponding distribution for the symmetry axis of YSO disks and dense cores, respectively (Ménard & Duchêne 2004). The black dotted line is the function expected for a randomly oriented sample. |
In the text |
![]() |
Fig. 8. Cumulative distribution of the projected axial ratio qp (Eq. (3)) for a population of randomly oriented oblate (black) and prolate (red) ellipsoids for a fixed intrinsic (3D) axial ratio q. The blue solid circles mark the cumulative distribution of the 2D ratio qp of the NESTs. |
In the text |
![]() |
Fig. 9. Mass Function of NESTs (dashed histogram). The graphic shows the mass probability density function (ΦN = dN/dlogm) as a function of logm, where N is the total number of the NESTs and m the mass of the NESTs. For comparison, we report also the mass functions of the H13CO+ cores (Onishi et al. 2002, light purple) and the near-infrared extinction dust cores (Schmalzl et al. 2010, light green). The red curve and black dashed curve indicate respectively the power law dN/dlogm ∝ m−Γ fit for the high mass NESTs distribution (red points) and normal fits to the whole observed mass distribution expressed in logarithm of the mass. |
In the text |
![]() |
Fig. 10. Size distribution of NESTs (which exhibits two peaks, at 12.5 and 50 kAU, respectively; left panel), H13CO+ cores (Onishi et al. 2002; central panel) and near-infrared extinction dust cores (Schmalzl et al. 2010; right panel). |
In the text |
![]() |
Fig. 11. Number of YSOs in NESTs as a function of their size. |
In the text |
![]() |
Fig. 12. Mass-radius relation for NESTs (black symbols labeled by NEST number). The solid black line is the best-fitting power law fit to all data points while dashed lines represent lines of constant column density (3 × 1022 cm−2: red line; 3 × 1020 cm−2: blue line). |
In the text |
![]() |
Fig. 13. Fraction of Class I (Black), II (Red) and, III (Green) objects within each NEST. From right to left, the NESTs are ordered by decreasing fraction of Class I (or Class II if there is no Class I object), which could correspond to an evolutionary sequence from the younger to the older. |
In the text |
![]() |
Fig. 14. Breakdown of the Taurus population in terms of belonging to multiple systems (separation ≤1 kAU), UWPs and NESTs. The pie charts are designed to account for the fact that objects can belong to more than one category. Only ∼20% of Taurus members are isolated single stars. |
In the text |
![]() |
Fig. 15. Surface stellar density on the scale of UWPs (red dots), NESTs (light blue dots), loose groups (dark blue dots) as defined in Kirk & Myers (2011), the 3 main filaments as a whole and the whole Taurus cloud (black dots). This plot outlines two main free scale regimes, respectively associated to UWPs (red line) and the various structures on larger scales (blue line). |
In the text |
![]() |
Fig. 16. Stellar NESTs 1–6 and dense molecular cores in L1495 Taurus complex. Orange and green triangles indicate C18O and N2H+ cores (Hacar et al. 2013); purple triangles mark H13CO+ cores (Onishi et al. 2002); finally, red triangles represent NIR dust extinction cores (Schmalzl et al. 2010; we note that in their work, observations are limited to the east portion of the region, no NIR data are reported within B211, B10 and B7 regions). Based on the data published in (Kenyon et al. 2008; Table 1), the center of obscured Barnard regions are plotted as black plus marks. The YSOs are plotted as solid (resp. empty) blue circle when they are inside (resp. outside) the NESTs. The NESTs’ limits are shown as blue ellipsoids. We note that the L1495 complex shelter the NESTS 2–5 along the filament. The NESTs 1 and 6 are located outside the region of molecular gas studies. |
In the text |
Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.
Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.
Initial download of the metrics may take a while.