| Issue | A&A Volume 699, July 2025 |
|---|---|
| Article Number | A103 |
| Number of page(s) | 21 |
| Section | Interstellar and circumstellar matter |
| DOI | https://doi.org/10.1051/0004-6361/202553843 |
| Published online | 04 July 2025 |
Chemical segregation analysed with unsupervised clustering
1 Max-Planck-Institute for Extraterrestrial Physics, Giessenbachstrasse 1, 85748 Garching, Germany
2 European Southern Observatory, Karl-Schwarzschild-Strasse 2, 85748 Garching, Germany
★ Corresponding author: kgiers@mpe.mpg.de
Received: 21 January 2025 / Accepted: 23 May 2025
Context. Molecular emission is a powerful tool for studying the physical and chemical structures in cold and dense cores. The distribution and abundance of different molecular species provide information on the chemical composition and physical properties in these cores.
Aims. We study the chemical segregation of three molecules – c-C3H2, CH3OH, and CH3CCH – in the two starless cores B68 and L1521E, and the prestellar core L1544.
Methods. We applied the density-based clustering algorithms DBSCAN and HDBSCAN to identify chemical and physical structures within these cores. To enable cross-core comparisons, the clustering input samples were characterised based on their physical environment, discarding the two-dimensional spatial information.
Results. Clustering analysis showed significant chemical differentiation across the cores. The clustering successfully reproduces the known molecular segregation of c-C3H2 and CH3OH in all three cores. Furthermore, it identifies a segregation between c-C3H2 and CH3CCH, which is not apparent from the emission maps. Key features driving the clustering are integrated intensity, velocity offset, H2 column density, and H2 column density gradient. Different environmental conditions are reflected in the variations in the feature relevance across the cores.
Conclusions. This study shows that density-based clustering provides valuable insights into the chemical and physical structures of starless cores. It demonstrates that even small datasets covering only two or three molecules can yield meaningful results. In particular, this new approach revealed similarities in the clustering patterns of CH3OH and CH3CCH relative to c-C3H2, suggesting that c-C3H2 traces layers further out or lower-density regions than the other two molecules. This provided insight into the CH3CCH peak in L1544, which appears to trace a landing point of chemically fresh gas accreted onto the core, highlighting the impact of accretion processes on molecular distributions.
Key words: astrochemistry / stars: formation / ISM: abundances / ISM: clouds / ISM: molecules
© The Authors 2025
Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
This article is published in open access under the Subscribe to Open model.
Open Access funding provided by Max Planck Society.
1 Introduction
To understand the formation of stars and planetary systems, it is crucial to understand the physics and chemistry of star-forming regions. Starless dense cores are particularly important, as they represent the earliest stages of star formation and set its initial conditions. In starless cores, the chemistry and physics can be studied without the complications caused by protostellar feedback. Molecular line emission is a powerful tool for studying the structure of these dense cores. The distribution and abundance of different molecular species provide information on their chemical composition and physical properties (e.g. Crapsi et al. 2007; Redaelli et al. 2021; Lin et al. 2022). In addition, line emission can help trace the chemical evolution of molecules. This has been shown, for example, for water (Cleeves et al. 2014) and methanol (Drozdovskaya et al. 2021, 2022) in the Solar System, which were inherited from the prestellar phase. Prestellar cores are a subset of starless cores that are gravitationally bound and on the verge of star formation (e.g. André et al. 2000; Keto & Caselli 2008). As a result, prestellar cores are dynamically evolved, with higher central densities and more pronounced temperature and velocity gradients than unbound starless cores (see e.g. Crapsi et al. 2007).
A well-studied example is the prestellar core L1544 in the Taurus Molecular Cloud. Its molecular emission maps have revealed both its evolutionary status, close to gravitational collapse (Williams et al. 1999; Ohashi et al. 1999), and its physical structure (volume density, velocity, and the dust and gas temperature profiles; e.g. Crapsi et al. 2007; Keto & Caselli 2008; Keto et al. 2015; Chacón-Tanarro et al. 2019). Spezzano et al. (2016) observed a striking chemical differentiation between the carbon-bearing molecules c-C3H2 and CH3OH in L1544, driven by differences in the external illumination of the core. This chemical segregation has also been observed in other starless cores and linked to their environments (Spezzano et al. 2020). Spezzano et al. (2017) identified four molecular families in L1544, classified by the location of their emission peaks (carbon-chain peak, CH3OH peak, dust emission peak, and HNCO peak). Using principal component analysis, they found correlations between the different families and with the physical properties of the core. A similar analysis of the starless core L1521E by Nagy et al. (2019) also reported chemical differentiation between the c-C3H2 peak, the CH3OH peak, and the dust emission peak.
In the big data era of astronomy, statistical methods are essential for analysing and interpreting the vast amounts of observational and simulated data. The rapid advancements in machine learning provide novel approaches to studying the molecular complexity during the early stages of star formation. Unsupervised learning algorithms (clustering techniques in particular) are increasingly applied to identify hidden patterns, structures, and relationships in multidimensional datasets (e.g. see review by Fotopoulou 2024). By grouping data points based on similarities in various physical and chemical parameters, these methods help reveal subtle trends that might otherwise go undetected.
In astrochemistry, clustering methods have primarily been applied to large-scale surveys, for instance to identify molecular clouds in an unbiased and systematic way (e.g. Colombo et al. 2015; Bron et al. 2018; Yan et al. 2022). By isolating distinct structures in position-position-velocity space, these methods segment clouds into regions with similar physical or chemical properties. Furthermore, Valdivia-Mena et al. (2023) connected filament scales (<0.1 pc) with envelope scales (>100 au) by identifying (velocity-)coherent structures of inflowing material, known as streamers, through the clustering of molecular emission. Meanwhile, Okoda et al. (2020, 2021) applied principal component analysis to molecular line emission, characterising velocity structures and distinct features surrounding a protostar. This method has also been used to disentangle overlapping kinematic components and to understand spectral variations (e.g. Yun & Lee 2023), providing deeper insight into the dynamics of star-forming regions.
In this work, we apply the unsupervised clustering algorithms Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Hierarchical DBSCAN (HDBSCAN) to investigate the chemical segregation and differences between the molecular emission of c-C3H2, CH3OH, and CH3CCH towards the starless cores B68 and L1521E and the prestellar core L1544. The molecules were chosen as representatives of three of the molecular families found in L1544 (see Spezzano et al. 2017). Our goal is to study the chemical segregation previously observed for c-C3H2 and CH3OH from a different perspective using clustering techniques. Additionally, we aim to investigate the less understood differentiation between the two carbon chains, c-C3H2 and CH3CCH. In our analysis, we take a novel approach by dropping the two-dimensional spatial information and instead characterising each pixel of the emission maps based on its physical parameters. By concentrating on the core scale and incorporating both starless and prestellar cores, we study the influence of different evolutionary stages on the molecular emission and distribution as well as the effect of different environments.
In Sect. 2, we describe the data and the sources analysed in this work. In Sect. 3, we explain the preprocessing we applied to our dataset and what methods we used for the unsupervised clustering with DBSCAN and HDBSCAN. The results of the dataset analysis and the clustering are presented in Sect. 4. We discuss the results and their implications in Sect. 5 and present our conclusions in Sect. 6.
Table 1. Source sample.
Table 2. Spectroscopic parameters of the observed lines.
2 Observations and data reduction
2.1 Data
The data presented in this work were taken with the IRAM 30m single-dish radio telescope on Pico Veleta in the Sierra Nevada, Spain. The observations were carried out between October 2013 and April 2018 (PIs: Silvia Spezzano, Zofia Nagy). The data have also been used in Spezzano et al. (2017), Nagy et al. (2019), and Spezzano et al. (2020). The on-the-fly (OTF) maps were observed in position switching mode, using the EMIR E090 receiver and the Fourier transform spectrometer (FTS) backend with a spectral resolution of 50 kHz. The map sizes, sources and coordinates are listed in Table 1. The observed transitions are summarised in Table 2.
The data processing was done using the GILDAS software (Pety 2005) and the Python packages pandas (Wes McKinney 2010) and spectral-cube (Ginsburg et al. 2019). All emission maps were gridded to a pixel size of 8″ with the CLASS software in the GILDAS package; this corresponds to one-third to one-quarter of the actual beam size, depending on the frequency. To create a uniform dataset, we additionally resampled the data to a spectral resolution of 0.18 km s−1, corresponding to the resolution of the lowest-frequency observation (82 GHz). The antenna temperature $T_{\rm A}^*$ was converted to the main beam temperature $T_{\rm mb}$ using the relation

$$T_{\rm mb} = \frac{F_{\rm eff}}{B_{\rm eff}}\,T_{\rm A}^*.$$

The corresponding values for the 30m forward (Feff) and main-beam (Beff) efficiencies are given in Table 2.
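As a quick check of the numbers above, the following Python sketch applies the main-beam conversion and computes the velocity width of a 50 kHz channel at 82 GHz. The function names are hypothetical, and the efficiencies in the example are placeholder values, not those listed in Table 2.

```python
import numpy as np

C_KMS = 299792.458  # speed of light in km/s


def ta_to_tmb(t_a_star, f_eff, b_eff):
    """Main-beam temperature T_mb = (F_eff / B_eff) * T_A*."""
    return np.asarray(t_a_star, dtype=float) * (f_eff / b_eff)


def channel_width_kms(delta_nu_hz, nu_hz):
    """Velocity width of one spectral channel, dv = c * dnu / nu."""
    return C_KMS * delta_nu_hz / nu_hz


# Placeholder efficiencies for illustration only (the real values are in Table 2).
t_mb = ta_to_tmb([0.10, 0.50, 0.20], f_eff=0.95, b_eff=0.81)

# 50 kHz at 82 GHz: the coarsest velocity channel in the dataset.
print(round(channel_width_kms(50e3, 82e9), 3))  # 0.183
```

At higher frequencies the same 50 kHz channel is narrower in velocity, so resampling all cubes to the 82 GHz value gives a common velocity grid without oversampling the lowest-frequency data.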
2.2 Sources
The source sample is listed in Table 1. The prestellar core L1544 (following the definition of Crapsi et al. 2005) and the two starless cores L1521E and B68 are located in different star-forming regions (Taurus and Ophiuchus) and together cover different evolutionary stages and environmental conditions.
L1521E and L1544 are located in the Taurus molecular cloud. Both cores lie at the edge of their respective filaments, which exposes their southern sides to the local interstellar radiation field (ISRF). The stronger illumination increases the abundance of atomic carbon in the gas phase and subsequently enhances the abundances of carbon chains such as c-C3H2, as discussed in Spezzano et al. (2020). L1521E was classified as a very young core because of its high abundances of carbon-chain molecules and low level of CO depletion (Hirota et al. 2002; Tafalla & Santiago 2004; Nagy et al. 2019). The well-studied prestellar core L1544, on the other hand, is more evolved and shows signs of contraction, suggesting that it is on the verge of star formation (Williams et al. 1999; Ohashi et al. 1999; Lee et al. 2001; Caselli et al. 2002, 2012). Within the central 1000 au, L1544 exhibits an almost total (99.99%) freeze-out of all species heavier than helium (Caselli et al. 2022).
The isolated starless core B68 is located in the south of the Ophiuchus molecular cloud. It shows kinematic signatures of oscillation, indicating a stage prior to contraction (Lada et al. 2003; Keto & Caselli 2008). As an isolated Bok globule, it can be assumed to be exposed to uniform external illumination.
3 Methodology
3.1 Preprocessing
The integrated intensity maps were generated by calculating the zeroth moment of the emission data cubes. As our focus in this work is primarily on the spatial distribution of the molecules, we used a map only when the emission extends over at least one telescope beam. To facilitate comparisons between the molecules, all maps were convolved to an angular resolution of 32″, corresponding to the half-power beam width (HPBW) of the largest observed beam. The resulting integrated intensity maps of c-C3H2, CH3OH, and CH3CCH are shown in Figs. A.1 and A.2.
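The two steps above (zeroth moment, then smoothing to a common 32″ beam) can be sketched with NumPy and SciPy. This is an illustration under stated assumptions, not the authors' pipeline (they used GILDAS and spectral-cube): the cube is a plain (channel, y, x) array on the 8″ grid with no blanked (NaN) pixels, the function names are hypothetical, the window is centred on a single velocity rather than on each pixel's centroid, and the 24″ input beam in the example is hypothetical. The ±0.7 km s−1 default is the integration window used for the intensity feature.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

FWHM_TO_SIGMA = 1.0 / np.sqrt(8.0 * np.log(2.0))  # sigma = FWHM / 2.355


def moment0(cube, velocities, v_center, half_width=0.7):
    """Zeroth moment of a (n_chan, ny, nx) cube over v_center +/- half_width (km/s)."""
    velocities = np.asarray(velocities, dtype=float)
    dv = abs(velocities[1] - velocities[0])  # channel width
    in_window = np.abs(velocities - v_center) <= half_width
    return cube[in_window].sum(axis=0) * dv


def smooth_to_beam(image, beam_fwhm, target_fwhm=32.0, pixel_size=8.0):
    """Convolve a map from beam_fwhm to target_fwhm (arcsec) with a Gaussian kernel."""
    image = np.asarray(image, dtype=float)
    if beam_fwhm >= target_fwhm:
        return image.copy()  # already at the target resolution
    kernel_fwhm = np.sqrt(target_fwhm**2 - beam_fwhm**2)  # Gaussian widths add in quadrature
    sigma_pix = kernel_fwhm * FWHM_TO_SIGMA / pixel_size
    return gaussian_filter(image, sigma=sigma_pix, mode="nearest")


# Toy example: flat cube with 0.18 km/s channels.
vel = np.linspace(-0.9, 0.9, 11)
cube = np.ones((11, 20, 20))
m0 = moment0(cube, vel, v_center=0.0)     # 7 channels x 0.18 km/s in the window
m0_32 = smooth_to_beam(m0, beam_fwhm=24.0)
```

The kernel width follows from the fact that convolving two Gaussians adds their widths in quadrature, so a map observed with beam θ reaches 32″ after convolution with a kernel of FWHM √(32² − θ²).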
To compare the distribution of molecular emission across three cores with different map sizes and distances, we treated the individual pixels of each map as input samples and discarded their two-dimensional spatial information. Instead, we described the environment of each pixel by the projected distance to the dust peak, the H2 column density, and the H2 column density gradient at that position. Additionally, we used the Gaussian fit parameters (velocity and linewidth) obtained by fitting each pixel's spectrum with a one-dimensional Gaussian profile. This approach is valid because all cores in our sample display only one velocity component along the line of sight. For the analysis, we selected only fits with a signal-to-noise ratio of at least three and a relative uncertainty below 70%.
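A per-pixel fit with these quality cuts might look like the sketch below, which uses `scipy.optimize.curve_fit`. The fitting tool used by the authors is not specified here, so treat this as an assumption-laden illustration: the spectrum rms is taken as known, the S/N is the fitted peak divided by the rms, and the 70% limit is applied to the relative uncertainties of amplitude and width only (a relative uncertainty on the centroid velocity is not meaningful). The function names and the 0.15 km s−1 starting width are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit

FWHM_FACTOR = np.sqrt(8.0 * np.log(2.0))  # FWHM = 2.355 * sigma


def gaussian(v, amp, v0, sigma):
    return amp * np.exp(-0.5 * ((v - v0) / sigma) ** 2)


def fit_pixel(velocities, spectrum, rms, min_snr=3.0, max_rel_err=0.7):
    """Fit one Gaussian; return (v0, v0_err, fwhm) or None if the quality cuts fail."""
    p0 = [spectrum.max(), velocities[np.argmax(spectrum)], 0.15]
    try:
        popt, pcov = curve_fit(gaussian, velocities, spectrum, p0=p0,
                               sigma=np.full_like(spectrum, rms), absolute_sigma=True)
    except RuntimeError:  # least-squares fit did not converge
        return None
    amp, v0, sig = popt
    perr = np.sqrt(np.diag(pcov))
    if amp / rms < min_snr:  # S/N cut on the fitted peak
        return None
    rel_err = perr[[0, 2]] / np.abs(popt[[0, 2]])  # amplitude and width only
    if np.any(~np.isfinite(rel_err)) or np.any(rel_err > max_rel_err):
        return None
    return v0, perr[1], FWHM_FACTOR * abs(sig)


# Synthetic spectrum on 0.18 km/s channels: bright line plus noise.
rng = np.random.default_rng(0)
v = np.arange(5.0, 9.0, 0.18)
spec = gaussian(v, 1.0, 7.2, 0.2) + rng.normal(0.0, 0.02, v.size)
result = fit_pixel(v, spec, rms=0.02)
```

Returning the velocity uncertainty alongside the centroid makes it straightforward to apply a later cut on the fitted velocity.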
We used six input features in the clustering dataset. The following features characterise one input sample or emission pixel:
Integrated intensity. The intensity was integrated over a range of ±0.7 km s−1 around the centroid velocity. While the expected linewidth for optically thin emission in our cores is around 0.5 km s−1, we used a wider interval here to account for line shifts due to the velocity gradients across the cores. For the clustering, we selected only pixels with a signal-to-noise ratio of at least three. Each integrated intensity map was then normalised using the MinMaxScaler from the scikit-learn preprocessing package, which scales the values from zero to one, before being added to the clustering dataset. This normalisation removes information about the absolute brightness of the molecular emission and the intensity ratios between different molecules, focusing instead on the distribution of emission across the core. In the analysis, this feature is referred to as ‘intensity’.
Velocity offset with respect to the source’s systemic velocity. Here, we applied a selection criterion whereby the uncertainty in the fitted velocity position, VLSR, has to be smaller than 0.08 km s−1 (i.e. the channel width divided by 2.355). To calculate the relative velocity offset of the emission line, we subtracted the systemic velocity from the velocity position: Voffset = VLSR − Vsys. The systemic velocities of the cores are listed in Table 1. In the analysis, this feature is referred to as Voffset.
Linewidth. The minimum linewidth was set to the spectral resolution of the spectra (0.18 km s−1). In the analysis, this feature is referred to as ‘linewidth’.
H2 column density. We used the H2 column density maps derived from Herschel SPIRE maps (Spezzano et al. 2016, 2020), which are shown in Fig. B.1. Similar to the integrated intensity maps, each column density map was normalised individually before being added to the clustering dataset. In the analysis, this feature is referred to as NH2.
Distance to the dust continuum peak. This feature was calculated as the distance between the equatorial coordinates of each pixel and the dust emission peak of the corresponding core. To be consistent across all cores, the dust emission peak was approximated by the peak of the H2 column density map. In the analysis, this feature is referred to as ‘dist2dust’.
H2 column density gradient. To calculate this feature, we convolved the H2 column density maps with a Gaussian derivative kernel with a standard deviation of two telescope beams, as described in Soler et al. (2013). Details of the derivation are given in Sect. 4.1 and Appendix B, and the gradient maps are presented in Fig. B.1. As with the integrated intensity and H2 column density maps, each gradient map was normalised individually before being added to the clustering dataset. In the analysis, this feature is referred to as ∇NH2.
The features Voffset, linewidth, and dist2dust are normalised only when chosen for a specific sub-dataset (see Tables 3 and 4). This ensures that the feature values of all selected molecules are normalised together rather than individually, as is done for the emission.
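The feature construction described above can be sketched as follows. This is a hypothetical illustration, not the authors' code: each molecule is given as a dictionary of per-pixel arrays with assumed keys ('intensity', 'v_lsr', 'v_err', 'fwhm', 'x', 'y'), positions are offsets in arcsec from the map centre (a small-angle stand-in for the equatorial-coordinate distance), and the minimum linewidth is read as a selection cut, which is one possible interpretation. The H2 column density and gradient maps would be normalised per map with `scale01` in the same way as the intensity.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

CHANNEL_WIDTH = 0.18  # km/s, common spectral resolution of the dataset


def scale01(x):
    """Min-max scale a 1D array to [0, 1] with scikit-learn's MinMaxScaler."""
    return MinMaxScaler().fit_transform(np.asarray(x, float).reshape(-1, 1)).ravel()


def molecule_features(pix, v_sys, peak_xy):
    """Per-pixel features of one molecule (hypothetical keys, see lead-in).

    Intensity is normalised per map; Voffset, linewidth and dist2dust stay in
    physical units so that they can be scaled jointly across molecules later.
    """
    keep = (pix["v_err"] < CHANNEL_WIDTH / 2.355) & (pix["fwhm"] >= CHANNEL_WIDTH)
    return {
        "intensity": scale01(pix["intensity"])[keep],
        "Voffset": (pix["v_lsr"] - v_sys)[keep],  # V_LSR - V_sys
        "linewidth": pix["fwhm"][keep],
        "dist2dust": np.hypot(pix["x"] - peak_xy[0], pix["y"] - peak_xy[1])[keep],
    }


def joint_scale(per_molecule, columns=("Voffset", "linewidth", "dist2dust")):
    """Min-max scale the shared features over all molecules of a sub-dataset together."""
    out = {m: dict(f) for m, f in per_molecule.items()}
    for col in columns:
        scaler = MinMaxScaler().fit(
            np.concatenate([f[col] for f in per_molecule.values()]).reshape(-1, 1))
        for m in out:
            out[m][col] = scaler.transform(per_molecule[m][col].reshape(-1, 1)).ravel()
    return out
```

Fitting one scaler on the concatenated values, rather than one per molecule, keeps a given velocity offset or distance at the same scaled value for every molecule in the sub-dataset.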
Table 3. Feature combinations used in the clustering.
Table 4. Molecular transitions used in Case 1, Case 2, and Case 3.
3.2 Clustering
We used two density-based clustering algorithms in this analysis: DBSCAN (Ester et al. 1996) and HDBSCAN (Campello et al. 2013). For DBSCAN, we used the scikit-learn implementation (Pedregosa et al. 2011) and for HDBSCAN the hdbscan Python package (McInnes et al. 2017). Density-based clustering separates high-density regions from low-density regions by grouping points that lie closer to each other than a given distance threshold, defined by the hyperparameter epsilon. The hyperparameter min_samples sets the minimum number of neighbouring points required to form a dense region; groupings smaller than this are considered noise. Unlike k-means clustering, which assumes roughly spherical clusters, density-based clustering allows for clusters of arbitrary shape. DBSCAN classifies points into three types: core points, which have at least min_samples neighbouring points within the epsilon distance; border points, which have fewer than min_samples neighbouring points but lie within the epsilon distance of at least one core point; and noise points, which have fewer than min_samples neighbouring points and do not lie within the epsilon distance of any core point, so they are not part of any cluster. HDBSCAN extends DBSCAN into a hierarchical clustering algorithm, allowing for clusters with varying densities. Clusters are defined by the hyperparameter min_cluster_size, which sets the minimum number of points required for a group to be considered a cluster, and the hyperparameter min_samples, which sets the minimum number of neighbouring points needed for a point to be considered a core point.
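The three DBSCAN point types can be read off a scikit-learn result: `labels_` marks noise with −1, and `core_sample_indices_` lists the core points, so every other clustered point is a border point. The minimal sketch below uses synthetic data (two compact groups plus one isolated point in a three-feature space scaled to [0, 1]); the helper name is hypothetical. Note that scikit-learn counts the point itself towards min_samples. The HDBSCAN step is not reproduced here.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def dbscan_point_types(X, eps, min_samples):
    """Run DBSCAN and label every sample as 'core', 'border' or 'noise'."""
    db = DBSCAN(eps=eps, min_samples=min_samples).fit(X)
    types = np.full(len(X), "noise", dtype=object)
    types[db.labels_ != -1] = "border"       # in a cluster, but not a core point
    types[db.core_sample_indices_] = "core"  # >= min_samples neighbours within eps
    return db.labels_, types


# Two compact groups and one isolated point (synthetic, three scaled features).
rng = np.random.default_rng(1)
X = np.vstack([
    0.2 + 0.01 * rng.standard_normal((30, 3)),
    0.8 + 0.01 * rng.standard_normal((30, 3)),
    [[0.5, 0.5, 0.5]],
])
labels, types = dbscan_point_types(X, eps=0.1, min_samples=10)
```

The isolated point has no neighbours within epsilon, so it ends up with label −1 and is classified as noise, while both groups become separate clusters.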
We used three features at a time as inputs to both DBSCAN and HDBSCAN, which simplified the interpretation of the output. The integrated intensity was kept as a fixed input feature to retain the information about the molecular emission, while the other two features were varied. The ten resulting feature combinations are listed in Table 3. To optimise the clustering, we performed a systematic grid search over the respective hyperparameters: epsilon (from 0.05 to 0.155 in steps of 0.005) and min_samples (from 10 to 100 in steps of 5) for DBSCAN, and min_cluster_size (from 5 to 20) and min_samples (from 1 to min_cluster_size) for HDBSCAN.
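The hyperparameter grids quoted above can be enumerated explicitly. This sketch assumes that all quoted bounds are inclusive; the variable and function names are hypothetical. It also shows a lazy loop over the DBSCAN grid with scikit-learn.

```python
from itertools import product

from sklearn.cluster import DBSCAN

# DBSCAN: epsilon 0.050-0.155 in steps of 0.005, min_samples 10-100 in steps of 5.
# Building epsilon from integer steps avoids the floating-point drift of np.arange.
EPS_GRID = [round(0.050 + 0.005 * i, 3) for i in range(22)]
DBSCAN_GRID = list(product(EPS_GRID, range(10, 101, 5)))

# HDBSCAN: min_cluster_size 5-20, min_samples 1-min_cluster_size.
HDBSCAN_GRID = [(mcs, ms) for mcs in range(5, 21) for ms in range(1, mcs + 1)]


def dbscan_runs(X):
    """Yield ((eps, min_samples), labels) for every point of the DBSCAN grid."""
    for eps, min_samples in DBSCAN_GRID:
        yield (eps, min_samples), DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)


print(len(DBSCAN_GRID), len(HDBSCAN_GRID))  # 418 200
```

Under these assumptions, each feature combination and dataset involves a few hundred fits per algorithm, which is small enough for an exhaustive search.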
To determine the best tuning, we used the density-based clustering validation (DBCV) score (Moulavi et al. 2014). This score is particularly well suited to validating density-based clustering methods because it accounts for outliers and noise points (unlike cross-validation, for example). To calculate the score, a kernel density function first estimates the local density of data points around each object. The cluster quality is then evaluated by comparing the minimum density within clusters (which represents cluster cohesion) to the maximum density between clusters (which represents cluster separation). We used the hdbscan function validity_index, which is a fast approximation of the original DBCV score. Although this function is provided by the hdbscan package, it only requires the data points and the corresponding cluster labels as input, so we also applied it to the DBSCAN results. For HDBSCAN, we additionally used the intrinsic attribute relative_validity, another fast approximation of the DBCV score. Since this intrinsic score gives slightly different results from the validity_index function, we considered both scores in our evaluation. Finally, we selected the best result based on the number of noise points and the relative sizes of the clusters. Since both the validity_index function and the relative_validity attribute are only approximations of the DBCV score, we treated them as relative measures and used them only to compare results across different hyperparameter choices, not across different datasets.
In addition to the DBCV score, we applied two post-processing criteria to determine the optimal clustering results for each method, feature combination, and dataset: (1) the algorithm must find between two and five clusters (inclusive), and (2) the clusters must collectively cover at least 50% of the data points. Tests with more than five clusters showed that larger clusters are then merely subdivided into smaller ones covering the same total amount of data, without providing additional insight. Conversely, when only one cluster is found, it typically contains over 90% of the input data points and offers little insight into the data distribution. Most datasets show two to four prominent trends that are effectively captured by the clustering. When the resulting clusters cover less than 50% of the data, they fail to represent the overall trends and instead highlight minor sub-clusters while overlooking the majority of the data.
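The two post-processing criteria, combined with a score-based ranking, can be sketched as below. The score attached to each candidate stands in for a DBCV approximation such as the validity_index output; the final manual judgement on noise points and cluster sizes is not captured. Function names and the tuple layout are hypothetical.

```python
import numpy as np


def passes_criteria(labels, min_clusters=2, max_clusters=5, min_coverage=0.5):
    """Post-processing cuts: 2-5 clusters that together hold >= 50% of the points.

    Labels follow the scikit-learn/hdbscan convention, where -1 marks noise.
    """
    labels = np.asarray(labels)
    n_clusters = len(set(labels.tolist()) - {-1})
    coverage = float(np.mean(labels != -1))
    return min_clusters <= n_clusters <= max_clusters and coverage >= min_coverage


def select_best(candidates):
    """Return the highest-scoring (params, labels, score) tuple that passes the cuts."""
    valid = [c for c in candidates if passes_criteria(c[1])]
    return max(valid, key=lambda c: c[2]) if valid else None
```

Applying the cuts before ranking means that a high score from a single dominant cluster, or from a few small clusters surrounded by noise, cannot win the selection.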
3.3 Case studies
In addition to c-C3H2, CH3OH, and CH3CCH, the dataset used for this study covers molecules such as CCS, HC3N, HC18O+, C4H, HNCO, and CS, with 8 to 20 detected molecules per core. However, to improve interpretability and extract meaningful patterns, we focused our analysis on the three key molecules c-C3H2, CH3OH, and CH3CCH. This targeted approach allowed a clearer understanding of the relationships between these molecules and kept the complex clustering output interpretable. During the clustering process, the features of the molecular maps were combined without giving the algorithm any prior information about which molecule each data point corresponds to.
We used the following four datasets as input to the clustering:
- Case 1. c-C3H2 vs CH3OH
The two molecules show a well-known and well-studied segregation in the sources of our dataset (Spezzano et al. 2016, 2020), which is driven by environmental effects. Through clustering, we analyse how this chemical differentiation is represented in the six features and how it influences the clusters identified by the algorithm. This approach allowed us to assess the effectiveness of the clustering technique. The transitions used in each core are listed in Table 4, along with the initial ratio between the data points of each molecule in this sub-dataset.
- Case 2. c-C3H2 vs CH3CCH
These two molecules display a chemical differentiation in the prestellar core L1544 that is not yet understood (Spezzano et al. 2017). Although CH3CCH is, like c-C3H2, a carbon chain and would therefore be expected to peak in the carbon-chain-rich south-east of the core, it instead peaks in the north-west. To ensure a balanced number of data points between the two molecules, we used two transitions of CH3CCH for B68 and three transitions for L1521E, as shown in Table 4.
- Case 3. CH3OH vs CH3CCH
The two molecules show a chemical differentiation in the prestellar core L1544 (Spezzano et al. 2017). Both peak in the northern part of the core: CH3OH in the north-east and CH3CCH in the north-west. We use this combination to rule out biases that might arise from clustering with c-C3H2.
- Case 4. c-C3H2 vs CH3OH vs CH3CCH
We use the combination of all three molecules to validate the results from the other two case studies. This approach also helps to eliminate biases in the algorithm that might arise from comparing only two molecules. The dataset for each core contains the combined data of Case 1 and Case 2 (see Table 4).
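How a case dataset might be assembled is sketched below: the per-molecule feature tables are stacked into one matrix without any molecule column, and the molecule tags are kept in a separate array that is only used afterwards, to compute the input ratio and the molecular ratio of each cluster. The names (`CASES`, `build_case`, `input_ratio`) are hypothetical, and the column choice in the example mirrors feature combination 2 (intensity, Voffset, and dist2dust).

```python
import numpy as np

CASES = {
    1: ("c-C3H2", "CH3OH"),
    2: ("c-C3H2", "CH3CCH"),
    3: ("CH3OH", "CH3CCH"),
    4: ("c-C3H2", "CH3OH", "CH3CCH"),
}


def build_case(features_by_molecule, molecules, columns):
    """Stack per-molecule features into X; molecule identity is kept out of X."""
    X = np.vstack([np.column_stack([features_by_molecule[m][c] for c in columns])
                   for m in molecules])
    tags = np.concatenate([np.full(len(features_by_molecule[m][columns[0]]), m)
                           for m in molecules])
    return X, tags


def input_ratio(tags):
    """Percentage of data points per molecule in the clustering input."""
    names, counts = np.unique(tags, return_counts=True)
    return {str(n): 100.0 * c / len(tags) for n, c in zip(names, counts)}
```

Keeping the tags outside the feature matrix is what allows the clustering to run blind to molecule identity while still permitting a per-cluster chemical evaluation afterwards.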
4 Results
4.1 H2 column density gradient
To derive the H2 column density gradients for our three cores, we applied the method presented in Soler et al. (2013), in which the gradient is calculated by convolving the H2 column density map with a Gaussian derivative kernel in the x and y directions. The total gradient is obtained by combining the two directional components. We used the function gaussian_filter from the Python package scipy.ndimage. To capture the filamentary environments of the cores, we chose a Gaussian kernel with a standard deviation equivalent to two telescope beams (2 × 32″, i.e. 2 × 4 pixels).
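A minimal sketch of this gradient calculation with `scipy.ndimage.gaussian_filter`, whose `order` argument selects a derivative of the Gaussian along each axis. Two assumptions are labelled here: the directional components are combined into the gradient magnitude with a quadrature sum, and the result is left in units of column density per pixel. The function name is hypothetical.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def column_density_gradient(n_h2, beam_fwhm=32.0, pixel_size=8.0, n_beams=2):
    """Gradient magnitude of an N(H2) map via Gaussian-derivative kernels.

    The kernel standard deviation is n_beams beam widths, in pixels
    (2 x 32" / 8" = 8 pixels for the values quoted in the text).
    """
    sigma = n_beams * beam_fwhm / pixel_size
    d_dy = gaussian_filter(n_h2, sigma=sigma, order=(1, 0))  # derivative along axis 0 (y)
    d_dx = gaussian_filter(n_h2, sigma=sigma, order=(0, 1))  # derivative along axis 1 (x)
    return np.hypot(d_dx, d_dy)  # assumed: magnitude = sqrt(dx^2 + dy^2), per pixel


# Example: a linear ramp with slope 3 per pixel along x.
ramp = np.tile(3.0 * np.arange(100, dtype=float), (100, 1))
grad = column_density_gradient(ramp)
```

Far from the map edges, a linear ramp returns its slope, which is a quick check that the derivative kernel is normalised correctly; near the edges the result depends on the boundary mode.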
The derived H2 column density gradient maps are presented in Fig. B.1 in the Appendix, alongside the corresponding H2 column density maps. The observed gradients agree with the different levels of exposure of the cores to the ISRF (see Sect. 2.2). The isolated starless core B68 has a mostly uniform N(H2) gradient, forming a ring-like structure at the edges with values of ≈ 0 in the centre, consistent with uniform external illumination. For L1521E, the stronger illumination towards the south is reflected in an increased N(H2) gradient along the southern edge and lower values in the protected centre. Similar to L1521E, the N(H2) gradient of the prestellar core L1544 shows that the south of the core is more exposed to the ISRF, where the gradient reaches larger values. In contrast, the N(H2) gradient is much lower in the more protected centre and north of the core, where both CH3OH and CH3CCH peak.
4.2 Comparison of dataset features
Figure 1 shows a comparison of selected features of the dataset, illustrating the data points for c-C3H2 (green circles), CH3OH (blue crosses), and CH3CCH (red diamonds) observed towards B68 (left), L1521E (centre), and L1544 (right). The plots reveal distinct patterns and behaviours that vary depending on the core, molecule, and feature.
The distributions of intensity over NH2 and intensity over Voffset, shown in the top two rows of Fig. 1, reflect the distribution of the molecular emission across the cores. In B68, all molecules exhibit similar distributions, with peak intensity at the highest H2 column density. In L1521E, c-C3H2 and CH3CCH peak in the south-eastern part of the core at lower H2 column density than CH3OH, which peaks at the dust peak. In contrast, in L1544, all molecules peak at different locations across the core, with varying projected distances to the dust peak. In velocity, CH3OH in L1521E behaves slightly differently from the carbon chains: at high intensity it extends to higher Voffset, while at lower intensity it spreads to lower Voffset. Additionally, in L1544, CH3CCH spans a broader velocity range than c-C3H2 and CH3OH, suggesting that it traces a different layer of the core.
The c-C3H2 emission shows a wide range of linewidths, which are generally broader than those of the other molecules (see third row in Fig. 1), likely tracing more turbulent material (compare e.g. Lin et al. 2022 for L1544). In L1521E, the linewidths of c-C3H2 have a more compact distribution around 0.4 km s−1. For CH3OH, the linewidths are typically smaller in B68 and L1544, between 0.25 and 0.30 km s−1 and between 0.30 and 0.35 km s−1, respectively, while in L1521E they are slightly higher, around 0.4 km s−1, with some values reaching up to 0.65 km s−1. The linewidths of CH3CCH are more compactly distributed at lower values in the two starless cores (averaging around 0.3 km s−1), but extend up to 0.5 km s−1 in the prestellar core L1544, possibly indicating the presence of more turbulent material.
The distribution of intensity over the N(H2) gradient also reflects the morphology of the molecular emission across the cores (see bottom row in Fig. 1). In L1521E, all molecules display a similar behaviour, with the highest intensities occurring at high, though not maximum, N(H2) gradient values. In contrast, in L1544, the c-C3H2 intensity peaks at a much higher N(H2) gradient than CH3OH and CH3CCH. This illustrates the active photochemistry in the south of the core caused by the external illumination, which leads to an increased abundance of carbon chains such as c-C3H2. This is supported by a local peak in CH3CCH emission in the south and a sharp decline in CH3OH intensity at higher N(H2) gradient values. In B68, which has a more spherical shape and is uniformly illuminated, the molecular emission peaks at lower N(H2) gradient values in the protected centre of the core and then declines rather steeply at higher gradient values in the outer parts of the core. Additionally, CH3CCH exhibits lower intensity values even at low N(H2) gradient values, because its emission map is less spatially extended across this core than those of c-C3H2 and CH3OH (see Figs. A.1 and A.2).
In summary, the plots highlight the many chemical and physical differences between the cores. In L1544, all molecules are clearly separated and behave differently, which indicates a more pronounced chemical segregation in this core. In contrast, in B68 and L1521E, the molecules behave more similarly and are less segregated, which could be linked to their earlier evolutionary stages compared to L1544. Consequently, an unbiased approach such as clustering can provide valuable insights into the varying chemical environments across these cores.
Fig. 1 Comparison of different features for B68 (left), L1521E (middle), and L1544 (right), with c-C3H2 in green (circles), CH3OH in blue (crosses), and CH3CCH in red (diamonds).
4.3 Clustering
Figures 2, C.1, and C.2 show the clustering results for B68, L1521E, and L1544, respectively, for feature combinations 2, 3, 4, 9, and 10 in Case 1 and Case 2 (for details, see Table 3). The remaining results for Cases 1 and 2, together with the results for Cases 3 and 4, are published on Zenodo in Figs. C.3–C.14. The molecular ratio (i.e. the ratio of data points belonging to each molecule in a specific cluster) and the number of data points assigned to each cluster are given in Table C.1. In Sects. 4.3.1–4.3.4, we describe the results for each case individually.
Both the DBSCAN and the HDBSCAN results are included in the analysis. However, to improve readability, only one result is shown for each combination. This choice was made manually for each combination based on the number of noise points and the number of clusters, and it ensures that all scientific results discussed in this work are also presented visually.
To evaluate the possible chemical segregation within the clusters, we focused on imbalanced clusters, in which the molecular ratio deviates from the initial ratio by at least 10 percentage points (e.g. 37/63 instead of 47/53). These imbalanced clusters are particularly insightful because they highlight regions where specific molecular abundances diverge from the average distribution, potentially indicating distinct physical or chemical processes at work. The imbalanced clusters are marked in boldface in Table C.1. Our primary focus is to extract meaningful chemical insights from the clustering patterns. In fact, many clusters exhibit an excess of, or are dominated by, one molecule, indicating chemical differentiation across all three cases and all three cores in this study. However, clusters that exclusively contain data points from a single molecule are rare and typically contain only a small number of points (N ≤ 20). Overall, we prioritised clusters that cover at least one telescope beam, which corresponds to roughly 20 data points (pixels).
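The imbalance criterion and the beam-size threshold can both be made explicit in a few lines. In this sketch, the 10% threshold is read as percentage points (consistent with the 47/53 to 37/63 example), and the number of pixels per beam uses the standard Gaussian beam area, π/(4 ln 2) × (FWHM/pixel)², which gives about 18 pixels for a 32″ beam on 8″ pixels. The function names are hypothetical.

```python
import numpy as np


def cluster_ratios(labels, tags, molecule):
    """Percentage of `molecule` points in each cluster (noise, label -1, skipped)."""
    labels, tags = np.asarray(labels), np.asarray(tags)
    return {int(k): 100.0 * np.sum(tags[labels == k] == molecule) / np.sum(labels == k)
            for k in np.unique(labels) if k != -1}


def is_imbalanced(cluster_pct, input_pct, threshold=10.0):
    """True if a cluster's share deviates from the input share by >= threshold points."""
    return abs(cluster_pct - input_pct) >= threshold - 1e-9  # tolerance for float error


def pixels_per_beam(beam_fwhm=32.0, pixel_size=8.0):
    """Pixels covered by a Gaussian beam: pi / (4 ln 2) * (FWHM / pixel)^2."""
    return np.pi / (4.0 * np.log(2.0)) * (beam_fwhm / pixel_size) ** 2


print(round(pixels_per_beam(), 1))  # 18.1
```

With about 18 pixels per beam, the threshold of roughly 20 data points corresponds to one independent resolution element.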
Fig. 2 Clustering results for the starless core B68 for the dataset of Case 1 (left) and the dataset of Case 2 (right) for feature combinations 2, 3, 4, 9, and 10. Each row represents a different combination of features (see Table 3): combination 2 (intensity, Voffset, and dist2dust), combination 3 (intensity, Voffset, and |
4.3.1 Case 1: c-C3H2 vs CH3OH
In some results, the clusters vary greatly in size. The largest cluster typically maintains a balanced molecular ratio, similar to the input ratio (see Table 4), while the smaller clusters tend to show more variation (see Table C.1). For L1521E and L1544, the input data has a slightly higher proportion of CH3OH compared to c-C3H2 (see Table 4), causing the largest cluster to often show a slight excess in CH3OH. In the following section, we discuss the clustering results for each core individually.
B68. Imbalanced clusters with an excess of c-C3H2 or CH3OH are spatially separated into different regions of the core. CH3OH-dominated clusters are concentrated around the dust peak at the centre of the core and are characterised by features such as high intensity and high NH2 (e.g. red clusters in combs. 2, 4, and 7, or yellow cluster in comb. 3; see Figs. 2 and C.3, and Table C.1). In contrast, c-C3H2-dominated clusters are confined to the (south-)west region and associated with data points of moderate intensity (see yellow and purple clusters in comb. 2 in Fig. 2). In combinations 9 and 10, NH2 and ∇NH2 are structured in ring-like clusters around the dust peak (see Fig. 2), all dominated by CH3OH and with no significant contribution from c-C3H2. Interestingly, in B68, the largest cluster of each combination (shown in blue) is slightly imbalanced towards CH3OH (on average by 3%, see Table C.1), even though the initial molecular ratio is 50/50. The smaller clusters, however, display greater variation.
L1521E. In this core, many features are clustered into separate structures dominated by either CH3OH or c-C3H2. This molecular segregation is evident across multiple combinations. The intensity-dist2dust distribution is clustered into two separate diagonals (see blue and red clusters in combs. 2 and 5 in Figs. C.1 and C.4). The lower-intensity diagonal (blue) corresponds to regions in the north-west of the core and is imbalanced towards c-C3H2 (see Table C.1). Conversely, the upper-intensity diagonal (red) is associated with the south-east of the core and is dominated by CH3OH. The same cluster distribution, with a similar molecular imbalance, appears in the intensity-NH2 and intensity-Voffset distributions: clusters dominated by c-C3H2 (in the north-west) show high NH2 and high Voffset (see blue cluster in comb. 2, red cluster in comb. 3), while CH3OH-dominated clusters (south-east) show low NH2 and lower Voffset (see red clusters in combs. 2 and 9, and blue cluster in comb. 3). Additionally, combination 2 (Fig. C.1) shows that the CH3OH-dominated clusters appear as a narrow diagonal structure in dist2dust over Voffset, while the c-C3H2 clusters present a separate, more diffuse distribution. To summarise, c-C3H2 and CH3OH are separated in NH2 and Voffset in L1521E, and cluster into separate diagonals in the intensity-dist2dust distribution. Beyond the north-west/south-east separation, the Voffset-∇NH2 distribution is split into c-C3H2 at high ∇NH2 and mid Voffset in the south of the core, and CH3OH at mid ∇NH2 and lower Voffset in the east (c-C3H2: blue cluster in comb. 4; CH3OH: red cluster in comb. 4; see Figs. C.1 and C.4). Additionally, CH3OH shows a cluster at the core centre, characterised by high intensity and high NH2 (see yellow cluster in comb. 9 in Fig. C.1).
L1544. Similar to L1521E, many features in L1544 are clustered into separate structures (Voffset over intensity, dist2dust over intensity, NH2 over Voffset, dist2dust over Voffset, linewidth over Voffset, ∇NH2 over Voffset, NH2 over dist2dust), dividing the core into north/south and on-centre/off-centre regions. However, unlike in L1521E, these structures do not consistently correspond to a specific molecular segregation. A separation of the two molecules is visible in Voffset and ∇NH2 in combination 4, where a c-C3H2-dominated cluster (red) is concentrated in the northern part of the core and a CH3OH-dominated cluster (blue) is found in the south, both with lower intensity (see Fig. C.2). Additionally, c-C3H2 clusters appear at the core centre with high intensity, high linewidth, high NH2, and low ∇NH2 (see cyan cluster in comb. 3 in Fig. C.2, and red clusters in combs. 7 and 10 in Figs. C.2 and C.5). CH3OH, on the other hand, forms a cluster on the CH3CCH peak in the north-west, characterised by low Voffset and higher intensity (see red cluster in comb. 2 in Fig. C.2). It also appears across the northern part of the core with lower intensity and low Voffset (red cluster in comb. 3, see Fig. C.2). In combination 10, c-C3H2 (red) is clustered on the dust peak, while CH3OH (blue) is off-peak, with the separation visible in NH2 and ∇NH2 (see Fig. C.2). A similar split appears in NH2 in combination 6 (see Fig. C.5), although both corresponding clusters (blue and red) are imbalanced towards CH3OH without a molecular separation.
Summary of Case 1. All cores display cluster structures with a clear separation between c-C3H2 and CH3OH. However, the molecules are not necessarily clustered on their respective peaks. The molecular separation is primarily visible in the features intensity, Voffset, and NH2, and for L1521E and L1544 also in ∇NH2. In general, the clustering reveals recurring structures in several feature combinations, where one molecule dominates over the other. Additionally, B68 and L1544 show structures in NH2 and ∇NH2 that divide the core into on-centre and around-centre regions. These divisions, however, are not always linked to a molecular separation.
4.3.2 Case 2: c-C3H2 vs CH3CCH
Similar to Case 1, c-C3H2 is slightly underrepresented in the datasets for L1521E and L1544, resulting in clusters that are more imbalanced towards CH3CCH. For B68, the opposite is true, resulting in a slight excess of c-C3H2 in many clusters. In the following, we discuss the clustering results for each core individually.
B68. The clusters dominated by either c-C3H2 or CH3CCH are spatially separated, showing behaviour similar to the segregation of c-C3H2 and CH3OH in Case 1. High-intensity CH3CCH is clustered at the core centre, where the molecule peaks (see red and cyan clusters in comb. 4, red cluster in comb. 8, and cyan and yellow clusters in comb. 10 in Figs. 2 and C.6). In contrast, c-C3H2 is clustered off its emission peak, along the east side of the core, with lower intensity (see red clusters in combs. 1, 3, and 9 in Fig. 2). Beyond intensity, the molecular separation is also evident in Voffset and NH2: c-C3H2 is associated with high Voffset and low NH2, while CH3CCH is associated with low Voffset and high NH2. Interestingly, this c-C3H2 cluster is shaped along high values of ∇NH2 (see Fig. B.1), a feature that is not included in the combinations mentioned (1, 3, 9). In combination 10, a c-C3H2-dominated cluster forms a broad ring (blue), corresponding to high values of ∇NH2 (see also Fig. B.1), surrounding CH3CCH-dominated clusters in the core centre (cyan and yellow). This separation is visible in NH2 and ∇NH2 (see Fig. 2).
L1521E. Similar to Case 1, clusters dominated by c-C3H2 and CH3CCH are spatially separated, forming separate structures in various features (e.g. Voffset over intensity, dist2dust over intensity, dist2dust over Voffset, NH2 over intensity, NH2 over linewidth). As in Case 1, c-C3H2 is clustered in the north and north-west, is characterised by high NH2 and high Voffset, and appears as the lower diagonal in the intensity-dist2dust distribution (see red clusters in combs. 2, 3, and 6, and blue clusters in combs. 5 and 9 in Figs. C.1 and C.7). In contrast, CH3CCH-dominated clusters are found at low NH2 and low Voffset, similar to CH3OH in Case 1; they form the upper diagonal in the intensity-dist2dust distribution and are located in the south and south-east of the core (see blue clusters in combs. 2, 3, and 9, and red cluster in comb. 5 in Figs. C.1 and C.7). CH3CCH-dominated clusters are concentrated around the carbon peak in the south-east and are associated with higher ∇NH2 (see cyan and yellow clusters in comb. 4, cyan cluster in comb. 6, and cyan and purple clusters in comb. 10 in Figs. C.1 and C.7). Additionally, a low-intensity c-C3H2 cluster is located along the sharp edge of the core in the south-west (see cyan cluster in comb. 9 in Fig. C.1). Similar to B68, the shape of this cluster follows high values of ∇NH2, even though this feature is not included in combination 9.
L1544. For c-C3H2 and CH3CCH, the clustered features reveal structures similar to those found in Case 1 (e.g. in Voffset over intensity, linewidth over Voffset, NH2 over Voffset, ∇NH2 over Voffset, NH2 over linewidth), which again divide the core into north/south and on-centre/off-centre regions. As in Case 1, c-C3H2 is predominantly associated with the northern part of the core, while CH3CCH is dominant in the south. This separation is visible in Voffset and ∇NH2 (c-C3H2: red clusters in combs. 1, 2, and 4; CH3CCH: blue clusters in combs. 1 and 4; see Figs. C.2 and C.8). In addition, both molecules cluster in the core centre, exhibiting high intensity, higher Voffset, and higher linewidth (c-C3H2: cyan cluster in comb. 1; CH3CCH: yellow cluster in comb. 1, and red cluster in comb. 6; see Figs. C.2 and C.8). In combination 3, a CH3CCH-dominated cluster also covers the c-C3H2 peak in the south-east of the core, characterised by high intensity, high NH2, and high Voffset (see red cluster in Fig. C.2).
Summary of Case 2. Clusters dominated by c-C3H2 or CH3CCH show spatial separation across all cores, similar to the segregation observed between c-C3H2 and CH3OH in Case 1, revealing distinct structures in various feature combinations. The molecular segregation is evident in the same features as in Case 1 (intensity, Voffset, and NH2), with B68 and L1544 also showing separation in ∇NH2. Overall, the clustering shows the same strong divisions of the cores into north/south, east/west, and on-centre/off-centre regions as seen in Case 1, highlighting recurring structural and chemical patterns across the cores.
4.3.3 Case 3: CH3OH vs CH3CCH
For B68, CH3CCH is slightly underrepresented in this dataset, resulting in clusters that are more imbalanced towards CH3OH. In the following, we discuss the clustering results for each core individually.
B68 (Fig. C.9). CH3OH-dominated clusters and CH3CCH-dominated clusters are not clearly spatially separated. Instead, both are found in the central area of the core and on the core centre (CH3OH: blue cluster in combs. 1, 2, 6, and 9, cyan cluster in comb. 6; CH3CCH: red cluster in comb. 1 and cyan cluster in combs. 8 and 10), reproducing the clustering behaviour of Case 1 for CH3OH and Case 2 for CH3CCH. A direct, feature-wise separation of the two molecules occurs only in combination 1, in intensity, with CH3OH at higher and CH3CCH at lower values. Apart from that, CH3OH is clustered along the east of the core with high Voffset, and along the west of the core with low Voffset (east: cyan cluster in comb. 1, red cluster in combs. 2 and 3; west: yellow cluster in comb. 4). The shapes of those clusters follow high values of ∇NH2, even though this feature is only included in combination 4, resembling the patterns of c-C3H2 in Case 2 (east) and Case 1 (west). In combinations 8 and 10, the clustering creates ring-like structures around the dust peak, visible in NH2, dist2dust, and ∇NH2, with CH3CCH concentrated at the centre and CH3OH forming the outer rings, reflecting structures found in Case 1 (CH3OH) and Case 2 (CH3CCH).
L1521E (Fig. C.10). As in Cases 1 and 2, we see a spatial separation of the two input molecules, CH3OH and CH3CCH, but here it is less pronounced. CH3CCH is primarily clustered in the (south-)east of the core, similar to Case 2 (e.g. see blue and red clusters in comb. 1, cyan cluster in combs. 4, 5, and 10, yellow cluster in combs. 7 and 8). In contrast, CH3OH is clustered along the north of the core (see red cluster in combs. 3 and 8, blue cluster in comb. 9), showing patterns similar to c-C3H2 in Cases 1 and 2, but contrary to its own clustering behaviour in Case 1. In terms of features, we see an indirect separation in NH2 values: CH3CCH is associated with mid NH2, and CH3OH with higher NH2 (CH3CCH: cyan cluster in comb. 10; CH3OH: blue cluster in comb. 9). Combination 5 shows a direct spatial separation of the two molecules into east (CH3CCH) and west (CH3OH), visible as the upper and lower diagonals in the intensity-dist2dust distribution. Apart from that, CH3CCH is clustered at the sharp edge of the core in the south-west, with high ∇NH2 and mid Voffset (see red cluster in comb. 4, blue cluster in comb. 7), similar to c-C3H2 in Case 2. Both CH3OH and CH3CCH are additionally clustered at the core centre at high NH2, with CH3OH at high intensity (see red cluster in comb. 6) and CH3CCH at lower intensity (see cyan cluster in comb. 3, purple cluster in comb. 6, and yellow cluster in comb. 9). This pattern was not observed for either CH3OH or CH3CCH in Case 1 or Case 2.
L1544 (Fig. C.11). For CH3OH and CH3CCH, the clustered features reveal structures similar to those found in Cases 1 and 2 (e.g. in Voffset over intensity, linewidth over Voffset, NH2 over Voffset, ∇NH2 over Voffset, NH2 over linewidth). As before, this leads to a division of the core into north/south and on-centre/off-centre regions. CH3OH exhibits a north-south separation, visible in the features Voffset, intensity, and ∇NH2 (see blue and red clusters in combs. 1–4), similar to the pattern seen in Case 1. In contrast, CH3CCH is predominantly clustered in the south and on the c-C3H2 peak (see yellow cluster in combs. 3, 4, and 7), as well as at the core centre (see yellow cluster in comb. 2, cyan cluster in comb. 3, red cluster in combs. 5, 6, and 9), both of which were observed in Case 2. Additionally, CH3CCH is clustered at its emission peak in the north-west, with high intensity, high linewidth, and low ∇NH2, which was not seen in Case 2 (see cyan cluster in comb. 7). Direct molecular segregation occurs only in comb. 3, where CH3CCH is clustered at the core centre (cyan cluster) with high intensity and high NH2, while CH3OH is clustered around the centre at lower intensity and lower NH2 (blue and red clusters). Ring-like cluster structures are visible in NH2 and ∇NH2 in combinations 6, 8, 9, and 10; however, they do not display any molecular segregation but instead a balanced ratio between CH3CCH and CH3OH.
Summary of Case 3. All cores display cluster structures with feature-wise or spatial separation between CH3OH and CH3CCH. However, the molecular segregation is less distinct than in Cases 1 and 2, and it is mainly visible in the features intensity, Voffset, NH2, and ∇NH2. In B68 and L1544, the clustering predominantly reproduces the structures and clusters found in Cases 1 and 2. All three cores show minor differences in cluster behaviour compared to Cases 1 and 2, where CH3OH or CH3CCH behave similarly to c-C3H2. This is particularly evident in L1521E, where CH3OH and CH3CCH show a more distinct separation, similar to Cases 1 and 2.
4.3.4 Case 4: c-C3H2 vs CH3OH vs CH3CCH
With a dataset containing three molecules, molecular segregation is harder to identify, as the molecular ratios in the clusters mostly do not deviate strongly from the initial ratio of the dataset (the initial ratios for c-C3H2/CH3OH/CH3CCH are 37/36/27 for B68, 29/35/36 for L1521E, and 28/39/33 for L1544). In the following, we discuss the clustering results for each core individually.
In B68 (see Fig. C.12), c-C3H2 is clustered in a shell along the east, reproducing the structure of Case 2. CH3CCH is clustered in the core centre, reproducing the behaviour in Cases 2 and 3. However, CH3OH is not clustered in the core centre as in Case 1 but instead around the centre in shells along the east and along the west, similar to what was found in Case 3. Additionally, c-C3H2 and CH3OH show concentric clusters around the centre, visible in ∇NH2, similar to before.
In L1521E (see Fig. C.13), c-C3H2 is clustered in the northern part of the core, reproducing the clusters in both Cases 1 and 2. It also shows the small cluster along the sharp edge in the south-west of the core seen in Case 2. CH3CCH is clustered in the south of the core and in the core centre, reproducing the cluster structures of Cases 2 and 3. CH3OH, on the other hand, is not clustered in the south as seen in Case 1 but instead in the core centre, as in Case 3 (where its emission peaks), and in the north-west of the core.
In L1544 (see Fig. C.14), the division of the core into north/south is visible in Voffset for CH3OH, similar to Cases 1 and 3. For c-C3H2, the association with the northern part seen in Cases 1 and 2 cannot be reproduced. Instead, it is clustered only on the CH3OH peak in the north-east of the core. The cluster in the core centre can be reproduced. Additionally, both c-C3H2 and CH3OH are clustered on their respective molecular peaks, which was not seen in Case 1 or Case 2. For CH3CCH, both the cluster in the core centre and the cluster on the c-C3H2 peak are reproduced. However, Case 4 does not recreate the association of CH3CCH with the southern part of the core in the north-south division seen in Cases 2 and 3. Also, the molecular separation in ∇NH2 and Voffset seen in combination 4 in Cases 1 and 2 is not reproduced with this combined dataset.
To summarise, in B68 and L1521E, the clustering behaviour of c-C3H2 and CH3CCH seen in Case 2 can be reproduced, but CH3OH now behaves differently than in Case 1. In L1544, however, the behaviour of CH3OH and part of the behaviour of CH3CCH seen in Cases 1 and 2 can be reproduced, but not the behaviour of c-C3H2. We discuss this further in Sect. 5.1.
Fig. 3 Abundances of CH3CCH and c-C3H2 at the dust peaks of the cores. The molecular column density was calculated assuming Tex = 8 K (see text for details). The starless cores are marked with an asterisk.
4.4 CH3CCH abundances
This section focuses on comparing the CH3CCH abundances towards the dust peaks of the different cores. In addition to B68, L1521E, and L1544, we include data from the prestellar cores HMM-1, L429, L694-2, and OphD, which were observed but not analysed in the IRAM project of Spezzano et al. (2020). To calculate the abundances at the dust peaks, we divide the CH3CCH column density by the H2 column density. The H2 column density at the dust peak is extracted from the respective N(H2) map (derived from Herschel SPIRE observations; Spezzano et al. 2020) using a circular aperture with a diameter of 16″ (matching the Herschel map pixel size). We assume a 20% uncertainty for the resulting values. To derive N(CH3CCH), we convolve the CH3CCH spectral cubes with the 40″ beam of the Herschel telescope and extract the spectrum at the dust peak using the same 16″ aperture. The column density is calculated from a one-dimensional Gaussian fit to the line, following Mangum & Shirley (2015), with the respective spectroscopic parameters listed in Table 2. Since we do not have sufficient data to determine a precise excitation temperature from a rotational diagram for all cores, we adopt a standard excitation temperature of 8 K for consistency. A lower (higher) excitation temperature shifts the abundances of CH3CCH (and of c-C3H2 in L1544) to higher (lower) values, but the overall trend stays the same. For each core, we select the most optically thin transition with the smallest (propagated) error in column density. Specifically, we use the CH3CCH $5_1-4_1$ transition for most cores, except for L694-2 and L1544, where the $5_0-4_0$ and $6_1-5_1$ transitions are used, respectively.
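Two numerical steps of this procedure that do not depend on the spectroscopic details can be sketched in Python: the FWHM of the Gaussian kernel that smooths a map to a coarser Gaussian beam (FWHMs add in quadrature), and the abundance N(CH3CCH)/N(H2) with relative uncertainties combined in quadrature. The 32″ native resolution and the column densities in the usage lines are hypothetical placeholders, and treating the two uncertainties as independent is an assumption of this sketch.

```python
import math


def kernel_fwhm(target_beam, native_beam):
    """FWHM (same units as the inputs) of the Gaussian kernel that smooths a map
    with Gaussian beam `native_beam` to an effective beam `target_beam`.
    Gaussian FWHMs add in quadrature: target**2 = native**2 + kernel**2."""
    if target_beam < native_beam:
        raise ValueError("cannot sharpen a map by convolution")
    return math.sqrt(target_beam**2 - native_beam**2)


def abundance(n_mol, n_mol_err, n_h2, rel_err_h2=0.20):
    """Return X = N(mol) / N(H2) and its 1-sigma uncertainty.

    Relative errors are added in quadrature (independent errors assumed);
    the default 20% on N(H2) follows the text."""
    x = n_mol / n_h2
    rel_err = math.sqrt((n_mol_err / n_mol) ** 2 + rel_err_h2**2)
    return x, x * rel_err


# Hypothetical numbers, for illustration only.
print(kernel_fwhm(40.0, 32.0))  # 24.0 (arcsec)
x, dx = abundance(1.0e12, 1.0e11, 1.0e22)  # N in cm^-2
print(f"X = {x:.2e} +/- {dx:.1e}")
```

With a 20% error on N(H2), the abundance uncertainty can never drop below 20%, however well the line is fitted.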
For comparison, we also calculate the abundances of c-C3H2, using the $2_{0,2}-1_{1,1}$ transition for all cores except L1544, where the $3_{2,2}-3_{1,3}$ transition is used. Figure D.1 displays the extracted spectra along with their Gaussian fits. Figure 3 presents the resulting CH3CCH (blue circles) and c-C3H2 (orange squares) abundances at the dust peaks of the different cores for an assumed excitation temperature of 8 K.
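For an optically thin line, the quantity taken from each Gaussian fit is essentially the line area, which follows directly from the fitted peak and FWHM. The Python sketch below computes this integrated intensity in closed form and cross-checks it numerically. The line parameters are hypothetical, and the step from integrated intensity to column density (which requires the Table 2 spectroscopic parameters, the partition function, and Tex, following Mangum & Shirley 2015) is deliberately not reproduced here.

```python
import math

# Area of a Gaussian in terms of its peak and FWHM:
# integral = sqrt(pi / (4 ln 2)) * peak * FWHM  (about 1.0645 * peak * FWHM)
GAUSS_AREA_FACTOR = math.sqrt(math.pi / (4.0 * math.log(2.0)))


def gaussian_area(peak, fwhm):
    """Integrated intensity (e.g. K km/s) of a Gaussian line profile."""
    return GAUSS_AREA_FACTOR * peak * fwhm


def gaussian_area_numeric(peak, fwhm, centre=0.0, n=20001, half_range=10.0):
    """Trapezoidal integral of the same profile over centre +/- half_range * FWHM,
    used here only as a cross-check of the closed form."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    start = centre - half_range * fwhm
    step = 2.0 * half_range * fwhm / (n - 1)
    total = 0.0
    for i in range(n):
        v = start + i * step
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * peak * math.exp(-0.5 * ((v - centre) / sigma) ** 2)
    return total * step


# Hypothetical fit: peak 0.5 K, FWHM 0.3 km/s, centroid 7.2 km/s.
print(gaussian_area(0.5, 0.3))
print(gaussian_area_numeric(0.5, 0.3, centre=7.2))
```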
The CH3CCH abundances of the starless cores (see left part of Fig. 3) are about one order of magnitude higher than those of the prestellar cores (see right part of Fig. 3), suggesting an evolutionary trend of CH3CCH from the starless to the prestellar phase. Notably, the CH3CCH abundances in L1544 are significantly higher than in the other prestellar cores and even higher than in the starless cores. This suggests that the observed variations in CH3CCH are probably influenced not only by the evolutionary stage but also by environmental factors. This is discussed further in Sect. 5.2. However, further study of the interplay between environmental and evolutionary or dynamical effects on the spread in CH3CCH abundance requires additional work that goes beyond the scope of this paper.
The c-C3H2 abundances show much less variation than those of CH3CCH and do not exhibit a significant difference between the starless and the prestellar stages. The abundances are spread within one order of magnitude, indicating that c-C3H2 is largely unaffected by the evolution from the starless to the prestellar stage. This different behaviour of the two molecules suggests that c-C3H2 and CH3CCH trace different layers within the cores.
5 Discussion
5.1 Density-based clustering
Segregation between c-C3H2 and CH3OH. Density-based clustering is able to find molecular differentiation in our dataset. In particular, the clustering with the dataset containing c-C3H2 and CH3OH (Case 1) successfully reproduces the known molecular segregation between these two molecules in B68, L1521E, and L1544. This segregation is attributed to uneven illumination across the cores, as discussed in Spezzano et al. (2016, 2020). The differentiation mainly appears in the features intensity, Voffset, NH2, and ∇NH2. In addition, the following pairs of features frequently show segregation: intensity/Voffset, intensity/dist2dust, intensity/NH2, intensity/∇NH2, Voffset/NH2, and Voffset/∇NH2.
Segregation between c-C3H2 and CH3CCH. The clustering analysis of the dataset containing c-C3H2 and CH3CCH (Case 2) reveals molecular segregation between these two carbon chains in all three cores. As in Case 1, the segregation appears in the features intensity, Voffset, NH2, and ∇NH2, and in similar pairs of features. However, in B68 and L1521E, a differentiation between these two molecules is not apparent in their emission maps (see Figs. A.1 and A.2) and was therefore previously unrecognised. The segregation between c-C3H2 and CH3CCH suggests that these molecules trace different layers in the cores, representing different physical conditions. A similar result was discussed in Lin et al. (2022), where c-C3H2 was found to trace lower-density regions than, for example, CH3OH.
In B68 and L1521E, the emission of CH3CCH is less spatially extended than that of c-C3H2 (see Figs. A.1 and A.2). In B68, the CH3CCH emission is concentrated on the core centre, while in L1521E, it is primarily located in the eastern part of the core. Because fewer data points are available for CH3CCH than for c-C3H2, the clustering dataset in Case 2 includes two CH3CCH transitions for B68 and three transitions for L1521E (as detailed in Table 4). This results in a higher density of data points in the core centre of B68 and in the eastern part of L1521E, increasing the likelihood that density-based algorithms such as DBSCAN and HDBSCAN form clusters at these locations. The incomplete coverage of CH3CCH across the cores becomes more apparent in Case 3 (CH3OH and CH3CCH) and Case 4 (c-C3H2, CH3OH, and CH3CCH). In Case 4, CH3CCH forms clusters similar to those in Case 2 (c-C3H2 and CH3CCH), but CH3OH behaves differently than in Case 1 (c-C3H2 and CH3OH). In contrast, in L1544, where the emission maps of c-C3H2, CH3OH, and CH3CCH extend across the entire core, the clustering results of Case 4 differ significantly from those of Cases 1 and 2. In Case 3, on the other hand, most clusters found in Cases 1 and 2 are recreated. The only exception is CH3OH in L1521E, which mimics the clustering behaviour of c-C3H2 instead.
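The effect of adding transitions can be illustrated with a toy, hand-written DBSCAN in Python (not the implementation used in this work). Adding a second transition places a slightly shifted copy of each pixel in feature space, which doubles the number of neighbours within eps and can turn points that were noise into core points. The coordinates, eps, and min_samples below are arbitrary choices for illustration; following scikit-learn's convention, the neighbour count includes the point itself.

```python
import math


def dbscan(points, eps, min_samples):
    """Toy DBSCAN. Returns one label per point; -1 marks noise.

    A point is a core point if at least `min_samples` points (itself included)
    lie within Euclidean distance `eps`. O(n^2), for illustration only.
    """
    n = len(points)
    neighbours = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                  for i in range(n)]
    labels = [None] * n  # None = not yet visited
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # expand only through core points
    return labels


# One transition: four pixels close together in feature space.
single = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (0.2, 0.2)]
# A second transition adds a slightly shifted copy of each pixel.
doubled = single + [(x + 0.05, y) for x, y in single]
print(dbscan(single, eps=0.5, min_samples=5))   # [-1, -1, -1, -1]
print(dbscan(doubled, eps=0.5, min_samples=5))  # [0, 0, 0, 0, 0, 0, 0, 0]
```

HDBSCAN replaces the single eps with a hierarchy over density levels, but it also estimates local density from neighbour distances, so duplicated points are expected to bias it in a similar direction.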
Similarities between CH3OH and CH3CCH. Our analysis reveals that in B68 and L1521E, the clusters dominated by CH3OH in Case 1 behave very similarly to those dominated by CH3CCH in Case 2, even though in both cores the CH3OH emission is as spatially extended as that of c-C3H2, whereas the CH3CCH emission is not. The CH3CCH- and CH3OH-dominated clusters are associated with the same features (NH2 and Voffset for L1521E, intensity for B68) and are located in the same regions within the cores (the south-east for L1521E and the core centre for B68). Additionally, these clusters are spatially distinct from the c-C3H2 clusters. As shown in Fig. 1, c-C3H2 exhibits broader linewidths than CH3OH and CH3CCH across B68, likely indicating that it traces a different, more turbulent layer. These clustering results may therefore reflect differences in the physical layers traced by the molecules.
In L1544, the similarity in cluster behaviour between CH3OH and CH3CCH relative to c-C3H2 is observed in only one feature pairing: ∇NH2 and Voffset (see combination 4). This could be linked to both CH3OH and CH3CCH tracing gas influenced by slow accretion flows. For CH3OH, such an association has been demonstrated and discussed, for instance, by Lin et al. (2022). In Sect. 5.2, we further explore how the CH3CCH peak in L1544 may also be affected by inflowing gas.
Relevance of different features in the clustering analysis. In our clustering analysis, Voffset appears to be a dominant feature that drives the division of the cores into the different clusters and, in some cases, the molecular separation. In L1521E, the core is divided into north-west/south-east, while L1544 shows a north/south split, both reflecting the velocity structure of the cores. A similar split is indicated in B68, with a separation between the core centre and a shell along the eastern side, although this division is less pronounced. The strong link between the velocity structure and the chemical differentiation indicates that static chemical models might not be sufficient to fully predict the observed features. While the overall physical structure of these cores is generally well described by quasi-static models, understanding the anisotropic chemical structures requires a more dynamic approach (see also Lin et al. 2022).
Both B68 and L1544 show clusters with concentric ring structures around their core centres, following the patterns of NH2 and ∇NH2. This ring-like pattern results from the rather spherical shape of these cores. In contrast, L1521E, which is more elongated and irregularly shaped, does not show this pattern. The clustering analysis of both L1521E (Case 1) and L1544 (Cases 1 and 2) reveals molecular separation in the Voffset-∇NH2 distribution, which does not appear for B68. This difference may be due to environmental factors, as B68, a Bok globule, is exposed to relatively uniform external illumination. In L1521E, the clusters with the highest ∇NH2 values in this feature pairing (see combination 4) are dominated by c-C3H2. These data points are located at the filamentary edge in the southern part of the core, where c-C3H2 peaks and external illumination is strongest. In L1544, the data points with the highest ∇NH2 values also come from the southern part of the core, near the filamentary edge with high external illumination. However, in this case they are dominated by CH3OH and CH3CCH instead of c-C3H2. This suggests that the clustering results reflect the distinct environmental conditions within each core. Overall, the clusters with prominent NH2 and ∇NH2 features appear to represent the chemical patterns across the core structures, with differences in the clustering likely tied to varying environmental conditions.
The features dist2dust and linewidth appear to be less significant in our analysis, as they rarely exhibit distinct structures or molecular separations by themselves. However, when combined with other features, such as in dist2dust/intensity or linewidth/Voffset, they provide additional insights. The one-dimensional projected distance to the dust peak seems less relevant in the clustering analysis than NH2 and ∇NH2, as these two features better characterise the immediate environment of a data point or pixel.
![]() |
Fig. 4 Centroid velocities (top) and linewidths (bottom) of c-C3H2 (left), CH3OH (middle), and CH3CCH (right) towards the prestellar core L1544. Black contours show 50% and 90% of the respective molecular emission peak. White contours show 30%, 50%, and 90% of the H2 column density peak derived from Herschel maps (Spezzano et al. 2016). The white circle in the bottom-left corner indicates the beam size of the IRAM 30 m telescope (32″).
5.2 Evolution traced by CH3CCH
In the starless cores B68 and L1521E, the distribution of the CH3CCH emission overlaps with that of c-C3H2 (see Figs. A.1 and A.2). In L1544, however, the peak of the CH3CCH emission is not located in the south-east of the core, where the other carbon chains are found, but rather in the north-west. It is known that in the north-east of L1544, around the CH3OH peak, two filaments converge (e.g. see André et al. 2010; Spezzano et al. 2016). This may result in slow accretion flows (Punanova et al. 2018; Lin et al. 2022), which could deliver fresh material to the core and help replenish CH3CCH.
To investigate the formation and destruction routes of CH3CCH, we conducted chemical simulations using the gas-grain chemical model pyRate (Sipilä et al. 2015), applied to the standard physical model of L1544 (Keto et al. 2015). For the gas-phase chemical network, we adopted the 2014 public release of the KIDA chemical network (kida.uva.2014; Wakelam et al. 2015), while the grain-surface network is an updated version of the one presented in Semenov et al. (2010). The simulation yields radial abundance profiles as a function of time; we examined the results at an evolutionary time of 10⁵ yr. The model shows that at intermediate densities (n ~ 10⁴ cm⁻³), CH3CCH is mainly formed by the dissociative recombination of C3H5+:
$$\mathrm{C_3H_5^+} + \mathrm{e^-} \rightarrow \mathrm{CH_3CCH} + \mathrm{H}, \qquad (1)$$
while it mainly gets destroyed by the reaction with free carbon:
(2)
Following this, in regions with high irradiation and therefore active gas-phase chemistry, such as the carbon-chain peak in L1544, CH3CCH is quickly destroyed due to the high amounts of atomic carbon present in the gas phase. In contrast, the north-western part of L1544 is more shielded from irradiation, allowing CH3CCH to form from fresh material brought in by the filamentary flow from the north-west. Here, the low abundance of free carbon in the gas phase protects the CH3CCH from destruction. This is further supported by the velocity and linewidth maps of CH3CCH, as shown in Fig. 4, along with the results for c-C3H2 and CH3OH. The linewidth map of CH3CCH (bottom right) shows increased linewidths near the emission peak, while the velocity map (top right) reveals a sharp velocity gradient in the same area. This suggests that the CH3CCH emission peak in L1544 is the landing point or accumulation point of the incoming fresh gas. Similar signs of this active chemistry are observed in c-C3H2. The integrated intensity map (Fig. A.2) shows a local maximum in this region, while the linewidth map (bottom left in Fig. 4) exhibits the highest linewidths not at the c-C3H2 peak in the south-east but at the CH3CCH peak in the north-west. The velocity map of c-C3H2 (top left) also shows a gradient around this area.
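The argument above can be made explicit with a deliberately simplified steady-state balance. This assumes that only the two reactions quoted above control CH3CCH: the dissociative recombination of C3H5+ (rate coefficient $k_1$) and the reaction with atomic carbon (rate coefficient $k_2$). It is an illustration, not the full pyRate network.

$$
\frac{\mathrm{d}\,n(\mathrm{CH_3CCH})}{\mathrm{d}t} = k_1\, n(\mathrm{C_3H_5^+})\, n_e - k_2\, n(\mathrm{C})\, n(\mathrm{CH_3CCH}) = 0
\quad\Longrightarrow\quad
n(\mathrm{CH_3CCH}) = \frac{k_1\, n(\mathrm{C_3H_5^+})\, n_e}{k_2\, n(\mathrm{C})}
$$

In this limit, at a fixed formation rate, the CH3CCH density scales as $1/n(\mathrm{C})$ and the destruction timescale is $1/[k_2\, n(\mathrm{C})]$. Gas rich in atomic carbon, such as the irradiated carbon-chain peak, therefore keeps CH3CCH low, while shielded gas with little free carbon allows it to build up. The rate coefficients are left symbolic because they are not given in this section.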
The fact that CH3CCH is an early-type molecule explains why, in our dataset, extended CH3CCH emission is observed only in the starless cores (B68, L1521E) and in the prestellar core L1544, where it may be replenished by accreted material. In contrast, in the other prestellar cores (HMM-1, OphD, L694-2, L429), we detect some emission at the respective dust peaks, but no significant emission beyond that. This evolutionary trend is further supported by the CH3CCH abundances at the dust peaks (shown in Fig. 3), where the abundances in the starless cores are approximately one order of magnitude higher than those in the prestellar cores, except for L1544.
6 Conclusions
We have presented an analysis of molecular differentiation using the density-based clustering algorithms DBSCAN and HDBSCAN. The clustering was applied to four different datasets in order to compare the emission morphologies of c-C3H2, CH3OH, and CH3CCH observed towards the starless cores B68 and L1521E and the prestellar core L1544.
Our main results can be summarised as follows:
- The analysis with density-based clustering finds significant chemical differentiation across the cores in our dataset. It successfully reproduces the known molecular segregation of c-C3H2 and CH3OH in B68, L1521E, and L1544. Furthermore, it identifies a segregation between c-C3H2 and CH3CCH in all three cores, which is not apparent from visual inspection of the emission maps;
- The most relevant features in the clustering analysis are integrated intensity, velocity offset, H2 column density, and H2 column density gradient. Distinct and recurring cluster structures in the H2 column density and its gradient highlight structural and chemical patterns across the cores. Differences in the relevance of these two features between the three cores reflect the varying environmental conditions within each core. The strong relation between molecular emission and velocity structure suggests that static chemical models are not sufficient to understand anisotropic chemical structures, and that dynamical models are necessary;
- Higher CH3CCH abundances towards the starless cores than towards the prestellar cores indicate an evolutionary trend, while the elevated CH3CCH abundance towards L1544 suggests an additional influence of environmental factors. In L1544, the CH3CCH peak in the north-west of the core appears to trace the landing point of chemically fresh gas that is accreted onto the core. Unlike the photochemically active south of the core, this area is shielded from external irradiation, which protects CH3CCH from destruction by free carbon atoms;
- The clustering analysis finds similar behaviour for CH3OH and CH3CCH relative to c-C3H2 in all cores. This indicates that c-C3H2 traces an outer layer of gas, and possibly a lower-density shell, compared to the other two molecules. In L1544, the similar clustering patterns of CH3OH and CH3CCH may reflect the influence of accretion processes in shaping the molecular distribution.
Our results demonstrate that a successful density-based clustering approach for studying astrochemical processes does not require a large dataset covering many molecules across various cores. In fact, the results are often easier to interpret when only two or three molecules are considered. While this clustering method is more time-consuming than techniques such as principal component analysis, it can exploit much more detailed information and provide deeper insights into the structure of the cores.
Using the more general approach of describing a data point’s location through its H2 column density and its gradient, rather than relying on spatial coordinates, also enables simple comparisons between cores. In future studies, we aim to expand our analysis of molecular differentiation with density-based clustering to include more cores and molecules, especially those that trace other physical or chemical features. This will also help explore any evolutionary effects that the cores or their environment might have on the molecular distribution.
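Describing each sample by its physical environment rather than its sky position can be sketched as follows: every selected pixel becomes one row holding its integrated intensity, velocity offset, N(H2), and |∇N(H2)|, and the pixel coordinates are discarded, so rows from different cores and molecules can be pooled before clustering. The helper names, the four-column layout, and the requirement that all maps share one grid are illustrative assumptions.

```python
import numpy as np


def build_features(intensity, v_offset, n_h2, grad_n_h2, mask):
    """Turn co-aligned 2D maps into an (N_pixels, 4) feature table.

    Only the values of the selected pixels are kept; their (x, y) positions are
    dropped, so tables from different cores share the same feature space.
    """
    maps = (intensity, v_offset, n_h2, grad_n_h2)
    return np.column_stack([np.asarray(m, dtype=float)[mask] for m in maps])


def stack_samples(tables):
    """Concatenate {label: table} entries (e.g. per molecule or core) and keep a label per row."""
    rows = np.vstack(list(tables.values()))
    labels = np.concatenate([np.full(len(t), name) for name, t in tables.items()])
    return rows, labels
```

The `mask` would typically select pixels above a detection threshold; the row labels are carried alongside so that cluster membership can later be traced back to a molecule or core.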
Data availability
Figures C.3-C.14, presenting the detailed clustering results for Cases 1-4, are published on Zenodo (zenodo.org/records/15519030).
Acknowledgements
We wish to thank the anonymous referee for their constructive comments. K.G. thanks Caroline Gieser for useful discussions. S.S. and K.G. wish to thank the Max Planck Society for the Max Planck Research Group funding. All other authors affiliated with the MPE wish to thank the Max Planck Society for financial support.
Appendix A Integrated intensity maps
The integrated intensity maps observed towards B68, L1521E, and L1544 are shown in Fig. A.1 (c-C3H2 and CH3OH), and in Fig. A.2 (CH3CCH).
Fig. A.1 Integrated intensity maps of c-C3H2 and CH3OH observed towards B68 (Spezzano et al. 2020), L1521E (Nagy et al. 2019), and L1544 (Spezzano et al. 2016). The solid line contours indicate the 3σ level of the integrated intensity, except for CH3OH in L1544, where they indicate the 9σ level. The dashed line contours represent 90%, 50%, and 30% of the H2 column density peak derived from Herschel maps (Spezzano et al. 2020). The white circle in the bottom-left corner indicates the beam size of the IRAM 30 m telescope (32").
Fig. A.2 Integrated intensity maps of CH3CCH observed towards B68, L1521E (Nagy et al. 2019), and L1544 (Spezzano et al. 2017). The solid line contours indicate the 3σ level of the integrated intensity for B68 and L1521E, and the 6σ level for L1544. The dashed line contours represent 90%, 50%, and 30% of the H2 column density peak derived from Herschel maps (Spezzano et al. 2020). The white circle in the bottom-left corner indicates the beam size of the IRAM 30 m telescope (32").
Appendix B H2 column density gradient maps
Figure B.1 shows the H2 column density gradient maps derived from Herschel SPIRE maps (see Spezzano et al. 2016, 2020) and the corresponding H2 column density maps, for B68, L1521E, and L1544, respectively.
Fig. B.1 H2 column density maps (left) derived from Herschel SPIRE maps (Spezzano et al. 2016, 2020) and the corresponding H2 column density gradient maps (right) for B68 (top), L1521E (middle), and L1544 (bottom). The dashed line contours represent 90%, 50%, and 30% of the H2 column density peak. The red rectangle marks the location and size of the emission maps observed towards each core. The red circle in the bottom-left corner indicates the Herschel beam size (40").
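For illustration only, since the exact recipe behind Fig. B.1 is not restated here, a gradient-magnitude map can be derived from an N(H2) map with finite differences. The function name is hypothetical and the pixel scale is an input; expressing the result per parsec rather than per arcsecond would additionally require the core distance.

```python
import numpy as np


def column_density_gradient(n_h2, pixel_arcsec):
    """|grad N(H2)| in column-density units per arcsec, using central differences.

    np.gradient returns the derivative along axis 0 (y) first, then axis 1 (x);
    one-sided differences are used at the map edges.
    """
    d_dy, d_dx = np.gradient(np.asarray(n_h2, dtype=float), pixel_arcsec)
    return np.hypot(d_dx, d_dy)
```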
Appendix C Detailed clustering results
The clustering results for Cases 1 and 2 are visualised in Figs. C.1 and C.2 for feature combinations 2, 3, 4, 9, and 10, for L1521E and L1544, respectively. The remaining results for Cases 1 and 2, together with those for Cases 3 and 4, are published on Zenodo in Figs. C.3-C.14. The molecular ratio in each cluster and the number of data points assigned to it are given in Table C.1 for all feature combinations and Cases.
Table C.1 Cluster content for all Cases and feature combinations.
Fig. C.1 Clustering results for the starless core L1521E for the dataset of Case 1 (left) and the dataset of Case 2 (right), for feature combinations 2, 3, 4, 9, and 10. Each row represents a different combination of features (see Table 3): combination 2 (intensity, Voffset, and dist2dust), combination 3 (intensity, Voffset, and |
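Table C.1 summarises each cluster by its number of data points and its molecular ratio. A minimal way to assemble such a summary from the clustering output is sketched below; the function name is hypothetical, and label -1 follows the usual DBSCAN/HDBSCAN convention for noise (as in scikit-learn).

```python
from collections import Counter, defaultdict


def cluster_content(cluster_labels, molecule_labels):
    """Per cluster: number of data points and the fraction contributed by each molecule.

    Label -1 is treated like any other key; in DBSCAN/HDBSCAN output it denotes noise.
    """
    members = defaultdict(list)
    for cluster, molecule in zip(cluster_labels, molecule_labels):
        members[cluster].append(molecule)
    table = {}
    for cluster in sorted(members):
        counts = Counter(members[cluster])
        n = len(members[cluster])
        table[cluster] = (n, {mol: c / n for mol, c in sorted(counts.items())})
    return table
```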
Appendix D Spectra at dust peak
The spectra of c-C3H2 and CH3CCH observed towards the dust peaks of B68, L1521E, L1544, OphD, HMM1, L694-2, and L429 are shown in Fig. D.1. The observed data cubes are convolved with the Herschel beam size (40"), then a circular aperture with radius 8" is used to extract the spectra.
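Assuming Gaussian beams, smoothing from the 32" IRAM 30 m resolution to the 40" Herschel resolution requires a kernel of FWHM sqrt(40² − 32²) = 24". The sketch below applies that kernel to each channel with `scipy.ndimage.gaussian_filter` and then averages the pixels inside an 8" radius aperture. The helper names, averaging (rather than summing) inside the aperture, the pixel scale, and the aperture centre are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

FWHM_TO_SIGMA = 1.0 / np.sqrt(8.0 * np.log(2.0))  # ~0.4247


def kernel_fwhm(target_arcsec, native_arcsec):
    """FWHM of the Gaussian kernel that turns a native Gaussian beam into the target one."""
    return np.sqrt(target_arcsec**2 - native_arcsec**2)


def smooth_cube(cube, pixel_arcsec, target_arcsec=40.0, native_arcsec=32.0):
    """Convolve each channel of a (n_chan, ny, nx) cube to the target resolution."""
    sigma_pix = kernel_fwhm(target_arcsec, native_arcsec) * FWHM_TO_SIGMA / pixel_arcsec
    # sigma 0 on axis 0: no smoothing along the spectral axis;
    # the normalised kernel keeps brightness temperatures in K
    return gaussian_filter(np.asarray(cube, dtype=float), sigma=(0, sigma_pix, sigma_pix))


def aperture_spectrum(cube, x0, y0, radius_arcsec, pixel_arcsec):
    """Mean spectrum of the pixels whose centres lie within a circle around (x0, y0)."""
    _, ny, nx = cube.shape
    yy, xx = np.mgrid[0:ny, 0:nx]
    inside = (xx - x0) ** 2 + (yy - y0) ** 2 <= (radius_arcsec / pixel_arcsec) ** 2
    return cube[:, inside].mean(axis=1)
```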
Fig. D.1 Spectra of CH3CCH (left) and c-C3H2 (right) at the dust peak of each core (black) extracted within a circular aperture of radius 8" and the corresponding Gaussian fit (cyan). The 3σ level is indicated by the grey dotted line. The systemic velocity with respect to the line chosen for analysis is shown by the red dotted line. The Gaussian fit parameters are annotated for each line.
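Gaussian line parameters like those annotated in Fig. D.1 can be obtained with a single-component least-squares fit. The sketch below uses `scipy.optimize.curve_fit` as a swapped-in tool, not necessarily the software used for the figure; the function names are hypothetical, the rms is assumed to have been measured beforehand from line-free channels, and the 3σ flag simply mirrors the grey dotted level in the figure.

```python
import numpy as np
from scipy.optimize import curve_fit

FWHM_PER_SIGMA = np.sqrt(8.0 * np.log(2.0))  # ~2.355


def gaussian(v, amp, v0, sigma):
    return amp * np.exp(-0.5 * ((v - v0) / sigma) ** 2)


def fit_line(velocity, spectrum, rms):
    """Single-Gaussian fit; returns (peak, centroid, FWHM), their 1-sigma errors, and a 3-sigma flag."""
    v = np.asarray(velocity, dtype=float)
    t = np.asarray(spectrum, dtype=float)
    i = int(np.argmax(t))
    p0 = [t[i], v[i], 3.0 * abs(v[1] - v[0])]  # start at the brightest channel
    popt, pcov = curve_fit(gaussian, v, t, p0=p0,
                           sigma=np.full_like(t, rms), absolute_sigma=True)
    perr = np.sqrt(np.diag(pcov))
    amp, v0, sig = popt
    detected = amp >= 3.0 * rms  # peak above the 3-sigma level
    return ((amp, v0, FWHM_PER_SIGMA * abs(sig)),
            (perr[0], perr[1], FWHM_PER_SIGMA * perr[2]),
            bool(detected))
```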
References
- Alves, F. O., & Franco, G. A. P. 2007, A&A, 470, 597
- André, P., Ward-Thompson, D., & Barsony, M. 2000, in Protostars and Planets IV, eds. V. Mannings, A. P. Boss, & S. S. Russell, 59
- André, P., Men’shchikov, A., Bontemps, S., et al. 2010, A&A, 518, L102
- Bauer, A., & Burie, J. 1969, C. R. Acad. Sci. Paris, B 268, 800
- Bron, E., Daudon, C., Pety, J., et al. 2018, A&A, 610, A12
- Campello, R. J. G. B., Moulavi, D., & Sander, J. 2013, in Advances in Knowledge Discovery and Data Mining, eds. J. Pei, V. S. Tseng, L. Cao, H. Motoda, & G. Xu (Berlin, Heidelberg: Springer Berlin Heidelberg), 160
- Caselli, P., Walmsley, C. M., Zucconi, A., et al. 2002, ApJ, 565, 331
- Caselli, P., Keto, E., Bergin, E. A., et al. 2012, ApJ, 759, L37
- Caselli, P., Pineda, J. E., Sipilä, O., et al. 2022, ApJ, 929, 13
- Chacón-Tanarro, A., Caselli, P., Bizzocchi, L., et al. 2019, A&A, 622, A141
- Cleeves, L. I., Bergin, E. A., Alexander, C. M. O. D., et al. 2014, Science, 345, 1590
- Colombo, D., Rosolowsky, E., Ginsburg, A., Duarte-Cabral, A., & Hughes, A. 2015, MNRAS, 454, 2067
- Crapsi, A., Caselli, P., Walmsley, C. M., et al. 2005, ApJ, 619, 379
- Crapsi, A., Caselli, P., Walmsley, M. C., & Tafalla, M. 2007, A&A, 470, 221
- Drozdovskaya, M. N., Schroeder I, I. R. H. G., Rubin, M., et al. 2021, MNRAS, 500, 4901
- Drozdovskaya, M. N., Coudert, L. H., Margulès, L., et al. 2022, A&A, 659, A69
- Ester, M., Kriegel, H.-P., Sander, J., & Xu, X. 1996, Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, KDD-96 (AAAI Press), 226
- Fotopoulou, S. 2024, Astron. Comput., 48, 100851
- Galli, P. A. B., Loinard, L., Ortiz-León, G. N., et al. 2018, ApJ, 859, 33
- Galli, P. A. B., Loinard, L., Bouy, H., et al. 2019, A&A, 630, A137
- Ginsburg, A., Koch, E., Robitaille, T., et al. 2019, https://doi.org/10.5281/zenodo.2573901
- Hirota, T., Ito, T., & Yamamoto, S. 2002, ApJ, 565, 359
- Keto, E., & Caselli, P. 2008, ApJ, 683, 238
- Keto, E., Caselli, P., & Rawlings, J. 2015, MNRAS, 446, 3731
- Lada, C. J., Bergin, E. A., Alves, J. F., & Huard, T. L. 2003, ApJ, 586, 286
- Lee, C. W., Myers, P. C., & Tafalla, M. 2001, ApJS, 136, 703
- Lin, Y., Spezzano, S., Sipilä, O., Vasyunin, A., & Caselli, P. 2022, A&A, 665, A131
- Mangum, J. G., & Shirley, Y. L. 2015, PASP, 127, 266
- McInnes, L., Healy, J., & Astels, S. 2017, J. Open Source Softw., 2
- McKinney, W. 2010, in Proceedings of the 9th Python in Science Conference, eds. S. van der Walt & J. Millman, 56-61
- Moulavi, D., Jaskowiak, P. A., Campello, R. J. G. B., Zimek, A., & Sander, J. 2014, Density-Based Clustering Validation, 839
- Müller, H. S. P., Thorwirth, S., Roth, D. A., & Winnewisser, G. 2001, A&A, 370, L49
- Nagy, Z., Spezzano, S., Caselli, P., et al. 2019, A&A, 630, A136
- Ohashi, N., Lee, S. W., Wilner, D. J., & Hayashi, M. 1999, ApJ, 518, L41
- Okoda, Y., Oya, Y., Sakai, N., Watanabe, Y., & Yamamoto, S. 2020, ApJ, 900, 40
- Okoda, Y., Oya, Y., Abe, S., et al. 2021, ApJ, 923, 168
- Pedregosa, F., Varoquaux, G., Gramfort, A., et al. 2011, J. Mach. Learn. Res., 12, 2825
- Pety, J. 2005, in SF2A-2005: Semaine de l’Astrophysique Francaise, eds. F. Casoli, T. Contini, J. M. Hameury, & L. Pagani, 721
- Punanova, A., Caselli, P., Feng, S., et al. 2018, ApJ, 855, 112
- Redaelli, E., Sipilä, O., Padovani, M., et al. 2021, A&A, 656, A109
- Semenov, D., Hersant, F., Wakelam, V., et al. 2010, A&A, 522, A42
- Sipilä, O., Caselli, P., & Harju, J. 2015, A&A, 578, A55
- Soler, J. D., Hennebelle, P., Martin, P. G., et al. 2013, ApJ, 774, 128
- Spezzano, S., Bizzocchi, L., Caselli, P., Harju, J., & Brünken, S. 2016, A&A, 592, L11
- Spezzano, S., Caselli, P., Bizzocchi, L., Giuliano, B. M., & Lattanzi, V. 2017, A&A, 606, A82
- Spezzano, S., Caselli, P., Pineda, J. E., et al. 2020, A&A, 643, A60
- Tafalla, M., & Santiago, J. 2004, A&A, 414, L53
- Thaddeus, P., Vrtilek, J. M., & Gottlieb, C. A. 1985, ApJ, 299, L63
- Valdivia-Mena, M. T., Pineda, J. E., Segura-Cox, D. M., et al. 2023, A&A, 677, A92
- Wakelam, V., Loison, J. C., Herbst, E., et al. 2015, ApJS, 217, 20
- Williams, J. P., Myers, P. C., Wilner, D. J., & Di Francesco, J. 1999, ApJ, 513, L61
- Xu, L.-H., & Lovas, F. 1997, J. Phys. Chem. Ref. Data, 26, 17
- Yan, Q.-Z., Yang, J., Su, Y., et al. 2022, AJ, 164, 55
- Yun, H.-S., & Lee, J.-E. 2023, ApJ, 958, 113
All Figures
Fig. 1 Comparison of different features for B68 (left), L1521E (middle), and L1544 (right), with c-C3H2 given in green (circle), CH3OH in blue (cross), and CH3CCH in red (diamond).
Fig. 2 Clustering results for the starless core B68 for the dataset of Case 1 (left) and the dataset of Case 2 (right) for feature combinations 2, 3, 4, 9, and 10. Each row represents a different combination of features (see Table 3): combination 2 (intensity, Voffset, and dist2dust), combination 3 (intensity, Voffset, and |
Fig. 3 Abundances of CH3CCH and c-C3H2 at the dust peaks of the cores. The molecular column density was calculated assuming Tex = 8 K (see text for details). The starless cores are marked with an asterisk.
Fig. 4 Centroid velocities (top) and linewidths (bottom) of c-C3H2 (left), CH3OH (middle), and CH3CCH (right) towards the prestellar core L1544. Black contours show 50% and 90% of the respective molecular emission peak. White contours show 30%, 50%, and 90% of the H2 column density peak derived from Herschel maps (Spezzano et al. 2016). The white circle in the bottom-left corner indicates the beam size of the IRAM 30 m telescope (32″).
Fig. C.2 Same as in Fig. C.1 but for the prestellar core L1544.