Chemical segregation analysed with unsupervised clustering

K. Giers; S. Spezzano; Y. Lin; M. T. Valdivia-Mena; P. Caselli; O. Sipilä

doi:10.1051/0004-6361/202553843


Volume 699 (July 2025)

A&A, 699 (2025) A103


Open Access

Issue		A&A Volume 699, July 2025


Article Number		A103
Number of page(s)		21
Section		Interstellar and circumstellar matter
DOI		https://doi.org/10.1051/0004-6361/202553843
Published online		04 July 2025


K. Giers¹★, S. Spezzano¹, Y. Lin¹, M. T. Valdivia-Mena¹,², P. Caselli¹ and O. Sipilä¹

¹ Max-Planck-Institute for Extraterrestrial Physics, Giessenbachstrasse 1, 85748 Garching, Germany
² European Southern Observatory, Karl-Schwarzschild-Strasse 2, 85748 Garching, Germany

★ Corresponding author: kgiers@mpe.mpg.de

Received: 21 January 2025
Accepted: 23 May 2025

Abstract

Context. Molecular emission is a powerful tool for studying the physical and chemical structures in cold and dense cores. The distribution and abundance of different molecular species provide information on the chemical composition and physical properties in these cores.

Aims. We study the chemical segregation of three molecules – c-C₃H₂, CH₃OH, and CH₃CCH – in the two starless cores B68 and L1521E, and the prestellar core L1544.

Methods. We applied the density-based clustering algorithms DBSCAN and HDBSCAN to identify chemical and physical structures within these cores. To enable cross-core comparisons, the clustering input samples were characterised based on their physical environment, discarding the two-dimensional spatial information.

Results. Clustering analysis showed significant chemical differentiation across the cores. The clustering successfully reproduces the known molecular segregation of c-C₃H₂ and CH₃OH in all three cores. Furthermore, it identifies a segregation between c-C₃H₂ and CH₃CCH, which is not apparent from the emission maps. Key features driving the clustering are integrated intensity, velocity offset, H₂ column density, and H₂ column density gradient. Different environmental conditions are reflected in the variations in the feature relevance across the cores.

Conclusions. This study shows that density-based clustering provides valuable insights into the chemical and physical structures of starless cores. It demonstrates that even small datasets covering only two or three molecules can yield meaningful results. In fact, this new approach revealed similarities in the clustering patterns of CH₃OH and CH₃CCH relative to c-C₃H₂, suggesting that c-C₃H₂ traces outer layers or lower-density regions compared to the other two molecules. This provided insight into the CH₃CCH peak in L1544, which appears to trace a landing point of chemically fresh gas accreted onto the core, highlighting the impact of accretion processes on molecular distributions.

Key words: astrochemistry / stars: formation / ISM: abundances / ISM: clouds / ISM: molecules

© The Authors 2025

Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This article is published in open access under the Subscribe to Open model.

Open Access funding provided by Max Planck Society.

1 Introduction

To understand star and planetary-system formation, it is crucial to understand the physics and chemistry of star-forming regions. Particularly important are starless dense cores, as they represent the earliest stages of star formation and set its initial conditions. In starless cores, the chemistry and physics can be studied without the complications caused by protostellar feedback. Molecular line emission is a powerful tool for studying the structure of these dense cores. The distribution and abundance of different molecular species provide information on their chemical composition and physical properties (e.g. Crapsi et al. 2007; Redaelli et al. 2021; Lin et al. 2022). In addition, line emission can help trace the chemical evolution of molecules. This has been shown, for example, for water (Cleeves et al. 2014) and methanol (Drozdovskaya et al. 2021, 2022) in the Solar System, which were inherited from the prestellar phase. Prestellar cores are a subset of starless cores that are gravitationally bound and on the verge of star formation (e.g. Andre et al. 2000; Keto & Caselli 2008). This makes prestellar cores dynamically evolved, with higher central densities and more pronounced temperature and velocity gradients than unbound starless cores (see e.g. Crapsi et al. 2007).

A well-studied example is the prestellar core L1544 in the Taurus Molecular Cloud. Its molecular emission maps have led to the understanding of its evolutionary status, which is close to gravitational collapse (Williams et al. 1999; Ohashi et al. 1999), as well as its physical structure (volume density, velocity, and the dust and gas temperature profiles; e.g. Crapsi et al. 2007; Keto & Caselli 2008; Keto et al. 2015; Chacón-Tanarro et al. 2019). Spezzano et al. (2016) observed a striking chemical differentiation between the carbon-bearing molecules c-C₃H₂ and CH₃OH in L1544, driven by differences in the external illumination of the core. This chemical segregation has also been observed in other starless cores, linking it to their environments (Spezzano et al. 2020). In Spezzano et al. (2017), the authors identified four molecular families in L1544, classified by the location of their emission peaks (carbon-chain peak, CH₃OH peak, dust emission peak, and HNCO peak). Using principal component analysis, they found correlations between the different families and with the physical properties of the core. A similar analysis of the starless core L1521E by Nagy et al. (2019) also revealed a chemical differentiation between the c-C₃H₂ peak, the CH₃OH peak, and the dust emission peak.

In the big data era of astronomy, statistical methods are essential for analysing and interpreting the vast amounts of observational and simulated data. The rapid advancements in machine learning provide novel approaches to study the molecular complexity during the early stages of star formation. Unsupervised learning algorithms (clustering techniques in particular) are increasingly applied to identify hidden patterns, structures, and relationships in multidimensional datasets (e.g. see review by Fotopoulou 2024). By grouping data points based on similarities in various physical and chemical parameters, these methods help reveal subtle trends that might otherwise go undetected.

In astrochemistry, clustering methods have primarily been applied to large-scale surveys, for instance, to identify molecular clouds in an unbiased and systematic way (e.g. Colombo et al. 2015; Bron et al. 2018; Yan et al. 2022). By isolating distinct structures in position-position-velocity space, these methods segment clouds into regions with similar physical or chemical properties. Furthermore, Valdivia-Mena et al. (2023) have connected filament scales (<0.1 pc) with envelope scales (>100 au) by identifying (velocity-) coherent structures of inflowing material, known as streamers, through the clustering of molecular emission. Meanwhile, Okoda et al. (2020, 2021) applied principal component analysis to molecular line emission, characterising velocity structures and distinct features surrounding a protostar. This method has also been used to disentangle overlapping kinematic components and to understand spectral variations (e.g. Yun & Lee 2023), providing deeper insight into the dynamics of star-forming regions.

In this work, we apply the unsupervised clustering algorithms Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Hierarchical DBSCAN (HDBSCAN) to investigate the chemical segregation and differences between the molecular emission of c-C₃H₂, CH₃OH, and CH₃CCH towards the starless cores B68 and L1521E and the prestellar core L1544. The molecules were chosen as representatives of three of the molecular families found in L1544 (see Spezzano et al. 2017). Our goal is to study the chemical segregation previously observed for c-C₃H₂ and CH₃OH from a different perspective using clustering techniques. Additionally, we aim to investigate the less understood differentiation between the two carbon chains, c-C₃H₂ and CH₃CCH. In our analysis, we take a novel approach by dropping the two-dimensional spatial information and instead characterising each pixel of the emission maps based on its physical parameters. By concentrating on the core scale and incorporating both starless and prestellar cores, we study the influence of different evolutionary stages on the molecular emission and distribution as well as the effect of different environments.

In Sect. 2, we describe the data and the sources analysed in this work. In Sect. 3, we explain the preprocessing we applied to our dataset and what methods we used for the unsupervised clustering with DBSCAN and HDBSCAN. The results of the dataset analysis and the clustering are presented in Sect. 4. We discuss the results and their implications in Sect. 5 and present our conclusions in Sect. 6.

Table 1

Source sample.

Table 2

Spectroscopic parameters of the observed lines.

2 Observations and data reduction

2.1 Data

The data presented in this work were taken with the IRAM 30m single-dish radio telescope on Pico Veleta in the Sierra Nevada, Spain. The observations were carried out between October 2013 and April 2018 (PIs: Silvia Spezzano, Zofia Nagy). The data have also been used in Spezzano et al. (2017), Nagy et al. (2019), and Spezzano et al. (2020). The on-the-fly (OTF) maps were observed in position switching mode, using the EMIR E090 receiver and the Fourier transform spectrometer (FTS) backend with a spectral resolution of 50 kHz. The map sizes, sources and coordinates are listed in Table 1. The observed transitions are summarised in Table 2.

The data processing was done using the GILDAS software (Pety 2005) and the Python packages pandas (Wes McKinney 2010) and spectral-cube (Ginsburg et al. 2019). All emission maps were gridded to a pixel size of 8″ with the CLASS software of the GILDAS package; this corresponds to one-third to one-quarter of the actual beam size, depending on the frequency. To create a uniform dataset, we additionally resampled the data to a spectral resolution of 0.18 km s⁻¹, corresponding to the resolution of the lowest-frequency observation (82 GHz). The antenna temperature $T_A^*$ was converted to the main-beam temperature $T_\mathrm{mb}$ using the relation $T_\mathrm{mb}=(F_\mathrm{eff}/B_\mathrm{eff})\,T_A^*$. The corresponding values for the 30m forward ($F_\mathrm{eff}$) and main-beam ($B_\mathrm{eff}$) efficiencies are given in Table 2.
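Two numbers in this paragraph follow from simple relations: the velocity width of a 50 kHz channel at 82 GHz, $\Delta v = c\,\Delta\nu/\nu$, and the temperature-scale conversion $T_\mathrm{mb}=(F_\mathrm{eff}/B_\mathrm{eff})\,T_A^*$. The minimal sketch below reproduces both; the efficiency values in the example are hypothetical placeholders, since the actual values are listed in Table 2.

```python
import numpy as np

C_KM_S = 299_792.458  # speed of light in km/s


def channel_width_kms(df_hz: float, nu_hz: float) -> float:
    """Velocity width of one spectrometer channel: dv = c * df / nu."""
    return C_KM_S * df_hz / nu_hz


def ta_to_tmb(ta_star, f_eff: float, b_eff: float):
    """Convert antenna temperature T_A* to main-beam temperature T_mb = (F_eff / B_eff) * T_A*."""
    return (f_eff / b_eff) * np.asarray(ta_star, dtype=float)


# 50 kHz FTS channels at the lowest observed frequency (82 GHz)
dv = channel_width_kms(50e3, 82e9)
print(f"{dv:.3f} km/s")  # 0.183 km/s

# Hypothetical efficiencies for illustration only; the real values are in Table 2
tmb = ta_to_tmb([0.10, 0.25], f_eff=0.95, b_eff=0.80)
```

The 0.18 km s⁻¹ grid adopted for all cubes is therefore simply the native channel width of the lowest-frequency line.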

2.2 Sources

The source sample is listed in Table 1. It comprises the prestellar core L1544 (following the definition of Crapsi et al. 2005) and the two starless cores L1521E and B68, which are located in different star-forming regions (Taurus and Ophiuchus). The sample therefore covers different evolutionary stages and environmental conditions.

L1521E and L1544 are located in the Taurus molecular cloud. Both cores lie at the edge of their filament, which exposes their southern sides to the local interstellar radiation field (ISRF). The higher illumination increases the amount of atomic carbon in the gas phase and subsequently enhances the abundances of carbon chains such as c-C₃H₂, as discussed in Spezzano et al. (2020). L1521E was classified as a very young core because of its high abundances of carbon-chain molecules and low level of CO depletion (Hirota et al. 2002; Tafalla & Santiago 2004; Nagy et al. 2019). The well-studied prestellar core L1544, on the other hand, is more evolved and shows signs of contraction, suggesting that it is on the verge of star formation (Williams et al. 1999; Ohashi et al. 1999; Lee et al. 2001; Caselli et al. 2002, 2012). Within the central 1000 au, L1544 exhibits an almost total (99.99%) freeze-out of all species heavier than helium (Caselli et al. 2022).

The isolated starless core B68 is located in the south of the Ophiuchus molecular cloud. It shows kinematic signatures of oscillation, indicating a stage prior to contraction (Lada et al. 2003; Keto & Caselli 2008). As an isolated Bok globule, it can be assumed to be exposed to uniform external illumination.

3 Methodology

3.1 Preprocessing

The integrated intensity maps were generated by calculating the zeroth moment of the emission data cubes. We used a map only when the emission extended over at least one telescope beam, as our focus in this work is primarily on the spatial distribution of the molecules. To facilitate comparisons between the molecules, all maps were convolved to an angular resolution of 32″, corresponding to the half-power beam width (HPBW) of the largest observed beam. The resulting integrated intensity maps of c-C₃H₂, CH₃OH, and CH₃CCH are shown in Figs. A.1 and A.2.
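The two preprocessing steps above can be sketched with NumPy and SciPy, assuming Gaussian beams: the zeroth moment is the channel sum times the channel width, and a map observed with a native beam $\theta_\mathrm{nat}$ is brought to 32″ by convolving with a Gaussian of FWHM $\sqrt{32^2-\theta_\mathrm{nat}^2}$. The helper names and the 26″ native beam used in the check are hypothetical; the paper performed these steps with GILDAS and spectral-cube, not with this code.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

FWHM_TO_SIGMA = 1.0 / (2.0 * np.sqrt(2.0 * np.log(2.0)))  # ~0.4247


def moment0(cube, dv_kms):
    """Zeroth moment of a (n_chan, ny, nx) T_mb cube: channel sum times channel width (K km/s)."""
    return np.nansum(cube, axis=0) * dv_kms


def kernel_fwhm(target_arcsec, native_arcsec):
    """FWHM of the Gaussian kernel that turns a native Gaussian beam into the target beam."""
    if native_arcsec > target_arcsec:
        raise ValueError("native beam is larger than the target beam")
    return np.sqrt(target_arcsec**2 - native_arcsec**2)


def smooth_to_beam(image, native_arcsec, target_arcsec=32.0, pixel_arcsec=8.0):
    """Degrade a 2D map on an 8" pixel grid to a common 32" resolution."""
    sigma_pix = kernel_fwhm(target_arcsec, native_arcsec) * FWHM_TO_SIGMA / pixel_arcsec
    # unit-sum kernel, so a brightness-temperature map keeps its K scale
    return gaussian_filter(image, sigma=sigma_pix, mode="nearest")
```

For a hypothetical 26″ native beam, the kernel FWHM is about 18.7″, i.e. a standard deviation of roughly one 8″ pixel.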

To compare the distribution of molecular emission across three different cores with varying map sizes and distances, we treated the individual pixels of each map as input samples and discarded their two-dimensional spatial information. Instead, we described the environment of each pixel using the projected distance to the dust peak, the H₂ column density, and the H₂ column density gradient at that position. Additionally, we used the Gaussian fit parameters (velocity and linewidth) obtained by fitting each pixel's spectrum with a one-dimensional Gaussian profile. This approach is valid because all cores in our sample display only one velocity component along the line of sight. For the analysis, we selected only fits with a signal-to-noise ratio of at least three and a relative error below 70%.
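A per-pixel Gaussian fit with the two quality cuts could look like the sketch below, which uses `scipy.optimize.curve_fit`. How the paper defines the signal-to-noise ratio and the "error below 70%" is not specified here, so both are assumptions: S/N is taken as peak amplitude over the rms noise, and the error cut is applied to the relative 1σ uncertainties of the amplitude and width.

```python
import numpy as np
from scipy.optimize import curve_fit

SIGMA_TO_FWHM = 2.0 * np.sqrt(2.0 * np.log(2.0))  # ~2.355


def gaussian(v, amp, v0, sigma):
    """One-component Gaussian line profile."""
    return amp * np.exp(-0.5 * ((v - v0) / sigma) ** 2)


def fit_pixel(velocity, spectrum, rms, snr_min=3.0, max_rel_err=0.7):
    """Fit one Gaussian to a pixel's spectrum.

    Returns (v_lsr, v_lsr_err, fwhm), or None if the fit fails the (assumed) quality cuts:
    amplitude / rms >= snr_min and relative errors of amplitude and width below max_rel_err.
    """
    i_peak = int(np.argmax(spectrum))
    p0 = [spectrum[i_peak], velocity[i_peak], 0.2]  # start from the brightest channel
    try:
        popt, pcov = curve_fit(gaussian, velocity, spectrum, p0=p0,
                               sigma=np.full(spectrum.shape, rms), absolute_sigma=True)
    except RuntimeError:  # the fit did not converge
        return None
    perr = np.sqrt(np.diag(pcov))
    if not np.all(np.isfinite(perr)):  # undefined covariance
        return None
    amp, v0, width = popt[0], popt[1], abs(popt[2])
    if amp / rms < snr_min:
        return None
    if perr[0] / abs(amp) > max_rel_err or perr[2] / width > max_rel_err:
        return None
    return v0, perr[1], width * SIGMA_TO_FWHM
```

The returned velocity uncertainty is what a later cut on the fitted line position would use, and the FWHM is the linewidth feature.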

We used six input features in the clustering dataset. The following features characterise one input sample or emission pixel:

Integrated intensity. The intensity was integrated over a range of ±0.7 km s⁻¹ around the centroid velocity. While the expected linewidth for optically thin emission in our cores is around 0.5 km s⁻¹, we used a wider interval to account for line shifts due to the velocity gradients across the cores. For the clustering, we selected only pixels with a signal-to-noise ratio of at least three. Each integrated intensity map was then normalised using the MinMaxScaler from the scikit-learn preprocessing package, which scales the values to the range from zero to one, before being added to the clustering dataset. This normalisation removes information about the absolute brightness of the molecular emission and the intensity ratios between different molecules, focusing instead on the distribution of emission across the core. In the analysis, this feature is referred to as ‘intensity’.

Velocity offset with respect to the source’s V_LSR. Here, we applied a selection criterion requiring the uncertainty in the fitted velocity position, V_LSR, to be smaller than 0.08 km s⁻¹ (i.e. the channel width divided by 2.355). To calculate the relative velocity offset of the emission line, we subtracted the systemic velocity from the velocity position: V_offset = V_LSR − V_sys. The systemic velocities of the cores are listed in Table 1. In the analysis, this feature is referred to as V_offset.

Linewidth. The minimum linewidth was set to be the spectral resolution of the spectra (0.18 km s⁻¹). In the analysis, this feature is referred to as ‘linewidth’.

H₂ column density. We used the H₂ column density maps derived from Herschel SPIRE maps (Spezzano et al. 2016, 2020), which are shown in Fig. B.1. Similar to the integrated intensity maps, each column density map was normalised individually before being added to the clustering dataset. In the analysis, this feature is referred to as N_H₂.

Distance to the dust continuum peak. This feature was calculated as the distance between the equatorial coordinates of each pixel and the dust emission peak of the corresponding core. To be consistent across all cores, the dust emission peak was approximated by the peak of the H₂ column density map. In the analysis, this feature is referred to as ‘dist2dust’.

H₂ column density gradient. To calculate this feature, we convolved the H₂ column density maps with a Gaussian derivative kernel with a standard deviation of two telescope beams, as described in Soler et al. (2013). Details of the derivation and the resulting gradients can be found in Sect. 4.1 and Appendix B; the gradient maps are presented in Fig. B.1. As with the integrated intensity and H₂ column density maps, each gradient map was normalised individually before being added to the clustering dataset. In the analysis, this feature is referred to as $\nabla N_{\mathrm{H}_2}$.
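The per-pixel features described above (integrated intensity in a ±0.7 km s⁻¹ window, the velocity offset with its uncertainty cut, and the distance to the N(H₂) peak) can be sketched as follows. The noise estimate rms·Δv·√N for a sum over N independent channels and the flat-sky pixel distance are standard approximations assumed here, not details taken from the paper; all function names are hypothetical.

```python
import numpy as np

CHANNEL_WIDTH = 0.18              # km/s, common spectral resolution of the resampled cubes
MAX_VERR = CHANNEL_WIDTH / 2.355  # ~0.076 km/s, quoted as 0.08 km/s in the text


def integrated_intensity(velocity, cube, v_centroid, half_width=0.7):
    """Integrate T_mb over |v - v_centroid| <= half_width (km/s) for every pixel.

    velocity: (n_chan,) uniformly spaced channel velocities; cube: (n_chan, ny, nx);
    v_centroid: (ny, nx) centroid velocities from the Gaussian fits.
    Returns the integrated intensity (K km/s) and the number of channels used per pixel.
    """
    dv = abs(velocity[1] - velocity[0])
    window = np.abs(velocity[:, None, None] - v_centroid[None, :, :]) <= half_width
    return np.sum(np.where(window, cube, 0.0), axis=0) * dv, window.sum(axis=0)


def integrated_noise(rms, n_chan, dv=CHANNEL_WIDTH):
    """Assumed uncertainty of a sum over n_chan independent channels: rms * dv * sqrt(n_chan)."""
    return rms * dv * np.sqrt(n_chan)


def velocity_offset(v_fit, v_fit_err, v_sys):
    """V_offset = V_LSR - V_sys, set to NaN where the fitted-velocity uncertainty exceeds MAX_VERR."""
    v_fit = np.asarray(v_fit, dtype=float)
    return np.where(np.asarray(v_fit_err) < MAX_VERR, v_fit - v_sys, np.nan)


def distance_to_peak(n_h2, pixel_arcsec=8.0):
    """Flat-sky projected distance (arcsec) from each pixel to the N(H2) peak, the dust-peak proxy."""
    iy, ix = np.unravel_index(np.nanargmax(n_h2), n_h2.shape)
    yy, xx = np.indices(n_h2.shape)
    return np.hypot(yy - iy, xx - ix) * pixel_arcsec
```

A pixel would then pass the intensity cut if `W / integrated_noise(rms, n_chan) >= 3`. Multiplying a distance in arcseconds by the source distance in parsecs gives it in au.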

The features V_offset, linewidth, and dist2dust are normalised only once they are chosen for a specific sub-dataset (see Tables 3 and 4). This ensures that the feature values of all selected molecules are normalised jointly, rather than individually per map as is done for the emission.
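The difference between per-map and joint normalisation is easiest to see when the sub-dataset is assembled. In the sketch below (hypothetical names throughout), intensity, N(H₂), and its gradient are assumed to be already min-max scaled map by map, while V_offset, linewidth, and dist2dust are rescaled only after the pixels of all selected molecules have been concatenated. The molecule labels are returned separately, since the clustering receives no information about which molecule a point belongs to.

```python
import numpy as np

# Features scaled jointly over the combined sub-dataset; intensity, N_H2 and
# grad_N_H2 are assumed to be min-max scaled map by map beforehand.
JOINT = {"V_offset", "linewidth", "dist2dust"}


def minmax(values):
    """Rescale to [0, 1]; matches scikit-learn's MinMaxScaler with its default feature_range."""
    values = np.asarray(values, dtype=float)
    return (values - np.min(values)) / (np.max(values) - np.min(values))


def build_dataset(per_molecule, features):
    """Stack the valid pixels of all molecules into an (n_samples, n_features) matrix.

    per_molecule: {molecule: {feature: 1D array over that molecule's valid pixels}}
    Returns the matrix and a parallel array of molecule names, which is kept aside
    for the later analysis and is not passed to the clustering.
    """
    columns = []
    for feat in features:
        values = np.concatenate([np.asarray(per_molecule[m][feat], dtype=float) for m in per_molecule])
        columns.append(minmax(values) if feat in JOINT else values)
    names = np.concatenate([[m] * len(per_molecule[m][features[0]]) for m in per_molecule])
    return np.column_stack(columns), names
```

Scaling V_offset jointly keeps a CH₃OH pixel at +0.3 km s⁻¹ and a c-C₃H₂ pixel at +0.3 km s⁻¹ at the same normalised value, which per-molecule scaling would not guarantee.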

Table 3

Feature combinations used in the clustering.

Table 4

Molecular transitions used in Case 1, Case 2, and Case 3.

3.2 Clustering

We used two density-based clustering algorithms in this analysis: DBSCAN (Ester et al. 1996) and HDBSCAN (Campello et al. 2013). For DBSCAN, we used the scikit-learn implementation (Pedregosa et al. 2011) and for HDBSCAN the hdbscan Python package (McInnes et al. 2017). Density-based clustering separates high-density regions from low-density regions by grouping points that are closer to each other than a given distance threshold, which is defined by the hyperparameter epsilon. The minimum number of points required to form a cluster is set by the hyperparameter min_samples; groupings smaller than this are considered noise. Unlike k-means clustering, which tends to produce compact, roughly spherical clusters, density-based clustering allows for clusters of arbitrary shape. DBSCAN classifies points into three types: core points, which have at least min_samples neighbouring points within the epsilon distance; border points, which have fewer than min_samples neighbouring points but lie within the epsilon distance of at least one core point; and noise points, which have fewer than min_samples neighbouring points and are not within the epsilon distance of any core point, so they are not part of any cluster. HDBSCAN extends DBSCAN into a hierarchical clustering algorithm, allowing for clusters with varying densities. Clusters are defined by the hyperparameter min_cluster_size, which sets the minimum number of points required for a group to be considered a cluster, and the hyperparameter min_samples, which sets the minimum number of neighbouring points needed for a point to be considered a core point.
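The core, border, and noise classification described above can be illustrated with a deliberately small DBSCAN written from scratch in NumPy. This is a teaching sketch, not the scikit-learn implementation used in the paper; with scikit-learn, the equivalent call would be `DBSCAN(eps=..., min_samples=...).fit_predict(X)` from `sklearn.cluster`. As in scikit-learn, a point counts as one of its own neighbours when compared with min_samples.

```python
import numpy as np


def dbscan(X, eps, min_samples):
    """Minimal DBSCAN returning (labels, point_type); label -1 marks noise."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    # full distance matrix: O(n^2) memory, fine for a few thousand pixels
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    neighbours = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    is_core = np.array([len(nb) >= min_samples for nb in neighbours])

    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if not is_core[i] or labels[i] != -1:
            continue
        # start a new cluster from an unassigned core point and expand it
        labels[i] = cluster
        stack = [i]
        while stack:
            j = stack.pop()
            for k in neighbours[j]:
                if labels[k] == -1:
                    labels[k] = cluster
                    if is_core[k]:
                        stack.append(k)  # only core points propagate the cluster
        cluster += 1

    point_type = np.where(is_core, "core", np.where(labels >= 0, "border", "noise"))
    return labels, point_type


# Two compact groups, one point on the rim of the first group, and one isolated point
X = [[0, 0], [0, 0.05], [0.05, 0], [0.05, 0.05],
     [1, 1], [1, 1.05], [1.05, 1], [1.05, 1.05],
     [0, 0.14], [5, 5]]
labels, point_type = dbscan(X, eps=0.1, min_samples=3)
```

The rim point has only one other neighbour within eps, so it is not a core point, but it joins the first cluster as a border point; the isolated point stays noise.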

We used three features at a time as inputs to both DBSCAN and HDBSCAN, which simplified the interpretation of the output. The integrated intensity was kept as a fixed input feature to retain the information about the molecular emission, while the other two features were varied. The ten resulting feature combinations are listed in Table 3. To optimise the clustering, we performed a systematic grid search over the respective hyperparameters: epsilon (ranging from 0.05 to 0.155 in steps of 0.005) and min_samples (ranging from 10 to 100 in steps of 5) for DBSCAN; and min_cluster_size (ranging from 5 to 20) and min_samples (ranging from 1 to min_cluster_size) for HDBSCAN.
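The hyperparameter grids can be enumerated explicitly, which also shows how many runs each feature combination requires. The sketch assumes that both end points of each range are included; each pair would then be passed to `sklearn.cluster.DBSCAN(eps=..., min_samples=...)` or `hdbscan.HDBSCAN(min_cluster_size=..., min_samples=...)`. Because all features are scaled to [0, 1], epsilon is a distance in normalised feature units.

```python
import numpy as np
from itertools import product

# DBSCAN: eps from 0.05 to 0.155 in steps of 0.005, min_samples from 10 to 100 in steps of 5
# (end points assumed to be included; rounding removes floating-point drift in eps)
eps_values = np.round(np.arange(0.05, 0.155 + 1e-9, 0.005), 3)
min_samples_values = np.arange(10, 101, 5)
dbscan_grid = list(product(eps_values, min_samples_values))

# HDBSCAN: min_cluster_size from 5 to 20 and, for each, min_samples from 1 to min_cluster_size
hdbscan_grid = [(mcs, ms) for mcs in range(5, 21) for ms in range(1, mcs + 1)]

print(len(eps_values), len(min_samples_values), len(dbscan_grid))  # 22 19 418
print(len(hdbscan_grid))  # 200
```

Under these assumptions, each of the ten feature combinations requires 418 DBSCAN and 200 HDBSCAN runs per dataset.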

To determine the best tuning, we used the density-based clustering validation (DBCV) score (Moulavi et al. 2014). This score is particularly well suited to validating density-based clustering methods because it accounts for outliers and noise points (unlike cross-validation, for example). To calculate the score, a kernel density function first estimates the local density of data points around each object. The cluster quality is then evaluated by comparing the minimum density within clusters (which represents cluster cohesion) with the maximum density between clusters (which represents cluster separation). We used the function validity_index from the hdbscan package, which is a fast approximation of the original DBCV score. Since this function requires only the data points and the corresponding cluster labels as input, we also applied it to the DBSCAN results. For HDBSCAN, we additionally used the intrinsic attribute relative_validity, which is another fast approximation of the DBCV score. Since this intrinsic score gives slightly different results from the validity_index function, we considered both scores in our evaluation. Finally, we selected the best result based on the number of noise points and the relative sizes of the clusters. Because both the validity_index function and the relative_validity attribute are only approximations of the DBCV score, we treated them as relative measures and used them only to compare results across different hyperparameter choices, not across different datasets.

In addition to the DBCV score, we applied two postprocessing criteria to determine the optimal clustering results for each method, feature combination, and dataset: (1) the number of clusters found by the algorithm has to be between two and five (inclusive), and (2) the found clusters must collectively cover at least 50% of the data points. Tests with more than five clusters showed that in this case, larger clusters are merely subdivided into smaller ones covering the same total amount of data and provide no additional insights. Conversely, when only one cluster is found, it typically contains over 90% of the input data points, offering little value to understanding the data distribution. Most datasets show two to four prominent trends that are effectively captured by the clustering. When the resulting clusters cover less than 50% of the data, they fail to represent the overall trends and instead highlight minor sub-clusters while overlooking the majority of the data.
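The two postprocessing criteria, combined with a validity score, can be sketched as a filter followed by a maximum. This is a simplification under stated assumptions: the scores are taken as precomputed (for example with `hdbscan.validity.validity_index`), and the final manual inspection of noise points and cluster sizes described above is replaced by simply choosing the admissible run with the highest score. The function names are hypothetical.

```python
import numpy as np


def passes_criteria(labels, n_min=2, n_max=5, min_coverage=0.5):
    """Postprocessing cuts: two to five clusters and at least 50% of points in clusters (-1 = noise)."""
    labels = np.asarray(labels)
    n_clusters = len(set(labels.tolist()) - {-1})
    coverage = np.mean(labels != -1)
    return n_min <= n_clusters <= n_max and coverage >= min_coverage


def select_best(candidates):
    """Return the admissible (params, labels, score) tuple with the highest score, or None."""
    admissible = [c for c in candidates if passes_criteria(c[1])]
    if not admissible:
        return None
    return max(admissible, key=lambda c: c[2])


candidates = [
    ("run a", [0, 0, 1, 1, -1], 0.3),    # 2 clusters, 80% coverage: admissible
    ("run b", [0, 0, 0, 0, 0], 0.9),     # a single cluster: rejected despite the best score
    ("run c", [0, 1, -1, -1, -1], 0.8),  # only 40% of points clustered: rejected
    ("run d", [0, 1, 2, -1, 2], 0.5),    # 3 clusters, 80% coverage: admissible
]
best = select_best(candidates)
```

Consistent with the text, the scores are compared only among runs on the same dataset; here "run d" is selected.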

3.3 Case studies

In addition to c-C₃H₂, CH₃OH, and CH₃CCH, the dataset used for this study covers molecules such as CCS, HC₃N, HC¹⁸O⁺, C₄H, HNCO, and CS, with between 8 and 20 detected molecules per core. However, to enhance interpretability and extract meaningful patterns, we focused our analysis on the three key molecules c-C₃H₂, CH₃OH, and CH₃CCH. This targeted approach allowed for a clear understanding of the relationships between these molecules and helped to keep the complex clustering output interpretable. During the clustering process, the features of the molecular maps were combined without giving the algorithm any prior information about which molecule each data point corresponds to.

We used the following four datasets as input to the clustering:

Case 1. c-C₃H₂ vs CH₃OH

The two molecules show a well-known and well-studied segregation in the sources of our dataset (Spezzano et al. 2016, 2020), which is driven by environmental effects. Through clustering, we analyse how this chemical differentiation is represented in the six features and how it influences the clusters identified by the algorithm. This approach allowed us to assess the effectiveness of the clustering technique. The transitions used in each core are listed in Table 4, along with the initial ratio between the data points of each molecule in this sub-dataset.
Case 2. c-C₃H₂ vs CH₃CCH

These two molecules display a chemical differentiation in the prestellar core L1544 that is not yet understood (Spezzano et al. 2017). Since CH₃CCH, like c-C₃H₂, is a carbon chain, it would be expected to peak in the carbon-chain-rich south-east of the core; instead, it peaks in the north-west. To ensure a balanced number of data points between the two molecules, we used two transitions of CH₃CCH for B68 and three transitions for L1521E, as shown in Table 4.
Case 3. CH₃OH vs CH₃CCH

The two molecules show a chemical differentiation in the prestellar core L1544 (Spezzano et al. 2017). Both peak in the northern part of the core: CH₃OH in the north-east and CH₃CCH in the north-west. We use this combination to rule out biases that might arise from clustering with c-C₃H₂.
Case 4. c-C₃H₂ vs CH₃OH vs CH₃CCH

We use the combination of all three molecules to validate the results from the other case studies. This approach also helps to eliminate biases in the algorithm that might arise from comparing only two molecules. The dataset for each core contains the combined data of Case 1 and Case 2 (see Table 4).

4 Results

4.1 H₂ column density gradient

To derive the H₂ column density gradients for our three cores, we applied the method presented in Soler et al. (2013): the gradient is calculated by convolving the H₂ column density maps with a Gaussian derivative kernel in the x and y directions, and the two directional components are then combined into the total gradient. We used the function gaussian_filter from the Python package scipy.ndimage. To capture the filamentary environments of the cores, we chose a Gaussian kernel with a standard deviation equivalent to two telescope beams (2 × 32″, i.e. 2 × 4 pixels).
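A minimal version of this procedure with `scipy.ndimage.gaussian_filter` is sketched below. The `order` argument selects a first-derivative Gaussian along one axis, and the two components are assumed to be combined as the gradient magnitude, $|\nabla N| = \sqrt{(\partial_x N)^2 + (\partial_y N)^2}$. The function name is hypothetical, and the default standard deviation of eight pixels follows the two-beam choice stated above.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def column_density_gradient(n_h2, sigma_pix=8.0):
    """Magnitude of the N(H2) gradient from Gaussian-derivative filters.

    sigma_pix = 8 pixels corresponds to two 32" beams on the 8" pixel grid.
    The result is in units of N(H2) per pixel.
    """
    n_h2 = np.asarray(n_h2, dtype=float)
    d_dy = gaussian_filter(n_h2, sigma=sigma_pix, order=(1, 0))  # derivative along axis 0 (y)
    d_dx = gaussian_filter(n_h2, sigma=sigma_pix, order=(0, 1))  # derivative along axis 1 (x)
    return np.hypot(d_dx, d_dy)


# Sanity check on a linear ramp with slope 2 per pixel along x
ramp = 2.0 * np.indices((100, 100))[1]
grad = column_density_gradient(ramp)
```

Away from the map edges, the filter recovers the slope of the ramp; near the edges, the boundary handling of the filter biases the result, which matters for small maps. The resulting map is then min-max normalised like the other per-map features.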

The derived H₂ column density gradient maps are presented in Fig. B.1 in the Appendix, alongside the corresponding H₂ column density maps. The observed gradients agree with the different levels of exposure of the cores to the ISRF (see Sect. 2.2). The isolated starless core B68 has a mostly uniform N(H₂) gradient, forming a ring-like structure at the edges with $\nabla N_{\mathrm{H}_2} \approx 0$ in the centre, consistent with uniform external illumination. For L1521E, the stronger illumination towards the south is reflected in an increased N(H₂) gradient along the southern edge and lower values in the shielded centre. Similar to L1521E, the N(H₂) gradient of the prestellar core L1544 shows that the south of the core is more exposed to the ISRF, resulting in larger values of $\nabla N_{\mathrm{H}_2}$. In contrast, the N(H₂) gradient is much lower in the more shielded centre and north of the core, where both CH₃OH and CH₃CCH peak.

4.2 Comparison of dataset features

Figure 1 shows a comparison of selected features of the dataset, illustrating the data points for c-C₃H₂ (green circles), CH₃OH (blue crosses), and CH₃CCH (red diamonds) observed towards B68 (left), L1521E (centre), and L1544 (right). The plots reveal distinct patterns and behaviours that vary depending on the core, molecule, and feature.

The distributions of intensity over $N_{\mathrm{H}_2}$ and intensity over V_offset, shown in the top two rows of Fig. 1, reflect the distribution of the molecular emission across the cores. In B68, all molecules exhibit similar distributions, with peak intensity at the highest H₂ column density. In L1521E, c-C₃H₂ and CH₃CCH peak in the south-eastern part of the core at lower $N_{\mathrm{H}_2}$ compared to CH₃OH, which peaks at the dust peak. In contrast, in L1544, all molecules peak in different locations across the core with varying projected distances to the dust peak. Velocity-wise, CH₃OH in L1521E shows a slightly different behaviour compared to the carbon chains: at high intensity it extends to higher V_offset, while at lower intensity it spreads to lower V_offset. Additionally, in L1544, CH₃CCH spans a broader velocity range than c-C₃H₂ and CH₃OH, suggesting that it traces a different layer of the core.

The c-C₃H₂ emission shows a wide range of linewidths, which are in general broader and reach higher values than those of the other molecules (see third row in Fig. 1), likely tracing more turbulent material (compare e.g. Lin et al. 2022 for L1544). In L1521E, the linewidths of c-C₃H₂ have a more compact distribution around 0.4 km s⁻¹. For CH₃OH, the linewidths are typically smaller in B68 and L1544, in the ranges 0.25-0.30 km s⁻¹ and 0.30-0.35 km s⁻¹, respectively, while in L1521E they are slightly higher, around 0.4 km s⁻¹, with some values reaching up to 0.65 km s⁻¹. The linewidths of CH₃CCH are more compactly distributed at lower values in the two starless cores (averaging around 0.3 km s⁻¹), but extend up to 0.5 km s⁻¹ in the prestellar core L1544, possibly indicating the presence of more turbulent material.

The $\nabla N_{\mathrm{H}_2}$-intensity distribution also reflects the morphology of the molecular emission across the cores (see bottom row in Fig. 1). In L1521E, all molecules display a similar behaviour, with the highest intensities occurring at high, though not maximum, $\nabla N_{\mathrm{H}_2}$ values. In contrast, in L1544, the c-C₃H₂ intensity peaks at a much higher $\nabla N_{\mathrm{H}_2}$ than CH₃OH and CH₃CCH. This illustrates the active photochemistry in the south of the core caused by the external illumination, leading to an increased abundance of carbon chains such as c-C₃H₂. This is supported by a local peak in CH₃CCH emission in the south and a sharp decline in CH₃OH intensity at higher $\nabla N_{\mathrm{H}_2}$. In B68, which has a more spherical shape and is uniformly illuminated, the molecular emission peaks at lower $\nabla N_{\mathrm{H}_2}$ values in the shielded centre of the core and then declines rather steeply at higher $\nabla N_{\mathrm{H}_2}$ in the outer parts of the core. Additionally, CH₃CCH exhibits lower intensity values even at low $\nabla N_{\mathrm{H}_2}$, because its emission is less spatially extended across this core than that of c-C₃H₂ and CH₃OH (see Figs. A.1 and A.2).

In summary, the plots point out the wealth of chemical and physical differences between the cores. In L1544, all molecules are clearly separated and behave differently, which indicates a more profound chemical segregation in this core. In contrast, in B68 and L1521E, the molecules show a more similar behaviour and are less segregated, which could be linked to their earlier evolutionary stages compared to L1544. Consequently, an unbiased approach such as clustering can provide valuable insights into the varying chemical environments across these cores.

Fig. 1

Comparison of different features for B68 (left), L1521E (middle), and L1544 (right), with c-C₃H₂ in green (circles), CH₃OH in blue (crosses), and CH₃CCH in red (diamonds).

4.3 Clustering

In Figs. 2, C.1, and C.2, we show the clustering results for B68, L1521E, and L1544, respectively. They present the results for feature combinations 2, 3, 4, 9, and 10 for Case 1 and Case 2 (for details see Table 3). The remaining results for Cases 1 and 2, together with the results for Cases 3 and 4, are published on Zenodo in Figs. C.3-C.14. The molecular ratio (i.e. the ratio of data points belonging to each of the molecules in a specific cluster) and the number of data points assigned to each cluster are given in Table C.1. In Sects. 4.3.1-4.3.4, we describe the results for each case individually.

Both the DBSCAN and the HDBSCAN results are included in the analysis. However, to improve readability, only one result is shown for each combination. This choice was made manually for each combination based on the number of noise points and the number of clusters, ensuring that all scientific results discussed in this work are also presented visually.

To evaluate possible chemical segregation within the clusters, we focused on imbalanced clusters, in which the molecular ratio deviates from the initial ratio by at least 10 percentage points (e.g. 37/63 instead of 47/53). These imbalanced clusters are particularly insightful because they highlight regions where specific molecular abundances diverge from the average distribution, potentially indicating distinct physical or chemical processes at work. The imbalanced clusters are marked in boldface in Table C.1. Our primary focus is to extract meaningful scientific insights from the clustering patterns. In fact, many clusters exhibit an excess of, or are dominated by, one molecule, indicating chemical differentiation across all three cases and all three cores in this study. However, clusters that exclusively contain data points from a single molecule are rare and typically contain only a small number of points (N ≤ 20). Overall, we prioritised clusters that cover at least one telescope beam, which corresponds to roughly 20 data points or pixels.
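The imbalance test can be written as a comparison of molecular fractions. The sketch below reads the 10% threshold as an absolute difference of 10 percentage points, matching the 47/53 versus 37/63 example; the function names are hypothetical, and a small tolerance guards against floating-point round-off at exactly ten points.

```python
import numpy as np


def cluster_fractions(labels, molecules, cluster):
    """Share of each molecule among the points assigned to one cluster label."""
    members = np.asarray(molecules)[np.asarray(labels) == cluster]
    names, counts = np.unique(members, return_counts=True)
    return {str(n): c / counts.sum() for n, c in zip(names, counts)}


def is_imbalanced(cluster_frac, input_frac, threshold=0.10):
    """True if any molecule's share differs from its input share by at least `threshold`.

    The threshold is an absolute difference (10 percentage points); molecules absent
    from the cluster count as a share of zero.
    """
    return any(abs(cluster_frac.get(m, 0.0) - f) >= threshold - 1e-9
               for m, f in input_frac.items())


fractions = cluster_fractions([0, 0, 0, 1, 1], ["c", "m", "m", "c", "c"], cluster=0)
```

Cluster 0 in this toy example contains one c-C₃H₂ point and two CH₃OH points, i.e. fractions of 1/3 and 2/3.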

Fig. 2

Clustering results for the starless core B68 for the dataset of Case 1 (left) and the dataset of Case 2 (right) for feature combinations 2, 3, 4, 9, and 10. Each row represents a different combination of features (see Table 3): combination 2 (intensity, V_offset, and dist2dust), combination 3 (intensity, V_offset, and $N_{{\rm H}_2}$), combination 4 (intensity, V_offset, and $\nabla N_{{\rm H}_2}$), combination 9 (intensity, dist2dust, and $N_{{\rm H}_2}$), and combination 10 (intensity, $N_{{\rm H}_2}$, and $\nabla N_{{\rm H}_2}$). Top: Distribution of the resulting clusters in the input features. The annotations give the number of clusters found by the algorithm (two to five) and the percentage of data points assigned to clusters. Noise points (i.e. points not assigned to any cluster) are plotted in black. The cluster colours are ordered by cluster size: the largest cluster is shown in blue, followed by red, cyan, yellow, and purple. Bottom: Corresponding spatial distribution of each cluster across the core. The annotations indicate whether a cluster contains more than 60% of one molecule (c: c-C₃H₂; m: CH₃OH; p: CH₃CCH). The dashed contours represent 30%, 50%, and 90% of the H₂ column density peak derived from Herschel maps (Spezzano et al. 2020).

4.3.1 Case 1: c-C₃H₂ vs CH₃OH

In some results, the clusters vary greatly in size. The largest cluster typically maintains a balanced molecular ratio, similar to the input ratio (see Table 4), while the smaller clusters tend to show more variation (see Table C.1). For L1521E and L1544, the input data has a slightly higher proportion of CH₃OH compared to c-C₃H₂ (see Table 4), causing the largest cluster to often show a slight excess in CH₃OH. In the following section, we discuss the clustering results for each core individually.

B68. Imbalanced clusters with an excess of c-C₃H₂ or CH₃OH are spatially separated into different regions of the core. CH₃OH-dominated clusters are concentrated around the dust peak at the centre of the core and are characterised by high intensity and high $N_{{\rm H}_2}$ (e.g. red clusters in combs. 2, 4, and 7, or yellow cluster in comb. 3; see Figs. 2 and C.3, and Table C.1). In contrast, c-C₃H₂-dominated clusters are confined to the (south-)west region and are associated with data points of moderate intensity (see yellow and purple clusters in comb. 2 in Fig. 2). In combinations 9 and 10, $N_{{\rm H}_2}$ and $\nabla N_{{\rm H}_2}$ are structured in ring-like clusters around the dust peak (see Fig. 2), all dominated by CH₃OH and with no significant contribution from c-C₃H₂. Interestingly, in B68 the largest cluster of each combination (shown in blue) is slightly imbalanced towards CH₃OH (on average by 3%, see Table C.1), even though the initial molecular ratio is 50/50. The smaller clusters, however, display greater variation.

L1521E. In this core, many features are clustered into separate structures dominated by either CH₃OH or c-C₃H₂. This molecular segregation is evident across multiple combinations. The intensity-dist2dust distribution is clustered into two separate diagonals (see blue and red clusters in combs. 2 and 5 in Figs. C.1 and C.4). The lower-intensity diagonal (blue) corresponds to regions in the north-west of the core and is imbalanced towards c-C₃H₂ (see Table C.1). Conversely, the upper-intensity diagonal (red) is associated with the south-east of the core and is dominated by CH₃OH. The same cluster distribution, with a similar molecular imbalance, appears in the intensity-$N_{{\rm H}_2}$ and intensity-V_offset distributions: clusters dominated by c-C₃H₂ (in the north-west) show high $N_{{\rm H}_2}$ and high V_offset (see blue cluster in comb. 2, red cluster in comb. 3), while CH₃OH-dominated clusters (in the south-east) show low $N_{{\rm H}_2}$ and lower V_offset (see red clusters in combs. 2 and 9, and blue cluster in comb. 3). Additionally, combination 2 (Fig. C.1) shows that the CH₃OH-dominated clusters appear as a narrow diagonal structure in dist2dust over V_offset, while the c-C₃H₂ clusters present a separate, more diffuse distribution. To summarise, c-C₃H₂ and CH₃OH are separated in $N_{{\rm H}_2}$ and V_offset in L1521E, and cluster into separate diagonals in the intensity-dist2dust distribution. Beyond the north-west/south-east separation, the V_offset−$\nabla N_{{\rm H}_2}$ distribution is split into c-C₃H₂ at high $\nabla N_{{\rm H}_2}$ and mid V_offset in the south of the core, and CH₃OH at mid $\nabla N_{{\rm H}_2}$ and lower V_offset in the east (c-C₃H₂: blue cluster in comb. 4; CH₃OH: red cluster in comb. 4; see Figs. C.1 and C.4).
Additionally, CH₃OH shows a cluster at the core centre, characterised by high intensity and high $N_{{\rm H}_2}$ (see yellow cluster in comb. 9 in Fig. C.1).

L1544. Similar to L1521E, many features in L1544 are clustered into separate structures (V_offset over intensity, dist2dust over intensity, $N_{{\rm H}_2}$ over V_offset, dist2dust over V_offset, linewidth over V_offset, $\nabla N_{{\rm H}_2}$ over V_offset, $N_{{\rm H}_2}$ over dist2dust), dividing the core into north/south and on-centre/off-centre regions. However, unlike in L1521E, these structures do not consistently correspond to a specific molecular segregation. A separation of the two molecules is visible in V_offset and $\nabla N_{{\rm H}_2}$ in combination 4, where a c-C₃H₂-dominated cluster (red) is concentrated in the northern part of the core and a CH₃OH-dominated cluster (blue) is found in the south, both with lower intensity (see Fig. C.2). Additionally, c-C₃H₂ clusters appear at the core centre with high intensity, high linewidth, high $N_{{\rm H}_2}$, and low $\nabla N_{{\rm H}_2}$ (see cyan cluster in comb. 3 in Fig. C.2, and red clusters in combs. 7 and 10 in Figs. C.2 and C.5). CH₃OH, on the other hand, forms a cluster on the CH₃CCH peak in the north-west, characterised by low V_offset and higher intensity (see red cluster in comb. 2 in Fig. C.2). It also appears across the northern part of the core with lower intensity and low V_offset (red cluster in comb. 3, see Fig. C.2). In combination 10, c-C₃H₂ (red) is clustered on the dust peak, while CH₃OH (blue) is off-peak, with the separation visible in $N_{{\rm H}_2}$ and $\nabla N_{{\rm H}_2}$ (see Fig. C.2). A similar split appears in $N_{{\rm H}_2}$ in combination 6 (see Fig. C.5), although both corresponding clusters (blue and red) are imbalanced towards CH₃OH, so there is no molecular separation.

Summary of Case 1. All cores display cluster structures with a clear separation between c-C₃H₂ and CH₃OH. However, the molecules are not necessarily clustered on their respective peaks. The molecular separation is primarily visible in the features intensity, V_offset, and $N_{{\rm H}_2}$, and for L1521E and L1544 also in $\nabla N_{{\rm H}_2}$. In general, the clustering reveals recurring structures in several feature combinations, where one molecule dominates over the other. Additionally, B68 and L1544 show structures in $N_{{\rm H}_2}$ and $\nabla N_{{\rm H}_2}$ that divide the core into on-centre and around-centre regions. These divisions, however, are not always linked to a molecular separation.

4.3.2 Case 2: c-C₃H₂ vs CH₃CCH

Similar to Case 1, c-C₃H₂ is slightly underrepresented in the datasets for L1521E and L1544, resulting in clusters that are more imbalanced towards CH₃CCH. For B68, the opposite is true, resulting in a slight excess of c-C₃H₂ in many clusters. In the following, we discuss the clustering results for each core individually.

B68. The clusters dominated by either c-C₃H₂ or CH₃CCH are spatially separated, similar to the segregation of c-C₃H₂ and CH₃OH in Case 1. High-intensity CH₃CCH is clustered at the core centre, where the molecule peaks (see red and cyan clusters in comb. 4, red cluster in comb. 8, and cyan and yellow clusters in comb. 10 in Figs. 2 and C.6). In contrast, c-C₃H₂ is clustered off its emission peak, along the east side of the core, with lower intensity (see red clusters in combs. 1, 3, and 9 in Fig. 2). Beyond intensity, the molecular separation is also evident in V_offset and $N_{{\rm H}_2}$: c-C₃H₂ is associated with high V_offset and low $N_{{\rm H}_2}$, while CH₃CCH is associated with low V_offset and high $N_{{\rm H}_2}$. Interestingly, this c-C₃H₂ cluster follows high values of $\nabla N_{{\rm H}_2}$ (see Fig. B.1), even though this feature is not included in the feature combinations mentioned (1, 3, 9). In combination 10, a c-C₃H₂-dominated cluster forms a broad ring (blue), corresponding to high values of $\nabla N_{{\rm H}_2}$ (see also Fig. B.1), surrounding CH₃CCH-dominated clusters in the core centre (cyan and yellow). This separation is visible in $\nabla N_{{\rm H}_2}$ and $N_{{\rm H}_2}$ (see Fig. 2).

L1521E. Similar to Case 1, clusters dominated by c-C₃H₂ and CH₃CCH are spatially separated, forming separate structures in various features (e.g. V_offset over intensity, dist2dust over intensity, dist2dust over V_offset, $N_{{\rm H}_2}$ over intensity, $N_{{\rm H}_2}$ over linewidth). As in Case 1, c-C₃H₂ is clustered in the north and north-west, is characterised by high $N_{{\rm H}_2}$ and high V_offset, and appears as the lower diagonal in the intensity-dist2dust distribution (see red clusters in combs. 2, 3, and 6, and blue clusters in combs. 5 and 9 in Figs. C.1 and C.7). In contrast, CH₃CCH-dominated clusters are found at low $N_{{\rm H}_2}$ and low V_offset, similar to CH₃OH in Case 1; they form the upper diagonal in the intensity-dist2dust distribution and are located in the south and south-east of the core (see blue clusters in combs. 2, 3, and 9, and red cluster in comb. 5 in Figs. C.1 and C.7). CH₃CCH-dominated clusters are concentrated around the carbon peak in the south-east and are associated with higher $N_{{\rm H}_2}$ (see cyan and yellow clusters in comb. 4, cyan cluster in comb. 6, and cyan and purple clusters in comb. 10 in Figs. C.1 and C.7). Additionally, a low-intensity c-C₃H₂ cluster is located along the sharp edge of the core in the south-west (see cyan cluster in comb. 9 in Fig. C.1). As in B68, the shape of this cluster follows high values of $\nabla N_{{\rm H}_2}$, even though this feature is not included in combination 9.

L1544. For c-C₃H₂ and CH₃CCH, the clustered features reveal structures similar to those found in Case 1 (e.g. in V_offset over intensity, linewidth over V_offset, $N_{{\rm H}_2}$ over V_offset, $\nabla N_{{\rm H}_2}$ over V_offset, $N_{{\rm H}_2}$ over linewidth), which again divide the core into north/south and on-centre/off-centre regions. As in Case 1, c-C₃H₂ is predominantly associated with the northern part of the core, while CH₃CCH dominates in the south. This separation is visible in V_offset and $\nabla N_{{\rm H}_2}$ (c-C₃H₂: red clusters in combs. 1, 2, and 4; CH₃CCH: blue clusters in combs. 1 and 4; see Figs. C.2 and C.8). In addition, both molecules cluster in the core centre, exhibiting high intensity, higher V_offset, and higher linewidth (c-C₃H₂: cyan cluster in comb. 1; CH₃CCH: yellow cluster in comb. 1 and red cluster in comb. 6; see Figs. C.2 and C.8). In combination 3, a CH₃CCH-dominated cluster also covers the c-C₃H₂ peak in the south-east of the core, characterised by high intensity, high $N_{{\rm H}_2}$, and high V_offset (see red cluster in Fig. C.2).

Summary of Case 2. Clusters dominated by c-C₃H₂ or CH₃CCH show spatial separation across all cores, similar to the segregation observed between c-C₃H₂ and CH₃OH in Case 1, revealing distinct structures in various feature combinations. The molecular segregation is evident in the same features as in Case 1 (intensity, V_offset, and $N_{{\rm H}_2}$), with B68 and L1544 also showing separation in $\nabla N_{{\rm H}_2}$. Overall, the clustering shows the same strong divisions of the cores into north/south, east/west, and on-centre/off-centre regions as in Case 1, highlighting recurring structural and chemical patterns across the cores.

4.3.3 Case 3: CH₃OH vs CH₃CCH

For B68, CH₃CCH is slightly underrepresented in this dataset, resulting in clusters that are more imbalanced towards CH₃OH. In the following, we discuss the clustering results for each core individually.

B68 (Fig. C.9). CH₃OH-dominated clusters and CH₃CCH-dominated clusters are not clearly spatially separated. Instead, both are found in the central area of the core and on the core centre (CH₃OH: blue cluster in combs. 1, 2, 6, and 9, cyan cluster in comb. 6; CH₃CCH: red cluster in comb. 1 and cyan cluster in combs. 8 and 10), reproducing the clustering behaviour of Case 1 for CH₃OH and of Case 2 for CH₃CCH. A direct, feature-wise separation of the two molecules occurs only in combination 1, in intensity, with CH₃OH at higher and CH₃CCH at lower values. Apart from that, CH₃OH is clustered along the east of the core with high V_offset, and along the west of the core with low V_offset (east: cyan cluster in comb. 1, red cluster in combs. 2 and 3; west: yellow cluster in comb. 4). The shapes of those clusters follow high values of $\nabla N_{{\rm H}_2}$, even though this feature is only included in combination 4, resembling the patterns of c-C₃H₂ in Case 2 (east) and Case 1 (west). In combinations 8 and 10, the clustering creates ring-like structures around the dust peak, visible in $N_{{\rm H}_2}$, dist2dust, and $\nabla N_{{\rm H}_2}$, with CH₃CCH concentrated at the centre and CH₃OH forming the outer rings, reflecting structures found in Case 1 (CH₃OH) and Case 2 (CH₃CCH).

L1521E (Fig. C.10). As in Cases 1 and 2, we see a spatial separation of the two input molecules, CH₃OH and CH₃CCH, but here it is less pronounced. CH₃CCH is primarily clustered in the (south-)east of the core, similar to Case 2 (e.g. see blue and red clusters in comb. 1, cyan cluster in combs. 4, 5, and 10, and yellow cluster in combs. 7 and 8). In contrast, CH₃OH is clustered along the north of the core (see red cluster in combs. 3 and 8, blue cluster in comb. 9), showing patterns similar to c-C₃H₂ in Cases 1 and 2, but contrary to its own clustering behaviour in Case 1. In terms of features, we see an indirect separation in $N_{{\rm H}_2}$: CH₃CCH is associated with mid $N_{{\rm H}_2}$ and CH₃OH with higher $N_{{\rm H}_2}$ (CH₃CCH: cyan cluster in comb. 10; CH₃OH: blue cluster in comb. 9). Combination 5 shows a direct spatial separation of the two molecules into east (CH₃CCH) and west (CH₃OH), visible as upper and lower diagonals in the intensity-dist2dust distribution. Apart from that, CH₃CCH is clustered at the sharp edge of the core in the south-west, with high $\nabla N_{{\rm H}_2}$ and mid V_offset (see red cluster in comb. 4, blue cluster in comb. 7), similar to c-C₃H₂ in Case 2. Both CH₃OH and CH₃CCH are additionally clustered at the core centre, at high $N_{{\rm H}_2}$, with CH₃OH at high intensity (see red cluster in comb. 6) and CH₃CCH at lower intensity (see cyan cluster in comb. 3, purple cluster in comb. 6, and yellow cluster in comb. 9). This pattern was not observed for either CH₃OH or CH₃CCH in Case 1 or Case 2.

L1544 (Fig. C.11). For CH₃OH and CH₃CCH, the clustered features reveal structures similar to those found in Cases 1 and 2 (e.g. in V_offset over intensity, linewidth over V_offset, $N_{{\rm H}_2}$ over V_offset, $\nabla N_{{\rm H}_2}$ over V_offset, $N_{{\rm H}_2}$ over linewidth). As before, this leads to a division of the core into north/south and on-centre/off-centre regions. CH₃OH exhibits a north-south separation, visible in the features V_offset, intensity, and $\nabla N_{{\rm H}_2}$ (see blue and red clusters in combs. 1-4), similar to the pattern seen in Case 1. In contrast, CH₃CCH is predominantly clustered in the south and on the c-C₃H₂ peak (see yellow cluster in combs. 3, 4, and 7), as well as at the core centre (see yellow cluster in comb. 2, cyan cluster in comb. 3, red cluster in combs. 5, 6, and 9), both of which were observed in Case 2. Additionally, CH₃CCH is clustered at its own emission peak in the north-west, with high intensity, high linewidth, and low $\nabla N_{{\rm H}_2}$, which was not seen in Case 2 (see cyan cluster in comb. 7). Direct molecular segregation occurs only in comb. 3, where CH₃CCH is clustered at the core centre (cyan cluster) with high intensity and high $N_{{\rm H}_2}$, while CH₃OH is clustered around the centre at lower intensity and lower $N_{{\rm H}_2}$ (blue and red clusters). Ring-like cluster structures are visible in $N_{{\rm H}_2}$ and $\nabla N_{{\rm H}_2}$ in combinations 6, 8, 9, and 10; however, they do not display any molecular segregation but instead a balanced ratio between CH₃CCH and CH₃OH.

Summary of Case 3. All cores display cluster structures with feature-wise or spatial separation between CH₃OH and CH₃CCH. However, the molecular segregation is less distinct than in Cases 1 and 2, and it is mainly visible in the features intensity, V_offset, $N_{{\rm H}_2}$, and $\nabla N_{{\rm H}_2}$. In B68 and L1544, the clustering predominantly reproduces the structures and clusters found in Cases 1 and 2. All three cores show minor differences in cluster behaviour compared to Cases 1 and 2, where CH₃OH or CH₃CCH behave similarly to c-C₃H₂. This is particularly evident in L1521E, where CH₃OH and CH₃CCH show a more distinct separation, similar to Cases 1 and 2.

4.3.4 Case 4: c-C₃H₂ vs CH₃OH vs CH₃CCH

With a dataset containing three molecules, molecular segregation is more difficult to identify, as the ratios between the molecules mostly do not deviate strongly from the initial ratio of the dataset (the initial c-C₃H₂/CH₃OH/CH₃CCH ratios are 37/36/27 for B68, 29/35/36 for L1521E, and 28/39/33 for L1544). In the following, we discuss the clustering results for each core individually.

In B68 (see Fig. C.12), c-C₃H₂ is clustered in a shell along the east, reproducing the structure of Case 1. CH₃CCH is clustered in the core centre, reproducing the behaviour in Cases 2 and 3. However, CH₃OH is not clustered in the core centre as in Case 1 but instead around the centre, in shells along the east and the west, similar to what was found in Case 3. Additionally, c-C₃H₂ and CH₃OH show concentric clusters around the centre, visible in $N_{{\rm H}_2}$, as in the previous cases.

In L1521E (see Fig. C.13), c-C₃H₂ is clustered in the northern part of the core, reproducing the clusters in both Cases 1 and 2. It also shows the small cluster along the sharp edge in the south-west of the core seen in Case 2. CH₃CCH is clustered in the south of the core and in the core centre, reproducing the cluster structures of Cases 2 and 3. CH₃OH, on the other hand, is not clustered in the south as in Case 1 but instead in the core centre, where its emission peaks (as in Case 3), and in the north-west of the core.

In L1544 (see Fig. C.14), the north/south division of the core is visible in V_offset for CH₃OH, similar to Cases 1 and 3. For c-C₃H₂, the association with the northern part seen in Cases 1 and 2 is not reproduced. Instead, it is clustered only on the CH₃OH peak in the north-east of the core. The cluster in the core centre is reproduced. Additionally, both c-C₃H₂ and CH₃OH are clustered on their respective molecular peaks, which was not seen in Case 1 or Case 2. For CH₃CCH, both the cluster in the core centre and the cluster on the c-C₃H₂ peak are reproduced. However, Case 4 does not recover the association of CH₃CCH with the southern part of the core in the north/south division seen in Cases 2 and 3. Also, the molecular separation in $\nabla N_{{\rm H}_2}$ and V_offset that was seen in combination 4 in Cases 1 and 2 is not reproduced with this combined dataset.

To summarise, in B68 and L1521E the clustering behaviour of c-C₃H₂ and CH₃CCH seen in Case 2 is reproduced, but CH₃OH now behaves differently than in Case 1. In L1544, however, the behaviour of CH₃OH and part of the behaviour of CH₃CCH seen in Cases 1 and 2 are reproduced, but not that of c-C₃H₂. We discuss this further in Sect. 5.1.

Fig. 3

Abundances of CH₃CCH and c-C₃H₂ at the dust peaks of the cores. The molecular column densities were calculated assuming T_ex = 8 K (see text for details). The starless cores are marked with an asterisk.

4.4 CH₃CCH abundances

This section compares the CH₃CCH abundances towards the dust peaks of the different cores. In addition to B68, L1521E, and L1544, we include data from the prestellar cores HMM-1, L429, L694-2, and OphD, which were observed but not analysed in the IRAM project of Spezzano et al. (2020). To calculate the abundances at the dust peaks, we divide the CH₃CCH column density by the H₂ column density. The H₂ column density at the dust peak is extracted from the respective N(H₂) map (derived from Herschel SPIRE observations; Spezzano et al. 2020) using a circular aperture with a diameter of 16″ (matching the Herschel map pixel size). We assume a 20% uncertainty for the resulting values. To derive N(CH₃CCH), we convolve the CH₃CCH spectral cubes to the 40″ beam of the Herschel telescope and extract the spectrum at the dust peak using the same 16″ aperture. The column density is calculated from a one-dimensional Gaussian fit to the line, following Mangum & Shirley (2015) and using the spectroscopic parameters listed in Table 2. Since we do not have sufficient data to determine a precise excitation temperature via a rotational diagram for all cores, we adopt a standard excitation temperature of 8 K for consistency. A lower (higher) excitation temperature only shifts the abundances of CH₃CCH (and of c-C₃H₂ in L1544) to higher (lower) values, but the overall trend stays the same. For each core, we select the most optically thin transition with the smallest (propagated) error in column density. Specifically, we use the CH₃CCH ($5_1-4_1$) transition for most cores, except for L694-2 and L1544, where the ($5_0-4_0$) and ($6_1-5_1$) transitions are used, respectively.
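The beam-matching and aperture-extraction steps can be sketched as follows. This is an illustration under assumptions, not the reduction code used here: it assumes circular Gaussian beams, takes 32″ as the native IRAM 30 m resolution (the value quoted for the L1544 maps in Fig. 4), and uses a hypothetical 4″ pixel size and a synthetic noise cube. Because Gaussian beams add in quadrature, the smoothing kernel has $\theta_{\rm kernel} = \sqrt{\theta_{\rm target}^2 - \theta_{\rm native}^2}$, i.e. 24″ for 32″ → 40″.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

FWHM_TO_SIGMA = 1.0 / (2.0 * np.sqrt(2.0 * np.log(2.0)))


def kernel_fwhm(native_fwhm, target_fwhm):
    """FWHM of the Gaussian kernel that degrades native_fwhm to target_fwhm."""
    if target_fwhm <= native_fwhm:
        raise ValueError("target beam must be larger than the native beam")
    return np.sqrt(target_fwhm**2 - native_fwhm**2)


def smooth_to_beam(cube, native_fwhm, target_fwhm, pixel_size):
    """Convolve a (n_chan, ny, nx) cube to a coarser circular Gaussian beam.

    gaussian_filter uses a normalised kernel, which keeps brightness
    temperatures (K) on the same scale for extended emission.
    """
    sigma_pix = kernel_fwhm(native_fwhm, target_fwhm) * FWHM_TO_SIGMA / pixel_size
    # Smooth the two spatial axes only; the spectral axis (axis 0) is untouched.
    return gaussian_filter(cube, sigma=(0.0, sigma_pix, sigma_pix))


def aperture_spectrum(cube, x0, y0, diameter, pixel_size):
    """Mean spectrum of the pixels whose centres lie inside a circular aperture."""
    ny, nx = cube.shape[1:]
    yy, xx = np.mgrid[0:ny, 0:nx]
    radius = np.hypot(xx - x0, yy - y0) * pixel_size
    mask = radius <= diameter / 2.0
    return cube[:, mask].mean(axis=1)


# Hypothetical example: 64 channels on a 60x60 grid of 4" pixels.
rng = np.random.default_rng(1)
cube = rng.normal(0.0, 0.05, size=(64, 60, 60))
smoothed = smooth_to_beam(cube, native_fwhm=32.0, target_fwhm=40.0, pixel_size=4.0)
spectrum = aperture_spectrum(smoothed, x0=30, y0=30, diameter=16.0, pixel_size=4.0)
print(kernel_fwhm(32.0, 40.0), spectrum.shape)
```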

For comparison, we also calculate the abundances of c-C₃H₂, using the ($2_{0,2}-1_{1,1}$) transition for all cores except L1544, where the ($3_{2,2}-3_{1,3}$) transition is used. Figure D.1 displays the extracted spectra along with their Gaussian fits. Figure 3 presents the resulting CH₃CCH (blue circles) and c-C₃H₂ (orange squares) abundances at the dust peaks of the different cores for an assumed excitation temperature of 8 K.
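For a single optically thin line in LTE, the Mangum & Shirley (2015) recipe reduces to a short calculation. The sketch below is a simplified reading, not the exact code behind Fig. 3. It uses $N_u = \frac{8\pi\nu^3}{c^3 A_{ul}}\,\frac{\int T_R\,{\rm d}v}{[J_\nu(T_{\rm ex}) - J_\nu(T_{\rm bg})]\,(e^{h\nu/kT_{\rm ex}} - 1)}$ and $N_{\rm tot} = N_u\,\frac{Q(T_{\rm ex})}{g_u}\,e^{E_u/kT_{\rm ex}}$, with the Planck-corrected radiation temperature $J_\nu(T)$, and it assumes a 2.73 K background, a partition function supplied by the user, and placeholder spectroscopic numbers (the actual values are those in Table 2). Combining the column-density error with the adopted 20% $N({\rm H_2})$ uncertainty in quadrature is also an assumption.

```python
import numpy as np
from scipy.optimize import curve_fit

H = 6.62607015e-27   # Planck constant [erg s]
K_B = 1.380649e-16   # Boltzmann constant [erg K^-1]
C = 2.99792458e10    # speed of light [cm s^-1]


def gaussian(v, amp, v0, sigma):
    return amp * np.exp(-0.5 * ((v - v0) / sigma) ** 2)


def integrated_intensity(v_kms, t_r):
    """Area of a 1D Gaussian fit, W = sqrt(2 pi) * amp * sigma, in K km/s."""
    p0 = [t_r.max(), v_kms[np.argmax(t_r)], 0.2]
    popt, _ = curve_fit(gaussian, v_kms, t_r, p0=p0)
    amp, _, sigma = popt
    return np.sqrt(2.0 * np.pi) * amp * abs(sigma)


def j_nu(nu, temp):
    """Planck-corrected radiation temperature J_nu(T) in K."""
    return (H * nu / K_B) / np.expm1(H * nu / (K_B * temp))


def column_density(w_kkms, nu, a_ul, g_u, e_u_k, q_rot, t_ex=8.0, t_bg=2.73):
    """Total column density (cm^-2) of an optically thin line in LTE."""
    w_cgs = w_kkms * 1e5                                  # K km/s -> K cm/s
    tau_dv = w_cgs / (j_nu(nu, t_ex) - j_nu(nu, t_bg))    # integral of tau over v
    n_u = 8.0 * np.pi * nu**3 / (C**3 * a_ul) * tau_dv / np.expm1(H * nu / (K_B * t_ex))
    return n_u * q_rot / g_u * np.exp(e_u_k / t_ex)


def abundance(n_mol, err_mol, n_h2, rel_err_h2=0.20):
    """X = N(mol)/N(H2) with relative errors added in quadrature."""
    x = n_mol / n_h2
    return x, x * np.hypot(err_mol / n_mol, rel_err_h2)


# Usage with a synthetic, noise-free line and PLACEHOLDER line parameters
# (not the Table 2 values for any real transition).
v = np.linspace(-2.0, 2.0, 401)
w = integrated_intensity(v, gaussian(v, 0.3, 0.1, 0.12))
n_col = column_density(w, nu=85.5e9, a_ul=2e-6, g_u=22.0, e_u_k=20.0, q_rot=30.0)
x, x_err = abundance(n_col, 0.1 * n_col, n_h2=1e22)
print(f"W = {w:.3f} K km/s, N = {n_col:.2e} cm^-2, X = {x:.1e} +/- {x_err:.1e}")
```

Since $N_{\rm tot}$ is linear in the integrated intensity, the choice of $T_{\rm ex}$ enters only through the multiplicative factors in `column_density`, which is why a different assumed excitation temperature shifts all abundances of a given transition together.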

The CH₃CCH abundances of the starless cores (left part of Fig. 3) are about one order of magnitude higher than those of the prestellar cores (right part of Fig. 3), suggesting an evolutionary trend in CH₃CCH from the starless to the prestellar phase. Notably, the CH₃CCH abundance in L1544 is significantly higher than in the other prestellar cores and even higher than in the starless cores. This suggests that the observed variations in CH₃CCH are probably influenced not only by the evolutionary stage but also by environmental factors. This is discussed further in Sect. 5.2. However, further work beyond the scope of this paper is needed to study the interplay between environmental and evolutionary or dynamical effects on the spread in CH₃CCH abundance.

The c-C₃H₂ abundances show much less variation than those of CH₃CCH and do not exhibit a significant difference between the starless and the prestellar stages; they are spread within one order of magnitude. This different behaviour of the two molecules suggests that c-C₃H₂ and CH₃CCH trace different layers within the core, and that c-C₃H₂ is largely unaffected by the evolution from the starless to the prestellar stage.

5 Discussion

5.1 Density-based clustering

Segregation between c-C₃H₂ and CH₃OH. Density-based clustering is able to find molecular differentiation in our dataset. In particular, the clustering of the dataset containing c-C₃H₂ and CH₃OH (Case 1) successfully reproduces the known molecular segregation between these two molecules in B68, L1521E, and L1544. This segregation is attributed to uneven illumination across the cores, as discussed in Spezzano et al. (2016, 2020). The differentiation mainly appears in the features intensity, V_offset, $N_{{\rm H}_2}$, and $\nabla N_{{\rm H}_2}$. In addition, the following pairs of features frequently show segregation: intensity/V_offset, intensity/dist2dust, intensity/$N_{{\rm H}_2}$, intensity/$\nabla N_{{\rm H}_2}$, V_offset/$N_{{\rm H}_2}$, and V_offset/$\nabla N_{{\rm H}_2}$.

Segregation between c-C₃H₂ and CH₃CCH. The clustering analysis of the dataset containing c-C₃H₂ and CH₃CCH (Case 2) reveals molecular segregation between these two carbon chains in all three cores. As in Case 1, the segregation appears in the features intensity, V_offset, $N_{{\rm H}_2}$, and $\nabla N_{{\rm H}_2}$, and in similar pairs of features. However, in B68 and L1521E, a differentiation between these two molecules is not apparent in their emission maps (see Figs. A.1 and A.2) and was therefore previously unrecognised. The segregation between c-C₃H₂ and CH₃CCH suggests that these molecules trace different layers in the cores, representing different physical conditions. A similar result was discussed in Lin et al. (2022), where c-C₃H₂ was found to trace lower-density regions than, for example, CH₃OH.

In B68 and L1521E, the emission of CH₃CCH is less spatially extended than that of c-C₃H₂ (see Figs. A.1 and A.2). In B68, the CH₃CCH emission is concentrated on the core centre, while in L1521E it is primarily located in the eastern part of the core. Because fewer data points are available for CH₃CCH than for c-C₃H₂, the clustering dataset in Case 2 includes two CH₃CCH transitions for B68 and three transitions for L1521E (as detailed in Table 4). This results in a higher density of data points in the core centre of B68 and the eastern part of L1521E, increasing the likelihood that density-based algorithms such as DBSCAN and HDBSCAN form clusters at these locations. The incomplete coverage of CH₃CCH across the cores becomes more apparent in Case 3 (CH₃OH and CH₃CCH) and Case 4 (c-C₃H₂, CH₃OH, and CH₃CCH). In Case 4, CH₃CCH forms clusters similar to those in Case 2 (c-C₃H₂ and CH₃CCH), but CH₃OH behaves differently than in Case 1 (c-C₃H₂ and CH₃OH). In contrast, in L1544, where the emission maps of c-C₃H₂, CH₃OH, and CH₃CCH extend across the entire core, the clustering results of Case 4 differ significantly from those of Cases 1 and 2. In Case 3, on the other hand, most clusters found in Cases 1 and 2 are recreated. The only exception is CH₃OH in L1521E, which mimics the clustering behaviour of c-C₃H₂ instead.
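The effect of adding extra transitions can be illustrated with a toy example. The sketch below is our own construction, not taken from the analysis. It places points on a unit grid in a two-dimensional feature space, so that with `eps = 1.05` every point has at most four neighbours and DBSCAN with `min_samples = 6` (a count that includes the point itself) finds no cluster. Repeating the points of one corner patch twice, as if two more transitions sampled exactly the same feature values (an idealisation), pushes their neighbour counts above the threshold and produces a cluster there, even though the underlying distribution of conditions has not changed.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def n_clusters(labels):
    """Number of clusters, ignoring DBSCAN noise (label -1)."""
    return len(set(labels.tolist()) - {-1})


# One transition: a uniform 10x10 grid in a toy 2D feature space.
xx, yy = np.meshgrid(np.arange(10.0), np.arange(10.0))
single = np.column_stack([xx.ravel(), yy.ravel()])

# Idealisation: two extra transitions cover only a 4x4 corner patch and land
# on exactly the same feature values (real transitions would scatter a little).
patch = (single[:, 0] < 4) & (single[:, 1] < 4)
stacked = np.vstack([single, np.repeat(single[patch], 2, axis=0)])

dbscan = DBSCAN(eps=1.05, min_samples=6)
labels_single = dbscan.fit_predict(single)
labels_stacked = dbscan.fit_predict(stacked)
print(n_clusters(labels_single), n_clusters(labels_stacked))  # 0 1
```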

Similarities between CH₃OH and CH₃CCH. Our analysis reveals that in B68 and L1521E, the clusters dominated by CH₃OH in Case 1 behave very similarly to those dominated by CH₃CCH in Case 2, even though the CH₃OH emission is as spatially extended as that of c-C₃H₂ in both cores and the CH₃CCH emission is not. The CH₃CCH- and CH₃OH-dominated clusters are associated with the same features ($N_{{\rm H}_2}$ and V_offset for L1521E, intensity for B68) and are located in the same regions within the cores (the south-east for L1521E and the core centre for B68). Additionally, these clusters are spatially distinct from the c-C₃H₂ clusters. As shown in Fig. 1, c-C₃H₂ exhibits broader linewidths than CH₃OH and CH₃CCH across B68, likely indicating that it traces a different, more turbulent layer. These clustering results may therefore reflect differences in the physical layers traced by the molecules.

In L1544, the similarity in cluster behaviour between CH₃OH and CH₃CCH relative to c-C₃H₂ is observed in only one feature pairing: $\nabla N_{{\rm H}_2}$ and V_offset (see combination 4). This could be linked to both CH₃OH and CH₃CCH tracing gas influenced by slow accretion flows. For CH₃OH, such an association has been demonstrated and discussed, for instance, by Lin et al. (2022). In Sect. 5.2, we further explore how the CH₃CCH peak in L1544 may also be affected by inflowing gas.

Relevance of different features in the clustering analysis. In our clustering analysis, V_offset appears to be a dominant feature that drives the division of the cores into different clusters and, in some cases, the molecular separation. In L1521E, the core is divided into north-west/south-east, while L1544 shows a north/south split, both reflecting the velocity structure of the cores. A similar split is indicated in B68, with a separation between the core centre and a shell along the eastern side, although this division is less pronounced. The strong link between velocity structure and chemical differentiation indicates that static chemical models may not be sufficient to fully reproduce the observed features. While the overall physical structure of these cores is generally well described by quasi-static models, understanding their anisotropic chemical structures requires a more dynamic approach (see also Lin et al. 2022).

Both B68 and L1544 show clusters with concentric ring structures around their core centres, following the patterns of $N_{{\rm H}_2}$ and $\nabla N_{{\rm H}_2}$. This ring-like pattern reflects the nearly spherical shape of these cores. In contrast, L1521E, which is more elongated and irregularly shaped, does not show this pattern. The clustering analysis of both L1521E (Case 1) and L1544 (Cases 1 and 2) reveals molecular separation in the V_offset−$\nabla N_{{\rm H}_2}$ distribution, which does not appear for B68. This difference may be due to environmental factors, as B68, a Bok globule, is exposed to relatively uniform external illumination. In L1521E, the clusters with the highest $\nabla N_{{\rm H}_2}$ values in this feature pairing (see combination 4) are dominated by c-C₃H₂. These data points are located at the filamentary edge in the southern part of the core, where c-C₃H₂ peaks and external illumination is strongest. In L1544, the data points with the highest $\nabla N_{{\rm H}_2}$ values also come from the southern part of the core, near the filamentary edge with high external illumination. However, in this case they are dominated by CH₃OH and CH₃CCH instead of c-C₃H₂. This suggests that the clustering results reflect the distinct environmental conditions within each core. Overall, the clusters with prominent $N_{{\rm H}_2}$ and $\nabla N_{{\rm H}_2}$ features appear to trace the chemical patterns across the core structures, with differences in the clustering likely tied to varying environmental conditions.

The features dist2dust and linewidth appear to be less significant in our analysis, as they rarely exhibit distinct structures or molecular separations by themselves. However, when combined with other features, as in dist2dust/intensity or linewidth/V_offset, they provide additional insights. The one-dimensional projected distance to the dust peak seems less relevant in the clustering analysis than $N_{{\rm H}_2}$ and $\nabla N_{{\rm H}_2}$, as these two features better characterise the immediate environment of a data point or pixel.
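To see why $N_{{\rm H}_2}$ and $\nabla N_{{\rm H}_2}$ produce concentric rings for a roughly spherical core while dist2dust only encodes distance, consider the sketch below. It rests on assumptions: the feature definitions (distance to the map maximum, and the gradient magnitude from finite differences via `numpy.gradient`) are our reading of the features, and the core is a hypothetical profile $N(r) = N_0/[1 + (r/r_0)^2]$. For this profile $|{\rm d}N/{\rm d}r|$ peaks at $r = r_0/\sqrt{3}$, so the gradient map forms a ring around the dust peak, whereas dist2dust increases monotonically outwards and carries no information about local structure.

```python
import numpy as np


def feature_maps(n_h2, pixel_size):
    """Return dist2dust (units of pixel_size) and |grad N_H2| for every pixel."""
    iy, ix = np.unravel_index(np.nanargmax(n_h2), n_h2.shape)  # dust peak
    yy, xx = np.indices(n_h2.shape)
    dist2dust = np.hypot(xx - ix, yy - iy) * pixel_size
    dn_dy, dn_dx = np.gradient(n_h2, pixel_size)   # axis 0 = y, axis 1 = x
    grad_mag = np.hypot(dn_dx, dn_dy)
    return dist2dust, grad_mag


# Hypothetical spherical core on a 101x101 grid with 1-unit pixels.
pixel = 1.0
r0 = 15.0
yy, xx = np.indices((101, 101))
r = np.hypot(xx - 50, yy - 50) * pixel
n_h2 = 1e22 / (1.0 + (r / r0) ** 2)

dist2dust, grad_mag = feature_maps(n_h2, pixel)
ring_radius = dist2dust.flat[np.argmax(grad_mag)]
print(ring_radius, r0 / np.sqrt(3))  # gradient ring lies close to r0/sqrt(3)
```

An elongated or irregular column density distribution would break this circular symmetry, which is consistent with the absence of ring-like clusters in L1521E.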

Fig. 4

Centroid velocities (top) and linewidths (bottom) of c-C₃H₂ (left), CH₃CCH (right), and CH₃OH (middle) towards the prestellar core L1544. Black contours show 50% and 90% of the respective molecular emission peak. White contours show 30%, 50%, and 90% of the H₂ column density peak derived from Herschel maps (Spezzano et al. 2016). The white circle in the bottom-left corner indicates the beam size of the IRAM 30 m telescope (32″).

5.2 Evolution traced by CH₃CCH

In the starless cores B68 and L1521E, the distribution of the CH₃CCH emission overlaps with that of c-C₃H₂ (see Figs. A.1 and A.2). In L1544, however, the peak of the CH₃CCH emission is located not in the south-east of the core, where the other carbon chains are found, but in the north-west. It is known that in the north-east of L1544, around the CH₃OH peak, two filaments converge (e.g. André et al. 2010; Spezzano et al. 2016). This may result in slow accretion flows (Punanova et al. 2018; Lin et al. 2022), which could deliver fresh material to the core and help replenish CH₃CCH.

To investigate the formation and destruction routes of CH₃CCH, we conducted chemical simulations using the gas-grain chemical model pyRate (Sipilä et al. 2015), applied to the standard physical model of L1544 (Keto et al. 2015). For the gas-phase chemical network, we adopted the 2014 public release of the KIDA chemical network (kida.uva.2014; Wakelam et al. 2015), while the grain-surface network is an updated version of the one presented in Semenov et al. (2010). The simulation yields radial abundance profiles as a function of time; we inspected the results at an evolutionary time of 10⁵ yr. The model shows that at intermediate densities (n ~ 10⁴ cm⁻³), CH₃CCH is mainly formed by the dissociative recombination of C₃H₅⁺:

$\mathrm{C}_3\mathrm{H}_5^+ + e^- \longrightarrow \mathrm{CH}_3\mathrm{CCH} + \mathrm{H},$ (1)

while it is mainly destroyed by the reaction with free carbon:

$\mathrm{CH}_3\mathrm{CCH} + \mathrm{C} \longrightarrow \mathrm{C}_4\mathrm{H}_3 + \mathrm{H}.$ (2)

Consequently, in regions with high irradiation and therefore active gas-phase chemistry, such as the carbon-chain peak in L1544, CH₃CCH is quickly destroyed by the large amount of atomic carbon present in the gas phase. In contrast, the north-western part of L1544 is more shielded from irradiation, allowing CH₃CCH to form from fresh material brought in by the filamentary flow from the north-west. Here, the low abundance of free carbon in the gas phase protects CH₃CCH from destruction. This picture is further supported by the velocity and linewidth maps of CH₃CCH shown in Fig. 4, together with those of c-C₃H₂ and CH₃OH. The linewidth map of CH₃CCH (bottom right) shows increased linewidths near the emission peak, while the velocity map (top right) reveals a sharp velocity gradient in the same area. This suggests that the CH₃CCH emission peak in L1544 marks the landing or accumulation point of the incoming fresh gas. Similar signs of this active chemistry are observed in c-C₃H₂. The c-C₃H₂ integrated intensity map (Fig. A.1) shows a local maximum in this region, while the c-C₃H₂ linewidth map (bottom left in Fig. 4) exhibits the highest linewidths not at the c-C₃H₂ peak in the south-east but at the CH₃CCH peak in the north-west. The velocity map of c-C₃H₂ (top left) also shows a gradient around this area.
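The role of atomic carbon can be made explicit with a simple rate balance. This is an illustrative simplification, not the pyRate network: it assumes that reactions (1) and (2) are the only formation and destruction channels. Here $k_1$ and $k_2$ are generic rate coefficients introduced for this sketch, and $k_1$ includes the branching fraction into CH₃CCH + H. Under these assumptions, the CH₃CCH number density evolves as

```latex
\frac{\mathrm{d}\,n(\mathrm{CH_3CCH})}{\mathrm{d}t}
  = k_1\, n(\mathrm{C_3H_5^+})\, n(e^-) - k_2\, n(\mathrm{CH_3CCH})\, n(\mathrm{C}),
\qquad
t_{\rm dest} = \frac{1}{k_2\, n(\mathrm{C})},
\qquad
n(\mathrm{CH_3CCH})_{\rm ss} = \frac{k_1\, n(\mathrm{C_3H_5^+})\, n(e^-)}{k_2\, n(\mathrm{C})}.
```

Both the destruction timescale and the steady-state density scale as $1/n(\mathrm{C})$. A high atomic-carbon abundance, as at the irradiated carbon-chain peak, therefore shortens the lifetime of CH₃CCH and lowers its equilibrium density, whereas the shielded north-western region favours its survival.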

That CH₃CCH is an early-type molecule explains why, in our dataset, extended CH₃CCH emission is observed only in the starless cores (B68, L1521E) and, possibly owing to material accretion, in the prestellar core L1544. In the other prestellar cores (HMM1, OphD, L694-2, L429), we detect some emission at the respective dust peaks, but no significant emission beyond them. This evolutionary trend is further supported by the CH₃CCH abundances at the dust peaks (Fig. 3). There, the abundances in the starless cores are approximately one order of magnitude higher than those in the prestellar cores other than L1544.

6 Conclusions

We have presented an analysis of molecular differentiation using the density-based clustering algorithms DBSCAN and HDBSCAN. The clustering was applied to four different datasets to compare the emission morphologies of c-C₃H₂, CH₃OH, and CH₃CCH observed towards the starless cores B68 and L1521E and the prestellar core L1544.

Our main results can be summarised as follows:

The density-based clustering analysis finds significant chemical differentiation across the cores in our dataset. It successfully reproduces the known molecular segregation of c-C₃H₂ and CH₃OH for B68, L1521E, and L1544. Furthermore, the clustering analysis identifies a segregation between c-C₃H₂ and CH₃CCH in all three cores, which is not apparent from visual inspection of the emission maps;
The most relevant features in the clustering analysis are integrated intensity, velocity offset, H₂ column density, and H₂ column density gradient. Distinct and recurring cluster structures in the H₂ column density and the gradient highlight structural and chemical patterns across the cores. Differences in the relevance of these two features for the three cores reflect the varying environmental conditions within each core. The strong relation between molecular emission and velocity structure suggests that to understand anisotropic chemical structures, static chemical models are not sufficient, but dynamical models are necessary;
Higher CH₃CCH abundances towards the starless cores than towards the prestellar cores indicate an evolutionary trend. The elevated CH₃CCH abundance towards L1544 suggests an additional influence of environmental factors. In fact, in L1544, the CH₃CCH peak in the north-west of the core appears to trace the landing point of chemically fresh gas that is accreted onto the core. Unlike the photochemically active south of the core, this area is shielded from external irradiation, which protects CH₃CCH from being destroyed by free carbon atoms;
The clustering analysis finds a similar behaviour of CH₃OH and CH₃CCH relative to c-C₃H₂ in all cores. This indicates that c-C₃H₂ traces an outer layer of gas, and possibly a lower-density shell, compared to the other two molecules. In L1544, the similar clustering patterns observed for CH₃OH and CH₃CCH may reflect the influence of accretion processes in shaping the molecular distribution.

Our results demonstrate that a successful density-based clustering approach for studying astrochemical processes does not require a large dataset covering multiple molecules across various cores. In fact, the results are often easier to interpret when only two or three molecules are considered. While this clustering method is more time-consuming than techniques such as principal component analysis, it can process much more detailed information and provide deeper insights into the core’s structure.

Using the more general approach of describing a data point’s location through its H₂ column density and its gradient, rather than relying on spatial coordinates, also enables simple comparisons between cores. In future studies, we aim to expand our analysis of molecular differentiation with density-based clustering to include more cores and molecules, especially those that trace other physical or chemical features. This will also help explore any evolutionary effects that the cores or their environment might have on the molecular distribution.

Data availability

Figures C.3-C.14, presenting the detailed clustering results for Cases 1-4, are published on Zenodo (zenodo.org/records/15519030).

Acknowledgements

We wish to thank the anonymous referee for their constructive comments. K.G. thanks Caroline Gieser for useful discussions. S.S. and K.G. wish to thank the Max Planck Society for the Max Planck Research Group funding. All other authors affiliated with the MPE wish to thank the Max Planck Society for financial support.

Appendix A Integrated intensity maps

The integrated intensity maps observed towards B68, L1521E, and L1544 are shown in Fig. A.1 (c-C₃H₂ and CH₃OH), and in Fig. A.2 (CH₃CCH).

Fig. A.1

Integrated intensity maps of c-C₃H₂ and CH₃OH observed towards B68 (Spezzano et al. 2020), L1521E (Nagy et al. 2019), and L1544 (Spezzano et al. 2016). The solid line contours indicate the 3σ level of the integrated intensity, except for CH₃OH in L1544, where they indicate the 9σ level. The dashed line contours represent 90%, 50%, and 30% of the H₂ column density peak derived from Herschel maps (Spezzano et al. 2020). The white circle in the bottom-left corner indicates the beam size of the IRAM 30 m telescope (32").

Fig. A.2

Integrated intensity maps of CH₃CCH observed towards B68, L1521E (Nagy et al. 2019), and L1544 (Spezzano et al. 2017). The solid line contours indicate the 3σ level of the integrated intensity for B68 and L1521E, and the 6σ level for L1544. The dashed line contours represent 90%, 50%, and 30% of the H₂ column density peak derived from Herschel maps (Spezzano et al. 2020). The white circle in the bottom-left corner indicates the beam size of the IRAM 30 m telescope (32").

Appendix B H₂ column density gradient maps

Figure B.1 shows the H₂ column density gradient maps derived from Herschel SPIRE maps (see Spezzano et al. 2016, 2020) and the corresponding H₂ column density maps, for B68, L1521E, and L1544, respectively.
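The text states only that the gradient maps were derived from the Herschel SPIRE column density maps, so the scheme below is an assumption. It computes the gradient magnitude $|\nabla N_{{\rm H}_2}|$ with central finite differences in the interior and one-sided differences at the map edges. This is the same scheme as `numpy.gradient` with its default `edge_order=1`. The pixel size argument, the helper names, and the toy map are hypothetical.

```python
import math


def derivative(vals, k):
    """d(vals)/d(index) at position k: central difference inside, one-sided at the ends."""
    if k == 0:
        return vals[1] - vals[0]
    if k == len(vals) - 1:
        return vals[-1] - vals[-2]
    return (vals[k + 1] - vals[k - 1]) / 2.0


def gradient_magnitude(nmap, pixel_arcsec):
    """Return |grad N| for a 2D map (list of rows, at least 2x2), in units of [N] per arcsec."""
    ny, nx = len(nmap), len(nmap[0])
    columns = [[nmap[y][x] for y in range(ny)] for x in range(nx)]
    return [
        [math.hypot(derivative(nmap[y], x), derivative(columns[x], y)) / pixel_arcsec
         for x in range(nx)]
        for y in range(ny)
    ]


# Toy map N = 3x + 4y (arbitrary units): the gradient magnitude is 5 per pixel everywhere
toy = [[3 * x + 4 * y for x in range(4)] for y in range(3)]
grad = gradient_magnitude(toy, pixel_arcsec=1.0)
```

The gradient magnitude, rather than its direction, is what enters the clustering as a scalar per-pixel feature. The orientation of the gradient is discarded, just like the spatial coordinates.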

Fig. B.1

H₂ column density maps (left) derived from Herschel SPIRE maps (Spezzano et al. 2016, 2020) and the corresponding H₂ column density gradient maps (right) for B68 (top), L1521E (middle), and L1544 (bottom). The dashed line contours represent 90%, 50%, and 30% of the H₂ column density peak. The red rectangle marks the location and size of the emission maps observed towards each core. The red circle in the bottom-left corner indicates the Herschel beam size (40").

Appendix C Detailed clustering results

The clustering results for Case 1 and Case 2 are shown in Fig. C.1 and Fig. C.2 for feature combinations 2, 3, 4, 9, and 10 for L1521E and L1544, respectively. The remaining results for Cases 1 and 2, together with the results for Cases 3 and 4, are published on Zenodo in Figs. C.3-C.14. The molecular ratio in each cluster and the number of data points assigned to it are given in Table C.1 for all combinations and Cases.
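The bookkeeping behind Table C.1 and the cluster annotations in Figs. C.1 and C.2 can be sketched as follows. The sketch counts the members of each cluster, computes the molecular fractions, and flags a cluster with c, m, or p when one molecule makes up more than 60% of it. It assumes that cluster labels follow the usual DBSCAN/HDBSCAN convention of -1 for noise, and that each data point carries the name of the molecule it was measured in. The function name and the toy labels are hypothetical.

```python
from collections import Counter, defaultdict

# Annotation codes used in Figs. C.1 and C.2
CODES = {"c-C3H2": "c", "CH3OH": "m", "CH3CCH": "p"}


def cluster_content(labels, molecules, threshold=0.6):
    """Size, molecular fractions and dominant-molecule code of every cluster (noise = -1 skipped)."""
    members = defaultdict(Counter)
    for label, molecule in zip(labels, molecules):
        if label != -1:
            members[label][molecule] += 1
    table = {}
    for label in sorted(members):
        counts = members[label]
        size = sum(counts.values())
        fractions = {mol: n / size for mol, n in counts.items()}
        # a cluster is annotated only if one molecule makes up more than `threshold`
        dominant = next((CODES[m] for m, f in fractions.items() if f > threshold), None)
        table[label] = {"size": size, "fractions": fractions, "dominant": dominant}
    return table


labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, -1]
molecules = ["c-C3H2"] * 4 + ["CH3OH"] + ["CH3OH", "CH3CCH"] * 2 + ["c-C3H2"]
table = cluster_content(labels, molecules)
# cluster 0: 5 points, 80% c-C3H2 -> "c"; cluster 1: 50/50 CH3OH/CH3CCH -> no annotation
```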

Table C.1

Cluster content for all Cases and feature combinations.

Fig. C.1

Clustering results for the starless core L1521E for the dataset of Case 1 (left) and the dataset of Case 2 (right), for feature combinations 2, 3, 4, 9, and 10. Each row represents a different combination of features (see Table 3): combination 2 (intensity, V_offset, and dist2dust), combination 3 (intensity, V_offset, and $N_{{\rm H}_2}$), combination 4 (intensity, V_offset, and $\nabla N_{{\rm H}_2}$), combination 9 (intensity, dist2dust, and $N_{{\rm H}_2}$), and combination 10 (intensity, $N_{{\rm H}_2}$, and $\nabla N_{{\rm H}_2}$). Top: Distribution of the resulting clusters in the input features. The annotations give the number of clusters found by the algorithm (two to five) and the percentage of data points assigned to the clusters. Noise points (i.e. points not assigned to any cluster) are plotted in black. The cluster colours are ordered by cluster size: the largest cluster is shown in blue, followed by red, cyan, yellow, and purple. Bottom: Corresponding spatial distribution of each cluster across the core. The annotations indicate whether a cluster contains more than 60% of one molecule (c: c-C₃H₂; m: CH₃OH; p: CH₃CCH). The dashed line contours represent 30%, 50%, and 90% of the H₂ column density peak derived from Herschel maps (Spezzano et al. 2020).

Fig. C.2

Same as in Fig. C.1 but for the prestellar core L1544.

Appendix D Spectra at dust peak

The spectra of c-C₃H₂ and CH₃CCH observed towards the dust peaks of B68, L1521E, L1544, OphD, HMM1, L694-2, and L429 are shown in Fig. D.1. The observed data cubes were convolved to the Herschel beam size (40"), and the spectra were then extracted within a circular aperture of radius 8".
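The beam matching and aperture extraction can be sketched under the following assumptions. Both beams are taken to be Gaussian, so that smoothing a 32" IRAM 30 m beam to the 40" Herschel beam requires a kernel of FWHM $\sqrt{40^2 - 32^2} = 24''$. The 32" value is taken from the figure captions; the true 30 m beam depends on the line frequency. The aperture is centred on the dust-peak pixel, and the 4" pixel size is hypothetical. The sketch only computes the kernel size and the pixels inside the aperture; the convolution itself is not shown.

```python
import math

# Conversion factor for a Gaussian profile: sigma = FWHM / (2 sqrt(2 ln 2))
FWHM_TO_SIGMA = 1.0 / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def kernel_fwhm(target_fwhm, native_fwhm):
    """FWHM of the Gaussian kernel that degrades a native Gaussian beam to the target beam."""
    if target_fwhm < native_fwhm:
        raise ValueError("target beam must be at least as large as the native beam")
    return math.sqrt(target_fwhm ** 2 - native_fwhm ** 2)


def aperture_offsets(radius_arcsec, pixel_arcsec):
    """Pixel offsets (dx, dy) from the peak pixel whose centres fall inside a circular aperture."""
    r_pix = radius_arcsec / pixel_arcsec
    n = math.ceil(r_pix)
    return [(dx, dy) for dy in range(-n, n + 1) for dx in range(-n, n + 1)
            if dx * dx + dy * dy <= r_pix ** 2]


fwhm = kernel_fwhm(40.0, 32.0)          # Herschel beam vs IRAM 30 m beam, arcsec -> 24.0
sigma = fwhm * FWHM_TO_SIGMA            # about 10.2 arcsec
pixels = aperture_offsets(8.0, 4.0)     # 8" radius on an assumed 4" pixel grid -> 13 pixels
```

Averaging the spectra of the selected pixels then gives one spectrum per core at the resolution of the Herschel-based H₂ column density.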

Fig. D.1

Spectra of CH₃CCH (left) and c-C₃H₂ (right) at the dust peak of each core (black) extracted within a circular aperture of radius 8" and the corresponding Gaussian fit (cyan). The 3σ level is indicated by the grey dotted line. The systemic velocity with respect to the line chosen for analysis is shown by the red dotted line. The Gaussian fit parameters are annotated for each line.

References

Alves, F. O., & Franco, G. A. P. 2007, A&A, 470, 597
Andre, P., Ward-Thompson, D., & Barsony, M. 2000, in Protostars and Planets IV, eds. V. Mannings, A. P. Boss, & S. S. Russell, 59
André, P., Men’shchikov, A., Bontemps, S., et al. 2010, A&A, 518, L102
Bauer, A., & Burie, J. 1969, C. R. Acad. Sci. Paris, B 268, 800
Bron, E., Daudon, C., Pety, J., et al. 2018, A&A, 610, A12
Campello, R. J. G. B., Moulavi, D., & Sander, J. 2013, in Advances in Knowledge Discovery and Data Mining, eds. J. Pei, V. S. Tseng, L. Cao, H. Motoda, & G. Xu (Berlin, Heidelberg: Springer Berlin Heidelberg), 160
Caselli, P., Walmsley, C. M., Zucconi, A., et al. 2002, ApJ, 565, 331
Caselli, P., Keto, E., Bergin, E. A., et al. 2012, ApJ, 759, L37
Caselli, P., Pineda, J. E., Sipilä, O., et al. 2022, ApJ, 929, 13
Chacón-Tanarro, A., Caselli, P., Bizzocchi, L., et al. 2019, A&A, 622, A141
Cleeves, L. I., Bergin, E. A., Alexander, C. M. O. D., et al. 2014, Science, 345, 1590
Colombo, D., Rosolowsky, E., Ginsburg, A., Duarte-Cabral, A., & Hughes, A. 2015, MNRAS, 454, 2067
Crapsi, A., Caselli, P., Walmsley, C. M., et al. 2005, ApJ, 619, 379
Crapsi, A., Caselli, P., Walmsley, M. C., & Tafalla, M. 2007, A&A, 470, 221
Drozdovskaya, M. N., Schroeder I, I. R. H. G., Rubin, M., et al. 2021, MNRAS, 500, 4901
Drozdovskaya, M. N., Coudert, L. H., Margulès, L., et al. 2022, A&A, 659, A69
Ester, M., Kriegel, H.-P., Sander, J., & Xu, X. 1996, Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, KDD-96 (AAAI Press), 226
Fotopoulou, S. 2024, Astron. Comput., 48, 100851
Galli, P. A. B., Loinard, L., Ortiz-Léon, G. N., et al. 2018, ApJ, 859, 33
Galli, P. A. B., Loinard, L., Bouy, H., et al. 2019, A&A, 630, A137
Ginsburg, A., Koch, E., Robitaille, T., et al. 2019, https://doi.org/10.5281/zenodo.2573901
Hirota, T., Ito, T., & Yamamoto, S. 2002, ApJ, 565, 359
Keto, E., & Caselli, P. 2008, ApJ, 683, 238
Keto, E., Caselli, P., & Rawlings, J. 2015, MNRAS, 446, 3731
Lada, C. J., Bergin, E. A., Alves, J. F., & Huard, T. L. 2003, ApJ, 586, 286
Lee, C. W., Myers, P. C., & Tafalla, M. 2001, ApJS, 136, 703
Lin, Y., Spezzano, S., Sipilä, O., Vasyunin, A., & Caselli, P. 2022, A&A, 665, A131
Mangum, J. G., & Shirley, Y. L. 2015, PASP, 127, 266
McInnes, L., Healy, J., & Astels, S. 2017, J. Open Source Softw., 2
McKinney, W. 2010, in Proceedings of the 9th Python in Science Conference, eds. S. van der Walt & J. Millman, 56
Moulavi, D., Jaskowiak, P. A., Campello, R. J. G. B., Zimek, A., & Sander, J. 2014, Density-Based Clustering Validation, 839
Müller, H. S. P., Thorwirth, S., Roth, D. A., & Winnewisser, G. 2001, A&A, 370, L49
Nagy, Z., Spezzano, S., Caselli, P., et al. 2019, A&A, 630, A136
Ohashi, N., Lee, S. W., Wilner, D. J., & Hayashi, M. 1999, ApJ, 518, L41
Okoda, Y., Oya, Y., Sakai, N., Watanabe, Y., & Yamamoto, S. 2020, ApJ, 900, 40
Okoda, Y., Oya, Y., Abe, S., et al. 2021, ApJ, 923, 168
Pedregosa, F., Varoquaux, G., Gramfort, A., et al. 2011, J. Mach. Learn. Res., 12, 2825
Pety, J. 2005, in SF2A-2005: Semaine de l’Astrophysique Française, eds. F. Casoli, T. Contini, J. M. Hameury, & L. Pagani, 721
Punanova, A., Caselli, P., Feng, S., et al. 2018, ApJ, 855, 112
Redaelli, E., Sipilä, O., Padovani, M., et al. 2021, A&A, 656, A109
Semenov, D., Hersant, F., Wakelam, V., et al. 2010, A&A, 522, A42
Sipilä, O., Caselli, P., & Harju, J. 2015, A&A, 578, A55
Soler, J. D., Hennebelle, P., Martin, P. G., et al. 2013, ApJ, 774, 128
Spezzano, S., Bizzocchi, L., Caselli, P., Harju, J., & Brünken, S. 2016, A&A, 592, L11
Spezzano, S., Caselli, P., Bizzocchi, L., Giuliano, B. M., & Lattanzi, V. 2017, A&A, 606, A82
Spezzano, S., Caselli, P., Pineda, J. E., et al. 2020, A&A, 643, A60
Tafalla, M., & Santiago, J. 2004, A&A, 414, L53
Thaddeus, P., Vrtilek, J. M., & Gottlieb, C. A. 1985, ApJ, 299, L63
Valdivia-Mena, M. T., Pineda, J. E., Segura-Cox, D. M., et al. 2023, A&A, 677, A92
Wakelam, V., Loison, J. C., Herbst, E., et al. 2015, ApJS, 217, 20
Williams, J. P., Myers, P. C., Wilner, D. J., & Di Francesco, J. 1999, ApJ, 513, L61
Xu, L.-H., & Lovas, F. 1997, J. Phys. Chem. Ref. Data, 26, 17
Yan, Q.-Z., Yang, J., Su, Y., et al. 2022, AJ, 164, 55
Yun, H.-S., & Lee, J.-E. 2023, ApJ, 958, 113
