Issue |
A&A
Volume 698, May 2025
|
|
---|---|---|
Article Number | A90 | |
Number of page(s) | 23 | |
Section | Interstellar and circumstellar matter | |
DOI | https://doi.org/10.1051/0004-6361/202449718 | |
Published online | 04 June 2025 |
CH3OH as a user-friendly density probe: Calibration and beyond
1
INAF – Istituto di Radioastronomia di Bologna,
Via Gobetti 101,
40129
Bologna,
Italy
2
INAF – Osservatorio Astronomico di Cagliari,
Via della Scienza 5,
09047
Selargius
(CA),
Italy
3
INAF – Istituto di Astrofisica e Planetologia Spaziali,
Via Fosso del Cavaliere 100,
00133
Rome,
Italy
4
Haystack Observatory, Massachusetts Institute of Technology,
99 Millstone Road,
Westford,
MA
01886,
USA
5
Dipartimento di Fisica, Università degli Studi di Cagliari,
S.P. Monserrato-Sestu km 0,700,
09042
Monserrato
(CA),
Italy
★ Corresponding author: andrea.giannetti@inaf.it
Received:
24
February
2024
Accepted:
21
March
2025
Context. Almost all the physics of star formation critically depends on the number density of the molecular gas involved. However, the methods to estimate this keystone property often rely on very uncertain assumptions about the geometry of the molecular fragment, or depend on overly simplistic, uniform models, or require time-expensive observations to simultaneously constrain the gas temperature as well. An easy-to-use method to observationally derive the number density that is valid under realistic conditions is conspicuously absent, causing an evident asymmetry in how accurately the volume density is estimated, and how often dedicated tracers are used, compared to the gas temperature.
Aims. To fill this gap, we propose and calibrate a versatile diagnostic tool based on methanol spectral lines that greatly simplifies the inference of molecular number density. Methanol is abundant in both cold and hot gas, and has a dense spectrum of lines, which maximises observational efficiency. It can therefore be applied to a wide variety of scales, from entire clouds to protostellar discs, and both in our Galaxy and beyond. Moreover, this tool does not need to be tailored to the specific source properties (such as distance, temperature, and mass).
Methods. We construct large grids of clump models and perform radiative transfer calculations to investigate the robustness of different line ratios as density probes with different assumptions, also in the presence of density and temperature gradients.
Results. We find that the line ratios of the (2K − 1K) band transitions around 96.7 GHz are able to fully constrain the average number density along the line of sight within a factor of two-three in the range ~5 × 104−3 × 107 cm−3. The range can be extended down to a few times 103 cm−3, when also using line ratios from the (5K − 4K) and/or (7K − 6K) bands, around 241.7 GHz and 338.1 GHz, respectively. We provide the reader with practical analytic formulas and a numerical method for deriving volume density and its uncertainty from observed values of the line ratios.
Conclusions. Thanks to our calibration of line ratios, we make the estimate of the number density much simpler, with an effort comparable or inferior to deriving excitation temperatures. By providing directly applicable recipes that do not require the creation of a full large velocity gradient model grid, but are equally accurate, we contribute to offsetting the disparity between these two fundamental parameters of the molecular gas. Applying our method to a sub-sample of sources from the ATLASGAL TOP100 we show that the material in the clumps is being compressed, and this compression accelerates in the latest stages.
Key words: methods: data analysis / methods: observational / stars: formation / ISM: clouds / ISM: molecules
© The Authors 2025
Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
This article is published in open access under the Subscribe to Open model. Subscribe to A&A to support open access publication.
1 Introduction
To gain insights into the star-formation (SF) process, we must know the physical properties (density and temperature) of the molecular gas and of the dust that will form new stars, how they are distributed in three dimensions (Tassis 2007; Li & Goldsmith 2012; Tritsis et al. 2016), and how they evolve over time.
Temperature is crucial to estimate the amount of gas and dust in the interstellar medium and to characterise its heating and cooling. On the other hand, the joint distributions of temperature, and gas number density play a major role in governing crucial processes in SF. For instance, chemical reaction rates are density and temperature dependent. Density also governs the Jeans mass and the free-fall time of the fragment, which are crucial factors for its stability against collapse and its dynamical timescales. Furthermore, energy losses for the particles of cosmic rays depend on density, which changes their efficiency in ionising the gas, and initiates chemistry in cold gas. The velocity field is measured through molecular lines, the intensity of which is determined by both density and temperature, and this influences measurements of the turbulence spectrum. Similarly, both quantities influence measurements of accretion onto molecular fragments, because the velocity field and the occurrence of self-absorption depend on density and temperature. These connections and effects have been discussed in a number of studies (e.g. Roberts & Millar 2000; Caselli et al. 1999, 2002; Padovani et al. 2009; Roman-Duval et al. 2011; Krumholz et al. 2012; Vázquez-Semadeni et al. 2019, 2024). Moreover, the comparison between independent measurements of the column density and the average number density of molecular condensations provide a way to estimate the size of these structures along the line of sight (LOS; e.g. Li & Goldsmith 2012), and enable us to reconstruct the 3D distribution of the gas. This information and the collapse timescale are relevant to determine the dominant mode of SF (e.g. Tassis 2007; Bovino et al. 2021; Sabatini et al. 2021). If collapse is hindered by magnetic fields or strong turbulence, long timescales compared to the free-fall time are expected (e.g. Mouschovias 1991). Instead, if gravity is dominant the collapse should proceed on timescales comparable to the local free-fall time (e.g. Vázquez-Semadeni et al. 2019, and references therein). Analogously, different formation mechanisms have different predictions for the intrinsic shape of molecular fragments. The shape of molecular fragments is predicted to be triaxial and prolate if they form from turbulent converging flows (e.g. Li et al. 2004), whereas they should be mostly oblate when magnetic fields are important (e.g. Ciolek & Basu 2006). Therefore, accurate measurements of both temperature and density are essential, because they determine the free-fall timescale, and they can be used to infer the degree of support against gravity and the intrinsic shape of molecular fragments.
Despite their importance, accurately measuring temperature and density in molecular fragments presents observational challenges. Temperature is the most straightforward parameter to measure. Lines of symmetric-top molecules, such as NH3, with different excitation properties are commonly used for this (e.g. Walmsley & Ungerechts 1983; Cummins et al. 1983; Askne et al. 1984). On the other hand, estimates of the number density of molecular hydrogen, n(H2), often require a measure of the temperature (Walmsley 1987).
The usual procedure to infer n(H2) is to derive the gas column density of molecular hydrogen from dust continuum observations (under the assumption of a given temperature), assume a geometry for the cloud to derive its depth along the LOS, and divide the column density by this estimated size (e.g. Rigby et al. 2019; Sanhueza et al. 2019; Redaelli et al. 2021). Density can also be estimated with molecular lines. For simple linear species (CS, HC3N, etc.) observations of several transitions at very different wavelengths are required. In this case, number density can be estimated by identifying the critical value of the rotational quantum number J that corresponds to the brightest transition, for which number of collisions and spontaneous radiative de-excitations roughly balance. However, observing multiple transitions at different wavelengths implies the need to consider calibration uncertainties, comparatively large integration times, and excitation effects (e.g. see the discussion in Li & Goldsmith 2012).
Slightly asymmetric rotors, such as formaldehyde (H2CO) or methanol (CH3OH), have spectral lines whose relative intensities depend both on T and n(H2) (e.g. Mangum & Wootten 1993; Leurini et al. 2004; Tang et al. 2018). Line combinations can be strategically selected to minimise the dependence on temperature or density, by leveraging on forbidden transitions as in the case of symmetric tops, and fast radiative de-excitation and/or similar excitation properties. Moreover, these diagnostics are more time-efficient and robust than linear molecules, because multiple lines often fall within a narrower frequency range, which reduces the number of required observational setups and the impact of calibration uncertainties.
Methanol is a widespread and abundant molecule that traces both cold- and hot material (e.g. Giannetti et al. 2017b; Sanna et al. 2021), in the Milky Way and in external galaxies (e.g. Henkel et al. 1987; Shimonishi et al. 2018; Sewiło et al. 2018; Humire et al. 2020; McCarthy et al. 2020; Humire et al. 2022). Combinations of CH3OH lines can be used to probe both density and temperature. The consequence is that this species can be used to conveniently explore the physical properties of a significant variety of environments. Leurini et al. (2004) proposed specific combinations of CH3OH lines within the same spectral band that exhibit ratios with a strong density dependence, while showing a very weak dependence on the gas temperature. These lines (see Table 1) are carefully selected to have similar excitation conditions (energies above the ground state, and critical densities), which ensures that they originate from the same emitting region. As a result, their ratios are unaffected by differences in gas distribution and in the specific line excitation conditions. However, the diagnostic power of the ratios was assessed by Leurini et al. (2004) only for the optically thin case, and using a simplified model consisting of a spherical clump with constant density, temperature, and abundance.
With the aim of providing an easy-to-use tool to analyse the emission of specific CH3OH transitions, we extended the work of Leurini et al. (2004) to more realistic conditions, including radial variations of density and temperature. This allows us to test if these ratios preserve their observational diagnostic potential, and to assess their robustness, in case of density and temperature stratification along the LOS. We also calibrated the ratios against the average value of the volume density along the LOS, and discussed the differences with the uniform and isothermal case. To improve the accuracy of the final density estimates we efficiently increased the sampling of the parameter space explored by our grid of models four-fold by emulating the radiative transfer results through machine learning (ML). We specifically trained decision tree-based algorithms, which are very robust and efficient. They can be trained even with relatively small training sets and they generally take only a fraction of the time needed to train deep neural networks, while they still provide state-of-the-art predictions (e.g. Chen & Guestrin 2016).
In Sect. 2, we show how we built realistic clump models and simulated their emission in methanol lines, the details of the clump model, the procedure followed to compute the line ratios, and how we derive the H2 number density from the ratios. Section 3 summarises the results that we obtained as we interpreted and calibrated the line ratios with a series of different experiments, comparing them to simpler modelling. We also discuss their robustness as number density indicators, and how we used machine learning to improve the final density estimates. In Sect. 4, we provide empirical analytic approximations to directly estimate the volume density from the observed line ratios. In Sect. 5, we show the application of our analysis to selected high-mass star-forming clumps from the ATLASGAL TOP100 sample (Giannetti et al. 2014), for which we estimate the number density and investigate its time evolution during the first stages of the process of high-mass SF. Finally, Sect. 6 lists our conclusions.
2 Simulating CH3OH line ratios: models and tools
We selected methanol lines from accessible regions of the millimeter spectrum. These lines are very close in frequency (~5–200 MHz, cf. Table 1), so that they can be observed with a single spectral setup by most telescopes. To minimise the impact of temperature on the ratios, and ensure that line emission originates from the same volume of gas, we consider ratios between transitions with similar energies above the ground state and critical densities. Our first objective is to calibrate the I(20 − 10)/I(2−1 − 1−1)1, I(21 − 11)/I(2−1 − 1−1), I(21 − 11)/I(20 − 10), I(5−1 − 4−1)/I(50 − 40), and I(7−1 − 6−1)/I(70 − 60) ratios against the average density along the LOS for realistic models of molecular condensations.
To pursue this objective we have produced a tool to simplify the creation of realistic molecular fragment models, simulate and analyse their molecular line emission. We exploited this tool to model the line ratios listed above. We will refer to it as the Swiss Army Knife, or SAK for short, because we adopt the nickname given to methanol by Leurini et al. (2005), to emphasise the tool versatility in modelling and analysing molecular line emission, similar to the versatility of methanol in tracing molecular gas properties. A more detailed description of its capabilities, of its infrastructure and automation is given in Appendix A and in the repository documentation2.
Properties of the methanol lines considered in this work.
2.1 SAK infrastructure
We began by developing a model generator that handles physical model setup, performs radiative transfer computations, and summarises the results for easy inspection. The system for model building and post-processing, as well as for line ratio computation, is designed to guarantee reproducibility, data conservation, and structuring. This design also minimises the risk of errors during data retrieval and analysis.
We created a container to run SAK, so that the environment for execution is easily replicable and accessible on different machines, both locally and on distributed systems. Containers are a way to isolate software, increase its portability, and improve the reproducibility of results, and they ensure that users can run SAK without the need to install or manage any dependencies.
2.2 Model generation
The system for generating the physical grids can currently handle single-cell models, or spherical clumps, with power-law distributions in number density and temperature. For spherical sources, the density and temperature profiles can be described by (e.g. van der Tak et al. 2000b; Lin et al. 2022):
(1)
(2)
where r is the distance from the fragment centre, r0 is a scaling radius, n(H2, r0) and T(r0) are the values of the number density and temperature at the scaling radius, respectively. The central grid cell is replaced with the maximum value in the neighbour cells to avoid the singularity at clump centre, present in case of a negative exponent in Eqs. (1) and (2). Our choice of the temperature profile was motivated by the fact that clumps do not host single stars, but a cluster. A power-law reproduces well the radial temperature variation observed in these regions (Lin et al. 2022). Simulating an embedded cluster by sampling the initial mass function, assuming the spectrum of the individual protostars, and then computing the temperature profile consistently is beyond the scope of the present paper.
The radiative transfer is performed with RADMC 3D V2.0 (Dullemond et al. 2012). SAK automates the radiative transfer computation by generating all necessary input files for RADMC 3D and then executing the code. To explore the sensitivity to number density, we use the “LVG+escape probability” method to perform non-LTE computations, as described in the RADMC-3D documentation. To use this method, we have assumed a solidbody rotation of the clump, with a gradient of 2 km s−1 pc−1, and a typical length scale for escape probability equal to the diameter of the clump, as listed in Table 2. We adopted the collisional excitation rates for H2 (Rabli & Flower 2010) provided by the LAMDA database (Schöier et al. 2005). The gas and dust are assumed to be thermally coupled. Because the collisional coefficients are only available for para-H2 in LAMDA, we assumed that no difference exists between ortho- and para-H2. Also, if the temperature is higher than the maximum one listed (200 K), the coefficients were assumed to remain constant.
2.3 Clump model details
The parameters for our clump model are reported in Tables 2 and 3. The values of these parameters resemble the typical properties of clumps identified in Galactic plane surveys (e.g. Urquhart et al. 2018; Elia et al. 2021). We chose to focus our analysis on the case of high-mass clumps, because this focus provides a more precise estimate of the dynamical timescales for these structures, which has immediate consequences for cluster formation. Extensive datasets targeting these systems are available for our Galaxy (e.g. Urquhart et al. 2018; Elia et al. 2021) but also for external galaxies, where high-resolution observations start to reach linear scales that approach the size of clumps (≈0.1–1 pc, e.g. Indebetouw et al. 2013; Muraoka et al. 2020).
We assumed a fiducial value of p = −1.5 for the density structure (see Eq. (1)), as observed in galactic clumps (Beuther et al. 2002; Giannetti et al. 2013), and scaling radius r0 = 0.5 pc. We explored regularly spaced values of the scaling density n(H2, r0) between 103 cm−3 and ~6.6 × 106 cm−3, spaced by a factor of three. A ceiling is set for the central density of 108 cm−3, compatible with the modelling of clumps where clusters are forming. Grid cells for which the density value would exceed this threshold are set to 108 cm−3 instead.
The presence of luminous Young Stellar Objects (YSOs) embedded in the clump, which heat the material from within was simulated by introducing the temperature gradient of Eq. (2), with a slope q = −0.5, which assumes optically thin dust. The value chosen for the exponent has been observed in the inner regions of high-mass cluster-forming regions (Lin et al. 2022). Again we set a ceiling, for the temperature value to 2000 K. An example of the assumed density and temperature profiles is shown in Figure 1. The assumed temperature profile is not independent of the bolometric luminosity of the embedded cluster (Wilner et al. 1995). By adapting Equation (1) of that work for our parameters, we express the luminosity of the embedded cluster as:
(3)
Given the grid limits, our models span a range of bolometric luminosities from 4 × 103 L⊙ to 2.2 × 106 L⊙. These values are typical of observations of high-mass star-forming clumps (e.g. Elia et al. 2017; Urquhart et al. 2018), and they encompass the luminosities associated with massive clusters in the MW. This validates our grid for the analysis of these objects.
The final number of models per grid is 630. We repeated the calculations for two additional grids, with p = −1.2 and p = −1.8 to investigate the effect of a different density distribution of gas on the ratios. Table 3 reports the entire parameter space that was covered by the grids in our experiments, for a total of 9450 individual simulated clumps.
By assuming a power-law distribution in density and temperature, our clump models simulate the effects of the envelope on the line ratios. We remark that common radiative transfer tools, such as RADEX (van der Tak et al. 2007), cannot account for density and temperature variations along the LOS, leaving a potentially large source of uncertainty unknown.
Basic properties of the simplified model of clumps.
Parameter space covered by the grids in our experiments.
![]() |
Fig. 1 Example of the radial density and temperature profiles inside the clump, for n(H2, r0) = 105 cm−3 and T(r0) = 25 K. |
2.4 Line ratio computation
We simulated the seven lines in Table 1, and computed the following ratios of integrated intensities: I(20 − 10)/I(2−1 − 1−1), I(21 − 11)/I(20 − 10), I(21 − 11)/I(2−1 − 1−1), I(5−1 − 4−1)/I(50 − 40), and I(7−1 − 6−1)/I(70 − 60). Two different methods are considered for investigating the ratios. We compute the line ratios averaged over the entire clump to mimic single-dish observations of high-mass clumps in the Milky Way or high-resolution imaging of similar objects in nearby galaxies. The line ratios were also derived along each individual LOS across the clumps in the final simulated images (101 × 101 pixels, Table 2). Calibrating the relation between average number density along the LOS and the line ratios is key for deriving accurate density profiles from observations and reconstructing the 3D structure of the fragments. As we will see in Sect. 3.3, the two methods give consistent results as a function of average density.
The integrated line fluxes from all pixels in each line map were summed, and then the ratio of these summed fluxes was computed to obtain the integrated line ratio for the entire clump. The maps of the LOS-averaged physical properties (number density and temperature) were produced by a weighted mean of the grid cells values along the LOS. We used as weights the number density of methanol in the individual grid cells. This weighting scheme is naturally motivated by the fact that in the optically thin case, the line intensity is proportional to the number of molecules. Thus cells with more methanol molecules contribute more to the observed ratio. Furthermore, when both lines in a ratio are heavily optically thick, the ratio tends to a constant and its sensitivity to the volume density disappears.
2.5 Inferring the H2 number density from the line ratios
To derive the number density, given the methanol line ratios, we have used the kernel density estimate (KDE), a non-parametric method to estimate the probability density function. We have computed the KDE with the SKLearn package (Pedregosa et al. 2011), and we leveraged the Bayes theorem to extract the most probable value of n(H2) and its uncertainty. Each marginal probability is computed by first generating 1000 realizations of the two integrated line intensities to compute 1000 realizations of the ratio. These are used to extract the probability density function (PDF) for that ratio from the KDE, and then summed for each realization to obtain the final PDF:
(4)
where Pjk(n(H2), Rjk,i|D) is the ratio-number density PDF for the ratio is the marginalised PDF for the number density from ratio jk, and Rjk,i is the ith realization of the ratio jk. The PDFs for the different ratios are combined by multiplication to obtain the final posterior, by considering the marginal PDFs for each ratio jk:
(5)
because all the ratios provide independent information, and the model needs to reproduce all the available ones at the same time. A basic schema of the procedure is given in Fig. 2. Note that calibration uncertainties do not affect this combination, as they cancel out in each of the ratios. The tool derives from the posterior the highest-probability-density (HPD) interval for a given probability mass (i.e. the total probability within the interval), so that it is possible to estimate the uncertainty of the best-fit value, assumed to be the one corresponding to the maximum probability density.
![]() |
Fig. 2 Schema of the procedure to derive the final density posterior from the combination of several line ratios. Two distinct ratios are considered in this example, the relations of which with n(H2) are shown in the top row. After generating a number of ratios, according to observations and uncertainties, the PDFs of individual ratios are summed, and then the resulting distribution for each line ratio is multiplied to obtain the final number density posterior. |
3 Calibrating CH3OH line ratios as density probes with density and temperature stratification
To investigate the robustness of methanol line ratios as density probes under varying assumptions, we developed a series of models, progressively adding layers of complexity and evaluating their impact on the general behaviour of the line ratios. The sets are designed to explore: (1) constant density and temperature (Appendix C.1), used as a benchmark with the results from Leurini et al. (2004); (2) isothermal clumps with power-law density distributions (Appendix C.2), as an intermediate step to isolate the effect of the clump density structure; (3) internally heated clumps with a temperature gradient (Appendix C.3) and a constant abundance; (4) thermal sublimation from ices on dust grains (Appendix C.4), and (5) different column densities and line width (Appendix C.5), which explore the effect of optical depth on the proposed diagnostics.
For the radiative transfer we are only considering the E symmetry state, because the E and A states do not readily inter-convert under typical interstellar conditions, and can thus be considered different species. In most of our experiments we assume a constant abundance of E-CH3OH equal to 10−9 (van der Tak et al. 2000a; Sabatini et al. 2021). In fact, the low angular resolutions typical of clump surveys makes observations mostly sensitive to the cold methanol in the envelope. This abundance is intentionally chosen to be lower than what is commonly reported in star-forming regions for the total methanol abundance (Giannetti et al. 2017b; Sabatini et al. 2021). The choice is justified by the fact that we are considering only the E symmetry state, and that the abundances reported in literature are estimated assuming LTE, which typically tend to overestimate the total column density of methanol (Leurini et al. 2004, and Sect. 3.5). Combined, the two effects account for nearly one order of magnitude. To investigate the effect of this assumption, we consider two additional sets: set 4, where we consider an abundance jump of a factor 100 in the inner core due to ice sublimation, and set 5 where we adopt abundance values of 10−8 and 10−10.
We discuss below the results from the comparison between the observed line ratio and the number density of the emitting material, for different temperature conditions. For this we present the results derived from the analysis from each individual LOS, which represents a large ensemble of density and temperature conditions ideal to investigate the relevance and general trends of the proposed ratio. We also discuss these quantities derived from the clump-average, which are directly comparable with available survey observations.
3.1 Robustness of the ratios to density and temperature gradients
Figure 3 shows the line ratios as a function of the average number density of H2 along the LOS for each pixel in the final simulated image as grey plus signs, comparing three cases. The left column shows the isothermal clump case, while the central column displays the internally heated clump case, characterised by a temperature gradient with q = −0.5. The left- and central columns also contain a comparison of our results with a simpler isothermal and uniform model carried out with RADEX, indicated by the red stars. Due to the sparsity of our RT grid, the sampling of the average number density versus the line ratio space is not uniform, generating artefacts in the final number density posterior in specific regimes. ML algorithms are known to be extremely good in emulating complex computations, such as the radiative transfer, and can be used to address this issue. We therefore tuned such models to provide the individual line fluxes and line ratios on a four-times finer grid. The right column of Fig. 3 shows the results of this refinement, for our best-performing models. The details of the training are presented in Appendix D.
To connect the line ratios to the corresponding average number density along the LOS for each individual pixel across all models in our grids, we use a KDE. The shaded areas in the figures representing two (smoothed) HPD intervals of the resulting PDF. This overcomes saturation effects in the distribution of the ≈6 × 105 points in each plot. Details on how this is computed are given in Appendix B.
Figure 3 shows that the proposed line ratios are consistently correlated with density, despite the presence of a density gradient. Different temperatures of the gas cause the scatter that is visible in the left column in the measured ratios for a specific value of the average density along the LOS. Comparing the left- and central columns of the figure reveals the impact of realistic temperature gradient in the clump. This gradient, however, does not significantly alter the trend observed for isothermal clumps. On the other hand, the RADEX results reveal the potential errors in the density inference caused by a oversimplified clump model. The right column of the figure demonstrates how the inclusion of the emulated data improves the sampling of the parameter space. The model errors have no qualitative impact on the shape of the relation, indicating that the model accurately emulates the RT computations. This qualitative check serves to verify that the trained models generalise well for all the predicted data points, confirming that the final inference is reliable.
Examining in more detail the isothermal and internally heated cases shows the effect of the simulated heating. The primary effect of the embedded protocluster is a slight increase in the scatter of the relations. Additionally, it causes a 10–20% higher maximum value for the I(21 − 11)/I(20 − 10) and I(21 − 11)/I(2−1 − 1−1) ratios at very high densities (< n(H2) >≳ 5 × 106 cm−3) in the internally heated case, where the material is mostly thermalised. Confirmation that the temperature only has a second order effect comes also from the Spearman correlation coefficients ρ. The coefficient are listed in Table 4, and they are far larger for the average density than for the average temperature. Because of their non-monotonic behaviour, the ratios of lines in the 241.7 GHz and 338.1 GHz bands show a much weaker correlation with the average number density compared to those for the lines in the 96.7 GHz band.
Figure 3 also shows the importance of including the fragment envelope when calibrating the ratio values. We used RADEX to construct simple homogeneous and isothermal clump models, and compared them to our results. The RADEX points (red stars in the left- and central columns) represent extreme values of the ratios for typical high-mass clumps in the galactic disc. The line intensities were computed with two values of E-CH3OH column densities, 1014 cm−2 and 3 × 1015 cm−2. These values correspond to the minimum and maximum column density of the cold methanol component in Giannetti et al. (2017b), after accounting for an equal abundance of the E and A variants of methanol, and the likely overestimate of the LTE column density (Leurini et al. 2004, and Sect. 3.5). We varied the number density in the range 103−108 cm−3 in steps of one order of magnitude, and assumed two kinetic temperatures, 10 K or 30 K; the full width at half maximum (FWHM) of the line was fixed to 3.5 km s−1, as in the majority of our models. Comparing the most probable value of the ratio for the PDFs and the values predicted by RADEX reveals that the calibration of the relations can be off by even two orders of magnitude for a given value of the line ratio. In fact, they should roughly encompass the PDFs obtained through SAK if the calibrations were equivalent, because the RADEX points represent extreme values of the ratios. This stresses the importance of more sophisticated modelling when trying to infer n(H2).
It is difficult to appreciate the improvement in the final posterior that ML emulation brings from Fig. 3 alone. We have mentioned above that our grid introduces an uneven sampling of densities when considering the individual LOSs. Figure 4 shows the resulting oscillatory pattern in the number density PDFs for one of the sources included in the sample of high-mass clumps used to investigate the evolution of number density (cf. Sect. 5). The source has been chosen to be in the critical number density regime affected by this issue, and it shows the posterior before and after the dataset has been enriched with ML-emulated points. The finer grid produced through ML emulation removes artificial features from the relation, even using a kernel with half the bandwidth as for the RT-only case.
Spearman correlation coefficients between the line ratios considered in this work, the LOS-averaged number density and temperature.
![]() |
Fig. 3 Kernel density estimate of the probability density function for the line ratios as a function of average number density along the LOS, for all the temperatures considered in the grid, for the model assumptions indicated at the top of each column. The rows show the ratios I(20 − 10)/I(2−1 − 1−1), I(21 − 11)/I(20 − 10), I(21 − 11)/I(2−1 − 1−1), I(5−1 − 4−1)/I(50 − 40), and I(7−1 − 6−1)/I(70 − 60), in order, from top to bottom. RADEX computations for an isothermal, uniform clump are shown as red stars; the parameters used are described in the text. Note that for the I(5−1 − 4−1)/I(50 − 40), and I(7−1 − 6−1)/I(70 − 60) ratios the RADEX points at low densities are above the plotting limits, and reach values around 25 in both cases. |
![]() |
Fig. 4 Improvement in the final number density posterior when adding ML-emulated data. In blue, a source displaying oscillations in the posterior is shown for the RT-only case; in orange, the same source with the number density posterior evaluated on the finer grid, generated by our final ML model. The oscillations caused by grid sparsity disappear. |
3.2 Regimes of density probed by the ratios
Figure 3 shows that ratios computed with lines in the (2K − 1K) band around 96.7 GHz are sensitive to average densities in the range ≈5 × 104−3 × 107 cm−3. This interval reflects the density at which collisional excitation starts to be important, and the density at which the lines are fully thermalised. In this case, the ratio starts to be correlated to the density at values comparable with the critical densities of the lines (≈3 × 104 cm−3, see Table 1).
On the other hand, the ratios of lines in the (5K − 4K) and (7K − 6K) bands around 241.7 GHz and 338.1 GHz show a more complex behaviour. The density at which they thermalise (and quickly become optically thick) is comparable to the critical density, while dependence on the density starts around a few 103 cm−3−104 cm−3 for the internally heated clump case. These lines have an Einstein-A coefficient ≈20–40 times higher than the lines in the (2K − 1K) band. They have a significantly higher optical depth, and radiative trapping is more important, making them sensitive to densities much lower than their critical density.
The I(5−1 − 4−1)/I(50 − 40) and I(7−1 − 6−1)/I(70 − 60) ratios are less powerful than those computed with 3 mm lines for estimating number densities, because each value of the ratio has two solutions. However, they are valuable to extend the range of conditions where we can actually estimate this parameter to lower densities, not accessible through the lower-frequency lines. One example where this is useful is represented by entire molecular clouds, which have average densities typically in the range probed by these higher-frequency ratios.
3.3 Robustness of the ratios to gas distribution and distance
The effect of the specific distribution of material in the clumps has been investigated by comparing the shape and location of the relations across the three grids with a different p-value. This comparison reveals that the relations between the line ratios and the average density along the LOS are unaffected by the exponent of the density power-law: whatever the distribution of material within the explored cases, a given ratio of the lines always corresponds to the same average density along the LOS (Fig. C.3).
Figure 5 further decouples the temperature and density dependence, by showing the line ratio values derived by integrating over the entire clump as function of n(H2, r0) and T(r0), for different exponent p. In addition to the minor role of temperature, Fig. 5 shows that the ratio values depends only on the overall average density of the clumps, as indicated in the right axes of the panels. This confirms that the results are independent of the fine details of the material distribution.
The values and ranges of the ratios for a given average density can be compared when averaging along the LOS or over the entire clump by contrasting Figures 3 and 5. They are found to be broadly consistent, which indicates that the ratios present only a marginal dependence on distance, because they are computed on widely different (a factor of 100) spatial scales in the two cases.
3.4 Effects of optical depth and a hot-core-like abundance profile
In Appendix C.5 we quantify the impact of different optical depths by scaling the abundance up and down by one order of magnitude. As shown in Fig. 6, which compares models with E-CH3OH abundances of 10−10 and 10−8, a 100-fold change in abundance results in a density shift of approximately a factor of 15 in the line ratio relations. Moreover, the dependence on the density of the I(21 − 11)/I(20 − 10) ratio is mostly lost for the lowest methanol abundance (10−10), and thus the lowest optical depths of the lines. It is interesting to note that this ratio has the largest dependence on the temperature (cf. Table 4).
Figure 6 allows also to check if the behaviour of the line ratios is consistent with the effect of radiative trapping discussed in Sect. 3.2. Radiative trapping decreases the effective radiative de-excitation rate (Shirley 2015), and this renders the lines more efficient in tracing low-density gas (Kauffmann et al. 2017). Thus, a larger abundance should shift the relation to lower densities by increasing the optical depth of the lines, and an opposite behaviour is expected for lower abundances. Another consequence is that the ratio should become thermalised, and thus insensitive to density, at lower average density along the LOS. Indeed a lower optical depth, corresponding to the lower methanol abundance, shifts a given value of the line ratios to higher average densities while a higher abundance, and thus higher optical depth, has the contrary effect.
We finally note that the higher-frequency ratios show different maxima values, depending on the abundance assumed. From Fig. 6 one can see that the maximum value of the I(5−1 − 4−1)/I(50 − 40) ratio varies from ≈2.5 in the case of the lowest methanol abundance (10−10) to ≈14 in the case of the highest methanol abundance (10−8); similarly, the I(7−1 − 6−1)/I(70 − 60) ratio varies from ≈1.3 to ≈5. This increase is again an effect of radiative trapping. By making the lines easier to excite at lower density, the emitting region becomes larger, particularly so for the lower excitation line. The outer layers are also progressively colder due to the power-law behaviour of the temperature. Therefore, the combination of optical depths, density, and temperature creates a region where only the upper level of the lower-excitation line is efficiently populated by the collisions, resulting in brighter line fluxes, compared to the higher-excitation one. Where this region is located determines the maximum value of the ratio: the larger the emitting region, the higher the expected ratio due to the lower gas temperatures traced. The higher values of the ratio maximum are indeed observed at lower densities, corresponding to outer and colder layers in our models.
Methanol is a species that also efficiently traces the hot gas present close to protostars. This molecule forms in large quantities on the dust grains via successive hydrogenation of CO, and observations show that CH3OH is an ubiquitous and important constituent of interstellar ices, with abundances that can reach values close to 10−6 (e.g. Grim et al. 1991; Pontoppidan et al. 2003; Boogert et al. 2008, 2011, and Boogert et al. 2015 for a review). When temperatures exceed ~90 K CH3OH is evaporated from grains, increasing its gas-phase abundance by around two orders of magnitude (e.g. van der Tak et al. 2000a; Maret et al. 2005; Giannetti et al. 2017b; Sabatini et al. 2021). Therefore, we simulated evaporation of methanol from grains by adopting a step-function profile for the abundance, with a jump of a factor of 100 in the regions exceeding 90 K (see Appendix C.4). The 3 mm clump-integrated ratios are not affected by the presence of a hot core, while those at higher frequencies are reduced by ≈10% due to optical depth issues in the innermost layers. Because of the small area affected by the increased abundance, only a minimal fraction of the ratios computed along the LOS is influenced, showing values consistent with a constant abundance of 10−8.
![]() |
Fig. 5 Comparison of the integrated line ratios for different density distributions, corresponding to the three grids considered in the case of internally heated clumps. The columns of plots refer to the values of p indicated at the top of each column. We only show the I(21 − 11)/I(20 − 10) for the 3 mm lines, because all three have the same behaviour. |
3.5 Optical depth-number density degeneracy
Our results point to the fact that the considered line ratios are, in general, robust indicators of number density, with the largest uncertainties coming from the abundance of methanol itself. As mentioned earlier, changing the abundance of E-CH3OH by a factor of 10 up or down from the fiducial value, respectively decreases or increases the estimated density by a factor ≈2–5. This introduces a degeneracy between the column density or opacity of the lines and the number density, which is comparable to or larger than the uncertainty due to the spread in the modelled points.
This degeneracy can be broken relatively easily. One possibility would be to account for the line opacity before computing the ratios, estimating the corrections from the intensity of the corresponding 13CH3OH isotopologue lines, assuming that the isotopic ratio is known. This has also the potential of expanding the sensitivity to somewhat larger density values, if we use the low-abundance, low-opacity case for the inference, for which the lines trace denser gas and the thermalisation happens at higher densities (cf. green-shaded area in Fig. 6).
Knowing a priori the column density and the abundance of methanol would also be a useful constraint. Leurini et al. (2004) show that LTE and LVG analyses for the cold methanol component yield similar results in terms of column density, even though it underestimates the excitation temperature. In their analysis the authors find that the CH3OH column density is slightly over-estimated (by a factor ≈1.3) by the LTE assumption. If this systematic error can be characterised and accounted for, it is straightforward to have a rough estimate of the fractional abundance in the source, assuming that the H2 column density can be estimated from dust continuum emission. This allows to then select the most appropriate grid of models.
Figure 7 shows the comparison of the column densities of methanol from our models and those obtained via a simple population diagram analysis applied to the 96.7 GHz lines, which assumes LTE (Goldsmith & Langer 1999). The figure demonstrates that the two column densities for each pixel are very well correlated, with a tendency towards overestimation for the LTE approximation. The median difference between the LTE and model column density of E-CH3OH is approximately a factor of three. Of course this result depends on the number density of the material being probed, but Fig. 7 shows that this holds for all the models we created, with number densities along the LOS in excess of ~few × 102 cm−3 (cf. e.g. Fig. 3). This shows that such an approximation is valid for clumps and cores; averages over entire clouds are not likely to be dominated by heavily optically thick emission.
Alternatively, one could produce a model finely tuned to the source properties (radius, mass, temperature, distance, line width, etc.), and compare directly the synthetic and observed line intensities. In this case, the tool would not be universal, but tailored grids would have to be produced, and the results would be more model-dependent. Another possibility is to introduce a prior, according to the estimate of the density from the dust, given an estimate of the source size, but this would make it impossible to estimate the source dimension along the LOS from the number- and column densities of H2. The volume density from the dust already assumes a dimension of the source along the LOS and the final estimate would not be independent of this assumption. A final possibility is to select the grid based on the ratio versus methanol column density values. This would bypass the uncertainties connected to the dust. Of course the beam size must be taken into account in this case, when estimating the molecular column density for a given dataset.
![]() |
Fig. 6 Comparison between the internally heated clump with the fiducial E-CH3OH methanol abundance of 10−9 (yellow shaded areas), the low-abundance (E − CH3OH/H2 = 10−10, 1/10 fiducial), low-optical depth case (green shaded areas) and the high-abundance (E − CH3OH/H2 = 10−8, 10 times fiducial), high-optical depth case (blue shaded areas). The light-shaded areas correspond to the 90% HPD, while the dark shaded areas correspond to the 50% HPD. |
![]() |
Fig. 7 Comparison of the column density, as derived from a simple rotational diagram and computed from the model grid, on a pixel-by-pixel basis. The red line shows the one-to-one locus. |
4 Analytic approximation of the ratio-number density relations
To facilitate adoption of these results without having to run the SAK inference code locally, we provide analytic formulas to derive the average number density along the LOS from the ratios, based on the internally heated model. This model reflects the most typical scenario for Galactic clumps (e.g. Urquhart et al. 2018). Notably, even isothermal sources follow similar relations between line ratios and n(H2), because these are relatively insensitive to variations in temperature. The only case where significant adjustments have to be made is the observation of hot cores with high-spatial resolution, for which the lines have much larger optical depths.
These expressions were obtained by fitting a curve to the maximum of the PDF in Fig. 3. To represent the relation between ratios and number density we used the function:
(6)
where R is the measured line ratio. The best-fit parameters for Eq. (6) and the applicability ranges of these approximations are given in Table 5. The fit was performed through the “curve_fit” function of scipy.optimize (Virtanen et al. 2020). The fit used the limits in density listed in Table 5.
Not accounting for observational uncertainties, this is accurate to a factor of two at the 1σ level, for densities ≳5 × 104 cm−3 when one uses the ratios of lines in the (2K − 1K) band. The use of these relations for ratios computed with lines in the (5K − 4K) and (7K − 6K) bands requires a priori knowledge of whether we are in a high-(≳ 105 cm−3) or low-density regime (2 × 103 cm−3 ≲ n(H2) ≲ 105 cm−3) to avoid degeneracy. The formal uncertainties are within a factor of two for the (5K − 4K)-band ratio for both density regimes, and for the high-density regime of the one computed with the lines in the (7K − 6K) band. The latter ratio in the low-density regime is associated with a larger uncertainty of a factor ≈4.
5 Application to galactic massive clumps
We use our newly derived calibration for the methanol line ratios to investigate the average number density in a sample of massive star-forming clumps in different evolutionary phases. Indeed, despite some indications that the density progressively increases up to the development of a compact H II region, it is still uncertain how the average number density of high-mass clumps evolve with time (Elia et al. 2017; Giannetti et al. 2017b; Tang et al. 2018; Elia et al. 2021; Urquhart et al. 2022).
5.1 Observations and data processing
We selected sources in the TOP100 sample (Giannetti et al. 2014) of the ATLASGAL survey (Schuller et al. 2009). This sample contains around 110 massive clumps spanning the entire evolutionary sequence, selected to be the brightest per class in the entire ATLASGAL survey. The TOP100 has been extensively followed up with spectroscopic observations, covering more than 120 GHz of bandwidth in unbiased surveys in the 3 mm, 1.4 mm and 0.9 mm windows, and with observations of specific lines of interest (e.g. Tang et al. 2018; Navarete et al. 2019; Lee et al. 2022). Therefore, the sample is extremely well characterised (e.g. Csengeri et al. 2016; König et al. 2017; Giannetti et al. 2017b; Wienen et al. 2021), and is representative of the Galactic protocluster population.
Based on their IR- and radio-continuum properties (König et al. 2017), the TOP100 sample has been classified into four evolutionary stages. The four classes, ordered from the least- to the more evolved, are:
Quiescent: sources with no compact emission at 70 μm in Hi-GAL data;
Protostellar: clumps associated with a compact source at 70 μm. These sources are either not detected in MIPSGAL at 24 μm (Carey et al. 2009) or with an integrated flux lower than 2.6 Jy, corresponding to a B3 star at 4 kpc (Heyer et al. 2016);
MYSO: objects associated with a mid-IR source with a flux in excess of 2.6 Jy in MIPSGAL or MSX data (if the former is saturated);
HII: compact H II regions revealed by radio-continuum emission at 5- or 9 GHz. We considered the CORNISH (Hoare et al. 2012; Purcell et al. 2013) and RMS surveys (Urquhart et al. 2007, 2009), as well as targeted observations towards methanol masers (Walsh et al. 1998). Note that at this stage dispersal is just beginning and the clump envelope is still very massive.
This evolutionary sequence has been confirmed in terms of increasing temperatures, luminosity-to-mass ratios and incidence of hot cores (König et al. 2017; Giannetti et al. 2017b; Wienen et al. 2021), decreasing CO depletion (Giannetti et al. 2014), as well as chemical composition in general (Sabatini et al. 2021). Similar results in terms of temperatures, luminosities, and luminosity-to-mass ratios were obtained on a larger scale using the entire ATLASGAL sample (Urquhart et al. 2018, 2022).
For our experiment we focus on the sub-sample for which we have the lines listed in Table 1 (31 sources: 3 Quiescent, 10 Protostellar, 11 MYSO, and 7 HII). This is ideal, because we have all the lines we need and a very reliable determination of the evolutionary stage of the clumps, so that we can investigate if and how the average density changes with time. Although it is true that the sub-sample is small, an Anderson-Darling test confirms that, considering the sources with 300 M⊙ < M < 104 M⊙, the clumps for each evolutionary phase are drawn from the same population in terms of mass, diameter and distance (p-values 0.16, 0.1, and 0.25, respectively) as the others, and therefore there is no bias in the samples we use.
Details of the observations and data reduction are given in Csengeri et al. (2016); Giannetti et al. (2014, 2017b) and Sabatini et al. (2021). Lines were fitted with a Gaussian curve, to derive their peak brightness temperature, their integrated emission, and the rms noise of the spectrum. These parameters of the line emission is reported in Tables E.1 and E.2.
Sources included in the proof-of-concept, classification, distance, mass, and number density results.
5.2 Compression of clump-scale material
One thousand realisations of the intensities are generated according to the fit results, and the ratios are computed. When a line is not detected, the intensity is generated following a Gaussian distribution with mean and standard deviation equal to the measured RMS noise. From these ratios the average number density along the LOS is extracted, with its uncertainty, according to the procedure delineated in Section 2.5. The different ratios are combined as summarised in Figure 2 to obtain the final posterior for the density that best represents all of the data. The best-fit results and the uncertainties are listed in Table 6, together with the classification, the mass and distance of the sources. Despite the fact that the observations of the (7K − 6K) band have a ~35% smaller beam (~19″) than the (2K − 1K) and (5K − 4K) bands (~28″), each of the ratios is computed between lines with identical beams. When one combines all the ratios for the fit, there might be a small discrepancy due to the fact that lower-density material enters the larger beams of the (2K − 1K) and (5K − 4K) bands, lowering somewhat the average density along the LOS with respect to what is traced by the I(7−1 − 6−1)/I(70 − 60) line ratio in the (7K − 6K) band. The effect of the difference in beam sizes is expected to be small. This is mainly because the I(7−1 − 6−1)/I(70 − 60) ratio has a lower predictive power for density. Additionally, the material at the edge of the beam has a limited influence on the overall average due to the lower beam response in the integrated flux and lower column density. Indeed, a run without the I(7−1 − 6−1)/I(70 − 60) ratio shows that results are unaffected for the ~85% of sources, with the remaining ~15% showing variations of around 10% of the best-fit density value.
Figure 8 shows the comparison of the number density derived from the continuum emission, as reported in Giannetti et al. (2017b), assuming a spherical geometry, and those derived with SAK. The former estimate of the number density is obtained by dividing the peak column density by the diameter of the source, as derived from the circularization of the FWHM of the ATLASGAL continuum emission. The uncertainties are reported only for SAK, because they are extremely difficult to quantify accurately the density is derived from the dust continuum emission, using the assumption of spherical geometry. This estimate is affected by the unknown intrinsic morphology of the fragments, gas-to-dust ratio (Giannetti et al. 2017a), the dust properties (cf. Giannetti et al. 2014), temperature distribution and uncertainty (Elia et al. 2017, 2021). The uncertainty is at least a factor of three, considering only the unknown size along the LOS. The figure shows that the CH3OH-based estimates are always higher than those derived from dust continuum, with a discrepancy within a factor ~2–5. Above n(H2) ≈ 105 cm−3, the two measures become more correlated, with a relatively constant difference of a factor of ~2.
By combining the measurement of column- and number densities, it is possible to have an independent estimate of the extension of the object along the LOS. Our approach therefore potentially provides a detailed view of the 3D structure of molecular condensations, allowing for pixel-by-pixel analysis and yielding insights into both individual objects and broader statistical samples. We stress that the correlation between the ratios and the density along the LOS (Fig. 3) and the fact that it does not depend on the detailed distribution of the material (Appendix C.3 and Fig. C.3) indicate that we can use our results also if the clumps are not strictly spherical. Thus, the systematic discrepancy in the number densities derived using the dust continuum and the independent ones from SAK is possibly suggesting that the effective radius is not a good representation of the size of the clump along the LOS. The higher density measured with SAK indicates that the size along the LOS is shorter than the size derived in the plane-of-the-sky. If we consider the typical aspect ratio of 1.5–2 of ATLASGAL clumps (Urquhart et al. 2014), the size along the LOS would actually be comparable to the projected minor axis. The discrepancy increases for sources with an average density n(H2) < 105 cm−3, as estimated from the dust continuum. In these clumps the density shows a median difference of a factor ~5 compared to the CH3OH-based estimates. Such a large discrepancy suggests that these young sources, less affected by feedback from YSOs, may even be more elongated compared to the other clumps.
Figure 9 summarises the temporal evolution of the number density. The number density of H2 increases with evolutionary stage, indicating that the clump contracts and concentrates with time, under the action of gravity. Over the course of the early evolution, lasting around 5 × 105 yr (e.g. Urquhart et al. 2018; Sabatini et al. 2021), there is a noticeable increase in the median density within the star-forming clumps. The typical density rises from around 2 × 105 cm−3 to approximately 1.5 × 106 cm−3, showing an acceleration in the later stages. We used again an Anderson-Darling test to confirm that the four samples are drawn from different population, having a p-value lower than 0.001. This result remains valid even after removing the H II regions from the samples, or comparing only Protostellar clumps and those hosting a MYSO (p-value ~0.01 in both cases), which confirms that the density evolves with time, even though by a limited amount in the first stages of high-mass star formation. Of course, larger samples are needed to better quantify typical properties per stage, and firmly characterise the amount of compression among them on pc scales, but this demonstrates the power of the proposed method on scales where such measurements are notoriously difficult.
A similar acceleration is also observed in the evolution of the luminosity-to-mass ratio, as found by Urquhart et al. (2022) from the analysis of the full ATLASGAL sample. Such an acceleration is typical of free-fall, and, in general, of scenarios with accretion rates that increase over time. If one computes the free-fall time of an early clump with a number density of 105 cm−3 and compares it to the statistical lifetimes of the various phases, taken from Sabatini et al. (2021), one finds that the former is around five times shorter than the latter. Considering that ATLASGAL clumps are already relatively compact, a fluctuation with a number density of 104 cm−3 would have a free-fall time within a factor of two of the chemical and statistical lifetimes. This, together with the moderate and accelerating compression from cold, quiescent sources to those hosting a very compact H II region, is an important point to test in theories of high-mass star- and cluster formation. In fact, depending on whether the collapse proceeds inside-out or outside-in the evolution of the clump average density could proceed differently. On the one hand, in a clump that starts collapsing from the outside, the inner Jeans-stable layers are crushed by the outer ones falling on them, progressively increasing the average density (Vázquez-Semadeni et al. 2019). On the other hand, if the clump is in near equilibrium, has a fixed mass, and the collapse starts in the inner layers (Shu 1977), the average density is constant, or could decrease due to the material deposited onto the protostars.
![]() |
Fig. 8 Comparison of the number density derived from the dust continuum emission, assuming spherical geometry, and the number density derived with the method proposed in this work. The red line indicates the 1:1 correspondence. |
6 Summary and conclusions
This study has developed a straightforward framework for constructing and post-processing simplified yet realistically accurate models of molecular gas condensations. The framework provides an easy-to-use, accurate method for estimating number densities of molecular hydrogen. This method relies on methanol, a widespread, abundant species and some of its most common and accessible emission bands, around 96.7, 241.7, and 338.2 GHz (see Table 1). We aim to improve accuracy when deriving the physical parameters of molecular clumps by moving away from the standard, yet uncertain, method of estimating the number density from column density or mass measurements based on size assumptions from plane-of-sky projections. Our approach helps balance the process: it reduces the heavy reliance on temperature and spherical geometry assumptions in these calculations.
The results demonstrate that the selected line ratios, especially those at three millimetres, are robust against variations in temperature and density distributions along the LOS. The lines in the (2K − 1K) band, in fact, increase steadily with density in the range ~5 × 104−3 × 107 cm−3. This interval can be expanded to lower densities if all transitions are available, by combining the five line ratios analysed here. In this scenario, the minimum density that can be derived is as low as a few × 103, typical of entire molecular clouds. A comparison with uniform and isothermal clumps simulated with RADEX shows that for a given value of the ratios the discrepancy with the average number density along the LOS can reach two orders of magnitude, which shows the importance of including the clump envelope in the modelling, in the form of density gradients.
In Sect. 4, we provide an analytical approximation of the relation that we find between each line ratio and the average number density along the LOS. The limits of validity and typical model uncertainties (not accounting for observational ones) are also provided. While the analytical approximations provide a less precise estimate of the uncertainty compared to the full SAK code, they can be used to reconstruct a first estimate of the density structure from line ratio maps, without local code execution.
The main limitation of the method arises from the impact of molecular abundance on optical depth, which can shift the relationship. An increase in abundance by a factor of 100 results in an average decrease in the inferred density of a factor ≈15. Such a large jump is typical of regions affected by strong shocks, for example those associated with outflows. Alternatively, it can be found in regions with high temperatures causing desorption from grains, as it happens in hot cores. Otherwise, the abundance of methanol remains relatively constant in the cool envelope (e.g. Kristensen et al. 2010; Gerner et al. 2014; Giannetti et al. 2017b; Sabatini et al. 2021). In addition, under typical clump or core conditions, and more in general, for sufficiently high densities, the abundance of CH3OH can be estimated via a simple LTE analysis, which enables the selection or up-weighting of the best-fitting grid. For a higher level of precision, optical depths can be estimated, and corrected for, when computing ratios by observing the corresponding lines of 13CH3OH. In this case, using a run with low abundance to interpret the results would be preferable, because the impact of the optical depth would already be accounted for.
We applied our method to the ATLASGAL TOP100 dataset to investigate the evolution of number density in high-mass clumps, and we selectively focused on sources where both IRAM 30 m telescope- and Atacama Pathfinder Experiment (APEX) data were at our disposal. These clumps showed a spectrum of average densities at the dust emission peak spanning from approximately 2 × 105 to around ≈2.5 × 106 cm−3. As the clusterforming objects evolve, the density exhibited a remarkable, gradual ascent, which gained momentum with the progression of their life cycle, as depicted in Figure 9. This behaviour agrees with the theoretical expectations of an outside-in collapse, where the clumps are initially relatively uniform and slowly increase their average density under the compression of the infalling envelope. The timescales of the whole cluster-formation process as measured through statistical considerations (Urquhart et al. 2018) or the chemical evolution (Sabatini et al. 2021) are also roughly consistent with the free-fall time of the early clump, in agreement with the acceleration of the compression with evolution.
When we compare the densities from SAK and those made under the assumption of a spherical geometry (Fig. 8), we find indications that clumps might be elongated structures, with a size along the LOS comparable to the projected minor axis of the clumps, given their typical aspect ratio of 1.5–2. This behaviour is even more pronounced in the earliest stages of evolution.
Thanks to the robustness of specific ratios (I(20 − 10)/I(2−1 − 1−1), I(21 − 11)/I(20 − 10), I(21 − 11)/I(2−1 − 1−1), I(5−1 − 4−1)/I(50 − 40), and I(7−1 − 6−1)/I(70 − 60) were considered in this study) to temperature variations and the distribution of material along the LOS, we have calibrated a reliable relationship between these ratios and the LOS-averaged number density. This provides a straightforward method to independently measure the number density of molecular gas in various astrophysical contexts, from high-resolution observations in external galaxies to clumps, cores, and discs in the Milky Way.
The primary advantage of SAK is its ability to reliably estimate the average number density along the LOS using both quick analytical relations and a more accurate numerical method based on multiple line ratios. This approach eliminates the need to create a grid of LVG models and remains robust to variations in temperature and density gradients. Additionally, SAK facilitates the creation of LVG grids for different use cases, and leverages distributed systems for speed. It also simplifies user experience through effortless installation with containerization and zero-code configuration. Our framework can be easily extended to handle more complex models. We plan to include this feature to enable postprocessing of fully fledged astrochemical simulations of clumps and cores.
![]() |
Fig. 9 Left: empirical cumulative distribution functions of the best-fit density for each evolutionary class in the TOP 100 sample. Right: violin plot of the bootstrapped density distribution per class; 10 000 samples have been drawn for each source from the inferred number density posterior. The combined PDF, determined via a KDE smoothing, is shown around the quartiles and whiskers of a box plot. |
Acknowledgements
We thank the referee for their constructive comments that greatly helped in the improvement of this work. This publication is based on data acquired with the Atacama Pathfinder Experiment (APEX). APEX is a collaboration between the Max-Planck-Institut fur Radioastronomie, the European Southern Observatory, and the Onsala Space Observatory. This work also make use of observations carried out with the IRAM 30 m Telescope, under project number 181-10. We acknowledge financial support PRIN-INAF 2019 From yOuNg Star clusters to planETary systems (ONSET) (P.I. Leurini).
Appendix A SAK Detailed infrastructure and automation
In the following, we provide additional information on how SAK is built, on what it can do, and on its internal workings. We first describe the Extract-Transform-Load (ETL) pipeline, which contains all of the SAK logic and code. Subsequently, we discuss how the application is organised as a whole.
The Python ETL pipeline consists of three different layers:
staging (stg);
model (mdl);
presentation (prs),
which correspond to the “classic” extraction, transformation, and loading of data. The pipeline is defined and orchestrated in a dedicated Python file that executes the process for a grid of parameters. It is configured with external files in YAML format, a global one, and one for each of the ETL layers. The available parameters are documented in a markdown file in the code repository3.
In the staging layer, a PostgresSQL database (DB) for storing and structuring all the data is initialised, that is, the tables are created (see Fig. A.1), and the RADMC 3D input files are generated to reflect the model physical grid of parameters. The complexity of the models created depends on the parameters specified in the staging configuration file. The postprocessing input files are modified automatically to reflect the desired configuration. For example, if the “star” section is defined, then one or more heating source(s) are placed in the grid, and the temperature distribution is recomputed based on their energy output, via the RADMC 3D “mctherm” command. Similarly, a step-function-like profile for the abundance can be adopted instead of a constant one, to simulate the evaporation of molecular species from grains in hot cores. In this case, the user has to define a threshold temperature and the factor by which the abundance of a species is increased above this threshold. Molecular data can retrieved automatically from the LAMDA database and stored in the execution directory. It is also possible to use custom molecular levels, transition, and collisional coefficients data, by specifying that the molecular data file is cached, and by providing it to the system. Once all the input files for RADMC 3D are created, they are compressed and stored in an archive. The name of the compressed archive is stored into the database for future reference. All of the grid parameters are saved into the DB.
![]() |
Fig. A.1 Entity-relation diagram created by the SAK tool. A description of the individual tables and columns can be found in the repository documentation. |
In the model layer RADMC 3D is executed to compute the synthetic spectrum including one or more line of one or more molecular species (and optionally the temperature distribution is computed and updated in the compressed archive of the input files), the output is created and converted to fits format. The postprocessing computation mode (LTE, LVG, optically thin LVG, as offered by RADMC 3D) can be selected directly from the input files, for example to investigate the impact of non-LTE excitation. The cube name and the model parameters are saved into the database.
Finally, in the presentation layer, the cubes are collapsed to compute the moment-zero maps, the ratio maps are derived, and the ratios themselves are aggregated over the entire map. All the relevant information is stored in the database in the appropriate table (cf. Fig. A.1).
The pipeline has been containerised both with Docker (Merkel 2014) and Apptainer (formerly Singularity; Kurtzer et al. 2017, 2021), to facilitate the execution of the system by different people on different machines. The application is divided into two services: the database, responsible for storing and structuring the data, and the ETL service, which encompasses the code itself. These two services are orchestrated by docker compose, a tool for defining and running multi-container applications4. The docker-compose.yaml file is available in the repository. The Dockerfile for the ETL container is provided, and the image can be retrieved from the repository. The Docker image can be imported in Apptainer for the execution on remote, distributed systems that do not support Docker, due to security issues. Indeed, a key security consideration is that Docker typically requires root access to function, which can be a limitation in some environments. We provide a script to facilitate this conversion step. The Apptainer definition for the database is also provided, so that this container can be created and run before executing the code itself.
Appendix B Kernel density estimates computation
To obtain the KDE of the ratio-number density relations, an Epanechnikov kernel (inverted parabola) was used. Bandwidths for the kernel were selected independently for average density and line ratio. The average densities along the LOS are not uniformly spaced in the diagram. Instead, they tend to cluster around the characteristic value of the density at 0.5 pc. Due to the model spacing in this parameter, using small bandwidth values can lead to larger values of the PDF at these specific density values (cf. Sect. 3.1). This could artificially introduce a bias in the density inference, and bandwidths were chosen to reduce this effect. When we did not consider ML-data augmentation, we adopted a bandwidth of 0.4 in logarithmic space for the number density axis for all ratios; for the line ratio axes we used 0.2 for I(20 − 10)/I(2−1 − 1−1), I(21 − 11)/I(20 − 10), and I(21 − 11)/I(2−1 − 1−1) ratios, and 0.8 and 0.4 for I(5−1 − 4−1)/I(50 − 40) and I(7−1 − 6−1)/I(70 − 60) ratios, respectively. Adding the data emulated though ML we could reduce these bandwidths by a factor of two, improving the final uncertainties in the inferred density. In general, larger values of the bandwidth reflect the lower sensitivity of specific ratios to the number density, according to the Spearman correlation coefficients listed in Table 4. In fact, the use of a larger kernel for less sensitive ratios reduces their weight on the final result, because the final posterior is wider. To reduce local variations and oscillations in figures showing the number density along the LOS (Figures 3, 6, C.3, C.4, C.5), the curve representing the maximum location of the PDFs, and the corresponding uncertainty, were smoothed using a moving average of 20 points.
Appendix C Detailed description of individual experiments
In the following the individual experiments and their results are described, as well as how they fit with one another.
C.1 Experiment 1 - Constant density and temperature
Our first experiment was designed to be simple, essentially replicating the setup of Leurini et al. (2004). We updated the collisional coefficients and narrowed the temperature range to be more representative of average values in high-mass clumps (Urquhart et al. 2018; Elia et al. 2021). The gas was distributed in a spherical fashion, with constant density and temperature. The usual grid of 630 clumps is built, similar to the one for the fiducial models described in Sect. 2.3, but with p = q = 0 in Equations 1 and 2. This experiment essentially acted as a first, quick, benchmark for the SAK architecture.
The results (Fig. C.1) are in qualitative agreement with those reported in Leurini et al. (2004). In particular, we compared directly the ratios I(20 − 10)/I(2−1 − 1−1) and I(5−1 − 4−1)/I(50 − 40) for average column densities of methanol of the order of 1015 − 1016 cm−2. This shows that the experiment architecture is solid, and confirms that methanol lines can be used to estimate the number density of H2, in principle with very little dependence on T. Moreover, it shows that the specific value of the collisional coefficients has limited influence on the general behaviour of the ratios.
Being such a widespread molecule, CH3OH could become the tracer of choice for measuring local values of the number density, both in our Galaxy and in external, nearby galaxies. However, the real world is more complex, and the method needs to be calibrated for ease of use, and its robustness to density variations along the LOS, as well as to other model parameters, has to be evaluated.
C.2 Experiment 2 - Isothermal clumps with power-law density distributions
The interstellar medium is not uniform, and a probe working only for constant densities would lose its usefulness in real-world applications. The values of the line ratios should depend the average value of n(H2) along the LOS, ideally not changing exceedingly for different distributions of the gas. To verify whether this is the case, we constructed three grids of models, similar to those described in Sect. 2.3, but with a constant temperature, that is, we used q = 0 in Eq. 2.
![]() |
Fig. C.1 Integrated line ratios over the entire clump as a function of temperature and characteristic number density for the isothermal, constant-density case, for lines at around 96.7 GHz (top row, for line ratios I(20 − 10)/I(2−1 − 1−1), I(21 − 11)/I(20 − 10), and I(21 − 11)/I(2−1 − 1−1)), those at around 241.7 GHz (bottom left, ratio I(5−1 − 4−1)/I(50 − 40)), and around 338.1 GHz (bottom right, ratio I(7−1 − 6−1)/I(70 − 60)). |
![]() |
Fig. C.2 Integrated line ratios over the entire clump as a function of temperature and number density for the isothermal case with a density distribution ∝ r−1.5, for lines at around 96.7 GHz (top row, for line ratios I(20 − 10)/I(2−1 − 1−1), I(21 − 11)/I(20 − 10), and I(21 − 11)/I(2−1 − 1−1)), those at around 241.7 GHz (bottom left, ratio I(5−1 − 4−1)/I(50 − 40)), and around 338.1 GHz (bottom right, ratio I(7−1 − 6−1)/I(70 − 60)). |
Again, to have an overview of the ratio behaviour in the clump (relevant for the observations of unresolved clumps or clouds, too), we show in Fig. C.2 the aggregated ratio as a function of temperature and characteristic density at 0.5 pc. The 96.7 GHz line ratios have a monotonic behaviour, increasing for larger characteristic densities, which is proportional to the average density of the clump. The ratios also show very weak dependence on the temperature of the gas.
Compared to the 96.7 GHz lines, higher-frequency lines have higher rates of radiative de-excitation. Consequently, for the same characteristic number density and temperature, they reach higher optical depths. They also include lines with energies above the ground state that can be significant compared to the gas temperature. This causes a more complex behaviour of the ratios that first increase and then decrease approaching one when both the lines become heavily optically thick. Relying solely on these lines to infer n(H2) in typical clump density regimes is, thus, less effective. This is because, excluding the peak ratio value, each ratio can correspond to two distinct density values, leading to ambiguity. However, if only those lines are available, the use of a prior may help in breaking the degeneracy. Such ratios are also sensitive to lower densities than those at 96.7 GHz and can be very useful when tracing gas on larger scales, for example in other galaxies, or to complement what is seen at 96.7 GHz, for the external layers of the clump.
The extremely interesting result here is that the values of the ratios are essentially insensitive to the density distribution for different values of p, for the lines at 96.7 GHz, which are confirmed to be the best tracers of number density. So integrated values of this ratio can be used to infer average densities ≳ 5 × 104cm−3. Combined 241.7 GHz lines, this can be pushed down to ~few × 103cm−3.
We can obtain more from these line ratios than the clump-averaged density. Because of how the system is built, we don’t just have access to the clump-scale ratios, but we also have the pixel-by-pixel maps of the ratios and of the LOS-average of the physical properties. So it is possible to test whether the relation holds for individual pixels, that is, at scales much smaller than the clump itself, and independently of the detailed geometry.
We followed the procedure described in Sect. 2.3, and the results are shown in Figure 3. All 96.7 GHz line ratios show a tight correlation with the average density along the LOS. As in the case of integrated ratios, no appreciable difference is seen among models with different density distributions, indicating that the relationship is very robust, at least for isothermal clumps. Again, the information from the 241.7 GHz and 338.1 GHz bands can be used to increase the applicability regime of the 96.7 GHz ratios to ~few × 103 cm−3.
C.3 Experiment 3 - Internally heated clumps
Observations and theoretical models both indicate that active SF significantly alters clump temperatures on large scales, resulting in power-law-like distributions (e.g. van der Tak et al. 2000b; Lin et al. 2022). A steep temperature profile such as the one we used (q = −0.5) maximises the potential variations in the line ratios. Because in Sect. C.2 we considered isothermal clumps, that is, with a power law exponent of 0, we encompass two extreme cases for the temperature distribution, maximising the effects on the ratios, exactly as we did for the density profiles.
Figure 5 shows the integrated ratios over the clump for the three different grids. The ratios show little dependence on the density distribution slope, especially at 96.7 GHz, and have a similar dependence on gas temperature. For higher frequency lines, the ratios distribution becomes broader along the number density axis for steeper density gradients. This makes sense, because the material inside the scaling radius is denser for steeper gradients, and therefore the ratios are higher, on average.
When one looks at the ratio as a function of the average value of the number density along the LOS (Fig. 3), though, the only difference compared to the isothermal case is a marginally larger dispersion of the points, which increases with frequency. This larger dispersion is virtually invisible in the density plots, because only a small fraction of the points shows this behaviour, and the kernel is relatively large. Temperature gradients are the responsible for this effect, but their importance remain of second order. That said, the constraints on the number density can be tightened if we also compare line ratios sensitive to the temperature. While this is important, we want to maintain the method as simple as possible, and adding such information would make our results much more model-dependent.
Different grids also show no appreciable differences in the LOS-averaged plots (Fig. C.3). The larger average values of the integrated ratio for steeper grids are therefore driven by the larger average density of the clumps, rather than by the distribution of the material along the LOS. This confirms that the proposed indicators are indeed sensitive to the number density, irrespective of how the material is distributed.
![]() |
Fig. C.3 Comparison between the relations when using a different power-law distribution for density. The green-shaded region corresponds to p = −1.2, the yellow one to p = −1.5, while the blue area refers to p = −1.8. |
C.4 Experiment 4 - Simulating evaporation from grains
Our understanding of methanol chemistry remains incomplete. In fact, accurately predicting the abundance of CH3OH, and related molecules such as H2CO, is a known weakness of current astrochemical models (Gerner et al. 2014; Sabatini et al. 2021). Despite this shortcoming, the qualitative behaviour can be reproduced. We know that CH3OH is formed onto dust grains, and that its abundance is boosted in the gas phase by around two orders of magnitude in hot cores, following evaporation from the grains, above T ~ 90 K (e.g. Giannetti et al. 2017b; Sabatini et al. 2021). We approximate this behaviour with a step function abundance profile. The abundance is increased by a factor of 100 above the threshold in temperature. Again, the same three grids of models with different density distributions are created.
The results from this experiment are particularly interesting, and suggest the direction for additional tests. Specifically, the central pixels, where the LOS passes through the hot-core-like region, exhibit higher ratios. This increase is not due to higher densities, but rather to the increased opacity of the lines. For very optically thick lines the ratio would be close to one for the line ratios considered (Fig. C.4). While this seems to make density inference more complicated, these points are separable in the plot by measuring the average abundance along the LOS. The optical depth also affects different bands differently, because of the larger spontaneous emission rates at higher Js, as mentioned earlier. In addition to this possibility of breaking the degeneracy, only very few points end up outside the relationship found for the constant abundance case. The net effect is to increase somewhat the potential uncertainty on the inferred number density, because the probability would be more spread out. The behaviour of the ratios for high abundances highlights the sensitivity of any single ratio to the line opacity, and therefore to the average abundance of the species, as well as its line width.
![]() |
Fig. C.4 Kernel density estimate of the probability density function for the line ratios as a function of temperature and average number density along the LOS, for the internally heated clump case with a density distribution ∝ r−1.5 and a hot-core-like abundance profile, for lines at around 96.7 GHz (top row, for line ratios I(20 − 10)/I(2−1 − 1−1), I(21 − 11)/I(20 − 10), and I(21 − 11)/I(2−1 − 1−1)), those at around 241.7 GHz (bottom left, ratio I(5−1 − 4−1)/I(50 − 40)), and around 338.1 GHz (bottom right, ratio I(7−1 − 6−1)/I(70 − 60)). |
C.5 Experiment 5-Optical depth
In this final check we wanted to go deeper on the impact of the line opacities on the ratio-number density relation. We constructed grids where the constant abundance is shifted one order of magnitude up and down from the fiducial value of 10−9, one where the microturbulence is doubled, and two with a hot-core abundance profile, where, again, the abundance is scaled up and down by a factor of 10.
Doubling the microturbulence level has a limited effect; it only slightly delays the ratio saturation at the constant value associated with high optical depths. This effect is noticeable at high number- and column densities (cf. Fig. C.5). The I(21 − 11)/I(2−1 − 1−1) is the most affected, with a discrepancy in the maximum location of slightly more than a factor of 2 at densities ≈107 cm−3. A larger effect can be seen when the constant abundance is rescaled, as discussed in Sect. 3.
In all cases, larger abundances make larger values of the ratio correspond to lower number densities, because lines are efficiently excited over a larger area of the clump, corresponding to lower densities. Also in this case, roughly estimating the average LOS fractional abundance allows to work with an appropriate set of models (cf. Sect. 2.5). We finally note that very large line ratios between high-frequency lines can only be reached if methanol abundance is high. This effect can be used, if we are in the right density regime, to add constraints to the abundance itself.
Assuming a hot-core-like abundance profile has the same consequences observed for the fiducial value of the abundance, discussed in Sections 3, and C.4. We therefore do not report any results for this case.
![]() |
Fig. C.5 Kernel density estimate of the probability density function for the line ratios as a function of temperature and average number density along the LOS, for the internally heated clump case with a density distribution ∝ r−1.5, and a doubled microturbulence level, for lines at around 96.7 GHz (top row, for line ratios I(20 − 10)/I(2−1 − 1−1), I(21 − 11)/I(20 − 10), and I(21 − 11)/I(2−1 − 1−1)), those at around 241.7 GHz (bottom left, ratio I(5−1 − 4−1)/I(50 − 40)), and around 338.1 GHz (bottom right, ratio I(7−1 − 6−1)/I(70 − 60)). The results for the fiducial values of microturbulence are shown in yellow, for reference. |
Appendix D Construction of the ML-emulation algorithms
In this Section we provide details on the training procedure of the ML algorithms. We describe how we prepared the data for training, how we trained the algorithms, what performances we reached and which algorithm was selected.
D.1 Data preprocessing
Initially, we preprocessed the data by applying a base-10 logarithm transformation. This transformation was applied to several parameters: the average number density and temperature along the LOS, the molecular column density, the reference number density and temperature, the standard deviation of the molecular number density along the LOS, and finally, either the moment zero or the line ratio (depending on the model). We chose the logarithmic transformation because these parameters span several orders of magnitude, which can hinder the algorithm’s learning efficiency. All these properties must always be larger than zero, another advantage of this transformation.
To augment the training data, we employed a binning strategy in the target variable to add columns evaluating the similarity of the input features. First, we divided the input data into 50 bins based on the value of the target variable. Within each bin, we calculated the average input features. Subsequently, for each data sample, we computed the cosine similarity between its input features and the average input features of its corresponding target bin. This approach is based on the assumption that similar input features should lead to similar output values. We finally applied a Quantile transformation with uniform output to the data, prior to training.
D.2 Model training and selection
The dataset was divided into training, development, and validation sub-samples. To ensure an accurate evaluation of the model’s ability to generalise to new characteristic densities, especially in the context of recreating moment zero maps for varying densities, we deliberately avoided random selection for the development and validation sets. We left out all the models with a specific characteristic number density at 0.5 pc, so that the performance of creating a moment zero map for a never-seen value of this parameter could be estimated accurately. The development and validation data have been arbitrarily chosen to be those with characteristic number densities of 2.7 × 104 cm−3 and 2.187 × 106 cm−3, respectively. Our baseline was created using an AutoSklearnRegressor from the autosklearn package (Feurer et al. 2015, 2022), and we used the moment zero as the target. This ensemble of models reaches an MSE of 10−2 − 10−1 for the prediction of the logarithm of the moment zero, depending on the line.
Given the limited dataset size (approximately 6 × 105 data points per grid), we opted to focus on decision-tree ensemble methods, such as Random Forest and XGBoost, rather than exploring deep neural network architectures. Decision-tree ensembles offered a faster hyperparameter tuning cycle and are known to provide robust results with datasets of this size. A grid search with a threefold cross-validation was carried out for a random forest regressor and an XGBoost (Chen & Guestrin 2016) to optimise the hyperparameters. For these kind of models, we also applied sample weights, based on the frequency on the target variable, computed by a KDE smoothing on the training set, using an Epanechnikov kernel with a bandwidth of 0.1 in the moment-zero log-space. Performances are finally evaluated on the validation dataset, which also helps to verify that the model is correctly capable of generating new images from scratch. The kind of model with the best performances is then selected on the basis of our validation dataset. The best performances were obtained with the XGBoost, reaching an MSE of 10−4 − 10−2, for our fiducial model and the same target as the baseline model. This model improved the accuracy of our emulated data between one and two orders of magnitude compared to the baseline model, depending on the specific line considered. The best-fit hyperparameters, as well as the scripts to reproduce the results are provided in the code repository (see Sect. A).
We then repeated the same training procedure, but this time we used the line ratios as the target variable. This approach was adopted to prevent the accumulation of uncertainties associated with predicting each line individually before calculating their ratio. The noise in the emulated ratios is further reduced, particularly for ratios calculated using lines in the (5K − 4K) and (7K − 6K) bands at 241.7 GHz and 338.2 GHz. For inferring the densities, we therefore rely on these models, which are the most precise. Those predicting the moment zero can still be useful to fine-tune calculations for specific sources; because the KDE contours are very similar between the two kinds of models, the description of the results holds also in this case.
Appendix E Line fitting results
Tables E.1 and E.2 present the line properties of the lines in the (2K − 1K) and (5K − 4K) methanol bands, used in Sect. 5 to compute the line ratios and infer the density for the TOP100 sources.
Line properties of the lines in the (2K − 1K) methanol band.
Line properties of the lines in the (5K − 4K) and (7K − 6K) methanol bands.
References
- Askne, J., Hoglund, B., Hjalmarson, A., & Irvine, W. M. 1984, A&A, 130, 311 [NASA ADS] [Google Scholar]
- Beuther, H., Schilke, P., Menten, K. M., et al. 2002, ApJ, 566, 945 [Google Scholar]
- Boogert, A. C. A., Pontoppidan, K. M., Knez, C., et al. 2008, ApJ, 678, 985 [Google Scholar]
- Boogert, A. C. A., Huard, T. L., Cook, A. M., et al. 2011, ApJ, 729, 92 [NASA ADS] [CrossRef] [Google Scholar]
- Boogert, A. C. A., Gerakines, P. A., & Whittet, D. C. B. 2015, ARA&A, 53, 541 [Google Scholar]
- Bovino, S., Lupi, A., Giannetti, A., et al. 2021, A&A, 654, A34 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
- Carey, S. J., Noriega-Crespo, A., Mizuno, D. R., et al. 2009, PASP, 121, 76 [Google Scholar]
- Caselli, P., Walmsley, C. M., Tafalla, M., Dore, L., & Myers, P. C. 1999, ApJ, 523, L165 [Google Scholar]
- Caselli, P., Walmsley, C. M., Zucconi, A., et al. 2002, ApJ, 565, 344 [Google Scholar]
- Chen, T., & Guestrin, C. 2016, in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16 (New York, NY, USA: ACM), 785 [Google Scholar]
- Ciolek, G. E., & Basu, S. 2006, ApJ, 652, 442 [NASA ADS] [CrossRef] [Google Scholar]
- Csengeri, T., Leurini, S., Wyrowski, F., et al. 2016, A&A, 586, A149 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
- Cummins, S. E., Green, S., Thaddeus, P., & Linke, R. A. 1983, ApJ, 266, 331 [NASA ADS] [CrossRef] [Google Scholar]
- Dullemond, C. P., Juhasz, A., Pohl, A., et al. 2012, RADMC-3D: A multipurpose radiative transfer tool, Astrophysics Source Code Library [record ascl:1202.015] [Google Scholar]
- Elia, D., Molinari, S., Schisano, E., et al. 2017, MNRAS, 471, 100 [NASA ADS] [CrossRef] [Google Scholar]
- Elia, D., Merello, M., Molinari, S., et al. 2021, MNRAS, 504, 2742 [NASA ADS] [CrossRef] [Google Scholar]
- Feurer, M., Klein, A., Eggensperger, K., et al. 2015, in Advances in Neural Information Processing Systems, 28, 2962 [Google Scholar]
- Feurer, M., Eggensperger, K., Falkner, S., Lindauer, M., & Hutter, F. 2022, J. Mach. Learn. Res., 23, 1 [Google Scholar]
- Gerner, T., Beuther, H., Semenov, D., et al. 2014, A&A, 563, A97 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
- Giannetti, A., Brand, J., Sánchez-Monge, Á., et al. 2013, A&A, 556, A16 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
- Giannetti, A., Wyrowski, F., Brand, J., et al. 2014, A&A, 570, A65 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
- Giannetti, A., Leurini, S., König, C., et al. 2017a, A&A, 606, L12 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
- Giannetti, A., Leurini, S., Wyrowski, F., et al. 2017b, A&A, 603, A33 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
- Goldsmith, P. F., & Langer, W. D. 1999, ApJ, 517, 209 [Google Scholar]
- Grim, R. J. A., Baas, F., Geballe, T. R., Greenberg, J. M., & Schutte, W. A. 1991, A&A, 243, 473 [NASA ADS] [Google Scholar]
- Henkel, C., Jacq, T., Mauersberger, R., Menten, K. M., & Steppe, H. 1987, A&A, 188, L1 [NASA ADS] [Google Scholar]
- Heyer, M., Gutermuth, R., Urquhart, J. S., et al. 2016, A&A, 588, A29 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
- Hoare, M. G., Purcell, C. R., Churchwell, E. B., et al. 2012, PASP, 124, 939 [Google Scholar]
- Humire, P. K., Henkel, C., Gong, Y., et al. 2020, A&A, 633, A106 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
- Humire, P. K., Henkel, C., Hernández-Gómez, A., et al. 2022, A&A, 663, A33 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
- Indebetouw, R., Brogan, C., Chen, C. H. R., et al. 2013, ApJ, 774, 73 [NASA ADS] [CrossRef] [Google Scholar]
- Kauffmann, J., Goldsmith, P. F., Melnick, G., et al. 2017, A&A, 605, L5 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
- König, C., Urquhart, J. S., Csengeri, T., et al. 2017, A&A, 599, A139 [Google Scholar]
- Kristensen, L. E., van Dishoeck, E. F., van Kempen, T. A., et al. 2010, A&A, 516, A57 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
- Krumholz, M. R., Dekel, A., & McKee, C. F. 2012, ApJ, 745, 69 [Google Scholar]
- Kurtzer, G. M., Sochat, V., & Bauer, M. 2017, PLoS ONE 12, e0177459 [CrossRef] [PubMed] [Google Scholar]
- Kurtzer, G. M., Clerget, C., Bauer, M., et al. 2021, https://doi.org/10.5281/zenodo.4667718 [Google Scholar]
- Lee, M. Y., Wyrowski, F., Menten, K., Tiwari, M., & Güsten, R. 2022, A&A, 664, A80 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
- Leurini, S., Schilke, P., Menten, K. M., et al. 2004, A&A, 422, 573 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
- Leurini, S., Schilke, P., Wyrowski, F., & Menten, K. 2005, in Astrochemistry: Recent Successes and Current Challenges, 231, eds. D. C. Lis, G. A. Blake, & E. Herbst, 99 [Google Scholar]
- Li, D., & Goldsmith, P. F. 2012, ApJ, 756, 12 [NASA ADS] [CrossRef] [Google Scholar]
- Li, P. S., Norman, M. L., Mac Low, M.-M., & Heitsch, F. 2004, ApJ, 605, 800 [NASA ADS] [CrossRef] [Google Scholar]
- Lin, Y., Wyrowski, F., Liu, H. B., et al. 2022, A&A, 658, A128 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
- Mangum, J. G., & Wootten, A. 1993, ApJS, 89, 123 [Google Scholar]
- Maret, S., Ceccarelli, C., Tielens, A. G. G. M., et al. 2005, A&A, 442, 527 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
- McCarthy, T. P., Ellingsen, S. P., Breen, S. L., et al. 2020, MNRAS, 491, 4642 [NASA ADS] [CrossRef] [Google Scholar]
- Merkel, D. 2014, Linux J., 2014, 2 [Google Scholar]
- Mouschovias, T. C. 1991, ApJ, 373, 169 [NASA ADS] [CrossRef] [Google Scholar]
- Muraoka, K., Kondo, H., Tokuda, K., et al. 2020, ApJ, 903, 94 [NASA ADS] [CrossRef] [Google Scholar]
- Navarete, F., Leurini, S., Giannetti, A., et al. 2019, A&A, 622, A135 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
- Padovani, M., Galli, D., & Glassgold, A. E. 2009, A&A, 501, 619 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
- Pedregosa, F., Varoquaux, G., Gramfort, A., et al. 2011, J. Mach. Learn. Res., 12, 2825 [Google Scholar]
- Pontoppidan, K. M., Dartois, E., van Dishoeck, E. F., Thi, W. F., & d’Hendecourt, L. 2003, A&A, 404, L17 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
- Purcell, C. R., Hoare, M. G., Cotton, W. D., et al. 2013, ApJS, 205, 1 [Google Scholar]
- Rabli, D., & Flower, D. R. 2010, MNRAS, 406, 95 [Google Scholar]
- Redaelli, E., Bovino, S., Giannetti, A., et al. 2021, A&A, 650, A202 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
- Rigby, A. J., Moore, T. J. T., Eden, D. J., et al. 2019, A&A, 632, A58 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
- Roberts, H., & Millar, T. J. 2000, A&A, 361, 388 [NASA ADS] [Google Scholar]
- Roman-Duval, J., Federrath, C., Brunt, C., et al. 2011, ApJ, 740, 120 [NASA ADS] [CrossRef] [Google Scholar]
- Sabatini, G., Bovino, S., Giannetti, A., et al. 2021, A&A, 652, A71 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
- Sanhueza, P., Contreras, Y., Wu, B., et al. 2019, ApJ, 886, 102 [Google Scholar]
- Sanna, A., Giannetti, A., Bonfand, M., et al. 2021, A&A, 655, A72 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
- Schöier, F. L., van der Tak, F. F. S., van Dishoeck, E. F., & Black, J. H. 2005, A&A, 432, 369 [Google Scholar]
- Schuller, F., Menten, K. M., Contreras, Y., et al. 2009, A&A, 504, 415 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
- Sewiło, M., Indebetouw, R., Charnley, S. B., et al. 2018, ApJ, 853, L19 [CrossRef] [Google Scholar]
- Shimonishi, T., Watanabe, Y., Nishimura, Y., et al. 2018, ApJ, 862, 102 [NASA ADS] [CrossRef] [Google Scholar]
- Shirley, Y. L. 2015, PASP, 127, 299 [Google Scholar]
- Shu, F. H. 1977, ApJ, 214, 488 [Google Scholar]
- Tang, X. D., Henkel, C., Wyrowski, F., et al. 2018, A&A, 611, A6 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
- Tassis, K. 2007, MNRAS, 379, L50 [Google Scholar]
- Tritsis, A., Tassis, K., & Willacy, K. 2016, MNRAS, 458, 789 [CrossRef] [Google Scholar]
- Urquhart, J. S., Busfield, A. L., Hoare, M. G., et al. 2007, A&A, 461, 11 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
- Urquhart, J. S., Hoare, M. G., Purcell, C. R., et al. 2009, A&A, 501, 539 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
- Urquhart, J. S., Moore, T. J. T., Csengeri, T., et al. 2014, MNRAS, 443, 1555 [Google Scholar]
- Urquhart, J. S., König, C., Giannetti, A., et al. 2018, MNRAS, 473, 1059 [Google Scholar]
- Urquhart, J. S., Wells, M. R. A., Pillai, T., et al. 2022, MNRAS, 510, 3389 [NASA ADS] [CrossRef] [Google Scholar]
- van der Tak, F. F. S., van Dishoeck, E. F., & Caselli, P. 2000a, A&A, 361, 327 [NASA ADS] [Google Scholar]
- van der Tak, F. F. S., van Dishoeck, E. F., Evans, Neal J., I., & Blake, G. A. 2000b, ApJ, 537, 283 [Google Scholar]
- van der Tak, F. F. S., Black, J. H., Schöier, F. L., Jansen, D. J., & van Dishoeck, E. F. 2007, A&A, 468, 627 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
- Vázquez-Semadeni, E., Palau, A., Ballesteros-Paredes, J., Gómez, G. C., & Zamora-Avilés, M. 2019, MNRAS, 490, 3061 [Google Scholar]
- Vázquez-Semadeni, E., Gómez, G. C., & González-Samaniego, A. 2024, MNRAS, 530, 3445 [CrossRef] [Google Scholar]
- Virtanen, P., Gommers, R., Oliphant, T. E., et al. 2020, Nat. Methods, 17, 261 [Google Scholar]
- Walmsley, C. M. 1987, in NATO Advanced Study Institute (ASI) Series C, 210, Physical Processes in Interstellar Clouds, eds. G. E. Morfill, & M. Scholer, 161 [Google Scholar]
- Walmsley, C. M., & Ungerechts, H. 1983, A&A, 122, 164 [Google Scholar]
- Walsh, A. J., Burton, M. G., Hyland, A. R., & Robinson, G. 1998, MNRAS, 301, 640 [Google Scholar]
- Wienen, M., Wyrowski, F., Walmsley, C. M., et al. 2021, A&A, 649, A21 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
- Wilner, D. J., Welch, W. J., & Forster, J. R. 1995, ApJ, 449, L73 [NASA ADS] [CrossRef] [Google Scholar]
All Tables
Spearman correlation coefficients between the line ratios considered in this work, the LOS-averaged number density and temperature.
Sources included in the proof-of-concept, classification, distance, mass, and number density results.
All Figures
![]() |
Fig. 1 Example of the radial density and temperature profiles inside the clump, for n(H2, r0) = 105 cm−3 and T(r0) = 25 K. |
In the text |
![]() |
Fig. 2 Schema of the procedure to derive the final density posterior from the combination of several line ratios. Two distinct ratios are considered in this example, the relations of which with n(H2) are shown in the top row. After generating a number of ratios, according to observations and uncertainties, the PDFs of individual ratios are summed, and then the resulting distribution for each line ratio is multiplied to obtain the final number density posterior. |
In the text |
![]() |
Fig. 3 Kernel density estimate of the probability density function for the line ratios as a function of average number density along the LOS, for all the temperatures considered in the grid, for the model assumptions indicated at the top of each column. The rows show the ratios I(20 − 10)/I(2−1 − 1−1), I(21 − 11)/I(20 − 10), I(21 − 11)/I(2−1 − 1−1), I(5−1 − 4−1)/I(50 − 40), and I(7−1 − 6−1)/I(70 − 60), in order, from top to bottom. RADEX computations for an isothermal, uniform clump are shown as red stars; the parameters used are described in the text. Note that for the I(5−1 − 4−1)/I(50 − 40), and I(7−1 − 6−1)/I(70 − 60) ratios the RADEX points at low densities are above the plotting limits, and reach values around 25 in both cases. |
In the text |
![]() |
Fig. 4 Improvement in the final number density posterior when adding ML-emulated data. In blue, a source displaying oscillations in the posterior is shown for the RT-only case; in orange, the same source with the number density posterior evaluated on the finer grid, generated by our final ML model. The oscillations caused by grid sparsity disappear. |
In the text |
![]() |
Fig. 5 Comparison of the integrated line ratios for different density distributions, corresponding to the three grids considered in the case of internally heated clumps. The columns of plots refer to the values of p indicated at the top of each column. We only show the I(21 − 11)/I(20 − 10) for the 3 mm lines, because all three have the same behaviour. |
In the text |
![]() |
Fig. 6 Comparison between the internally heated clump with the fiducial E-CH3OH methanol abundance of 10−9 (yellow shaded areas), the low-abundance (E − CH3OH/H2 = 10−10, 1/10 fiducial), low-optical depth case (green shaded areas) and the high-abundance (E − CH3OH/H2 = 10−8, 10 times fiducial), high-optical depth case (blue shaded areas). The light-shaded areas correspond to the 90% HPD, while the dark shaded areas correspond to the 50% HPD. |
In the text |
![]() |
Fig. 7 Comparison of the column density, as derived from a simple rotational diagram and computed from the model grid, on a pixel-by-pixel basis. The red line shows the one-to-one locus. |
In the text |
![]() |
Fig. 8 Comparison of the number density derived from the dust continuum emission, assuming spherical geometry, and the number density derived with the method proposed in this work. The red line indicates the 1:1 correspondence. |
In the text |
![]() |
Fig. 9 Left: empirical cumulative distribution functions of the best-fit density for each evolutionary class in the TOP 100 sample. Right: violin plot of the bootstrapped density distribution per class; 10 000 samples have been drawn for each source from the inferred number density posterior. The combined PDF, determined via a KDE smoothing, is shown around the quartiles and whiskers of a box plot. |
In the text |
![]() |
Fig. A.1 Entity-relation diagram created by the SAK tool. A description of the individual tables and columns can be found in the repository documentation. |
In the text |
![]() |
Fig. C.1 Integrated line ratios over the entire clump as a function of temperature and characteristic number density for the isothermal, constant-density case, for lines at around 96.7 GHz (top row, for line ratios I(20 − 10)/I(2−1 − 1−1), I(21 − 11)/I(20 − 10), and I(21 − 11)/I(2−1 − 1−1)), those at around 241.7 GHz (bottom left, ratio I(5−1 − 4−1)/I(50 − 40)), and around 338.1 GHz (bottom right, ratio I(7−1 − 6−1)/I(70 − 60)). |
In the text |
![]() |
Fig. C.2 Integrated line ratios over the entire clump as a function of temperature and number density for the isothermal case with a density distribution ∝ r−1.5, for lines at around 96.7 GHz (top row, for line ratios I(20 − 10)/I(2−1 − 1−1), I(21 − 11)/I(20 − 10), and I(21 − 11)/I(2−1 − 1−1)), those at around 241.7 GHz (bottom left, ratio I(5−1 − 4−1)/I(50 − 40)), and around 338.1 GHz (bottom right, ratio I(7−1 − 6−1)/I(70 − 60)). |
In the text |
![]() |
Fig. C.3 Comparison between the relations when using a different power-law distribution for density. The green-shaded region corresponds to p = −1.2, the yellow one to p = −1.5, while the blue area refers to p = −1.8. |
In the text |
![]() |
Fig. C.4 Kernel density estimate of the probability density function for the line ratios as a function of temperature and average number density along the LOS, for the internally heated clump case with a density distribution ∝ r−1.5 and a hot-core-like abundance profile, for lines at around 96.7 GHz (top row, for line ratios I(20 − 10)/I(2−1 − 1−1), I(21 − 11)/I(20 − 10), and I(21 − 11)/I(2−1 − 1−1)), those at around 241.7 GHz (bottom left, ratio I(5−1 − 4−1)/I(50 − 40)), and around 338.1 GHz (bottom right, ratio I(7−1 − 6−1)/I(70 − 60)). |
In the text |
![]() |
Fig. C.5 Kernel density estimate of the probability density function for the line ratios as a function of temperature and average number density along the LOS, for the internally heated clump case with a density distribution ∝ r−1.5, and a doubled microturbulence level, for lines at around 96.7 GHz (top row, for line ratios I(20 − 10)/I(2−1 − 1−1), I(21 − 11)/I(20 − 10), and I(21 − 11)/I(2−1 − 1−1)), those at around 241.7 GHz (bottom left, ratio I(5−1 − 4−1)/I(50 − 40)), and around 338.1 GHz (bottom right, ratio I(7−1 − 6−1)/I(70 − 60)). The results for the fiducial values of microturbulence are shown in yellow, for reference. |
In the text |
Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.
Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.
Initial download of the metrics may take a while.