Tracers of the ionization fraction in dense and translucent gas

Emeric Bron; Evelyne Roueff; Maryvonne Gerin; Jérôme Pety; Pierre Gratier; Franck Le Petit; Viviana Guzman; Jan H. Orkisz; Victor de Souza Magalhaes; Mathilde Gaudel; Maxime Vono; Sébastien Bardeau; Pierre Chainais; Javier R. Goicoechea; Annie Hughes; Jouni Kainulainen; David Languignon; Jacques Le Bourlot; François Levrier; Harvey Liszt; Karin Öberg; Nicolas Peretto; Antoine Roueff; Albrecht Sievers

doi:10.1051/0004-6361/202038040

Home

All issues

Volume 645 (January 2021)

A&A, 645 (2021) A28

Full HTML

Press Release

Highlight

Open Access

Issue		A&A Volume 645, January 2021


Article Number		A28
Number of page(s)		28
Section		Interstellar and circumstellar matter
DOI		https://doi.org/10.1051/0004-6361/202038040
Published online		23 December 2020

A&A 645, A28 (2021)

I. Automated exploitation of massive astrochemical model grids

Emeric Bron¹, Evelyne Roueff¹, Maryvonne Gerin², Jérôme Pety³^,2, Pierre Gratier⁴, Franck Le Petit¹, Viviana Guzman⁵, Jan H. Orkisz⁶, Victor de Souza Magalhaes³, Mathilde Gaudel², Maxime Vono⁷, Sébastien Bardeau³, Pierre Chainais⁸, Javier R. Goicoechea⁹, Annie Hughes¹⁰, Jouni Kainulainen⁶, David Languignon¹, Jacques Le Bourlot¹, François Levrier¹¹, Harvey Liszt¹², Karin Öberg¹³, Nicolas Peretto¹⁴, Antoine Roueff¹⁵ and Albrecht Sievers³

¹ LERMA, Observatoire de Paris, PSL Research University, CNRS, Sorbonne Universités, 92190 Meudon, France
e-mail: emeric.bron@obspm.fr
² LERMA, Observatoire de Paris, PSL Research University, CNRS, Sorbonne Universités, 75014 Paris, France
³ IRAM, 300 rue de la Piscine, 38406 Saint Martin d’Hères, France
⁴ Laboratoire d’Astrophysique de Bordeaux, Univ. Bordeaux, CNRS, B18N, Allee Geoffroy Saint-Hilaire, 33615 Pessac, France
⁵ Instituto de Astrofísica, Pontificia Universidad Católica de Chile, Av. Vicuña Mackenna 4860, 7820436 Macul, Santiago, Chile
⁶ Chalmers University of Technology, Department of Space, Earth and Environment, 412 93 Gothenburg, Sweden
⁷ University of Toulouse, IRIT/INP-ENSEEIHT, CNRS, 2 rue Charles Camichel, BP 7122, 31071 Toulouse cedex 7, France
⁸ Univ. Lille, CNRS, Centrale Lille, UMR 9189 - CRIStAL, 59651 Villeneuve d’Ascq, France
⁹ Instituto de Física Fundamental (CSIC). Calle Serrano 121, 28006 Madrid, Spain
¹⁰ Institut de Recherche en Astrophysique et Planétologie (IRAP), Université Paul Sabatier, Toulouse cedex 4, France
¹¹ Laboratoire de Physique de l’Ecole normale supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université Paris-Diderot, Sorbonne Paris Cité, Paris, France
¹² National Radio Astronomy Observatory, 520 Edgemont Road, Charlottesville, VA 22903, USA
¹³ Harvard-Smithsonian Center for Astrophysics, 60 Garden Street, Cambridge, MA 02138, USA
¹⁴ School of Physics and Astronomy, Cardiff University, Queen’s buildings, Cardiff CF24 3AA, UK
¹⁵ Aix-Marseille University, CNRS, Centrale Marseille, Institut Fresnel, Marseille, France

Received: 27 March 2020
Accepted: 2 July 2020

Abstract

Context. The ionization fraction in the neutral interstellar medium (ISM) plays a key role in the physics and chemistry of the ISM, from controlling the coupling of the gas to the magnetic field to allowing fast ion-neutral reactions that drive interstellar chemistry. Most estimations of the ionization fraction have relied on deuterated species such as DCO⁺, whose detection is limited to dense cores representing an extremely small fraction of the volume of the giant molecular clouds that they are part of. As large field-of-view hyperspectral maps become available, new tracers may be found. The growth of observational datasets is paralleled by the growth of massive modeling datasets and new methods need to be devised to exploit the wealth of information they contain.

Aims. We search for the best observable tracers of the ionization fraction based on a grid of astrochemical models, with the broader aim of finding a general automated method applicable to searching for tracers of any unobservable quantity based on grids of models.

Methods. We built grids of models that randomly sample a large range of physical conditions (unobservable quantities such as gas density, temperature, elemental abundances, etc.) and computed the corresponding observables (line intensities, column densities) and the ionization fraction. We estimated the predictive power of each potential tracer by training a random forest model to predict the ionization fraction from that tracer, based on these model grids.

Results. In both translucent medium and cold dense medium conditions, we found several observable tracers with very good predictive power for the ionization fraction. Many tracers in cold dense medium conditions are found to be better and more widely applicable than the traditional DCO⁺/HCO⁺ ratio. We also provide simpler analytical fits for estimating the ionization fraction from the best tracers, and for estimating the associated uncertainties. We discuss the limitations of the present study and select a few recommended tracers in both types of conditions.

Conclusions. The method presented here is very general and can be applied to the measurement of any other quantity of interest (cosmic ray flux, elemental abundances, etc.) from any type of model (PDR models, time-dependent chemical models, etc.).

Key words: ISM: clouds / ISM: molecules / methods: statistical / methods: numerical / astrochemistry

© E. Bron et al. 2020

Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1 Introduction

The so-called neutral component of the interstellar medium, despite being shielded from EUV (13.6 to 124 eV) stellar photons able to ionize hydrogen, retains a small ionization fraction (x(e⁻) = n(e⁻)∕n_H). The ionization mechanism depends on the type of region: FUV (6 to 13.6 eV) photons ionizing C and S in the low A_V surface layer of clouds, or cosmic rays, X rays, shocks, etc., in the densest parts. As a result, ionization fractions range between ~10⁻⁴ in low A_V cloud surfaces and down to ~10⁻⁹ in dense cores (e.g., Goicoechea et al. 2009; Draine 2011).

This ionization fraction controls several key aspects of neutral interstellar clouds. It determines the degree of coupling of the gas to the magnetic field: the neutrals, accounting for most of the mass of the fluid, are only indirectly sensitive to the presence of a magnetic field through their friction with the ions that remain coupled to the field, a process called ion-neutral friction. This coupling can provide a significant magnetic support against gravitational collapse of dense cores despite the low ionization fractions values found there, between 10⁻⁹ and 10⁻⁷ (Mestel & Spitzer 1956; Mouschovias 1976; Basu & Mouschovias 1994). The ionization fraction also controls the onset of the magneto-rotational instability (Balbus & Hawley 1991), the main mechanism of angular momentum transport in accretion disks. Moreover, the gas phase chemistry in dense molecular clouds is to a large extent driven by fast ion-neutral reactions (Herbst & Klemperer 1973; Oppenheimer & Dalgarno 1974). The build-up of chemical complexity thus depends on the ionization fraction of the medium. Finally, some common molecular tracers with high dipole moments, such as HCN and HCO⁺, have high inelastic collision cross sections with electrons, and their excitation can be significantly affected by electron collisions for ionization fractions ≳10⁻⁵ (Black & van Dishoeck 1991; Liszt 2012; Liszt & Pety 2016; Goldsmith & Kauffmann 2017). This makes the interpretation of their emission (e.g., to estimate gas density) sensitive to our knowledge of the local ionization fraction.

Direct observational estimation of the ionization fraction in neutral clouds is difficult, except in very specific regions (e.g., Goicoechea et al. 2009; Cuadrado et al. 2019 at the dissociation front in a photodissociation region). Direct estimation of the total charge accounted for by observable molecular ions in molecular clouds only yields a loose lower limit (e.g., Miettinen et al. 2011). Indirect methods based on tracers that are chemically sensitive to the ionization fraction have thus been commonly used. These methods have mostly involved measuring the deuterium fractionation through abundance ratios involving simple molecular ions like DCO⁺/HCO⁺ (Guelin et al. 1977, 1982; Dalgarno & Lepp 1984; Caselli et al. 1998) or N₂ D⁺/N₂H⁺ which is less affected by depletion (Caselli 2002). The idea is that the deuterium enrichment (defined as H₂ D⁺/H $_{3}^{+}$ $_3^+$ ), initiated by the exchange reaction $H_{3}^{+} + HD ⇌ H_{2} D^{+} + H_{2},$ $\begin{equation*} \mathrm{H}_3^+ + \mathrm{HD} \rightleftharpoons \mathrm{H}_2\mathrm{D}^+ + \mathrm{H}_2,\end{equation*}$ (1)

at low temperature, is limited by electronic dissociative recombination of H₂ D⁺, and that the resulting ratio is transmitted (with a known prefactor) to the deuteration fraction of other molecules such as HCO⁺. Using such tracers, the ionization fraction is deduced either by using approximate analytical formulae representing simplified networks (Caselli et al. 1998; Miettinen et al. 2011; Caselli 2002), or by adjusting an astrochemical model including a full chemical network to the observations, using stationary chemical models (Williams et al. 1998; Bergin et al. 1999; Caselli et al. 2002; Fuente et al. 2016), time-dependent ones (Maret & Bergin 2007; Shingledecker et al. 2016), or PDR models (Goicoechea et al. 2009). Despite the variety of determination methods, using different deuterated molecules, only very few works have proposed using other tracers than deuterated species. For instance, Flower et al. (2007) proposed the C₆ H⁻/C₆H ratio as a tracer of the ionization fraction, and Fossé et al. (2001) have investigated the relationship between the cyclic-to-linear ratio of C₃ H₂ and the ionization fraction. Deuteration-based approaches however suffer from several limitations due to the fact that they depend on other physical or chemical parameters that need to be determined independently. The initial deuteration reaction (Eq. (1)) is sensitive not only to the gas temperature but also to the essentially unmeasurable ortho-to-para ratio of H₂ (Pagani et al. 1992, 2011; Shingledecker et al. 2016). Indeed, the endothermicity of the reaction in the backward direction (192K) is very close to the J = 1 to J = 0 energy difference of H₂ (170.5K). Then, even a small fraction of o-H₂ (J = 1) contributes to H₂ D⁺ destruction (with a reduced endothermicity of ≃ 20 K) and restricts the deuteration process. In addition, ratios such as DCO⁺/HCO⁺ are linked to H₂ D⁺/H $_{3}^{+}$ $_3^+$ through reactions with neutral species like CO. The estimated deuterium fraction is therefore sensitive to the depletion factors of carbon, oxygen, and nitrogen that are not easy to evaluate (Caselli 2002). Moreover, deuterated tracers such as DCO⁺ are typicallyonly detectable in cold dense cores, representing only a tiny fraction of the observable area of a giant molecular cloud (GMC). Deuteration-based approaches are thus inadequate for an unbiased characterization of the conditions in GMCs as a whole.

Despitethe common use of advanced chemical models computing the abundances of hundreds ofspecies, the observed tracers to which these models are compared to estimate x(e⁻) (deuterated molecules such as DCO⁺) are still those initially proposed based on analytical reasoning using simplified chemical networks. The wealth of data produced in large chemical model grids remains largely unexploited. Their exploration of wide parameter spaces might reveal less intuitive but more efficient tracers. Based on this approach, we propose here a general and largely automatic method to identify the best observational predictors of the ionization fraction, when other important parameters such as the gas density, temperature or H₂ ortho-to-para ratio are unknown. We apply this method to propose new predictors of the ionization fraction as a function of the molecular cloud conditions. We use simple stationary chemical models with a complete up-to-date chemical network (Roueff et al. 2015), and use molecular ratios (column density ratios or integrated line intensity ratios) as observable tracers from which we seek to predict the ionization fraction. We base our investigation on the observed range of physical conditions and detected tracers in the IRAM-30m Large Program ORION-B (Outstanding Radio-Imaging of OrioN B, co-PIs: J. Pety and M. Gerin)¹ . In this program, we imaged 5 square degrees towards the southern part of the Orion B giant molecular cloud over most of the 3 mm atmospheric window (Pety et al. 2017; Gratier et al. 2017, 2020; Orkisz et al. 2017, 2019; Bron et al. 2018; Roueff et al. 2020).

In the context of dense cores, the ionization fraction is linked with the cosmic ray ionization rate (CRIR) and both are often studied from the same molecular ratios (although direct tracers of the cosmic ray ionization rate can also be used in more diffuse medium, in particular H $_{3}^{+}$ $_3^+$ , e.g., Indriolo & McCall 2012; Le Petit et al. 2016). In our context of the Orion B giant molecular cloud, where UV illumination controls the ionization fraction in large parts of the cloud, we focus here on the question of estimating the ionization fraction only, independently of the source of ionization. The task of tracing the cosmic ray ionization rate (which has also been attempted using astrochemical model grids, e.g., Barger & Garrod 2020) will be considered in a future application of the method presented here.

In this first article, we present a generic method to find the best tracers of an unobservable physical parameter and apply it to the search of new tracers of the ionization fraction among the species that are detectable in the ORION-B dataset. The observational application of the tracers found here to study the ionization fraction in the Orion B molecular cloud will be presented in a second paper (Guzman et al., in prep.).

In Sect. 2, we describe our general statistical method for determining the best predictors of a given unobservable parameter based on the results of a model grid. In Sect. 3, we present the models used in this study for the search of ionization fraction tracers. We then present the ranking of observable predictors in Sect. 4. For ease of application of our results, we provide in Sect. 5 analytical fit formulae to deduce the ionization fraction from each of the proposed best predictors. We finally discuss our results in Sect. 6 and present our conclusions in Sect. 7.

2 Method

2.1 Principle

Both observable line intensities (or column densities) on the one hand, and the ionization fraction x(e⁻) on the other hand, depend on multiple, unobservable physical parameters (e.g., gas density, elemental abundances, cosmic ray flux, ...). Our goal is to find reliable relationships between observable quantities and ionization fraction, despite lacking estimations of these hidden physical parameters. To do this, we first run model grids covering the whole possible parameter space. We then use a flexible regression method to fit x(e⁻) as a function of one of the potential observational tracers through the whole grid of chemical models. This means that we treat the effects of the variations of the hidden parameters as sources of noise on the prediction of the ionization fraction. Finally, we use a quantitative measurement of the fit quality as an estimate of the predictive power of each potential tracer. These estimates are used to rank the tracers and highlight the most powerful predictors of the ionization fraction. The fitted models for the best tracers will provide ready-to-use tools to be applied to observations. The pipeline tool AutoRank, implementing the procedure described below, is available online² .

2.2 Ranking tracers with Random Forests

Any a prioriinformation could easily be included in the method by sampling the parameters of the model grid according to a specific prior distribution. However, we wish to minimize the amount of a priori information injected in the method and to avoid making assumptions on the shape of the distributions of physical parameters (e.g., gas density, elemental abundances, cosmic ray flux, ...). We thus build a model grid that samples uniformly the possible range of values (see Sect. 3).

Our model grids provide us with a dataset comprising ionization fraction values and corresponding values of observable quantities. We will consider line intensity ratios or column density ratios as our observable quantities in this paper. The hidden physical parameter values introduce a non deterministic aspect to the relationship between x(e⁻) and the observables: models might have identical values of an observable but different x(e⁻) if the underlying physical parameters are different. Learning to predict x(e⁻) from a given observable is then a regression problem, with the uncertainty introduced by the hidden physical parameters playing the role of noise. Determining the best tracers of x(e⁻) is thus equivalent to finding observables for which the relationship to x(e⁻) is least affected by this noise (i.e., by the hidden physical conditions). This means finding the observables for which the most accurate regression model can be found.

For this regression problem, we choose to use Random Forests (Breiman 2001) because their flexibility makes it possible to fit general non-linear shapes, while their simplicity provides reasonable computational costs. This makes the method presented here very general and applicable to finding tracers of other physical parameters without any assumption on the shape of the relationship between the tracers and the target parameter. We will use RF for Random Forest in the rest of the paper. RF regression models are based on the concept of regression trees (Breiman et al. 1984), where a succession of binary decisions are made based on the input variables (e.g., x₃ < 2 or ≥ 2) and constant values are predicted in each of the subsets of the partition that the decision tree defines. While such decision trees are easily interpretable, they require large tree depths to be flexible but are prone to overfitting if this depth is too large. RF tackle this overfitting problem by using the simple idea that multiple overfitted regression models will, when averaged, give a better prediction as long as the errors they individually make are uncorrelated between models. In a RF, the individual trees are made as independent as possible by introducing randomness in two aspects: (1) the building of each tree only considers a random subset of the input variables, and (2) each tree is given a bootstrapped sample (i.e. drawn by random sampling with replacement from the original dataset, Breiman 1996) instead of the original full sample. This way, the datasets seen by the different trees are independent and each tree only sees a subset of the dataset (bootstrapped datasets typically contain only 63% of the points of the original dataset as repetition is allowed). This provides a very flexible regression model, which retains some of the interpretability of decision trees. RF have thus quickly become a standard method in Machine Learning (see e.g., Hastie et al. 2001). In addition, they allow to estimate the generalization error of the fit (i.e., the error made when predicting data not seen during training): as each tree has only seen a random bootstrapped sample from the data, it is possible to estimate for each datapoint a partial prediction using only the trees that have not seen this datapoint during training. As the sample seen by a given tree is called a bag, these partial predictions are called out-of-bag predictions (OOB). Gratier et al. (2020) also use RF in the context of the interstellar medium and introduce the method in detail.

We thustrain RF regression models for each observable (using only one observable at a time) and estimate the accuracy of the regression models. This accuracy is taken as an estimate of the predictive power of the observable quantity considered, for the purpose of predicting x(e⁻). The different observables can then be ranked according to this predictive power estimate. The accuracy of the regression model is estimated with the OOB R², $R^{2} = 1 - \frac{S S_{res}}{S S_{tot}} with the sums of squares$ $R^2 = 1 - \frac{S\!S_{\mathrm{res}}}{S\!S_{\mathrm{tot}}} \quad \mbox{with the sums of squares}$ $S S_{res} = \sum_{i} {(y_{i}^{pred} - y_{i}^{true})}^{2} and S S_{tot} = \sum_{i} {(\bar{y^{true}} - y_{i}^{true})}^{2},$ $S\!S_{\mathrm{res}} = \sum_i \left(y_i^{\textrm{pred}} - y_i^{\textrm{true}}\right)^2 \quad \mbox{and} \quad S\!S_{\mathrm{tot}} = \sum_i \left(\overline{y^{\textrm{true}}} - y_i^{\textrm{true}} \right)^2,$

where the index i runs across data points (individual model results), $y_{i}^{true}$ $y_i^{\textrm{true}}$ is the true value of ionization fraction (computed by the chemical model), $y_{i}^{pred}$ $y_i^{\textrm{pred}}$ is the OOB prediction value from the RF, and $\bar{y^{true}}$ $\overline{y^{\textrm{true}}}$ is the average of the (true) ionization fraction over the model grid. This coefficient R² gives the fraction of the total ionization fraction variance (across the full model grid) that the RF model can explain from the given observable predictor alone (i.e., it measures the fractional decrease from the initial variance of x(e⁻) to the variance of the residuals).

It is thus <1, with 1 representing perfect prediction (zero residual variance). We note that it can take a negative value when the model performs worse than predicting a constant value set at the average x(e⁻) of the dataset. A value of 0 indicates a performance equivalent to this constant prediction of the average. This R² value is used for the ranking of tracers. For more information, we also provide below the root mean square error $RMSE = \sqrt{\frac{1}{N} \sum_{i} {(y_{i}^{true} - y_{i}^{pred})}^{2}},$ $\mathrm{RMSE} = \sqrt{ \frac{1}{N} \sum_i \left(y_i^{\textrm{true}} - y_i^{\textrm{pred}}\right)^2 },$

where N is the number of chemical models in our grid. The RMSE is completely univocally related to the R² value, but ismore interpretable in terms of the amplitude of typical errors. We also provide the maximum absolute error, $max . abs . err . = \max_{i} | y_{i}^{true} - y_{i}^{pred} | .$ $\mathrm{max.\,abs.\,err.} = \max _i \left| y_i^{\textrm{true}} - y_i^{\textrm{pred}}\right| .$

This quantity, estimating the maximum error made by our regression model, is not guaranteed to converge when increasing the size of the dataset. It should thus not be interpreted further than being the largest error we observed in our limited-size sample.

The RF model depends on a few internal parameters (number of trees, maximum depth of trees, etc.). Their values can affect the quality of the model and its tendency to overfit. We used a number of trees in the forest N_trees = 400 and a maximum tree depth d_max = 4. The procedure used to select these values is described in Appendix A. Our tests show that this optimization scheme is not critical for our purpose: while the choice of parameter values does affect the quality of the best fit RF model, it does not change significantly the ranking of the predictors that we deduce from it.

3 Chemical models

Here, we use the chemical code presented in Roueff et al. (2015) to study isotopic fractionation of deuterium, carbon, and nitrogen compounds. Single zone models with fixed density, temperature, visual extinction, radiation field, cosmic ray ionization rate, ortho-to-para H₂ ratio, and depletion factors are computed at steady state.

We consider, for the present study, a chemical network including deuterium, and isotopic carbon and oxygen species where the deuterium, carbon and oxygen fractionation reactions have been introduced following the recent determinations of exothermicities by Mladenović & Roueff (2017). We introduce in particular D¹³ CO⁺. Apart from these specific fractionation reactions, the chemistry of isotopically substituted species is built automatically from the chemical network of the major components. The chemical reactions involving one single carbon-containing reactant and one single carbon-containing product are duplicated with the same reaction rate coefficient. Simple statistical assumptions are introduced when two carbon containing molecules are implied in the reaction. Consider for example the case of the reaction $CX + CY \to {CX}^{'} + {CY}^{'},$ $\begin{align*}\mathrm{CX} + \mathrm{CY} \rightarrow \mathrm{CX}' + \mathrm{C Y}', \end{align*}$

taking place with a reaction rate coefficient k. The reactions introduced for the isotopically substituted species are the following: $\begin{align}nn&^13 CX + CY \to \vphantom a^{13} {CX}^{'} + {CY}^{'} with k / 2, nn&^13 CX + CY \to {CX}^{'} + \vphantom a^{13} {CY}^{'} with k / 2, nn&CX + \vphantom a^{13} CY \to {CX}^{'} + \vphantom a^{13} {CY}^{'} with k / 2, nn&CX + \vphantom a^{13} CY \to \vphantom a^{13} {CX}^{'} + {CY}^{'} with k / 2, nn&^13 CX + \vphantom a^{13} CY \to \vphantom a^{13} {CX}^{'} + \vphantom a^{13} {CY}^{'} with k . nn\end{align}$ $\begin{align*} &^{13}\mathrm{CX} + \mathrm{CY} \rightarrow \vphantom{a}^{13}\mathrm{CX}' + \mathrm{CY}' \quad \mathrm{with}\ k/2, \\ &\hspace*{-2pt}^{13}\mathrm{CX} + \mathrm{CY} \rightarrow \mathrm{CX}' + \vphantom{a}^{13}\mathrm{CY}' \quad \mathrm{with}\ k/2, \\ &\hspace*{-0.6pt}\mathrm{CX} + \vphantom{a}^{13}\mathrm{CY} \rightarrow \mathrm{CX}' + \vphantom{a}^{13}\mathrm{CY}' \quad \mathrm{with}\ k/2, \\ &\mathrm{CX} + \vphantom{a}^{13}\mathrm{CY} \rightarrow \vphantom{a}^{13}\mathrm{CX}' + \mathrm{CY}' \quad \mathrm{with}\ \,k/2, \\ &^{13}\mathrm{CX} + \vphantom{a}^{13}\mathrm{CY} \rightarrow \vphantom{a}^{13}\mathrm{CX}' + \vphantom{a}^{13}\mathrm{CY}' \quad \mathrm{with} \ k . \end{align*}$

Such a procedure leads to an ensemble of 310 species linked through 8711 chemical reactions.

These models allow us to compute observable column density ratios for the hundreds of species included. Although commonly derived by observers, column densities are not the primary observable quantities, and we thus also compute line intensity ratios. To do so, we post-process the results of our chemical models using a non-LTE excitation and radiative transfer model (RADEX, van der Tak et al. 2007) assuming a typical linewidth of 1 km s⁻¹ (observed linewidths in the Orion B are typically of a few km s⁻¹ Pety et al. 2017). Results based on column density ratios and based on line intensity ratios will be presented separately in the following sections.

It is unlikely that a single tracer will provide a good estimate of the ionization fraction x(e⁻) in all possible physical conditions. Either the tracer will lose its relationship with x(e⁻) in some conditions, or it might be too weak to be observable in other conditions. We thus decided to divide the range of possible conditions into subregions corresponding to the different types of environments found in GMCs (Pety et al. 2017; Bron et al. 2018). We focus on two kinds of environments: the translucent medium and the cold and dense gas, and we derive separate rankings of tracers for these two environments. The range of physical conditions explored for each of these environments are chosen based on our previous studies of Orion B and listed in Table 1.

In both grids, the gas density and gas temperatures were varied covering the typical ranges for translucent medium (3 × 10²−3 × 10³ cm⁻³, 15–100 K) and cold and dense medium (10³−10⁶ cm⁻³, 7–20 K). In the translucent model grid, external FUV photons still play an important role in the chemistry and in controlling the ionization fraction. This FUV illumination is controlled through an external FUV field strength G₀ (see e.g., Hollenbach & Tielens 1999, p. 177) and an extinction A_V representingthe amount of shielding between the FUV source and the gas under consideration (also used as the depth of the slab when computing line intensities). We take into account self-shielding of H₂ by using the approximate expression of Draine & Bertoldi (1996) and introduce also the shielding of CO by H₂ from Heays et al. (2017). We consider lower extinctions and higher G₀ values in translucent medium (A_V in the range 2–6, G₀ of the external field in the range 1–1000) than in cold dense medium (A_V in the range 5–20, external G₀ set to 1). We explore average to moderately strong FUV illumination values in the low density translucent grid. Regions with both high density high FUV illumination correspond to dense photodissociation regions (PDR), in which strong chemical and physical stratification on small spatial scale is critical. These regions would thus require the use of complete PDR models, such as the Meudon PDR Code (Le Petit et al. 2006). We thus did not explore this type of conditions in the present study.

Given the uncertainties about the cosmic ray ionization rate in molecular clouds (Lepp 1992; McCall et al. 2003; Indriolo et al. 2007), we consider the range of value 10⁻¹⁷−10⁻¹⁵ s⁻¹ . In the cold gas conditions, in order to account for the reduced cosmic ray fluxes (Padovani et al. 2009), we limit this range to 10⁻¹⁷−10⁻¹⁶ s⁻¹ . As sulfur can be an important contributor of electrons in neutral gas but has a highly uncertain gas-phase elemental abundance (Agúndez & Wakelam 2013; Goicoechea et al. 2006), we explore in both grids values of [S], the relative sulfur abundance with respect to H, in the range 1.86 × 10⁻⁸−1.86 × 10⁻⁵.

We also explore ranges of H₂ ortho-para ratio (OPR $_{H_{2}}$ $_{\mathrm{H}_2}$ ) which impact significantly two reactions: namely, H₂ D⁺ + o-H₂ → H $_{3}^{+}$ $_3^+$ + HD (Pagani et al. 1992) where the energy endothermicity is reduced to 61.5 K (compared to 232K with p–H₂) and N⁺ + o-H₂ → NH⁺ + H which is slightly endothermic(~44 K), whereas N⁺ + p-H₂ → NH⁺ + H is more strongly endothermic(~170 K), as first emphasized by Le Bourlot (1991). For this reaction, we follow the prescription of Dislaire et al. (2012) which is derived from experimental results. We use higher values of the OPR in the warmer translucent medium (0.1−3) than in cold dense gas (10⁻⁴−10⁻¹). Finally, cold dense cores offer conditions where molecules can freeze out on dust grains, depleting the gas phase abundances of elements such as C and O. In our cold dense medium grid, we thus in addition explore depletion factors going from 1 (no depletion) to 10 (C elemental abundance 10 times lower than the reference values) witha constant C/O elemental ratio value of 0.6 (the elemental abundance of carbon is taken to be [C] = 1.32 × 10⁻⁴ when there is no depletion). Other parameters that might have an impact (although second order compared to the parameters considered here), such as variations of the metal elemental abundances or PAH abundance, were not considered in this study. The gas-phase elemental abundances for metals (relative to H) are taken to be [Fe] = 1.5 × 10⁻⁸, [Cl] = 1.8 × 10⁻⁷, [Si] = 8.2 × 10⁻⁷, [F] = 1.8 × 10⁻⁸, [Ar] = 3.29 × 10⁻⁶.

For each medium type, a set of 5000 models was run, sampling randomly and uniformly within the chosen parameter space. The adequacy of this number of models for our purpose is ascertained later when estimating the uncertainties on our results.

Given the variation by orders of magnitude both in the parameter values and the computed observables, we choose to work with the logarithm of all quantities. We sampled uniformly on the logarithm of each parameter within the ranges indicated above. The method described in Sect. 2 is applied on the logarithm of all quantities (i.e., training RF models using the logarithm of column density ratios or line intensity ratios to predict the logarithm of x(e⁻)). Representative error values such as the RMSE on logarithms are equivalent to representative error factors on the actual quantity. These corresponding error factors are given in parenthesis in the result tables of the following sections.

To get rid of possible instrumental, calibration and other source geometry effects, we choose to work only on ratios of observable quantities (either column density ratios or line intensity ratios). In the following, we use the term tracers for ratios of observable quantities.

Among all the species computed in our chemical model, we made a selection of species that are detected in the radio observations of the ORION-B project and potentially linked to the ionization fraction. Our search for the best tracers is made among the ratios of these selected species. For the translucent medium condition, we selected ¹³CO, C¹⁸O, HCO⁺, HCN, HNC, CN, C₂H, CS, SO, H₂CS, HCS⁺, CF⁺. The search for a best ratio was thus done among 66 possible column density ratios (and 66 line intensity ratios). For dense cold medium conditions, we considered the same selection with the addition of DCO⁺ and N₂ H⁺. We thus had 91 possible column density ratios (and 91 line intensity ratios) in this case. For line intensities, the exact transition and frequency considered for each species are listed in Table 2. In the RADEX computations of line intensities, we account for collisional excitation with electrons (using the ionization fraction computed by the chemical model) for species for which collisional data with electrons are available in RADEX (HCO⁺, HCN and C₂H). We note that for CN, excitation by electrons was not included as in the current version of RADEX, the collisional data that includes the hyperfine structure of CN does not include collisions with electrons. We chose to privilege the fact of accounting for the hyperfine structure here.

Table 1

Range of physical parameters explored for each of our two classes of medium: gas density n_H, gas temperature T_gas, incident FUVradiation field intensity G₀, line-of-sight visual extinction A_V, cosmic-ray ionization rate ζ, H₂ ortho-to-pararatio OPR $_{H_{2}}$ $_{\mathrm{H}_2}$ , depletion factor and sulfur gas-phase elemental abundance [S].

Table 2

Shorthand names, full quantum number designation, and frequency of the molecular lines we considered.

4 Tracers rankings

We applied the method described in Sect. 2 to the two chemical model grids (translucent medium conditions and cold dense medium conditions) presented in Sect. 3 in order to obtain a ranking of the selected potential tracers according totheir usefulness for predicting the ionization fraction.

4.1 Translucent medium

Figure 1 presents the predictive power (estimated as the R² of a RF fit) of each tracer for the best 20 tracers. The left panel shows the result when taking column density ratios as observable quantities, and the right panel the results when considering line intensity ratios. We see that the ranking is similar in both cases, suggesting that excitation and radiative transfer only have a moderate effect on the relationship between these tracers and the ionization fraction. A more complete ranking (covering all tracers having R² > 0.5) is given in Table B.1 (due to their sizes, the results tables of this section and of Sect. 5 are placed in Appendix B)³ .

In both cases, about ten different ratios are found to be each able to explain more than 80% of the ionization fraction variance (R² > 0.8). We emphasize that this means that an accurate prediction of the ionization fraction is possible from each of these tracers despite not knowing thevalues of the 7 physical and chemical parameters that have been varied in our model grid (cf. Table 1). The R² values are slightly lower when using line intensity ratios rather than column densities ratios, indicating that excitation and radiative transfer effects tend to increase the degeneracy between the ionization fraction and other unknown parameters, but this effect remains moderate. To illustrate the performance of the tracers found with this ranking, we show on the left panel of Fig. 2 the ionization fraction versus the best ranked column density ratio (C₂ H/HCN) in our grid of models for translucent medium conditions (blue symbols, with contours in shades of blue indicating iso-PDF contours encompassing 25, 50, and 75% of the distribution) and the prediction of the fitted RF model (solid red line), which is found in this case to explain 95.7% of the ionization fraction variance in our grid. The remaining scatter around the relationship represents the effect of ignoring all other parameters (gas density, temperature, UV field, H₂ OPR,...). The best ranked line intensity ratio (C₂H (1–0) / HCN (1–0)) is similarly shown on the right panel of Fig. 2 with the corresponding fitted RF model (solid red line).

In translucent gas, the ionization is still dominated by the effect of external FUV photons ionizing carbon (and to a lesser extent sulfur and chlorine), and is slowly decreasing as the total extinction increases. In the conditions covered by our translucent grid, we find x(e⁻) ranging from 2 × 10⁻⁴ to 2 × 10⁻⁷. C₂ H, which we find in several of the best ratios, is known to be enhanced in FUV illuminated environments (Pety et al. 2005, 2017; Cuadrado et al. 2015; Guzmán et al. 2015; Gratier et al. 2017): as explained in Beuther et al. (2008), C₂ H traces the amount of carbon not locked into CO, and is thus sensitive to the FUV flux through CO photodissociation and the presence of C and C⁺ at a significant abundance level. In our translucent medium model grid, we indeed find C⁺ to be the main charge carrier and thus to very strongly correlate with x(e⁻). H⁺ and H $_{3}^{+}$ $_3^+$ may also contribute to the electronic fraction in environments where the cosmic ionization reaches values above 10⁻¹⁶ s⁻¹ (Le Petit et al. 2016). However, C⁺, an open shell ion, is chemically reactive with various molecules, except H₂ ⁴, and is at the origin of a complex chemistry with insertion of carbon atoms. C⁺ itself is not straightforwardly detectable as its fine structure transition at 158 μm requires spaceborne or airborne observations. But we may expect that molecules involving C⁺ in the initial chemical steps allow to probe the electronic fraction. Our finding of ratios involving C₂ H (relative to e.g., HCN, HNC or CN) as good proxies of the electronic fraction is a natural consequence of the relevance of C⁺ as one of the main positive charge carriers. The initial step of C₂H formation involves indeed the C⁺ + CH → C $_{2}^{+}$ $_2^+$ + H reaction, followed by subsequent reactions with H₂ up to C₂ H $_{2}^{+}$ $_2^+$ , which recombines to form C₂H. Molecules such as HCN, HNC, or CO and its isotopologues, on the other hand, are saturated stable molecules which scale with column density. As a result, ratios such as C₂H/HCN, whose transitions are easily detectable, offer a convenient diagnostic tool of the electronic fraction in translucent medium. The electronic fraction is then, as shown in Fig. 2 (left panel), an increasing function of the C₂ H/HCN ratio.

CF⁺ is another proxy of C⁺, as describedin Neufeld et al. (2006); Guzmán et al. (2012), and ratios involving this ion are also found here to be good tracers of the ionization fraction. However, this ion is relatively scarce since it involves fluorine, which has a low relative abundance to H₂ and has only been detected in PDR environments so far (detectability issues are investigated in Sect. 6.2).

In order to estimate the reliability of our results and determine if our 5000-model grid is sufficient to explore the chosen parameter space for our purpose, we compute errorbars on the predictive power estimate (the R² of the RF fit). To do so, we use 10-fold cross validation: the model grid is randomly split into 10 parts, and for each of these parts, a RF model is trained on the other 9 parts and tested on the remaining part which it has not seen during training. From these 10 estimates of the R² (all made on samples unseen during training), a reliable estimate of the predictive power on unseen data is made, as well as an estimate of the standard error on this predictive power. The corresponding error bars are also shown in Fig. 1. For most of the points, the errorbars are smaller than the marker, and the inset on the left panel presents a zoom on the first five ratios, showing the magnitude of the errorbars. This shows that the uncertainty induced by the finite size of our model grid in negligibly small and that our 5000-model grid is sufficient for our purpose. However, this conclusion should be taken with some caution as ithas been shown that there exists no unbiased estimator of the variance of a cross-validation estimate (Bengio & Grandvalet 2004) and that a naive estimation of this variance tends to underestimate it by a factor of up to 4 (Varoquaux 2018).

Fig. 1

Ranking of column density ratios (left) or line intensity ratios (right) of observable tracers by order of the predictive power for predicting the ionization fraction (measured by the R² coefficient), in the case of translucent medium conditions (showing only the first 20). Errorbars of the R² estimates are computed by cross-validation (see text for explanations). Inset in the left panel: zoom on the first five ratios in order to make the magnitude of the errorbars visible.

Fig. 2

Ionization fraction versus the best column density ratio, C₂H/HCN (left panel), and the best line intensity ratio, C₂H (1–0) / HCN (1–0) (right panel) for tracing the ionization fraction in translucent medium conditions. The model grid is shown as a scatter plot, with the central crowded regions replaced by PDF isocontours containing 25, 50, and 75% of the points. Superimposed are the RF model (red line), the analytical fit (solid black line, presented in Sect. 5), the analytical fit of the 1σ uncertainty (dashed black lines, presented in Sect. 5), and the bounds of the validity range of the analytical fit (vertical lines, presented in Sect. 5). The quality estimates of the two models are indicated on the figure.

Fig. 3

Ranking of column density ratios (left) or line intensity ratios (right) of observable tracers by order of the predictive power for predicting the ionization fraction (measured by the R² coefficient), in the case of dense cold medium conditions (showing only the first 20). Errorbars of the R² estimates are computed by cross-validation (see text for explanations).

4.2 Cold dense medium

Figure 3 presents the ranking of the best tracers in cold dense conditions (cf. Table 1), for both column density ratios (left panel) and line intensity ratios (right panel). Error bars computed by the cross validation procedure described above are also shown, confirming that the size of our model grid is sufficient for estimating the quality of the fits based on each tracer. We see that the R² values of the best tracers are slightly lower than in the translucent medium case, indicating slightly stronger degeneracies with unknown parameters in this case (note that depletion was varied in this cold dense medium, in addition to the parameters varied in the translucent medium grid). However, we still find several tracers explaining more than 80% of the variance in x(e⁻). The best column density ratio is here found to be CN/N₂H⁺. The cold dense environments are essentially ionized through cosmic rays and secondary UV photons induced by cosmic rays. Electrons are primarily produced by cosmic ray ionization of H₂ and destroyed in the efficient dissociative recombination reactions of the various molecular ions. He⁺ ions, produced by cosmic ray ionization of He, are also particularly efficient in ionizing the molecular reservoirs, CO, HCN, N₂, H₂ O. This contributes to forming atomic ions, in addition to the H $_{3}^{+}$ $_3^+$ molecular ion resulting from H₂ ionization and the other stable molecular ions resulting from proton transfer of H $_{3}^{+}$ $_3^+$ with stable molecules giving ions such as H₂ D⁺, HCO⁺, H₃ O⁺, or N₂ H⁺. The ionization carriers are then shared amongst several different species, going from the simple atomic ions that do not react with H₂ (i.e., C⁺, S⁺, H⁺) and closed shell molecular ions such as H $_{3}^{+}$ $_3^+$ , HCO⁺, H₃ O⁺, and N₂ H⁺. Molecular ions are principally destroyed by dissociative recombination reactions whereas atomic ions rather react with the present neutral molecules since radiative recombination is not efficient. One can thus expect that molecular ions are inversely proportional to the electron abundances, as seen with the CN/N₂H⁺ ratio which is found to increase monotonically with x(e⁻) (cf. Fig. 4, left panel).

As in the translucent case, we find slightly lower R² values for the best line intensity ratios than for the best column density ratio. However, we find a few ratios that have better scores as intensity ratios than as column density ratios. In particular, while the ¹³CO/HCO⁺ column density ratio is found to be a poor predictor of the ionization fraction, the ¹³CO (1–0) / HCO⁺ (1–0) line intensity ratio appears as one of the best tracers. Contrary to the previous cases, this is entirely an excitation effect. The abundance ratio of ¹³CO/HCO⁺ is found to be mostly uncorrelated with x(e⁻) and mostly constant in the cold dense medium model grid (ratio of about 10³ with a typical scatter of a factor of 2–3). However, HCO⁺ and ¹³CO have strongly different critical densities: ~2 × 10⁵ cm⁻³ for HCO⁺ (in cold dense medium conditions, x(e⁻) is too low for electron collisions to play a significant role) in comparison to ~ 2 × 10³ cm⁻³ for ¹³CO. In the range of densities considered in the cold dense medium grid (10³−10⁶ cm⁻³), ¹³CO excitation isthus mostly at local thermodynamic equilibrium (LTE) and its emissivity per molecule is thus constant with gas density, while HCO⁺ is transitioning from the sub-thermally excited regime to the LTE regime, and its emissivity per molecule thus increases with density. The ¹³CO (1–0)/HCO⁺ (1–0) ratio thus decreases with gas density. On the other hand, we find the gas density to be very strongly anti-correlated with x(e⁻) in these conditions (with a typical scatter of a factor of ~3), as cosmic ray ionization is the dominant source of ionization here and the recombination rates per ion scale with the gas density. These two effects combine to give a ¹³CO (1–0) / HCO⁺ (1–0) ratio that increases with x(e⁻) with a relatively tight correlation (cf. Fig. C.4, top right panel).

Figure 4 shows the relation between x(e⁻) and the best column density ratio, CN/N₂H (left panel), and the best line intensity ratio, CF⁺ (1–0) / DCO⁺ (1–0) (right panel). We see a larger scatter than in Fig. 2, but a clear relationship is still found.

We note that the classical DCO⁺/HCO⁺ ratio does not appear among the best tracers found here for cold dense medium conditions. This point is discussed in Sect. 6.1.

Fig. 4

Same as Fig. 2 for cold dense medium conditions.

5 Analytical fit formulas

If possible, we recommend using the RF models described in the previous section when attempting to estimate x(e⁻) from one of the best tracers listed above. However, the provided datafiles are dependent on a specific implementation of Random Forests (the scikit-learn module for Python, Pedregosa et al. 2011). For a simpler and more persistent solution (independent of any external software), we provide in this section simple analytical fit formulae for the best tracers found in Sect. 4. While the RF models are flexible enough to make the method described in Sect. 2 generally applicable to any model grid and any physical quantity we want to find tracers of, the analytical fits provided here use formulas that have been specifically chosen for the application presented here (finding predictors of the ionization fraction from our chemical model grid). There is no guarantee that these same formulae would perform adequately to find analytical fits with other model grids and/or another quantity to predict.

5.1 Prediction formulae

We use simple polynomial formulae (working as before on the logarithm of both the observable ratios and the ionization fraction) to fit the non-linear relationships between the best tracers found in Sect. 4 and x(e⁻). This is applied to all tracers which where found to have R² > 0.5 in the previous RF analysis.

In the cold dense medium conditions, we thus use a simple polynomial of order 5: $f^{dense} (x) = a_{0} + a_{1} x + a_{2} x^{2} + a_{3} x^{3} + a_{4} x^{4} + a_{5} x^{5},$ $\begin{equation*} f^{\mathrm{dense}}(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + a_5 x^5,\end{equation*}$ (2)

where x is the log₁₀ of the column density ratio or line intensity ratio from which we want to predict log ₁₀(x(e⁻)), f^dense(x) is our fitting function to $\log_{10} (x (e^{-}))$ $\log_{10}\left(x(\mathrm{e}^-)\right)$ in cold dense gas conditions⁵, and the parameters a₀ to a₅ are our fit parameters. A fit is made for each of the tracers that had R² > 0.5 (more than half of the variance explained) in the rankings of Sect. 4. The corresponding fit coefficient values for each columndensity ratio are given in Table B.7 and the coefficient values for line intensity ratios are listed in Table B.8.

In the translucent medium conditions, x(e⁻) naturally reaches a plateau at the fractional abundance of carbon (1.32 × 10⁻⁴ in our undepleted models). We thus use a modified formula combining a polynomial of order 5 and a saturation: $f^{translucent} (x) = f_{max} - \log_{e} (1 + e^{- (a_{0} + a_{1} x + a_{2} x^{2} + a_{3} x^{3} + a_{4} x^{4} + a_{5} x^{5})}) .$ $\begin{equation*} f^{\mathrm{translucent}}(x) = f_{\mathrm{max}} - \log_e \left(1 + \textrm{e}^{-(a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + a_5 x^5)} \right) .\end{equation*}$ (3)

The fit parameters here are f_max and a₀ to a₅ . As in the cold dense medium case, f^translucent(x) is our fitting function to $\log_{10} (x (e^{-}))$ $\log_{10}\left(x(\mathrm{e}^-)\right)$ in translucent gas conditions. The corresponding fit coefficient values for each column density ratio are listed in Table B.5 and the coefficient values for line intensity ratios are listed in Table B.6.

The order of the polynomial in these functions is chosen so that further increasing it yields only marginal increase in R² (estimated by cross-validation to avoid overfitting). The R² values found for the best tracers in each case are close to the values initially found with the RF models, indicating that our analytical fits are not significantly worse than the RF models, at least for the high-R² tracers. As an example, Figs. 2 and 4 also show the analytical fit (solid red line) in comparison to the RF model (solid black line). For a few of the lower R² tracers, the analytical fits perform significantly worse as can be seen on the tables of Appendix B by comparing the R² values of the RF models with the R² values of the analytical fits.

Fig. 5

Illustration of the two sources of uncertainties for the best column density ratio for tracing the ionization fraction in dense cold medium, CN/N₂H⁺. Left panel: analytical fit (solid black line), the standard deviation around this curve corresponding to the uncertainties on the fit coefficients (thin dashed red lines representing the 3σ level), and the bounds of the validity range defined in the text (vertical blue lines). Right panel: analytical fit (solid black line) and two estimates of the standard deviation corresponding to the residual scatter of the data points around the curve: the red dotted line presents a moving-average estimate of the local standard deviation, and the dashed black lines shows our analytical fit of the residual standard deviation. On both panels, the chemical model grid is shown as a scatter plot, with the central crowded regions replaced by PDF isocontours containing 25, 50, and 75% of the points.

5.2 Uncertainty formulae

Finally, a key point is to estimate uncertainties on our prediction of the ionization fraction. Here, we distinguish two sources of uncertainty.

Our best analytical fit is determined on a finite sample of models, so that the fit coefficients are only estimates of the theoretical best fit coefficients. Estimating again these fit coefficients from a different sample of models (drawn from the same distribution) would result in slightly different values, and these uncertainties on the fit coefficients in turn imply an uncertainty on the ionization fraction value predicted by the fit formula at any given value of the observable quantity (intensity ratio or column density ratio). In order to estimate this uncertainty on the predicted value, we proceed by bootstrapping: we repeat the fitting procedure on 100 bootstrapped samples (drawn from the original model grid) and report the standard deviation of the value of the fit function as its uncertainty.

The left panel of Fig. 5 shows the corresponding uncertainty (showing the 3σ level in dashed curves around the main prediction curve) in the case of the best column density ratio in cold dense gas conditions (CN/N₂H⁺). We name this uncertainty the fit coefficients uncertainty to distinguish it from the second form of uncertainty below. We define the validity domain of our fit as the range of values of the observable ratio where our best fit is sufficiently constrained by our finite grid of models for this fit coefficient uncertainty to be negligible. In practice, we define it as the range of ratios where the above-defined uncertainty remains lower than 2% of the predicted value. In the following, we will thus assume this uncertainty to be negligible inside of this validity domain and focus on the second form of uncertainties. The limits of the corresponding validity range are also shown on the left panel of Fig. 5 as blue vertical lines. Due to the tendency of high order polynomial fits to diverge quickly outside of the domain of the fitted dataset, the analytical fit formulae should not be used outside of the validity range defined here.

The second form of uncertainties comes from the unobservable parameters (density, temperature,...), which induce a scatter in the relationship between any of the line intensity ratios or column density ratios and the ionization fraction. Inside of the validity domain of the fit, this scatter in the residuals is much larger than the fit coefficients uncertainty, as can be seen in the left panel of Fig. 5, and is thus the dominant source of uncertainties. When applying the fit formula to real observations, in the ideal case where the chemical model used in this paper would be a perfect model of reality, we would expect the value predicted by the fit formula to commit a mean squared error equal to the variance of this scatter of the residuals, and this error represents our lack of information on the underlying physical conditions. As can be seen in the previous figures however, this residual scatter varies as a function of the observable predictor (the vertical scatter is smaller in some regions of the plot than in others). This residual variance as a function of the predictor can be estimated by a moving average method (shown in the right panel of Fig. 5, we used a window of 0.1 dex). In order to provide asimpler way of estimating this residual variance function, we fitted the simple analytical function $g (x) = | b_{0} + b_{1} x + b_{2} x^{2} + b_{3} x^{3} + b_{4} x^{4} + b_{5} x^{5} |$ $\begin{equation*} g(x) = \left| b_0 + b_1 x + b_2 x^2 + b_3 x^3 + b_4 x^4 + b_5 x^5 \right|\end{equation*}$ (4)

to the squared residuals, thus providing a fit to the local variance. We note that this function provides a fit to the variance on the prediction of log₁₀(x(e⁻)), and, as previously, x is the log₁₀ of the observable ratio. The absolute value in this function was chosen because the residual variance is by definition a positive quantity. The best fit coefficients to this residual variance function are given in Tables B.5–B.8. An example of the corresponding residual standard deviation function is shown in the right panel of Fig. 5, where we compare its moving-average estimate (dotted red curves around the main prediction curve) to its squared polynomial fit (dashed black curve), showing in both cases the 3σ level, for the best column density ratio in the cold dense medium case (CN/N₂H).

The resulting fits are also presented for the best ratio in the different cases in Figs. 2 and 4, showing the RF fit, the analytical fit and the scatter fit. Similar figures for each of the 6 best tracers for both the translucent medium and the cold dense medium conditions, and for both column density ratios and line intensities ratios, are presented in Appendix C.

6 Discussions

6.1 Traditional ionization tracers

One of the most commonly used ionization fraction tracers is DCO⁺/HCO⁺ (Guelin et al. 1977, 1982; Dalgarno & Lepp 1984; Caselli et al. 1998). It is used mainly in cold dense cores, where the temperatureis low enough to allow sufficient deuterium enrichment, and the column density is large enough to make DCO⁺ detectable. However, our results show that even in the cold dense gas regime, and despite including DCO⁺ in our list of potential tracers, the DCO⁺/HCO⁺ column density ratio is not ranked among the best tracers of the ionization fraction. In fact, it is ranked as the 38^th best tracer in dense cold gas conditions, with a R² of 0.57 only(cf. Table B.7). This ratio is often determined using observations of H¹³CO⁺ as H¹² CO⁺ can be optically thick in high-column-density lines of sight. We leave this aspect aside in this discussion by showing that the column density ratio DCO⁺/HCO⁺ itself (however it might be determined observationally) suffers from several limitations as a tracer of the ionization fraction.

The relationship between the DCO⁺/HCO⁺ abundance ratio and the ionization fraction found in our (cold and dense) model grid is shown in Fig. 6. The blue distribution shows the results of the model grid and the black line our best RF model, with a R² of 0.57. Wesee that two main problems limit the usability of DCO⁺/HCO⁺.

First, a large scatter of the ionization values (by up to 3 orders of magnitude) is present at all values of the ratio, despite having a significant fraction of the distribution tightly located around a clear relationship (the outermost blue contour encloses 75% of the distribution). As a result, the best fit RF model tries to make a compromise between the tightly located part of the distribution, and the scattered points at lower ionization fraction values. We found most of this scatter to be related to variations in the ortho-to-para ratio of H₂ (OPR $_{H_{2}}$ $_{\mathrm{H_2}}$ ), an unobservable parameter whose value remains difficult to estimate in observations of dense cold cores. For instance, selecting only the models having OPR $_{H_{2}} < 2.5 \times 10^{- 3}$ $_{\mathrm{H_2}} < 2.5\times 10^{-3}$ , we see in Fig. 6 (red dashed contour) that we retain only the unscattered part of the distribution. Thus the difficulty of obtaining reliable estimates of the OPR of H₂ in dense cores limits the use of DCO⁺/HCO⁺ as a tracer of the ionization fraction.

Second, even when selecting the low OPR $_{H_{2}}$ $_{\mathrm{H_2}}$ models, we see that the relationship presents a very steep slope at high ratio values (low ionization fraction values). As a result, for DCO⁺/HCO⁺ ratios above 10^−1.6, a range of ionization fractions of more than two orders of magnitude is possible. The ratio would then only be usable for lower ratios, equivalent to ionization fractions larger than 10^−6.5. Even if the model grid presented no scatter at all, a steep slope implies that small observational uncertainties on the ratio will induce large uncertainties on the predicted ionization fraction. Thus, relationships with steep slopes are of limited use.

These different effects combine to make the DCO⁺/HCO⁺ ratio a poor predictor of the ionization fraction, compared to the best ranked tracers found by our method.

Fig. 6

DCO⁺ over HCO⁺ column density ratio in our grid of dense cold gas models (blue points and contours), shown as a scatter plot, with the central crowded regions replaced by PDF isocontours containing 25, 50, and 75% of the points. Our best fit Random Forest model is shown as a black line. The red dashed contours shows the distributions of the models with OPR $_{H_{2}} < 2.5 \times 10^{- 3}$ $_{\mathrm{H}_2} < 2.5\times10^{-3}$ .

6.2 Detectability constraints

So far, detectability constraints have been ignored. Predictive power has been tested from noiseless values of column density or line intensity ratios. However, the different lines considered here have widely different brightnesses and thus differ in terms of detectability with current instruments. We explore here the effect of various noise levels on the predictive power of the different line ratios. We will consider two noise setups corresponding to two observation scenarios: the case of one constant noise level for all lines, corresponding to the typical case of a line survey where faint lines are detected with a lower signal-to-noise ratio (S/N) than bright ones, and the case of a fixed S/N for all lines, corresponding to observations being designed to reach a set S/N for a few desired lines. These two cases correspond to the two opposite extremes of possible observation scenarios and will give us a general overview of the possible impact of noise on the performance of the tracers. In both cases, synthetic noise is added to the line integrated intensity values and we consider the S/N on the integrated intensity and not the peak intensity. All noise and S/N values quoted are for individual lines, not for line ratios.

In bothcases, in order to measure how the predictive power (measured as the R² value) is affected by noise, we perform a modified cross-validation. The model grid is randomly split in ten parts. For each of these tenths and for each line ratio:

1.
A RF model is trained from the other nine parts of the model grids, without any noise added.
2.
The trained RF is tested on the tenth under consideration, with added noise (either with a constant noise variance $σ_{noise}^{2}$ $\sigma^2_{\mathrm{noise}}$ for all models and all lines, or with a constant S/N for each model and line). Since the RF models take the log₁₀ of the intensity ratio as input and since the addition of noise can produce negative values, we only apply the RF model when both line integrated intensity values are above 1 σ. Otherwise, we do not use the RF model and simply take the average ionization fraction value of the grid as our prediction.

Finally, we take the average R² value obtained over the ten noisy tests as the predictive power of the tracer under noisy conditions. We thus avoid estimating R² from datapoints that have been seen during training, and we estimate the predictive power of a model trained on noiseless data when applied to noisy data (as would be the case when applying the results of this article to real observations).

The results for the translucent medium grid, for a few possible line ratios, are presented in Figs. 7 and 8. Figure 7 presents the scenario of a constant noise level σ_noise for all lines and all models, and shows how the R² of the prediction from different line ratios decreases when the noise level is increased. We show only the three best line ratios (according to the noiseless ranking of Table B.2): C₂ H (1–0)/HCN (1–0), C₂H (1–0)/¹³CO (1–0), and C₂H (1–0)/C¹⁸O (1–0), and the four tracers found to be the least sensitive to noise. We define the tracers least sensitive to noise as those having the highest σ_1∕2 among tracers with R_noiseless ≥ 0.7, thus giving a compromise between a good fit quality (high R_noiseless) and a slow decrease with increasing noise level (high σ_1∕2). The four best ratios found according to this definition and shown on the figure are ¹³CO (1–0)/C¹⁸O (1–0), C¹⁸O (1–0)/CF⁺ (1–0), ¹³CO (1–0)/CF⁺ (1–0), and HCN (1–0)/CF⁺ (1–0). We see that the best three tracers, all including C₂H, have their predictive power decreasing sharply at a relatively low noise level (their R² decreases by 50% at σ_noise ~ 5 × 10⁻³ K km s⁻¹). This is due to the relatively low brightness of the C₂H line in translucent conditions (median integrated intensity of ~7 × 10⁻³ K km s⁻¹ in our translucent medium grid). In comparison, some line ratios built from brighter lines, despite a lower R² on noiseless data, are found to perform better in the presence of noise: the R² for the ¹³CO (1–0)/C¹⁸O (1–0) ratio decreases by 50% at σ_noise ~ 4 × 10⁻² K km s⁻¹ (C¹⁸O (1–0) has a median integrated intensity of ~3 × 10⁻¹ K km s⁻¹ in this grid), the C¹⁸O (1–0)/CF⁺ (1–0) ratio has its R² decreased by 50% at σ_noise ~ 10⁻² K km s⁻¹ (CF⁺ (1–0) has a median integrated intensity of ~1.2 × 10⁻² K km s⁻¹ in this grid). We note, however, that ratios built from CF⁺ (1–0) actually only perform marginaly better than the best ratios involving C₂ H at high noise levels (see Fig. 7) due to the low brightness of CF⁺ (1–0). For a more exhaustive comparison of the line ratios, Table 3 gives for each line ratio the noise level σ_1∕2 at which the R² is decreased by half.

Figure 8 similarly shows the results for the scenario of a fixed S/N for all lines and all models, still in the case of translucent medium conditions. The variations of the R² of the prediction with the S/N are shown for the best seven tracers (according to the noiseless ranking of Table B.2). In this scenario, we find as expected that the predictive power drops at an S/N of order unity. The only exception is the ¹³CO (1–0)/C¹⁸O (1–0) ratio, which decreases slightly earlier. This is due to this ratio spanning a relatively small range of values in our grid of models (approximately one order of magnitude) while the other line ratios span ranges of four to six orders of magnitude. As the ionization fraction spans a range of values of 2.5 orders of magnitude in the grid, this implies that the relationship between the ionization fraction and the ¹³CO (1–0)/C¹⁸O (1–0) ratio has a much steeper slope than for the other line ratios. A steep slope then implies that small errors in the line ratios result in large errors in the predicted ionization fraction. As a result, the ¹³CO (1–0)/C¹⁸O (1–0) ratio requires significantly higher S/N than the other line ratios. Similarly to σ_1∕2, we define S/N_1∕2 as the S/N value at which R² is half of its value on noiseless data. The S/N_1∕2 values for all line ratios are also given in Table 3.

The results for dense and cold medium conditions are similarly shown in Figs. 9 and 10 and Table 4. In the case of a constant noise level, we find that the R² drop generally occurs at higher noise levels than in the translucent medium case, as expected because the lines are brighter in the cold dense medium due to higher column densities. Figure 9 shows the decrease of R² with σ_noise for the best three dense gas tracers (according to the noiseless ranking of Table B.2), and for the four ratios least sensitive two noise (according to the same definition as previously). We note that ¹³CO (1–0)/HCO⁺ (1–0), the second best ratio on noiseless data (therefore already shown on the figure), has also the highest σ_1∕2 value among ratios with R_noiseless ≥ 0.7, so that we show the next four ratios least sensitive to noise on the figure. These four ratios are found to be C¹⁸O (1–0)/HCO⁺ (1–0), HCO⁺ (1–0)/CN (1–0), SO (3-2)/CN (1–0), and HCN (1–0)/CN (1–0). The σ_1∕2 values for all line ratios are listed in Table 4. When considering a constant S/N scenario for the dense cold medium case, we again find that the R² drop occursat an S/N value of order unity, as shown in Fig. 10. The S/N_1∕2 values for all line ratios are listed in Table 4.

When interpreting the results of this study, one must keep in mind that the decrease in R² in our two scenarios (constant σ_noise or constant S/N) does not come from the same effect. In the constant σ_noise scenario, at a given σ_noise level one part of the grid has undetected or very low S/N values for the ratio under consideration, while the other part has high S/N. The R² decrease is indicative of the growing fraction of the parameter space with undetected/low S/N. As a result, even when finding a low overall R², there might remain a fraction of the parameter space were the predictor remains very good (usually, high column density, high volume density, high temperature,...), which we did not characterize here. In the constant S/N scenario on the other hand, the S/N is by design constant over all models, independent of the physical parameters. The decrease in overall R² is then more representative of the decrease in predictive power at any point in the parameter space. As a result, a ratio with a low σ_1∕2 value might still be usable in real observations with a higher noise level but would be restricted to high brightness regions of GMCs (in the corresponding lines), while a ratio cannot be used at all in observations with S/N significantly lower than its S/N_1∕2 value.

Fig. 7

Evolution of the R² performance of the Random Forest predictors (for translucent gas) when adding noise of constant variance $σ_{noise}^{2}$ $\sigma_{\mathrm{noise}}^2$ to all line intensities, as a function of the noise level σ_noise, for the 3 best tracers for translucent medium (C₂H(1–0)/HCN(1–0), C₂ H(1–0)/¹³CO(1–0), and C₂ H(1–0)/C¹⁸O(1–0)) and for the four tracers least affected by the noise (¹³CO(1–0)/C¹⁸O(1–0), C¹⁸O(1–0)/CF⁺(1–0), ¹³CO(1–0)/CF⁺(1–0), and HCN/CF⁺(1–0)).

Fig. 8

Evolution of the R² performance of the Random Forest predictors (for translucent gas) when adding noise of constant signal-to-noise ratio to the line intensities, as a function of the S/N, for the 7 best tracers for translucent medium.

Table 3

Impact of adding noise to the line intensities on the predictive power R² (for translucent medium), measured by the noise level σ_1∕2 (in a constantnoise level situation) and the signal-to-noise ratio S/N_1∕2 (in a constantS/N situation) for which R² reaches one half of its value in the absence of noise $R_{(noiseless)}^{2}$ $R^2_{\mathrm{(noiseless)}}$ .

Fig. 9

Evolution of the R² performance of the Random Forest predictors (for cold dense gas) when adding noise of constant variance $σ_{noise}^{2}$ $\sigma_{\mathrm{noise}}^2$ to the line intensities, as a function of the noise level σ_noise, for the 3 best tracers for cold dense medium (first three curves) and for the four tracers least affected by the noise (next four curves).

Fig. 10

Evolution of the R² performanceof the Random Forest predictors (for cold dense gas) when adding noise of constant signal-to-noise ratio to the line intensities, as a function of the S/N, for the 7 best tracers for cold dense medium.

Table 4

Impact of adding noise to the line intensities on the predictive power R² (for cold dense medium), measured by the noise level σ_1∕2 (in a constantnoise level situation) and the signal-to-noise ratio S/N_1∕2 for which the R² reaches one half of its value in the absence of noise $R_{(noiseless)}^{2}$ $R^2_{\mathrm{(noiseless)}}$ .

6.3 Chemical model reliability

Independently of the statistical method that we present in this article, the results obtained rely on the chosen chemical model and its limitations. Previous works on ionization fraction tracers have mostly used stationary-state results of single-zone chemical models (although some works have used time-dependent models, e.g., Maret & Bergin 2007; Shingledecker et al. 2016). As these previous studies have been mostly limited to deuteration-based tracers, the focus of the present article has been on highlighting the non-deuteration-based tracers that can be found for the ionization fraction from similar chemical models. We discuss here the impacts of our model’s limitations on our results.

Our single-zone model cannot include a detailed treatment of UV radiative transfer through the cloud. While most photodissociation rates can be simply estimated based on an assumed optical depth of dust protecting each model from UV photons (parameter A_V in our models), species such as H₂, CO and its isotopologues can be protected from photodissociation by self- or mutual-shielding. While self-shielding of H₂ and mutual shielding of CO by H₂ are included in our model using approximations (Draine & Bertoldi 1996; Heays et al. 2017), mutual shielding of ¹³CO and C¹⁸O by H₂ and ¹² CO are not included. As a result, in our translucent medium models where photodissociation by external UV photons still plays an important role, we expect the abundances of the rarer CO isotopologues to be less reliable than the other species. We note that the observations of ¹³CO and C¹⁸O in our ORION-B dataset indeed present specificities (systematic excitation temperature differences with ¹² CO, Bron et al. 2018; Roueff et al. 2020) that remain unexplained even by more complex 1D PDR models.

The only explicit surface reaction in our chemical model is H₂ formation, however we account for the freeze-out of CO through our depletion parameter. The list of species that we consider as possible tracers has been restricted to species that are not strongly affected by surface chemistry beyond the depletion effect. Extension to more complex molecules would require a chemical model including a more complete treatment of surface chemistry.

Anothersource of uncertainties comes from the experimental or theoretical estimates of the reaction rate coefficients used in our chemical network. While we did not directly perform a sensitivity analysis of the reaction rate coefficients, our model grids include temperature variations which subsequently impact the reactions rate coefficients through their temperature dependence. This is especially true for the important dissociative recombination reactions which display significative temperature dependences. Temperature is then considered as an unobserved parameter when searching for good tracers of x(e⁻). The discovery of strong relationships between some of the ratios and x(e⁻), despite temperature variations in the grid, thus indicates that these relationships are to some extent robust to the reaction rates. A careful analysis of the magnitude and correlations of model uncertainties resulting from reaction rate uncertainties in the chemical network would deserve a separate study.

Time-dependent effects are expected to be more important in cold dense medium conditions than in translucent medium conditions as photochemistry has shorter timescales in the latter case. Time-dependent effects that result from the time evolution of some physical parameter (e.g., density and temperature during the contraction of a core) while the chemistry follows in a quasi-stationary way are in part accounted for in our models by exploring a large range of the various physical parameters that can be subject to variations (see Table 1). In addition, the progressive freeze out of CO on dust grains is accounted for by considering a range of depletion factors for carbon and oxygen. Similarly, the slow evolution of OPR $_{H_{2}}$ $_{\mathrm{H_2}}$ in cold gas can keep parts of the chemistry (deuterium chemistry, nitrogen chemistry) in a time-dependent evolution that depends mainly on the evolution of OPR $_{H_{2}}$ $_{\mathrm{H_2}}$ . This is also in part accounted for by exploring a large range of OPR $_{H_{2}}$ $_{\mathrm{H_2}}$ values in our models. The tracer-finding method presented in this article will be applied to time-dependent chemical models in a future study.

The final and most important limitation of our model is that it does not include a spatial dimension: gas at a single value of density, temperature, etc. is assumed to be exposed to a given radiation field and protected by a given column density. As a result the possibility that different emission lines originate in separate layers of gas on the line of sight is completely neglected. Variations of the physical conditions along the line of sight can indeed have an important impact on the observables (e.g., Levrier et al. 2012). This limitation could be important both in the translucent medium where physical and chemical gradients are present due to the progressive extinction of the external UV field, and in dense cores with density and temperature gradients. As a result some caution must be exercised when choosing the line ratios to consider, ratios involving two species expected to emit in completely different regions should be avoided. For instance, the C₂ H (1–0)/N₂H⁺ (1–0) that is found as the fourth best line ratio in dense cold medium should be avoided: C₂ H is known to be a tracer of UV illumination and thus more likely to be emitted at the external surface of a given clump, while N₂ H⁺ is abundant in the inner regions of the core where CO is already significantly depleted. An application of our method to 1D PDR models to better account for this effect in translucent medium conditions will be carried out in a future work.

6.4 Parameter PDF in the model grids

In the model grids used in this study, we sampled uniformly (in logarithm) for the values of the unobservable physical conditions (gas density, temperature, UV field, etc.) in an hypercube defined by lower and upper bounds for each of the parameters. The results of our ranking method will depend on this assumed PDF (probability density function) for the physical conditions in the interstellar medium (ISM). We made here the choice of making the minimal assumption: knowing only reasonable lower and upper bounds on each of the parameters, the uniform distribution is the maximum entropy PDF (i.e. the PDF that best represents our assumed state of knowledge). As a perspective, if more a priori knowledge is available, then more accurate assumptions for the PDF (in particular for the correlations between the different physical parameters) could reveal additional tracers.

We notethat this assumption of a uniform PDF over a maximum support (in the sense that any more accurate PDF would have almost all of its weight enclosed in this support) makes it likely that any tracer found to have a very good relationship with the ionization fraction would keep a strong relationship for more accurate PDF choices (if the relationship is strong over the full hypercube, it should with high likelihood stay strong on subregions of this hypercube). In this sense, we expect the tracers found here to remain reliable, but a more accurate PDF choice might strongly increase the performance of some tracers found here to perform poorly and thus reveal additional tracers of the ionization fraction. This argument remains however qualitative as pathological cases of PDF might be constructed that would radically change the rankings of the tracers.

As a result, the precise rankings presented in this article could slightly change, but we expect the good tracers highlighted by these ranking to be reliable for more realistic PDFs of the physical conditions in GMCs.

6.5 Finalrecommandation

Based on the limitations discussed above (detectability and model reliability), we recommend the use of the following integrated line intensity ratios to trace the ionization fraction.

In translucent medium conditions, we recommend the use C₂H (1–0)/HCN (1–0), C₂H (1–0)/HNC (1–0), or C₂H (1–0)/CN (1–0). If sensitivity is an issue, HCN (1–0)/CF⁺ (1–0) can sometimes perform as well as the previously listed ratios.
In cold dense gas conditions, we recommend the use of CF⁺ (1–0)/DCO⁺ (1–0), ¹³CO (1–0)/HCO⁺ (1–0) or CN (1–0)/N₂H⁺ (1–0) if detectability is not an issue for these species, and of ¹³CO (1–0)/HCO⁺ (1–0) or C¹⁸O (1–0)/ HCO⁺ (1–0) otherwise.

This list is of course not exhaustive, and other ratios can give satisfactory predictions (see Tables 3 and 4) if the species listed above are not available.

In translucent gas conditions, this recommendation is based on the following points. After eliminating rarer CO isotopologues based on our discussion of mutual-shielding effects on selective photodissociation in low/moderate A_V regions (cf. Sect. 6.3), and eliminating ratios involving sulfur species that were found to require unreasonably low noise levels of 10⁻⁴−10⁻⁵ K km s⁻¹, the three remaining best line intensity ratios are C₂H (1–0)/HCN (1–0) and C₂H (1–0)/HNC (1–0) and C₂H (1–0)/CN (1–0). If noise sensitivity is critical, the tracer found to have the best predictive power at high noise levels is found in Sect. 6.2to be HCN (1–0)/CF⁺ (1–0) but is negligibly better than the three previously mentioned at high noise levels.

In cold dense gas conditions, the three best ratios are CF⁺ (1–0)/DCO⁺ (1–0), ¹³CO (1–0)/HCO⁺, and CN (1–0)/N₂H⁺ (1–0). If noise sensitivity is critical, we found in Sect. 6.2 that the ratios with the best predictive power at high noise levels are ¹³CO (1–0)/HCO⁺ and C¹⁸O (1–0)/HCO⁺ (1–0).

7 Conclusions

In this paper, we present a general statistical method to find the best observable tracers of an unobservable parameter based on a grid of models spanning the range of possible values for all the unknown underlying physical parameters (e.g., gas density, temperature, depletion, etc.). Our method estimates the predictive power of each potential observable tracer by training a flexible, non-linear regression model (a Random Forest model, making no assumption on the non-linear shape of the relationship to be found) on the task of predicting the target quantity from each of the potential tracers. The fit quality on test data, measured as the R² coefficient by cross-validation and out-of-bag estimation, is used to rank the potential tracers by order of predictive power.

In the context of our recent studies of the Orion B GMC (Pety et al. 2017; Gratier et al. 2017; Orkisz et al. 2017, 2019; Bron et al. 2018), we have applied this method to the important astrophysical question of tracing the ionization fraction in the neutral ISM, with the goal of being able to probe its variations across a whole GMC, from its translucent enveloppe to its dense cores. We considered grids of single-zone, stationary state astrochemical models exploring wide ranges of values in gas density, temperature, external UV field, A_V on the line of sight, cosmic ray ionization rate, ortho-to-para ratio of H₂, depletion factor, and sulfur elemental abundance. For a finer exploration of the possible conditions, we considered two grids corresponding to translucent medium conditions and cold dense medium conditions respectively, based on the different types of environments found in the Orion B GMC (Pety et al. 2017; Bron et al. 2018).

We considered successively column density ratios and line intensity ratios as potential tracers, focusing on species observable in the band at 100 GHz of our observations of Orion B. We find that in both cases and in both types of physical conditions, multiple ratios allow accurate predictions of the ionization fraction, with R² > 0.8 (and up to 0.96). We investigated the impact of the noise level on the predictive capability of the different ratios After accounting for detectability and model reliability, we recommend:

for translucent medium conditions, C₂H (1–0)/HCN (1–0), C₂H (1–0)/HNC (1–0) or C₂H (1–0)/CN (1–0);
for cold dense medium conditions, CF⁺ (1–0)/DCO⁺ (1–0), ¹³CO (1–0)/HCO⁺ (1–0) or CN (1–0)/N₂H⁺ (1–0) at low enough noise level, or ¹³CO (1–0)/HCO⁺ (1–0) or C¹⁸O (1–0)/HCO⁺ (1–0) if sensitivity is an issue.

In order to simplify the use of these predictors, we constructed ad hoc analytical fits (using polynomials or saturated polynomials) of the relationship of each observable tracer to the ionization fraction. Contrary to the Random Forest models, the choice of the analytical form of these fits is specific to the types of relationships observed in this specific application (different analytical forms might be necessary for other applications). We also provide analytical formulae to estimate the uncertainty on any measurement of the ionization fraction from these tracers. These tracers will be used to study the ionization fraction in the Orion B molecular cloud in a second paper (Guzman et al., in prep.).

The method presented here is very general and could be easily applied to finding tracers of other related (cosmic ray ionization rate, absolute electron abundance) or unrelated (gas density, temperature, OPR $_{H_{2}}$ $_{\textrm{H}_2}$ ,...) unobservable quantities. This method can also be extended simply to simultaneously use pairs (or more) of line ratios (by training RF models on the possible combinations of line ratios), which would likely further increase the quality of the prediction.

Acknowledgements

We thank the anonymous referee for his comments that helped improve this article. We thank the CIAS for their hospitality during the many workshops devoted to the ORION-B project. This work was supported in part by the Programme National “Physique et Chimie du Milieu Interstellaire” (PCMI) of CNRS/INSU with INC/INP, co-funded by CEA and CNES. This project has received financial support from the CNRS through the MITI interdisciplinary programs. The authors also acknowledge funding by Paris Observatory through the AF Astrochimie program. J.R.G. thanks Spanish MICI for funding support under grant AYA2017-85111-P.

Appendix A Random Forest parameter optimization

The Random Forest contains a few tuning parameters for which a value needs to be chosen before training. In particular, we will consider only the two most important here: the number of trees in the forest N_trees, and the maximum depth allowed for each tree, d_max. Another usual parameter is not considered here: the number of predictors in the random subset considered when choosing along which axis to make a split. Indeed, we will only train RF models with a single predictor at a time, so that this number is necessarily 1.

Since optimizing the choice of these parameters on the full model grids (that will then be used for training) and for each potential tracer will lead both to an increased risk of overfitting and a heavy computational time cost, we opt for a very limited optimization, where a single set of parameter values is used for all tracers (i.e. for all RF models trained on a single ratio)and for all model grids. The selection of these parameter values is done with a simplified procedure:

1.
For each model grid, RF models are first trained on each ratio using default parameter values (default sklearn values, N_trees = 100 and d_max = ∞), and the OOB R² value is calculated for each ratio.
2.
For the two ratios having the best and worst R² in the previous step, RF models are trained for a grid of parameter values exploring N_trees = 50−800 and d_max = 1−12, and the OOB R² is again computed for RF models with each possible combination of parameter values (the OOB value is used to limit the risksof overfitting).
3.
As the R² maps obtained show very flat minima, we found a common set of parameter values assuring a R² within 0.01 of the best value in all cases. The parameter values found are N_trees = 400, and d_max = 4.

In order to illustrate this procedure, Fig. A.1 shows for instance the resulting R² maps as a function of N_trees and d_max for the preliminary best (top) and worst (bottom) tracers in the translucent model grid when using line integrated intensity ratios. The red contours delimit the region where R² is within 0.01 of its maximum value. We see that the variations with N_trees are limited to a decrease at low values. Any large enough value could thus be chosen but larger values will induce higher computation time cost, so that a minimal acceptable value had to be chosen. The variations with d_max show a clear maximum (although rather flat). The R² value decreases for decreasing values of d_max below the optimum as the RF model then becomes not flexible enough to capture the relationship between the predictor and the target variable. Above the optimum, R² decreases for increasing values of d_max as overfitting starts to arise. The RF model becomes too flexible and starts to learn noise artefacts (we recall that the “noise” here is induced by the random sampling of the unobservable physical parameters of the chemical model).

Fig. A.1

R² (OOB value) as a function of the tuning parameters of the Random Forest (N_trees and d_max), for thepreliminary best (top panel) and worst (bottom panel) ratios found though a preliminary estimate using defaults values of the parameters, for line integrated intensity ratios and translucent medium conditions. The red contours show the region where the R² value is within 0.01 of the maximum.

Appendix B Tables for tracers ranking, performance, and fit coefficients

B.1 Tracers ranking and performance

We provide here the ranking and performance of single ratio RF models, obtained following the method described in Sect. 2.

B.1.1 Translucent medium

Tables B.1 and B.2 present the ranking we obtain for translucent medium conditions, respectively for column density ratios and integrated line intensity ratios. See Sect. 2 for a description of the method used, and Sect. 4 for a discussion of these results. For each ratio, we list the performance of the corresponding RF model measured through the (cross-validated) R², the equivalent root mean square error on log₁₀(x(e⁻)) and the corresponding error factor on x(e⁻), the maximum absolute error on log₁₀(x(e⁻)) and the corresponding error factor on x(e⁻). We also list for comparison the R² value obtained with the analytical fit described in Sect. 5.

B.1.2 Cold dense medium

Tables B.3 and B.4 present the ranking we obtain for cold dense medium conditions, respectively for column density ratios and integrated line intensity ratios.

B.2 Analytical fit coefficients

We provide here the fit coefficients for the analytical fits described in Sect. 5.

Table B.1

Ranking of column density ratios according to their usefulness to predict the ionization fraction in translucent medium conditions (measured through the R² of a fitted Random Forest model).

B.2.1 Translucent medium

Tables B.5 and B.6 list the fit coefficients for translucent medium conditions, respectively for column density ratios and integrated line intensity ratios. See Sect. 5 for a description of the method used. For each ratio, ranked according the results of Sect. 4, we list the fit coefficients (corresponding to the fit formula given in Eq. (3)) for predicting log₁₀(x(e⁻)), the quality of this fit estimated as the (cross-validated) R², the root mean square error on log₁₀(x(e⁻)) and corresponding error factor on x(e⁻), the maximum absolute error on log₁₀(x(e⁻)) and corresponding error factor on x(e⁻), the fit coefficients (corresponding to the fit formula given in Eq. (4)) for estimating the uncertainty on the prediction, and the limits of the validity range of the fit (given as log₁₀ of the ratio values).

B.2.2 Cold dense medium

Tables B.7 and B.8 list the fit coefficients for cold dense medium conditions, respectively for column density ratios and integrated line intensity ratios. For each ratio, ranked according the results of Sect. 4, we list the fit coefficients (corresponding to the fit formula given in Eq. (2)) for predicting log₁₀(x(e⁻)), the quality of this fit estimated as the (cross-validated) R², the root mean square error on log₁₀(x(e⁻)) and corresponding error factor on x(e⁻), the maximum absolute error on log₁₀(x(e⁻)) and corresponding error factor on x(e⁻), the fit coefficients (corresponding to the fit formula given in Eq. (4)) for estimating the uncertainty on the prediction, and the limits of the validity range of the fit (given as log₁₀ of the ratio values.

Table B.2

Ranking of line intensity ratios according to their usefulness to predict the ionization fraction in translucent medium conditions (measured through the R² of a fitted Random Forest model).

Table B.3

Ranking of column density ratios according to their usefulness to predict the ionization fraction in dense cold medium conditions (measured through the R² of a fitted Random Forest model).

Table B.4

Ranking of line intensity ratios according to their usefulness to predict the ionization fraction in dense cold medium conditions (measured through the R² of a fitted Random Forest model).

Table B.5

Fit coefficients (for our main fit and scatter fit) and fit quality for column density ratios in translucent medium conditions.

Table B.6

Fit coefficients (for our main fit and scatter fit) and fit quality for line intensity ratios in translucent medium conditions.

Table B.7

Fit coefficients (for our main fit and scatter fit) and fit quality for column density ratios in cold dense medium conditions.

Table B.8

Fit coefficients (for our main fit and scatter fit) and fit quality for line intensity ratios in cold dense medium conditions.

Appendix C Analytical model visualization

We present in this section figures of the RF models and analytical fits for the best six tracers in each case.

C.1 Translucent medium

Figures C.1 presents scatter plots of the ionization fraction versus each of the best six column density ratios in translucent medium conditions. Superimposed are the RF model (red line), the analytical fit (solid black line), the uncertainty fit (dashed lines), and the validity range (blue vertical lines). Figure C.2 similarly shows scatter plots of the ionization fraction versus each of the best six integrated line intensity ratios in translucent medium conditions.

C.2 Cold dense medium

Figures C.3 and C.4 similarly present the scatter plots (for column density ratios and integrated line intensity ratios respectively) for the best six tracers in cold dense medium conditions.

Fig. C.1

Ionization fraction versus column density ratio for the best six ratios found in Sect. 4 for translucent medium conditions. The chemical model grid is shown as a scatter plot, with the central crowded regions replaced by PDF isocontours containing 25, 50, and 75% of the points. Superimposed are the RF model (red line), the analytical fit (solid black line), and the analytical fit of the 1σ uncertainty (dashed black lines). The quality estimates of the two models are indicated on the figure.

Fig. C.2

Ionization fraction versus integrated line intensity ratio for the best six ratios found in Sect. 4 for translucent medium conditions. The chemical model grid is shown as a scatter plot, with the central crowded regions replaced by PDF isocontours containing 25, 50, and 75% of the points. Superimposed are the RF model (red line), the analytical fit (solid black line), and the analytical fit of the 1σ uncertainty (dashed black lines). The quality estimates of the two models are indicated on the figure.

Fig. C.3

Ionization fraction versus column density ratio for the best six ratios found in Sect. 4 for cold dense medium conditions. The chemical model grid is shown as a scatter plot, with the central crowded regions replaced by PDF isocontours containing 25, 50, and 75% of the points. Superimposed are the RF model (red line), the analytical fit (solid black line), and the analytical fit of the 1σ uncertainty (dashed black lines). The quality estimates of the two models are indicated on the figure.

Fig. C.4

Ionization fraction versus integrated line intensity ratio for the best six ratios found in Sect. 4 for cold dense medium conditions. The chemical model grid is shown as a scatter plot, with the central crowded regions replaced by PDF isocontours containing 25, 50, and 75% of the points. Superimposed are the RF model (red line), the analytical fit (solid black line), and the analytical fit of the 1σ uncertainty (dashed black lines). The quality estimates of the two models are indicated in the figure.

References

Agúndez, M., & Wakelam, V. 2013, Chem. Rev., 113, 8710 [Google Scholar]
Balbus, S. A., & Hawley, J. F. 1991, ApJ, 376, 214 [NASA ADS] [CrossRef] [Google Scholar]
Barger, C. J., & Garrod, R. T. 2020, ApJ, 888, 38 [NASA ADS] [CrossRef] [Google Scholar]
Basu, S., & Mouschovias, T. C. 1994, ApJ, 432, 720 [NASA ADS] [CrossRef] [Google Scholar]
Bengio, Y., & Grandvalet, Y. 2004, J. Mach. Learn. Res., 5, 1089 [Google Scholar]
Bergin, E. A., Plume, R., Williams, J. P., & Myers, P. C. 1999, ApJ, 512, 724 [NASA ADS] [CrossRef] [Google Scholar]
Beuther, H., Semenov, D., Henning, T., & Linz, H. 2008, ApJ, 675, L33 [NASA ADS] [CrossRef] [Google Scholar]
Black, J. H., & van Dishoeck, E. F. 1991, ApJ, 369, L9 [NASA ADS] [CrossRef] [Google Scholar]
Breiman, L. 1996, Mach. Learn., 24, 123 [Google Scholar]
Breiman, L. 2001, Mach. Learn., 45, 5 [CrossRef] [Google Scholar]
Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. 1984, Classification and Regression Trees (New York: Chapman & Hall), 358 [Google Scholar]
Bron, E., Daudon, C., Pety, J., et al. 2018, A&A, 610, A12 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Caselli, P. 2002, Planet. Space Sci., 50, 1133 [NASA ADS] [CrossRef] [Google Scholar]
Caselli, P., Walmsley, C. M., Terzieva, R., & Herbst, E. 1998, ApJ, 499, 234 [NASA ADS] [CrossRef] [Google Scholar]
Caselli, P., Walmsley, C. M., Zucconi, A., et al. 2002, ApJ, 565, 344 [Google Scholar]
Cuadrado, S., Goicoechea, J. R., Pilleri, P., et al. 2015, A&A, 575, A82 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Cuadrado, S., Salas, P., Goicoechea, J. R., et al. 2019, A&A, 625, L3 [CrossRef] [EDP Sciences] [Google Scholar]
Dalgarno, A., & Lepp, S. 1984, ApJ, 287, L47 [NASA ADS] [CrossRef] [Google Scholar]
Dislaire, V., Hily-Blant, P., Faure, A., et al. 2012, A&A, 537, A20 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Draine, B. T. 2011, Physics of the Interstellar and Intergalactic Medium (Princeton: Princeton University Press) [Google Scholar]
Draine, B. T., & Bertoldi, F. 1996, ApJ, 468, 269 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Flower, D. R., Pineau Des Forêts, G., & Walmsley, C. M. 2007, A&A, 474, 923 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Fossé, D., Cernicharo, J., Gerin, M., & Cox, P. 2001, ApJ, 552, 168 [NASA ADS] [CrossRef] [Google Scholar]
Fuente, A., Cernicharo, J., Roueff, E., et al. 2016, A&A, 593, A94 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Goicoechea, J. R., Pety, J., Gerin, M., et al. 2006, A&A, 456, 565 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Goicoechea, J. R., Pety, J., Gerin, M., Hily-Blant, P., & Le Bourlot, J. 2009, A&A, 498, 771 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Goldsmith, P. F., & Kauffmann, J. 2017, ApJ, 841, 25 [NASA ADS] [CrossRef] [Google Scholar]
Gratier, P., Bron, E., Gerin, M., et al. 2017, A&A, 599, A100 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Gratier, P., Pety, J., Bron, E., et al. 2020, A&A, 645, A27 [CrossRef] [EDP Sciences] [Google Scholar]
Guelin, M., Langer, W. D., Snell, R. L., & Wootten, H. A. 1977, ApJ, 217, L165 [NASA ADS] [CrossRef] [Google Scholar]
Guelin, M., Langer, W. D., & Wilson, R. W. 1982, A&A, 107, 107 [NASA ADS] [Google Scholar]
Guzmán, V., Pety, J., Gratier, P., et al. 2012, A&A, 543, L1 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Guzmán, V. V., Pety, J., Goicoechea, J. R., et al. 2015, ApJ, 800, L33 [NASA ADS] [CrossRef] [Google Scholar]
Hastie, T., Tibshirani, R., & Friedman, J. 2001, The Elements of Statistical Learning, Springer Series in Statistics (New York: Springer New York Inc.) [Google Scholar]
Heays, A. N., Bosman, A. D., & van Dishoeck, E. F. 2017, A&A, 602, A105 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Herbst, E., & Klemperer, W. 1973, ApJ, 185, 505 [NASA ADS] [CrossRef] [Google Scholar]
Hollenbach, D. J., & Tielens, A. G. G. M. 1999, Rev. Mod. Phys., 71, 173 [Google Scholar]
Indriolo, N., & McCall, B. J. 2012, ApJ, 745, 91 [NASA ADS] [CrossRef] [Google Scholar]
Indriolo, N., Geballe, T. R., Oka, T., & McCall, B. J. 2007, ApJ, 671, 1736 [NASA ADS] [CrossRef] [Google Scholar]
Le Bourlot, J. 1991, A&A, 242, 235 [NASA ADS] [Google Scholar]
Le Petit, F., Nehmé, C., Le Bourlot, J., & Roueff, E. 2006, ApJS, 164, 506 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Le Petit, F., Ruaud, M., Bron, E., et al. 2016, A&A, 585, A105 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Lepp, S. 1992, IAU Symp., 150, 471 [Google Scholar]
Levrier, F., Le Petit, F., Hennebelle, P., et al. 2012, A&A, 544, A22 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Liszt, H. S. 2012, A&A, 538, A27 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Liszt, H. S., & Pety, J. 2016, ApJ, 823, 124 [NASA ADS] [CrossRef] [Google Scholar]
Maret, S., & Bergin, E. A. 2007, ApJ, 664, 956 [NASA ADS] [CrossRef] [Google Scholar]
McCall, B. J., Huneycutt, A. J., Saykally, R. J., et al. 2003, Nature, 422, 500 [NASA ADS] [CrossRef] [PubMed] [Google Scholar]
Mestel, L., & Spitzer, L., 1956, MNRAS, 116, 503 [NASA ADS] [CrossRef] [Google Scholar]
Miettinen, O., Hennemann, M., & Linz, H. 2011, A&A, 534, A134 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Mladenović, M., & Roueff, E. 2017, A&A, 605, A22 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Mouschovias, T. C. 1976, ApJ, 207, 141 [NASA ADS] [CrossRef] [Google Scholar]
Neufeld, D. A., Schilke, P., Menten, K. M., et al. 2006, A&A, 454, L37 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Oppenheimer, M., & Dalgarno, A. 1974, ApJ, 192, 29 [NASA ADS] [CrossRef] [Google Scholar]
Orkisz, J. H., Pety, J., Gerin, M., et al. 2017, A&A, 599, A99 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Orkisz, J. H., Peretto, N., Pety, J., et al. 2019, A&A, 624, A113 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Padovani, M., Galli, D., & Glassgold, A. E. 2009, A&A, 501, 619 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Pagani, L., Salez, M., & Wannier, P. G. 1992, A&A, 258, 479 [NASA ADS] [Google Scholar]
Pagani, L., Roueff, E., & Lesaffre, P. 2011, ApJ, 739, L35 [NASA ADS] [CrossRef] [Google Scholar]
Pedregosa, F., Varoquaux, G., Gramfort, A., et al. 2011, J. Mach. Learn. Res., 12, 2825 [Google Scholar]
Pety, J., Teyssier, D., Fossé, D., et al. 2005, A&A, 435, 885 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Pety, J., Guzmán, V. V., Orkisz, J. H., et al. 2017, A&A, 599, A98 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Roueff, E., Loison, J. C., & Hickson, K. M. 2015, A&A, 576, A99 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Roueff, A., Gerin, M., Gratier, P., et al. 2020, A&A, 645, A26 [Google Scholar]
Shingledecker, C. N., Bergner, J. B., Le Gal, R., et al. 2016, ApJ, 830, 151 [CrossRef] [Google Scholar]
van der Tak, F. F. S., Black, J. H., Schöier, F. L., Jansen, D. J., & van Dishoeck, E. F. 2007, A&A, 468, 627 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Varoquaux, G. 2018, NeuroImage, 180, 68 [CrossRef] [Google Scholar]
Williams, J. P., Bergin, E. A., Caselli, P., Myers, P. C., & Plume, R. 1998, ApJ, 503, 689 [NASA ADS] [CrossRef] [Google Scholar]

¹

Informations and data related to the ORION-B program can be found at http://www.iram.fr/~pety/ORION-B/

²

http://autorank.ism.obspm.fr

³

Datafiles containing the training RF models and tables of the rankings presented in this section are available at http://www.iram.fr/~pety/ORION-B/data.html.

⁴

Except in strong PDR environments, where a small fraction of vibrationally excited H₂ may overcome the endothermicity barrier. Strong PDR environments are not considered here.

⁵

We use the notation log ₁₀ for the logarithm in base 10, and log _e for the natural logarithm.

All Tables

Table 1

Range of physical parameters explored for each of our two classes of medium: gas density n_H, gas temperature T_gas, incident FUVradiation field intensity G₀, line-of-sight visual extinction A_V, cosmic-ray ionization rate ζ, H₂ ortho-to-pararatio OPR $_{H_{2}}$ $_{\mathrm{H}_2}$ , depletion factor and sulfur gas-phase elemental abundance [S].

In the text

Table 2

Shorthand names, full quantum number designation, and frequency of the molecular lines we considered.

In the text

Table 3

Impact of adding noise to the line intensities on the predictive power R² (for translucent medium), measured by the noise level σ_1∕2 (in a constantnoise level situation) and the signal-to-noise ratio S/N_1∕2 (in a constantS/N situation) for which R² reaches one half of its value in the absence of noise $R_{(noiseless)}^{2}$ $R^2_{\mathrm{(noiseless)}}$ .

In the text

Table 4

Impact of adding noise to the line intensities on the predictive power R² (for cold dense medium), measured by the noise level σ_1∕2 (in a constantnoise level situation) and the signal-to-noise ratio S/N_1∕2 for which the R² reaches one half of its value in the absence of noise $R_{(noiseless)}^{2}$ $R^2_{\mathrm{(noiseless)}}$ .

In the text

Table B.1

Ranking of column density ratios according to their usefulness to predict the ionization fraction in translucent medium conditions (measured through the R² of a fitted Random Forest model).

In the text

Table B.2

Ranking of line intensity ratios according to their usefulness to predict the ionization fraction in translucent medium conditions (measured through the R² of a fitted Random Forest model).

In the text

Table B.3

Ranking of column density ratios according to their usefulness to predict the ionization fraction in dense cold medium conditions (measured through the R² of a fitted Random Forest model).

In the text

Table B.4

Ranking of line intensity ratios according to their usefulness to predict the ionization fraction in dense cold medium conditions (measured through the R² of a fitted Random Forest model).

In the text

Table B.5

Fit coefficients (for our main fit and scatter fit) and fit quality for column density ratios in translucent medium conditions.

In the text

Table B.6

Fit coefficients (for our main fit and scatter fit) and fit quality for line intensity ratios in translucent medium conditions.

In the text

Table B.7

Fit coefficients (for our main fit and scatter fit) and fit quality for column density ratios in cold dense medium conditions.

In the text

Table B.8

Fit coefficients (for our main fit and scatter fit) and fit quality for line intensity ratios in cold dense medium conditions.

In the text

All Figures

Fig. 1

Ranking of column density ratios (left) or line intensity ratios (right) of observable tracers by order of the predictive power for predicting the ionization fraction (measured by the R² coefficient), in the case of translucent medium conditions (showing only the first 20). Errorbars of the R² estimates are computed by cross-validation (see text for explanations). Inset in the left panel: zoom on the first five ratios in order to make the magnitude of the errorbars visible.

In the text

Fig. 2

Ionization fraction versus the best column density ratio, C₂H/HCN (left panel), and the best line intensity ratio, C₂H (1–0) / HCN (1–0) (right panel) for tracing the ionization fraction in translucent medium conditions. The model grid is shown as a scatter plot, with the central crowded regions replaced by PDF isocontours containing 25, 50, and 75% of the points. Superimposed are the RF model (red line), the analytical fit (solid black line, presented in Sect. 5), the analytical fit of the 1σ uncertainty (dashed black lines, presented in Sect. 5), and the bounds of the validity range of the analytical fit (vertical lines, presented in Sect. 5). The quality estimates of the two models are indicated on the figure.

In the text

	Fig. 3 Ranking of column density ratios (left) or line intensity ratios (right) of observable tracers by order of the predictive power for predicting the ionization fraction (measured by the R² coefficient), in the case of dense cold medium conditions (showing only the first 20). Errorbars of the R² estimates are computed by cross-validation (see text for explanations).
In the text

	Fig. 4 Same as Fig. 2 for cold dense medium conditions.
In the text

Fig. 5

Illustration of the two sources of uncertainties for the best column density ratio for tracing the ionization fraction in dense cold medium, CN/N₂H⁺. Left panel: analytical fit (solid black line), the standard deviation around this curve corresponding to the uncertainties on the fit coefficients (thin dashed red lines representing the 3σ level), and the bounds of the validity range defined in the text (vertical blue lines). Right panel: analytical fit (solid black line) and two estimates of the standard deviation corresponding to the residual scatter of the data points around the curve: the red dotted line presents a moving-average estimate of the local standard deviation, and the dashed black lines shows our analytical fit of the residual standard deviation. On both panels, the chemical model grid is shown as a scatter plot, with the central crowded regions replaced by PDF isocontours containing 25, 50, and 75% of the points.

In the text

Fig. 6

DCO⁺ over HCO⁺ column density ratio in our grid of dense cold gas models (blue points and contours), shown as a scatter plot, with the central crowded regions replaced by PDF isocontours containing 25, 50, and 75% of the points. Our best fit Random Forest model is shown as a black line. The red dashed contours shows the distributions of the models with OPR $_{H_{2}} < 2.5 \times 10^{- 3}$ $_{\mathrm{H}_2} < 2.5\times10^{-3}$ .

In the text

Fig. 7

Evolution of the R² performance of the Random Forest predictors (for translucent gas) when adding noise of constant variance $σ_{noise}^{2}$ $\sigma_{\mathrm{noise}}^2$ to all line intensities, as a function of the noise level σ_noise, for the 3 best tracers for translucent medium (C₂H(1–0)/HCN(1–0), C₂ H(1–0)/¹³CO(1–0), and C₂ H(1–0)/C¹⁸O(1–0)) and for the four tracers least affected by the noise (¹³CO(1–0)/C¹⁸O(1–0), C¹⁸O(1–0)/CF⁺(1–0), ¹³CO(1–0)/CF⁺(1–0), and HCN/CF⁺(1–0)).

In the text

	Fig. 8 Evolution of the R² performance of the Random Forest predictors (for translucent gas) when adding noise of constant signal-to-noise ratio to the line intensities, as a function of the S/N, for the 7 best tracers for translucent medium.
In the text

	Fig. 9 Evolution of the R² performance of the Random Forest predictors (for cold dense gas) when adding noise of constant variance $σ_{noise}^{2}$ $\sigma_{\mathrm{noise}}^2$ to the line intensities, as a function of the noise level σ_noise, for the 3 best tracers for cold dense medium (first three curves) and for the four tracers least affected by the noise (next four curves).
In the text

	Fig. 10 Evolution of the R² performanceof the Random Forest predictors (for cold dense gas) when adding noise of constant signal-to-noise ratio to the line intensities, as a function of the S/N, for the 7 best tracers for cold dense medium.
In the text

Fig. A.1

R² (OOB value) as a function of the tuning parameters of the Random Forest (N_trees and d_max), for thepreliminary best (top panel) and worst (bottom panel) ratios found though a preliminary estimate using defaults values of the parameters, for line integrated intensity ratios and translucent medium conditions. The red contours show the region where the R² value is within 0.01 of the maximum.

In the text

Fig. C.1

Ionization fraction versus column density ratio for the best six ratios found in Sect. 4 for translucent medium conditions. The chemical model grid is shown as a scatter plot, with the central crowded regions replaced by PDF isocontours containing 25, 50, and 75% of the points. Superimposed are the RF model (red line), the analytical fit (solid black line), and the analytical fit of the 1σ uncertainty (dashed black lines). The quality estimates of the two models are indicated on the figure.

In the text

Fig. C.2

Ionization fraction versus integrated line intensity ratio for the best six ratios found in Sect. 4 for translucent medium conditions. The chemical model grid is shown as a scatter plot, with the central crowded regions replaced by PDF isocontours containing 25, 50, and 75% of the points. Superimposed are the RF model (red line), the analytical fit (solid black line), and the analytical fit of the 1σ uncertainty (dashed black lines). The quality estimates of the two models are indicated on the figure.

In the text

Fig. C.3

Ionization fraction versus column density ratio for the best six ratios found in Sect. 4 for cold dense medium conditions. The chemical model grid is shown as a scatter plot, with the central crowded regions replaced by PDF isocontours containing 25, 50, and 75% of the points. Superimposed are the RF model (red line), the analytical fit (solid black line), and the analytical fit of the 1σ uncertainty (dashed black lines). The quality estimates of the two models are indicated on the figure.

In the text

Fig. C.4

Ionization fraction versus integrated line intensity ratio for the best six ratios found in Sect. 4 for cold dense medium conditions. The chemical model grid is shown as a scatter plot, with the central crowded regions replaced by PDF isocontours containing 25, 50, and 75% of the points. Superimposed are the RF model (red line), the analytical fit (solid black line), and the analytical fit of the 1σ uncertainty (dashed black lines). The quality estimates of the two models are indicated in the figure.

In the text

Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.

Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.

Initial download of the metrics may take a while.

[1] Agúndez, M., & Wakelam, V. 2013, Chem. Rev., 113, 8710 [Google Scholar]

[2] Balbus, S. A., & Hawley, J. F. 1991, ApJ, 376, 214 [NASA ADS] [CrossRef] [Google Scholar]

[3] Barger, C. J., & Garrod, R. T. 2020, ApJ, 888, 38 [NASA ADS] [CrossRef] [Google Scholar]

[4] Basu, S., & Mouschovias, T. C. 1994, ApJ, 432, 720 [NASA ADS] [CrossRef] [Google Scholar]

[5] Bengio, Y., & Grandvalet, Y. 2004, J. Mach. Learn. Res., 5, 1089 [Google Scholar]

[6] Bergin, E. A., Plume, R., Williams, J. P., & Myers, P. C. 1999, ApJ, 512, 724 [NASA ADS] [CrossRef] [Google Scholar]

[7] Beuther, H., Semenov, D., Henning, T., & Linz, H. 2008, ApJ, 675, L33 [NASA ADS] [CrossRef] [Google Scholar]

[8] Black, J. H., & van Dishoeck, E. F. 1991, ApJ, 369, L9 [NASA ADS] [CrossRef] [Google Scholar]

[9] Breiman, L. 1996, Mach. Learn., 24, 123 [Google Scholar]

[10] Breiman, L. 2001, Mach. Learn., 45, 5 [CrossRef] [Google Scholar]

[11] Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. 1984, Classification and Regression Trees (New York: Chapman & Hall), 358 [Google Scholar]

[12] Bron, E., Daudon, C., Pety, J., et al. 2018, A&A, 610, A12 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[13] Caselli, P. 2002, Planet. Space Sci., 50, 1133 [NASA ADS] [CrossRef] [Google Scholar]

[14] Caselli, P., Walmsley, C. M., Terzieva, R., & Herbst, E. 1998, ApJ, 499, 234 [NASA ADS] [CrossRef] [Google Scholar]

[15] Caselli, P., Walmsley, C. M., Zucconi, A., et al. 2002, ApJ, 565, 344 [Google Scholar]

[16] Cuadrado, S., Goicoechea, J. R., Pilleri, P., et al. 2015, A&A, 575, A82 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[17] Cuadrado, S., Salas, P., Goicoechea, J. R., et al. 2019, A&A, 625, L3 [CrossRef] [EDP Sciences] [Google Scholar]

[18] Dalgarno, A., & Lepp, S. 1984, ApJ, 287, L47 [NASA ADS] [CrossRef] [Google Scholar]

[19] Dislaire, V., Hily-Blant, P., Faure, A., et al. 2012, A&A, 537, A20 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[20] Draine, B. T. 2011, Physics of the Interstellar and Intergalactic Medium (Princeton: Princeton University Press) [Google Scholar]

[21] Draine, B. T., & Bertoldi, F. 1996, ApJ, 468, 269 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[22] Flower, D. R., Pineau Des Forêts, G., & Walmsley, C. M. 2007, A&A, 474, 923 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[23] Fossé, D., Cernicharo, J., Gerin, M., & Cox, P. 2001, ApJ, 552, 168 [NASA ADS] [CrossRef] [Google Scholar]

[24] Fuente, A., Cernicharo, J., Roueff, E., et al. 2016, A&A, 593, A94 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[25] Goicoechea, J. R., Pety, J., Gerin, M., et al. 2006, A&A, 456, 565 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[26] Goicoechea, J. R., Pety, J., Gerin, M., Hily-Blant, P., & Le Bourlot, J. 2009, A&A, 498, 771 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[27] Goldsmith, P. F., & Kauffmann, J. 2017, ApJ, 841, 25 [NASA ADS] [CrossRef] [Google Scholar]

[28] Gratier, P., Bron, E., Gerin, M., et al. 2017, A&A, 599, A100 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[29] Gratier, P., Pety, J., Bron, E., et al. 2020, A&A, 645, A27 [CrossRef] [EDP Sciences] [Google Scholar]

[30] Guelin, M., Langer, W. D., Snell, R. L., & Wootten, H. A. 1977, ApJ, 217, L165 [NASA ADS] [CrossRef] [Google Scholar]

[31] Guelin, M., Langer, W. D., & Wilson, R. W. 1982, A&A, 107, 107 [NASA ADS] [Google Scholar]

[32] Guzmán, V., Pety, J., Gratier, P., et al. 2012, A&A, 543, L1 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[33] Guzmán, V. V., Pety, J., Goicoechea, J. R., et al. 2015, ApJ, 800, L33 [NASA ADS] [CrossRef] [Google Scholar]

[34] Hastie, T., Tibshirani, R., & Friedman, J. 2001, The Elements of Statistical Learning, Springer Series in Statistics (New York: Springer New York Inc.) [Google Scholar]

[35] Heays, A. N., Bosman, A. D., & van Dishoeck, E. F. 2017, A&A, 602, A105 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[36] Herbst, E., & Klemperer, W. 1973, ApJ, 185, 505 [NASA ADS] [CrossRef] [Google Scholar]

[37] Hollenbach, D. J., & Tielens, A. G. G. M. 1999, Rev. Mod. Phys., 71, 173 [Google Scholar]

[38] Indriolo, N., & McCall, B. J. 2012, ApJ, 745, 91 [NASA ADS] [CrossRef] [Google Scholar]

[39] Indriolo, N., Geballe, T. R., Oka, T., & McCall, B. J. 2007, ApJ, 671, 1736 [NASA ADS] [CrossRef] [Google Scholar]

[40] Le Bourlot, J. 1991, A&A, 242, 235 [NASA ADS] [Google Scholar]

[41] Le Petit, F., Nehmé, C., Le Bourlot, J., & Roueff, E. 2006, ApJS, 164, 506 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[42] Le Petit, F., Ruaud, M., Bron, E., et al. 2016, A&A, 585, A105 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[43] Lepp, S. 1992, IAU Symp., 150, 471 [Google Scholar]

[44] Levrier, F., Le Petit, F., Hennebelle, P., et al. 2012, A&A, 544, A22 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[45] Liszt, H. S. 2012, A&A, 538, A27 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[46] Liszt, H. S., & Pety, J. 2016, ApJ, 823, 124 [NASA ADS] [CrossRef] [Google Scholar]

[47] Maret, S., & Bergin, E. A. 2007, ApJ, 664, 956 [NASA ADS] [CrossRef] [Google Scholar]

[48] McCall, B. J., Huneycutt, A. J., Saykally, R. J., et al. 2003, Nature, 422, 500 [NASA ADS] [CrossRef] [PubMed] [Google Scholar]

[49] Mestel, L., & Spitzer, L., 1956, MNRAS, 116, 503 [NASA ADS] [CrossRef] [Google Scholar]

[50] Miettinen, O., Hennemann, M., & Linz, H. 2011, A&A, 534, A134 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[51] Mladenović, M., & Roueff, E. 2017, A&A, 605, A22 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[52] Mouschovias, T. C. 1976, ApJ, 207, 141 [NASA ADS] [CrossRef] [Google Scholar]

[53] Neufeld, D. A., Schilke, P., Menten, K. M., et al. 2006, A&A, 454, L37 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[54] Oppenheimer, M., & Dalgarno, A. 1974, ApJ, 192, 29 [NASA ADS] [CrossRef] [Google Scholar]

[55] Orkisz, J. H., Pety, J., Gerin, M., et al. 2017, A&A, 599, A99 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[56] Orkisz, J. H., Peretto, N., Pety, J., et al. 2019, A&A, 624, A113 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[57] Padovani, M., Galli, D., & Glassgold, A. E. 2009, A&A, 501, 619 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[58] Pagani, L., Salez, M., & Wannier, P. G. 1992, A&A, 258, 479 [NASA ADS] [Google Scholar]

[59] Pagani, L., Roueff, E., & Lesaffre, P. 2011, ApJ, 739, L35 [NASA ADS] [CrossRef] [Google Scholar]

[60] Pedregosa, F., Varoquaux, G., Gramfort, A., et al. 2011, J. Mach. Learn. Res., 12, 2825 [Google Scholar]

[61] Pety, J., Teyssier, D., Fossé, D., et al. 2005, A&A, 435, 885 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[62] Pety, J., Guzmán, V. V., Orkisz, J. H., et al. 2017, A&A, 599, A98 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[63] Roueff, E., Loison, J. C., & Hickson, K. M. 2015, A&A, 576, A99 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[64] Roueff, A., Gerin, M., Gratier, P., et al. 2020, A&A, 645, A26 [Google Scholar]

[65] Shingledecker, C. N., Bergner, J. B., Le Gal, R., et al. 2016, ApJ, 830, 151 [CrossRef] [Google Scholar]

[66] van der Tak, F. F. S., Black, J. H., Schöier, F. L., Jansen, D. J., & van Dishoeck, E. F. 2007, A&A, 468, 627 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[67] Varoquaux, G. 2018, NeuroImage, 180, 68 [CrossRef] [Google Scholar]

[68] Williams, J. P., Bergin, E. A., Caselli, P., Myers, P. C., & Plume, R. 1998, ApJ, 503, 689 [NASA ADS] [CrossRef] [Google Scholar]