Machine learning-accelerated chemistry modeling of protoplanetary disks

Grigorii V. Smirnov-Pinchukov; Tamara Molyarova; Dmitry A. Semenov; Vitaly V. Akimkin; Sierk van Terwisga; Riccardo Francheschi; Thomas Henning

doi:10.1051/0004-6361/202244691

Home

All issues

Volume 666 (October 2022)

A&A, 666 (2022) L8

Full HTML

Open Access

Issue		A&A Volume 666, October 2022


Article Number		L8
Number of page(s)		10
Section		Letters to the Editor
DOI		https://doi.org/10.1051/0004-6361/202244691
Published online		11 October 2022

A&A 666, L8 (2022)

Letter to the Editor

Machine learning-accelerated chemistry modeling of protoplanetary disks

Grigorii V. Smirnov-Pinchukov¹, Tamara Molyarova², Dmitry A. Semenov¹^,3, Vitaly V. Akimkin², Sierk van Terwisga¹, Riccardo Francheschi¹ and Thomas Henning¹

¹ Max Planck Institute for Astronomy, Königstuhl 17, 69117 Heidelberg, Germany
e-mail: smirnov@mpia.de
² Institute of Astronomy, Russian Academy of Sciences, Pyatnitskaya str. 48, Moscow 119017, Russia
³ Department of Chemistry, Ludwig Maximilian University, Butenandtstr. 5-13, 81377 Munich, Germany

Received: 5 August 2022
Accepted: 22 September 2022

Abstract

Aims. With the large amount of molecular emission data from (sub)millimeter observatories and incoming James Webb Space Telescope infrared spectroscopy, access to fast forward models of the chemical composition of protoplanetary disks is of paramount importance.

Methods. We used a thermo-chemical modeling code to generate a diverse population of protoplanetary disk models. We trained a K-nearest neighbors (KNN) regressor to instantly predict the chemistry of other disk models.

Results. We show that it is possible to accurately reproduce chemistry using just a small subset of physical conditions, thanks to correlations between the local physical conditions in adopted protoplanetary disk models. We discuss the uncertainties and limitations of this method.

Conclusions. The proposed method can be used for Bayesian fitting of the line emission data to retrieve disk properties from observations. We present a pipeline for reproducing the same approach on other disk chemical model sets.

Key words: astrochemistry / methods: numerical / protoplanetary disks / stars: pre-main sequence / ISM: molecules / submillimeter: planetary systems

© G. V. Smirnov-Pinchukov et al. 2022

Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This article is published in open access under the Subscribe-to-Open model.

Open Access funding provided by Max Planck Society.

1. Introduction

Time-dependent gas-grain chemical kinetics codes are widely used in the astrochemical modeling of the ISM, protoplanetary disks, and even exoplanetary atmospheres. A typical computational time to calculate the chemical evolution of an isolated volume over 10⁵ − 10⁶ years usually takes about 0.1–10 seconds on a single CPU core, depending on the complexity of the chemical network and variability of the physical conditions. More complex networks – that include deuterium or carbon isotopologues (Albertsson et al. 2013; Yu et al. 2016) or that separately treat reactions on dust surface and in the bulk of icy mantles (Vasyunin et al. 2009; Garrod 2013) – increase the number of reactions and slow down the calculations drastically. For a typical grid size of a 2D protoplanetary disk model of 100 × 100, a session of chemical kinetics modeling takes at least 15 minutes up to several days on a single CPU core. This is a reasonable timeframe for forward modeling but too long for any retrieval of disk physical parameters based on a fitting of molecular data. Recently, Keil et al. (2022) proposed a new method for deriving the physical parameters of uniform gas clouds using Markov chain Monte Carlo sampling with realistic chemical and radiative transfer modeling and successfully applied it to the L1544 data. However, protoplanetary disks exhibit more complex structures with strong gradients of physical conditions and another approach is thus needed.

With the recent large programs on protoplanetary disk chemistry at the Atacama Large Millimeter/submillimeter Array (ALMA, MAPS: Öberg et al. 2021; 2021.1.00128.L/AGE-PRO PI: Ke Zhang, in progress; see also Guzmán et al. 2021; Ilee et al. 2021) and Northern Extended Millimeter Array (NOEMA, L19ME/PRODIGE, PI: P. Caselli, Th. Henning., in progress, see Semenov et al., in prep.), as well as large, partially spatially unresolved surveys of circumstellar disk populations (ODISEA: Cieza et al. 2019; Ansdell et al. 2016), the amount of available data for analysis expands rapidly. In addition, high angular- and frequency-resolution observations of molecular emission lines in individual disks are now common (e.g., Pegues et al. 2020, 2021; Garufi et al. 2020). Analyses of such data can be vastly improved if all the lines are taken into account simultaneously (Fedele & Favre 2020; Holdship & Viti 2022). However, protoplanetary disk model retrieval in a Bayesian sense is still in its infancy.

Our goal is to build a function (an estimator) to return the reasonably accurate chemical composition using the smallest possible subset of local physical conditions characterizing the protoplanetary disk. Applications of machine learning techniques are emerging in all fields of astronomy and physics (e.g., Dieleman et al. 2015; Dunjko & Briegel 2018; Carleo et al. 2019; Ribas et al. 2020; Ardévol Martínez et al. 2022), providing a robust, human-independent, and flexible way to find dependencies and correlations within a data set. The approach is also being applied to astrochemistry: among others, Lee et al. (2021) searched for similarities between molecules to propose possible detectable molecules, Grassi et al. (2022) suggested a method to reduce the number of species for chemical kinetics, Holdship et al. (2021) explored a large parameter space to provide time-dependent chemistry, and, most recently, Villadsen et al. (2022) presented a way to predict binding energies of molecules on dust surfaces.

In this work, we present a solution to the problem of chemical model performance. We computed a grid of 540 thermo-chemical protoplanetary disk models, containing more than two million physical bins in various disk environments, but at fixed elemental and dust compositions and at the same age, which we describe in Sect. 2. We used a K-nearest neighbors (KNN) machine learning (ML) algorithm (Goldberger et al. 2005) to create a robust interpolation between local physical conditions in the disk and the abundances of molecular and atomic species, described in detail in Sect. 3. Once the estimator is trained, the chemistry can be predicted in milliseconds per disk model, making it much faster than the next bottleneck, namely, the line radiative transfer. In Sect. 4, we demonstrate the performance and limitations of the method on a small set of chemical species. In Sect. 5, we summarize our findings.

2. Thermo-chemical protoplanetary disk model grid

Chemical kinetics models, such as ALCHEMIC (Semenov et al. 2010), NAUTILUS (Ruaud et al. 2016), UCLCHEM (Holdship et al. 2017), and KROME (Grassi et al. 2014) can be very flexible and include dozens to tens of thousands of parameters. Some of these parameters describe local physical conditions, such as gas and dust density and temperature as well as the local ultraviolet (UV) radiation field. Others are global parameters, for instance, the ionization rate by radioactive nuclides or such details as the probability modificator of a gas particle sticking to a dust grain after a collision, which affect a large fraction of the considered chemical reaction network. The initial elemental or molecular composition of the matter make up another set of important input parameters. In addition, the chemical network itself contains an extensive data set of reaction rates, with only ∼30% of the rate values known to an adequate level of accuracy.

Classical chemical kinetics codes can utilize any combination of these parameters, but most parameters are fixed in real applications. Also, some parameters could correlate between various chemical calculations. For example, it is reasonable to expect that the low-density, high-temperature regions of protoplanetary disks lie in the disk atmosphere, where the chemistry is dominated by a limited set of photo- and gas-phase reactions. On the other hand, cold and dense disk regions are typical for the midplane, where photochemistry is much less important and gas-grain interactions and surface reactions play a major role. This simple observation has led to the idea of constructing simplified chemical models that allow for quick computations that are nonetheless feasible to establish the abundances of simple species such as CO, without the need for full chemical modeling (Williams & Best 2014). Unfortunately, such approaches cannot be easily generalized to predict disk chemical composition for a larger set of observed molecules or important coolants.

To create a reference data set of protoplanetary disks with known chemical structures, we used the ANDES astrochemical model of a 2D axisymmetric hydrostatic disk. It employs a chemical network based on the ALCHEMIC network (Semenov & Wiebe 2011), with deuterium-bearing molecules and deuterium fractionation included following Albertsson et al. (2013, 2014). The network describes 1247 species and 38 347 reactions, including gas-phase and surface two-body reactions, adsorption and reactive desorption, photoreactions, and ionization or dissociation by X-rays, cosmic rays, and radioactive nuclides (Akimkin et al. 2013; Molyarova et al. 2017, 2018). The rates of surface reactions are adjusted to mimic the chemical inactivity of the bulk icy mantles; they are multiplied by a factor equal to the fraction of the upper layers in the total number of surface particles. Following Eistrup et al. (2016), we adopted an icy molecular initial composition based on the abundances of prestellar cores (Öberg et al. 2011). We ran the time-dependent chemical evolution for the typically assumed age of 1 Myr (Willacy et al. 1998; Aikawa et al. 2002).

The disk physical structure in our models is defined through stellar mass, M_star, disk mass, M_disk, and the characteristic radius, R_c; these parameters define the distribution of density, temperature, and radiation field in the (R, z) plane. The stellar mass governs the stellar temperature and luminosity, which are calculated at the age of 1 Myr using the evolutionary models by Yorke & Bodenheimer (2008). The X-ray radiation field is calculated using Bruderer et al. (2009). The interstellar cosmic ray ionization rate was calculated according to Padovani et al. (2018). We create an ensemble of 540 models with different stellar mass M_star, disk mass, M_disk, critical radius, R_c, and stellar X-ray luminosity, L_X, to cover a wide range of physical conditions typical for protoplanetary disks: M_star = 0.2...1.0 M_⊙, M_disk = 0.1...10 % M_star, R_c = 20...170 au, and L_X = 10^29...31 erg s⁻¹. Each model includes 50 logarithmically spaced radial points in the range of R = 0.1...1000 au and 80 vertical points in the range of z/R = 0...1. The dust size distribution is described by a power law with a −3.5 exponent between 5 × 10⁻⁷ and 2.5 × 10⁻³ cm. The UV radiation field for photoreactions is calculated using dust opacities based on Draine & Lee (1984). An averaged grain size of 3.7 × 10⁻⁵ cm is adopted for surface reaction rates. The UV excess from accretion is defined as L_acc = 1.5 GṀ_accM_star/R_star. The effective temperature of the accretion region is assumed to be 10 000 K. A summary of these parameters is presented in Table 1. With 4000 spatial points in each model, we have 2 160 000 points in total, sampling the chemical model output in conditions typical of protoplanetary disks. The total computing resource usage for the data generation took about 1 core year.

Table 1.

ANDES input grid.

Table 2.

Initial chemical abundances.

3. KNN estimator training

First, we built a data frame using pandas (Reback et al. 2021), which contains a subset of local physical parameters and a set of selected chemical species. For demonstration purposes, we chose CO, HCO⁺, DCO⁺, and electrons, relevant to ionization studies (Smirnov-Pinchukov et al. 2020; Aikawa et al. 2021). We also demonstrate the possible applications to other species in Appendix A. We have selected observationally-relevant disk positions with gas number density above 10⁴ cm⁻³, resulting in 1 183 212 data points. The subsequent analysis was performed using the Scikit-learn python library (Pedregosa et al. 2011). The physical quantities (input features) and the chemical abundances normalized to the total number of H atoms (output features) were renormalized to a uniform distribution in order to make the parameter space more uniformly sampled (see the example in Fig. 1). We split the data into a training set (432 disks, 946 586 points) and a test set (108 disks, 236 626 points). All the points from a single disk model must appear either in the training or a test set to avoid overfitting. The algorithm should predict the new points based on points from the other disks rather than interpolating nearby points of the same disk.

Fig. 1.

Example of the re-normalization of log₁₀X_CO abundance prior to interpolation. The peak at −4 in the unnormalized data corresponds to the condition when all the available C and O form CO.

We used the KNeighborsRegressor estimator (Goldberger et al. 2005). The algorithm finds the k nearest data points (in the input feature space) and interpolates the output feature values between them. For robustness, we choose the median value of these data points. The value of k should be chosen based on the data. If k = 1, it is a classical “nearest” interpolation. This sort of interpolation is very sensitive to outliers (overfitting) which, in our case, can represent a rare combination of parameters or even a numerical failure of the original chemical kinetics solver within ANDES. If k is too large, then the local behavior of chemistry cannot be properly caught. In the worst case, where k equals the number of data points, the solution would simply represent their median value.

We used cross-validation to choose the value of k and ensure the quality of the interpolation. In this approach, the training set is divided into 10 parts (splits), with all points from each disk being in the same split. The estimator is trained on 9/10 of the training data set and its performance is benchmarked against the remaining part, using the square sum of errors metric in the renormalized space to quantify the quality of the fit. This is repeated for each split, and the average performance is estimated. Then the same procedure repeats for another value of k, and this way, the best-performing value of k is found. The estimator with the best k is fitted again afterward on the entire training set. The results can be saved as a python binary (“pickle”) file and used as a fast callable function in other applications.

4. Results and discussion

First, we chose the local gas density, dust temperature, and ionization rate (the minimum set of key physical parameters for chemistry) as input features. We provide an example of the performance of the fit in Fig. 2, using the gas-phase CO molecule. As seen in panel a, CO is abundant in dense regions with a temperature above 30 K. Typically, the CO snowline should be at around 20 K. However, in our modeling, CO is also absent in the gas at higher temperatures due to the chemical transformation into CO₂ on dust grain surfaces (Molyarova et al. 2017; Bosman et al. 2018). At lower densities, which correspond to the outer disk, photodesorption by the interstellar UV and ionizing radiation maintain some amount of CO in the gas phase, enough for self-shielding. Low-density and high-temperature areas belong to the disk atmosphere, where the UV-radiation destroys CO. The fit reproduces this general behavior, showing insignificant systematic error (bias, dex) on the panel b. The fit scatter (panel c) is especially low (< 0.5 dex) for the inner disk and midplane. Most of the disk CO gas is present in the inner disk and, hence, the fit reproduces the majority of the gas-phase CO in disks with a reasonable level of accuracy. Moreover, the fit correctly predicts low CO abundances outside the CO snowline and in the disk atmosphere. Significant scatter (above 1 dex) is present only at the radiation-sensitive transition zone between the atmosphere and the rest of the disk in the low-density area. The top-left corner (very high density, very low temperature) is not covered by the original data set, as regions with such conditions never appeared in the disk model grid.

Fig. 2.

Performance of ML-accelerated chemistry predictions for CO. (a): mean log₁₀ predicted relative abundance as a function of local temperature, gas density, and ionization rate. Darker areas correspond to larger relative (to H atoms) abundance. (b): median of the difference between the predicted values and test set data (bias, dex), in dex, as a function of temperature and density. Gray areas correspond to an unbiased fit. (c): the standard deviation between the predicted values and test set data, in dex (std, dex). (d): relative density (histogram) of species within the data points, with contours, which are also present on other panels. Various regions of the protoplanetary disk are described on panel (a). A detailed description of the processes leading to this figure is in the main text. Other molecules are shown in Fig. A.1.

Notably, we can reproduce deuterium fractionation. In Fig. A.1, we show other species included in the estimator’s output: HCO⁺, DCO⁺, and e⁻. While electron density is fitted almost perfectly, the HCO⁺ and DCO⁺ fits show ∼1 dex scatter in the transition zone between the inner disk and the photodissociation area, where just a small amount of gas is present. Overall, with just three inputs’ feature set, the estimator is able to predict the disk chemistry with a good accuracy below 0.5 dex for most of the parameter space.

The addition of more input parameters to the input features further increases the quality of the fit. In Fig. A.2, we show the model after adding local UV radiation intensity to the set of the input features. We can see a significantly lower scatter in the whole parameter space. Temperature and density remain the best predictors of the disk chemistry, explaining the major variations of ≫5 dex, with the local ionization rate contributing to ∼1 − 2 dex and ultraviolet field contributing to ∼1 dex in relevant disk regions. The impact of ionizing radiation is more important than the UV for the molecules, as they reside in the deeper layers of the disk, while X-ray and cosmic rays penetrate deeper towards the midplane. Even in the three-parameter fit, it is important to take the local UV radiation field into account for the data generation process. Nevertheless, as the UV field is correlated with the combination of other input parameters, it is not necessary to have it as an input feature for the KNN algorithm.

5. Conclusions

We applied, for the first time, a machine-learning estimator to physical-chemical protoplanetary disk models to predict chemical abundances much more quickly than traditional “full” chemical calculations. Our estimator uses a small and easily-calculated set of local physical parameters as input features: the dust temperature, density, and ionization rate, with the possible addition of a local UV radiation strength. We applied this method to a pre-computed database of 540 protoplanetary disks of various masses, sizes, and stellar properties, including X-ray luminosities. We studied the effectiveness and limitations of the method due to the small input feature set, demonstrating how the addition of a local UV field improves the accuracy for four species in the gas: CO, HCO⁺, DCO⁺, and e⁻.

This approach is general and can be applied not only to this set of species and ANDES thermo-chemical disk models, but also to other species and astrochemical applications. For such purposes, the outputs of other astrochemical models (e.g., Bruderer et al. 2009; Woitke et al. 2009) with relevant physical parameters and desired chemical species abundances can be processed with the same approach. We publish the ANDES-generated data and the Jupyter notebook to reproduce our results on GitHub¹. These results will be used in an upcoming series of papers to fit the molecular data obtained in the framework of the large observing program on NOEMA, L19ME (PI: Th. Henning). We use this method to rapidly calculate chemical composition for the Bayesian retrieval of disk physical parameters using CO isotopologues (Francheschi et al., in prep.) and a combined fit of CO and HCO⁺ isotopologues (Smirnov-Pinchukov et al., in prep.).

¹

https://github.com/SmirnGreg/diskchef_chemistry

Acknowledgments

The authors acknowledge the contribution of the Python open-source community for providing high-quality data analysis tools. All figures were created using matplotlib (Hunter 2007). G.S.P. thanks Morgan Fouesneau, Ivelina Momcheva, and Markus Schmalzl for discussions about machine learning at MPIA. T.M. and V.A. were supported by the grant 075-15-2020-780 (N13.1902.21.0039) of Ministry of Science and Higher Education of the Russian Federation. T.H. and D.S. acknowledge support from the European Research Council under the Horizon 2020 Framework Program via the ERC Advanced Grant Origins 83 24 28.

References

Aikawa, Y., van Zadelhoff, G. J., van Dishoeck, E. F., & Herbst, E. 2002, A&A, 386, 622 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Aikawa, Y., Cataldi, G., Yamato, Y., et al. 2021, ApJS, 257, 13 [NASA ADS] [CrossRef] [Google Scholar]
Akimkin, V., Zhukovska, S., Wiebe, D., et al. 2013, ApJ, 766, 8 [NASA ADS] [CrossRef] [Google Scholar]
Albertsson, T., Semenov, D. A., Vasyunin, A. I., Henning, T., & Herbst, E. 2013, ApJS, 207, 27 [CrossRef] [Google Scholar]
Albertsson, T., Semenov, D., & Henning, T. 2014, ApJ, 784, 39 [NASA ADS] [CrossRef] [Google Scholar]
Ansdell, M., Williams, J. P., van der Marel, N., et al. 2016, ApJ, 828, 46 [Google Scholar]
Ardévol Martínez, F., Min, M., Kamp, I., & Palmer, P. I. 2022, A&A, 662, A108 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Bosman, A. D., Walsh, C., & van Dishoeck, E. F. 2018, A&A, 618, A182 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Bruderer, S., Doty, S. D., & Benz, A. O. 2009, ApJS, 183, 179 [Google Scholar]
Carleo, G., Cirac, I., Cranmer, K., et al. 2019, Rev. Mod. Phys., 91, 045002 [NASA ADS] [CrossRef] [Google Scholar]
Cieza, L. A., Ruíz-Rodríguez, D., Hales, A., et al. 2019, MNRAS, 482, 698 [Google Scholar]
Dieleman, S., Willett, K. W., & Dambre, J. 2015, MNRAS, 450, 1441 [NASA ADS] [CrossRef] [Google Scholar]
Draine, B. T., & Lee, H. M. 1984, ApJ, 285, 89 [NASA ADS] [CrossRef] [Google Scholar]
Dunjko, V., & Briegel, H. J. 2018, Rep. Prog. Phys., 81, 074001 [NASA ADS] [CrossRef] [Google Scholar]
Eistrup, C., Walsh, C., & van Dishoeck, E. F. 2016, A&A, 595, A83 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Fedele, D., & Favre, C. 2020, A&A, 638, A110 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Garrod, R. T. 2013, ApJ, 765, 60 [Google Scholar]
Garufi, A., Podio, L., Codella, C., et al. 2020, A&A, 636, A65 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Goldberger, J., Hinton, G. E., Roweis, S., & Salakhutdinov, R. R. 2005, in Advances in Neural Information Processing Systems, eds. L. Saul, Y. Weiss, & L. Bottou (MIT Press), 17 [Google Scholar]
Grassi, T., Bovino, S., Schleicher, D. R. G., et al. 2014, MNRAS, 439, 2386 [Google Scholar]
Grassi, T., Nauman, F., Ramsey, J. P., et al. 2022, A&A, accepted, [arXiv:2104.09516] [Google Scholar]
Guzmán, V. V., Bergner, J. B., Law, C. J., et al. 2021, ApJS, 257, 6 [CrossRef] [Google Scholar]
Holdship, J., & Viti, S. 2022, A&A, 658, A103 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Holdship, J., Viti, S., Jiménez-Serra, I., Makrymallis, A., & Priestley, F. 2017, AJ, 154, 38 [NASA ADS] [CrossRef] [Google Scholar]
Holdship, J., Viti, S., Haworth, T. J., & Ilee, J. D. 2021, A&A, 653, A76 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Hunter, J. D. 2007, Comput. Sci. Eng., 9, 90 [NASA ADS] [CrossRef] [Google Scholar]
Ilee, J. D., Walsh, C., Booth, A. S., et al. 2021, ApJS, 257, 9 [NASA ADS] [CrossRef] [Google Scholar]
Keil, M., Viti, S., & Holdship, J. 2022, ApJ, 927, 203 [NASA ADS] [CrossRef] [Google Scholar]
Lee, K. L. K., Patterson, J., Burkhardt, A. M., et al. 2021, ApJ, 917, L6 [NASA ADS] [CrossRef] [Google Scholar]
Molyarova, T., Akimkin, V., Semenov, D., et al. 2017, ApJ, 849, 130 [Google Scholar]
Molyarova, T., Akimkin, V., Semenov, D., et al. 2018, ApJ, 866, 46 [NASA ADS] [CrossRef] [Google Scholar]
Öberg, K. I., Boogert, A. C. A., Pontoppidan, K. M., et al. 2011, ApJ, 740, 109 [Google Scholar]
Öberg, K. I., Guzmán, V. V., Walsh, C., et al. 2021, ApJS, 257, 1 [CrossRef] [Google Scholar]
Padovani, M., Ivlev, A. V., Galli, D., & Caselli, P. 2018, A&A, 614, A111 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Pedregosa, F., Varoquaux, G., Gramfort, A., et al. 2011, J. Mach. Learn. Res., 12, 2825 [Google Scholar]
Pegues, J., Öberg, K. I., Bergner, J. B., et al. 2020, ApJ, 890, 142 [Google Scholar]
Pegues, J., Öberg, K. I., Bergner, J. B., et al. 2021, ApJ, 911, 150 [NASA ADS] [CrossRef] [Google Scholar]
Reback, J., McKinney, W., jbrockmendel, et al. 2021, https://doi.org/10.5281/zenodo.4681666 [Google Scholar]
Ribas, Á., Espaillat, C. C., Macías, E., & Sarro, L. M. 2020, A&A, 642, A171 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Ruaud, M., Wakelam, V., & Hersant, F. 2016, MNRAS, 459, 3756 [Google Scholar]
Semenov, D., & Wiebe, D. 2011, ApJS, 196, 25 [Google Scholar]
Semenov, D., Hersant, F., Wakelam, V., et al. 2010, A&A, 522, A42 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Smirnov-Pinchukov, G. V., Semenov, D. A., Akimkin, V. V., & Henning, T. 2020, A&A, 644, A4 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Vasyunin, A. I., Semenov, D. A., Wiebe, D. S., & Henning, T. 2009, ApJ, 691, 1459 [Google Scholar]
Villadsen, T., Ligterink, N. F. W., & Andersen, M. 2022, A&A, 666, A45 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Willacy, K., Klahr, H. H., Millar, T. J., & Henning, T. 1998, A&A, 338, 995 [NASA ADS] [Google Scholar]
Williams, J. P., & Best, W. M. J. 2014, ApJ, 788, 59 [Google Scholar]
Woitke, P., Kamp, I., & Thi, W. F. 2009, A&A, 501, 383 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Yorke, H. W., & Bodenheimer, P. 2008, in Massive Star Formation: Observations Confront Theory, eds. H. Beuther, H. Linz, & T. Henning, ASP Conf. Ser., 387, 189 [Google Scholar]
Yu, M., Willacy, K., Dodson-Robinson, S.E., Turner, N. J., & Evans, N. J. I. 2016, ApJ, 822, 53 [NASA ADS] [CrossRef] [Google Scholar]

Appendix A: Additional figures

Fig. A.1 demonstrates the application of the method on HCO⁺, DCO⁺, and electrons for three input features: local temperature, gas density, ionization rate. On the Fig. A.2 we present the result for the same molecules and CO with the local UV radiation strength added to the features list. In addition, we used our study to demonstrate the summary of the same method application to a larger number of different species, with four input features in Fig. A.3.

Fig. A.1.

Performance of ML-accelerated chemistry predictions for HCO⁺, DCO⁺, and electrons.

Fig. A.2.

Effect of adding UV radiation strength to the set of input features. The panels show the performance of ML-accelerated chemistry predictions, as in Fig. 2. Adding UV radiation strength increases the accuracy of the fit, but only slightly in the areas of the parameter space dominated by the selected molecules. Depending on the molecular species and constraints on calculation time, it is not necessary to use UV as a parameter.

Fig. A.3.

Application of the same method to a broader set of species. Performance of ML-accelerated chemistry prediction, same as in Fig. 2, based on four input features.

Fig. A.3.

continued.

Fig. A.3.

continued.

All Tables

Table 1.

ANDES input grid.

In the text

Table 2.

Initial chemical abundances.

In the text

All Figures

	Fig. 1. Example of the re-normalization of log₁₀X_CO abundance prior to interpolation. The peak at −4 in the unnormalized data corresponds to the condition when all the available C and O form CO.
In the text

Fig. 2.

Performance of ML-accelerated chemistry predictions for CO. (a): mean log₁₀ predicted relative abundance as a function of local temperature, gas density, and ionization rate. Darker areas correspond to larger relative (to H atoms) abundance. (b): median of the difference between the predicted values and test set data (bias, dex), in dex, as a function of temperature and density. Gray areas correspond to an unbiased fit. (c): the standard deviation between the predicted values and test set data, in dex (std, dex). (d): relative density (histogram) of species within the data points, with contours, which are also present on other panels. Various regions of the protoplanetary disk are described on panel (a). A detailed description of the processes leading to this figure is in the main text. Other molecules are shown in Fig. A.1.

In the text

	Fig. A.1. Performance of ML-accelerated chemistry predictions for HCO⁺, DCO⁺, and electrons.
In the text

Fig. A.2.

Effect of adding UV radiation strength to the set of input features. The panels show the performance of ML-accelerated chemistry predictions, as in Fig. 2. Adding UV radiation strength increases the accuracy of the fit, but only slightly in the areas of the parameter space dominated by the selected molecules. Depending on the molecular species and constraints on calculation time, it is not necessary to use UV as a parameter.

In the text

	Fig. A.3. Application of the same method to a broader set of species. Performance of ML-accelerated chemistry prediction, same as in Fig. 2, based on four input features.
In the text

	Fig. A.3. continued.
In the text

	Fig. A.3. continued.
In the text

Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.

Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.

Initial download of the metrics may take a while.

[1] Aikawa, Y., van Zadelhoff, G. J., van Dishoeck, E. F., & Herbst, E. 2002, A&A, 386, 622 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[2] Aikawa, Y., Cataldi, G., Yamato, Y., et al. 2021, ApJS, 257, 13 [NASA ADS] [CrossRef] [Google Scholar]

[3] Akimkin, V., Zhukovska, S., Wiebe, D., et al. 2013, ApJ, 766, 8 [NASA ADS] [CrossRef] [Google Scholar]

[4] Albertsson, T., Semenov, D. A., Vasyunin, A. I., Henning, T., & Herbst, E. 2013, ApJS, 207, 27 [CrossRef] [Google Scholar]

[5] Albertsson, T., Semenov, D., & Henning, T. 2014, ApJ, 784, 39 [NASA ADS] [CrossRef] [Google Scholar]

[6] Ansdell, M., Williams, J. P., van der Marel, N., et al. 2016, ApJ, 828, 46 [Google Scholar]

[7] Ardévol Martínez, F., Min, M., Kamp, I., & Palmer, P. I. 2022, A&A, 662, A108 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[8] Bosman, A. D., Walsh, C., & van Dishoeck, E. F. 2018, A&A, 618, A182 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[9] Bruderer, S., Doty, S. D., & Benz, A. O. 2009, ApJS, 183, 179 [Google Scholar]

[10] Carleo, G., Cirac, I., Cranmer, K., et al. 2019, Rev. Mod. Phys., 91, 045002 [NASA ADS] [CrossRef] [Google Scholar]

[11] Cieza, L. A., Ruíz-Rodríguez, D., Hales, A., et al. 2019, MNRAS, 482, 698 [Google Scholar]

[12] Dieleman, S., Willett, K. W., & Dambre, J. 2015, MNRAS, 450, 1441 [NASA ADS] [CrossRef] [Google Scholar]

[13] Draine, B. T., & Lee, H. M. 1984, ApJ, 285, 89 [NASA ADS] [CrossRef] [Google Scholar]

[14] Dunjko, V., & Briegel, H. J. 2018, Rep. Prog. Phys., 81, 074001 [NASA ADS] [CrossRef] [Google Scholar]

[15] Eistrup, C., Walsh, C., & van Dishoeck, E. F. 2016, A&A, 595, A83 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[16] Fedele, D., & Favre, C. 2020, A&A, 638, A110 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[17] Garrod, R. T. 2013, ApJ, 765, 60 [Google Scholar]

[18] Garufi, A., Podio, L., Codella, C., et al. 2020, A&A, 636, A65 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[19] Goldberger, J., Hinton, G. E., Roweis, S., & Salakhutdinov, R. R. 2005, in Advances in Neural Information Processing Systems, eds. L. Saul, Y. Weiss, & L. Bottou (MIT Press), 17 [Google Scholar]

[20] Grassi, T., Bovino, S., Schleicher, D. R. G., et al. 2014, MNRAS, 439, 2386 [Google Scholar]

[21] Grassi, T., Nauman, F., Ramsey, J. P., et al. 2022, A&A, accepted, [arXiv:2104.09516] [Google Scholar]

[22] Guzmán, V. V., Bergner, J. B., Law, C. J., et al. 2021, ApJS, 257, 6 [CrossRef] [Google Scholar]

[23] Holdship, J., & Viti, S. 2022, A&A, 658, A103 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[24] Holdship, J., Viti, S., Jiménez-Serra, I., Makrymallis, A., & Priestley, F. 2017, AJ, 154, 38 [NASA ADS] [CrossRef] [Google Scholar]

[25] Holdship, J., Viti, S., Haworth, T. J., & Ilee, J. D. 2021, A&A, 653, A76 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[26] Hunter, J. D. 2007, Comput. Sci. Eng., 9, 90 [NASA ADS] [CrossRef] [Google Scholar]

[27] Ilee, J. D., Walsh, C., Booth, A. S., et al. 2021, ApJS, 257, 9 [NASA ADS] [CrossRef] [Google Scholar]

[28] Keil, M., Viti, S., & Holdship, J. 2022, ApJ, 927, 203 [NASA ADS] [CrossRef] [Google Scholar]

[29] Lee, K. L. K., Patterson, J., Burkhardt, A. M., et al. 2021, ApJ, 917, L6 [NASA ADS] [CrossRef] [Google Scholar]

[30] Molyarova, T., Akimkin, V., Semenov, D., et al. 2017, ApJ, 849, 130 [Google Scholar]

[31] Molyarova, T., Akimkin, V., Semenov, D., et al. 2018, ApJ, 866, 46 [NASA ADS] [CrossRef] [Google Scholar]

[32] Öberg, K. I., Boogert, A. C. A., Pontoppidan, K. M., et al. 2011, ApJ, 740, 109 [Google Scholar]

[33] Öberg, K. I., Guzmán, V. V., Walsh, C., et al. 2021, ApJS, 257, 1 [CrossRef] [Google Scholar]

[34] Padovani, M., Ivlev, A. V., Galli, D., & Caselli, P. 2018, A&A, 614, A111 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[35] Pedregosa, F., Varoquaux, G., Gramfort, A., et al. 2011, J. Mach. Learn. Res., 12, 2825 [Google Scholar]

[36] Pegues, J., Öberg, K. I., Bergner, J. B., et al. 2020, ApJ, 890, 142 [Google Scholar]

[37] Pegues, J., Öberg, K. I., Bergner, J. B., et al. 2021, ApJ, 911, 150 [NASA ADS] [CrossRef] [Google Scholar]

[38] Reback, J., McKinney, W., jbrockmendel, et al. 2021, https://doi.org/10.5281/zenodo.4681666 [Google Scholar]

[39] Ribas, Á., Espaillat, C. C., Macías, E., & Sarro, L. M. 2020, A&A, 642, A171 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[40] Ruaud, M., Wakelam, V., & Hersant, F. 2016, MNRAS, 459, 3756 [Google Scholar]

[41] Semenov, D., & Wiebe, D. 2011, ApJS, 196, 25 [Google Scholar]

[42] Semenov, D., Hersant, F., Wakelam, V., et al. 2010, A&A, 522, A42 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[43] Smirnov-Pinchukov, G. V., Semenov, D. A., Akimkin, V. V., & Henning, T. 2020, A&A, 644, A4 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[44] Vasyunin, A. I., Semenov, D. A., Wiebe, D. S., & Henning, T. 2009, ApJ, 691, 1459 [Google Scholar]

[45] Villadsen, T., Ligterink, N. F. W., & Andersen, M. 2022, A&A, 666, A45 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[46] Willacy, K., Klahr, H. H., Millar, T. J., & Henning, T. 1998, A&A, 338, 995 [NASA ADS] [Google Scholar]

[47] Williams, J. P., & Best, W. M. J. 2014, ApJ, 788, 59 [Google Scholar]

[48] Woitke, P., Kamp, I., & Thi, W. F. 2009, A&A, 501, 383 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[49] Yorke, H. W., & Bodenheimer, P. 2008, in Massive Star Formation: Observations Confront Theory, eds. H. Beuther, H. Linz, & T. Henning, ASP Conf. Ser., 387, 189 [Google Scholar]

[50] Yu, M., Willacy, K., Dodson-Robinson, S.E., Turner, N. J., & Evans, N. J. I. 2016, ApJ, 822, 53 [NASA ADS] [CrossRef] [Google Scholar]