ExoMDN: Rapid characterization of exoplanet interior structures with mixture density networks

Philipp Baumeister; Nicola Tosi

doi:10.1051/0004-6361/202346216

Volume 676 (August 2023)

A&A, 676 (2023) A106

Open Access

Issue: A&A Volume 676, August 2023
Article Number: A106
Number of page(s): 14
Section: Numerical methods and codes
DOI: https://doi.org/10.1051/0004-6361/202346216
Published online: 17 August 2023

ExoMDN: Rapid characterization of exoplanet interior structures with mixture density networks^★,^★★

Philipp Baumeister¹,² and Nicola Tosi¹

¹ Institute of Planetary Research, German Aerospace Center (DLR), Rutherfordstraße 2, 12489 Berlin, Germany
e-mail: philipp.baumeister@dlr.de
² Department of Astronomy and Astrophysics, Technische Universität Berlin, Hardenbergstraße 36, 10623 Berlin, Germany

Received: 22 February 2023
Accepted: 14 June 2023

Abstract

Aims. Characterizing the interior structure of exoplanets is essential for understanding their diversity, formation, and evolution. As the interior of exoplanets is inaccessible to observations, an inverse problem must be solved, where numerical structure models need to conform to observable parameters such as mass and radius. This is a highly degenerate problem whose solution often relies on computationally expensive and time-consuming inference methods such as Markov chain Monte Carlo.

Methods. We present ExoMDN, a machine-learning model for the interior characterization of exoplanets based on mixture density networks (MDN). The model is trained on a large dataset of more than 5.6 million synthetic planets below 25 Earth masses consisting of an iron core, a silicate mantle, a water and high-pressure ice layer, and a H/He atmosphere. We employ log-ratio transformations to convert the interior structure data into a form that the MDN can easily handle.

Results. Given mass, radius, and equilibrium temperature, we show that ExoMDN can deliver a full posterior distribution of mass fractions and thicknesses of each planetary layer in under a second on a standard Intel i5 CPU. Observational uncertainties can be easily accounted for through repeated predictions from within the uncertainties. We used ExoMDN to characterize the interiors of 22 confirmed exoplanets with mass and radius uncertainties below 10% and 5%, respectively, including the well-studied GJ 1214 b, GJ 486 b, and the TRAPPIST-1 planets. We discuss the inclusion of the fluid Love number k₂ as an additional (potential) observable, showing how it can significantly reduce the degeneracy of interior structures. Utilizing the fast predictions of ExoMDN, we show that measuring k₂ with an accuracy of 10% can constrain the thickness of core and mantle of an Earth analog to ≈13% of the true values.

Key words: planets and satellites: interiors / planets and satellites: composition / methods: numerical / methods: statistical

^★ Full Table A.1 is only available at the CDS via anonymous ftp to cdsarc.cds.unistra.fr (130.79.128.5) or via https://cdsarc.cds.unistra.fr/viz-bin/cat/J/A+A/676/A106

^★★ ExoMDN is freely accessible through the GitHub repository https://github.com/philippbaumeister/ExoMDN

© The Authors 2023

Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This article is published in open access under the Subscribe to Open model. Subscribe to A&A to support open access publication.

1 Introduction

In the past decade, the number of discovered exoplanets has been growing rapidly, with more than 5000 planets confirmed to date. Characterizing the interior structures of these planets, that is, the size and mass of their main compositional reservoirs, is critical to understanding the processes that govern their formation, evolution, and potential to support life (Spiegel et al. 2014; Van Hoolst et al. 2019). Numerical models are commonly used to compute interior structures that fit the observed mass and radius of the planet (e.g., Sotin et al. 2007; Valencia et al. 2007; Fortney et al. 2007; Wagner et al. 2011; Zeng & Sasselov 2013; Unterborn & Panero 2019; Baumeister et al. 2020; Huang et al. 2022). However, unlike planets in the Solar System for which a wealth of observational data is available ranging from geodetic observations to in situ seismic measurements, for exoplanets, mass and radius are often the only parameters that can be determined. As a result, the interior structure is highly degenerate, with many qualitatively different interior compositions that can match the observations equally well (Rogers & Seager 2010; Dorn et al. 2015, 2017b; Brugger et al. 2017). Probabilistic inference methods, such as Markov chain Monte Carlo (MCMC) sampling, are regularly utilized to obtain a comprehensive picture of possible planetary interiors, while also taking into account observational uncertainties (Rogers & Seager 2010; Dorn et al. 2015, 2017b; Dorn & Heng 2018; Acuña et al. 2021). Given prior estimations of interior parameters, probabilistic inference methods allow for the determination of posterior probabilities that best fit the observations. However, in general, MCMC methods are computationally intensive and time-consuming, requiring calculations of hundreds of thousands of interior structure models. The interior inference of a single exoplanet can therefore take from hours to days. Furthermore, a dedicated framework combining both a forward interior structure model and an MCMC scheme is necessary, which can limit the large-scale applicability of these techniques due to the need for specialized expertise in planetary interior modeling. To fully exploit the ever-increasing number of exoplanet detections, a fast alternative to MCMC inference is needed.

Here, we present ExoMDN, a standalone machine-learning-based model that is capable of providing a full inference of the interior structure of low-mass exoplanets in under a second without the need for a dedicated interior model. We have made both the trained models and the training routines available in a GitHub repository (https://github.com/philippbaumeister/ExoMDN). The purpose of ExoMDN is to provide a rapid first general characterization of an exoplanet interior, which can then be investigated further with more detailed, specialized models.

2 Machine Learning for interior characterization

In recent years, machine-learning based methods have become increasingly relevant in planetary science because of their ability to facilitate and speed up otherwise very time-consuming calculations. Deep neural networks in particular have been applied to the detection of transits (Chaushev et al. 2019; Malik et al. 2021; Valizadegan et al. 2022), for atmospheric retrievals (Márquez-Neila et al. 2018; Zingales & Waldmann 2018; Himes et al. 2022), in geodynamic simulations (Atkins et al. 2016; Agarwal et al. 2021a,b), in planet formation models (Alibert & Venturini 2019; Cambioni et al. 2019; Emsenhuber et al. 2020; Auddy et al. 2022), as well as for characterizing exoplanet interiors (Baumeister et al. 2020; Zhao & Ni 2021; Haldemann et al. 2023).

In an earlier work (Baumeister et al. 2020), we presented a proof-of-concept method to characterize exoplanet interiors using mixture density networks (MDNs; Bishop 1994), which can predict the full probability distribution of parameters by approximating these with a linear combination of Gaussian kernels. We trained an MDN to infer the range of plausible thicknesses of compositional layers in a planet based on mass and radius inputs. However, this was not a full characterization of the interior, as our network could only predict the marginals of the posterior distribution. While this gives an accurate estimation of the range of admissible parameter values, it does not allow us to pinpoint specific interior structures that fit the observed mass and radius, nor to determine correlations between the various layers. For this purpose, the prediction of the full, multidimensional posterior distribution is required.

ExoMDN builds upon our previous work and is capable of providing a full inference of the entire posterior distribution of interior structures for a planet in a fraction of a second (e.g., on a standard Intel i5 CPU). We also include the equilibrium temperature of the planet as an input parameter to the network, alongside mass and radius, which is made possible by improvements to the atmosphere and water layers in the underlying forward model used to generate the training data. In particular, we used the full water phase diagram compiled by Haldemann et al. (2020) in place of the previous simple isothermal, high-pressure ice layer and we modeled an isothermal atmosphere instead of the previous zero-temperature approach. The use of the equilibrium temperature thus implicitly includes the orbital distance as an observable parameter. We further improved the robustness of the underlying forward model at high pressures by adopting updated high-pressure equations of state (EoS) for the silicate mantle and iron core. A comparison of the old and new forward models can be found in Fig. B.5.

We used our interior model to first generate a dataset of ≈5.6 million synthetic planets spanning the desired parameter space of interior structures, planet masses, and equilibrium temperatures. We then trained a mixture density network to predict the parameters of a mixture of multivariate normal distributions, with the aim of approximating the posterior distribution for a given set of input parameters, namely, mass, radius, and equilibrium temperature. In order for the MDN to handle multidimensional predictions, we applied log-ratio transformations on the training data to convert the interior structures into new coordinates that the MDN can easily handle. We present two trained models: one trained on planetary mass, radius, and equilibrium temperature and the second including the fluid Love number k₂ as an additional input. Fluid Love numbers describe the shape of a rotating planet in hydrostatic equilibrium. The second-degree Love number k₂ is particularly interesting for exoplanet interior characterization, as it depends solely on the interior density distribution (Kellermann et al. 2018; Padovan et al. 2018; Baumeister et al. 2020). In a body with k₂ = 0, the entire mass is concentrated in the center, while k₂ = 1.5 corresponds to a fully homogeneous body. For a number of exoplanets, k₂ is potentially measurable through either second-order effects on the shape of the transit light curve (Hellard et al. 2019; Akinsanmi et al. 2019), or through the apsidal precession of the orbit (Csizmadia et al. 2019).

3 Methods

3.1 Interior model

We compute planetary interior structures with our code TATOOINE (Baumeister et al. 2020; MacKenzie et al. 2023). Each planet consists of compositionally distinct layers. The model takes as input the planet mass, M_p, the mass fractions of each layer, w_i, and the equilibrium temperature, T_eq (defined at the top of the atmosphere). From the top of the planet toward the center, the model calculates radial profiles of mass, m, pressure, P, and density, ρ, by solving the equations for mass conservation (1a), hydrostatic equilibrium (1b), as well as the equation of state (EoS, 1c) relating pressure, density, temperature, T, and composition, c:

$\frac{\mathrm{d}m(r)}{\mathrm{d}r} = 4\pi r^{2}\rho(r),$ (1a)

$\frac{\mathrm{d}P(r)}{\mathrm{d}r} = -\frac{G\,m(r)\,\rho(r)}{r^{2}},$ (1b)

$P(r) = f\left(\rho(r), T(r), c(r)\right),$ (1c)

where G is the gravitational constant. The planet radius, R_p, is iteratively adjusted until the mass at the planet center approaches zero. This yields a final planet radius and the radius fractions of each layer d_i. We fix the pressure at the top of the atmosphere to 10 mbar. We focus here on planets below 25 M_⊕. We consider four distinct layers: an iron core, a silicate mantle, a water layer, and an H/He atmosphere.
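The inward integration and the iterative radius adjustment can be illustrated with a deliberately simplified sketch. It is not the TATOOINE implementation: it assumes a single layer of constant density in place of the EoS of Eq. (1c), and the function names, step count, and bracketing radii are hypothetical choices made for illustration.

```python
import math

G = 6.674e-11  # gravitational constant in m^3 kg^-1 s^-2


def integrate_inward(mass, radius, density, p_surface=1.0e3, steps=2000):
    """Integrate Eqs. (1a) and (1b) from the surface to the center.

    Toy assumption: one layer of constant density replaces the EoS (1c).
    p_surface = 1e3 Pa corresponds to the 10 mbar upper boundary.
    Returns the residual mass and the pressure at the center.
    """
    dr = radius / steps
    m, p, r = mass, p_surface, radius
    for _ in range(steps):
        r_mid = r - 0.5 * dr                      # midpoint of the shell
        dm = 4.0 * math.pi * r_mid**2 * density * dr
        m_mid = m - 0.5 * dm                      # enclosed mass at r_mid
        p += G * m_mid * density / r_mid**2 * dr  # pressure grows inward
        m -= dm
        r -= dr
    return m, p


def solve_radius(mass, density, tol=1.0e-10):
    """Bisect on the radius until the residual mass at the center vanishes."""
    lo, hi = 1.0e5, 1.0e8  # assumed bracketing radii in m
    while (hi - lo) / hi > tol:
        mid = 0.5 * (lo + hi)
        residual, _ = integrate_inward(mass, mid, density)
        if residual > 0.0:
            lo = mid  # trial planet too small: mass is left over at r = 0
        else:
            hi = mid
    return 0.5 * (lo + hi)


earth_mass, mean_density = 5.972e24, 5514.0  # kg, kg m^-3
radius = solve_radius(earth_mass, mean_density)
```

For constant density the result can be checked against the analytic radius R = (3M / 4πρ)^(1/3); with a realistic, pressure-dependent EoS the density would have to be re-evaluated in every shell.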

3.1.1 Iron core

We assumed that the core consists of pure, solid, hcp-iron. We used the temperature-dependent, high-pressure EoS by Bouchet et al. (2013) for pressures below 234.4 GPa. At higher pressures, we switch to the high-pressure EoS from Hakim et al. (2018), valid up to 10 TPa.

The presence of lighter elements in the core such as sulfur or hydrogen can significantly reduce the density of the core, which in turn can have large effects on the core size and consequently on the planet radius (Hakim et al. 2018). The amount of lighter elements in an exoplanet’s core is hard to constrain, as it not only depends on the initial abundances in the protoplanetary disk, but also on the processes of core formation and magma ocean cooling (Hirose et al. 2021). For a proper treatment of the interior inference, the amount of light elements should be taken as a free parameter, which will increase the degeneracy of interior structures even more. For simplicity and to better illustrate our method, here we neglect the presence of lighter elements in the core, following an approach commonly used in the exoplanet community (e.g., Seager et al. 2007; Wagner et al. 2011). Nevertheless, we note that the uncertainty in the light elements budget of the core can be easily incorporated into our method by sampling from a range of core compositions upon creating the training dataset (Sect. 3.4).

3.1.2 Silicate mantle

The silicate layer consists of an upper mantle composed of olivine (Mg, Fe)₂SiO₄ and pyroxene (Mg, Fe)₂Si₂O₆, a lower mantle composed of magnesiowüstite (Mg, Fe)O and bridgmanite (Mg, Fe)SiO₃, and a high-pressure phase of magnesiowüstite and post-perovskite. The transition from upper to lower mantle is assumed to occur at a fixed pressure of 23 GPa, and the transition to post-perovskite at a pressure of:

$P(T) = 89.184\,\mathrm{GPa} + 13.3\,\mathrm{MPa\,K^{-1}}\,T,$ (2)

following Tateno et al. (2009), where T is the adiabatic temperature in the mantle. We model the upper and lower mantle with a modified Tait EoS from Holland & Powell (2011) and the post-perovskite phase with the generalized Rydberg EoS described in Wagner et al. (2011).
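As a small worked example of Eq. (2) and the fixed 23 GPa boundary, the mantle phase at a given pressure and temperature could be selected as in the sketch below. The function names are hypothetical and only the two transition criteria stated above are encoded.

```python
def ppv_transition_pressure_gpa(temperature_k):
    """Eq. (2): pressure of the transition to post-perovskite, in GPa."""
    # 13.3 MPa/K equals 0.0133 GPa/K
    return 89.184 + 13.3e-3 * temperature_k


def mantle_phase(pressure_gpa, temperature_k):
    """Return the mantle region for a given pressure and temperature."""
    if pressure_gpa < 23.0:  # fixed upper/lower mantle boundary
        return "upper mantle"
    if pressure_gpa < ppv_transition_pressure_gpa(temperature_k):
        return "lower mantle"
    return "post-perovskite"
```

At T = 2500 K, for instance, Eq. (2) gives 89.184 GPa + 33.25 GPa = 122.434 GPa.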

Similar to the composition of the core, the mantle composition could be varied, for example, using stellar abundances as proxies of planet composition (e.g., Dorn et al. 2017a; Hinkel & Unterborn 2018). However, since the relation between star and planet composition is not straightforward (Plotnykov & Valencia 2020), for simplicity, we assume the silicate mantle to have an Earth-like composition with a molar Mg/Si ratio of 1.131 and a magnesium number (Mg#) of 0.9 (Sotin et al. 2007). The Mg/Si ratio determines the mixing ratio of the respective mantle minerals, with the Mg# determining the ratio of the respective Mg and Fe end members.

3.1.3 Water layer

For the water layer, we used the tabulated AQUA EoS from Haldemann et al. (2020), spanning a wide temperature and pressure range and including gas, liquid, and solid water phases. Liquid and solid layers are assumed to be fully convective, with an adiabatic temperature profile calculated from the adiabatic gradient given by the AQUA table for any given temperature and pressure. Water vapor is assumed to be part of an isothermal atmosphere at the equilibrium temperature.

3.1.4 H/He atmosphere

The low densities of many exoplanets hint at extended primordial envelopes composed of hydrogen and helium (e.g., Jontof-Hutter 2019). We therefore include an outer gaseous H/He envelope of solar-like composition (71% hydrogen, 29% helium by mass) based on the EoS from Saumon et al. (1995). We treat the atmosphere as isothermal with a temperature equal to the equilibrium temperature, an approach also employed, for example, by Dorn et al. (2017b) and Zeng et al. (2019). While an isothermal atmosphere certainly does not capture the full complexities of exoplanet atmospheres, more detailed atmosphere models would require the inclusion of additional parameters such as infrared and optical opacities, as well as the intrinsic temperature of the planet (e.g., Guillot 2010). Since the goal of this work is to explore the machine-learning method and its applications, we have chosen a simplified atmosphere model to limit the overall complexity of the model and of the training data. However, more complex atmosphere models, in particular those specifically designed to treat gas giant planets (e.g., Fortney et al. 2007; Nettelmann et al. 2011; Leconte & Chabrier 2012), can be easily incorporated into ExoMDN by producing suitable training data.

3.2 Compositional data

Ideally, an inference model for planets should provide a set of desired parameters that fully describe the interior, such as the thickness or mass of each interior layer. These quantities represent a type of compositional data, where the sum of the D components is always constant (e.g., the planet radius or mass). In our case, we are interested in the relative mass and thickness of each layer in the planet, so that

$\sum_{i=1}^{D} x_{i} = 1,$ (3)

where x_i is the relative thickness or mass of the ith planet layer. This restricted space is known as the simplex 𝒮^D, and is commonly represented in the form of a ternary diagram (e.g., Rogers & Seager 2010).

The nature of compositional data can make the statistical treatment cumbersome. The constraints imposed by Eq. (3) can give rise to spurious correlations. The shapes of probability distributions can be distorted and skewed, and trying to fit distributions to sample data may lead to points lying outside the simplex (Aitchison 1982; Pawlowsky-Glahn & Egozcue 2006). In particular, this means that Gaussian distributions, which are commonly used to represent continuous data, cannot be utilized directly to describe distributions of compositional data, as parts of the distribution would fall outside the closed space. This point is especially relevant for this work: the simple parameterization of Gaussian distributions makes them convenient candidates for components in mixture distributions in order to approximate arbitrary posterior distributions with neural networks, as the entire mixture is described by only a few parameters. It is therefore highly desirable to extend the usefulness of Gaussian mixtures to the analysis of compositional data, while retaining the convenience of their simple parameterization.

One solution is to introduce a set of coordinate changes called log-ratio transformations (Aitchison 1982), which transforms the data coordinates from the simplex into (unconstrained) real space by way of logarithmic ratios between coordinates. We focus here on the additive log-ratio transformation alr: 𝒮^D → ℝ^(D−1), which takes the logarithm of pairwise ratios between D − 1 coordinates and an arbitrarily chosen Dth coordinate (x_D), thereby reducing the dimension of the new space by one:

$\mathrm{alr}(x_{i}) = y_{i} = \ln \frac{x_{i}}{x_{D}}.$ (4)

The back-transformation onto the simplex is given by (Aitchison 1982)

$\mathrm{alr}^{-1}(y_{i}) = x_{i} = \begin{cases} \dfrac{\exp(y_{i})}{1 + \sum_{j=1}^{D-1} \exp(y_{j})} & (i = 1, \ldots, D - 1), \\ \dfrac{1}{1 + \sum_{j=1}^{D-1} \exp(y_{j})} & (i = D). \end{cases}$ (5)
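The transformation pair of Eqs. (4) and (5) is compact enough to write out directly. The following is a minimal NumPy sketch, assuming that the last component of each composition serves as the reference coordinate x_D; the function names and the example composition are illustrative only.

```python
import numpy as np


def alr(x):
    """Additive log-ratio transform, Eq. (4); the last entry is x_D."""
    x = np.asarray(x, dtype=float)
    return np.log(x[..., :-1] / x[..., -1:])


def alr_inverse(y):
    """Back-transformation onto the simplex, Eq. (5)."""
    y = np.asarray(y, dtype=float)
    exp_y = np.exp(y)
    denom = 1.0 + exp_y.sum(axis=-1, keepdims=True)
    return np.concatenate([exp_y / denom, 1.0 / denom], axis=-1)


# Illustrative composition: mantle, water, gas, core (core as reference)
x = np.array([0.55, 0.10, 0.05, 0.30])
y = alr(x)                # three unconstrained log-ratios
x_back = alr_inverse(y)   # recovers the original four fractions
```

Any real-valued vector y maps back to a valid composition whose entries are positive and sum to one, which is what allows Gaussian kernels to be used in the transformed space.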

3.3 Mixture density networks

Neural networks are a widely used tool in machine learning due to their ability to learn complex, nonlinear mappings between input variables x and output variables t. Neural networks can model this mapping by learning from a set of training data which provide concrete examples of the output values corresponding to each set of input values. Conventionally, neural networks are trained by minimizing the mean squared error between known values from the training data and predicted outputs from the neural network. However, this approach tends to be wholly inadequate for inverse problems, where one set of input values may correspond to multiple output values, or more generally, to some posterior probability density p(t | x), (i.e., the probability density of t given some input x). To preserve the practicality of neural networks and extend their functionality to include arbitrary probability functions, Bishop (1994) introduced a class of neural networks called mixture density networks, which combine a conventional neural network with a mixture density model. The posterior p(t | x) can be approximated by a linear combination of m kernel functions ϕ_i(t | x),

$p(\mathbf{t}|\mathbf{x}) = \sum_{i=1}^{m} \alpha_{i}(\mathbf{x})\,\phi_{i}(\mathbf{t}|\mathbf{x}),$ (6)

where α_i are mixture weights. Various functions can be chosen for ϕ_i(t | x). We focus here on a mixture model with Gaussian kernels of the form:

$\phi_{i}(\mathbf{t}|\mathbf{x}) = \frac{1}{(2\pi)^{c/2} \det(\Sigma_{i})^{1/2}} \exp\left\{-\frac{1}{2}(\mathbf{t} - \mu_{i})^{\top} \Sigma_{i}^{-1} (\mathbf{t} - \mu_{i})\right\},$ (7)

where c is the dimension of t (i.e., the number of output variables), and μ_i is the center of the ith Gaussian kernel with a diagonal covariance matrix, Σ_i:

$\Sigma_{i} = \mathrm{diag}(\sigma_{i}) = \begin{bmatrix} \sigma_{i,1} & & \\ & \ddots & \\ & & \sigma_{i,c} \end{bmatrix}.$ (8)

The conditional probability distribution p(t | x) is completely described by the weights, α_i, means, μ_i, and variances, σ_i. Training the MDN to predict these outputs therefore allows for the distribution to be reconstructed. With m mixtures and c parameters, the total number of network outputs is (2c + 1)m. A mixture density network is built as a conventional feedforward neural network, where the last layer approximates the distribution parameters (Fig. 1). The model can then be trained with a maximum likelihood approach by minimizing the average negative log-likelihood ℒ across the training data set:

$\mathcal{L} = -\frac{1}{N} \sum_{k=1}^{N} \ln \mathcal{L}_{k} = -\frac{1}{N} \sum_{k=1}^{N} \ln\left(\sum_{i=1}^{m} \alpha_{i}(\mathbf{x}_{k})\,\phi_{i}(\mathbf{t}_{k}|\mathbf{x}_{k})\right),$ (9)

where N is the size of the training data set.
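To make Eqs. (6) to (9) concrete, the loss can be evaluated for given mixture parameters as in the following NumPy sketch. It is an illustration rather than the training code: it assumes that the mixture weights are already normalized to sum to one, that `var` holds the diagonal entries of Σ_i, and that all six log-ratios of Sect. 3.4 are predicted jointly (c = 6) when counting outputs. Function and variable names are hypothetical.

```python
import numpy as np


def mixture_log_density(t, alpha, mu, var):
    """ln p(t|x) for Eq. (6) with the diagonal Gaussian kernels of Eq. (7).

    t: (c,) target, alpha: (m,) weights, mu: (m, c) means,
    var: (m, c) diagonal entries of the covariance matrices (Eq. 8).
    """
    c = t.shape[-1]
    log_norm = -0.5 * (c * np.log(2.0 * np.pi) + np.log(var).sum(axis=1))
    log_kernel = log_norm - 0.5 * (((t - mu) ** 2) / var).sum(axis=1)
    log_terms = np.log(alpha) + log_kernel
    peak = log_terms.max()  # log-sum-exp for numerical stability
    return peak + np.log(np.exp(log_terms - peak).sum())


def negative_log_likelihood(targets, alphas, mus, variances):
    """Average negative log-likelihood over the data set, Eq. (9)."""
    logs = [mixture_log_density(t, a, mu, v)
            for t, a, mu, v in zip(targets, alphas, mus, variances)]
    return -np.mean(logs)


# Number of network outputs, (2c + 1) m, for c = 6 outputs and m = 50 mixtures
c, m = 6, 50
n_outputs = (2 * c + 1) * m  # m weights + m*c means + m*c variances
```

Working in log space avoids underflow when many kernels contribute negligibly to the density, which is the typical situation with a large number of mixture components.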

Fig. 1

Schematic overview of the MDN architecture and inference procedure.

Table 1

Prior distributions of model parameters for training data generation.

3.4 Training data and network architecture

We created a data set of 5.6 million synthetic planets randomly sampled from the prior distributions, summarized in Table 1. The planet mass was chosen uniformly between 0.1 and 25 M_⊕. Each planet was set at a specific equilibrium temperature ranging from 100 to 1000 K. The mass fraction of each planetary layer was sampled from the simplex so that they add to one. The gas mass fraction w_Gas was sampled logarithmically with a lower limit of 10⁻⁸, while the other mass fractions were sampled uniformly. Given these inputs, the TATOOINE model calculates planet radius and thickness of each layer. For each planet, we also calculated the fluid Love number, k₂, using the matrix-propagator approach from Padovan et al. (2018).
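A prior sampler of this kind could look like the sketch below. Several details are assumptions made purely for illustration and are not taken from Table 1: the equilibrium temperature is drawn uniformly, the gas mass fraction is log-uniform up to an upper limit of one, and the remaining three fractions are drawn from a flat Dirichlet distribution and rescaled to fill the rest of the planet.

```python
import numpy as np


def sample_priors(n, seed=0):
    """Draw n synthetic planets (hypothetical sampling scheme, see lead-in)."""
    rng = np.random.default_rng(seed)
    mass = rng.uniform(0.1, 25.0, size=n)           # planet mass in Earth masses
    t_eq = rng.uniform(100.0, 1000.0, size=n)       # assumption: uniform in K
    w_gas = 10.0 ** rng.uniform(-8.0, 0.0, size=n)  # log-uniform, >= 1e-8
    # Assumption: core, mantle, and water share the remaining mass uniformly
    rest = rng.dirichlet(np.ones(3), size=n) * (1.0 - w_gas)[:, None]
    w = np.column_stack([rest, w_gas])              # core, mantle, water, gas
    return mass, t_eq, w


mass, t_eq, w = sample_priors(1000)
```

Each row of `w` sums to one and would be passed, together with the mass and equilibrium temperature, to the forward model to obtain the planet radius and layer thicknesses.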

To prepare the training data, we log-ratio transformed both mass fractions and radius fractions according to Eq. (4) using the core mass and radius as a base coordinate (x_D in Eq. (4)). This is the key difference from our previous work and is what enables the prediction of multivariate distributions of mass and radius fractions. The log-ratio transformation enforces the condition that the mass and radius fractions add up to one and allows the network to operate in unbounded real space, ℝ, instead of the simplex, 𝒮, which permits the use of Gaussian kernels as described above.

As a preprocessing step before training and prediction, we also log-transformed the planet mass, which we found to slightly improve the training performance. We took 70% of the data set for training, using the remaining 30% to evaluate the performance of the MDN during training. In addition, we retained a small set of data for final model validation (see Sect. 4).

We trained the MDN to predict the parameters of the posterior distributions of the log-ratio-transformed mass fractions (ln(w_Mantle/w_Core), ln(w_Water/w_Core), ln(w_Gas/w_Core)) and radius fractions (ln(d_Mantle/d_Core), ln(d_Water/d_Core), ln(d_Gas/d_Core)). We trained two models with different sets of inputs: Model 1 with M_p, R_p, T_eq and Model 2 with M_p, R_p, T_eq, k₂.

The MDN is built from a feedforward neural network using the Keras framework (Chollet et al. 2015) and TensorFlow (Abadi et al. 2016), with the MDN output layer adapted from Martin & Duhaime (2019). The best MDN architecture was found through hyperparameter optimization using the Keras-Tuner framework (O’Malley et al. 2019). We optimized for the number of hidden layers, the number of units per layer, the learning rate, as well as the batch size. We kept the number of mixture components fixed at m = 50, because we noticed that the tuner would always optimize for the highest available number of mixtures, but with very small mixture weights for most components. We found 50 components to be a good middle ground where training accuracy was good, but without too many components contributing little to the posterior distribution.

The architecture that yielded the best training performance for Model 1 consists of three hidden layers with 384 nodes per layer with a batch size of 750. For Model 2, the best architecture consists of three hidden layers with 896 nodes per layer with a batch size of 1000. Models with a base learning rate of 0.001 performed best in both cases.

Each hidden layer is activated with a rectified linear unit (ReLU), which is a commonly used activation function in deep learning models (Nair & Hinton 2010; Goodfellow et al. 2017). To ensure that the variances are always positive, we activated σ_i in the output layer with a nonnegative exponential linear unit (NNELU) after Brando (2017):

$\mathrm{NNELU}(x) = \begin{cases} x + 1 & \mathrm{for}\ x \geq 0, \\ \exp(x) & \mathrm{for}\ x < 0. \end{cases}$ (10)
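Eq. (10) translates into a few lines of NumPy. The sketch below is a standalone illustration of the activation function and not the layer used inside the network; the function name is our own.

```python
import numpy as np


def nnelu(x):
    """Nonnegative exponential linear unit, Eq. (10)."""
    x = np.asarray(x, dtype=float)
    # np.minimum keeps exp() from overflowing on the branch that is discarded
    return np.where(x >= 0.0, x + 1.0, np.exp(np.minimum(x, 0.0)))
```

The function is continuous at x = 0, where both branches equal one, and strictly positive everywhere, so any real-valued network output is mapped to a valid variance.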

The nodes for mixture weights, α_i, and means, μ_i, are activated with a linear function to allow for unrestricted output values. To avoid overfitting, we applied an early stopping of the learning algorithm once the validation loss did not improve for eight consecutive training epochs. To improve training performance, we reduced the learning rate by a factor of ten every time the validation loss stopped improving for more than four epochs during training, down to a lower bound for the learning rate of 10⁻⁸. This helps fine-tune the model weights once a near-optimal set of parameters has been learned. The MDN was trained on a GPU workstation with eight NVIDIA RTX A5000 graphics cards. The (wall clock) training time for a model was around three hours.
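The interplay of early stopping and learning-rate reduction can be replayed on a recorded validation-loss history with a short, framework-independent sketch. The counting convention is an assumption: a patience of four (eight) is interpreted here as acting once four (eight) consecutive epochs have passed without a new best validation loss. All names are hypothetical.

```python
def simulate_schedule(val_losses, lr=1e-3, stop_patience=8, lr_patience=4,
                      factor=0.1, min_lr=1e-8):
    """Replay a validation-loss history; return (stop_epoch, lr_history)."""
    best = float("inf")
    since_best = 0  # epochs since the last improvement (early stopping)
    since_cut = 0   # epochs without improvement since the last LR reduction
    lr_history = []
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, since_best, since_cut = loss, 0, 0
        else:
            since_best += 1
            since_cut += 1
        if since_cut >= lr_patience:
            lr = max(lr * factor, min_lr)  # reduce by a factor of ten
            since_cut = 0
        lr_history.append(lr)
        if since_best >= stop_patience:
            return epoch, lr_history       # early stopping triggered
    return None, lr_history


losses = [1.0, 0.9] + [0.95] * 10  # illustrative history that plateaus
stop_epoch, lr_history = simulate_schedule(losses)
```

In Keras, comparable behavior is available through the `EarlyStopping` and `ReduceLROnPlateau` callbacks, both of which monitor a chosen metric such as the validation loss.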

3.5 Backtransformation to mass and radius fractions

From the predicted parameters of the MDN, the approximate posterior distribution of the log-ratio transformed mass and radius fractions corresponding to the given inputs of the MDN can be reconstructed according to Eq. (6). The log-ratio space is not particularly useful for interpreting the inferred interior structure distributions. However, the back-transformation of the Gaussian mixture onto the simplex (Eq. (5)) is mathematically unwieldy, as the normal distributions are highly deformed when in the compositional space. Instead, we randomly sample a sufficiently large number of points from the log-ratio posterior probability distribution and transform these back into compositional space. This is conceptually similar to MCMC sampling and gives a good approximation of the posterior distribution.
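The sampling-based back-transformation can be sketched as follows, assuming diagonal Gaussian kernels and the core as the reference coordinate. The mixture parameters below are arbitrary illustrative numbers, not predictions of ExoMDN, and the function names are hypothetical.

```python
import numpy as np


def sample_mixture(alpha, mu, var, n_samples, rng):
    """Draw samples from a Gaussian mixture with diagonal covariances."""
    component = rng.choice(len(alpha), size=n_samples, p=alpha)
    return rng.normal(mu[component], np.sqrt(var[component]))


def alr_inverse(y):
    """Back-transformation from log-ratios onto the simplex, Eq. (5)."""
    exp_y = np.exp(y)
    denom = 1.0 + exp_y.sum(axis=-1, keepdims=True)
    return np.concatenate([exp_y / denom, 1.0 / denom], axis=-1)


rng = np.random.default_rng(1)
alpha = np.array([0.7, 0.3])                  # mixture weights
mu = np.array([[0.5, -2.0, -6.0],             # means in log-ratio space
               [1.0, -1.0, -4.0]])
var = np.array([[0.2, 0.5, 1.0],              # variances in log-ratio space
                [0.1, 0.3, 0.8]])

log_ratio_samples = sample_mixture(alpha, mu, var, 5000, rng)
fractions = alr_inverse(log_ratio_samples)    # columns: mantle, water, gas, core
```

Histograms or ternary diagrams of `fractions` then approximate the posterior distribution on the simplex without requiring an analytic expression for the deformed Gaussians.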

3.6 Incorporating measurement uncertainties

The current network architecture is built on the assumption that the input parameters are known exactly without uncertainties. However, except for Solar System planets, observations of exoplanets will always come with considerable measurement uncertainties. With ExoMDN, measurement uncertainties can be taken into account in a straightforward way by sampling n times from within the error bars of the input parameters, predicting the interior distribution for each sample, and combining the results into a single posterior distribution. This can be done either by summing up each Gaussian mixture in log-ratio space and then normalizing the resulting distribution, or by first subsampling from each prediction n′ times and then merging the samples (for a final dataset size of n × n′ samples). Subsampling first and then merging is considerably less memory and processing intensive, as the full posterior distribution can be built up sequentially from each planet sample. Summing up all predicted posterior distributions first requires loading the entire posterior distribution, consisting of n × m multivariate Gaussian kernels, into memory. Sampling from this mixture distribution can be computationally very expensive for large sample sizes, n, which are needed to treat the measurement uncertainties well. We find that both approaches display no functional difference in the predicted full posterior distributions (Fig. B.4). We therefore chose the approach of sampling first from each prediction and then merging, as it is also easy to implement in the current prediction pipeline. However, n′ should be chosen significantly smaller than n to avoid oversampling of specific mass-radius-temperature pairs.
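The subsample-then-merge strategy can be outlined in a few lines. In this sketch, `toy_predictor` is a hypothetical stand-in for a trained MDN that returns mixture weights, means, and variances, and the inputs are assumed to have independent Gaussian uncertainties; neither is taken from the ExoMDN code.

```python
import numpy as np


def toy_predictor(x):
    """Hypothetical stand-in for a trained MDN: returns (alpha, mu, var)."""
    alpha = np.array([0.6, 0.4])
    mu = np.array([[x[0], -x[1]], [x[0] + 1.0, -x[1] - 1.0]])
    var = np.full((2, 2), 0.05)
    return alpha, mu, var


def predict_with_uncertainties(predict_mixture, obs, err, n=1000, n_sub=10,
                               seed=0):
    """Merge n_sub posterior samples from each of n perturbed inputs."""
    rng = np.random.default_rng(seed)
    merged = []
    for _ in range(n):
        x = rng.normal(obs, err)  # assumption: independent Gaussian errors
        alpha, mu, var = predict_mixture(x)
        component = rng.choice(len(alpha), size=n_sub, p=alpha)
        merged.append(rng.normal(mu[component], np.sqrt(var[component])))
    return np.concatenate(merged, axis=0)  # n * n_sub samples in total


obs = np.array([1.0, 1.0, 255.0])   # illustrative mass, radius, T_eq
err = np.array([0.05, 0.02, 5.0])   # illustrative 1-sigma uncertainties
samples = predict_with_uncertainties(toy_predictor, obs, err)
```

Because each prediction is reduced to n′ samples before merging, only one mixture needs to be held in memory at a time, and choosing n′ much smaller than n keeps any single input draw from dominating the merged posterior.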

4 Validation

To establish the accuracy of the trained MDN, we validated it in two ways: by forward modeling and by independent inference. In the first case, we used the predicted mass fractions as inputs to the forward model and recomputed the interior structures of planets to investigate how well the planet radius can be retrieved from the predictions. This allows us to put constraints on systematic errors in the MDN outputs. In the second case, we ensured that the MDN predictions are accurate and consistent with other inference methods by comparing the predicted posterior distributions with those obtained by an independent inference approach.

4.1 Radius accuracy

We used the MDN to predict the interior structure distributions of 500 randomly selected planets from the test data set. We took 200 samples of interior structures for each prediction and modeled these planets with the TATOOINE forward model by taking the mass fractions of the layers as inputs (i.e., 10000 sample points in total). We then compared the relative error $\frac{\Delta R}{R_{\rm p}} = \frac{R_{\rm val} - R_{\rm p}}{R_{\rm p}}$ between the true planet radius R_p and the recomputed planet radius R_val obtained from the MDN predictions (Fig. 2).

We find that the recomputed planet radii of Model 1 fit closely to the expected ones, with the MDN introducing a slight overestimation of the radius of about 0.4% (Fig. 2a). The MDN introduces a small amount of noise into the recomputed radii, with 80% of planets having a radius error of less than 1.5%. The MDN does not perform equally well across the parameter space. Low-density planets tend to have a wider spread in radius errors (Fig. 2b), largely independent of planet mass (Fig. 2c). Higher equilibrium temperatures increase the error slightly (Fig. 2d). We attribute the larger errors mainly to the atmosphere. The recomputed radius errors tend to be the largest in planets with extensive gas envelopes (Fig. 2e). Small errors in the prediction of the gas mass fraction are amplified into larger radius errors due to the low density of the atmosphere. In addition, the transformation from log-ratios to mass and radius fractions amplifies any small uncertainty present in the atmosphere-core log-ratio predictions.
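As an illustration of how such summary numbers can be obtained, the following Python sketch computes the relative radius error, its median (the systematic offset), and the bound below which a chosen share of the absolute errors falls. The function name and the synthetic radii are assumptions made for this example. They are not taken from the validation data.

```python
import numpy as np

def radius_error_stats(r_true, r_val, coverage=80):
    """Relative radius error (R_val - R_p) / R_p, its median, and the
    absolute error bound that contains `coverage` percent of the samples."""
    r_true = np.asarray(r_true, dtype=float)
    r_val = np.asarray(r_val, dtype=float)
    rel_err = (r_val - r_true) / r_true
    bias = np.median(rel_err)                          # systematic over/underestimate
    bound = np.percentile(np.abs(rel_err), coverage)   # e.g. 80% of errors lie below
    return rel_err, bias, bound

# Synthetic stand-in data with a small positive bias and some scatter
rng = np.random.default_rng(1)
r_true = rng.uniform(1.0, 4.0, size=10_000)
r_val = r_true * (1.0 + rng.normal(0.004, 0.01, size=10_000))
rel_err, bias, bound = radius_error_stats(r_true, r_val)
```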

For Model 2, we find that the planetary radii can be reproduced very accurately with a relative radius error of less than 0.55% (Fig. 3a). As with Model 1, the radius is slightly overestimated by 0.4%. We additionally checked how well the fluid Love number k₂ is reproduced by computing the relative error $\frac{\Delta k_{2}}{k_{2}} = \frac{k_{2,{\rm val}} - k_{2}}{k_{2}}$, where k_2,val is the Love number of the validation planet to be reproduced, and k₂ is the fluid Love number calculated from the predicted interior mass fractions. We find that k₂ is reproduced well, with more than 80% of the points falling within 2.3% of the true k₂ value (Fig. 3b). While the median of the data set sits at zero error, the data set is slightly skewed toward low k₂ values. This is most likely caused by the atmosphere. The fluid Love number, k₂, is highly sensitive to the density structure of the planet, especially in the upper layers. Slight overestimations of the atmosphere mass fractions result in larger underestimations of k₂.

Fig. 2

Radius accuracy of Model 1 after recalculating the planet interior based on the MDN prediction. Panel a shows the distribution of the relative radius error of 10000 sample points. The blue line marks the median, with the blue area showing the range where 80% of values lie. Panels b–e show the standard deviation in relative radius errors σ for a variety of planet parameters: bulk density (b), planet mass (c), equilibrium temperature (d), and average atmosphere thickness $\bar{d}_{\rm Gas}$ of recomputed planet samples (e). Each point represents one of 500 planets from the test data set (see the text for more details).

4.2 Independent inference

We randomly selected 20 planets from a test dataset that the MDN did not see during training and ran an independent inference of their interior structures using a straightforward Monte Carlo sampling method, assuming a radius uncertainty of 1%. We assessed how well the predicted posterior distributions, P, fit to the posterior distributions from the validation set, Q, by calculating the Hellinger distance H(P, Q) for each marginal distribution following the approach by Haldemann et al. (2023). The Hellinger distance is an integrated metric bounded between 0 and 1 that measures the similarity of two probability distributions. Two identical probability distributions have a Hellinger distance of 0, while a Hellinger distance of 1 is reached when there is no overlap between the two distributions. We binned the data into n = 20 bins with sample frequencies p_i and q_i in each bin. The (squared) Hellinger distance is then given by

$H^{2}(P, Q) = \frac{1}{2} \sum_{i=1}^{n} \left(\sqrt{p_{i}} - \sqrt{q_{i}}\right)^{2}.$ (11)
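A binned estimate following Eq. (11) can be sketched in a few lines of Python. The function name and the choice to span the bins over the combined range of both sample sets are assumptions of this example. The binning actually used for Table 2 may differ.

```python
import numpy as np

def hellinger_distance(x_p, x_q, bins=20):
    """Hellinger distance between two 1D sample sets, Eq. (11),
    using histogram bins shared by both sets."""
    x_p = np.asarray(x_p, dtype=float)
    x_q = np.asarray(x_q, dtype=float)
    lo = min(x_p.min(), x_q.min())
    hi = max(x_p.max(), x_q.max())
    edges = np.linspace(lo, hi, bins + 1)
    p = np.histogram(x_p, bins=edges)[0] / x_p.size   # relative frequencies p_i
    q = np.histogram(x_q, bins=edges)[0] / x_q.size   # relative frequencies q_i
    h2 = 0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)
    return np.sqrt(h2)
```

Identical sample sets give a distance of 0, and sample sets that share no bin give a distance of 1, which matches the bounds stated above.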

The average Hellinger distance $\bar{H}$ over the 20 validation planets is shown in Table 2 for both the log-ratio outputs from the MDN and the transformed compositional mass and radius fractions. We find that the predicted log-ratio distributions compare very well to the validation set, with Hellinger distances around 1 × 10⁻³. This corresponds to two normal distributions differing in their means by about 3 × 10⁻³ units (assuming a standard deviation of 1), or in the standard deviation by 0.2% (assuming the same mean). Figure 4 shows an example of a well-predicted validation planet with small Hellinger distances.
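The two equivalences quoted for a Hellinger distance of 1 × 10⁻³ can be checked against the standard closed-form expression for the Hellinger distance between two normal distributions. This expression is a textbook result and not part of the original derivation. The symbols μ₁, μ₂, σ₁, σ₂, Δμ, and ε are introduced here only for illustration.

```latex
\begin{align}
H^{2}(P, Q) &= 1 - \sqrt{\frac{2\sigma_{1}\sigma_{2}}{\sigma_{1}^{2} + \sigma_{2}^{2}}}\,
  \exp\!\left[-\frac{(\mu_{1} - \mu_{2})^{2}}{4\,(\sigma_{1}^{2} + \sigma_{2}^{2})}\right] \\
% equal widths, sigma_1 = sigma_2 = 1, small offset Delta mu = mu_1 - mu_2
H^{2} &\approx \frac{\Delta\mu^{2}}{8}
  \;\Rightarrow\; \Delta\mu \approx \sqrt{8}\,H \approx 2.8 \times 10^{-3} \\
% equal means, sigma_2 = (1 + epsilon) sigma_1, small epsilon
H^{2} &\approx \frac{\varepsilon^{2}}{4}
  \;\Rightarrow\; \varepsilon \approx 2H = 2 \times 10^{-3} = 0.2\%
\end{align}
```

Both limits follow from a first-order expansion for small Δμ and ε, and they reproduce the values given in the text for H = 1 × 10⁻³.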

The transformed mass and radius fractions also fit well, albeit with slightly higher Hellinger distances around 3 × 10⁻³. The gas mass fraction w_gas is the least well constrained parameter here with $\bar{H} = 2.16 \times 10^{-2}$. This mirrors the effect already discussed in Sect. 4.1.

Fig. 3

Radius accuracy (a) and k₂ accuracy (b) of Model 2 after recalculating the planet interior based on the MDN prediction. Panel a shows the distribution of the relative radius error and panel b that of the relative k₂ error for 10 000 sample points. Blue lines mark the median, with the blue areas showing the range where 80% of values lie.

Table 2

Average Hellinger distance $\bar{H}$ for 20 randomly selected validation planets of all MDN (log-ratio) output distributions and of their corresponding transformed parameters.

Fig. 4

Example of the Hellinger distances for a well-predicted planet (4.722 M_⊕, 1.82 R_⊕) from the 20 validation planets. The blue line marks the independent validation, the orange line shows the ExoMDN prediction.

5 Results

5.1 Earth and Neptune

We demonstrate the ability of ExoMDN to perform an interior characterization of Earth and Neptune by treating them as if they were exoplanets, where only the mass and radius are measured, and the equilibrium temperature is set according to their orbital distance. Earth represents the archetypical rocky planet whose internal structure is best known of all the planets in the Solar System. Neptune lies on the upper end of the mass range we investigate and is a representative example of volatile-rich planets.

The MDN prediction takes the form of a six-dimensional distribution of the log-ratios of masses and thicknesses of the planetary layers, which can be transformed back to layer mass and thickness, as described in Sect. 3.5. For clarity, we will focus in this section only on the thickness of the layers. Figures showing the mass fractions can be found in the appendix.
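The back-transformation from log-ratios to fractions can be sketched as follows. Since Sect. 3.5 is not reproduced in this part of the text, the example assumes an additive log-ratio transformation with the first layer (for instance the core) as the reference component. The function names and the example fractions are illustrative only.

```python
import numpy as np

def alr(fractions):
    """Additive log-ratio: log of each non-reference part divided by the
    reference part (index 0, an assumed choice)."""
    f = np.asarray(fractions, dtype=float)
    return np.log(f[..., 1:] / f[..., :1])

def alr_inverse(log_ratios):
    """Map log-ratios back to positive fractions that sum to one."""
    z = np.asarray(log_ratios, dtype=float)
    # The reference part has log-ratio 0 with itself, i.e. exp(0) = 1
    expz = np.concatenate([np.ones(z.shape[:-1] + (1,)), np.exp(z)], axis=-1)
    return expz / expz.sum(axis=-1, keepdims=True)

# Illustrative thickness fractions: core, mantle, water, gas
d = np.array([0.55, 0.44, 0.005, 0.005])
z = alr(d)            # three log-ratios for four layers
d_back = alr_inverse(z)
```

Four layer fractions map to three log-ratios, so masses and thicknesses together give the six dimensions mentioned above. The inverse always returns positive fractions that sum to one, which is why sampling in log-ratio space is convenient.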

Figure 5 shows the posterior distribution of log-ratios for Earth, as approximated by the MDN, given Earth’s mass, radius, and equilibrium temperature of 255 K. The ellipses in the upper right plots show the location and covariance of each Gaussian kernel, with the colors marking the respective mixture weights, α_i. The kernels are well spaced with little overlap and most mixture weights are similar. This indicates that the MDN is able to efficiently leverage all its 50 kernels to construct the posterior distribution.

We sampled 200 000 points from the log-ratio distribution to construct the posterior distribution of the actual layer thicknesses, which is shown in Fig. 6. Given only three observables (mass, radius, and equilibrium temperature), the prediction indicates a mostly rocky planet composed of an iron core which makes up at least 50% of the planet, and with only little water and gas. The colored symbols mark end-member interior structures with only two layers. In this case, only three of these exist, namely: (1) the actual structure of the Earth with an iron core making up 55% of the interior, and a silicate mantle on top; (2) an iron-water planet composed of a core with a relative size of 73% and an ice layer; and (3) an iron-gas planet composed of a massive iron core making up 80% of the planet, and a H/He envelope taking up the rest. In cases 2 and 3, the iron core needs to be very large to compensate for the low density of the water and gas layers. Although these two cases are unlikely to occur in nature, they demonstrate that the interior cannot be fully constrained without additional constraints. In this example, the thickness of the silicate mantle in particular can barely be constrained.

It should be noted that due to our assumption of uniform priors, the most commonly predicted interior structures encompass a combination of all four layers. For this reason, the distributions presented should not be understood as definitive probabilities, but rather as the number of potential solutions for each given layer thickness fraction. Consequently, the actual interior of Earth lies outside the bulk of the predicted distribution. In fact, only a single solution exists that matches Earth’s mass and radius with only an iron core and a silicate mantle, and no water or extended atmosphere.

For Neptune, the MDN predicts a substantial atmosphere between 20 and 70% of the planet’s radius (Fig. 7) and only a small iron core (≤40% of the radius). We note here that rather than the actual temperature of 51 K, we used an equilibrium temperature of 100 K, which is the lowest temperature for which the AQUA EoS used for the water layer is valid. The predicted interior structures lie well within previously published results, which generally agree on Neptune having a small iron-silicate core of about 20% of the planet’s radius and an atmosphere of about 30–40%, with a water-rich envelope in between (e.g., Hubbard et al. 1991; Podolak et al. 1995; Nettelmann et al. 2013; Neuenschwander & Helled 2022).

Fig. 5

Predicted log-ratios of the thickness of interior layers for an Earth-like planet with 1 M_⊕ and 1 R_⊕. The ellipses in the top right plots mark the location and covariance of each of the 50 Gaussian kernels, with the colors showing the mixture weight of each kernel.

Fig. 6

Predicted thickness of interior layers for an Earth-like planet with 1 M_⊕ and 1 R_⊕. The colored points mark possible end-member compositions, which are illustrated on the right. The red circle corresponds to Earth’s true interior structure. The diagonal plots show the marginal distributions of each layer, with the blue dashed lines marking the median value and the dotted lines the 5th and 95th percentiles.

5.2 Application to exoplanets

One of the main advantages of using neural networks for interior structure inference over other methods such as MCMC sampling is the speed at which the posterior distribution can be obtained. The inference process for MCMC sampling can take several hours per individual planet (e.g., Haldemann et al. 2023). The MDN can perform the same prediction in fractions of a second. In addition, the MDN model is optimized for bulk processing of inputs owing to the Keras framework, allowing for multiple input samples to be predicted simultaneously. Between 1 and 1000 input data points, we find little difference in the computation time needed by the MDN for a prediction (t_MDN, Table 3). The sampling from the predicted distribution and transformation to mass and radius fractions (see Sect. 3.5) takes up most of the time (t_sampling). Even so, predicting and sampling a thousand planets is possible in under six seconds on a conventional laptop processor². In fact, the main limitation to predicting a large number of planets simultaneously is the amount of available computer memory.

These fast prediction times mean that interior structures can be inferred for every exoplanet for which mass, radius, and equilibrium temperature are known. To demonstrate this, we selected planets from the NASA Exoplanet Archive³ that lie in the parameter space of our training data (Table 1) and for which upper and lower mass and radius uncertainties are given. We used ExoMDN to infer the interior structure of each planet, incorporating the mass, radius, and equilibrium temperature uncertainties according to Sect. 3.6. For each planet, we sampled 5000 mass, radius, and equilibrium temperature points from within a normal distribution given by the uncertainties, and predicted the posterior distribution for each point. From each of these posterior distributions, we then generated an additional ten random samples. In total, this yields 50000 samples of interior structures per planet, forming the full posterior distribution and spanning the range of measurement uncertainties.

Table A.1 shows the 22 planets from this dataset where mass uncertainties are below 10% and radius uncertainties are below 5%. This includes the well studied planets GJ 1214 b, GJ 486 b, and the TRAPPIST-1 planets, among others. For each planet, we provide the predicted median thickness of each interior layer, alongside the ranges in which 90% of the solutions are found. The total time to produce this data was ≈30 s. A more extensive data set of 75 planets with radius and mass uncertainties of 10 and 20%, respectively, including both mass fractions and thickness, is available online at the CDS.

Upcoming exoplanet missions such as PLATO (Rauer et al. 2014) will significantly increase the number of exoplanets with well-determined masses and radii. PLATO in particular will allow for the radii of Earth-sized planets to be determined within an accuracy of up to 3%, while follow-up ground-based observations are expected to constrain the mass of these planets with an accuracy of 10% or better. We can leverage the fast prediction times of ExoMDN to investigate the degree to which the interior of a planet could be constrained based on the accuracy of the mass and radius determination. Taking Earth and Neptune as examples, we imposed a 10% mass uncertainty and varied the radius uncertainty between 1 and 20% (Fig. 8). As above, in each case we sampled 10 000 times from within the mass and radius uncertainties, and took ten random samples from each predicted posterior distribution for a total of 100 000 samples.

We find that with a radius accuracy of 3%, the core radius fraction of Earth can be constrained to $d_{\rm Core} = 0.69_{-0.14}^{+0.10}$ (error bars are the 5th and 95th percentiles, respectively), which is close to the value we found assuming a perfect knowledge of mass and radius (Fig. 6). With a radius accuracy of 10%, the core size is significantly less well constrained ($d_{\rm Core} = 0.68_{-0.27}^{+0.12}$). Similarly, with a radius accuracy of 3%, the predicted atmosphere thickness of a Neptune analog (Fig. 8b) is $d_{\rm Gas} = 0.45_{-0.12}^{+0.12}$, which is again close to the value obtained assuming no error in mass and radius (Fig. 7). With a radius accuracy of 10%, the uncertainty of d_Gas grows to $d_{\rm Gas} = 0.45_{-0.21}^{+0.16}$. Increasing the radius accuracy has little effect on the ability to constrain the layers below the atmosphere. Due to the low density of the atmosphere, different planet radii can be easily accommodated by small changes in atmosphere mass without significantly affecting the other layers. In a sense, the presence of a large atmosphere obscures the inference of the thickness of the deeper layers.
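Values of this form, a median with asymmetric error bars from the 5th and 95th percentiles, can be derived from posterior samples as in the following sketch. The function name is hypothetical and the beta-distributed samples are a synthetic stand-in for a predicted marginal distribution. They are not ExoMDN output.

```python
import numpy as np

def median_with_percentile_errors(samples, lower=5.0, upper=95.0):
    """Return the median and the distances to the lower/upper percentiles."""
    lo, med, hi = np.percentile(samples, [lower, 50.0, upper])
    return med, med - lo, hi - med

# Synthetic stand-in for the marginal posterior of a layer thickness fraction
rng = np.random.default_rng(2)
d_core = rng.beta(8.0, 4.0, size=100_000)

med, minus, plus = median_with_percentile_errors(d_core)
summary = f"d_Core = {med:.2f} (-{minus:.2f}/+{plus:.2f})"
```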

The radius accuracy controls to a large extent the uncertainties in the predicted thickness of the various layers. For completeness, we show in Fig. B.3 predictions of Earth- and Neptune-like interiors obtained when both radius and mass accuracies are varied simultaneously (from 1 to 20% and from 3 to 40%, respectively). Indeed, the inferred structures are very similar to those we obtained by fixing the mass accuracy to 10%.

Table 3

Average inference time for different numbers of planets.

Fig. 7

Predicted thickness of interior layers for a Neptune-like planet with 17.1 M_⊕ and 3.865 R_⊕. The diagonal plots show the marginal distributions of each layer, with the blue dashed lines marking the median value and the dotted lines the 5th and 95th percentiles.^(*) Instead of Neptune’s equilibrium temperature of 51 K, a value of 100 K was used to be in line with the parameter range of the training data.

Fig. 8

Effect of radius uncertainty, δR, on the ability to constrain the interior for Earth (a) and Neptune (b) analogs. Each panel shows the marginal distributions for each interior layer with increasing amounts of radius uncertainty. An uncertainty of 10% in mass and 2% in T_eq has been assumed for both planets in all cases.^(*) Instead of Neptune’s equilibrium temperature of 51 K, a value of 100 K was used to be in line with the parameter range of the training data.

5.3 Constraining the interior with k₂

Mass, radius, and equilibrium temperature alone are not sufficient to fully constrain the interior of a planet, as demonstrated above. The fluid Love number k₂ is a potential direct link from observation to interior structure, as it only depends on the density distribution in the planet. This stands in contrast to, for example, the elemental abundances of the host star, which may be representative of the bulk abundances of the planet and its atmosphere (Dorn et al. 2015, 2017a; Brugger et al. 2017; Spaargaren et al. 2020), but which necessitate additional assumptions about the planet formation and evolution history.

Figure 9 shows the MDN prediction of the interior of Earth, given knowledge of Earth’s value of k₂ = 0.933 (Lambeck 1980). With this added information, the MDN is capable of fully constraining Earth’s actual interior (particularly in comparison to Fig. 6). In fact, the constraints from k₂ are strong enough that the composition of the iron core becomes important. The planets in the training data are modeled with a pure iron core, while Earth has about 10–15 wt.% of lighter elements in its core (Poirier 1994). Thus, the MDN predicts a smaller core radius of 51%, while the true core size is about 54.5% of the total radius. In practice, of course, measurements of k₂ for exoplanets will be associated with considerable uncertainties. Constraining a planet’s interior to the degree shown in Fig. 9 is therefore unlikely in the near future. Nevertheless, we can utilize the fast predictions of the MDN to estimate the accuracy that would be needed to properly constrain the interior. We performed a number of predictions for Earth and Neptune analogs with increasing k₂ uncertainties, in addition to mass, radius, and equilibrium temperature uncertainties of 5, 3, and 2%, respectively. These values are representative of a very well studied and characterized exoplanet, which would likely be needed for an accurate measurement of the Love number. For Neptune, we took a value of k₂ = 0.392, which we calculated after Hubbard (1984) from the gravitational moment J₂ = 3.408 43 × 10⁻³ (Jacobson 2009). The predicted results are detailed in Fig. 10. As the uncertainty in k₂ grows, the interior of both planets becomes less and less constrained. We find that with a k₂ uncertainty of 10%, Earth’s core and mantle thickness could be constrained to about ±13% of their actual value (within the 5th and 95th percentiles). With a k₂ uncertainty of 20%, mantle and core can be constrained to within ±17%. Even with large k₂ uncertainties, Earth could be clearly identified as a rocky planet with very little water and a thin atmosphere. The uncertainties of mass and radius put a limit on how well the interior can be determined. With the given mass and radius uncertainties, we find that in the Earth-like case, k₂ uncertainties lower than 10% do not constrain the interior further. For Neptune, a 10% uncertainty in k₂ could help constrain the atmospheric thickness to $22_{-13}^{+10}\%$ of the planet’s radius.
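The conversion from J₂ to a fluid Love number is not spelled out in the text. A commonly used first-order relation for a rotating body in hydrostatic equilibrium is k₂ = 3J₂/q, with q = ω²R³/(GM) the ratio of centrifugal to gravitational acceleration at the equator. The sketch below assumes that this is the relation meant here. Only J₂ is taken from the text. The rotation period, equatorial radius, and GM of Neptune are approximate values assumed for this example.

```python
import math

def fluid_love_number(j2, rotation_period_s, radius_m, gm_m3_s2):
    """First-order hydrostatic estimate k2 = 3 * J2 / q,
    with q = omega^2 * R^3 / (G * M)."""
    omega = 2.0 * math.pi / rotation_period_s   # spin rate in rad/s
    q = omega**2 * radius_m**3 / gm_m3_s2       # centrifugal-to-gravity ratio
    return 3.0 * j2 / q

k2_neptune = fluid_love_number(
    j2=3.40843e-3,                      # from the text (Jacobson 2009)
    rotation_period_s=16.11 * 3600.0,   # assumed rotation period (~16.11 h)
    radius_m=2.4764e7,                  # assumed equatorial radius (~24 764 km)
    gm_m3_s2=6.8365e15,                 # assumed gravitational parameter GM
)
```

With these assumed inputs the estimate lands close to the k₂ = 0.392 adopted for Neptune, but the result is sensitive to the chosen radius and rotation period, so this should be read as a plausibility check and not as a reconstruction of the original calculation.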

Fig. 9

Predicted thickness of interior layers for an Earth-like planet with 1 M_⊕ and 1 R_⊕, where also k₂ is known (k₂ = 0.933) in addition to mass and radius. The red circle marks Earth’s true interior structure. The diagonal plots show the marginal distributions of each layer, with the blue dashed lines marking the median value and the dotted lines the 5th and 95th percentiles. Compared to Fig. 6, the axis limits have been adjusted to better show the model results.

6 Discussion and conclusions

MDNs can provide a reliable way to rapidly characterize the interior structure of exoplanets within fractions of a second. Compositional data such as mass fractions of individual interior layers can be easily accommodated in the network by using log-ratio transformations. An additional benefit of a machine learning approach over other inference methods is that the forward model computations are decoupled from the actual inference process. The training data are calculated separately before training and the training process encodes the information from the training data into the network weights. The trained network itself is standalone and interior inferences can be performed without requiring the training data, a dedicated interior model, a separate inference scheme, or prior expertise about exoplanet interior modeling. This stands in contrast to MCMC sampling, where running the data-generating forward model during the inference is an integral part of exploring the posterior distributions.

While in this work the training data were generated from a single forward model, this is actually not necessary for the training of the network. Since the forward model data generation is separated from the inference itself, different parts of the data set can be modeled by different dedicated forward models, for example to include both Jupiter-like and low-mass planets which may require different modeling approaches. Importantly, this means that the training data can be computed, collected, and combined from multiple sources without much overhead and without the need to integrate different numerical codes into a single model, as would be needed for MCMC sampling. Furthermore, this means that this method is easily extendable to different models and applicable to other inverse problems. However, the necessary prior generation of training data locks the model assumptions of the forward model into the training data. Changing the forward model therefore requires computing a new set of training data and training of a new neural network. This may be a drawback if the forward model assumptions often change (e.g., with different atmosphere compositions).

Our method is best suited for problems where the number of constraining parameters is relatively small. The required number of training samples increases (potentially exponentially) with each additional parameter, a phenomenon which has been termed the “curse of dimensionality” (Bellman 1966). This can make the generation and handling of training data cumbersome and time-consuming for larger numbers of constraining parameters, as it is generally necessary to sample from the entire investigated parameter space to achieve good MDN performance. A potential way to alleviate this issue could be to generate the training data “on the fly” and train the network with an incremental learning approach (van de Ven et al. 2022), where the network learns continuously with new incoming data, thus reducing the need to save large amounts of data.

Conditional invertible neural networks (cINN) may be another alternative, as demonstrated by Haldemann et al. (2023). These potentially work better with higher-dimensional data while requiring comparatively less training data, with the tradeoff that the network setup is more complex and predictions are generally slower.

As with other machine learning methods, the nature of the training process introduces a small amount of intrinsic noise into the model. However, we have shown that the errors introduced by this are generally small (see Sect. 4.1), particularly for exoplanets where uncertainties in the observable quantities are relatively large.

The file size of the fully trained model is only ≈6.8 MB, which facilitates sharing and distribution online. The posterior distributions predicted by ExoMDN provide a first characterization of newly observed planets, which can then be further explored with dedicated models. ExoMDN’s posterior distributions could also be employed as advanced priors for MCMC inferences based on more sophisticated forward models to help speed up their convergence. We believe that ExoMDN is a valuable tool for the exoplanet science community to gain access to a rapid first characterization of the possible interiors of low-mass planets.

Fig. 10

Effect of k₂ uncertainty on the ability to constrain the interior for Earth (a) and Neptune (b) analogs. Each panel shows the marginal distributions for each interior layer with increasing amounts of k₂ uncertainty. An uncertainty of 5% in mass, 3% in radius, and 2% in T_eq has been assumed for both planets in all cases.^(*) Instead of Neptune’s equilibrium temperature of 51 K, a value of 100 K was used to be in line with the parameter range of the training data.

¹

https://github.com/philippbaumeister/ExoMDN

²

All predictions were performed on an Intel® Core™ i5-8250U CPU.

³

https://exoplanetarchive.ipac.caltech.edu/

Acknowledgements

We thank Heike Rauer for important suggestions and discussions on the role of uncertainties in mass and radius, and an anonymous referee for their comments, which helped improve a previous version of the manuscript. We acknowledge the support of the DFG priority program SPP 1992 “Exploring the Diversity of Extrasolar Planets” (TO 704/3-1) and of the research unit FOR 2440 “Matter under planetary interior conditions” (PA 3689/1-1). Training of ExoMDN was performed on a GPU Workstation also sponsored by the DFG research unit FOR 2440 (grant number RE 882/19-2), which is gratefully acknowledged.

Appendix A Exoplanet predictions

Table A.1

Predicted thicknesses d_i of interior layers for exoplanets with well known masses and radii.

Appendix B Additional figures

Fig. B.1

Predicted mass fraction of interior layers for an Earth-like planet with 1 M_⊕ and 1 R_⊕. The red circle corresponds to Earth’s true interior structure. Due to the presence of lighter elements in Earth’s core, the actual core mass fraction of Earth (33%) is slightly lower than what is predicted by ExoMDN for a solution with no water and atmosphere (39%). The diagonal plots show the marginal distributions of each layer, with the blue dashed lines marking the median value and the dotted lines the 5th and 95th percentiles.

Fig. B.2

Predicted mass fraction of interior layers for a Neptune-like planet with 17.1 M_⊕ and 3.865 R_⊕. The diagonal plots show the marginal distributions of each layer, with the blue dashed lines marking the median value and the dotted lines the 5th and 95th percentiles.^(*) Instead of Neptune’s equilibrium temperature of 51 K, a value of 100 K was used to be in line with the parameter range of the training data.

Fig. B.3

Effect of radius (δR) and mass (δM) uncertainty on the ability to constrain the interior for Earth (a) and Neptune (b) analogs. Each panel shows the marginal distributions for each interior layer with increasing amounts of radius and mass uncertainty. A T_eq uncertainty of 2% has been assumed for both planets.^(*) Instead of Neptune’s equilibrium temperature of 51 K, a value of 100 K was used to be in line with the parameter range of the training data.

Fig. B.4

Comparison of the two possible approaches to incorporate measurement uncertainties (Sect. 3.6) for an Earth-like planet with 5% radius and 10% mass uncertainty. 5000 posterior distributions were predicted from random mass, radius, and T_eq inputs within the uncertainties. The red line shows the predicted thickness of each interior layer obtained by first summing up all 5000 posterior distributions and then taking 5000 random samples from the combined mixture, while the dark blue, light blue, and green lines show those obtained by first taking 1, 10, and 100 samples, respectively, from each of the 5000 posteriors and then combining the samples.

Fig. B.5

Illustration of the differences in interior models between the previous work (Baumeister et al. 2020, black line) and this work (blue and red lines, for two different equilibrium temperatures T_eq = 100 K and T_eq = 1000 K, respectively). The figure shows density profiles of a representative 5 M_⊕ planet with w_Core = 0.2, w_Mantle = 0.49, w_Water = 0.3, and w_Gas = 0.01 (Panel a). Panel b shows a zoomed-in view of only the water and atmosphere layers. The dashed lines mark the respective planets’ radii.

References

Abadi, M., Agarwal, A., Barham, P., et al. 2016, ArXiv e-prints [arXiv:1603.04467]
Acuña, L., Deleuil, M., Mousis, O., et al. 2021, A&A, 647, A53
Agarwal, S., Tosi, N., Kessel, P., Breuer, D., & Montavon, G. 2021a, Phys. Rev. Fluids, 6, 113801
Agarwal, S., Tosi, N., Kessel, P., et al. 2021b, Earth and Space Science, 8, 4
Aitchison, J. 1982, J. R. Stat. Soc. Ser. B Methodol., 44, 139
Akinsanmi, B., Barros, S. C. C., Santos, N. C., et al. 2019, A&A, 621, A117
Alibert, Y., & Venturini, J. 2019, A&A, 626, A21
Atkins, S., Valentine, A. P., Tackley, P. J., & Trampert, J. 2016, Phys. Earth Planet. Int., 257, 171
Auddy, S., Dey, R., Lin, M.-K., Carrera, D., & Simon, J. B. 2022, ApJ, 936, 93
Baumeister, P., Padovan, S., Tosi, N., et al. 2020, ApJ, 889, 42
Bellman, R. 1966, Science, 153, 34
Bishop, C. M. 1994, Mixture Density Networks, Tech. Rep., Aston University
Bouchet, J., Mazevet, S., Morard, G., Guyot, F., & Musella, R. 2013, Phys. Rev. B, 87, 094102
Brando, A. 2017, Master’s Thesis, Universitat Politecnica de Catalunya, Spain
Brugger, B., Mousis, O., Deleuil, M., & Deschamps, F. 2017, ApJ, 850, 93
Cambioni, S., Asphaug, E., Emsenhuber, A., et al. 2019, ApJ, 875, 40
Chaushev, A., Raynard, L., Goad, M. R., et al. 2019, MNRAS, 488, 5232
Chollet, F. et al. 2015, Keras, https://keras.io/
Csizmadia, Sz., Hellard, H., & Smith, A. M. S. 2019, A&A, 623, A45
Dorn, C., & Heng, K. 2018, ApJ, 853, 64
Dorn, C., Khan, A., Heng, K., et al. 2015, A&A, 577, A83
Dorn, C., Hinkel, N. R., & Venturini, J. 2017a, A&A, 597, A38
Dorn, C., Venturini, J., Khan, A., et al. 2017b, A&A, 597, A37
Emsenhuber, A., Cambioni, S., Asphaug, E., et al. 2020, ApJ, 891, 6
Fortney, J. J., Marley, M. S., & Barnes, J. W. 2007, ApJ, 659, 1661
Goodfellow, I., Bengio, Y., & Courville, A. 2017, Deep Learning (Cambridge: MIT Press)
Guillot, T. 2010, A&A, 520, A27
Hakim, K., Rivoldini, A., Van Hoolst, T., et al. 2018, Icarus, 313, 61
Haldemann, J., Alibert, Y., Mordasini, C., & Benz, W. 2020, A&A, 643, A105
Haldemann, J., Ksoll, V., Walter, D., et al. 2023, A&A, 672, A180
Hellard, H., Csizmadia, S., Padovan, S., et al. 2019, ApJ, 878, 119
Himes, M. D., Harrington, J., Cobb, A. D., et al. 2022, Planet. Sci. J., 3, 91
Hinkel, N. R., & Unterborn, C. T. 2018, ApJ, 853, 83
Hirose, K., Wood, B., & Vočadlo, L. 2021, Nat. Rev. Earth Environ., 2, 645
Holland, T. J. B., & Powell, R. 2011, J. Metamorph. Geol., 29, 333
Huang, C., Rice, D. R., & Steffen, J. H. 2022, MNRAS, 513, 5256
Hubbard, W. B. 1984, Planetary Interiors (New York, N.Y.: Van Nostrand Reinhold)
Hubbard, W. B., Nellis, W. J., Mitchell, A. C., et al. 1991, Science, 253, 648
Jacobson, R. A. 2009, AJ, 137, 4322
Jontof-Hutter, D. 2019, Ann. Rev. Earth Planet. Sci., 47, 141
Kellermann, C., Becker, A., & Redmer, R. 2018, A&A, 615, A39
Lambeck, K. 1980, The Earth’s Variable Rotation: Geophysical Causes and Consequences, Cambridge Monographs on Mechanics (Cambridge: Cambridge University Press)
Leconte, J., & Chabrier, G. 2012, A&A, 540, A20
MacKenzie, J., Grenfell, J. L., Baumeister, P., et al. 2023, A&A, 671, A65
Malik, A., Moster, B. P., & Obermeier, C. 2021, MNRAS, 513, 5505
Márquez-Neila, P., Fisher, C., Sznitman, R., & Heng, K. 2018, Nat. Astron., 2, 719 [CrossRef] [Google Scholar]
Martin, C., & Duhaime, D. 2019, https://zenodo.org/record/2578015 [Google Scholar]
Nair, V., & Hinton, G. E. 2010, in Proceedings of the 27th International Conference on International Conference on Machine Learning, ICML’10 (Omnipress), 807 [Google Scholar]
Nettelmann, N., Fortney, J. J., Kramm, U., & Redmer, R. 2011, ApJ, 733, 2 [NASA ADS] [CrossRef] [Google Scholar]
Nettelmann, N., Helled, R., Fortney, J. J., & Redmer, R. 2013, Planet. Space Sci., 77, 143 [Google Scholar]
Neuenschwander, B. A., & Helled, R. 2022, MNRAS, 512, 3124 [CrossRef] [Google Scholar]
O’Malley, T., Bursztein, E., Long, J., et al. 2019, https://github.com/keras-team/keras-tuner [Google Scholar]
Padovan, S., Spohn, T., Baumeister, P., et al. 2018, A&A, 620, A178 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Pawlowsky-Glahn, V., & Egozcue, J. J. 2006, Geol. Soc. London Spec. Pub., 264, 1 [NASA ADS] [CrossRef] [Google Scholar]
Plotnykov, M., & Valencia, D. 2020, MNRAS, 499, 932 [CrossRef] [Google Scholar]
Podolak, M., Weizman, A., & Marley, M. 1995, Planet. Space Sci., 43, 1517 [NASA ADS] [CrossRef] [Google Scholar]
Poirier, J.-P. 1994, Phys. Earth Planet. Interiors, 85, 319 [NASA ADS] [CrossRef] [Google Scholar]
Rauer, H., Catala, C., Aerts, C., et al. 2014, Exp. Astron., 38, 249 [Google Scholar]
Rogers, L. A., & Seager, S. 2010, ApJ, 712, 974 [Google Scholar]
Saumon, D., Chabrier, G., & van Horn, H. M. 1995, ApJS, 99, 713 [NASA ADS] [CrossRef] [Google Scholar]
Seager, S., Kuchner, M., Hier-Majumder, C., & Militzer, B. 2007, ApJ, 669, 1279 [Google Scholar]
Sotin, C., Grasset, O., & Mocquet, A. 2007, Icarus, 191, 337 [Google Scholar]
Spaargaren, R. J., Ballmer, M. D., Bower, D. J., Dorn, C., & Tackley, P. J. 2020, A&A, 643, A44 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Spiegel, D. S., Fortney, J. J., & Sotin, C. 2014, Proc. Natl. Acad. Sci., 111, 12622 [NASA ADS] [CrossRef] [Google Scholar]
Tateno, S., Hirose, K., Sata, N., & Ohishi, Y. 2009, Earth Planet. Sci. Lett., 277, 130 [CrossRef] [Google Scholar]
Unterborn, C. T., & Panero, W. R. 2019, J. Geophys. Res. Planets, 124, 1704 [NASA ADS] [CrossRef] [Google Scholar]
Valencia, D., Sasselov, D. D., & O’Connell, R. J. 2007, ApJ, 665, 1413 [Google Scholar]
Valizadegan, H., Martinho, M. J. S., Wilkens, L. S., et al. 2022, ApJ, 926, 120 [NASA ADS] [CrossRef] [Google Scholar]
van de Ven, G. M., Tuytelaars, T., & Tolias, A. S. 2022, Nat. Mach. Intell., 4, 1185 [CrossRef] [Google Scholar]
Van Hoolst, T., Noack, L., & Rivoldini, A. 2019, Adv. Phys. X, 4, 1630316 [NASA ADS] [Google Scholar]
Wagner, F., Sohl, F., Hussmann, H., Grott, M., & Rauer, H. 2011, Icarus, 214, 366 [Google Scholar]
Zeng, L., & Sasselov, D. 2013, PASP, 125, 227 [Google Scholar]
Zeng, L., Jacobsen, S. B., Sasselov, D. D., et al. 2019, Proceedings of the Natl. Acad. Sci., 116, 9723 [NASA ADS] [CrossRef] [Google Scholar]
Zhao, Y., & Ni, D. 2021, A&A, 650, A177 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Zingales, T., & Waldmann, I. P. 2018, AJ, 156, 268 [NASA ADS] [CrossRef] [Google Scholar]

All Tables

Table 1

Prior distributions of model parameters for training data generation.

Table 2

Average Hellinger distance $\bar{H}$ for 20 randomly selected validation planets of all MDN (log-ratio) output distributions and of their corresponding transformed parameters.
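
The Hellinger distance used in Table 2 measures how far apart two probability distributions are, on a scale from 0 (identical) to 1 (no overlap). As a minimal sketch, assuming the standard discrete definition $H(P,Q) = \frac{1}{\sqrt{2}}\sqrt{\sum_i \left(\sqrt{p_i} - \sqrt{q_i}\right)^2}$ evaluated on histograms over a common grid, it could be estimated from two sets of samples as shown below. The function names, the bin settings, and the two toy Gaussian sample sets are illustrative assumptions and are not taken from the ExoMDN code.

```python
import math
import random

def histogram(samples, lo, hi, n_bins):
    """Normalised histogram on [lo, hi]: bin probabilities that sum to 1."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for x in samples:
        if lo <= x <= hi:
            idx = min(int((x - lo) / width), n_bins - 1)
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts]

def hellinger(p, q):
    """Hellinger distance between two discrete distributions on the same bins."""
    s = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(s / 2.0)

# Toy data: two slightly different Gaussians standing in for a reference
# posterior and a predicted posterior (made-up values).
rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(20000)]
prediction = [rng.gauss(0.2, 1.1) for _ in range(20000)]

p = histogram(reference, -5.0, 5.0, 50)
q = histogram(prediction, -5.0, 5.0, 50)
print(f"H = {hellinger(p, q):.3f}")
```

Both histograms must share the same bin edges; otherwise the bin-by-bin comparison is not meaningful.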

Table 3

Average inference time for different numbers of planets.

Table A.1

Predicted thicknesses d_i of interior layers for exoplanets with well known masses and radii.

All Figures

Fig. 1

Schematic overview of the MDN architecture and inference procedure.

Fig. 2

Radius accuracy of Model 1 after recalculating the planet interior based on the MDN prediction. Panel a shows the distribution of the relative radius error of 10 000 sample points. The blue line marks the median, with the blue area showing the range where 80% of values lie. Panels b–e show the standard deviation in relative radius errors σ for a variety of planet parameters: bulk density (b), planet mass (c), equilibrium temperature (d), and average atmosphere thickness $\bar{d}_{\rm Gas}$ of recomputed planet samples (e). Each point represents one of 500 planets from the test data set (see the text for more details).

Fig. 3

Radius accuracy (a) and k₂ accuracy (b) of Model 2 after recalculating the planet interior based on the MDN prediction. Panel a shows the distribution of the relative radius error and panel b that of the relative k₂ error, each for 10 000 sample points. Blue lines mark the median, with the blue areas showing the range where 80% of values lie.

Fig. 4

Example of the Hellinger distances for a well-predicted planet (4.722 M_⊕, 1.82 R_⊕) from the 20 validation planets. The blue line marks the independent validation, the orange line shows the ExoMDN prediction.

Fig. 5

Predicted log-ratios of the thickness of interior layers for an Earth-like planet with 1 M_⊕ and 1 R_⊕. The ellipses in the top right plots mark the location and covariance of each of the 50 Gaussian kernels, with the colors showing the mixture weight of each kernel.
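
The kernels shown in Fig. 5 live in log-ratio space, so samples drawn from the mixture have to be mapped back to layer fractions that sum to one. The sketch below illustrates the idea under two explicit assumptions: the log-ratios are additive log-ratios taken with respect to the last component, and each kernel has a diagonal covariance. The kernel parameters are made-up toy values (two kernels instead of 50), and all function names are hypothetical.

```python
import math
import random

def sample_mixture(weights, means, sigmas, n, rng):
    """Draw n points from a Gaussian mixture with diagonal covariances."""
    kernels = rng.choices(range(len(weights)), weights=weights, k=n)
    return [[rng.gauss(m, s) for m, s in zip(means[k], sigmas[k])]
            for k in kernels]

def inverse_alr(z):
    """Map D-1 additive log-ratios ln(x_i / x_D) back to D fractions."""
    expz = [math.exp(v) for v in z]
    denom = 1.0 + sum(expz)
    return [e / denom for e in expz] + [1.0 / denom]

# Toy mixture: 2 kernels in a 3D log-ratio space (4 layers). Values are made up.
weights = [0.6, 0.4]
means = [[1.0, 0.5, -2.0], [0.0, 1.5, -1.0]]
sigmas = [[0.3, 0.2, 0.5], [0.4, 0.3, 0.6]]

rng = random.Random(42)
log_ratios = sample_mixture(weights, means, sigmas, 1000, rng)
fractions = [inverse_alr(z) for z in log_ratios]

# Median of each layer fraction across the samples
for i in range(4):
    column = sorted(f[i] for f in fractions)
    print(f"layer {i}: median fraction = {column[len(column) // 2]:.3f}")
```

Because the back-transformation normalises by the sum of all components, every sample satisfies the closure constraint by construction, whatever values the mixture produces.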

Fig. 6

Predicted thickness of interior layers for an Earth-like planet with 1 M_⊕ and 1 R_⊕. The colored points mark possible end-member compositions, which are illustrated on the right. The red circle corresponds to Earth’s true interior structure. The diagonal plots show the marginal distributions of each layer, with the blue dashed lines marking the median value and the dotted lines the 5th and 95th percentiles.

Fig. 7

Predicted thickness of interior layers for a Neptune-like planet with 17.1 M_⊕ and 3.865 R_⊕. The diagonal plots show the marginal distributions of each layer, with the blue dashed lines marking the median value and the dotted lines the 5th and 95th percentiles. ^(*) Instead of Neptune’s equilibrium temperature of 51 K, a value of 100 K was used to be in line with the parameter range of the training data.

Fig. 8

Effect of radius uncertainty, δR, on the ability to constrain the interior for Earth (a) and Neptune (b) analogs. Each panel shows the marginal distributions for each interior layer with increasing amounts of radius uncertainty. An uncertainty of 10% in mass and 2% in T_eq has been assumed for both planets in all cases. ^(*) Instead of Neptune’s equilibrium temperature of 51 K, a value of 100 K was used to be in line with the parameter range of the training data.

Fig. 9

Predicted thickness of interior layers for an Earth-like planet with 1 M_⊕ and 1 R_⊕, where k₂ is also known (k₂ = 0.933) in addition to mass and radius. The red circle marks Earth’s true interior structure. The diagonal plots show the marginal distributions of each layer, with the blue dashed lines marking the median value and the dotted lines the 5th and 95th percentiles. Compared to Fig. 6, the axis limits have been adjusted to better show the model results.

Fig. 10

Effect of k₂ uncertainty on the ability to constrain the interior for Earth (a) and Neptune (b) analogs. Each panel shows the marginal distributions for each interior layer with increasing amounts of k₂ uncertainty. An uncertainty of 5% in mass, 3% in radius, and 2% in T_eq has been assumed for both planets in all cases. ^(*) Instead of Neptune’s equilibrium temperature of 51 K, a value of 100 K was used to be in line with the parameter range of the training data.

Fig. B.1

Predicted mass fraction of interior layers for an Earth-like planet with 1 M_⊕ and 1 R_⊕. The red circle corresponds to Earth’s true interior structure. Due to the presence of lighter elements in Earth’s core, the actual core mass fraction of Earth (33%) is slightly lower than what is predicted by ExoMDN for a solution with no water and atmosphere (39%). The diagonal plots show the marginal distributions of each layer, with the blue dashed lines marking the median value and the dotted lines the 5th and 95th percentiles.

Fig. B.2

Predicted mass fraction of interior layers for a Neptune-like planet with 17.1 M_⊕ and 3.865 R_⊕. The diagonal plots show the marginal distributions of each layer, with the blue dashed lines marking the median value and the dotted lines the 5th and 95th percentiles. ^(*) Instead of Neptune’s equilibrium temperature of 51 K, a value of 100 K was used to be in line with the parameter range of the training data.

Fig. B.3

Effect of radius (δR) and mass (δM) uncertainty on the ability to constrain the interior for Earth (a) and Neptune (b) analogs. Each panel shows the marginal distributions for each interior layer with increasing amounts of radius and mass uncertainty. A T_eq uncertainty of 2% has been assumed for both planets. ^(*) Instead of Neptune’s equilibrium temperature of 51 K, a value of 100 K was used to be in line with the parameter range of the training data.

Fig. B.4

Comparison of the two possible approaches to incorporate measurement uncertainties (Sect. 3.6) for an Earth-like planet with 5% radius and 10% mass uncertainty. 5000 posterior distributions were predicted from random mass, radius, and T_eq inputs within the uncertainties. The red line shows the predicted thickness of each interior layer obtained by first summing up all 5000 posterior distributions and then taking 5000 random samples from the combined mixture, while the dark blue, light blue, and green lines show those obtained by first taking 1, 10, and 100 samples, respectively, from each of the 5000 posteriors and then combining the samples.
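
The two sampling strategies compared in Fig. B.4 can be sketched as follows. This is a toy illustration only: `toy_posterior` is a hypothetical stand-in for an MDN prediction that returns a one-dimensional, two-kernel Gaussian mixture, and drawing the inputs from Gaussians with the quoted relative uncertainties is an assumption about how inputs are sampled within the uncertainties.

```python
import random

def toy_posterior(mass, radius):
    """Hypothetical stand-in for one MDN prediction (NOT the ExoMDN model).

    Returns a 1D two-kernel Gaussian mixture as (weights, means, sigmas).
    """
    return ([0.7, 0.3],
            [radius - 0.1 * mass, radius + 0.1 * mass],
            [0.05, 0.10])

def sample(weights, means, sigmas, n, rng):
    """Draw n values from a 1D Gaussian mixture."""
    kernels = rng.choices(range(len(weights)), weights=weights, k=n)
    return [rng.gauss(means[k], sigmas[k]) for k in kernels]

def mean(values):
    return sum(values) / len(values)

rng = random.Random(1)
n_inputs = 5000

# Assumed input sampling: Gaussian scatter of 10% in mass and 5% in radius
inputs = [(rng.gauss(1.0, 0.10), rng.gauss(1.0, 0.05)) for _ in range(n_inputs)]
posteriors = [toy_posterior(m, r) for m, r in inputs]

# Strategy 1: merge all posteriors into one mixture, then sample from it.
w_all, mu_all, sig_all = [], [], []
for w, mu, sig in posteriors:
    w_all += [wi / n_inputs for wi in w]  # renormalise so weights sum to 1
    mu_all += mu
    sig_all += sig
combined = sample(w_all, mu_all, sig_all, 5000, rng)

# Strategy 2: take one sample from each posterior, then pool the samples.
pooled = [sample(w, mu, sig, 1, rng)[0] for w, mu, sig in posteriors]

print(f"combined mixture: mean = {mean(combined):.3f}")
print(f"pooled samples:   mean = {mean(pooled):.3f}")
```

In this toy setup both strategies draw from the same overall distribution, so the two sample sets should agree up to sampling noise. The second strategy avoids building one very large mixture, at the cost of one sampling call per posterior.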

Fig. B.5

Illustration of the differences in interior models between the previous work (Baumeister et al. 2020, black line) and this work (blue and red lines, for two different equilibrium temperatures T_eq = 100 K and T_eq = 1000 K, respectively). The figure shows density profiles of a representative 5 M_⊕ planet with w_Core = 0.2, w_Mantle = 0.49, w_Water = 0.3, and w_Gas = 0.01 (panel a). Panel b shows a zoomed-in view of only the water and atmosphere layers. The dashed lines mark the respective planets’ radii.

5.3 Constraining the interior with k₂