Open Access
Issue
A&A
Volume 665, September 2022
Article Number A106
Number of page(s) 29
Section Astronomical instrumentation
DOI https://doi.org/10.1051/0004-6361/202243760
Published online 20 September 2022

© E. Alei et al. 2022

Licence: Creative Commons. Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This article is published in open access under the Subscribe-to-Open model.

1 Introduction

Temperate terrestrial exoplanets are predicted to be very abundant in our galaxy (Bryson et al. 2021). These planets are ideal candidates when searching for life beyond our Solar System. A powerful way to characterize a terrestrial exoplanet in the context of its habitability is by detecting and studying its atmosphere with the goal of constraining its surface conditions. Atmospheric spectra are influenced by many parameters and processes, such as the chemical composition, the temperature structure of the atmosphere, the presence of clouds, and emission and scattering from the surface.

The detection and characterization of potentially habitable, rocky exoplanets is challenging with current facilities. For this reason, there is widespread interest in the community to build new instruments for the search for life in the universe, as reported in the White Paper series in the context of the ESA “Voyage 2050” process, as well as the US Astro 2020 Decadal survey (National Academies of Sciences, Engineering, and Medicine 2021). Space missions that aim at characterizing terrestrial exoplanets have been proposed, such as the Habitable Exoplanet Observatory (HabEx; Gaudi et al. 2020) and the Large Ultraviolet Optical Infrared Surveyor (LUVOIR; Peterson et al. 2017), which focus on the reflected (visible and near-infrared) portion of the planetary spectrum, as well as the Large Interferometer for Exoplanets (LIFE; Quanz et al. 2022, hereafter Paper I), which will characterize terrestrial planets in the thermal (mid-infrared) emitted portion of the planetary spectrum. Using nulling interferometry, LIFE will allow us to constrain the radius and effective temperature of (terrestrial) exoplanets, as well as provide unique information about their atmospheric structure and composition (Dannert et al. 2022; Konrad et al. 2022, hereafter Papers II and III, respectively).

Due to the current lack of high-quality observational data, we must rely for now on simulated observations of terrestrial planets to create and improve the analysis algorithms, but also to provide scientific and technical requirements when planning a mission. This effort is currently ongoing within the LIFE Initiative; in a previous study (Paper III), we built a Bayesian retrieval routine to estimate the planetary and atmospheric parameters of a simulated modern Earth twin at a distance of 10 pc as it would be observed by LIFE. In this work, we extend this exercise to other stages in the evolution of Earth’s atmosphere.

Our planet has been habitable for about 4.4 billion years (see, e.g., Heller et al. 2021, and references therein). In this context, we define a planet as habitable if its physical and chemical conditions would allow water, if present, to be liquid on the surface.

In the prebiotic stage of Earth’s evolution, the atmosphere lacked O2 (currently about 21% of the atmospheric composition by volume). It was instead a CO2-N2-H2O-rich atmosphere, with traces of CH4 from volcanism. Early forms of life developed under a reducing environment and survived under anaerobic conditions (Olson et al. 2018, and references therein). Methanogenesis was thought to be a dominant metabolism at this stage (around 3.5 Ga), which would explain the increase in CH4 in the atmosphere (see, e.g., Wolfe & Fournier 2018).

Around 3 Ga, life-forms that could use carbon dioxide to produce oxygen (via oxygenic photosynthesis) appeared (Marais 2000). This eventually led to a significant increase in O2, maximally up to ~1% of the present atmospheric level (PAL; see Gregory et al. 2021; Lyons et al. 2014, 2021, and references therein) around 2.33 Ga (Luo et al. 2016), during the so-called Great Oxygenation Event (GOE). There is also evidence pointing to a second increase in the O2 abundance (up to ~10% PAL) that occurred around 0.8 Ga, during the “Neoproterozoic Oxygenation Event” (NOE; Shields-Zhou & Och 2011; Campbell & Squire 2010).

The high abundance of carbon dioxide in the early Earth’s atmosphere would have enhanced the atmospheric greenhouse effect, allowing Earth to be habitable despite the fainter solar irradiation (see, e.g., Feulner 2012, and references therein). The negative (stabilizing) feedback between the carbon-silicate cycle and the increase in irradiation would have then allowed temperatures conducive to liquid water to be maintained over the last 4 Ga. The increase in irradiation from the Sun over the eons has made the weathering of CO2 more efficient, decreasing the amount of carbon dioxide in the atmosphere and thus dampening the atmospheric greenhouse effect (see, e.g., Graham 2021, and references therein). The appearance of photosynthetic life-forms and the onset of plate tectonics also contributed to the depletion of atmospheric CO2.

Numerous processes, including biology and geology, have driven the wide-ranging evolution of Earth’s atmosphere during the various epochs of its development. Our modern atmosphere represents only a small fraction of Earth’s evolutionary states. It is therefore important to simulate a suitable range of different atmospheric epochs from Earth’s history when investigating Earth-like atmospheres. For this study, we simulated observations obtained by LIFE starting from theoretical spectra of four distinct epochs of Earth’s atmospheric evolution, produced from a self-consistent one-dimensional climate and photochemistry model coupled with a line-by-line radiative transfer model (Rugheimer & Kaltenegger 2018). The observed spectra were simulated using the LIFE noise simulator LIFESIM (for details on the simulator, see Paper II). We then used the Bayesian retrieval routine presented in Paper III to characterize the different atmospheres.

We aim to address a number of research questions. The science-driven questions are: (1) how well LIFE will be able to characterize atmospheres of habitable planets; (2) whether LIFE will be able to differentiate between different atmospheres, and with what confidence; (3) what the impact of clouds on this assessment will be; and (4) what the most promising (combinations of) detectable biosignatures are.

Technology- and computationally driven questions include: (1) whether the combination of the spectral resolution (R = λ/∆λ), signal-to-noise ratio (S/N), and wavelength range defined in Paper III is still adequate for this case study; (2) what the caveats and limitations of the Bayesian retrieval routine are; and (3) what systematics may arise when comparing two different models (e.g., in terms of differences in line lists, scattering treatment, and identification of biomarkers).

We discuss how we adapted the input spectra to simulate LIFE observations and describe the grid of scenarios in Sect. 2. We show and describe the results in Sect. 3. A thorough discussion of our findings and of the potential systematic uncertainties of the retrieval routine is provided in Sect. 4. In Sect. 5, we report the main takeaway points from this study, and in Sect. 6 we provide an overview of ongoing and future studies.

2 Methods

We start by discussing the details of the input spectra that were used in this study (Sect. 2.1). We then discuss the updates on the Bayesian retrieval routine (Sect. 2.2). We describe the assumptions that our model takes into account and the main potential source of systematic errors in the retrievals in Sect. 2.3.

2.1 Input spectra and scenarios

We considered spectra corresponding to four different evolutionary epochs of Earth: the prebiotic Earth (3.9 Ga), the Earth shortly after the GOE (2.0 Ga), the NOE (0.8 Ga), and modern Earth. All considered Earth spectra were produced by Rugheimer & Kaltenegger (2018). These self-consistent spectra were produced using a one-dimensional convective-radiative transfer model loosely coupled with a one-dimensional climate model and a one-dimensional photochemistry model. The authors accounted for the thermal chemistry and photochemistry of more than 55 species. The atmospheres were modeled up to a pressure of 10⁻⁴ bar and split into 100 layers. The radiative forcing of clouds was included by adjusting the surface albedo of the planet.

The results of the photochemistry-climate-radiative model were then fed to a line-by-line radiative transfer model to produce emission spectra. The line lists and the pressure broadening coefficients were from the HITRAN 2016 database (Gordon et al. 2017). Surface scattering was included in the calculations, assuming 70% ocean, 2% coast, and 28% land. In some scenarios, a partial cloud coverage was directly included in the calculation of the emission spectrum. In these cloudy cases, the authors assumed a 60% cloud coverage (split into 40% water clouds at 1 km altitude, 40% water clouds at 6 km altitude, and 20% ice clouds at 12 km altitude) consistent with an averaged Earth cloud model. Aerosols were not included in the calculation. The model has been validated from the visual to the mid-infrared (MIR) wavelength ranges with observations of Earth (Kaltenegger et al. 2007; Kaltenegger & Traub 2009; Rugheimer et al. 2013). For further details, we refer the reader to Rugheimer & Kaltenegger (2018).

For each epoch, we considered both clear sky and cloudy sky spectra, which yields a total of eight scenarios. We assigned every modeled scenario an identifier and a specific color, as listed in Table 1. We use these identifiers throughout the remainder of the paper.

We simulated observations with LIFE via the LIFESIM tool (see Paper II for a description of the simulator). LIFESIM estimates the wavelength-dependent S/N considering all major astrophysical noise sources (stellar leakage, local zodiacal dust emission, and exo-zodiacal dust emission). We considered an Earth-sized planet on a 1 AU orbit around a Sun-like star at a 10 pc distance. For our baseline analyses, we assumed the nominal simulation parameters for LIFESIM as summarized in Table 2 (see Paper I and Paper II for details). We considered an exo-zodi level of three times the local zodiacal dust density, based on the results from the HOSTS survey (Ertel et al. 2020). Similarly to Paper III, we assumed that the noise does not impact the flux. Rather, the noise calculated by LIFESIM represents the uncertainty of each spectral point. This might lead to optimistic results. However, running retrievals on non-randomized spectra can still provide useful information, since they approximate the average retrieval behavior on randomized spectra (see Appendix C of Paper III for a detailed discussion).
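The noise treatment described above can be illustrated with a minimal sketch. It assumes that a wavelength-dependent S/N is already available; here a constant S/N of 10, an arbitrary wavelength grid, and a placeholder spectrum stand in for actual LIFESIM output, and the function name is hypothetical. The noise only sets the 1σ error bar of each spectral point, while a randomized variant is included for comparison.

```python
import numpy as np

def attach_uncertainties(flux, snr, randomize=False, seed=0):
    """Return (observed_flux, sigma) for a spectrum with per-bin S/N.

    sigma = flux / snr is the 1-sigma uncertainty of each spectral point.
    With randomize=False the flux is left untouched (the approach in the text);
    with randomize=True Gaussian noise of width sigma is added to the flux.
    """
    flux = np.asarray(flux, dtype=float)
    sigma = flux / np.asarray(snr, dtype=float)
    if randomize:
        rng = np.random.default_rng(seed)
        return flux + rng.normal(0.0, sigma), sigma
    return flux.copy(), sigma

# Toy example: smooth placeholder spectrum on an arbitrary grid, S/N = 10 everywhere.
wavelength = np.linspace(4.0, 18.0, 50)
true_flux = 1.0 + 0.5 * np.sin(wavelength)
obs_flux, sigma = attach_uncertainties(true_flux, snr=10.0)
```

In the non-randomized case the observed flux equals the true flux, so any offset in the retrieved parameters reflects model differences rather than a particular noise realization.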

Table 1

Model description, identifiers, and colors.

2.2 Updates on the Bayesian retrieval framework

We denote the input spectra from Sect. 2.1 as the “true spectra.” To simulate a LIFE-like observation of these targets, we ran LIFESIM on the true spectra, thus obtaining simulated “observed spectra.” We then performed a retrieval on the observed spectra using petitRADTRANS (Mollière et al. 2019) as “forward model” in the retrieval routine, and the Bayesian sampler model pyMultiNest (Buchner et al. 2014) as “parameter estimation routine” (cf. Paper III).

The theoretical one-dimensional atmospheric model petitRADTRANS (Mollière et al. 2019) solves the radiative transfer equation to calculate the spectrum corresponding to a set of input parameters. These parameters describe the bulk properties of the planet (mass and radius), the pressure-temperature (P–T) structure (approximated by a fourth-order polynomial), and the chemical composition of the atmosphere.
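As an illustration of the polynomial P–T parameterization, the following minimal sketch evaluates a fourth-order polynomial. It assumes that the polynomial is expressed in x = log10(P / bar); the coefficient values are invented for illustration and are not those of Table 3.

```python
import numpy as np

def pt_profile(log10_pressure, coeffs):
    """Temperature from a fourth-order polynomial in x = log10(P / bar).

    coeffs = (a4, a3, a2, a1, a0), highest order first, so that
    T(x) = a4*x**4 + a3*x**3 + a2*x**2 + a1*x + a0.
    """
    return np.polyval(coeffs, log10_pressure)

coeffs = (1.0, 10.0, 40.0, 80.0, 290.0)  # illustrative values only
log10_p = np.linspace(-4.0, 0.0, 100)    # from 1e-4 bar to 1 bar
temperature = pt_profile(log10_p, coeffs)

# Under this assumed convention, the temperature at P = 1 bar (x = 0) equals a0.
surface_temperature = pt_profile(0.0, coeffs)
```

With this convention, five coefficients plus the surface pressure fully specify the thermal structure, which keeps the number of retrieved parameters small.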

Our Bayesian retrieval framework iteratively draws combinations of parameters from a set of “priors” that describe the “a priori” probability distribution of each parameter (listed in Table 3) and uses the forward model to compute the corresponding spectra. Then, the Bayesian framework tests how well these calculated spectra fit the observed one using a “likelihood” function (see Eq. (3) in Paper III). In order to sample the prior space efficiently, our retrieval relies on the parameter estimation routine pyMultiNest (Buchner et al. 2014), which is based on MultiNest (Feroz et al. 2009). This routine applies the Nested Sampling algorithm (Skilling 2006) to fit the theoretical spectral model to the observed spectrum and thereby yields estimates and uncertainties for the model parameters. These estimates are the “posterior probability distributions” (or “posteriors”). The posteriors contain the information on which combinations of model parameters best describe the observed spectrum. For more details about the Bayesian retrieval framework, we refer the reader to Paper III.
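The two ingredients named above, priors and likelihood, can be sketched in a few lines. This is a generic illustration and not the retrieval code itself: it assumes a Gaussian likelihood with independent spectral points, and it uses the common nested-sampling convention of mapping samples from the unit interval onto each prior. The function names, the log-uniform treatment of the surface pressure, and the Gaussian width are assumptions made for the example.

```python
import numpy as np
from statistics import NormalDist

def uniform_prior(u, low, high):
    """Map a unit-interval sample u onto a boxcar prior on [low, high]."""
    return low + u * (high - low)

def gaussian_prior(u, mean, std):
    """Map a unit-interval sample u onto a Gaussian prior via its inverse CDF."""
    return NormalDist(mean, std).inv_cdf(u)

def log_likelihood(model_flux, obs_flux, sigma):
    """Gaussian log-likelihood (up to a constant) for independent spectral points."""
    residual = (np.asarray(obs_flux) - np.asarray(model_flux)) / np.asarray(sigma)
    return -0.5 * np.sum(residual**2)

# Illustrative draws: boxcar prior on log10(P0 / bar), Gaussian prior on a radius.
log10_p0 = uniform_prior(0.5, -4.0, 3.0)
radius = gaussian_prior(0.5, 1.0, 0.2)  # mean and width are placeholders
```

A sampler would call such a prior mapping and the likelihood for every proposed parameter combination, with the forward model supplying `model_flux`.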

In our previous work, we argued that the effects of scattering on simulated MIR spectra are negligible at the considered resolutions and LIFESIM noise patterns. In that study, we had to be particularly mindful of the computing time. The version of petitRADTRANS used in Paper III only allowed spectra at R = 1000 to be calculated, which were subsequently binned down to the resolution of the input spectrum (R = 35–100 in Paper III). Calculating a spectrum at R = 1000 excluding scattering required ≈0.5 seconds, whereas including scattering required ≈18 seconds. Since millions of spectra have to be calculated for a single Bayesian retrieval, including scattering was prohibitive with respect to the computing time, especially when considering a large grid of retrieval runs. Recent updates to petitRADTRANS have enabled us to compute spectra at any resolution, provided a grid of correlated-k tables at that resolution is available. These tables can be produced with petitRADTRANS by binning down the R = 1000 opacity tables. We compiled a correlated-k opacity database for R = 50 and used it to produce spectra directly at this resolution. This reduces the computation time per spectrum significantly and allows us to compute emission spectra in ≈0.04 seconds when excluding scattering, and in ≈0.5 seconds when including scattering. This reduction in computing time allowed us to include scattering in the theoretical spectral model.
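The impact of these timings on a full retrieval can be estimated with simple arithmetic. The sketch below uses the per-spectrum times quoted above and assumes one million forward-model calls on a single core; the text only states “millions”, so this number is an order-of-magnitude placeholder, and parallelization is ignored.

```python
SECONDS_PER_DAY = 86_400
N_SPECTRA = 1_000_000  # assumed order of magnitude ("millions" in the text)

# Per-spectrum computing times quoted in the text, in seconds.
timings = {
    "R=1000, no scattering": 0.5,
    "R=1000, scattering": 18.0,
    "R=50, no scattering": 0.04,
    "R=50, scattering": 0.5,
}

def cpu_days(seconds_per_spectrum, n_spectra=N_SPECTRA):
    """Total single-core time in days for n_spectra forward-model calls."""
    return seconds_per_spectrum * n_spectra / SECONDS_PER_DAY

for label, seconds in timings.items():
    print(f"{label}: {cpu_days(seconds):.1f} CPU-days per million spectra")

# Gain from working directly at R = 50 when scattering is included.
speedup_with_scattering = timings["R=1000, scattering"] / timings["R=50, scattering"]
```

Under these assumptions, scattering at R = 1000 costs roughly 208 CPU-days per million spectra, whereas at R = 50 it costs about 6 CPU-days, a factor of 36 less.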

We made further updates to petitRADTRANS. The treatment of collision-induced absorption (CIA) was modified by porting the interpolation of the CIA tables from FORTRAN to Python. Minor variations in the interpolation options (log-linear compared to nearest neighbor in Paper III) make the new model not directly comparable with its previous version. However, under Earth-like conditions, the differences in the CIA signature on the spectrum remain negligible. Also, the CIA features impact mostly the short wavelengths (λ < 6 μm), where the LIFESIM noise is large. Furthermore, the treatment of scattering was updated. Originally implemented in petitRADTRANS only for gaseous planets (see Mollière et al. 2019), it was adapted to include the scattering by a rocky surface. More information on the new implementation of scattering can be found in Appendix A, as well as in the petitRADTRANS documentation.

This new version of petitRADTRANS is available in the main GitLab repository.

Using the updated retrieval framework, we ran retrievals for the eight different spectra introduced in Sect. 2.1. We retrieved the same parameters as in Paper III, leaving most prior distributions unchanged. An exhaustive list of the parameters and priors used, together with the corresponding expected values for the different epochs, is given in Table 3. Most priors are represented by a boxcar function (hereafter “uniform” priors): every value is equally probable if within a certain range. Two notable exceptions are the planetary radius Rpl and the logarithm of the planetary mass log10(Mpl), whose priors are Gaussian. The prior we assumed on Rpl is based on the radius estimate expected from observing a terrestrial planet with LIFE during its search phase (see Paper II for details). The log10(Mpl) prior was inferred from Rpl using the statistical mass-radius relation presented in Chen & Kipping (2016; see Paper III for details).

Table 2

Simulation parameters used in LIFEsim for the baseline analyses.

Table 3

Summary of the parameters used in the retrievals, their expected values, and their prior distributions.

2.3 Assumptions and discrepancies

When performing retrievals, we are limited by the maximum number of parameters that can be retrieved in a reasonable computing time. For this reason, we need to make a few simplifications. First, as in Paper III, we parameterized the P–T profile in the retrieval by a fourth-order polynomial. Second, we assumed the abundances of the considered species to be independent of altitude. Third, our retrieval framework did not model clouds for any of the considered scenarios. This is a strong simplification for the cases where we retrieve cloudy input spectra. However, this retrieval approach allows us to investigate the biases in the results obtained when retrieving a cloudy spectrum assuming a cloud-free atmosphere. The addition of a cloud model to our retrieval framework will be tackled in a future study. Finally, a surface reflectance of 0.1 was assumed across the entire wavelength range. This is a common value for water-rich habitable terrestrial planets, dominated by the low infrared reflectance of oceans and ice.

In contrast, Rugheimer & Kaltenegger (2018) used P–T profiles that were self-consistently calculated by their climate-photochemistry model to generate the input spectra. The chemical abundance profiles were also altitude-dependent and calculated self-consistently by the climate-photochemistry model. In order to compare the results of the retrievals with the input, we approximated the input P–T and abundance profiles to calculate the expected values of the parameters considered in the retrievals. Regarding the thermal structure of the atmosphere, we determined the expected values of the polynomial coefficients (a4, a3, a2, a1, a0) listed in Table 3 by fitting a fourth-order polynomial to the self-consistent P–T profiles. As for the abundances, we assumed the weighted means over the pressure grid of the altitude-dependent abundance profiles. This is sensible, since the denser layers of the atmosphere (which correspond to higher pressures) are generally the ones that contribute more to the spectrum. The expected values for the considered species are listed in Table 3 as well. Regarding the surface reflectance, Rugheimer & Kaltenegger (2018) considered a wavelength-dependent reflectance. However, the reflectance still averaged 0.1 in the wavelength range of interest. We would therefore not expect large variations due to this parameter.
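The two approximations described above can be sketched as follows. The example assumes that the weights of the mean are proportional to the layer pressure and that the polynomial is fitted against log10(P / bar); the exact weighting and fitting conventions are not specified here, and the toy profiles and function names are invented for illustration.

```python
import numpy as np

def pressure_weighted_mean(abundance, pressure):
    """Mean abundance weighted by the pressure of each layer (assumed weighting)."""
    return np.average(abundance, weights=pressure)

def fit_pt_polynomial(pressure, temperature, order=4):
    """Least-squares polynomial fit of T against log10(P); returns (a4, ..., a0)."""
    return np.polyfit(np.log10(pressure), temperature, order)

# Toy 100-layer atmosphere from 1 bar to 1e-4 bar (values are illustrative).
pressure = np.logspace(0.0, -4.0, 100)
abundance = 1e-3 * pressure**0.2                 # mixing ratio decreasing with altitude
temperature = 288.0 + 30.0 * np.log10(pressure)  # simple troposphere-like slope

expected_abundance = pressure_weighted_mean(abundance, pressure)
a4, a3, a2, a1, a0 = fit_pt_polynomial(pressure, temperature)
```

Because the weights favor the high-pressure layers, the expected abundance in this toy case lies above the unweighted mean of the profile, closer to the near-surface value.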

There are also differences between the opacity tables that the two models use. We used the default set of opacities for petitRADTRANS as presented in Mollière et al. (2019). We added the N2O opacity from the ExoMol database (Chubb et al. 2021). The CIA opacities were taken from the HITRAN database. Details and reference papers corresponding to the opacity line lists are shown in Tables 4 and 5. In contrast, the input spectra were calculated using HITRAN 2016 opacities (Rugheimer & Kaltenegger 2018). Differences in line lists and broadening coefficients are therefore to be expected and may cause biases in the results.

Due to these differences in the atmospheric models, we expect to find some small discrepancies between the spectra that were published in Rugheimer & Kaltenegger (2018) and the ones that our framework can calculate. We discuss this particular aspect in Sect. 4.4.

Table 4

References for the molecular opacities used in the retrievals.

3 Results

In this section, we show the results from the retrievals on the grid of different input spectra (see Table 1) assuming the baseline parameters listed in Table 2. We start by analyzing the retrieved spectra (Sect. 3.1) to offer a broad overview of the retrieval performance. Then, we study the retrieved P–T profiles (Sect. 3.2), the planetary parameters (Sect. 3.3) and abundances (Sect. 3.4). We also ran additional retrievals of the same scenarios (shown in Table 1) by varying R and S/N. We compare the results of these retrievals in Sect. 3.5.

Table 5

References for the CIA and Rayleigh opacities used in the retrievals.

3.1 Retrieved emission spectra

The main outputs of the Bayesian retrieval framework are the posterior distributions of the parameters, necessary to produce theoretical spectra, that best match the data. The posteriors can be visualized as an N-dimensional space that is a subset of the larger N-dimensional prior space, N being the number of parameters. Each point included in the posterior space has N coordinates and represents a combination of N parameters that, if fed to the theoretical spectral model, would produce a spectrum that was determined by the Bayesian framework to resemble the observed spectrum.

From the available sets of parameters within the posteriors that the routine has calculated, we can therefore produce “retrieved spectra.” These are shown in Fig. 1. Each subplot presents the results for a specific model, compared to the input spectrum binned down to R = 50. Similarly, Fig. 2 shows the logarithm of the ratio between the retrieved emission spectrum and the input emission spectrum for each scenario.
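A minimal sketch of how such retrieved spectra and their confidence envelopes could be built from posterior samples is given below. It assumes that the envelopes are per-wavelength percentiles of the spectra computed from the posterior samples; the linear toy forward model and the synthetic posterior are hypothetical stand-ins for the actual forward model and retrieval output.

```python
import numpy as np

def confidence_envelopes(spectra, quantile_pairs=((16, 84), (2.5, 97.5))):
    """Per-wavelength percentile envelopes of spectra of shape (n_samples, n_wavelengths)."""
    spectra = np.asarray(spectra)
    return {pair: np.percentile(spectra, pair, axis=0) for pair in quantile_pairs}

def toy_forward_model(wavelength, params):
    """Hypothetical stand-in for the real forward model: offset plus slope."""
    offset, slope = params
    return offset + slope * wavelength

rng = np.random.default_rng(1)
wavelength = np.linspace(4.0, 18.0, 50)

# Synthetic posterior: 500 samples of (offset, slope) around (1.0, 0.1).
posterior_samples = rng.normal([1.0, 0.1], [0.05, 0.01], size=(500, 2))
spectra = np.array([toy_forward_model(wavelength, p) for p in posterior_samples])

envelopes = confidence_envelopes(spectra)
lower_1sigma, upper_1sigma = envelopes[(16, 84)]
lower_2sigma, upper_2sigma = envelopes[(2.5, 97.5)]
```

Plotting the nested envelopes with increasing opacity reproduces the kind of shading described for the figures, with the narrowest band corresponding to the central 68% of the retrieved spectra.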

The retrieved spectra are generally in good agreement with the input spectra (within 1σ) for all considered cases. This shows that our retrieval framework can reproduce the simulated input spectra, regardless of the complexity of the input model (in terms of thermal and abundance profiles, and cloud coverage). However, we notice regions with larger uncertainties, especially at wavelengths shorter than ≈8 μm. Here, the ratio between the retrieved spectra and the input spectrum (as shown in Fig. 2) reaches up to a few orders of magnitude. Such differences are, however, still within the noise uncertainty (gray-shaded areas). Smaller differences can be noticed in the main CO2 band at ≈15 μm. Since the noise is not as high as it is in the short wavelength range, these differences are probably due to discrepancies in the opacity tables (see Sect. 4.4).

The parameter estimation routine included in the Bayesian retrieval framework has the task of minimizing the difference between model output and data. For this, no detailed parameterization of the relevant physical processes is required. Many of the relevant physical and chemical parameters are correlated (e.g., the planetary mass and the pressure; the pressure and the chemical abundances), often in a nonlinear way. As a result of such correlations, the parameter estimation routine can produce spectra similar to those of the full (physical) input model over a diverse set of parameters. Hence, it is appropriate to question whether the results of the parameter estimation routine are physically representative, or whether degeneracies and systematics between the retrieved parameters could be influencing them. The next sections will explore these issues in more detail.

Fig. 1

Retrieved spectra compared to the input spectra (black dots) for the various scenarios, ordered by epoch (columns) and cloud coverage (rows). The gray-shaded area indicates the LIFEsim uncertainty. The color-shaded areas represent the confidence envelopes (darker shading corresponds to a higher confidence). The scenarios are color-coded according to Table 1.

Fig. 2

Ratios between the retrieved flux and the input flux (in logarithmic scale) for the various scenarios, ordered by epoch (columns) and cloud coverage (rows). The gray-shaded area indicates the LIFEsim uncertainty. The color-shaded areas represent the confidence envelopes (darker shading corresponds to a higher confidence). The scenarios are color-coded according to Table 1.

3.2 Retrieved P–T profiles

In Fig. 3, we show the retrieved P–T profiles compared to the input profiles for all combinations of the four epochs (columns) and the two cloud coverages (rows).

The vertical shape of the retrieved P–T profiles in the lower atmosphere (pressures ≥10⁻² bar) roughly follows that of the true P–T profiles. In most cases, the true profiles are contained within the 1σ uncertainty envelope. As in Paper III, the uncertainties grow larger at higher altitudes (pressures ≤10⁻² bar). This indicates that, for the quality of the input spectra we consider for this study, it is not possible to distinguish atmospheres with a stratospheric temperature inversion (i.e., the modern Earth scenario) from those with an isothermal stratosphere (i.e., the NOE, GOE, and prebiotic scenarios). This retrieval limitation is a result of the small overall contribution of the upper atmospheric layers to the planet’s MIR emission spectrum.

A few additional inconsistencies between the retrieved and input P–T profiles are also apparent in the lower altitudes (high pressures). A general feature in the three biotic epochs (modern, NOE, and GOE Earth) is the retrieval of underestimated values for the ground pressure P0 (~0.1 bar as opposed to the true value of ~1 bar). This occurs for both the cloud-free and cloudy spectra. Such an offset could be explained by systematic differences between the radiative transfer models used to produce and retrieve the simulated spectra. We discuss this in more detail in Sect. 4.4. The ground temperatures T0 are on average well retrieved for all clear sky scenarios. In contrast, the retrievals performed for the cloudy spectra systematically underestimate T0, with differences between the retrieved and true value ≲25 K. These results have an impact on assessing the habitability of the simulated exoplanets, which will be discussed in more detail in Sect. 4.

For the prebiotic Earth input spectra, the retrievals provide estimates for P0 and T0 that are in agreement with the true parameter values. Furthermore, the overall uncertainties on the retrieved P–T profile are generally smaller than for the other epochs. However, for the cloudy prebiotic Earth (PRE-C) spectrum, the retrieved P–T profile is a few tens of kelvin warmer than the true value in the intermediate layers of the atmosphere (~10⁻¹ to ~10⁻³ bar). This effect is likely related to the much weaker emission features in the PRE-C scenario compared to the other epochs. We discuss the impact of neglecting clouds in the retrievals in Sect. 4.2.

Fig. 3

Retrieved P–T profiles compared to the input profiles (solid black line) for the various scenarios, ordered by epoch (columns) and cloud coverage (rows). The scenarios are color-coded according to Table 1. In each subplot, we also show an inset plot with the two-dimensional histogram of the retrieved surface P–T values. The 1σ, 2σ, and 3σ confidence levels in the P–T profiles and the two-dimensional histograms are indicated by the increasing intensity of the color fill (darker shading corresponds to a higher confidence).

3.3 Retrieved planetary parameters

Figure 4 shows the posterior density distributions we retrieve for the planetary parameters (Rpl, Mpl) and surface conditions (P0, T0) for all considered input spectra. Rpl, Mpl, and P0 are directly retrieved by our framework, while the T0 posterior is calculated from the posteriors of the P–T parameters (polynomial coefficients ai) and of P0. The results are color-coded based on Table 1 and grouped by epoch (rows) and planetary parameter (columns). The retrieved parameters for the clear and cloudy input spectrum of the same epoch are shown in the same subplot to facilitate comparison.

We obtain good estimates for all of the parameters considered, especially for the cloud-free scenarios. The posterior distributions roughly follow a Gaussian distribution and are typically centered on the true values.

By comparing the retrieved posteriors for Rpl to the corresponding Gaussian prior (shown in Fig. 4 as a dotted line), we observe that the retrieval constrains the planet radius Rpl more tightly than the prior distribution. For the GOE and modern Earth scenarios, the retrieved posteriors do not significantly depend on the cloud coverage. For the NOE and prebiotic input, we underestimate Rpl in the retrievals of the cloudy spectra (retrieved as ~0.8–0.9 R⊕ instead of 1 R⊕). This difference is more pronounced in the prebiotic scenario.

Regarding Mpl, the retrieval analysis does not add further constraints on the estimates for the planet’s mass. Also, there is no noticeable difference in the retrieval results for Mpl between the four epochs. This finding holds for both the clear and cloudy scenarios and is in agreement with the results we presented in Paper III. It is important to note that, in contrast to the prior assumption (see Sect. 2.2), Mpl is not linked to Rpl through a mass-radius relationship during the retrieval, but it is instead a free parameter. This means that both Mpl and Rpl are independently drawn from their respective priors. In the retrieval, we use both parameters to calculate the planet’s surface gravity, which is required to compute the theoretical emission spectrum. In addition to the surface gravity calculation, we also use Rpl to scale the flux emitted per unit area at the top of the atmosphere (as calculated by petitRADTRANS) to the observed exoplanet flux at a distance d from the observer (generally well known, 10 pc in our study). We do so by multiplying the flux at the top of the atmosphere by the factor (Rpl/d)².
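The two roles of Rpl described above can be made explicit with a short sketch. It uses the Newtonian surface gravity g = G Mpl / Rpl² and the inverse-square scaling (Rpl/d)² of the top-of-atmosphere flux; the function names and the rounded physical constants are choices made for this example.

```python
G = 6.674e-11       # gravitational constant, m^3 kg^-1 s^-2
M_EARTH = 5.972e24  # Earth mass, kg
R_EARTH = 6.371e6   # Earth radius, m
PARSEC = 3.0857e16  # parsec, m

def surface_gravity(mass_earth, radius_earth):
    """Surface gravity in m s^-2 from mass and radius in Earth units."""
    return G * mass_earth * M_EARTH / (radius_earth * R_EARTH) ** 2

def scale_flux(flux_toa, radius_earth, distance_pc):
    """Scale top-of-atmosphere flux to the flux received at distance d: F * (R/d)^2."""
    return flux_toa * (radius_earth * R_EARTH / (distance_pc * PARSEC)) ** 2

g_earth = surface_gravity(1.0, 1.0)
dilution = scale_flux(1.0, 1.0, 10.0)  # geometric factor for 1 Earth radius at 10 pc
```

Since Rpl enters both the gravity and the flux scaling while Mpl enters only the gravity, the spectrum carries more direct information on the radius than on the mass.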

Since the surface gravity is typically not directly constrainable due to the gravity-abundance degeneracy (see Sect. 3.5), the retrieval struggles to further constrain Mpl. In contrast, the retrieval does manage to constrain Rpl further, as it does not only depend on the surface gravity, but also on the distance-scaling of the spectrum.

The retrieved posteriors for the surface pressure P0 are significantly narrower than the assumed prior distribution (10⁻⁴ to 10³ bar), meaning that we manage to place strong constraints on P0 with respect to the assumed prior knowledge on the parameter. However, the retrieval tends to underestimate the value of P0 in all cases except for the prebiotic one. The retrieved posteriors in these cases are not well represented by a Gaussian, which indicates that the retrieval results for P0 could be degenerate. We discuss this in more detail in Sect. 4.3.

As shown in Sect. 3.2, the retrieved posteriors for the surface temperature T0 are centered on the true values for the cloud-free retrievals. For the cloudy input spectra, the retrievals tend to underestimate the surface temperature. The standard deviation of the retrieved T0 posteriors is roughly ±20 K for all biotic scenarios. This is in agreement with the findings of Paper III. The spread of the retrieved posteriors could potentially be reduced by increasing the R or S/N of the input spectra (see Sect. 3.5). Observing strategies for the trade-off between R and S/N will be addressed in Sect. 4.3.

Fig. 4

Posterior density distributions for the retrieved exoplanet parameters (columns) for the different epochs (rows) and cloud coverages. We follow the color-coding listed in Table 1 to differentiate the different scenarios. The vertical, solid lines mark the true values for each parameter. The dotted lines in the Rpl and Mpl plots indicate the assumed Gaussian priors. For P0 and T0 we assume broad, flat priors, which are not plotted.

3.4 Retrieved chemical abundance parameters

Figure 5 shows the retrieved posterior distributions for the main atmospheric gases. We again arrange the various scenarios by epoch (row), and atmospheric species (column) and use the color-coding from Table 1. The results for the clear and the cloudy retrievals of one epoch are shown in the same subplot to facilitate comparison.

We plot our expected abundances (listed in Table 3), which are the weighted means (with respect to the pressure) of the original abundance profiles, as black vertical lines. If no true value is plotted, the molecule is not present in the input spectrum. We further indicate the range of variability of the true, pressure-dependent abundance profiles (minimum to maximum) via the shaded gray area in each subplot.
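As a minimal sketch of how a pressure-weighted mean abundance of this kind could be computed, the following Python snippet weights each layer by its pressure. The exact weighting behind Table 3 is not specified in this section, so the function name, the weighting choice, and the three-layer profile are assumptions made purely for illustration.

```python
def pressure_weighted_mean(abundances, pressures):
    """Mean abundance with each layer weighted by its pressure (assumed weighting)."""
    if len(abundances) != len(pressures) or not pressures:
        raise ValueError("need two non-empty sequences of equal length")
    total_weight = sum(pressures)
    return sum(x * p for x, p in zip(abundances, pressures)) / total_weight


# Hypothetical three-layer profile, listed from the surface upward.
pressures = [1.0, 0.1, 0.01]       # bar
abundances = [1e-2, 1e-3, 1e-5]    # mass fractions, illustrative only
mean_abundance = pressure_weighted_mean(abundances, pressures)
print(f"pressure-weighted mean abundance: {mean_abundance:.3e}")
```

With this weighting the deepest layers dominate the mean, which is why the resulting single value can sit near one edge of the minimum-to-maximum range of the profile.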

We adopt the same posterior classification scheme that was introduced in Paper III, for an easier comparison of the results. This scheme divides the retrieved posteriors into the following four classes. The first is the constrained (C) class: the posterior is best described by a Gaussian distribution. This implies that abundances both significantly lower and higher than the true value can be ruled out. The second is the sensitivity limit (SL): the abundance is at the retrieval’s detection limit for the species. The posterior exhibits a distinct peak. However, low abundances are not ruled out. The posterior is best described by the convolution of a soft-step function with a Gaussian. The third is the upper limit (UL): the posterior resembles a soft-step function. Large abundances can be excluded, low ones cannot. The last is unconstrained (UC): we cannot retrieve information on the atmospheric abundance. The posterior resembles a constant function over the full prior range. For further details on the specifics of the posterior classification, we refer the reader to Appendix B of Paper III.
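The four classes can be illustrated with a simple histogram heuristic. The sketch below is not the fitting procedure of Paper III (which compares model functions, see its Appendix B); it is a simplified stand-in that only compares the posterior density near the two edges of the prior range with the peak density. The function name, the bin count, the tolerance, and the synthetic samples are all hypothetical.

```python
import random


def classify_posterior(samples, lo, hi, nbins=20, tol=0.25):
    """Heuristic label (C, SL, UL, UC) from the histogram shape on [lo, hi]."""
    counts = [0] * nbins
    width = (hi - lo) / nbins
    for s in samples:
        if lo <= s < hi:
            counts[min(int((s - lo) / width), nbins - 1)] += 1
    peak = max(counts)
    if peak == 0:
        raise ValueError("no samples inside the prior range")
    low = sum(counts[:3]) / (3 * peak)     # relative density at low abundances
    high = sum(counts[-3:]) / (3 * peak)   # relative density at high abundances
    if high >= tol:
        # High abundances are not excluded; flat over the prior if low is high too.
        return "UC" if low >= tol else "other"
    if low < tol:
        return "C"     # both tails excluded: peaked, Gaussian-like
    if low < 1 - tol:
        return "SL"    # distinct peak on top of a low-abundance plateau
    return "UL"        # plateau only: step-like upper limit


# Synthetic log10-abundance samples on a hypothetical prior range.
rng = random.Random(0)
prior = (-10.0, 0.0)
gaussian_like = [rng.gauss(-3.0, 0.5) for _ in range(20000)]
flat = [rng.uniform(*prior) for _ in range(20000)]
step_like = [rng.uniform(-10.0, -4.0) for _ in range(20000)]
peak_on_plateau = ([rng.uniform(-10.0, -4.0) for _ in range(16000)]
                   + [rng.gauss(-4.0, 0.3) for _ in range(4000)])

for name, data in [("gaussian_like", gaussian_like), ("flat", flat),
                   ("step_like", step_like), ("peak_on_plateau", peak_on_plateau)]:
    print(name, classify_posterior(data, *prior))
```

A model-comparison approach is more robust than such fixed thresholds, but the heuristic makes the qualitative distinction between the classes explicit.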

We obtain UC posteriors for the abundances of N2 and O2 in all retrievals performed. In accordance with the findings presented in Paper III, these molecules are not detectable in any of the considered scenarios. This finding indicates that the corresponding CIA spectral signatures are too weak to be detectable in the considered input spectra with R = 50 and S/N = 10. To increase readability, we choose not to show the retrieval results for N2 and O2 in Fig. 5. The posterior distributions of these molecules can, however, be found in the corner plots in Appendix B. Similarly, the trace gases N2O and CO are not detected in any of our retrievals, and we obtain unconstrained posteriors for all epochs. The MIR absorption features of these molecules at the considered abundances are also too weak in the considered input spectra to be constrained in our retrievals. This can be seen from the flat posterior distributions for both species in all considered cases (see Fig. 5).

We detect CO2 in all retrievals and the retrieved posterior distributions are generally Gaussian-like (C-type posteriors). Our results suggest that the median abundances of the different posterior distributions are higher than the true value for all the epochs. However, in the prebiotic, GOE, and NOE scenarios the true abundances still lie within the 1σ envelope of the retrieved abundances. For the modern Earth scenarios, the true value lies within the 3σ range of the retrieved posterior. This is consistent with a “compensation effect” whereby the retrieval framework is correcting for the underestimated pressure. The degeneracy between chemical composition and atmospheric pressure is well known and it was already encountered in Paper III (see Sect. 4.1). All retrieved CO2 posteriors span about three orders of magnitude (3 dex). They all appear very similar even though the expected values of CO2 span from roughly 0.01% (modern Earth) to the order of 10% (prebiotic Earth). This prevents the use of CO2, one of the major absorbers in the atmosphere, as a discriminator between the considered epochs. To reduce the variance in the retrieved abundances, an increase in R and/or S/N might be recommended (see Sect. 3.5).

For the remaining species (O3, CH4, and H2O), the retrieval results depend on the considered epoch: O3 is retrieved accurately in both the modern Earth and NOE Earth scenarios (C-type posteriors). These are the two cases where O3 is more abundant (~10^−6 in mass fraction). In contrast, for the prebiotic and GOE scenario, we only retrieve upper limits (UL-type posteriors) for the O3 abundance. This means that we can rule out high abundances of O3 (≳10^−6 in mass fraction), but cannot exclude abundances below the retrieved upper limit.

We manage to detect CH4 in the NOE and GOE Earth spectra (C-type posterior), which have a higher CH4 abundance. For the modern and prebiotic Earth spectra, where the abundances are lower, we retrieve an SL-type posterior, which is characterized by a peak in the distribution roughly at the true value and a non-negligible tail toward low abundances. Similar to CO2, CH4 is generally overestimated when detected. However, the true value still lies within the 2σ envelope of the retrieved posterior distribution. The conditional retrieval of O3 and CH4 is particularly significant for the detectability of biosignatures in Earth-like planets with LIFE, which we discuss in Sect. 4.5.

H2O is constrained in both the modern and prebiotic scenarios, as well as the clear NOE Earth (NOE-CF) model, and the retrieved posteriors are centered on the true value. In contrast, we only obtain SL-type posteriors for both the GOE Earth spectra and the cloudy NOE model. In this case, such a difference in retrieval performance cannot be explained by a difference in the abundance of the species, which is fairly constant throughout all the epochs (~10^−2 in mass fraction). This is likely to be related to the overestimation of CH4, a species that is much more abundant in the GOE and NOE scenarios (10^−3 in mass fraction compared to 10^−6 for the modern and prebiotic epochs). The spectral signature of H2O overlaps with that of CH4 between 5 and 7 μm, which is where the noise level is very high and the flux levels are low. Therefore, the retrieval framework favors a higher abundance of CH4 at the expense of a larger uncertainty on H2O in these scenarios.

Finally, our results suggest only modest differences in the retrieved abundances obtained from the clear and cloudy input spectra. The presence of clouds in the spectrum does not seem to deteriorate the abundance estimation capabilities for most molecules. In contrast, as seen in the previous subsection, the characterization of a cloudy atmosphere with a cloud-free model will likely result in biases in the retrieved physical parameters. We discuss this topic further in Sect. 4.2.

Fig. 5

Posterior density distributions for the retrieved species (columns), the different epochs (rows), and the different cloud coverage scenarios. Results from the various scenarios use the color-coding from Table 1. The solid black lines indicate the expected values for each species, which vary depending on the epoch. The gray-shaded area marks the range of values in the vertically nonconstant abundance profiles, which were used to compute the input spectra.

Fig. 6

Retrieved exoplanet parameters for the different scenarios with varying R and S/N values. The error bars denote the 68% confidence intervals. For Mpl and Rpl, we also plot the assumed prior distributions. For T0 and P0, we assumed flat, broad priors. The vertical lines mark the true parameter values.

3.5 Runs at higher resolution and/or signal-to-noise ratio

In this section, we investigate whether our retrieval results can be improved by increasing the quality (R and/or S/N) and thus the information content of the input spectra. We ran ancillary retrievals for the eight scenarios and chose the following combinations of R and S/N: (1) R = 50 and S/N = 10 (the reference case); (2) R = 100 and S/N = 10; (3) R = 50 and S/N = 20; and (4) R = 100 and S/N = 20.

We provide a summary of the results obtained from these additional retrieval runs in Figs. 6 (planetary parameters) and 7 (abundances). Results from the different ancillary runs are represented using different markers. The results for the different epochs are color-coded according to Table 1. Here, we only show the results for the clear input spectra. The plots corresponding to the cloudy scenarios can be found in Appendix C.

We are particularly interested in significant increases in accuracy (i.e., the retrieved values agree better with the input “truth”) or in precision (i.e., the posterior’s variance is reduced). For higher S/N we expect more precise results since the uncertainty in the input spectrum is lower. This yields stronger constraints on the model parameters. An increase in R should allow for a more robust identification and characterization of the spectral features and thus more accurate retrieval results. Increased accuracy and/or precision could allow us to differentiate between the different epochs (and generally between different planets) more clearly. This will be of great importance, especially when searching for signatures of life in exoplanetary atmospheres (see Sect. 4.5). We should point out that in our current simulation setup, which ignores (systematic) instrumental noise terms, doubling R at a constant S/N means doubling the integration time, while doubling S/N at a constant R means, roughly, quadrupling it. This information is crucial for the mission planning and will be further discussed in Sect. 4.3.
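The distinction between accuracy and precision can be made concrete with a small Python sketch operating on posterior samples. It assumes that accuracy is measured as the offset of the posterior median from the truth and precision as the half-width of the central 68% interval (16th to 84th percentile); these definitions, the function name, and the sample values are illustrative assumptions rather than the exact metrics used in our analysis.

```python
import statistics


def accuracy_and_precision(samples, truth):
    """Return (median - truth, half-width of the central 68% interval)."""
    # quantiles(n=100) returns the 99 cut points for the 1st..99th percentiles.
    pct = statistics.quantiles(samples, n=100, method="inclusive")
    median = statistics.median(samples)
    lower, upper = pct[15], pct[83]    # 16th and 84th percentiles
    return median - truth, 0.5 * (upper - lower)


# Hypothetical, evenly spaced "posterior samples" for a parameter with truth 48.
samples = list(range(101))
offset, half_width = accuracy_and_precision(samples, truth=48)
print(f"offset from truth: {offset}, 68% half-width: {half_width}")
```

A retrieval run gains accuracy when the first number moves toward zero and gains precision when the second number shrinks; the two can change independently.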

In Fig. 6, we notice that increasing the S/N to 20 while keeping R = 50 (square markers) generally results in a narrower posterior for Rpl. We observe a reduction in the variance of the Rpl posterior by up to a factor of 2 compared to the reference case (R = 50, S/N = 10; the circular markers). An increase in R (R = 100, S/N = 10; the diamond markers) causes the variance of the Rpl posterior to shrink to about 70% of the reference case variance. In contrast, we observe no noticeable gain in the accuracy of the retrieved value for Rpl when increasing S/N and R at the same time. On the other hand, the precision of the measurement at R = 100, S/N = 20 improves significantly, with the variance of the Rpl posterior shrinking by up to a factor of three compared to the reference case.

We further find that the retrieval of the planetary mass Mpl does not improve significantly when moving to higher R and S/N input spectra. We observe no significant increase in either accuracy or precision. This finding is consistent with the results shown in Paper III. The underlying reason for this observation is the degeneracy between the surface gravity (and thus also Mpl) and the abundances of trace gases (see, e.g., Paper III, Mollière et al. 2015; Feng et al. 2018; Madhusudhan 2018; Quanz et al. 2021). Since gravity and abundances are involved in the hydrostatic equilibrium, it is possible to reproduce the same spectral feature using different combinations of these parameters. This broadens the variance of the posteriors of Mpl and of the atmospheric species.
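The origin of the gravity-abundance degeneracy can be illustrated with a strongly simplified Python sketch. Under hydrostatic equilibrium, the column mass above a pressure level P is P/g, so a well-mixed absorber with mass fraction X and a constant (gray) opacity kappa produces an optical depth tau = kappa X P / g. Doubling both X and g therefore leaves tau unchanged. The gray, pressure-independent opacity, the function names, and the numerical values are assumptions for illustration; the retrieval itself uses full wavelength- and pressure-dependent opacities.

```python
G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2


def surface_gravity(mass_kg, radius_m):
    """Surface gravity g = G M / R^2 in m s^-2."""
    return G * mass_kg / radius_m**2


def optical_depth(kappa, mass_fraction, pressure_pa, gravity):
    """tau = kappa * X * P / g for a well-mixed absorber with constant kappa (m^2/kg)."""
    return kappa * mass_fraction * pressure_pa / gravity


M_EARTH, R_EARTH = 5.972e24, 6.371e6   # kg, m
kappa = 1e-2                           # hypothetical gray opacity, m^2/kg
g_ref = surface_gravity(M_EARTH, R_EARTH)

# Reference planet versus a planet with twice the mass and twice the abundance.
tau_ref = optical_depth(kappa, 1e-3, 1.013e5, g_ref)
tau_alt = optical_depth(kappa, 2e-3, 1.013e5, surface_gravity(2 * M_EARTH, R_EARTH))
print(f"g = {g_ref:.2f} m/s^2, tau_ref = {tau_ref:.4f}, tau_alt = {tau_alt:.4f}")
```

In this toy picture only the ratio X/g is constrained by the absorption feature, so an uncertain Mpl maps directly onto an uncertain abundance.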

Increasing the quality of the input spectrum does improve the accuracy of the retrieval for P0 in the clear modern Earth (MOD-CF) case. The results for the other epochs do not exhibit a similar trend with increasing input quality. This failure to retrieve accurate ground pressure estimates is likely rooted in differences between the opacity tables used by the retrieval framework and the ones used to calculate the input spectra (see Sect. 4.4 for more details). Additionally, no noticeable decrease in the variance of the retrieved P0 estimate is present for higher values of R or S/N. This is likely a result of the pressure-abundance degeneracy, which has already been described in Sect. 3.4.

For the surface temperature T0, we do not notice any substantial improvements in the accuracy of the retrieved values when increasing R or S/N. However, as for Rpl, we observe a significant reduction in the variance of the posteriors when increasing S/N and R. Compared to the reference case, the uncertainty in T0 is reduced by a factor of 2 for the runs with S/N = 20 and to about 70% of the reference variance for the runs with R = 100. These improvements in temperature precision could be crucial when assessing the potential habitability of an observed exoplanet.

In Fig. 7, we summarize the retrieved posterior distributions in the abundances for the reference case (R = 50 and S/N = 10, circular markers) and all other R and S/N combinations. The abundance posteriors are classified according to our classification scheme (see Sect. 3.4 and Paper III).

Generally, we observe that increases in both S/N and R significantly improve neither the accuracy nor the precision of the retrieved posteriors for the majority of the scenarios. This is again the result of the pressure- and gravity-abundance degeneracies. In particular, the pressure-abundance degeneracy is responsible for the shifts with respect to the true values, whereas the gravity-abundance degeneracy defines the variance of the abundance posteriors. The effects of the pressure-abundance degeneracy can be noticed for CO2 in the MOD-CF scenario. For the reference case (circular marker) we strongly underestimated P0, which is compensated by an overestimation in the CO2 abundance. As we move to higher R and S/N input spectra, our estimate for P0 improves, which results in better accuracies for the retrieved CO2 abundance. The same connection between P0 and the retrieved abundances can be seen for all other constrained species.

In contrast, the variance of the CO2 posterior does not decrease significantly with increases in R and S/N since it is limited by the variance of the Mpl posterior (due to the gravity-abundance degeneracy), which is the same for all considered cases. While this behavior describes the results for most species well, there are some noteworthy exceptions that we will discuss here.

Firstly, there could be a tentative detection of O3 (an SL posterior) in the clear GOE Earth (GOE-CF) epoch when increasing the S/N to 20 (square marker). If the resolution is also increased to R = 100 (triangular marker), we could better constrain the O3 abundance. Increasing only R to 100 would not improve the accuracy or the precision of O3 (diamond marker). Similarly, increasing the S/N would allow for detection of CH4 in all four epochs, which was not possible for the reference case (circular marker). However, the retrieved CH4 abundances are one to two orders of magnitude higher than the truths. Our results suggest similar, but less pronounced, systematic offsets with respect to the true values for the other constrained species. These offsets are likely the result of a combination of the degeneracy between P0 and the abundances and systematic errors, such as differences in the molecular line lists (see Sect. 4.4). Both O3 and CH4 are of particular interest for astrobiology, since they are indicative of disequilibrium chemistry in the atmosphere and could indicate the presence of biological activity on the planet. We discuss this in more detail in Sects. 4.3 and 4.5.

Furthermore, an increase in S/N would enable robust detection of H2O in the GOE-CF epoch (a C- instead of an SL-type posterior). On the contrary, increasing the resolution alone does not have the same effect.

Finally, CO is unconstrained for all epochs and R-S/N pairs, which indicates that this species could not be detected in an Earth-like atmosphere with LIFE. Similarly, none of the runs were able to fully constrain the N2O abundance. The retrieval can only provide upper limits on the N2O abundance, which often only manage to rule out atmospheric abundances greater than 1% in mass fraction. The retrieval is therefore not sensitive to these molecules: their spectral signatures are too small compared to the LIFEsim noise to be detected even in the best considered scenario. An exception would be the GOE-CF scenario at R = 100 and S/N = 20, for which we retrieve a wrong estimate of N2O (around 1% in mass fraction), about six orders of magnitude larger than the true value. The retrieval is most likely fitting the noise and/or spectral signatures of most of the other species at shorter wavelengths (λ ≲ 8 μm). Hence, when analyzing observations of potentially habitable terrestrial planets, we should be mindful not only of the false positive mechanisms that may be active in the atmosphere, but also of the false positives that the retrieval routines can infer. One could try to solve this issue by averaging over multiple retrieval runs, or by reducing the prior space with inferred knowledge from independent observations.

The retrievals of the ancillary cloudy input spectra (shown in Appendix C) do not show any noticeable improvement in either accuracy or precision for all scenarios with increased R and S/N. The values of Rpl and T0 are still underestimated for all the scenarios. However, all considered R and S/N combinations still allow for an atmospheric abundance characterization for the cloudy input spectra. This analysis is subject to similar limitations as those already discussed for the subset of clear input spectra. The impact of clouds in retrievals will be discussed in more detail in Sect. 4.2.

Fig. 7

Retrieved atmospheric abundances for the different ancillary runs. Results belonging to the various scenarios are provided using the color-coding from Table 1. We use different markers for the runs at different R-S/N (see legend). The solid lines indicate the expected values for each species, which vary depending on the epoch. The gray-shaded areas mark the range of values in the vertically nonconstant abundance profiles of the input spectra. The posterior distributions were classified using our posterior classification scheme (see Sect. 3.4 for details).

4 Discussion

In Sect. 4.1, we compare the results we obtain for the cloud-free modern Earth twin with the results from a similar study performed in Paper III. As previously mentioned, we retrieved spectra of cloudy exoplanets, while neglecting clouds in the forward model of the retrieval framework. We describe this effect in Sect. 4.2. We discuss the impact of the quality of the data on the retrievals in Sect. 4.3 and the systematic effects of the retrieval runs in Sect. 4.4. Finally, we quantify the potential that LIFE has in differentiating the various epochs (Sect. 4.5).

For completeness, we mention that we also tested the impact of varying complexity of the theoretical spectral model on retrievals, by including and excluding scattering and/or CIA in the calculation. Through the analysis and comparison of ancillary retrieval grids, we confirm that including or neglecting scattering and CIA in the calculation does not influence the quality of the results. We show the results in Appendix D.

4.1 Comparison with Paper III

To allow for a proper comparison, we selected the model from Paper III that uses the same R, S/N, and wavelength range (4–18.5 μm, R = 50, and S/N = 10). The major difference between these two retrieval studies is that in Paper III retrievals were performed using the same theoretical atmospheric model that was also used to generate the input spectra. In contrast, in this work we used an atmospheric model in the retrieval that is different from the one that was used to generate the spectrum. Further, in our previous study, we assumed abundance profiles that were vertically constant while the spectra calculated by Rugheimer & Kaltenegger (2018) were based on a self-consistent, altitude-dependent atmospheric composition. In Fig. 8 we compare the retrieval results for the constrained planetary parameters and abundances from Paper III to our findings for the MOD-CF case.

In the upper panel of Fig. 8, we plot the retrieval results for the planetary parameters. The planetary radius Rpl is well constrained with respect to the assumed prior distribution and both posteriors are roughly centered on the corresponding truths. However, the spread of the MOD-CF Rpl posterior is larger than in Paper III, which indicates that the radius is slightly less well constrained. For Mpl our results are comparable to Paper III. Our results for the surface pressure P0 and surface temperature T0 agree less well with the results presented in Paper III. This is probably caused by small differences in the input P–T profiles, as well as potential systematic errors, which we discuss in more detail in Sect. 4.4.

In the lower panel of Fig. 8, we show the results obtained for the abundances of the trace gases that are constrained (C- or SL-type posteriors) by our retrieval analysis. We observe that the MOD-CF retrieval tends to overestimate the true abundances, while the estimates from Paper III appear more accurate. The retrieved posterior types match for all of the atmospheric gases considered. Additionally, for CO2, O3, and H2O, the spread of the posteriors for the MOD-CF runs is comparable to the results from Paper III. The larger spread in our CH4 abundance is the result of a slightly reduced sensitivity, which is most likely caused by differences between the atmospheric scenarios used to generate the input spectrum and those used in the retrievals. These differences will reduce the overall accuracy of the retrieval results.

Fig. 8

Comparison of retrieval results for constrained planetary parameters and atmospheric abundances in the MOD-CF case (blue, square marker) with results from Paper III (brown, circular marker) for input spectra with the same properties (wavelength coverage 4–18.5 μm, R = 50, S/N = 10). The vertical black lines indicate the true values assumed for the parameters in each study. For parameters where we assumed a non-flat prior, we indicate the prior range (black, pentagonal marker). The error bars on the constrained posteriors denote the 68% confidence intervals.

4.2 Impact of clouds on retrieval results

As pointed out in Sect. 3, using a cloud-free atmospheric model in our retrievals has likely introduced biases into our results for the cloudy scenarios. The presence of clouds in an atmosphere will reduce the MIR continuum emission of the observed exoplanet. The emission spectra of terrestrial exoplanets are typically dominated by the lowest, nonopaque atmospheric layers. In cloudy exoplanets, part of this thermal emission is hidden below the clouds. Thus, the atmospheric layers above the clouds contribute more to the overall spectrum. Because the atmospheric temperature at the top of the clouds is typically lower than the surface temperature of the planet, cloud coverage will generally lead to a cooler retrieved temperature with reduced continuum flux. This reduction in continuum flux can be clearly seen when comparing the clear to the cloudy spectra in Fig. 1.

Since the theoretical spectral model we use for our retrievals assumes a cloud-free atmosphere, the continuum emission must be reduced in other ways to achieve a satisfactory fit to the input spectrum. This reduction can be obtained by reducing the radius (and thus the emitting surface) and/or the surface temperature of the exoplanet. Due to these compensation effects, most of our retrievals of cloudy input spectra yield smaller radii and/or cooler surface temperatures than the cloud-free inputs (see Fig. 4). This is also valid at higher spectral resolutions and signal-to-noise ratios, showing that the compensation effects are independent of the quality of the data (see Fig. C.1).
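The size of this compensation effect can be estimated with a blackbody sketch in Python. It assumes that the continuum flux scales as Rpl^2 times the Planck function B_lambda(T), and asks by which factor the radius of a cloud-free planet at the surface temperature must shrink to mimic emission from a cooler cloud-top layer. The single-wavelength blackbody treatment, the function names, and the two temperatures are illustrative assumptions and not values from our retrievals.

```python
import math

H = 6.62607015e-34   # Planck constant, J s
C = 2.99792458e8     # speed of light, m/s
KB = 1.380649e-23    # Boltzmann constant, J/K


def planck(wavelength_m, temperature_k):
    """Blackbody spectral radiance B_lambda(T) in W m^-3 sr^-1."""
    x = H * C / (wavelength_m * KB * temperature_k)
    return 2.0 * H * C**2 / wavelength_m**5 / math.expm1(x)


def equivalent_radius_scale(wavelength_m, t_surface, t_emit):
    """Radius factor with which a t_surface blackbody matches the flux of a t_emit one."""
    return math.sqrt(planck(wavelength_m, t_emit) / planck(wavelength_m, t_surface))


wavelength = 11.2e-6                  # m, within the MIR continuum window
t_surface, t_cloud_top = 288.0, 270.0  # K, hypothetical values
scale = equivalent_radius_scale(wavelength, t_surface, t_cloud_top)
print(f"radius scale factor mimicking the cooler emission: {scale:.3f}")
```

Because the Planck function depends on wavelength, a single radius factor cannot reproduce the cooler spectrum at all wavelengths, which is why the retrieval typically adjusts both the radius and the temperature.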

In addition to the surface conditions, the thermal structure of the layers and the chemical composition of the atmosphere also play a major role in shaping the emission spectrum, especially in the absorption and emission features. This could yield biased retrieval results for other parameters, such as the a_i coefficients of the polynomial P–T profile. In our results, the PRE-C model shows the clearest signature of this compensation effect (see Fig. 3). This degeneracy between cloud coverage and thermal structure in retrievals has also been found in other studies (see, e.g., Mollière et al. 2020, and references therein).

Such biased results could be misleading, especially when trying to analyze the habitability of an observed exoplanet. If, by neglecting clouds in our theoretical spectral model, we underestimate the surface temperature, we could misclassify habitable exoplanets. A clear example is the cloudy modern Earth (MOD-C) scenario: using our cloud-free forward model to retrieve this cloudy spectrum causes the retrieved ground temperature to be colder than 275 K. Such low temperatures suggest a potentially uninhabitable planet, which we know is not the correct interpretation for the MOD-C spectrum.

On the other hand, when looking at the retrieved chemical abundances (see Fig. 5), we observe only minor variations in the shape of the posteriors for all the major absorbing gases in the atmosphere. This indicates that, despite having a major impact on the retrieved physical parameters (Rpl and T0), retrieving cloudy spectra with a cloud-free model does not significantly impact the chemical characterization of the atmosphere.

It is important to mention that the clouds included in Rugheimer & Kaltenegger (2018) act purely as continuum absorbers or emitters, with no specific cloud spectral features. Assuming a more complex treatment of clouds in the input spectra might still cause degenerate results, especially in the retrieval of a correct water abundance.

Therefore, including a cloud model in the theoretical spectral model that we use for retrievals could improve the quality of the results. However, this depends on the goal of the analysis. If we aim to characterize the chemical composition of the atmosphere, it may be sufficient to use cloud-free retrievals. This would be a sensible strategy considering that including a somewhat realistic cloud treatment in the theoretical spectral model significantly increases the number of retrieved parameters and subsequently the running time. Performing retrievals on input spectra that include visible and near-infrared data in addition to the MIR observations will likely provide additional information about the cloud composition and structure. In this sense, combining data acquired by HabEx or LUVOIR with LIFE data may significantly improve retrieval results. We will compare the retrieval performance for different cloud models and discuss the capabilities of joint reflected light and thermal emission retrievals in future publications.

4.3 Increasing the quality of the input spectra

The results of the retrievals performed assuming other combinations of R and S/N, described in Sect. 3.5, show that increasing the S/N to 20 will allow us to detect both O3 and CH4 in more cases. Increasing to R = 100 would also improve our results, especially when combined with an increase in S/N. This is an interesting finding for multiple reasons.

From the scientific point of view, simultaneously detecting O3 (which can provide an indirect estimate of O2) and CH4 would be a strong indicator of chemical disequilibrium in the atmosphere possibly hinting at the existence of biological activity. Such a detection would make the respective exoplanet a high-priority target for the search for life beyond the Solar System. This concept will be further explored in Sect. 4.5.

From the technical point of view, it would mean that one needs to consider longer integration times while maintaining a stable architecture of the interferometer array. For the assumptions in our baseline case (see Table 2), doubling the resolution would roughly correspond to a doubling of the integration time (from ~50 to ~100 days) while doubling the S/N would translate into integration times roughly four times longer (from ~50 to ~200 days). This poses challenges in terms of mission technical feasibility as well as mission scheduling. Increasing the instrument throughput, for which we assumed a conservative value (cf. Paper I), or the aperture size would bring the required integration times down. Also, the nearest rocky exoplanets orbiting within the habitable zone (HZ) of their solar-type host stars may not be 10 pc away. Bryson et al. (2021) estimate that with 95% confidence the nearest HZ planet around G and K dwarfs is ~6 pc away and they predict ~4 HZ rocky planets around G and K dwarfs within 10 pc of the Sun. Taking all this together, we recommend keeping the baseline requirements for LIFE of R = 50 and S/N = 10, as proposed in Paper III, since they allow for a reliable and quantitative characterization of the most important physical and chemical properties of the considered atmospheres. The most promising targets could then be observed further to increase the S/N, thus allowing a more precise characterization of the atmosphere.
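The integration times quoted above follow from a simple scaling, sketched here in Python under the assumption that the integration time grows linearly with R and quadratically with S/N (no systematic noise terms). The function name is hypothetical, the ~50 day baseline is the approximate value quoted above, and the value for R = 100 combined with S/N = 20 is an extrapolation of the same scaling rather than a number stated in the text.

```python
def scaled_integration_time(t_ref_days, r_ref, snr_ref, r_new, snr_new):
    """Scale the integration time as t ∝ R * (S/N)^2 (assumes no systematic noise)."""
    return t_ref_days * (r_new / r_ref) * (snr_new / snr_ref) ** 2


BASELINE_DAYS = 50.0   # approximate baseline for R = 50, S/N = 10
for r, snr in [(50, 10), (100, 10), (50, 20), (100, 20)]:
    days = scaled_integration_time(BASELINE_DAYS, 50, 10, r, snr)
    print(f"R = {r:3d}, S/N = {snr:2d}: ~{days:.0f} days")
```

The quadratic dependence on S/N is what makes a higher S/N the more expensive of the two upgrades, even though it yields the larger gain in the retrievals.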

4.4 Systematics and current challenges

Thus far, we can confidently conclude that our Bayesian framework can retrieve consistent and robust results. This is not only valid for simulated observations generated with petitRADTRANS (see Paper III), but also for input spectra produced by other radiative transfer models (here by Rugheimer & Kaltenegger 2018). These results are highly promising in the context of analyzing real observational data in the future. However, as we mentioned in the previous sections, our work has identified some aspects that may lead to biased results. Some issues are linked to the intrinsic limitations of the Bayesian retrieval routine we described in Sect. 2.3. Ideally, these can be mitigated to improve the results, for example by choosing a different P–T profile parametrization, or by adding a cloud model to the retrieval. Further, we purposely chose to perform our retrievals assuming uniform priors for most parameters where all values were possible if within a specified, wide range (see Sect. 2 for details). However, for future observations, the prior space might already be constrained (e.g., if one or more parameters are already measured by independent observations) and this would likely improve our retrieval results.

Despite these possibilities, we will eventually be limited by two factors. The first is the number of parameters that the Bayesian framework can handle within reasonable computing time. This limit on the number of parameters will remain unless novel parameter estimation algorithms emerge. An example would be the use of machine-learning retrieval routines (e.g., Waldmann 2016; Marquez-Neila et al. 2018; Cobb et al. 2019). Second, for a given resolution the information content of the spectrum is limited. Therefore, considering additional parameters in the retrieval framework could bias the results, for example causing a false positive inference of an atmospheric species.

However, the most relevant issues are independent of the parameter estimation routine. They are rooted in the intrinsic differences between individual radiative transfer models used to produce the MIR input spectra and the theoretical spectra in the retrievals. Such discrepancies may be caused by a slightly different treatment of physical or chemical processes, or differences in the assumed opacity tables. To investigate these issues, we computed the MIR spectra for the four clear-sky scenarios (MOD-CF, NOE-CF, GOE-CF, and PRE-CF) using petitRADTRANS. We assumed exactly the same input parameters (i.e., P–T profile, abundances, planetary dimensions) that Rugheimer & Kaltenegger (2018) used to produce their spectra. We show the results for R = 50 in Fig. 9. The petitRADTRANS spectra are plotted as solid lines using the color scheme from Table 1. The input spectra from Rugheimer & Kaltenegger (2018) are shown as black dots. The error bars indicate the LIFEsim uncertainty used in the main grid of retrievals (S/N = 10 at 11.2 μm).

We observe that the petitRADTRANS spectra deviate (mostly within the LIFEsim uncertainty) from the spectra calculated by Rugheimer & Kaltenegger (2018), despite both models assuming the exact same input. While the absorption features are generally in agreement with each other, the spectra produced by petitRADTRANS show a higher continuum flux, especially around 8–12 μm. This discrepancy is likely linked to differences in the opacity tables used by the two radiative transfer models. As stated in Sect. 2.3, these differ with respect to the wing cutoff, the line list databases, and the pressure broadening coefficients.

To prevent the wings of the pressure-broadened lines from extending to infinity (nonphysical), it is necessary to introduce a wing cutoff. However, different radiative transfer models assume different cutoff thresholds (see the comparisons performed by, e.g., Lee et al. 2019; Baudino et al. 2017; Barstow et al. 2020). Rugheimer & Kaltenegger (2018) used a wing cutoff at 25 cm−1 from the line center. In contrast, the line cutoff used for the petitRADTRANS opacity tables assumes an exponential line wing decrease (for details, see Mollière et al. 2019). This may explain the higher continuum emission observed for all petitRADTRANS spectra.
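To give a rough sense of how much opacity a hard wing cutoff removes, the following minimal Python sketch computes the fraction of a pure Lorentzian line's integrated strength that lies beyond a given cutoff distance. It is only an illustration: the function name and the half-width of 0.1 cm−1 are assumptions chosen for the example, and the sketch does not reproduce the exponential wing treatment used for the petitRADTRANS opacity tables.

```python
import math


def lorentz_fraction_beyond_cutoff(gamma, cutoff):
    """Fraction of a normalized Lorentzian's area at |nu - nu0| > cutoff.

    gamma  : half width at half maximum in cm^-1 (assumed value)
    cutoff : wing cutoff distance from the line center in cm^-1
    """
    # A normalized Lorentzian integrates to (2/pi) * atan(cutoff / gamma)
    # over [-cutoff, +cutoff]; the remainder sits in the truncated wings.
    return 1.0 - (2.0 / math.pi) * math.atan(cutoff / gamma)


gamma = 0.1  # cm^-1, hypothetical pressure-broadened half-width
for cutoff in (5.0, 25.0, 100.0):
    frac = lorentz_fraction_beyond_cutoff(gamma, cutoff)
    print(f"cutoff = {cutoff:6.1f} cm^-1 -> {100 * frac:.3f}% of the line strength removed")
```

For a single line the truncated fraction is small, but the far wings of many strong lines add up in the windows between bands, which is where differences in the cutoff treatment are expected to show most clearly.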

Regarding the line list databases, the default opacity tables used by petitRADTRANS stem from different sources. They were calculated from the HITEMP, HITRAN 2012, or ExoMol line lists (see Table 4). In contrast, the spectra from Rugheimer & Kaltenegger (2018) were computed using only the HITRAN 2016 line lists, which in some cases are more recent than the ones adopted in our study. At the pressures and temperatures of interest in the study, we would not expect large variations in the line lists, provided all the databases are synchronous. The use of different versions of the same database (e.g., HITRAN 2012 versus HITRAN 2016) might cause variations in the opacities since databases more recently updated generally include more transition lines (see, e.g., Gordon et al. 2017). Furthermore, the default petitRADTRANS opacities only account for transitions of the main isotope, while the opacity tables used in Rugheimer & Kaltenegger (2018) can account for additional isotopes.

For the pressure broadening coefficients: to compute the line profiles, it is necessary to account for collision-driven line broadening. This depends on the pressure and composition of each atmospheric layer. For most molecules, both models assumed air broadening, which is based on a modern-Earth-like atmospheric composition. However, for CH4, the petitRADTRANS opacity table assumed a theoretical broadening model based on Eq. (15) in Sharp & Burrows (2007), which was experimentally validated. Another exception is N2O, for which H-He broadening was assumed (see Chubb et al. 2021). However, at the pressures and temperatures of interest, we do not expect large differences due to pressure broadening (Sharp & Burrows 2007; Mollière et al. 2019; Gharib-Nezhad & Line 2019; Chubb et al. 2021). We mention it here for completeness.

These differences likely also account for a substantial part of the offsets we find in the retrieved parameter values. Future inter-comparison studies could help us define a “best practice” upon which to agree, as a community, to compute opacity tables for retrievals in order to minimize these systematic effects. Furthermore, ongoing experimental work will be necessary to improve the completeness of the transition line databases and reduce discrepancies.

Fig. 9

Comparison of the R = 50 MIR spectra of the four clear-sky epochs (MOD-CF, NOE-CF, GOE-CF, and PRE-CF; solid lines following the color scheme of Table 1) produced with petitRADTRANS with the results from Rugheimer & Kaltenegger (2018) (black dots) assuming the same input parameters (i.e., P–T profile, abundances, planetary dimensions). The error bars indicate the LIFEsim uncertainty assumed for the main grid of retrievals (S/N = 10 at 11.2 μm).

4.5 Differentiating the epochs

A quantitative approach to differentiate between the various scenarios is through the results of the retrieval analyses. We performed a first qualitative step in this direction in Sects. 3.2, 3.3, and 3.4, where we visually compared the retrieved P–T profiles and the posteriors for the planetary parameters and abundances. Through visual comparison, we find that differentiating the epochs via the retrieved P–T structure and planetary parameters is challenging. By studying the retrieved abundance posteriors, we find that the best candidates to perform such differentiation are O3 and CH4. This finding is especially interesting since the O2–CH4 pair is generally considered the strongest biosignature (see Lovelock 1965; Lederberg 1965) and O2 can be constrained from O3 through atmospheric chemistry models. Thus, the detection of one or both of these molecules will likely trigger follow-up observations and could allow us to separate between potentially alive and lifeless planets. However, a more in-depth characterization of the atmospheres is limited by the large variance in the posteriors of all these species, which typically exceeds one order of magnitude.

A more quantitative separation between the retrieved posterior distributions for the various epochs can be achieved by considering the difference between the cumulative posterior distribution functions of two epochs for a model parameter. This approach is similar to the Kolmogorov-Smirnov test (Kolmogorov 1933; Smirnov 1939), which is generally used to assess whether two samples are drawn from the same underlying distribution. Given a model parameter M with prior range X = [Xmin, Xmax], we calculated the cumulative distribution GM(x) for x ∈ X of the retrieved posterior P(x) as follows:

$$G_M(x) = \int_{X_{\min}}^{x} P(x')\,\mathrm{d}x' \qquad (1)$$

We then compared the cumulative distribution functions $G_M^a$ and $G_M^b$ of two different epochs, a and b, by considering the maximum difference between them:

$$\Delta = \max_{x \in X} \left| G_M^a(x) - G_M^b(x) \right| \qquad (2)$$

Thus, small values of ∆ indicate that the compared posterior distributions only show small differences relative to each other. In this case, it is hard to differentiate between the retrieved posteriors. On the other hand, larger values of ∆ indicate that the differences between the two posteriors are likely to correspond to different underlying true values of the considered parameter.
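As an illustration of how such a maximum difference between two cumulative posteriors could be evaluated in practice, the sketch below works on two sets of posterior samples. It assumes equally weighted samples and evaluates the empirical cumulative distributions on a regular grid spanning the prior range; the function names, the grid size, and the toy samples are hypothetical and are not taken from the retrieval framework used here.

```python
from bisect import bisect_right


def empirical_cdf(samples, grid):
    """Empirical cumulative distribution of equally weighted samples on a grid."""
    ordered = sorted(samples)
    n = len(ordered)
    return [bisect_right(ordered, x) / n for x in grid]


def max_cdf_difference(samples_a, samples_b, x_min, x_max, n_grid=1000):
    """Maximum absolute difference between two empirical CDFs on [x_min, x_max]."""
    step = (x_max - x_min) / (n_grid - 1)
    grid = [x_min + i * step for i in range(n_grid)]
    cdf_a = empirical_cdf(samples_a, grid)
    cdf_b = empirical_cdf(samples_b, grid)
    return max(abs(a - b) for a, b in zip(cdf_a, cdf_b))


# Toy samples (hypothetical values, e.g. of a log-abundance) for two epochs.
epoch_a = [0.0, 1.0, 2.0, 3.0]
epoch_b = [2.0, 3.0, 4.0, 5.0]
delta = max_cdf_difference(epoch_a, epoch_b, x_min=0.0, x_max=5.0)
print(f"Delta = {100 * delta:.0f}%")  # prints "Delta = 50%"
```

Weighted posterior samples, as returned by many nested-sampling codes, would need to be resampled to equal weights (or the weights accumulated explicitly) before applying this sketch.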

We can calculate ∆ for all the combinations of the various scenarios and all parameters. We get particularly interesting results for CH4 and O3. Figure 10 shows the cumulative distribution functions for all the combinations of the clear sky scenarios (MOD-CF, NOE-CF, GOE-CF, and PRE-CF) calculated from the posteriors of CH4 and O3, for R = 50 and S/N = 10. In each subplot of the corner plot we annotate the values of ∆ (percentage) corresponding to each combination. On the diagonals, the retrieved posteriors for every scenario are shown for reference. We keep the color scheme defined by Table 1. Regarding CH4, we can fairly confidently distinguish between the clear prebiotic Earth (PRE-CF) and the Earth after the GOE (GOE-CF), for which ∆ = 95%, as well as between PRE-CF and the Earth after the NOE (NOE-CF), for which ∆ = 90%. The distinction between the prebiotic Earth and the modern Earth (MOD-CF), as well as between the NOE and the GOE Earth, is more difficult (∆ ≤ 31%). For O3, we observe a clear division into two subgroups: on the one hand, the modern and NOE Earth, where we have a clear detection of O3, and on the other hand, the GOE and prebiotic Earth, where we only retrieve an upper limit on the abundance. The high value of ∆ ~ 90% between all combinations of MOD-CF or NOE-CF versus GOE-CF or PRE-CF clearly allows such a distinction. This is in agreement with what we concluded from Fig. 5 in Sect. 3.4. However, in contrast to the qualitative discussion based on the appearance of the posteriors, ∆ provides a promising metric to quantify the magnitude of these differences.

In Fig. 11, we summarize the calculated ∆ values for all combinations of cloud-free input spectra and the different R-S/N pairs considered in Sect. 4.3, for a total of four tables. Within the tables, each cell shows the ∆ value (percentage) for a given parameter (columns) and a comparison of two specific scenarios (rows). The cells are also colored according to the value of ∆, with darker hues for larger values of ∆. As mentioned above, the biggest differences between the posteriors at R = 50, S/N = 10 can be observed for the molecules CH4 and O3. Furthermore, we observe some differences for CO2 and H2O, as well as for the P0 posteriors. However, as discussed previously, these differences are rooted in degeneracies between the pressure and the abundances (see Sect. 4.1) and are not caused by large physical differences in the underlying atmospheres. These findings are generally still valid as we move to higher R and S/N. Similar conclusions can also be drawn for the cloudy inputs (see Appendix C).

Since we are able to confidently detect O3 for a clear Earth after the NOE and for the modern Earth, and we can distinguish these two epochs from earlier scenarios (prebiotic and GOE Earth), we can infer that LIFE would be able to detect traces of life as we know it in an Earth-like atmosphere when the abundance of O2 has passed the 10% PAL threshold. This is consistent with other studies that focused on different wavelength ranges, such as the work by Kawashima & Rugheimer (2019) based on LUVOIR. The biosignature pair CH4-O3 might be even easier to detect when the abundance of O2 is around 10% PAL (NOE Earth), rather than on modern Earth. The NOE Earth scenario is particularly favorable since the atmosphere contains enough O3 to be detectable, while the O2 abundance is still low enough that CH4 is not depleted. These results are also consistent with the results shown in Kawashima & Rugheimer (2019). In other words, if LIFE were to observe the Earth at various stages of its evolution orbiting the Sun at a 10 pc distance, it would be able to detect strong indicators of life starting from around 0.8 Ga (NOE Earth). The detection of CH4 with an upper limit on O3 would also allow a tentative detection of potential biological activity as far back as 2.0 Ga (GOE Earth).

We must keep in mind that the epochs that we chose for our study are momentary “snapshots” in the continuous evolution of Earth, even though these four scenarios represent the major changes in our atmosphere. Still, other evolutionary paths are possible in the context of exoplanets, especially when considering other stellar classes. Realistically, all promising candidates would be followed up with additional observations within the LIFE mission. It is beyond the scope of this work to conclusively infer the presence of a biosphere from the measured spectra of potentially habitable candidates. As discussed in other works such as Meadows et al. (2018) or Krissansen-Totton et al. (2022), we would require a thorough discussion of the context information available for the observed planetary system before claiming a “life detection.” However, the presented retrieval results are certainly an important piece of information for the development of frameworks for systematically assessing biosignature detections (e.g., Catling et al. 2018; Walker et al. 2018, see also the NFoLD Community Report from the Biosignatures Standards of Evidence Workshop7).

Fig. 10

Comparison of the cumulative distribution functions of the CH4 posteriors (left corner plot) and the O3 posteriors (right corner plot) for all the combinations of the clear sky scenarios (MOD-CF, NOE-CF, GOE-CF, and PRE-CF). The retrieved posteriors for each scenario are shown on the diagonal. Following the color scheme in Table 1, we show the posteriors and cumulative distribution functions as: solid blue lines (MOD-CF); dotted red lines (NOE-CF); dashed green lines (GOE-CF); and dash-dotted purple lines (PRE-CF).

Fig. 11

Maximum difference, ∆, between the cumulative posteriors for the different model parameters for each combination of input spectra (cloud-free subset) and different R–S/N pairs. The background of each cell in the tables is related to the value of ∆ (darker hues for larger ∆).

5 Conclusions

The Bayesian retrieval framework introduced in Paper III and extended here has delivered insightful answers to the questions introduced in Sect. 1. These can be summarized as follows:

  1. LIFE can characterize prebiotic and biotic worlds. We can constrain the surface temperatures with an uncertainty of around 20 K. We can confirm, exclude, or give upper limits on the presence of several astrobiologically relevant molecules that show signatures in the MIR bands. In particular, LIFE can detect O3 in the atmosphere if the O2 mass fraction is on the order of ~10% PAL. In terrestrial atmospheres CH4 can be constrained if its abundance is ~0.1% in mass fraction. For lower abundances (around 10−6 in mass fraction), LIFE will place an upper limit on CH4 (SL-type posterior). Simultaneously constraining O3 and CH4 will be possible in atmospheres with an O2 abundance of around 10% PAL. This is in agreement with other studies based on a different wavelength range (Kawashima & Rugheimer 2019). Such a result is relevant in terms of the detection of biosignatures in the atmospheres of habitable exoplanets.

  2. Neglecting clouds in the retrievals could cause biases in the determination of the thermal structure of the atmosphere of cloudy exoplanets. However, cloud-free retrievals of cloudy spectra can still yield accurate results for the atmospheric composition.

  3. We confirm that the minimum requirements in spectral resolution and S/N for a MIR mission such as LIFE found in Paper III are also sufficient for the scenarios considered here. Improving the S/N would allow for a clearer detection of O3 and CH4 even when these species are less abundant (down to ~10−7 in mass fraction). Therefore, a better characterization could be obtained by observing promising targets longer during the characterization phase of the mission (or by increasing the instrument throughput and/or aperture size).

  4. We are able to demonstrate that inter-model comparison and retrieval is possible, with the caveats and limitations detailed in Sect. 4.4. The most important discrepancies in the retrievals are caused by the use of different opacity tables, in particular regarding the treatment of the line wing cutoff. Degeneracies and correlations in the posteriors appear as a result of the relations among the model parameters.

6 Next steps

Several new interesting questions and opportunities for more detailed studies arise from this work. First of all, we plan to work on a study that will take advantage of the model selection potential that Bayesian retrievals have to offer, for example by comparing retrievals including and excluding non-retrieved parameters (e.g., the CO abundance). We are also performing retrievals assuming various cloud models (Konrad et al., in prep.). Retrievals of hazy planets (see, e.g., Arney et al. 2016), as well as ocean worlds, might also help us further quantify the science potential of LIFE for a variety of different planet types.

Another interesting study would be to increase S/N and R to even higher values. This will not only evaluate the extreme limits of a concept such as LIFE but also help us better understand if retrievals are limited by R rather than S/N (e.g., due to unresolved narrow features at low R). It would also be useful to compare different R-S/N combinations, this time fixing the observing time. This would help us quantify the best R-S/N combination needed to optimize the characterization of a terrestrial atmosphere. Further work is needed to optimize the yield in the characterization phase of the LIFE mission concept. The estimates of the observation time needed to establish knowledge about the habitability and the presence of biologically relevant molecules in the atmosphere that we derived here are crucial pieces of information for these follow-up studies.

In this work, we only used simulated data obtained with the LIFE mission. However, in the future there will likely be more information available for each system and planet. Therefore, it will be important to put this study in context with other observations. For instance, joint retrievals of reflected light data obtained with LUVOIR or HabEx at optical-to-near-infrared wavelengths and thermal emission spectra as obtained by LIFE would provide useful insight into the synergies between the various missions.

One of the most important open questions regarding the ultimate goal of detecting extrasolar life will require us to put our results in the context of life detection frameworks (e.g., Green et al. 2021; Catling et al. 2018; Walker et al. 2018). Our ongoing retrieval efforts could be useful for the fine-tuning of such frameworks. These frameworks, in turn, would provide insight into the meaning and the likelihood of a potential biosignature detection, which would allow us to infer the presence of life-forms on another planet and justify such inferences.

Acknowledgements

This work has been carried out within the framework of the National Center of Competence in Research PlanetS supported by the Swiss National Science Foundation under grants 51NF40_182901 and 51NF40_205606. S.P.Q. and E.A. acknowledge the financial support from the SNSF. P.M. acknowledges support from the European Research Council under the European Union’s Horizon 2020 research and innovation program under grant agreement No. 832428. We thank the reviewer Dr. Joanna Barstow for the helpful comments and suggestions. E.A. and B.S.K. thank Jean Hayoz and J.L.G. thanks ISSI Team 464 for useful discussions. E.A. carried out the analyses, created the figures, and wrote the bulk part of the manuscript. B.S.K. and D.A. wrote part of the manuscript. S.P.Q. initiated the project, guided the project, and wrote part of the manuscript. All authors discussed the results and commented on the manuscript. This research made use of: Astropy (http://www.astropy.org), a community-developed core Python package for Astronomy (Astropy Collaboration 2013, 2018); Matplotlib (https://matplotlib.org/3.1.1/index.html) (Hunter 2007); pandas (pandas development team 2020); seaborn (https://seaborn.pydata.org).

Appendix A Scattering of terrestrial exoplanets

As discussed in Mollière et al. (2020), petitRADTRANS was updated to treat scattering. This was done using the Feautrier method (Feautrier 1964). This is a third-order method that allows the treatment of the radiative transfer equation in the diffusive regime.

The Feautrier method solves the angle- and frequency-dependent radiative transfer equation for both the planetary and the stellar radiation field. These can be treated separately, since the radiative transfer equation (Eq. A.1) depends only linearly on the intensity I:

$$\mu \frac{\mathrm{d}I}{\mathrm{d}\tau} = -I + S \qquad \text{(A.1)}$$

Here, μ = cos θ where θ is the angle between a light ray and the surface normal, τ is the optical depth, I is the intensity, and S is the source function.

Conceptually, for any direction μ of a ray, there also exists a ray in direction −μ, where μ ∈ [−1,1]. It is possible to instead let μ run from 0 to 1 only, and define rays I+ and I− parallel and antiparallel to this direction. For one of these, the projection onto the atmospheric normal vector (defined by the scalar product) will be positive (going upward), while for the other one it will be negative (that is, going downward). Eq. A.1 can therefore be rewritten as

$$\mu \frac{\mathrm{d}I^{+}}{\mathrm{d}\tau} = -I^{+} + S \qquad \text{(A.2)}$$

$$-\mu \frac{\mathrm{d}I^{-}}{\mathrm{d}\tau} = -I^{-} + S \qquad \text{(A.3)}$$

To solve these, it is convenient to define two further variables,

$$I_J = \frac{1}{2}\left(I^{+} + I^{-}\right) \qquad \text{(A.4)}$$

$$I_H = \frac{1}{2}\left(I^{+} - I^{-}\right) \qquad \text{(A.5)}$$

such that Eqs. A.2 and A.3 become

$$\mu \frac{\mathrm{d}I_J}{\mathrm{d}\tau} = -I_H \qquad \text{(A.6)}$$

$$\mu \frac{\mathrm{d}I_H}{\mathrm{d}\tau} = -I_J + S \qquad \text{(A.7)}$$

Substituting IH as defined by Eq. A.6 into Eq. A.7, we obtain Feautrier’s equation:

$$\mu^{2} \frac{\mathrm{d}^{2}I_J}{\mathrm{d}\tau^{2}} = I_J - S \qquad \text{(A.8)}$$

In this paper we only take thermal scattering (i.e., scattering of the planetary radiation) into account. We, therefore, neglect the scattering of the direct stellar contribution. However, since the radiative transfer equation depends only linearly on Iν, the contribution of the stellar radiation can be treated as an additional term in the calculation (see Mollière et al. 2017). This term is also included in the latest version of petitRADTRANS and we refer to the online documentation for a more detailed description8.

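Purely considering the planetary radiation, we define the boundary conditions at the top of the atmosphere:

$$I^{+}(\tau = 0) = 0 \qquad \text{(A.9)}$$

meaning that there is no planetary radiation coming downward from the top of the atmosphere, and at the surface

$$I^{-}(P_{\rm surf}) = e_{\rm surf}\, B(T_{\rm surf}) + a_{\rm surf}\, J_{\rm scat}(P_{\rm surf}) \qquad \text{(A.10)}$$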

The constraint on I− at the lower boundary is composed of the thermal emission of the surface itself (blackbody radiation scaled by the surface emissivity esurf) and of the portion of the incoming planetary radiation that is reflected by the surface. The efficiency of the reflection is set by the wavelength-dependent “surface albedo” or “reflectance” asurf. The average scattered intensity Jscat is the integral of I+ over all the possible angles (which corresponds to the light that comes from the top layers):

(A.11)

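In terms of IH and IJ, the boundary conditions translate into

$$I_H(\tau = 0) = -I_J(\tau = 0) \qquad \text{(A.12)}$$

and

$$I_J(P_{\rm surf}) = \frac{1}{2}\left[ I^{+}(P_{\rm surf}) + e_{\rm surf}\, B(T_{\rm surf}) + a_{\rm surf}\, J_{\rm scat}(P_{\rm surf}) \right] \qquad \text{(A.13)}$$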

Eq. A.8 can thus be solved by discretization for the interior layers (i ≠ 1, i ≠ N):

(A.14)

which can be expressed in matrix form by extracting the coefficients ai, bi, and ci:

(A.15)

To take into account the boundary conditions, at i = 1 the value of a1 is 0, while at i = N both cN and aN are 0, since the boundary condition in Eq. A.13 does not depend on the (N − 1)th layer; as a consequence, bN is equal to 1.

The tridiagonal matrix can be inverted to retrieve the corresponding values of IJ through multiple iterations. This iterative process is needed to correctly include the scattering contribution in the source function terms Si. During the first iteration of the Feautrier routine, the scattering contribution has yet to be calculated. The source function at this step corresponds to the thermal blackbody radiation produced by the atmospheric layer i at temperature Ti:

$$S_i = B(T_i) \qquad \text{(A.16)}$$

For any other iteration, the model will consider the previous solution for IJ to calculate the new source function, which will then include the contribution of the photons that have been scattered in the previous steps. In the case of i = N, the source function must correspond to the right-hand side of Eq. A.13, computed using the most recent estimate of I+(Psurf) and Jscat(Psurf). This process can be accelerated through the accelerated lambda iteration and Ng methods (see Mollière et al. 2017, p. 75).

From that value, it is possible to calculate IH using Eq. A.6. The emergent flux at the top of the atmosphere can be then calculated as follows:

(A.17)

The iterations stop once the estimate of the flux has reached a convergence value.
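The tridiagonal solve at the core of this scheme can be illustrated with a strongly simplified sketch. It is not the petitRADTRANS implementation: it assumes a uniform optical-depth grid, a single angle μ, a fixed source function (so no scattering iteration), Feautrier’s equation in the form μ² d²IJ/dτ² = IJ − S with IH = −μ dIJ/dτ, and a lower boundary where the upward intensity I− is simply prescribed rather than updated iteratively. All function and variable names are hypothetical.

```python
def solve_tridiagonal(a, b, c, d):
    """Thomas algorithm for a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i]."""
    n = len(d)
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = c[0] / b[0]
    d_prime[0] = d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * c_prime[i - 1]
        c_prime[i] = c[i] / denom
        d_prime[i] = (d[i] - a[i] * d_prime[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x


def feautrier_fixed_source(tau_max, source, mu, i_up_surface):
    """Return I_J on a uniform tau grid for one mu and a fixed source function."""
    n = len(source)
    dtau = tau_max / (n - 1)
    k = (mu / dtau) ** 2
    # Interior rows: -k*I_J[i-1] + (1 + 2k)*I_J[i] - k*I_J[i+1] = S[i]
    a = [-k] * n
    b = [1.0 + 2.0 * k] * n
    c = [-k] * n
    d = list(source)
    half = 0.5 * (dtau / mu) ** 2
    edge = 1.0 + dtau / mu + half
    # Top: no downward radiation, I+ = I_J + I_H = 0 (second-order Taylor closure).
    a[0], b[0], c[0], d[0] = 0.0, edge, -1.0, half * source[0]
    # Bottom: prescribed upward intensity, I- = I_J - I_H = i_up_surface.
    a[-1], b[-1], c[-1] = -1.0, edge, 0.0
    d[-1] = (dtau / mu) * i_up_surface + half * source[-1]
    return solve_tridiagonal(a, b, c, d)


# Isothermal check: S = 1 everywhere and a surface emitting I- = 1.
n_levels = 401
i_j = feautrier_fixed_source(tau_max=4.0, source=[1.0] * n_levels, mu=0.5, i_up_surface=1.0)
# With I+ = 0 at the top, the emergent upward intensity is 2 * I_J at tau = 0.
print("emergent intensity:", 2.0 * i_j[0])
```

In the isothermal check the emergent intensity should come out close to the source function, as expected for a uniform-temperature slab above a surface at the same temperature. Including scattering would require wrapping this solve in an iteration that updates the source function, as described above.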

Appendix B Corner plots

Corner plots for the retrieval runs at the reference R and S/N are shown in this section. We grouped both the cloudy and the clear sky retrievals for each epoch in the same figure, in order to compare the results. Namely: Fig. B.1 shows the corner plots of the two modern Earth scenarios (MOD-CF and MOD-C); Fig. B.2 shows the NOE Earth scenarios (NOE-CF and NOE-C); the GOE Earth scenarios (GOE-CF and GOE-C) are in Fig. B.3; finally, the prebiotic scenarios (PRE-CF and PRE-C) are shown in Fig. B.4.

The models are color-coded according to Table 1. Also, the results for the clear sky retrievals are shown using dashed contour lines, while the cloudy models are represented using solid lines. The table on the top right of each figure shows the expected values for each parameter, together with the estimates and the 1σ uncertainty for the two scenarios.

Fig. B.1

Corner plot for the posterior distributions from the retrievals of the MOD-CF (dashed contour lines) and MOD-C (solid contour lines) scenarios. The black lines indicate the expected values for every parameter. The retrieved values (median and 1σ uncertainties) are shown in the table in the top-right corner, together with the expected values. The scenarios are color-coded according to Table 1.

Fig. B.2

As for Fig. B.1 but for the NOE-CF and NOE-C scenarios.

Fig. B.3

As for Fig. B.1 but for the GOE-CF and GOE-C scenarios.

Fig. B.4

As for Fig. B.1 but for the PRE-CF and PRE-C scenarios.

Appendix C Cloudy scenarios: Additional figures

In this section we provide additional plots for the cloudy scenarios. In Figs. C.1 and C.2 we show the retrieved exoplanet parameters and abundances for the different scenarios with varying R and S/N values. Finally, we plot in Fig. C.3 the maximum difference ∆ between the cumulative posteriors for the different model parameters, for each combination of the cloudy scenarios and different R-S/N pairs.

Fig. C.1

As for Fig. 6 but for the cloudy scenarios.

Fig. C.2

As for Fig. 7 but for the cloudy scenarios.

Fig. C.3

As for Fig. 11, but for the cloudy scenarios.

Appendix D Bayes factor analysis: Other epochs

As described in Sect. 2, the theoretical spectral model was updated with respect to Paper III and it now takes into account additional physical processes. For the results presented in Sect. 3, we ran retrievals using the most updated version of the Bayesian framework.

The additional flexibility of petitRADTRANS now allows us to quantify the impact of CIA and scattering in retrievals. We ran additional retrievals on the clear sky scenarios for R = 50 and S/N = 10. In these retrievals, we altered the number of physical processes that were treated in the petitRADTRANS theoretical spectral model as follows: (1) including both CIA and scattering (setup used in Sect. 3); (2) excluding both CIA and scattering; (3) including scattering and excluding CIA; and (4) including CIA and excluding scattering.

In the runs where scattering is included, we consider self-scattering, surface scattering of the thermal radiation, and gaseous Rayleigh scattering (see Table 5 for references). We do not include aerosol and cloud scattering in the calculation. Since our theoretical spectral model neglects clouds, in this analysis we considered only the cloud-free scenarios. The effect of modeling cloudy spectra using a cloud-free retrieval model is discussed in detail in Sect. 4.2.

To determine the theoretical spectral model configuration that best reproduces the input spectra we performed a Bayes factor analysis. The Bayes factor is defined as

$$K = \frac{\mathcal{Z}(D \mid M_1)}{\mathcal{Z}(D \mid M_2)} \qquad \text{(D.1)}$$

where M1 and M2 represent two different model configurations, each with their corresponding Bayesian evidence given the input data D. The Bayes factor in Eq. D.1 indicates whether M1 or M2 better describes the data. We can use the Jeffreys scale (see Table D.1) to interpret the values of the Bayes factor K. This approach was extensively described and used in Paper III, to which we refer for more details.
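As a minimal sketch of how such a comparison could be carried out numerically, the snippet below converts two log-evidences into log10(K) and assigns a Jeffreys-type label. It assumes that the sampler reports the natural logarithm of the evidence, and it uses the commonly quoted half-decade Jeffreys thresholds, which may differ in wording from Table D.1. The function names and the input values are hypothetical.

```python
import math

# Lower edges of the |log10 K| bins with commonly quoted Jeffreys labels
# (assumed here; the labels in Table D.1 may be worded differently).
JEFFREYS_BINS = [
    (2.0, "decisive"),
    (1.5, "very strong"),
    (1.0, "strong"),
    (0.5, "substantial"),
    (0.0, "barely worth mentioning"),
]


def log10_bayes_factor(ln_z1, ln_z2):
    """log10 of K = Z1 / Z2, given the natural-log evidences of M1 and M2."""
    return (ln_z1 - ln_z2) / math.log(10.0)


def interpret(log10_k):
    """Return the favored model and the strength of the evidence."""
    if log10_k > 0:
        favored = "M1"
    elif log10_k < 0:
        favored = "M2"
    else:
        favored = "neither"
    for edge, label in JEFFREYS_BINS:
        if abs(log10_k) >= edge:
            return favored, label


# Hypothetical log-evidences for two retrieval setups.
log10_k = log10_bayes_factor(ln_z1=-20.3, ln_z2=-20.9)
print(f"log10(K) = {log10_k:.2f}", interpret(log10_k))
```

Working with the difference of log-evidences avoids numerical underflow, since the evidences themselves are typically far too small to represent directly.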

We calculate the Bayes factor corresponding to every possible combination of the four different setups previously outlined. We summarize the results obtained for the MOD-CF epoch in Fig. D.1.

Here, the diagonal shows the values of the Bayesian evidence of each of the four setups Mi. The triangular matrix of six boxes below the diagonal is instead filled with the logarithm of the Bayes factors K (see Eq. D.1) for each combination of theoretical spectral model setups, as well as their interpretation according to the Jeffreys scale (Table D.1). The cells are color-coded according to the color bar in the lower panel, whose edges are determined by the Jeffreys scale.

Table D.1

Jeffreys scale (Jeffreys 1998).

The blue and red color range adopted in the color bar was chosen deliberately in order to illustrate that, for any pair of models M1 and M2, a redder shade would mean that M1 is more likely to reproduce the data compared to M2, while a bluer shade would instead prefer M2 over M1. Our results show colors that lie somewhere in the middle of the range of the color bar, which corresponds to a value of log10(K) generally very close to 0, as confirmed by the text within the cells. This means that there is no clear preference for any of the tested setups: we find no evidence that one of the considered setups outperforms the others in describing the input data. Including or excluding CIA and scattering (one or both) results in negligible differences in the retrieval results. This means that CIA and/or spectral features induced by scattering are not detectable in retrieval studies at the considered R and S/N of the input. This analysis shows us that it is justifiable to neglect CIA and scattering in MIR retrievals of spectra with R = 50 and S/N = 10 (the minimum requirements for LIFE determined by Paper III), with negligible loss in the quality of the retrieval results.

The results for the remaining epochs exhibit similar behavior. In Fig. D.2 we show the results for the NOE Earth, Fig. D.3 shows the ones for the GOE Earth, and the results for the prebiotic Earth are shown in Fig. D.4.

Fig. D.1

Bayesian evidence for each setup for the MOD-CF scenario (on the diagonal) and the Bayes factor for every pair of retrieval setups for the MOD-CF scenario (lower triangle). The cells in the lower triangle are color-coded according to the color bar, whose limits are determined by the Jeffreys scale (see Table D.1).

Fig. D.2

As for Fig. D.1 but for the NOE-CF scenario.

Fig. D.3

As for Fig. D.1 but for the GOE-CF scenario.

Fig. D.4

As for Fig. D.1 but for the PRE-CF scenario.

References

  1. Arney, G., Domagal-Goldman, S. D., Meadows, V. S., et al. 2016, Astrobiology, 16, 873 [Google Scholar]
  2. Astropy Collaboration (Robitaille, T.P., et al.) 2013, A&A, 558, A33 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  3. Astropy Collaboration (Price-Whelan, A.M., et al.) 2018, AJ, 156, 123 [NASA ADS] [CrossRef] [Google Scholar]
  4. Barstow, J. K., Changeat, Q., Garland, R., et al. 2020, MNRAS, 493, 4884 [NASA ADS] [CrossRef] [Google Scholar]
  5. Baudino, J.-L., Mollière, P., Venot, O., et al. 2017, ApJ, 850, 150 [Google Scholar]
  6. Bryson, S., Kunimoto, M., Kopparapu, R. K., et al. 2021, AJ, 161, 36 [NASA ADS] [CrossRef] [Google Scholar]
  7. Buchner, J., Georgakakis, A., Nandra, K., et al. 2014, A&A, 564, A125 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  8. Burch, D. E., Gryvnak, D. A., Patty, R. R., & Bartky, C. E. 1969, J. Opt. Soc. Am., 59, 267 [NASA ADS] [CrossRef] [Google Scholar]
  9. Campbell, I. H., & Squire, R. J. 2010, Geochim. Cosmochim. Acta, 74, 4187 [NASA ADS] [CrossRef] [Google Scholar]
  10. Catling, D. C., Krissansen-Totton, J., Kiang, N. Y., et al. 2018, Astrobiology, 18, 709 [NASA ADS] [CrossRef] [Google Scholar]
  11. Chen, J., & Kipping, D. 2016, ApJ, 834, 17 [Google Scholar]
  12. Chubb, K. L., Rocchetto, M., Yurchenko, S. N., et al. 2021, A&A, 646, A21 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  13. Cobb, A. D., Himes, M. D., Soboczenski, F., et al. 2019, AJ, 158, 33 [NASA ADS] [CrossRef] [Google Scholar]
  14. Dannert, F., Ottiger, M., Quanz, S. P., et al. 2022, A&A, 664, A22 (Paper II) [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  15. Ertel, S., Defrère, D., Hinz, P., et al. 2020, AJ, 159, 177 [Google Scholar]
  16. Feautrier, P. 1964, Compt. Rend. Acad. Sci. (Serie non specifiee), 258, 3189 [NASA ADS] [Google Scholar]
  17. Feng, Y. K., Robinson, T. D., Fortney, J. J., et al. 2018, AJ, 155, 200 [NASA ADS] [CrossRef] [Google Scholar]
  18. Feroz, F., Hobson, M. P., & Bridges, M. 2009, MNRAS, 398, 1601 [NASA ADS] [CrossRef] [Google Scholar]
  19. Feulner, G. 2012, Rev. Geophys., 50, RG2006 [CrossRef] [Google Scholar]
  20. Gaudi, B. S., Seager, S., Mennesson, B., et al. 2020, ArXiv e-prints, [arXiv:2001.06683] [Google Scholar]
  21. Gharib-Nezhad, E., & Line, M. R. 2019, ApJ, 872, 27 [NASA ADS] [CrossRef] [Google Scholar]
  22. Gordon, I. E., Rothman, L. S., Hill, C., et al. 2017, J. Quant. Spec. Radiat. Transf., 203, 3 [NASA ADS] [CrossRef] [Google Scholar]
  23. Graham, R. J. 2021, Astrobiology, 21, 1406 [NASA ADS] [CrossRef] [Google Scholar]
  24. Green, J., Hoehler, T., Neveu, M., et al. 2021, Nature, 598, 575 [NASA ADS] [CrossRef] [Google Scholar]
  25. Gregory, B. S., Claire, M. W., & Rugheimer, S. 2021, Earth Planet. Sci. Lett., 561, 116818 [CrossRef] [Google Scholar]
  26. Hartmann, J. M., Boulet, C., Brodbeck, C., et al. 2002, J. Quant. Spec. Radiat. Transf., 72, 117 [NASA ADS] [CrossRef] [Google Scholar]
  27. Harvey, A. H., Gallagher, J. S., & Levelt Sengers, J. M. H. 1998, J. Phys. Chem. Ref. Data, 27, 761 [NASA ADS] [CrossRef] [Google Scholar]
  28. Heller, R., Duda, J.-P., Winkler, M., Reitner, J., & Gizon, L. 2021, PalZ, 95, 563 [NASA ADS] [CrossRef] [Google Scholar]
  29. Hunter, J. D. 2007, Comput. Sci. Eng., 9, 90 [Google Scholar]
  30. Jeffreys, H. 1998, The Theory of Probability, Oxford Classic Texts in the Physical Sciences (OUP Oxford), 432 [Google Scholar]
  31. Kaltenegger, L., & Traub, W. A. 2009, ApJ, 698, 519 [Google Scholar]
  32. Kaltenegger, L., Traub, W. A., & Jucks, K. W. 2007, ApJ, 658, 598 [NASA ADS] [CrossRef] [Google Scholar]
  33. Karman, T., Gordon, I. E., van der Avoird, A., et al. 2019, Icarus, 328, 160 [Google Scholar]
  34. Kawashima, Y., & Rugheimer, S. 2019, AJ, 157, 213 [NASA ADS] [CrossRef] [Google Scholar]
  35. Kolmogorov, A. 1933, Inst. Ital. Attuari, Giorn., 4, 83 [Google Scholar]
  36. Konrad, B. S., Alei, E., Angerhausen, D., et al. 2022, A&A, 664, A23 (Paper III) [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  37. Krissansen-Totton, J., Thompson, M., Galloway, M. L., & Fortney, J. J. 2022, Nat. Astron., 6, 189 [NASA ADS] [CrossRef] [Google Scholar]
  38. Lederberg, J. 1965, Nature, 207, 9 [NASA ADS] [CrossRef] [Google Scholar]
  39. Lee, E., Taylor, J., Grimm, S. L., et al. 2019, MNRAS, 487, 2082 [NASA ADS] [CrossRef] [Google Scholar]
  40. Lovelock, J. E. 1965, Nature, 207, 568 [NASA ADS] [CrossRef] [Google Scholar]
  41. Luo, G., Ono, S., Beukes, N. J., et al. 2016, Sci. Adv., 2, e1600134 [NASA ADS] [CrossRef] [Google Scholar]
  42. Lyons, T. W., Reinhard, C. T., & Planavsky, N. J. 2014, Nature, 506, 307 [NASA ADS] [CrossRef] [Google Scholar]
  43. Lyons, T. W., Diamond, C. W., Planavsky, N. J., Reinhard, C. T., & Li, C. 2021, Astrobiology, 21, 906 [NASA ADS] [CrossRef] [Google Scholar]
  44. Madhusudhan, N. 2018, Handbook of Exoplanets, 2153 [CrossRef] [Google Scholar]
  45. Marais, D. J. D. 2000, Science, 289, 1703 [CrossRef] [Google Scholar]
  46. Márquez-Neila, P., Fisher, C., Sznitman, R., & Heng, K. 2018, Nat. Astron., 2, 719 [CrossRef] [Google Scholar]
  47. Meadows, V. S., Reinhard, C. T., Arney, G. N., et al. 2018, Astrobiology, 18, 630 [Google Scholar]
  48. Mollière, P., van Boekel, R., Dullemond, C., Henning, T., & Mordasini, C. 2015, ApJ, 813, 47 [CrossRef] [Google Scholar]
  49. Mollière, P., van Boekel, R., Bouwman, J., et al. 2017, A&A, 600, A10 [Google Scholar]
  50. Mollière, P., Wardenier, J. P., van Boekel, R., et al. 2019, A&A, 627, A67 [Google Scholar]
  51. Mollière, P., Stolker, T., Lacour, S., et al. 2020, A&A, 640, A131 [Google Scholar]
  52. National Academies of Sciences, Engineering, and Medicine. 2021, Pathways to Discovery in Astronomy and Astrophysics for the 2020s (Washington, DC: The National Academies Press) [Google Scholar]
  53. Olson, S. L., Schwieterman, E. W., Reinhard, C. T., & Lyons, T. W. 2018, Earth: Atmospheric Evolution of a Habitable Planet, eds. H.J. Deeg, & J.A. Belmonte (Cham: Springer International Publishing), 2817 [Google Scholar]
  54. Pandas Development Team, T. 2020, https://doi.org/10.5281/zenodo.3509134 [Google Scholar]
  55. Peterson, B. M., Fischer, D., & LUVOIR Science and Technology Definition Team. 2017, in American Astronomical Society Meeting Abstracts, 229, 405.04 [Google Scholar]
  56. Quanz, S. P., Absil, O., Angerhausen, D., et al. 2021, Atmospheric characterization of terrestrial exoplanets in the mid-infrared: biosignatures, habitability & diversity, Exp. Astron. [Google Scholar]
  57. Quanz, S. P., Ottiger, M., Fontanet, E., et al. 2022, A&A, 664, A21 (Paper I) [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  58. Rothman, L., Gordon, I., Barber, R., et al. 2010, J. Quant. Spectr. Rad. Transf., 111, 2139 [NASA ADS] [CrossRef] [Google Scholar]
  59. Rothman, L. S., Gordon, I. E., Babikov, Y., et al. 2013, J. Quant. Spec. Radiat. Transf., 130, 4 [NASA ADS] [CrossRef] [Google Scholar]
  60. Rugheimer, S., & Kaltenegger, L. 2018, ApJ, 854, 19 [NASA ADS] [CrossRef] [Google Scholar]
  61. Rugheimer, S., Kaltenegger, L., Zsom, A., Segura, A., & Sasselov, D. 2013, Astrobiology, 13, 251 [NASA ADS] [CrossRef] [Google Scholar]
  62. Sharp, C. M., & Burrows, A. 2007, ApJS, 168, 140 [NASA ADS] [CrossRef] [Google Scholar]
  63. Shields-Zhou, G., & Och, L. 2011, GSA Today, 21, 4 [CrossRef] [Google Scholar]
  64. Skilling, J. 2006, Bayesian Anal., 1, 833 [Google Scholar]
  65. Smirnov, N. V. 1939, Bull. Math. Univ. Moscou, 2, 3 [Google Scholar]
  66. Sneep, M., & Ubachs, W. 2005, J. Quant. Spec. Radiat. Transf., 92, 293 [NASA ADS] [CrossRef] [Google Scholar]
  67. Thalman, R., Zarzana, K. J., Tolbert, M. A., & Volkamer, R. 2014, J. Quant. Spec. Radiat. Transf., 147, 171 [NASA ADS] [CrossRef] [Google Scholar]
  68. Thalman, R., Zarzana, K. J., Tolbert, M. A., & Volkamer, R. 2017, J. Quant. Spec. Radiat. Transf., 189, 281 [NASA ADS] [CrossRef] [Google Scholar]
  69. Waldmann, I. P. 2016, ApJ, 820, 107 [NASA ADS] [CrossRef] [Google Scholar]
  70. Walker, S. I., Bains, W., Cronin, L., et al. 2018, Astrobiology, 18, 779 [CrossRef] [Google Scholar]
  71. Wolfe, J. M., & Fournier, G. P. 2018, Nat. Ecol. Evol., 2, 897 [CrossRef] [Google Scholar]
  72. Yurchenko, S. N., & Tennyson, J. 2014, MNRAS, 440, 1649 [Google Scholar]

2

Present atmospheric level.

5

We remind the reader that the S/N refers to the value at a reference wavelength of 11.2 μm.

6

We refer the reader to the appendix of Paper I, where we show a breakdown of the typical noise contributions for planets detected around solar-type stars.

All Tables

Table 1

Model description, identifiers, and colors.

Table 2

Simulation parameters used in LIFEsim for the baseline analyses.

Table 3

Summary of the parameters used in the retrievals, their expected values, and their prior distributions.

Table 4

References for the molecular opacities used in the retrievals.

Table 5

References for the CIA and Rayleigh opacities used in the retrievals.

All Figures

Fig. 1

Retrieved spectra compared to the input spectra (black dots) for the various scenarios, ordered by epoch (columns) and cloud coverage (rows). The gray-shaded area indicates the LIFEsim uncertainty. The color-shaded areas represent the confidence envelopes (darker shading corresponds to a higher confidence). The scenarios are color-coded according to Table 1.

Fig. 2

Ratios between the retrieved flux and the input flux (in logarithmic scale) for the various scenarios, ordered by epoch (columns) and cloud coverage (rows). The gray-shaded area indicates the LIFEsim uncertainty. The color-shaded areas represent the confidence envelopes (darker shading corresponds to a higher confidence). The scenarios are color-coded according to Table 1.

Fig. 3

Retrieved P–T profiles compared to the input profiles (solid black line) for the various scenarios, ordered by epoch (columns) and cloud coverage (rows). The scenarios are color-coded according to Table 1. In each subplot, we also show an inset plot with the two-dimensional histogram of the retrieved surface P–T values. The 1σ, 2σ, and 3σ confidence levels in the P–T profiles and the two-dimensional histograms are indicated by the increasing intensity of the color fill (darker shading corresponds to a higher confidence).

Fig. 4

Posterior density distributions for the retrieved exoplanet parameters (columns) for the different epochs (rows) and cloud coverages. We follow the color-coding listed in Table 1 to differentiate the different scenarios. The vertical, solid lines mark the true values for each parameter. The dotted lines in the Rpl and Mpl plots indicate the assumed Gaussian priors. For P0 and T0 we assume broad, flat priors, which are not plotted.

Fig. 5

Posterior density distributions for the retrieved species (columns), the different epochs (rows), and the different cloud coverage scenarios. Results from the various scenarios use the color-coding from Table 1. The solid black lines indicate the expected values for each species, which vary depending on the epoch. The gray-shaded area marks the range of values in the vertically nonconstant abundance profiles, which were used to compute the input spectra.

Fig. 6

Retrieved exoplanet parameters for the different scenarios with varying R and S/N values. The error bars denote the 68% confidence intervals. For Mpl and Rpl, we also plot the assumed prior distributions. For T0 and P0, we assumed flat, broad priors. The vertical lines mark the true parameter values.

Fig. 7

Retrieved atmospheric abundances for the different ancillary runs. Results belonging to the various scenarios are provided using the color-coding from Table 1. We use different markers for the runs at different R-S/N (see legend). The solid lines indicate the expected values for each species, which vary depending on the epoch. The gray-shaded areas mark the range of values in the vertically nonconstant abundance profiles of the input spectra. The posterior distributions were classified using our posterior classification scheme (see Sect. 3.4 for details).

Fig. 8

Comparison of retrieval results for constrained planetary parameters and atmospheric abundances in the MOD-CF case (blue, square marker) with results from Paper III (brown, circular marker) for input spectra with the same properties (wavelength coverage 4-18.5 μm, R = 50, S/N = 10). The vertical black lines indicate the true values assumed for the parameters in each study. For parameters where we assumed a non-flat prior, we indicate the prior range (black, pentagonal marker). The error bars on the constrained posteriors denote the 68% confidence intervals.

Fig. 9

Comparison of the R = 50 MIR spectra of the four clear-sky epochs (MOD-CF, NOE-CF, GOE-CF, and PRE-CF; solid lines following the color scheme of Table 1) produced with petitRADTRANS with the results from Rugheimer & Kaltenegger (2018) (black dots) assuming the same input parameters (i.e., P–T profile, abundances, planetary dimensions). The error bars indicate the LIFEsim uncertainty assumed for the main grid of retrievals (S/N = 10 at 11.2 μm).

Fig. 10

Comparison of the cumulative distribution functions of the CH4 posteriors (left corner plot) and the O3 posteriors (right corner plot) for all the combinations of the clear-sky scenarios (MOD-CF, NOE-CF, GOE-CF, and PRE-CF). The retrieved posteriors for each scenario are shown on the diagonal. Following the color scheme in Table 1, we show the posteriors and cumulative distribution functions as solid blue lines (MOD-CF), dotted red lines (NOE-CF), dashed green lines (GOE-CF), and dash-dotted purple lines (PRE-CF).

Fig. 11

Maximum difference, ∆, between the cumulative posteriors for the different model parameters for each combination of input spectra (cloud-free subset) and different R–S/N pairs. The background of each cell in the tables is related to the value of ∆ (darker hues for larger ∆).
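
As an illustration of the quantity ∆ described in this caption, the following minimal sketch computes the maximum absolute difference between the empirical cumulative distribution functions of two sets of posterior samples. This has the same form as the two-sample Kolmogorov-Smirnov statistic; whether it matches the exact procedure used to produce the figure is an assumption. The function name `max_cdf_difference` and the synthetic Gaussian samples are hypothetical, and the samples are assumed to be equally weighted.

```python
import numpy as np


def max_cdf_difference(samples_a, samples_b):
    """Maximum absolute difference between two empirical CDFs.

    Assumes both inputs are one-dimensional, equally weighted
    posterior samples of the same parameter (an assumption of this sketch).
    """
    a = np.sort(np.asarray(samples_a, dtype=float))
    b = np.sort(np.asarray(samples_b, dtype=float))

    # The empirical CDFs only change at sample values, so it is enough
    # to evaluate both of them on the pooled set of samples.
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size

    return float(np.max(np.abs(cdf_a - cdf_b)))


# Hypothetical usage with synthetic posteriors for one parameter
# retrieved from two different input spectra.
rng = np.random.default_rng(0)
posterior_1 = rng.normal(loc=-6.0, scale=0.5, size=5000)
posterior_2 = rng.normal(loc=-5.0, scale=0.5, size=5000)
delta = max_cdf_difference(posterior_1, posterior_2)
print(f"Delta = {delta:.2f}")
```

Values of ∆ close to 0 indicate nearly identical posteriors, while values close to 1 indicate posteriors with almost no overlap.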

Fig. B.1

Corner plot for the posterior distributions from the retrievals of the MOD-CF (dashed contour lines) and MOD-C (solid contour lines) scenarios. The black lines indicate the expected values for every parameter. The retrieved values (median and 1σ uncertainties) are shown in the table in the top-right corner, together with the expected values. The scenarios are color-coded according to Table 1.

Fig. B.2

As for Fig. B.1 but for the NOE-CF and NOE-C scenarios.

Fig. B.3

As for Fig. B.1 but for the GOE-CF and GOE-C scenarios.

Fig. B.4

As for Fig. B.1 but for the PRE-CF and PRE-C scenarios.

Fig. C.1

As for Fig. 6 but for the cloudy scenarios.

Fig. C.2

As for Fig. 7 but for the cloudy scenarios.

Fig. C.3

As for Fig. 11, but for the cloudy scenarios.

Fig. D.1

Bayesian evidence for each setup for the MOD-CF scenario (on the diagonal) and the Bayes factor for every pair of retrieval setups for the MOD-CF scenario (lower triangle). The cells in the lower triangle are color-coded according to the color bar, whose limits are determined by the Jeffreys scale (see Table D.1).

Fig. D.2

As for Fig. D.1 but for the NOE-CF scenario.

Fig. D.3

As for Fig. D.1 but for the GOE-CF scenario.

Fig. D.4

As for Fig. D.1 but for the PRE-CF scenario.
