A&A, Volume 699, July 2025
Article Number: A213
Number of pages: 19
Section: Numerical methods and codes
DOI: https://doi.org/10.1051/0004-6361/202453339
Published online: 08 July 2025
Simulation-based inference has its own Dodelson–Schneider effect (but it knows that it does)
1 University Observatory, Faculty of Physics, Ludwig-Maximilians-Universität, Scheinerstr. 1, 81677 Munich, Germany
2 Munich Center for Machine Learning (MCML), Germany
3 Excellence Cluster ORIGINS, Boltzmannstr. 2, 85748 Garching, Germany
★ Corresponding author: jed.homer@physik.lmu.de
Received: 6 December 2024
Accepted: 7 May 2025
Context. Making inferences about physical properties of the Universe requires knowledge of the data likelihood. A Gaussian distribution is commonly assumed for the uncertainties with a covariance matrix estimated from a set of simulations. The noise in such covariance estimates causes two problems: it distorts the width of the parameter contours, and it adds scatter to the location of those contours that is not captured by the widths themselves. For non-Gaussian likelihoods, an approximation may be derived via simulation-based inference (SBI). It is often implicitly assumed that parameter constraints from SBI analyses, which do not use covariance matrices, are not affected by the same problems as parameter estimation with a covariance matrix estimated from simulations.
Aims. We aim to measure the coverage and marginal variances of the posteriors derived using density-estimation SBI over many identical experiments to investigate whether SBI suffers from effects similar to those of covariance estimation in Gaussian likelihoods.
Methods. We used neural-posterior and neural-likelihood estimation with continuous and masked autoregressive normalising flows for density estimation. We fitted our approximate posterior models to simulations drawn from a Gaussian linear model, so that the SBI results can be compared to the true posterior and the effects related to noise in the covariance estimate are known analytically. We tested linear and neural-network-based compression, demonstrating that neither method circumvents the issues of covariance estimation.
Results. SBI suffers an inflation of posterior variance that is equal to or greater than the analytical result for covariance estimation in Gaussian likelihoods for the same number of simulations. This inflation of variance is captured conservatively by the reported confidence intervals, leading to acceptable coverage regardless of the number of simulations. The assumption that SBI requires a smaller number of simulations than covariance estimation for a Gaussian-likelihood analysis is inaccurate. The limitations of traditional likelihood analysis with a simulation-based covariance remain for SBI with a finite simulation budget. Despite these issues, we show that SBI correctly draws the true posterior contour when there are enough simulations.
Key words: methods: data analysis / methods: statistical / cosmological parameters / large-scale structure of Universe
© The Authors 2025
Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1 Introduction
Current- and next-generation cosmological experiments such as the Dark Energy Survey (DES; Dark Energy Survey Collaboration 2016), Euclid (Laureijs et al. 2011), the Nancy Grace Roman Space Telescope (Eifler et al. 2021), the Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST; Ivezić et al. 2019), the 4m Multi-Object Spectroscopic Telescope (4MOST; de Jong et al. 2019), and the Dark Energy Spectroscopic Instrument (DESI; Levi et al. 2019) will return a huge volume of observational data of the large-scale structure in the Universe. The purpose of this effort is to constrain the values of fundamental physical parameters as accurately and precisely as possible. The limit of the cosmological information that can be extracted from a measurement ultimately depends on the typically unknown likelihood function of the data. The likelihood function compares the data to a theoretical model, and this comparison may be inaccurate and imprecise because
the model prediction for the expectation value as a function of the parameters may not be known analytically, or it may be inaccurately predicted in numerical simulations;
in addition to the expectation value, the likelihood function, which gives the distribution of the measurement around the expectation, is not known.
The likelihood functions for observables of the large-scale structure are often unknown, and only approximate expressions for them exist (Uhlemann et al. 2020). This is particularly the case for higher-order statistics. These capture information beyond that extracted by traditional two-point functions (Beyond-2pt Collaboration 2024) and can potentially break parameter degeneracies; many approaches based on measuring higher-order correlations of cosmological fields have been proposed (Kacprzak et al. 2016; Uhlemann et al. 2020; Gruen et al. 2018; Friedrich et al. 2018; Hamaus et al. 2020; Contarini et al. 2023; Halder et al. 2021; Valogiannis & Dvorkin 2022; Davies et al. 2022; Hou et al. 2024). This promise, combined with the lack of analytical expressions for their likelihoods, promotes the use of SBI (or 'likelihood-free' inference; Cranmer et al. 2020) to extract information from such statistics.
Simulation-based inference (SBI; Cranmer et al. 2020) covers a broad class of statistical techniques (e.g. Akeret et al. 2015; Cranmer et al. 2016; Papamakarios 2019; Lueckmann et al. 2017; Alsing et al. 2018; Cole et al. 2022) that derive an approximate likelihood or posterior model from a set of simulations paired with their model parameters. The likelihood is fitted with no assumptions on the data-generating process and allows for complex effects in the measurement process to be forward-modelled. This is in contrast with classical or explicit likelihood methods that require an analytic model for the expectation value and statistical uncertainties in the data. An additional claimed benefit of SBI methods is the ease of using multiple probes simultaneously without analytically modelling cross-correlations between the measurements (Fang et al. 2023; Reeves et al. 2024). The density estimation techniques used for SBI (Alsing et al. 2018; Alsing & Wandelt 2019; Lueckmann et al. 2017; Papamakarios 2019; Glöckler et al. 2022) apply generative models fitted with either maximum-likelihood optimisation or variational inference.
The use of SBI is now established in many branches of cosmology, with a variety of methods for deriving posteriors from sets of simulations. These include density estimation of the likelihood or posterior (Alsing et al. 2018; Akeret et al. 2015; Makinen et al. 2021; Leclercq & Heavens 2021; Leclercq 2018), approximate Bayesian computation (ABC; Akeret et al. 2015; Weyant et al. 2013), and neural-ratio estimation (Cranmer et al. 2016; Cole et al. 2022). SBI has been applied to galaxy clustering (Tucci & Schmidt 2024; Modi et al. 2025; Hahn et al. 2023a,b; Lemos et al. 2024; Hou et al. 2024), weak lensing (Lin & Kilbinger 2015; Jeffrey et al. 2020; Gatti et al. 2024; Jeffrey et al. 2025), cosmic shear (Lin et al. 2023), cluster abundance (Tam et al. 2022), the cosmic microwave background radiation (Cole et al. 2022), gravitational-wave sirens (Gerardi et al. 2021), type Ia supernovae (Weyant et al. 2013), and the cosmic 21cm signal (Prelogović & Mesinger 2023), emerging as a fast and efficient method for deriving posteriors from measurements after fitting to forward-modelled simulations of data. SBI methods thus promise Bayesian inference with high-dimensional data when the expectation value and likelihood are intractable or difficult to model accurately with traditional likelihood-based methods.
Given the issues surrounding modern observational cosmology, the potential of SBI methods for these problems, and the rapid innovations within the machine-learning literature used to implement the analyses, we would like to know whether SBI methods can return posterior estimates whose locations scatter less, relative to the true posterior, than those of a Gaussian likelihood with simulation-estimated covariance for the same number of simulations. Moreover, if this is the case, we endeavour to find out whether SBI can determine the additional scatter in the location of its posterior estimates; that is, whether SBI inflates its contours sufficiently (compared to the true posterior) to still achieve good parameter coverage in repeated experiments.
To answer these questions, we tested the SBI framework for Gaussian data vectors with a parameter-independent covariance matrix and a linear parameter dependence for the expectation value of the data. This allowed us to compare our SBI posterior estimates, in which the likelihood or data covariance is not known, both to the true posterior and to the posteriors that would be derived from covariance estimation within a Gaussian-likelihood assumption. As data, we assumed a tomographic cosmic-shear data vector, and we tested different techniques to compress it: score compression (Alsing & Wandelt 2018) with and without knowledge of the data vector's covariance, as well as a neural-network-based compression. We then investigated how SBI combined with each of these compression techniques performs with respect to the scatter, width, and coverage of the resulting parameter contours as a function of the number of simulations used for training (again, compared to the true posterior and to posteriors derived from covariance estimation). These experiments test the common assertion that SBI is not affected by the errors that make covariance estimation impracticable for high-dimensional data vectors and computationally expensive simulations (e.g. Jeffrey et al. 2020; Jeffrey et al. 2025; Gatti et al. 2024). In Fig. 1, we illustrate the answers to our questions with a set of posteriors derived with a Gaussian-likelihood analysis – accounting for the unknown covariance – contrasted with SBI analyses using either compression method.
Our data were generated from a Gaussian likelihood with a model that is linear in its parameters, for which a Fisher analysis, the noise-bias correction of Hartlap et al. (2006), and the derivation of the excess scatter of the best-fit parameters in Dodelson & Schneider (2013) – known as the Dodelson–Schneider effect – are all valid. This allowed us to analytically determine the best-fit parameters and the confidence contours in each likelihood analysis for comparison with the posteriors derived with SBI.
In Sect. 2, we review the main issues with estimating data covariances and the methods used to account for the noise in covariances estimated from simulations, highlighting where SBI is claimed to be advantageous. In Sect. 3, we outline the methods for density estimation, SBI, and data compression used in this work, and a simple experiment in which we tested these methods1. In Sect. 4, we present the results from the experiments; in Sect. 5, we discuss them; and in Sect. 6, we conclude.
Fig. 1 Effects of the estimated data-error distribution on posterior contours and their locations.
2 Parameter estimation with noisy covariance matrices
We studied a simple inference problem in which the likelihood is not known. A common ansatz in cosmological analyses is to assume a Gaussian likelihood for a given set of statistics of the large-scale structure. Typically, this statistical model for the likelihood depends on a parametrised expectation value, ξ[π], and a covariance matrix, Σ[π], where π are the physical parameters that need to be estimated from the data. If the true covariance is not known, then an estimate may be derived from simulated data. Often, this estimator is derived for one fiducial set of parameters, and the parameter dependence of the covariance is assumed to be negligible. Note, however, that this is not in principle a limitation of covariance estimation within the Gaussian-likelihood assumption: it is possible to derive estimates, S[π], at different values of the parameters, π (and, e.g., interpolate between them). We emphasise this because the parameter dependence of the (width of the) likelihood function is not the main reason why SBI may be preferable to covariance estimation paired with a Gaussian-likelihood assumption. Instead, SBI is most needed in cases where the Gaussian assumption itself (with or without a parameter-dependent covariance) is insufficient.
The Gaussian likelihood assumption can be justified when the data are made up of measurements in independent sub-volumes of a survey area so that their sum is Gaussian distributed via the central limit theorem. In practice, the validation of this ansatz is demanding (Joachimi et al. 2021; Friedrich et al. 2021) and can result in scale cuts to the data vector, thus reducing the information return of the analysis.
However, even when the Gaussian-likelihood ansatz is justified, covariance estimation only yields noisy versions of the true covariance, Σ, and its inverse, Σ−1 (the precision matrix). If independent and accurately drawn realisations, ξi (i = 1, ..., ns), of the data vector in simulated data are available, then an unbiased estimate for the data covariance Σ is calculated as

S = \frac{1}{n_s - 1} \sum_{i=1}^{n_s} (\xi_i - \bar{\xi})(\xi_i - \bar{\xi})^{\rm T} , (1)

where

\bar{\xi} = \frac{1}{n_s} \sum_{i=1}^{n_s} \xi_i (2)

is the mean calculated over all available realisations. The inversion of this matrix is a non-linear process that causes S−1 to be a biased estimator of Σ−1 and to have significantly poorer noise properties than S (Taylor et al. 2013).
If the true likelihood function is indeed Gaussian, then the bias of S−1 can be corrected by applying a factor, h, as (Kaufman 1967; Hartlap et al. 2006)

h = \frac{n_s - 1}{n_s - n_\xi - 2} , (3)

such that (hS)−1 is an unbiased estimate of Σ−1. The above correction only de-biases the precision matrix and not the full likelihood function, which even in the Gaussian case is a non-linear function of S. Surprisingly, though, at least the width of the likelihood function and of the parameter posteriors that might be derived from it are usually quite well approximated just by inserting (hS)−1 into the Gaussian likelihood function (Friedrich & Eifler 2017; Percival et al. 2021). The main problem of covariance estimation is not the width of parameter posteriors, but their location. The noise in (hS)−1 causes additional scatter in the location of parameter contours, which can make the overall scatter of maximum-posterior parameters in repeated experiments much larger than indicated by the posterior width itself (Dodelson & Schneider 2013; Friedrich & Eifler 2017).
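For concreteness, Eqs. (1)-(3) can be sketched in a few lines of Python; the dimensions and the toy covariance below are illustrative assumptions, not the configuration used in this work:

```python
# Sketch of Eqs. (1)-(3): sample covariance from n_s simulations and the
# Kaufman-Hartlap de-biased precision matrix. All sizes are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_xi, n_s = 50, 500                                  # data dimension, simulations
Sigma = np.eye(n_xi) + 0.2 * np.ones((n_xi, n_xi))   # toy 'true' covariance
xi = rng.multivariate_normal(np.zeros(n_xi), Sigma, size=n_s)

S = np.cov(xi, rowvar=False)        # unbiased covariance estimate, Eq. (1)
h = (n_s - 1) / (n_s - n_xi - 2)    # Hartlap factor, Eq. (3)
precision = np.linalg.inv(h * S)    # (hS)^-1, a de-biased estimate of Sigma^-1
```

Averaged over many realisations of S, (hS)−1 is unbiased; any single realisation, however, remains noisy, and it is this residual noise that drives the effects discussed below.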
It is this effect of increased parameter scatter that we were in search of. Naive covariance estimation together with the correction factor h does not 'know' that it suffers from this scatter (e.g. Fig. 1 of Friedrich & Eifler 2017). This raises the question of why SBI – via which we estimate not only the covariance, but the full likelihood shape from simulations – would not suffer from the same effect, and, if it does, whether SBI automatically adjusts its own contour size to account for this additional scatter in the maximum-posterior parameters.
In the case of covariance estimation, Dodelson & Schneider (2013) found that the scatter of maximum-posterior parameters (or rather their parameter covariance) is enhanced compared to the inverse Fisher matrix (i.e. compared to the naive expectation) via

\langle \Delta\pi\, \Delta\pi^{\rm T} \rangle = \left[ 1 + B\,(n_\xi - n_\pi) \right] F_\Sigma^{-1} , (5)

where nξ is the number of data points, nπ is the number of parameters, and the factor B is given by

B = \frac{n_s - n_\xi - 2}{(n_s - n_\xi - 1)(n_s - n_\xi - 4)} . (6)

In order for parameter contours to take this additional uncertainty into account, the covariance of parameter posteriors should be enhanced by the Dodelson–Schneider factor:

f_{\rm DS} = 1 + B\,(n_\xi - n_\pi) (7)
\approx 1 + \frac{n_\xi - n_\pi}{n_s} , (8)

where the approximation in the last line is valid if ns ≫ nξ ≫ nπ. A concrete recipe for how to widen parameter posteriors in order to correct for the Dodelson–Schneider effect (and other, usually subdominant, effects) was provided by Percival et al. (2021). There, the authors marginalised over the unknown, true covariance, Σ, in a Bayesian approach that takes into account the likelihood function, p(S|Σ), given by the Wishart distribution in the Gaussian case, and a prior distribution p(Σ) ∝ |Σ|m. They provided the coefficient m in the prior distribution that leads to the desired frequentist coverage of the resulting Bayesian parameter constraints (note in particular that the coefficient m considered by Sellentin & Heavens (2015) does not lead to that desired coverage).
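A minimal numerical check of Eqs. (5)-(8) – assuming the dimensions used later in this work, nξ = 450 data points and nπ = 3 parameters – illustrates how large fDS becomes when ns is not much greater than nξ:

```python
# Numeric check of the Dodelson-Schneider inflation, Eqs. (6)-(8).
n_xi, n_pi = 450, 3                       # data points and parameters

def f_ds(n_s, n_xi, n_pi):
    # exact factor, Eqs. (6)-(7)
    B = (n_s - n_xi - 2.0) / ((n_s - n_xi - 1.0) * (n_s - n_xi - 4.0))
    return 1.0 + B * (n_xi - n_pi)

for n_s in (500, 1000, 2000, 10000):
    approx = 1.0 + (n_xi - n_pi) / n_s    # Eq. (8), valid for n_s >> n_xi
    print(n_s, round(f_ds(n_s, n_xi, n_pi), 2), round(approx, 2))
```

The exact factor departs strongly from the ns ≫ nξ approximation precisely in the regime of realistic simulation budgets.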
As mentioned before, we want to investigate whether SBI suffers from problems analogous to those described above and, if it does, whether it self-corrects to obtain the desired coverage properties or whether a manual widening of posteriors, as in the case of Gaussian covariance estimation, is required. Note, however, that even if SBI automatically corrects for its own Dodelson–Schneider effect, this correction still means that posteriors are widened and that parameter information is diluted. This may significantly hinder SBI approaches from delivering on the promised improvements in parameter constraints compared to full likelihood analyses of summary statistics whose uncertainties can be modelled analytically.
One more comment, to emphasise that the Dodelson–Schneider effect cannot easily be circumvented: it is not possible to beat down fDS by simply compressing a given set of statistics to a smaller data vector. For example, the popular MOPED compression (Heavens et al. 2000; or, equivalently, score compression, Alsing & Wandelt 2018) would require knowledge of the covariance, Σ, in the first place. If that is approximated by an estimate, S, then the problem is merely shifted from one side of the statistical analysis to the other. In fact, Dodelson & Schneider (2013, hereafter DS13) derived fDS exactly by considering the scatter of optimally compressed statistics that use noisy covariances.
Given the described assumptions, and upon acquiring a measurement, ξ̂, inference on the values of the model parameters is based on a Gaussian likelihood with the estimated precision matrix (hS)−1:

\mathcal{L}(\hat{\xi} \,|\, \pi, S) \propto \exp\left[ -\tfrac{1}{2} \left(\hat{\xi} - \xi[\pi]\right)^{\rm T} (hS)^{-1} \left(\hat{\xi} - \xi[\pi]\right) \right] . (9)

A posterior distribution for the parameters in light of the measurement, using a prior density p(π), is expressed as

p(\pi \,|\, \hat{\xi}, S) \propto \mathcal{L}(\hat{\xi} \,|\, \pi, S)\, p(\pi) , (11)

ignoring a parameter-independent normalisation factor.
For a given S and ξ̂, the maximum a posteriori (MAP) or best-fit parameter estimates, π̂, are obtained with the estimated precision matrix, (hS)−1, as

\hat{\pi} = \pi_0 + \left( F_\Sigma + \Delta F \right)^{-1} \partial_\pi \xi^{\rm T}\, (hS)^{-1} \left(\hat{\xi} - \xi[\pi_0]\right) . (12)

Here, FΣ is the Fisher matrix, which for a Gaussian likelihood parametrised with the true precision matrix, Σ−1, is written as

F_\Sigma = \partial_\pi \xi^{\rm T}\, \Sigma^{-1}\, \partial_\pi \xi . (13)

The quantity ΔF is of the same form, with the error on the precision matrix, ΔΣ−1 = (hS)−1 − Σ−1, in place of the unknown true precision matrix Σ−1. For a model ξ[π] – linear in π with Gaussian errors on the data – the Fisher matrix defined above precisely quantifies the information content of the data on the model at a fixed point in parameter space.
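As an illustration of Eqs. (12)-(13), the sketch below computes the Fisher matrix and the best-fit parameters of a toy linear model via generalised least squares; the derivative matrix, covariance, and dimensions are arbitrary assumptions:

```python
# Sketch of Eqs. (12)-(13): Fisher matrix and best-fit (flat-prior MAP)
# parameters of a linear model xi[pi] = xi0 + D (pi - pi0).
import numpy as np

rng = np.random.default_rng(1)
n_xi, n_pi = 20, 3
D = rng.normal(size=(n_xi, n_pi))              # derivatives d xi / d pi
Sigma = np.diag(rng.uniform(0.5, 2.0, n_xi))   # toy data covariance
pi0, xi0 = np.zeros(n_pi), np.zeros(n_xi)

pi_true = np.array([0.3, -0.1, 0.2])
noise = rng.multivariate_normal(np.zeros(n_xi), Sigma)
xi_hat = xi0 + D @ (pi_true - pi0) + noise     # one noisy measurement

P = np.linalg.inv(Sigma)    # true precision; replaced by (hS)^-1 when estimated
F = D.T @ P @ D             # Fisher matrix, Eq. (13)
pi_map = pi0 + np.linalg.solve(F, D.T @ P @ (xi_hat - xi0))   # Eq. (12)
```

Replacing P by a noisy (hS)−1 in the last two lines is exactly what sources the additional scatter of π̂ described above.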
With the increase in the dimensionality of measurements returned by cutting-edge cosmological surveys, it may not be possible to obtain a precision matrix, owing to the singularity of the estimated covariance matrix (Hartlap et al. 2006; Dodelson & Schneider 2013; White & Padmanabhan 2015). This is particularly true for data vectors that combine multiple probes, which are critical for breaking parameter degeneracies and calibrating systematic effects independently (Kacprzak & Fluri 2022; Fang et al. 2023; Reeves et al. 2024). In this case, estimating a precision matrix from simulations requires an unfeasible amount of computation (Taylor & Joachimi 2014; Friedrich & Eifler 2017).
However, when it is possible, a covariance matrix may be estimated from the data realisations themselves, e.g. by jackknifing or sub-sampling (Norberg et al. 2009; Friedrich et al. 2015; Mohammad & Percival 2022); from a set of accurate numerical simulations that assume an underlying cosmological model (Percival et al. 2014; Uhlemann et al. 2020); from a theoretical covariance model (Schneider et al. 2002; Friedrich et al. 2021; Joachimi et al. 2021; Aghanim et al. 2020; Krause & Eifler 2017; Linke et al. 2023; Fang et al. 2023; Reeves et al. 2024); or with some hybrid method combining these techniques (Friedrich & Eifler 2017; Hall & Taylor 2018). Covariance matrices computed from an analytical model are noiseless and easily invertible, but they are only accurate given a good understanding of the statistical properties of the data; ultimately, obtaining an analytic covariance may be an insurmountable task.
3 Methods
In the following subsections, we describe the data likelihood with which we ran our analyses, the SBI experiments, the data compression, and the normalising flow models we used for density estimation and hyperparameter optimisation.
3.1 Model
We ran an inference of cosmological parameters estimated from a measurement drawn from a linearised model of the DES-Y3 cosmic-shear two-point-function data vector, where the noise is sampled from a fiducial covariance matrix. This data covariance is calculated with the analytic halo-based model of Krause & Eifler (2017). The linear model is a Taylor expansion of the full model around the fiducial point in parameter space, π0, written as

\xi[\pi] = \xi[\pi_0] + \partial_\pi \xi[\pi_0] \cdot (\pi - \pi_0) . (14)

The true data likelihood from which we sampled data in our experiments is written as p(ξ̂|π) = 𝒢[ξ̂ | ξ[π], Σ]. We used a uniform prior on the parameters, with Ωm ∈ [0.05, 0.55], σ8 ∈ [0.45, 1.0], and w0 ∈ [−1.4, −0.1]. The parameter set that maximises the likelihood (and therefore the posterior in this case) is given by Eq. (12).
The data vector consists of two-point functions of cosmic-shear measurements (Schneider et al. 2002; Schneider 2006; Kilbinger 2015; Amon et al. 2022). The shear field is a map of the coherent distortions of galaxy images by weak gravitational lensing in the large-scale structure matter distribution. The measurement is sensitive to the density fluctuations projected along the line of sight, weighted by the source galaxy distribution, and is known to constrain a degenerate combination of σ8 and the matter-density parameter Ωm.
3.2 Experimental setup
We ran a simple inference of a set of parameters from a noisy data vector with a linear compression. Note that since we assumed a linear expectation value and a constant covariance, this is equivalent to score compression (Alsing & Wandelt 2018, 2019). This experiment was repeated when the data covariance is known and, separately, when it is estimated from a set of ns simulations. When the data covariance is estimated, the linear compression increases the scatter in the estimated parameters. This affects the compression of the noisy data vector as well as of the simulated data that were used to fit the normalising-flow density estimators. We also ran separate SBI experiments that use a neural network for the compression – which might be assumed to be a simple fix to the issues of covariance-matrix estimation.
We measured the marginal uncertainty on the inferred parameters by calculating the variance of samples from the posteriors estimated with SBI conditioned on the noisy data vectors. We also measured the coverage of the posteriors for each data vector. Each analysis consists of separately applying neural-posterior estimation (NPE; Greenberg et al. 2019) and neural-likelihood estimation (NLE; Papamakarios et al. 2019) to fit a posterior when given a set of simulations and their model parameters (see Table 1 for a description of the separate analyses we ran). The total number of simulations available in the experiment mimics the number of available N-body simulations in a cosmological analysis. We repeated the NPE and NLE analyses 200 times for experiments where the data covariance is known and another 200 times for where it is estimated from simulations.
In one experiment, we sampled a set of 'true' parameters from a uniform prior, π ~ p(π), initialised the density-estimator parameters randomly, and generated a set of ns simulations. We did so either at the fiducial parameters, π0 – sampling ξi ~ 𝒢[ξ[π0], Σ] and calculating the sample covariance matrix, S, if Σ is unknown – or at a set of parameters sampled from the prior, p(π), to train a neural network to compress our data. In the same experiment, we sampled physics parameters from the uniform prior, πi ~ p(π), and generated ns simulations from the ns prior samples using the linearised model, with noise sampled from the true covariance matrix, Σ. We compressed these simulations either with a linear compression, parametrised by the true expectation, ξ[π], the true or estimated precision matrix (Σ−1 or (hS)−1), and the theory derivatives, ∂πξ[π], or with a neural network trained on the first set of ns simulation and parameter pairs. Finally, we fitted a normalising flow to the set of compressed simulations and parameters, {(xi, πi)}, by maximising the log-probability (of the likelihood or posterior) with stochastic gradient descent, and we sampled the posterior given a measurement, ξ̂, compressed to x̂, drawn from the true data-generating likelihood.
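The steps above are summarised in the following end-to-end sketch of a single experiment, with illustrative dimensions in place of the DES-like data vector and a simple residual check standing in for the normalising-flow fit:

```python
# End-to-end sketch of one experiment (a stand-in, not the authors'
# pipeline): draw parameters from the prior, simulate the Gaussian linear
# model of Eq. (14), score-compress via Eq. (12), and fit a density
# estimator to the (summary, parameter) pairs.
import numpy as np

rng = np.random.default_rng(2)
n_xi, n_pi, n_s = 20, 3, 1000
D = rng.normal(size=(n_xi, n_pi))     # d xi / d pi at pi0
Sigma = np.eye(n_xi)                  # toy 'true' covariance
pi0, xi0 = np.zeros(n_pi), np.zeros(n_xi)

# n_s prior draws and simulations from the linearised model, Eq. (14)
pi_train = rng.uniform(-1.0, 1.0, size=(n_s, n_pi))
xi_train = xi0 + (pi_train - pi0) @ D.T \
    + rng.multivariate_normal(np.zeros(n_xi), Sigma, size=n_s)

# linear (score) compression, Eq. (12), here with the true precision
P = np.linalg.inv(Sigma)
F_inv = np.linalg.inv(D.T @ P @ D)
x_train = pi0 + (xi_train - xi0) @ (P @ D @ F_inv)

# a normalising flow would be fitted to (x_train, pi_train) here; as a
# stand-in, verify that the summaries scatter around the truth with
# covariance F^-1 (the Fisher errors)
resid = x_train - pi_train
print(np.allclose(np.cov(resid, rowvar=False), F_inv, atol=0.1))
```

With the true precision matrix, the summaries scatter around the true parameters with covariance F−1; substituting (hS)−1 propagates the covariance noise of Sect. 2 into the training set of the flows.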
This experiment is idealised in the following ways:
– the measurement errors upon our data, ξ̂, were drawn from the same distribution as the simulations used to fit the density-estimation models;
– there are no nuisance parameters in our modelling of the data to marginalise over;
– the analytic compression of our data is (given enough simulations at the fiducial parameters) lossless, so the posterior given the summary is identical to the posterior given the data;
– the true expectation value, ⟨ξ̂⟩, lies in the model space, meaning there are parameters, π, such that ξ[π] = ⟨ξ̂⟩;
– the Dodelson–Schneider correction factor (Eq. (5)) is exact, given that the Fisher information of a Gaussian likelihood (with a model that is linear in the parameters) quantifies the posterior covariance.
Note that NLE requires a specific prior in order to calculate the posterior, whereas NPE implicitly uses the prior defined by the distribution of parameters that were used to generate the training data. In the NLE analyses we therefore adopted the same priors that were used to sample the training parameters of the flows, ensuring that both methods use the same prior and allowing for a comparison of equivalent analyses.
The number of simulations, ns, input into an experiment using either NPE or NLE depends on the compression method being used. This is shown in Table 1 for reference. To compress our data, we used either a linear compression – with the true precision matrix, Σ−1, or an estimated one, (hS)−1 or S_diag.−1 – and, separately, a neural network fitted with a mean-squared-error loss. Every option, except when Σ is known, requires 2ns simulations in total: ns for the covariance estimation or neural-network training, and ns for fitting the density estimator of the likelihood or posterior. The posteriors from SBI should therefore be compared, for our Gaussian linear model, with a Gaussian-likelihood analysis using 2ns simulations.
Simulation-based inference methods fit the model, ξ[π], the covariance, Σ, and the likelihood shape simultaneously from simulations. In Appendix D, we use the same Gaussian linear model to test whether fitting the expectation alone, with a known covariance, introduces significant uncertainty in the posterior as a function of ns. Since the sample mean and covariance are independent, this would only affect analyses in which the expectation value must itself be estimated from simulations, which is a concern for future experiments. However, this test shows that the experiments presented in this work are fair to SBI (fitting the model alongside the covariance has almost no effect), even for the lowest number of simulations we considered. The experiments were run with eight CPU cores, taking about 2-3 minutes each, depending on whether a neural network was trained for the data compression.
3.3 Compressing the data
For density estimation, it is advantageous to reduce the dimensionality of the data. In the case of a linear compression (Tegmark et al. 1997; Heavens et al. 2000; Alsing & Wandelt 2018), one implicitly assumes a model to derive the statistics (Heavens et al. 2020), and the sampling distribution of the statistics is Gaussian only if the data errors are Gaussian. Neural-network-based summary statistics (Fluri et al. 2021; Kacprzak & Fluri 2022; Charnock et al. 2018; Prelogović & Mesinger 2024; Villanueva-Domingo & Villaescusa-Navarro 2022) can easily be fitted to data (which are typically non-standard summary statistics), though they have no analytic likelihood for the summary given the input, so extracting credible intervals is not possible except with the use of SBI. Additionally, there is no guarantee that fitting a neural network to regress the model parameters from input data will produce an unbiased estimator of the parameters, since the MSE estimator is the maximum-likelihood estimator only for Gaussian-distributed data with unit covariance (Murphy 2022).
In the experiments in which Σ̂ = S or Σ̂ = S_diag., the data are linearly compressed via Eq. (12) to nπ summaries, so the normalising-flow likelihood model is fitted to a Gaussian likelihood with a Fisher matrix given by either FΣ or FS, depending on whether the covariance is known or not. The noise in the estimated data covariance (as a function of ns) in our experiments limits the amount of information that the normalising-flow posteriors or likelihoods can extract about the model parameters. In the case that the precision matrix is known exactly and the model, ξ[π], is linear in the parameters, the linear compression conserves the information content of the data, Fx = FΣ.
We also tested the use of a neural network, fψ, to compress the data. The neural network consists of simple linear layers and non-linear activations. A simulation is input to the network, and the parameters of the network, ψ, are obtained by stochastic gradient descent of the mean-squared-error loss:

L(\psi) = \frac{1}{n_s} \sum_{i=1}^{n_s} \left| \pi_i - f_\psi(\xi_i) \right|^2 . (15)
Whilst many neural-network architectures exist for parametrising the compression function, each exploiting symmetries in the data in different ways, they ultimately cannot invent or extract more information than a simple multilayer perceptron (MLP) in this experiment: the optimal linear compression with either a noisy or the true data covariance is the best one can do. Given the Gaussian data likelihood in these experiments, the expectation value of the MAE loss leads to the same optimum as that of the MSE loss for any neural network.
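A sketch of such a compression network, fψ, trained with the mean-squared-error loss of Eq. (15), might look as follows; PyTorch is used purely for illustration, and the layer widths and placeholder data are assumptions:

```python
# MLP compression trained by gradient descent of the MSE loss, Eq. (15).
import torch
import torch.nn as nn

n_xi, n_pi = 20, 3
net = nn.Sequential(nn.Linear(n_xi, 64), nn.Tanh(),
                    nn.Linear(64, 64), nn.Tanh(),
                    nn.Linear(64, n_pi))                # f_psi: xi -> pi
optimiser = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

xi_train = torch.randn(1000, n_xi)    # placeholder simulations
pi_train = torch.randn(1000, n_pi)    # placeholder parameters

for epoch in range(200):
    optimiser.zero_grad()
    loss = loss_fn(net(xi_train), pi_train)   # Eq. (15)
    loss.backward()
    optimiser.step()
```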
One option that may help avoid the issues of covariance estimation would be to fit density estimators without compression of the simulations and data. This would, for the optimal and linear change of variables (e.g. in a normalising flow), still depend on the inverse of a data covariance that would be fitted from simulations.
3.4 Density estimation with normalising flows
In order to derive posteriors from a measurement using density-estimation SBI methods, a density estimator is fitted to pairs of model parameters and simulated data to estimate the likelihood or posterior directly (Cranmer et al. 2020; Lueckmann et al. 2017; Papamakarios 2019; Alsing et al. 2018). Normalising flows (Tabak & Vanden-Eijnden 2010; Tabak & Turner 2013; Jimenez Rezende & Mohamed 2016; Papamakarios et al. 2021) are a class of generative models that fit a sequence of bijective transformations from a simple base distribution to a complex data distribution. The transformation is estimated directly from the simulation and parameter pairs by minimising the KL-divergence between the unknown likelihood (posterior) and the flow likelihood (posterior). We used masked autoregressive flows (MAFs; Papamakarios et al. 2018) and continuous normalising flows (CNFs; Grathwohl et al. 2018; Chen et al. 2019), the latter in order to adopt some of the latest density-estimation techniques from the machine-learning literature2. The two different models allowed us to validate and compare the performance of either flow for density estimation.
Normalising flows transform data, x, into Gaussian-distributed samples, y. This mapping, conditioned on the physics parameters, π, and the parameters, ϕ, of a neural network, fϕ, is written as

y = f_\phi(x; \pi) , (16)

where y ~ 𝒢[y|0, 𝕀] and 𝕀 is the identity matrix. An exact log-likelihood estimate of the probability of a data point conditioned on physics parameters can be calculated with a normalising flow using the change of variables between y and x, expressed as

\log p_\phi(x|\pi) = \log \mathcal{G}\left[ f_\phi(x; \pi) \,|\, 0, \mathbb{I} \right] + \log \left| \det J_{f_\phi}(x) \right| , (17)

where J_{f_\phi} is the Jacobian of the normalising-flow transform, fϕ, in Eq. (16).

These bijective transformations between two densities can be composed to produce more complex distributions by using separate transformations in a sequence. For a normalising flow with K transforms in sequence, parametrised as f_\phi = f_{\phi_K} \circ \cdots \circ f_{\phi_1}, the log-likelihood of the flow is written as

\log p_\phi(x|\pi) = \log \mathcal{G}\left[ y_K \,|\, 0, \mathbb{I} \right] + \sum_{k=1}^{K} \log \left| \det J_{f_{\phi_k}}(y_{k-1}) \right| , (18)

where y_k = f_{\phi_k}(y_{k-1}; \pi) and y_0 = x.
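The change of variables in Eq. (17) can be verified for the simplest possible 'flow' – a one-dimensional affine map – for which the transformed density is known in closed form:

```python
# Eq. (17) for an affine map y = (x - mu) / sigma: base density plus
# log-|det Jacobian| recovers the exact Gaussian log-likelihood.
import numpy as np
from scipy.stats import norm

mu, sigma = 1.5, 0.7
x = np.linspace(-1.0, 4.0, 5)

y = (x - mu) / sigma                 # f_phi(x), Eq. (16)
log_det_J = -np.log(sigma)           # log |det J| of the affine map
log_p = norm.logpdf(y) + log_det_J   # Eq. (17)

assert np.allclose(log_p, norm.logpdf(x, loc=mu, scale=sigma))
```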
We note that while it is common to use ensembles of density estimators together to diagnose the fit of the likelihood models (Alsing et al. 2018; Jeffrey et al. 2020; Gatti et al. 2024; Jeffrey et al. 2025), this should not be necessary for the simple Gaussian linear model we used here.
To obtain an approximate likelihood model, pϕ(x|π) (for NLE), or posterior model, pϕ(π|x) (for NPE), we fitted a normalising flow to a set of simulations and parameters. The parameters of the normalising-flow model, ϕ, which maximise the flow's log-likelihood (or log-posterior) over the training set, were obtained by minimising the forward KL-divergence between the unknown likelihood (posterior), q(x|π), and the normalising-flow likelihood (posterior), pϕ(x|π).
The loss function for the normalising flow is then given by

L(\phi) = -\frac{1}{n_s} \sum_{i=1}^{n_s} \log p_\phi(x_i|\pi_i) (19)

for NLE. Note that terms independent of ϕ (the entropy of the unknown density in the KL-divergence) are dropped, since their derivative with respect to ϕ is zero. The loss function for the normalising flow is likewise given by

L(\phi) = -\frac{1}{n_s} \sum_{i=1}^{n_s} \log p_\phi(\pi_i|x_i) (20)

for NPE (since the derivation of Eq. (19) applies in the same way to the posterior). The CNF and MAF models we used for our experiments are described in detail in Appendix A.
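In code, the two objectives differ only in which variable is modelled and which is the conditioner. The sketch below assumes a hypothetical conditional density estimator exposing a log_prob(a, context=b) method:

```python
# The objectives of Eqs. (19)-(20) for batches of compressed
# simulation/parameter pairs (x, pi); `flow` is an assumed interface.

def nle_loss(flow, x, pi):
    # NLE, Eq. (19): negative mean log-likelihood, fitting p_phi(x | pi)
    return -flow.log_prob(x, context=pi).mean()

def npe_loss(flow, x, pi):
    # NPE, Eq. (20): negative mean log-posterior, fitting p_phi(pi | x)
    return -flow.log_prob(pi, context=x).mean()
```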
Note that whilst diffusion (Song et al. 2021; Ho et al. 2020; Kingma et al. 2023) and flow-matching (Lipman et al. 2023) models avoid simulating the ODE trajectory over t during training (e.g. from Gaussian variates to data samples), they inject noise at each time step to diffuse the data to noise. This stochastic process limits their density-estimation capabilities in comparison to those of normalising flows, because diffusion models only lower-bound the log-likelihood of the data (Song et al. 2021; Ho et al. 2020; Kingma et al. 2023).
3.5 Architecture and fitting
Our CNF model has one hidden layer of eight hidden units with tanh(·) activation functions and an ODE solver time step of 0.1. We trained it using early stopping with a patience value of 40 epochs (Advani & Saxe 2017). The MAF models had five MADE transforms, which were parametrised by neural networks with two layers and 32 hidden units using tanh(·) activation functions. The MAFs used a patience value of 50 epochs. We used the ADAM (Kingma & Ba 2017) optimiser with a learning rate of 10−3 for a stochastic gradient descent of the negative log-likelihood loss with both density estimator models.
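A sketch of this fitting procedure – Adam with early stopping on a held-out validation loss – is given below, assuming the flow is a PyTorch module and reusing the hypothetical nle_loss interface from Sect. 3.4:

```python
# Full-batch gradient descent with early stopping on the validation loss.
import copy
import torch

def fit(flow, x_tr, pi_tr, x_va, pi_va, lr=1e-3, patience=50,
        max_epochs=5000):
    optimiser = torch.optim.Adam(flow.parameters(), lr=lr)
    best, best_state, wait = float("inf"), None, 0
    for epoch in range(max_epochs):
        optimiser.zero_grad()
        loss = nle_loss(flow, x_tr, pi_tr)      # Eq. (19)
        loss.backward()
        optimiser.step()
        with torch.no_grad():
            val = nle_loss(flow, x_va, pi_va).item()
        if val < best:                          # keep the best model so far
            best, best_state, wait = val, copy.deepcopy(flow.state_dict()), 0
        else:
            wait += 1
            if wait >= patience:                # early stopping
                break
    flow.load_state_dict(best_state)
    return flow
```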
3.6 Hyperparameter optimisation
It is not computationally feasible in SBI analyses to optimise simultaneously for the architecture and the parametrisation of a density-estimation model whilst fitting the likelihood or posterior. Since the reconstructed posterior depends implicitly on these hyperparameters, we ran a separate experiment to obtain the best settings, evaluated on data that were not part of the training set.
The parameters of the architecture and optimisation procedure were tuned by experimentation using optuna (Akiba et al. 2019) to find the settings that optimise an objective function over repeated experiments in which the true data covariance is known exactly. This tuning used a further 10^4 data samples for training. The average log-likelihood (or log-posterior) of the flow was calculated on a separate test set of 10^4 simulation and parameter pairs, and the architecture and training parameters that maximised this average were chosen (see Appendix B for details of the hyperparameter optimisation).
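The search can be sketched with optuna as follows; the search space and the build_flow constructor are illustrative assumptions, and fit is the training loop sketched in Sect. 3.5:

```python
# Hyperparameter search: maximise the average held-out log-likelihood of
# the flow over trial architectures and learning rates.
import optuna
import torch

def objective(trial):
    lr = trial.suggest_float("lr", 1e-4, 1e-2, log=True)
    n_hidden = trial.suggest_int("n_hidden", 8, 64)
    n_layers = trial.suggest_int("n_layers", 1, 5)
    flow = build_flow(n_hidden, n_layers)       # hypothetical constructor
    fit(flow, x_tr, pi_tr, x_va, pi_va, lr=lr)  # loop from Sect. 3.5
    with torch.no_grad():                       # average log-likelihood on
        return float(flow.log_prob(x_te, context=pi_te).mean())  # test pairs

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=100)
print(study.best_params)
```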
4 Results
In the next section, we discuss the quantitative results of measuring the posterior widths (Sect. 4.1) and the coverage over repeated experiments (Sect. 4.2).
4.1 Posterior widths
Figure 2 shows the reconstructed posterior widths, σ2[π], from repeated identical experiments as a function of the number of simulations, ns, used to fit the normalising flows (plus, where appropriate, another ns simulations for estimating the data covariance or training a neural network for the compression). The results for the other experiments (noted in Table 1) are shown in Appendix C.
The DS13 factor (Eq. (7)) sets the expected width of the scatter in the parameter estimators when both the model, ξ[π], and the covariance, Σ, are determined from data. The factor is calculated for 2ns simulations (the size of the training set plus those used to estimate the data covariance that is used for the compression), with the number of data points equal to the uncompressed data dimension, nξ = 450. For each value of ns, the factor is plotted to compare the expected posterior width in a Gaussian-likelihood analysis – where the covariance is estimated from a set of simulations – to the SBI analyses. By default, every SBI analysis uses ns simulations to train the normalising flows.
The SBI analyses plotted for this comparison were repeated with four compressions. The first is a linear compression (Eq. (12)) in which the true covariance is known (i.e. Σ̂ = Σ). The second is a linear compression with the simulation-estimated covariance, Σ̂ = S (calculated with ns simulations). The third is a linear compression with only the diagonal elements of this same covariance, Σ̂ = S_diag.. The last is a compression parametrised with a neural network that is trained on an additional ns simulations.
For the experiments where the true covariance is known, SBI obtains posterior widths equal to the Fisher errors for all ns values, as expected. The posterior widths for the Σ̂ = S compression converge closely to the DS13 factor at around 2 × 10^4 simulations (for the training set of the flows plus the covariance estimation), which means the flow models fit the correct likelihood shape. The Σ̂ = S_diag. experiments (plotted with diamonds) return posterior widths that are less optimal, over all values of ns, compared to the marginal Fisher variances and the SBI analyses that use Σ̂ = Σ. Despite the posterior width being significantly increased, for low values of ns of around a few hundred simulations, the posterior widths obtained when using Σ̂ = S_diag. are below the Fisher variances expanded by the DS13 factor – that is, below the posterior width expected when estimating the full covariance – for the same ns input to the experiment. This is simply because the inversion of a diagonal matrix, S_diag., does not combine the noisy, off-diagonal elements (of the matrix S that would otherwise be inverted) in a non-linear way. Nonetheless, the structure of the estimated covariance is incorrect, which inflates the variance of the summaries and thereby the marginal variances of the posteriors obtained by the flows.
The experiments involving a neural network for the compression function show a significantly different relation between the posterior width and the number of simulations compared to the Σ̂ = S and Σ̂ = S_diag. experiments. The widths from the neural-network compression experiments do not match any of the widths using linear compression, since the network does not invert the true (or estimated) data covariance. Rather, if the network calculates a compression close to an optimal linear compression, it is possible that the network down-weights the noisier data-vector elements to minimise the MSE loss for a smaller number of simulations. This is not the same as the inverse-variance weighting of the linear compression in Eq. (12). Curiously, this effect – within the regime of ns ≤ nξ for a given value of nπ – allows one to obtain an average posterior width that is smaller than that of a posterior using an estimated covariance adjusted with the DS13 factor (calculated with 2ns simulations), despite the fact that the covariance is not known. How optimal the neural-network summary is depends strongly on the training hyperparameters, optimisation method, and choice of loss function – though quantifying the information content is only possible with an analytic posterior, such as in this work.
Fig. 2 Average model-parameter posterior variance reported by the methods compared in this work, conditioned on noisy data vectors and estimated with neural-likelihood estimation using a masked autoregressive flow. The colour-coded dashed lines show the (Fisher) variances that would have been measured if the exact data covariance had been known and used in a Gaussian-likelihood ansatz with a flat prior. Cross points label posterior variances from SBI analyses where the exact data covariance was known for the linear compression (Eq. (12)). Dotted lines show the expected variances of the maximum a posteriori that would have been measured when using a data covariance estimated from a set of ns simulations; these lines multiply the Fisher variance by the Dodelson & Schneider (2013) factor (Eq. (7)), calculated using 2 × ns simulations. Circle points label posterior variances from SBI analyses where the data covariance was estimated from ns simulations and used in a linear compression. Diamond points label posterior variances obtained when using a compression with the diagonal elements of the estimated covariance matrix. Square points show variances obtained using a neural network for the compression, trained on ns simulations. All points include a 1σ error bar on the variances estimated over all of the experiments. The additional simulations, not labelled on the x-axis but required for the separate compressions (where the true covariance is unknown), are noted for each method. When the true data covariance is not known, necessitating the use of double the number of simulations, the reconstructed posterior errors from SBI are significantly higher than the Dodelson & Schneider (2013) corrected errors when that correction is substantial.
Fig. 3 Coverage, Fω; i.e. how often the true cosmology in the experiment is found inside the 68% (1σ) and 95% (2σ) credible regions of the estimated posterior (see Eq. (23)), against the number of simulations ns used for the training set. Shown is the result for neural-likelihood estimation (with a MAF model) for independently sampled data vectors and data covariance matrices in a series of repeated experiments, with the number of simulations for each experiment on the horizontal axis. The expected coverage of a Gaussian posterior with a de-biased estimate of the precision matrix (using the Hartlap correction; Eq. (3)) and a posterior covariance corrected with the Dodelson & Schneider (2013) factor is plotted for both coverage intervals with dashed lines. The grey lines show the expected coverage of the common approach using a Gaussian posterior with a precision matrix corrected by the Hartlap factor alone (i.e. using the independence-Jeffreys prior). The additional simulations, not labelled on the x-axis but required for the separate compressions (where the true covariance is unknown), are noted for each method. The SBI posteriors obtain the correct coverage to within errors for all numbers of simulations ns and each compression method.
4.2 Coverage
The expected coverage probability of the SBI posterior estimators measures the proportion of repeated identical experiments where a credible region of the posterior estimator contains the true parameter set used to generate the data vector. The coverage probability quantifies how conservative or overconfident the posterior estimator is compared to the true posterior.
We define Fω as the fraction of experiments in which the true cosmology, π, is inside the 68.3% (ω = 0.68) or 95.4% (ω = 0.95) confidence contour around the MAP, π̂. The empirical coverage probability of the i-th posterior is the fraction of J posterior samples with posterior probability under the SBI estimator less than that of the true data-generating parameters, πi, for the i-th experiment; this is written as

\omega_i = \frac{1}{J} \sum_{j=1}^{J} \mathbb{1}\left[ p_\phi(\pi_j|\hat{x}_i) < p_\phi(\pi_i|\hat{x}_i) \right] , (22)

where 𝟙[·] is the indicator function and πj is the j-th posterior sample from the i-th posterior. The fraction of experiments that contain the truth within the credible region of level ω is then calculated as

F_\omega = \frac{1}{N_{\rm exp}} \sum_{i=1}^{N_{\rm exp}} \mathbb{1}\left[ \omega_i \geq 1 - \omega \right] (23)

over a set of independent but identical experiments with independently sampled data, ξ̂i, data covariance matrices, Si, and density-estimator model parameters, ϕi. If the true covariance is known, then Fω should be equal to 0.68 for the 1σ region, or to 0.95 for the 2σ region, if the data are sampled from the true likelihood and the posterior estimator has converged.
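Given posterior samples and their log-probabilities under the estimator, Eqs. (22)-(23) amount to a few lines of numpy. The random placeholders below mimic a perfectly calibrated estimator, for which the function returns ≈ ω:

```python
# Empirical coverage, Eqs. (22)-(23), from per-experiment posterior
# samples; the placeholder arrays stand in for the quantities in the text.
import numpy as np

def coverage_fraction(log_p_samples, log_p_truth, omega):
    # Eq. (22): fraction of the J samples less probable than the truth
    omega_i = (log_p_samples < log_p_truth[:, None]).mean(axis=1)
    # Eq. (23): the truth lies inside the omega highest-density region
    # when at most a fraction omega of samples are more probable than it
    return (omega_i >= 1.0 - omega).mean()

rng = np.random.default_rng(3)
log_p_samples = rng.normal(size=(200, 5000))   # (experiments, J samples)
log_p_truth = rng.normal(size=200)             # log p_phi(pi_i | x_i)
print(coverage_fraction(log_p_samples, log_p_truth, 0.68))   # ~0.68
```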
For the NLE experiments using an MAF, Figure 3 (see Appendix C for the remaining experiments in Table 1) shows the coverages measured over the repeated experiments. Either density-estimation method obtains the correct coverage to within errors, regardless of the model used and of whether the data covariance is known or not. This suggests that the normalising-flow likelihoods correctly expand their contours to account for the scatter of the MAP when using an estimated covariance or a trained neural network for compression – as would be expected, given that the normalising flow models the full functional form of the likelihood (including the expectation and covariance) rather than assuming a fixed Gaussian form.
It should be noted that, as found in Friedrich & Eifler (2017) and Percival et al. (2021) for the same nξ and nπ as in our experiments, the use of the independence-Jeffreys prior on the true covariance matrix in Sellentin & Heavens (2015) gives a posterior with a model-parameter covariance that matches a Gaussian posterior after scaling the data covariance matrix by the Hartlap factor and applying the factor from Dodelson & Schneider (2013) (Eq. (5)). Importantly, however, this posterior does not take the Dodelson–Schneider scatter of the MAP into account by construction. Hence, in Fig. 3 (see also Appendix C), we plot the analytic expected coverage for the Gaussian posterior with a scaled parameter variance and a de-biased precision matrix, accounting for the simulations used to estimate the covariance as well as the training-set simulations. The SBI methods are both able to obtain the correct coverage for all ns values, regardless of whether the true covariance is known or not.
5 Discussion
Based on the results of this work – the widths of the SBI posteriors and their coverage fractions measured over many repeated experiments – the good news is that SBI functions as well as covariance estimation methods possibly can. The bad news is that the number of simulations required to obtain errors close to the true posterior variance exceeds the computational budget of existing simulation suites, even for data of modest dimensions from Gaussian linear models in which the true expectation, ξ[π], is known. The coverages and posterior widths for SBI presented in this work show that SBI – for modest nξ and nπ – does not obtain smaller widths than a Gaussian-likelihood analysis with access to an accurate model ξ[π].
When considering the posterior widths obtained with SBI – using both NLE and NPE – there is a discrepancy between the compression methods. In the limit of a low number of simulations, the linear compression parametrised with the diagonal elements of a simulation-estimated covariance, denoted S_diag., is closer to optimal (i.e. gives a smaller posterior width) in our example than a neural network or the Σ̂ = S compression. This is because S_diag. correctly (though limited by ns) estimates the variances of the data but not the covariances, thus reducing the noise propagated by the additional off-diagonal elements that would be estimated with S. This shows that the increase in posterior width incurred by ignoring the off-diagonal elements of the data covariance (using Σ̂ = S_diag.) is smaller than the increase incurred by estimating them (using Σ̂ = S).
The posterior variance when using a neural network for the compression should lie between those of Σ̂ = S and Σ̂ = S_diag., since the assumption that all of the data-vector components are independent is very strong. Ultimately, this depends on the covariance structure of the statistic at hand; the ordering of the compressions we tested by optimality is not fixed. The effects of one particularly unfavourable covariance structure are examined in Appendix F. The fact that the posterior variances derived with neural-network summaries follow – though are biased above – the Fisher variances (corrected for an estimated covariance via the DS13 factor) stems from the fact that the neural network does not invert any covariance; it therefore cannot be biased by minimising the incorrect likelihood in the way that the Σ̂ = S compression can. This, too, will depend on the covariance structure of the data. Analysing a DS13-like effect for neural networks remains an objective for future work. Nonetheless, the mean-squared-error loss, when minimised in training, does not guarantee an unbiased estimator with the correct variance. It is not clear exactly how the network compression affects the resulting posterior width when optimised with stochastic gradient descent (for low ns); this problem would not be detected in an analysis that cannot be compared directly to an analytic posterior – in particular, in the cases where SBI is needed most.
Our results, based on commonly adopted compression methods, show a concerning inflation of the posterior width compared to the true posterior. This is particularly true for low ns values, which are comparable to the simulation budgets available to current analyses, and where the widths are greater than those inflated by accounting for an unknown data covariance via the DS13 factor. One exception is the case where the compression uses only the estimated diagonal elements of the true data covariance. It is possible that current analyses based on SBI methods fall in a regime of ns, nξ, and nπ in which posteriors may only be derived via compression with a neural network, since the estimated data covariance is singular for ns < nξ. This obscures the amount of information that is lost, because the comparison with a linear compression cannot be made, but it does not change the fact that, for ns − nξ < nξ − nπ, any compression at low ns is severely sub-optimal. Whilst this is true for our linear model and Gaussian errors, the problem will likely be worse for non-Gaussian errors and non-linear models; i.e. for the applications that require SBI. As shown in our posterior-width results (Sect. 4.1), the neural network can improve the information return relative to the posterior obtained with a simulation-estimated covariance at a given low value of ns. Despite this improvement, the posterior errors are inflated by a factor of between two and four relative to the true posterior – a factor that increases significantly as a function of nξ (see Eq. (7)). However, if the data covariance structure stays fixed with increasing nξ, the optimality of the neural-network summary may be constant, whereas the DS13 factor would increase, thus decreasing the information in linear summaries using a simulation-estimated covariance compared to those from a neural network.
The structure of the data covariance significantly affects the information content of summaries from a linear compression (with an estimated covariance) compared to those from a neural network. In Appendix F, we display results of NPE experiments using an MAF density estimator, repeated with a data covariance, Σr, that has large off-diagonal elements (unlike the covariance Σ used in the other sections of this work). For a covariance with strong correlations between elements of the data vector, a neural-network compression is far less optimal – because the network is ignorant of the data errors – than a linear compression with an estimated data covariance, Σ̂ = S. The same holds for a linear compression parametrised by Σ̂ = S_diag. (which minimises a χ2 with the correct model, ξ[π], but the wrong data covariance). This highlights the fact that the issue of covariance-matrix estimation is not alleviated by data compression. For some statistics, with covariances that are unfavourable in this way, the effects of covariance estimation on the posterior widths are significant. Using a neural network to compress the data – so as to avoid estimating the covariance – not only fails to reduce the posterior width relative to the DS13 factor, it actually substantially degrades the returned parameter constraints.
In the posterior widths of Fig. 2 (see also Appendix C), there is a shift of the reported widths towards lower variance. It should be noted that, as with other machine-learning approaches, the posterior-density estimators in the NPE experiments absorb the prior defined by the training set; this is why we force the same prior in the NLE experiments, to obtain a direct comparison between the two approaches. That said, the NLE posteriors (for all compression methods) show a bias towards slightly lower posterior widths, which can be seen by comparing the points to the Fisher variance (in Fig. 2 and Appendix C). For the NLE estimators, the likelihood function is more biased when the posterior width is low, because, similarly to the NPE estimator, the data likelihood is also informed by the prior from which the simulations used to fit the normalising flows are drawn. It should also be noted that some of the posteriors for NPE and NLE are truncated near the prior edges. This occurs more often for lower ns, since the estimated covariance causes additional scatter of the MAP – around which the contours are drawn – towards the prior edges.
Compared to analytic methods for deriving covariance estimators or posteriors that account for a noisy covariance, SBI returns posteriors with correct coverage but larger errors. The posterior errors tend towards the Fisher errors at around 4 × 10^4 simulations – for compression and likelihood fitting – meaning that the SBI methods can correctly recover the true posterior given a noisy data vector, though this will depend on the size of the data vector. Comparing our results to the precision-matrix expansion (PME) estimator of Friedrich & Eifler (2017), SBI density estimation requires many more simulations to obtain a similar error width and coverage: for a DES-like data vector, Friedrich & Eifler (2017) found that 400 N-body simulations are sufficient to achieve negligible additional statistical uncertainty on parameter constraints from a noisy covariance estimate. The required number of simulations will be far greater for LSST and other next-generation surveys – and will increase further in the presence of nuisance parameters.
Whilst interpreting our results, we noted that the DS13 correction term inflates the posterior covariance for a Gaussian likelihood and a linear model for the expectation, whereas in density-estimation SBI approaches the contour is drawn by a generative model. This estimator of the likelihood shape may suffer from an additional DS13-like term, such that the contours drawn in the SBI analyses are inflated on average with respect to the Dodelson–Schneider factor for half the number of simulations (2ns simulations are required for compression and fitting the flow when Σ is unknown). In addition, the scatter of the posterior mean with respect to the true parameters may be biased above or below the DS13 scatter of the MAP around the truth: the unknown form of the flow likelihoods may contribute an additional scatter of the posterior mean, ⟨π⟩π|x̂i, around the true parameters, π. In Appendix E, we measure the scatter of the SBI posterior mean around the true parameters, fixing the true parameters at the centre of our prior to minimise prior effects that would affect the SBI posterior widths and coverages. There appears to be some minimal scatter for ns < 2 × 10^3 simulations, a total that includes the simulations used to estimate S as well as those used to train the posterior or likelihood estimator.
6 Conclusions
The main result of our paper is illustrated by the combination of the measured posterior coverages of the SBI methods (shown in Fig. 3) and the measured posterior widths (shown in Fig. 2) for density-estimation SBI methods. Appendix C shows further results from the other experiments listed in Table 1.
These plots show that the SBI methods do not obtain the posterior widths expected from the analytical prescriptions of Dodelson & Schneider (2013) and Percival et al. (2021) when double the number of simulations is used in the experiment. Whilst the methods easily obtain the correct posterior errors, σ2[π], and coverages at the 1σ and 2σ levels when the covariance is known, the posterior errors are significantly higher than the expected Dodelson–Schneider-corrected errors (Eq. (5)) for a Gaussian posterior when the data covariance is estimated from an additional set of simulations (Percival et al. 2014). This effect is worst for low numbers of simulations – fewer than 2 × 10^3 in total – and it will only worsen with the deluge of new, higher-dimensional survey data. This implies that the application of current SBI methods to complex, non-Gaussian, and non-linear statistics (or their combinations) with additional nuisance parameters may be premature for realistic analyses. Furthermore, were SBI to be used in place of an analytic likelihood, we would not expect the parameter constraints to be stronger for low numbers of simulations, based on our results.
We note that the number of simulations required to obtain the correct posterior errors and coverage is of the same order as the number provided by simulation suites dedicated to machine-learning analyses, such as Quijote (Villaescusa-Navarro et al. 2020), CAMELS (Villaescusa-Navarro et al. 2021), and CosmoGrid (Kacprzak et al. 2022). The exact requirement depends on the survey volume in which the statistic is measured, and it will need to increase to model the covariance, at realistic volumes, of statistics measured in these simulations, in order to keep errors within the limits required for next-generation surveys.
To summarise, we find that
posterior estimators from density-estimation SBI methods do not obtain smaller errors than the analytical solution for a Gaussian-data-likelihood ansatz with a simulation-estimated covariance when
the data are drawn from a Gaussian linear model,
the compression is parametrised by a simulation-estimated data covariance,
an accurate model for the expectation value is used,
and a number of simulations suitable for the analysis of current survey data is used (a number that inevitably increases for future survey data).
Regardless, these methods
allow us to obtain the correct expected posterior coverage,
correctly estimate the likelihood and posterior shape in the limit of a large number of simulations, and
do not show significant scatter of the posterior mean (aside from the DS13 effect). However, there is an inflation of the contours (in addition to the DS13 effect) that is not due to density estimation with SBI, but due to SBI's need for data compression, which in itself will be noisy; the experiments with a known covariance obtain the true posterior widths for all ns.
In particular,
SBI ‘knows’ to expand its posterior contours to account for the uncertainty in the estimated data covariance, diluting the parameter constraints derived with such methods whilst calibrating their coverages, and
the errors from SBI methods are significantly larger than the DS13 inflation of errors (in a standard Gaussian-likelihood analysis) for on the order of a few thousand simulations when estimating Σ and simulating a training set.
It should be noted that the results of the analyses depend on our being able to optimise the density estimators with another set of 10⁴ independent simulations to obtain the hyperparameters of the architecture and training procedure. This set must be independent of the training and validation sets, because those sets are used to fit the models and stop their training to avoid overfitting. The linear compression in our experiments would be less optimal for non-Gaussian statistics, further increasing the simulation budget required to reduce the posterior errors. Also, the posterior widths estimated with SBI when a neural network is used for the data compression can be smaller than the DS13 posterior widths, but only for a small number of simulations and in the regime where the errors are already much larger than the Fisher errors.
Despite SBI's promise to seamlessly model combinations of complementary summary statistics, it is not clear, based on current SBI methods and density-estimation techniques, that more information on the physical parameters of cosmological models can be reliably extracted with the computational resources available to generate the simulations required to fit these models. This is a significant problem in particular because of the cross-correlations that must be modelled between different probes of the large-scale structure. Typically, large cross-correlations exist between probes in different tomographic bins, for example with lensing peaks (Davies et al. 2022), and between smoothing scales in the matter PDF (Uhlemann et al. 2020).
For data from next-generation surveys and analyses that use combinations of probes, it is critical to model their correlated signals and systematics to extract as much information as possible from statistics measured at all scales (Krause & Eifler 2017; 2pt Collaboration 2024; Reeves et al. 2024). SBI methods are a promising set of tools that do obtain correctly calibrated posteriors, given enough simulations, and scale favourably in computational cost compared to traditional MCMC methods. Despite this, the posteriors obtained by density-estimation SBI methods – where no form for the likelihood is assumed – are significantly wider than the naive expectation, given a noisy compression, when a low number of simulations is used to estimate the data covariance. This inflation of the posterior errors is significant given the typical simulation budgets available for analyses at present. The problem is not alleviated by the optimal linear compression: it remains (and is merely shifted to another part of the analysis) if the data covariance is not known. Furthermore, depending on the covariance structure, the noise in the compression may be insurmountable for neural-network-based compressions.
Acknowledgements
We would like to thank Cora Uhlemann, Kai Lehman, Sven Krippendorf, and Sankarshana Srinivasan for useful discussions. This work was supported by funding from the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy – EXC-2094 – 390783311. O.F. was supported by a Fraunhofer-Schwarzschild Fellowship at the Universitäts-Sternwarte München.
Appendix A Normalising flows: Continuous and masked autoregressive flows
Here we describe the two methods we chose to implement normalising flows that fit either the likelihood or the posterior from simulations and parameters.
A.1 Masked autoregressive flows
To calculate the log-likelihood of the data given the parameters (Eq. (18)), MAFs (Papamakarios et al. 2018) stack a sequence of bijective transformations built from masked autoencoders for distribution estimation (MADEs; Germain et al. 2015). A MADE ‘encodes’ an input to a Gaussian-distributed variable. The encoding uses a componentwise affine transformation, so that the likelihood of the data factorises into a product of Gaussians, one for each component of the input.
The affine transformation parameters (a mean and a variance) for each component of the datavector are calculated autoregressively, modelling the PDF of each factorised component as a Gaussian conditioned on the components preceding it. The autoregressive factorisation of the likelihood is expressed as
$ p(\boldsymbol{x}\,|\,\boldsymbol{\pi}) = \prod_{d=1}^{D} p(x_d\,|\,\boldsymbol{x}_{<d}, \boldsymbol{\pi}), $ (A.1)
where each conditional distribution for dimension xd depends only on the preceding components x<d, and not on any other dimensions d′ with d < d′ ≤ D. If this were not the case, the conditionals would not satisfy the product rule of probability, and a log-likelihood estimate would not be possible.
The MADE network fϕ(x; π) ensures that the conditionals are correctly satisfied by masking connections through the hidden layers. This ensures that the output nodes that parametrise the bijection depend only on the input nodes dictated by the autoregressive factorisation. Since the autoregressive property above is implemented via a product of individual Gaussians, the Jacobian is given by the derivative of each of the output nodes with respect to the input data,
$ J_{ij} = \frac{\partial y_i}{\partial x_j} = 0 \quad \mathrm{for}\ j > i. $ (A.2)
The Jacobian matrix is therefore triangular, since the derivative is non-zero only for the i = j and j < i terms. This means that the determinant is simply the product of the diagonal entries, which for the MADE's affine transformation is $\prod_{d=1}^{D} \sigma_d^{-1}$. The MADEs used to build the MAF each have a random ordering in their autoregressive factorisation of the data likelihood, and the determinant of the Jacobian for this sequence of MADEs in the MAF follows from the chain rule,
$ \det J_{\mathrm{MAF}} = \prod_{k=1}^{K} \det J_k, $
where K is the number of individual MADE networks.
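To make the construction concrete, the following is a minimal JAX sketch of a single affine autoregressive transform and its log-likelihood (our own illustrative stand-in, with a masked linear conditioner in place of the full MADE network used in the experiments):

import jax
import jax.numpy as jnp

def affine_autoregressive_logprob(x, params):
    # Log-likelihood of one affine autoregressive (MADE-style) transform.
    # The mean and log-scale of component x_d depend only on x_{<d},
    # enforced with a strictly lower-triangular mask on a linear
    # conditioner (a stand-in for the masked network f_phi).
    W_mu, W_alpha, b_mu, b_alpha = params
    D = x.shape[0]
    mask = jnp.tril(jnp.ones((D, D)), k=-1)   # strictly lower triangular
    mu = (mask * W_mu) @ x + b_mu             # mu_d depends on x_{<d} only
    alpha = (mask * W_alpha) @ x + b_alpha    # log sigma_d, same masking
    y = (x - mu) * jnp.exp(-alpha)            # encode to the Gaussian base
    # log N(y; 0, I) plus the log-determinant, -sum_d alpha_d (cf. Eq. A.2)
    log_base = -0.5 * jnp.sum(y**2) - 0.5 * D * jnp.log(2.0 * jnp.pi)
    return log_base - jnp.sum(alpha)

key = jax.random.PRNGKey(0)
D = 4
params = (0.1 * jax.random.normal(key, (D, D)),
          0.1 * jax.random.normal(key, (D, D)),
          jnp.zeros(D), jnp.zeros(D))
x = jax.random.normal(key, (D,))
print(affine_autoregressive_logprob(x, params))

Stacking K such transforms with permuted orderings, and summing their log-determinants, recovers the MAF construction described above.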
A.2 Continuous normalising flows
Parametrising a flow without complex modelling of the Jacobian (Dinh et al. 2017; Durkan et al. 2019; Kingma & Dhariwal 2018) can be done using neural ordinary differential equations (Chen et al. 2019; Kidger 2021), which model the output of an infinitely deep neural network as the solution to an ordinary differential equation (ODE). As a normalising flow, this ODE maps a sample x from the unknown data distribution to a sample y from a multivariate Gaussian distribution. The path from x to y is parametrised by a ‘time’ variable t.
The neural ODE and its input are written as
$ \frac{\mathrm{d}\boldsymbol{y}(t)}{\mathrm{d}t} = f_\phi(\boldsymbol{y}(t), t; \boldsymbol{\pi}), \qquad \boldsymbol{y}(0) = \boldsymbol{x}. $ (A.3)
We may solve an initial-value problem with this first-order ODE to obtain the state at a later time T, denoted y(T). With the initial value y(0) = x, this is written as
$ \boldsymbol{y}(T) = \boldsymbol{x} + \int_{0}^{T} f_\phi(\boldsymbol{y}(t), t; \boldsymbol{\pi})\, \mathrm{d}t = \mathrm{ODESolve}(f_\phi, \boldsymbol{x}, 0, T), $ (A.4)
and so a numerical solver can estimate the forward pass of the infinitely deep network, which maps a data point x to a latent y given model parameters π. Here we denote the differential-equation solver algorithm as ‘ODESolve’, which in practice is a standard numerical solver.
The change in log-density from the base distribution to the unknown log-likelihood is calculated by solving another differential equation, known as the instantaneous change of variables (Chen et al. 2019; Grathwohl et al. 2018):
$ \frac{\partial \log p(\boldsymbol{y}(t))}{\partial t} = -\mathrm{Tr}\left[ \frac{\partial f_\phi(\boldsymbol{y}(t), t; \boldsymbol{\pi})}{\partial \boldsymbol{y}(t)} \right], $ (A.5)
which can be seen as an instance of the Fokker–Planck equation for the time evolution of a random process where the diffusion coefficient is zero (known as the Liouville equation; Chen et al. 2019; Kidger 2021). This gives the change in log-density when integrated as
$ \log p(\boldsymbol{y}(T)) - \log p(\boldsymbol{y}(0)) = -\int_{0}^{T} \mathrm{Tr}\left[ \frac{\partial f_\phi(\boldsymbol{y}(t), t; \boldsymbol{\pi})}{\partial \boldsymbol{y}(t)} \right] \mathrm{d}t. $ (A.6)
To calculate the likelihood of a sample x given physics parameters π, an initial value problem is solved. The solution is
$ \begin{bmatrix} \boldsymbol{y}(T) \\ \log p(\boldsymbol{x}\,|\,\boldsymbol{\pi}) - \log p(\boldsymbol{y}(T)) \end{bmatrix} = \mathrm{ODESolve}\left( \begin{bmatrix} f_\phi(\boldsymbol{y}(t), t; \boldsymbol{\pi}) \\ \mathrm{Tr}\left[ \partial f_\phi / \partial \boldsymbol{y}(t) \right] \end{bmatrix}, 0, T \right), $ (A.7)
given the initial values
$ \begin{bmatrix} \boldsymbol{y}(0) \\ \log p(\boldsymbol{x}\,|\,\boldsymbol{\pi}) - \log p(\boldsymbol{y}(0)) \end{bmatrix} = \begin{bmatrix} \boldsymbol{x} \\ 0 \end{bmatrix}. $ (A.8)
This qualitatively amounts to mapping a data point to a Gaussian sample whilst accumulating the log-Jacobian along the path. The model is trained via the same maximum-likelihood training described in Sect. 3, by obtaining the ODE solutions that maximise the log-probability of the data given the model parameters following Eqs. (A.8) and (A.7). The solutions can be differentiated due to the implementation of the solver in jax (Bradbury et al. 2018; Kidger 2021).
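The following is a minimal sketch of Eqs. (A.7) and (A.8) with a fixed-step Euler integrator standing in for ODESolve (the experiments use a standard adaptive solver; the toy vector field f below is our own placeholder for the trained network fϕ):

import jax
import jax.numpy as jnp

def cnf_log_likelihood(x, pi, f, n_steps=100, T=1.0):
    # Jointly integrate the neural ODE and the trace term (Eqs. A.7-A.8)
    # with fixed-step Euler; f(y, t, pi) is the vector field f_phi.
    dt = T / n_steps
    y, delta = x, 0.0
    for i in range(n_steps):
        t = i * dt
        # Trace of the Jacobian of f wrt y (instantaneous change of variables)
        trace = jnp.trace(jax.jacfwd(lambda y_: f(y_, t, pi))(y))
        y = y + dt * f(y, t, pi)
        delta = delta + dt * trace   # accumulates log p(x|pi) - log p(y(T))
    D = x.shape[0]
    log_base = -0.5 * jnp.sum(y**2) - 0.5 * D * jnp.log(2.0 * jnp.pi)
    return log_base + delta

def f(y, t, pi):
    # Toy parameter-conditioned vector field standing in for f_phi.
    return -y * (1.0 + 0.1 * jnp.sum(pi))

x = jnp.array([0.3, -1.2])
pi = jnp.array([0.5, 0.8])
print(cnf_log_likelihood(x, pi, f))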
Appendix B Obtaining the best architecture
We chose the best architecture by sampling a wide variety of architectures and training hyperparameters (batch size, early-stopping patience, and learning rate) for individual fits to the likelihood or posterior using MAFs or CNFs. We selected the best models by repeating the same experiment with a training set of 10⁴ simulations and an independent test set of 10⁴ simulations to estimate the log-likelihood of the flow.
The hyperparameters of a likelihood or posterior fit with a flow in a given experiment consist of parameters for the architecture of the flow model and for the optimisation procedure. The parameters for the CNF architecture are
the number of hidden units in the network layers H ∈ [8, 16, 32, 64],
and the ODE solver time-step width dt ∈ [0.1, 0.05, 0.01];
and for the MAF architecture they are
the number of layers in the flow network L ∈ [1, 2, 3],
the number of hidden units in the network layers H ∈ [8, 16, 32, 64],
and the number of transforms (each of which use a network) K ∈ [1, 2, 3, 4, 5].
The parameters for the training optimisation are
the number of simulations and parameters in each batch B ∈ [40, 50, 60, 70, 80, 90, 100],
the learning rate of the optimiser η ∈ [10⁻⁵, ..., 10⁻³] (logarithmic spacing),
and the ‘patience’, i.e. the number of epochs without a decreasing loss required before stopping the optimisation early, p ∈ [10, 20, 30, 40, 50]; a random search over these grids is sketched below.
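Such a search can be sketched as follows (illustrative only: sample_hyperparameters and train_and_score are our own placeholder names, the latter standing in for fitting a flow and returning its test-set log-likelihood):

import random

MAF_GRID = {
    "L": [1, 2, 3],                       # flow layers
    "H": [8, 16, 32, 64],                 # hidden units
    "K": [1, 2, 3, 4, 5],                 # transforms
    "B": [40, 50, 60, 70, 80, 90, 100],   # batch size
    "patience": [10, 20, 30, 40, 50],     # early-stopping patience
}

def sample_hyperparameters(rng):
    config = {k: rng.choice(v) for k, v in MAF_GRID.items()}
    config["eta"] = 10.0 ** rng.uniform(-5.0, -3.0)  # log-uniform learning rate
    return config

def random_search(train_and_score, n_trials=100, seed=0):
    rng = random.Random(seed)
    trials = [sample_hyperparameters(rng) for _ in range(n_trials)]
    # Keep the configuration maximising the held-out log-likelihood.
    return max(trials, key=train_and_score)

best = random_search(lambda cfg: -cfg["eta"])  # dummy objective for the demo
print(best)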
We tested two optimisation functions over the repeated experiments. The first was the average flow log-likelihood on a separate test set of 10⁴ simulation and parameter pairs; the architecture and training parameters that maximised this average log-likelihood were chosen, which is the standard method in density-estimation SBI analyses in cosmology. The second function was a direct calculation of the KL divergence between the true analytic Gaussian posterior and the reconstructed posteriors over a set of J = 100 independently sampled datavectors with I = 2000 posterior samples each. The estimated KL divergence is expressed as
$ \hat{D}_{\mathrm{KL}} = \frac{1}{J} \sum_{j=1}^{J} \frac{1}{I} \sum_{i=1}^{I} \left[ \log q_\phi(\boldsymbol{\pi}_{ij}\,|\,\hat{\boldsymbol{x}}_j) - \log p(\boldsymbol{\pi}_{ij}\,|\,\hat{\boldsymbol{x}}_j) \right], \qquad \boldsymbol{\pi}_{ij} \sim q_\phi(\boldsymbol{\pi}\,|\,\hat{\boldsymbol{x}}_j). $ (B.1)
Unfortunately, this loss calculation is only feasible for the NPE experiments, as posterior samples for NLE require running an MCMC sampler on the flow log-likelihood and parameter prior. In practice, we found that for our experiments either optimisation objective resulted in similar architectures. Optimisation with cross-validation of the validation loss was also tested, where independent training and validation sets are used to obtain the loss; the validation losses from each fold are averaged and used as the optimisation function for the hyperparameters. This made no significant difference to the results.
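A Monte Carlo sketch of this estimate is given below (flow_sample and flow_log_prob are our own stand-ins for the trained NPE model, and the sampling convention follows Eq. (B.1) as reconstructed above):

import jax
import jax.numpy as jnp
from jax.scipy.stats import multivariate_normal

def estimated_kl(flow_sample, flow_log_prob, true_means, true_cov, key, I=2000):
    # Monte Carlo KL between the flow posteriors and the analytic Gaussian
    # posteriors, averaged over J datavectors, with samples from the flow.
    kls = []
    for j, mean in enumerate(true_means):      # one entry per datavector x_j
        key, sub = jax.random.split(key)
        samples = flow_sample(sub, j, I)       # (I, n_theta) from q(.|x_j)
        log_q = jax.vmap(lambda s: flow_log_prob(s, j))(samples)
        log_p = jax.vmap(lambda s: multivariate_normal.logpdf(s, mean, true_cov))(samples)
        kls.append(jnp.mean(log_q - log_p))    # estimate of D_KL(q || p)
    return jnp.mean(jnp.array(kls))

# Demo with a Gaussian stand-in for the flow (KL ~ 0 by construction).
n_theta, J = 2, 100
true_cov = jnp.eye(n_theta)
true_means = jnp.zeros((J, n_theta))
flow_sample = lambda key, j, I: jax.random.multivariate_normal(
    key, true_means[j], true_cov, (I,))
flow_log_prob = lambda s, j: multivariate_normal.logpdf(s, true_means[j], true_cov)
print(estimated_kl(flow_sample, flow_log_prob, true_means, true_cov,
                   jax.random.PRNGKey(0)))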
Appendix C Results of additional experiments
Here we show the results of the remaining experiments (noted in Table 1) that are described in Sect. 4 but not plotted in the main text. In Fig. C.1, we show the marginal variances of the SBI posteriors against the number of simulations ns for the experiments of NLE with a CNF, NPE with a CNF, and NPE with a MAF. In Fig. C.2 we show the coverages of the posteriors from the same experiments. Figures C.3 and C.4 show sets of posteriors derived with varying numbers of simulations ns for the NPE and NLE methods, respectively, using either flow model type. The contours from NPE and NLE are consistent for each value of ns, as well as with each posterior corrected with the DS13 factor for that value of ns. The MAPs for these experiments, in each panel, scatter considerably for ns = 1000 and ns = 4000 when an estimated covariance is used.
Fig. C.1 Similar to Fig. 3, showing the model parameter posterior variance, conditioned on noisy datavectors, estimated with SBI (the method and normalising flow model used in each experiment is listed in each panel). The variances between NLE and NPE for each compression are similar, except for the small underestimation of the variance in the NLE experiments due to the likelihood being additionally informed by the prior. The additional simulations, not labelled on the x-axis but required for the separate compressions (where the true covariance is unknown), are noted for each method. When the true data covariance is not known, requiring the use of double the number of simulations, the reconstructed posterior errors from SBI are significantly higher than the Dodelson & Schneider (2013) corrected errors for fewer than 4 × 10³ simulations.
Fig. C.2 Coverage Fω, i.e. how often the true cosmology in the experiment is found inside the 68% (1σ) and 95% (2σ) credible regions of the estimated posterior (see Eq. (23)), against the number of simulations ns used for the training set. Shown here are the results for neural posterior estimation (separately with a MAF and a CNF model) and neural likelihood estimation (with a CNF model) for independently sampled data vectors and data covariance matrices in a series of repeated experiments, with the number of simulations for each experiment on the horizontal axis. The expected coverage of a Gaussian posterior with a debiased estimate of the precision matrix (using the Hartlap correction, Eq. (3)) and a posterior covariance corrected with the Dodelson & Schneider (2013) factor is plotted for both coverage intervals with dashed lines. The grey-toned lines show the expected coverage of the common approach using a Gaussian posterior with a precision matrix corrected by applying the Hartlap factor. The additional simulations, not labelled on the x-axis but required for the separate compressions (where the true covariance is unknown), are noted for each method. The SBI posteriors obtain the correct coverage to within errors for all numbers of simulations ns and each compression method.
Fig. C.3 SBI posteriors (red for true covariance
Fig. C.4 SBI posteriors (red for
Appendix D Fitting the expectation versus the covariance
The results of Dodelson & Schneider (2013) and Percival et al. (2021) depend on the assumption of a linear model ξ[π] with Gaussian errors, in particular that the expectation value is known exactly. For the SBI methods in this work that estimate the likelihood or posterior, the expectation, covariance, and likelihood shape are all fit from simulations. To isolate the contribution to the posterior width from not knowing the expectation, we fit a model to a noisy datavector in a Gaussian likelihood analysis where neither the parameters nor the expectation are known. We estimate the mean from a set of simulations and use it to parametrise a Gaussian likelihood with the true covariance matrix (and the same prior used throughout this work). We sample the posterior with an MCMC sampler. The results are shown in Fig. D.1. They show that the error contribution is noticeable but far less than that from the noise in the covariance estimate for the same number of simulations. Since the sample mean and covariance are independent (Anderson 2003), this test is sufficient to show that the error contribution to the SBI posterior due to an unknown expectation does not contribute significantly to the posterior width. This is true for all values of ns we consider in this work. The increase in width compared to when the covariance is known is due to the unknown data covariance.
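The size of this contribution can be understood with a toy version of the test (our own sketch, not the MCMC analysis used for Fig. D.1): for a linear model with an expectation estimated as the sample mean of ns simulations, the residuals acquire covariance Σ(1 + 1/ns), so the parameter variances inflate only by the factor (1 + 1/ns), which is far smaller than the DS13 inflation for large datavectors.

import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(0)
n_d, n_theta, n_s, n_trials = 10, 2, 100, 2000
A = jax.random.normal(key, (n_d, n_theta))     # linear model xi[pi] = A @ pi
Sigma = jnp.eye(n_d)                           # true covariance (known here)
prec = jnp.linalg.inv(Sigma)
F = A.T @ prec @ A                             # Fisher matrix
pi_true = jnp.array([0.5, -0.3])
xi_true = A @ pi_true

def mle(key):
    # One datavector and one simulation-estimated expectation per trial.
    k1, k2 = jax.random.split(key)
    x = xi_true + jax.random.multivariate_normal(k1, jnp.zeros(n_d), Sigma)
    xi_hat = xi_true + jax.random.multivariate_normal(
        k2, jnp.zeros(n_d), Sigma / n_s)       # sample-mean noise ~ Sigma/n_s
    # Gaussian maximum likelihood with the true covariance, noisy expectation.
    return pi_true + jnp.linalg.solve(F, A.T @ prec @ (x - xi_hat))

keys = jax.random.split(jax.random.PRNGKey(1), n_trials)
pis = jax.vmap(mle)(keys)
print(jnp.var(pis, axis=0))                              # measured scatter
print(jnp.diag(jnp.linalg.inv(F)) * (1.0 + 1.0 / n_s))   # ~ expected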
Fig. D.1 Posteriors derived from traditional likelihood analyses using a Gaussian likelihood with the true data covariance matrix, where the expectation ξ[π] is estimated from ns simulations. The posterior samples are obtained by MCMC sampling of the analytic posterior, with the same prior used in the experiments of this work, where the model for the expectation is fit to data alongside the parameters and the data covariance is known. This shows that the error contribution to the posterior from an unknown model is much less than that due to the data covariance being estimated from the same number of simulations.
Appendix E Scatter of posterior means against true parameters
In Fig. E.1 we show, for all the experiments described in the main text, the scatter of the SBI posterior means around the true parameters. This is important to measure because, whilst the target posterior is a multivariate Gaussian, the SBI posterior is by no means a multivariate Gaussian itself; the posterior mean and the MAP therefore do not necessarily coincide at the same point in parameter space. We measure the squared difference of the posterior mean from the true parameters as a function of the number of simulations ns input into the experiment, over all the experiments run at each ns value. The scatter (⟨π⟩ − π)², if greater than the posterior variance dictated by DS13, would suggest that density-estimation methods suffer from an additional effect that is not explained simply by the DS13 scattering.
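The measurement itself is simple; a minimal sketch (array names are ours) of the per-parameter scatter computed at each ns:

import jax.numpy as jnp

def mean_scatter(posterior_means, pi_true):
    # posterior_means: (n_experiments, n_theta) array of <pi> estimates
    # obtained at a fixed n_s; returns the squared scatter per parameter.
    return jnp.mean((posterior_means - pi_true) ** 2, axis=0)

means = jnp.array([[0.51, -0.29], [0.48, -0.31], [0.52, -0.33]])
print(mean_scatter(means, jnp.array([0.5, -0.3])))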
Fig. E.1 Scatter in the posterior mean of SBI with respect to the true parameters. The true parameters are fixed to the fiducial parameters π0 at approximately the centre of our prior to reduce truncation of the posterior at the prior boundaries, which is more significant at lower ns (see Eq. (7)). The bias in the Sdiag. estimated covariance is due to ignoring the cross-correlations in the datavector; this applies similarly to the neural-network (labelled ‘NN’) variances. The similarity of the points in these plots to the plots of the marginal variances of the SBI posteriors against the number of simulations shows that the posterior mean does not scatter significantly more than the MAP.
Appendix F Experiments with an undesirable covariance matrix
As discussed in Sect. 4, the structure of the data covariance Σ has a significant effect on the reconstructed posterior widths. The effect can be seen in the posterior widths (shown in Fig. 3 and Appendix C) for SBI posterior estimators (either NLE or NPE) fit to summaries from a linear compression using the diagonal covariance estimate Sdiag. or a neural network. This is a distinct effect from the DS13 inflation of posterior contours, which depends on the number of simulations ns at fixed nξ and is caused by the noise in an estimated data covariance. If the covariance structure is such that there are large off-diagonal elements, the benefit of a compression with either of these methods vanishes, because both methods ignore large cross-correlation elements in the true covariance.
To illustrate the effects of such a covariance on the compression methods (and therefore on the posteriors derived with SBI), we create a covariance matrix Σr with large cross-correlations between neighbouring elements in the datavector. The matrix has elements
$ (\Sigma_r)_{ij} = r^{|i-j|} \sqrt{\Sigma_{ii}\,\Sigma_{jj}}, $ (F.1)
where r ∈ [0, 1) is a coefficient that controls the correlation of the datavector components (when this covariance is used to generate data as in our experiments, i.e. as the ‘true’ covariance). As can be seen in Fig. F.1, the nature of this covariance means that the benefit of using a neural network (or simply estimating the diagonal elements Sdiag. of the covariance for a linear compression) vanishes, with the posterior widths being far larger than the DS13-corrected variances for a sampled covariance matrix that estimates all of the elements. The bias is not present in the posterior widths from SBI when the full covariance is estimated for a linear compression, though the DS13 inflation of errors persists.
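One plausible construction of such a matrix (an assumption matching Eq. (F.1) as reconstructed above, with correlations decaying as r^|i−j|) is the following:

import jax.numpy as jnp

def correlated_covariance(sigma, r):
    # Covariance with correlation r^|i-j| between datavector components i, j.
    i = jnp.arange(sigma.shape[0])
    corr = r ** jnp.abs(i[:, None] - i[None, :])
    return corr * jnp.outer(sigma, sigma)

Sigma_r = correlated_covariance(jnp.ones(5), r=0.2)
print(Sigma_r[0])  # neighbours are correlated at 0.2, next-neighbours at 0.04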
Fig. F.1 Average model parameter posterior variance conditioned on noisy datavectors, estimated with neural posterior estimation using a masked autoregressive flow. The same marginal Fisher variances, DS13 factors, and compression methods are used for this plot as in Fig. 2. These results use Σr as the true covariance (see Sect. 5) for the data-generating process. When the true data covariance is not known and has significant non-diagonal elements (r = 0.2; see Appendix F), the compression using either a neural network fψ or an estimate of the diagonal elements Sdiag. of the covariance in a linear compression fails catastrophically.
References
- 2pt Collaboration, Krause, E., Kobayashi, Y., et al. 2024, A Parameter-Masked Mock Data Challenge for Beyond-Two-Point Galaxy Clustering Statistics
- Advani, M. S., & Saxe, A. M. 2017, arXiv e-prints [arXiv:1710.03667]
- Aghanim, N., Akrami, Y., Ashdown, M., et al. 2020, A&A, 641, A5
- Akeret, J., Refregier, A., Amara, A., et al. 2015, JCAP, 2015, 043
- Akiba, T., Sano, S., Yanase, T., et al. 2019, in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2623
- Alsing, J., & Wandelt, B. 2018, MNRAS, 476, L60
- Alsing, J., & Wandelt, B. 2019, MNRAS, 488, 5093
- Alsing, J., Wandelt, B., & Feeney, S. 2018, MNRAS, 477, 2874
- Amon, A., Gruen, D., Troxel, M. A., et al. 2022, Phys. Rev. D, 105, 023514
- Anderson, T. 2003, An Introduction To Multivariate Statistical Analysis (Wiley)
- Bradbury, J., Frostig, R., Hawkins, P., et al. 2018, JAX: composable transformations of Python+NumPy programs, http://github.com/google/jax
- Charnock, T., Lavaux, G., & Wandelt, B. D. 2018, Phys. Rev. D, 97, 083004
- Chen, R. T. Q., Rubanova, Y., Bettencourt, J., & Duvenaud, D. 2019, arXiv e-prints [arXiv:1806.07366]
- Cole, A., Miller, B. K., Witte, S. J., et al. 2022, JCAP, 2022, 004
- Contarini, S., Pisani, A., Hamaus, N., et al. 2023, ApJ, 953, 46
- Cranmer, K., Pavez, J., & Louppe, G. 2016, arXiv e-prints [arXiv:1506.02169]
- Cranmer, K., Brehmer, J., & Louppe, G. 2020, PNAS, 117, 30055
- Dark Energy Survey Collaboration (Abbott, T., et al.) 2016, MNRAS, 460, 1270
- Davies, C. T., Cautun, M., Giblin, B., et al. 2022, MNRAS, 513, 4729
- de Jong, R. S., Agertz, O., Berbel, A. A., et al. 2019, The Messenger, 175, 3
- Dinh, L., Sohl-Dickstein, J., & Bengio, S. 2017, arXiv e-prints [arXiv:1605.08803]
- Dodelson, S., & Schneider, M. D. 2013, Phys. Rev. D, 88, 063537
- Durkan, C., Bekasov, A., Murray, I., et al. 2019, arXiv e-prints [arXiv:1906.04032]
- Eifler, T., Miyatake, H., Krause, E., et al. 2021, MNRAS, 507, 1746
- Fang, X., Krause, E., Eifler, T., et al. 2023, MNRAS, 527, 9581
- Fluri, J., Kacprzak, T., Refregier, A., Lucchi, A., & Hofmann, T. 2021, Phys. Rev. D, 104, 123526
- Friedrich, O., & Eifler, T. 2017, MNRAS, 473, 4150
- Friedrich, O., Seitz, S., Eifler, T. F., & Gruen, D. 2015, MNRAS, 456, 2662
- Friedrich, O., Gruen, D., DeRose, J., et al. 2018, Phys. Rev. D, 98, 023508
- Friedrich, O., Andrade-Oliveira, F., Camacho, H., et al. 2021, MNRAS, 508, 3125
- Gatti, M., Jeffrey, N., Whiteway, L., et al. 2024, Phys. Rev. D, 109, 063534
- Gerardi, F., Feeney, S. M., & Alsing, J. 2021, Phys. Rev. D, 104, 083531
- Germain, M., Gregor, K., Murray, I., & Larochelle, H. 2015, arXiv e-prints [arXiv:1502.03509]
- Glöckler, M., Deistler, M., & Macke, J. H. 2022, arXiv e-prints [arXiv:2203.04176]
- Grathwohl, W., Chen, R. T. Q., Bettencourt, J., Sutskever, I., & Duvenaud, D. 2018, arXiv e-prints [arXiv:1810.01367]
- Greenberg, D. S., Nonnenmacher, M., & Macke, J. H. 2019, arXiv e-prints [arXiv:1905.07488]
- Gruen, D., Friedrich, O., Krause, E., et al. 2018, Phys. Rev. D, 98, 023507
- Hahn, C., Eickenberg, M., Ho, S., et al. 2023a, PNAS, 120, e2218810120
- Hahn, C., Eickenberg, M., Ho, S., et al. 2023b, JCAP, 2023, 010
- Halder, A., Friedrich, O., Seitz, S., et al. 2021, MNRAS, 506, 2780
- Hall, A., & Taylor, A. 2018, MNRAS, 483, 189
- Hamaus, N., Pisani, A., Choi, J.-A., Lavaux, G., et al. 2020, JCAP, 2020, 023
- Hartlap, J., Simon, P., & Schneider, P. 2006, A&A, 464, 399
- Heavens, A. F., Jimenez, R., & Lahav, O. 2000, MNRAS, 317, 965
- Heavens, A. F., Sellentin, E., & Jaffe, A. H. 2020, MNRAS, 498, 3440
- Ho, J., Jain, A., & Abbeel, P. 2020, arXiv e-prints [arXiv:2006.11239]
- Hou, J., Dizgah, A. M., Hahn, C., et al. 2024, Phys. Rev. D, 109, 103528
- Ivezić, Ž., Kahn, S. M., Tyson, J. A., et al. 2019, ApJ, 873, 111
- Jeffrey, N., Alsing, J., & Lanusse, F. 2020, MNRAS, 501, 954
- Jeffrey, N., Whiteway, L., Gatti, M., et al. 2025, MNRAS, 536, 1303
- Jimenez Rezende, D., & Mohamed, S. 2016, arXiv e-prints [arXiv:1505.05770]
- Joachimi, B., Lin, C.-A., Asgari, M., et al. 2021, A&A, 646, A129
- Kacprzak, T., & Fluri, J. 2022, Phys. Rev. X, 12, 031029
- Kacprzak, T., Kirk, D., Friedrich, O., et al. 2016, MNRAS, 463, 3653
- Kacprzak, T., Fluri, J., Schneider, A., et al. 2022, arXiv e-prints [arXiv:2209.04662]
- Kaufman, G. M. 1967, Report No. 6710, Center for Operations Research and Econometrics, Catholic University of Louvain, Heverlee, Belgium
- Kidger, P. 2021, PhD thesis, University of Oxford, UK
- Kilbinger, M. 2015, Rep. Progr. Phys., 78, 086901
- Kingma, D. P., & Ba, J. 2017, arXiv e-prints [arXiv:1412.6980]
- Kingma, D. P., & Dhariwal, P. 2018, arXiv e-prints [arXiv:1807.03039]
- Kingma, D. P., Salimans, T., Poole, B., & Ho, J. 2023, arXiv e-prints [arXiv:2107.00630]
- Krause, E., & Eifler, T. 2017, MNRAS, 470, 2100
- Laureijs, R., Amiaux, J., Arduini, S., et al. 2011, arXiv e-prints [arXiv:1110.3193]
- Leclercq, F. 2018, Phys. Rev. D, 98, 063511
- Leclercq, F., & Heavens, A. 2021, MNRAS, 506, L85
- Lemos, P., Parker, L., Hahn, C., et al. 2024, Phys. Rev. D, 109, 083536
- Levi, M. E., Allen, L. E., Raichoor, A., et al. 2019, arXiv e-prints [arXiv:1907.10688]
- Lin, C.-A., & Kilbinger, M. 2015, A&A, 583, A70
- Lin, K., von Wietersheim-Kramsta, M., Joachimi, B., et al. 2023, MNRAS, 524, 6167
- Linke, L., Heydenreich, S., Burger, P. A., & Schneider, P. 2023, A&A, 672, A185
- Lipman, Y., Chen, R. T. Q., Ben-Hamu, H., et al. 2023, arXiv e-prints [arXiv:2210.02747]
- Lueckmann, J.-M., Goncalves, P. J., Bassetto, G., et al. 2017, arXiv e-prints [arXiv:1711.01861]
- Makinen, T. L., Charnock, T., Alsing, J., et al. 2021, JCAP, 2021, 049
- Modi, C., Pandey, S., Ho, M., et al. 2025, MNRAS, 536, 254
- Mohammad, F. G., & Percival, W. J. 2022, MNRAS, 514, 1289
- Murphy, K. P. 2022, Probabilistic Machine Learning: An Introduction (MIT Press)
- Norberg, P., Baugh, C. M., Gaztañaga, E., & Croton, D. J. 2009, MNRAS, 396, 19
- Papamakarios, G. 2019, arXiv e-prints [arXiv:1910.13233]
- Papamakarios, G., Pavlakou, T., & Murray, I. 2018, arXiv e-prints [arXiv:1705.07057]
- Papamakarios, G., Sterratt, D. C., & Murray, I. 2019, arXiv e-prints [arXiv:1805.07226]
- Papamakarios, G., Nalisnick, E., Jimenez Rezende, D., Mohamed, S., & Lakshminarayanan, B. 2021, arXiv e-prints [arXiv:1912.02762]
- Percival, W. J., Friedrich, O., Sellentin, E., & Heavens, A. 2021, MNRAS, 510, 3207
- Percival, W. J., Ross, A. J., Sánchez, A. G., et al. 2014, MNRAS, 439, 2531
- Prelogović, D., & Mesinger, A. 2023, MNRAS, 524, 4239
- Prelogović, D., & Mesinger, A. 2024, A&A, 688, A199
- Reeves, A., Nicola, A., Refregier, A., et al. 2024, JCAP, 2024, 042
- Schneider, P. 2006, Weak Gravitational Lensing (Springer Berlin Heidelberg), 269
- Schneider, P., van Waerbeke, L., Kilbinger, M., & Mellier, Y. 2002, A&A, 396, 1
- Sellentin, E., & Heavens, A. F. 2015, MNRAS, 456, L132
- Song, Y., Sohl-Dickstein, J., Kingma, D. P., et al. 2021, arXiv e-prints [arXiv:2011.13456]
- Tabak, E. G., & Turner, C. V. 2013, Commun. Pure Appl. Math., 66, 145
- Tabak, E. G., & Vanden-Eijnden, E. 2010, Commun. Math. Sci., 8, 217
- Tam, S.-I., Umetsu, K., & Amara, A. 2022, ApJ, 925, 145
- Taylor, A., & Joachimi, B. 2014, MNRAS, 442, 2728
- Taylor, A., Joachimi, B., & Kitching, T. 2013, MNRAS, 432, 1928
- Tegmark, M., Taylor, A. N., & Heavens, A. F. 1997, ApJ, 480, 22
- Tucci, B., & Schmidt, F. 2024, JCAP, 2024, 063
- Uhlemann, C., Friedrich, O., Villaescusa-Navarro, F., et al. 2020, MNRAS, 495, 4006
- Valogiannis, G., & Dvorkin, C. 2022, Phys. Rev. D, 105, 103534
- Villaescusa-Navarro, F., Hahn, C., Massara, E., et al. 2020, ApJS, 250, 2
- Villaescusa-Navarro, F., Anglés-Alcázar, D., Genel, S., et al. 2021, ApJ, 915, 71
- Villanueva-Domingo, P., & Villaescusa-Navarro, F. 2022, ApJ, 937, 115
- Weyant, A., Schafer, C., & Wood-Vasey, W. M. 2013, ApJ, 764, 116
- White, M., & Padmanabhan, N. 2015, JCAP, 2015, 058
Our code is available at github.com/homerjed/sbiax.
Note that diffusion (Ho et al. 2020; Song et al. 2021) and flow-matching models (Lipman et al. 2023) calculate log-likelihoods by approximating continuous flows.