Diagnosing systematic effects using the inferred initial power spectrum

Tristan Hoellinger; Florent Leclercq

doi:10.1051/0004-6361/202453416

Home

All issues

Volume 699 (July 2025)

A&A, 699 (2025) A224

Full HTML

Open Access

Issue		A&A Volume 699, July 2025


Article Number		A224
Number of page(s)		23
Section		Cosmology (including clusters of galaxies)
DOI		https://doi.org/10.1051/0004-6361/202453416
Published online		09 July 2025

A&A 699, A224 (2025)

Diagnosing systematic effects using the inferred initial power spectrum

Tristan Hoellinger^⋆ and Florent Leclercq^⋆

Sorbonne Université, CNRS, UMR 7095, Institut d’Astrophysique de Paris, 98 bis bd Arago, 75014 Paris, France

^⋆ Corresponding authors: tristan.hoellinger@iap.fr, florent.leclercq@iap.fr

Received: 12 December 2024
Accepted: 26 April 2025

Abstract

Context. The next generation of galaxy surveys has the potential to substantially deepen our understanding of the Universe. This potential hinges on our ability to rigorously address systematic uncertainties. Until now, diagnosing systematic effects prior to inferring cosmological parameters has been out of reach in field-based implicit likelihood cosmological inference frameworks.

Aims. As a solution, we aim to diagnose a variety of systematic effects in galaxy surveys prior to inferring cosmological parameters, using the inferred initial matter power spectrum.

Methods. Our approach is built upon a two-step framework. First, we employed the simulator expansion for likelihood-free inference (SELFI) algorithm to infer the initial matter power spectrum, which we utilised to thoroughly investigate the impact of systematic effects. This investigation relies on a single set of N-body simulations. Second, we obtained a posterior on cosmological parameters via implicit likelihood inference, recycling the simulations from the first step for data compression. As a demonstration, we relied on a model of large-scale spectroscopic galaxy surveys that incorporates fully non-linear gravitational evolution with COmoving Lagrangian Acceleration (COLA) and simulates multiple systematic effects encountered in real surveys.

Results. We provide a practical guide on how the SELFI posterior can be used to assess the impact of misspecified galaxy bias parameters, selection functions, survey masks, inaccurate redshifts, and approximate gravity models on the inferred initial matter power spectrum. We show that a subtly misspecified model can lead to a bias exceeding 2σ in the (Ω_m, σ₈) plane, which we are able to detect and avoid prior to inferring cosmological parameters.

Conclusions. This framework has the potential to significantly enhance the robustness of physical information extraction from full forward models of large-scale galaxy surveys such as DESI, Euclid, and LSST.

Key words: methods: statistical / cosmological parameters / large-scale structure of Universe

© The Authors 2025

Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This article is published in open access under the Subscribe to Open model. Subscribe to A&A to support open access publication.

1. Introduction

Current and forthcoming large-scale galaxy surveys have the statistical capability to unveil the nature of dark energy, shed light on the mechanisms driving cosmic inflation, and constrain neutrino masses with unprecedented precision (Mishra-Sharma et al. 2018; Euclid Collaboration: Blanchard et al. 2020; Goldstein et al. 2023). The vast volumes of data collected by Stage-IV surveys (Albrecht et al. 2006) such as DESI (DESI Collaboration 2016), Euclid (Euclid Collaboration: Mellier et al. 2025), and LSST (LSST Dark Energy Science Collaboration 2012; Ivezić et al. 2019) will be dominated by systematic rather than statistical uncertainties (Salvati et al. 2020). Extracting the subtle signals relating to the cosmological structure and content of our Universe therefore requires careful treatment of astrophysical and observational nuisances, along with their associated systematic effects (e.g. Meiksin et al. 1999; Desjacques et al. 2018). Such effects are pervasive and constitute one of the foremost challenges in modern cosmology (e.g. Kim et al. 2004; Kitching et al. 2008; Davis et al. 2011; Kitching et al. 2016; Glanville et al. 2021; Ayçoberry et al. 2023). For instance, substantial effort is being directed towards accurately measuring the growth rate of structure from peculiar velocities and galaxy surveys, whilst properly accounting for biases and selection effects (e.g. Said et al. 2020; Carreres et al. 2023). Crucially, the risk of mistaking systematic biases for new physics must be rigorously addressed.

To match the ever-growing precision of observations from stage-IV astrophysical surveys, Bayesian cosmological inference pipelines rely on increasingly complex probabilistic simulators. In particular, field-based forward models of galaxy surveys have become a cornerstone of modern precision cosmology (e.g. Jasche & Wandelt 2013; Leclercq et al. 2015; Ramanah et al. 2019; Porqueres et al. 2022; Andrews et al. 2023; Kostić et al. 2023; Nguyen et al. 2024; Zeghal et al. 2025; Zhou et al. 2024). These models simulate the evolution of the entire density field from its initial conditions in the early Universe to the present epoch. Whilst these simulators incorporate a refined treatment of non-linear gravitational evolution, astrophysical nuisances, and instrumental responses, the resulting pipelines become increasingly dependent on the accuracy of the underlying models. Consequently, they exhibit little to no robustness to model misspecification: when models deviate from the actual data-generating process, posteriors tend to become biased or overly concentrated (Frazier et al. 2017). For instance, using explicit likelihood inference within an effective field theory model of large-scale structure, Nguyen et al. (2021) showed that misspecified models can significantly bias the inferred transfer function.

Current approaches to cosmological data analysis using field-based models can be classified into two broad categories, depending on whether the likelihood is explicitly or implicitly defined. Explicit likelihood methods (e.g. Wandelt et al. 2004; Jasche & Wandelt 2013; Wang et al. 2014, 2016; Alsing et al. 2016; Jasche & Lavaux 2019; Loureiro et al. 2023; Sellentin et al. 2023; Zhou et al. 2024) employ Markov chain Monte Carlo (MCMC, Metropolis et al. 1953) algorithms such as Hamiltonian Monte Carlo (HMC, Duane et al. 1987) to sample from the target distribution. To make the likelihood tractable, they rely on approximations of the astronomical observables’ models. In contrast, implicit likelihood approaches (see Weyant et al. 2013; Ishida et al. 2015; Lintusaari et al. 2017; Alsing et al. 2018; Leclercq 2018; Leclercq et al. 2019; Taylor et al. 2019; Alsing et al. 2019; Brehmer et al. 2018; Cranmer et al. 2020; Makinen et al. 2021; Nguyen et al. 2024; Tucci & Schmidt 2024; Ho et al. 2024, for reviews and applications to cosmology) can accommodate arbitrarily complex simulator-based models, in which the internal mechanisms are unknown or difficult to incorporate into likelihood-based analyses. We refer to these simulators as hidden-box models. They may include a physical treatment of structure formation through N-body simulations as well as the intricacies of observational processes within the survey.

Both explicit and implicit approaches are susceptible to model misspecification. To address this issue within explicit likelihood cosmological inference frameworks, several solutions have been proposed: developing flexible data models (Jasche & Lavaux 2017, 2019; Nguyen et al. 2021; Porqueres et al. 2022), marginalising over unknown systematic effects (Porqueres et al. 2019), heating up the likelihood (Jasche & Lavaux 2019), or informing the likelihood with explicit knowledge of where the model is likely to underperform (Doeser et al. 2024). Some statistical literature has recently been focused on addressing the issue of model misspecification in implicit likelihood inference (ILI, e.g. Thomas et al. 2020). Yet, until now, no method had been designed to systematically address model misspecification in field-based implicit likelihood cosmological inferences.

Among the ILI approaches to cosmological data analysis, Leclercq et al. (2019) introduced the simulator expansion for likelihood-free inference (SELFI) algorithm, which makes it possible to infer the initial matter power spectrum using arbitrarily complex hidden-box models of galaxy surveys. The initial matter power spectrum (after recombination) is a theoretically well-understood object, and we expect its inference to be sensitive to most of the systematic effects encountered in the survey. Therefore, in this paper, we propose to use the initial matter power spectrum to diagnose systematic effects in implicit likelihood cosmological inferences from hidden-box models of galaxysurveys.

The full pipeline that we construct is built upon an application to cosmological data analysis of the two-step procedure introduced by Leclercq (2022) for ILI of Bayesianhierarchical model (BHMs) where a latent function carries relevant information about the target parameters to be inferred. The latent function is further required to be confined within a narrow region of its parameter space. Cosmological inferences naturally fall within this class of BHMs: the initial matter power spectrum can be treated as a latent function within the physical model mapping the cosmological parameters to the observed galaxy count fields; and its shape is already strongly constrained by Cosmic Microwave Background experiments (Planck Collaboration VI 2020; Balkenhol et al. 2023). For the first step of the procedure, we utilise the (SELFI) algorithm to infer the initial matter power spectrum. The inference requires the mean data model at an expansion point along with its gradient, which can be estimated by finite differences as in Leclercq et al. (2019). Alternatively, manual or automatic differentiation can be used: see Hahn et al. (2024) for an automatically differentiable Boltzmann solver; Wang et al. (2014), Jasche & Lavaux (2019), Modi et al. (2021) for differentiable N-body simulators; PINETREE (Ding et al. 2024) for a differentiable halo model. The second step of the procedure addresses the primary objective of inferring the cosmological parameters given the observations. Any ILI technique such as approximate Bayesian computation (ABC) or more sophisticated techniques can be used¹. ILI is known to be arduous when the dimensionality of the data space is high (e.g. Cranmer et al. 2020), and therefore requires data compression. A benefit of the statistical framework introduced by Leclercq (2022) is that the simulations generated in the first step naturally provide a score compressor (Heavens et al. 2000; Alsing & Wandelt 2018) for the second step at no additional cost.

In this article, we illustrate the method with a forward model of spectroscopic galaxy surveys that includes fully non-linear gravitational evolution and simulates multiple systematic effects. We use a prior embedding substantial information about the initial matter power spectrum, building upon previous experiments to effectively reduce the dimensionality of the parameter space, and infer the initial matter power spectrum from synthetic observations using SELFI. To make things simple whilst keeping an ILI framework, we rely on an ABC procedure using a population Monte Carlo (ABC-PMC) sampler (Beaumont et al. 2009) to infer the posterior on cosmological parameters. Using the ABC posterior, we show that an inconspicuously misspecified model can lead to a bias greater than 2σ in the (Ω_m, σ₈) plane. This bias can be unambiguously detected and avoided before performing the ILI of cosmological parameters. We provide a practical guide to diagnose systematic effects in implicit likelihood cosmological inferences, demonstrating how the SELFI posterior enables a comprehensive investigation of misspecified linear galaxy bias parameters, selection functions, survey masks, and inaccurate redshifts. Importantly, this process relies on a single, tractable set of N-body simulations. Additionally, at the cost of using distinct sets of N-body simulations, we are able to investigate the effect of misspecified gravity models, which is not directly apparent from the error on the galaxy count fields.

This article is structured as follows. In Section 2, we introduce the BHM used in this work and describe the two-step framework in detail, with a novel treatment of cosmological parameters within the hidden-box model compared to the original version of the SELFI algorithm. In Section 3, we present the spectroscopic galaxy survey model used in this study, which includes a non-linear physical model of structure formation and accounts for multiple observational systematic effects. In Section 4, we address the issue of model misspecification by investigating the impact of systematic effects on the inferred initial matter power spectrum, and we discuss their impact on the posterior on cosmological parameters. Finally, we discuss our results and conclude in Section 5.

2. Method

2.1. Bayesian hierarchical models of galaxy surveys with the initial matter power spectrum as a latent vector

We consider a BHM comprising the following variables: the target parameters ω ∈ ℝ^N, limited to cosmological parameters in this paper, a latent vector θ ∈ ℝ^S, corresponding to the normalised values of the initial matter power spectrum at S wave numbers, the summary statistics Φ ∈ ℝ^P of the mock galaxy populations considered in this study, and the compressed data vector $\tilde{ω} \in R^{N}$ $\widetilde{\boldsymbol{\omega}} \in \mathbb{R}^N$ composed of N numbers–one per target parameter. The deterministic compression step $\tilde{C}$ $\widetilde{\mathcal{C}}$ linking Φ to $\tilde{ω}$ $\boldsymbol{\widetilde{\omega}}$ is discussed in Section 2.3.1. A graphical representation of the generic BHM is presented in Figure 1; additionally, Figure 2 provides a detailed representation of the BHM in the context of cosmological inference from probing the large-scale structure of the Universe. In applications aimed at jointly constraining cosmological and astrophysical nuisance parameters from upcoming galaxy surveys, we expect N ∼ 𝒪(5 − 20) target parameters, S ∼ 𝒪(10² − 10³); P can be as large as 𝒪(10⁴) for complex data models exploiting information beyond 2-point statistics, and corresponds to a first layer of compression from the D-dimensional full-field data, which may be as large as D ∼ 𝒪(10¹¹). Table 1 provides an overview of the different variables appearing in this section and their respective physical interpretation within the spectroscopic galaxy survey model introduced for this study.

Fig. 1.

Hierarchical representation of the Bayesian framework employed for diagnosing model misspecification and inferring the target parameters ω. The rounded green boxes denote probability distributions, whilst the purple squares represent deterministic functions. 𝒫(ω) is the prior on ω. 𝒯 is a deterministic function linking ω to θ. 𝒫(Φ|θ) denotes the probabilistic data model that maps the space of latent vectors θ to the survey space. The final layer, $\tilde{C}$ $\widetilde{\mathcal{C}}$ , is a deterministic compression step required for the ILI of the target parameters ω. In this study, ω is the vector of cosmological parameters, 𝒯 is derived from the Boltzmann solver CLASS, θ is the initial matter power spectrum (after recombination), and Φ corresponds to the power spectra of multiple galaxy number count fields.

Fig. 2.

Detailed representation of the BHM used in this study for the inference of cosmological parameters ω from galaxy surveys. The hidden-box model corresponds to the second dashed orange rectangle and defines the true (unknown) likelihood L(θ) when Φ = Φ_O. The variables are ω (the target cosmological parameters), θ (the initial matter power spectrum), ψ (the nuisance parameters), o (the cosmological parameters, formally treated as nuisance parameters within the hidden-box), d (the full data vector containing the full galaxy count fields), Φ (the summary statistics), and $\tilde{ω}$ $\widetilde{\boldsymbol{\omega}}$ (the compressed summaries). We provide further details about the roles and physical meanings of these variables in Table 1.

Table 1.

Main variables, their generic role in the statistical framework, and their physical interpretation in this study.

In this work, the variables ω and θ are linked by a deterministic Boltzmann solver, the cosmic linear anisotropy solving system (CLASS, Blas et al. 2011), which associated with the deterministic function 𝒯 in the BHM. Alternative choices include the Code for Anisotropies in the Microwave Background (CAMB, Lewis & Challinor 2011), DIfferentiable Simulations for COsmology – Done with Jax (DISCO-DJ, Hahn et al. 2024), an emulator of cosmological power spectra (Spurio Mancini et al. 2022; Mootoovaloo et al. 2022) or a fitting function such as the Eisenstein-Hu fitting function (Eisenstein & Hu 1998; Campagne et al. 2023). In either case, 𝒯 is theoretically well-understood and numerically inexpensive compared to the potentially misspecified part of the BHM, which is the probabilistic simulator 𝒫(Φ|θ) linking the initial matter power spectrum θ to the simulated summary statistic Φ.

2.1.1. The top-level Bayesian problem

From a broad perspective, the cosmological inference problem considered here consists in updating prior knowledge 𝒫(ω) of the cosmological parameters ω through the Bayes rule

$\begin{matrix} P (ω | Φ_{O}) = L_{ω} (ω) \frac{P (ω)}{P (Φ_{O})}, \end{matrix}$ $\begin{aligned} \mathcal{P} (\boldsymbol{\omega }|\boldsymbol{\Phi }_{\rm O}) = L_{\omega }(\boldsymbol{\omega }) \frac{\mathcal{P} (\boldsymbol{\omega })}{\mathcal{P} (\boldsymbol{\Phi }_{\rm O})}, \end{aligned}$ (1)

based on observations Φ_O from one or multiple cosmological probes, where the true, unknown likelihood with respect to the cosmological parameters is given by

$\begin{matrix} L_{ω} (ω) \equiv P (Φ_{O} | ω), \end{matrix}$ $\begin{aligned} L_{\omega }(\boldsymbol{\omega }) \equiv \mathcal{P} (\boldsymbol{\Phi }_{\rm O}|\boldsymbol{\omega }), \end{aligned}$ (2)

and the prior 𝒫(ω) stems from previous experiments and/or encodes physical constraints derived from theoretical considerations and heuristic rules. From this point onwards, to avoid assuming a parametric form for the likelihood L_ω(ω), we employ ILI techniques based on a probabilistic forward model of the observable Φ. These techniques rely on comparing the simulated data, Φ, with the observations Φ_O, which is difficult when the dimension P of the summarised data space is high. Consequently, the observed and simulated data must undergo an additional compression step, denoted by $\tilde{C}$ $\widetilde{\mathcal{C}}$ in Figure 1.

2.1.2. The latent vector inference

To obtain a posterior for the initial matter power spectrum θ conditional on the observations Φ_O, we consider the alternative inference problem defined by

$\begin{matrix} P (θ | Φ_{O}) \propto L (θ) P (θ), \end{matrix}$ $\begin{aligned} \mathcal{P} (\boldsymbol{\theta }|\boldsymbol{\Phi }_{\rm O}) \propto L(\boldsymbol{\theta }) \mathcal{P} (\boldsymbol{\theta }), \end{aligned}$ (3)

where

$\begin{matrix} L (θ) \equiv P (Φ_{O} | θ) \end{matrix}$ $\begin{aligned} L(\boldsymbol{\theta }) \equiv \mathcal{P} (\boldsymbol{\Phi }_{\rm O}|\boldsymbol{\theta }) \end{aligned}$ (4)

is the true intermediate likelihood. We expect the inference of θ to be sensitive to most of the systematic effects of interest, as the initial matter power spectrum contains a wealth of information about the cosmological parameters ω (Peebles 1980). Harnessing our theoretical understanding of θ, we use the posterior 𝒫(θ|Φ_O) as a diagnostic tool to examine how simulation- and observation-related systematic effects may affect the top-level Bayesian inference problem defined by Equation (2).

2.2. First step: Initial matter power spectrum inference

2.2.1. The power spectrum prior distribution

At wave number k, we define the latent vector as

$\begin{matrix} θ_{k} \equiv \frac{P (k)}{P_{0} (k)}, \end{matrix}$ $\begin{aligned} \theta _k \equiv \frac{P(k)}{P_0(k)}, \end{aligned}$ (5)

where P(k) is the initial matter power spectrum, and the normalisation function P₀(k) is the Bardeen-Bond-Kaiser-Szalay (BBKS, Bardeen et al. 1986) power spectrum for some fiducial cosmological parameters ω₀. The normalisation P₀ plays no role in the statistical framework presented here and is introduced solely for numerical stability, as initial power spectra span several orders of magnitude across the range of scales considered.

We assume that, given a prior 𝒫(ω), a tight, physically-motivated prior on the initial matter power spectrum θ can be obtained by sampling from 𝒫(ω) and propagating the samples through the Boltzmann solver associated with the deterministic function 𝒯. That is, $P (θ | ω) = δ_{θ}^{D} (T (ω))$ $\mathcal{P}(\boldsymbol{\theta}|\boldsymbol{\omega})=\delta^{\mathrm{D}}_{\boldsymbol{\theta}}(\mathcal{T}(\boldsymbol{\omega}))$ , where $δ_{θ}^{D}$ $\delta^{\mathrm{D}}_{\boldsymbol{\theta}}$ is the Dirac measure centred on θ. Marginalising over ω yields:

$\begin{matrix} P (θ) = \int δ_{θ}^{D} (T (ω)) P (ω) d ω . \end{matrix}$ $\begin{aligned} \mathcal{P} (\boldsymbol{\theta }) = \int \delta ^\mathrm{D}_{\boldsymbol{\theta }}(\mathcal{T} (\boldsymbol{\omega })) \mathcal{P} (\boldsymbol{\omega }) \, \mathrm{d}\boldsymbol{\omega }. \end{aligned}$ (6)

To obtain an explicit form for 𝒫(θ), we assume a Gaussian-distributed prior on the initial matter power spectrum, whose mean and covariance matrix can be estimated by drawing from Equation (6). We generate an ensemble of m simulated power spectra $θ_{ω}^{(i)} = T (ω^{(i)})$ $\boldsymbol{\theta}_{\boldsymbol{\omega}}^{(i)} = \mathcal{T}(\boldsymbol{\omega}^{(i)})$ , i ∈ ⟦1, m⟧, where ω⁽ⁱ⁾ ∼ 𝒫(ω). The resulting prior reads

$\begin{matrix} - 2 log P (θ) \equiv log | 2 π S | + | | θ - {\hat{θ}}_{ω} | |_{S^{- 1}}^{2}, \end{matrix}$ $\begin{aligned} -2\log \mathcal{P} (\boldsymbol{\theta }) \equiv \log \left| 2\pi \mathbf S \right| + \left||\boldsymbol{\theta }-\boldsymbol{\hat{\theta }}_{\boldsymbol{\omega }}\right||^2_\mathbf{S ^{-1}}, \end{aligned}$ (7)

where we introduced the notation ∥a∥_B² ≡ a^⊺Ba, with

$\begin{matrix} {\hat{θ}}_{ω} \equiv E^{m} [θ_{ω}] = \frac{1}{m} \sum_{i = 1}^{m} θ_{ω}^{(i)} \end{matrix}$ $\begin{aligned} \boldsymbol{\hat{\theta }}_{\boldsymbol{\omega }} \equiv \mathrm{E}^m \left[ \boldsymbol{\theta }_{\boldsymbol{\omega }} \right] = \frac{1}{m}\sum _{i = 1}^{m} \boldsymbol{\theta }_{\boldsymbol{\omega }}^{(i)} \end{aligned}$ (8)

the mean of our prior, and where

$\begin{matrix} S & \equiv \frac{m}{m - 1} E^{m} [(θ_{ω} - {\hat{θ}}_{ω}) {(θ_{ω} - {\hat{θ}}_{ω})}^{⊺}] \\ = \frac{1}{m - 1} \sum_{i = 1}^{m} (θ_{ω}^{(i)} - {\hat{θ}}_{ω}) {(θ_{ω}^{(i)} - {\hat{θ}}_{ω})}^{⊺} \end{matrix}$ $\begin{aligned} \mathbf S&\equiv \frac{m}{m-1} \mathrm{E}^m \left[ (\boldsymbol{\theta }_{\boldsymbol{\omega }} - \boldsymbol{\hat{\theta }}_{\boldsymbol{\omega }})(\boldsymbol{\theta }_{\boldsymbol{\omega }} - \boldsymbol{\hat{\theta }}_{\boldsymbol{\omega }})^\intercal \right] \nonumber \\&= \frac{1}{m-1} \sum _{i = 1}^m (\boldsymbol{\theta }_{\boldsymbol{\omega }}^{(i)} - \boldsymbol{\hat{\theta }}_{\boldsymbol{\omega }}) (\boldsymbol{\theta }_{\boldsymbol{\omega }}^{(i)} - \boldsymbol{\hat{\theta }}_{\boldsymbol{\omega }})^\intercal \end{aligned}$ (9)

is the empirical prior covariance matrix. The symbol E^m denotes the empirical mean computed from an ensemble of m samples.

Since evaluating the deterministic function 𝒯 is computationally inexpensive compared to the probabilistic simulator 𝒫(Φ|θ), m can be taken as large as necessary to ensure that the prior distribution 𝒫(θ) is well-sampled. In this work, we use m = 10⁴.

2.2.2. The power spectrum posterior distribution

In the following, we adopt the approach introduced by Leclercq et al. (2019) to obtain a posterior on θ, with the key distinction that we formally treat the cosmological parameters o as additional nuisance parameters within the hidden-box model, over which we marginalise. In doing so, we rigorously account for the dependence of the mapping from initial matter power spectra to evolved dark matter density fields on the cosmological parameters.

Once realisations of θ, o, and ψ are specified, the output d ∈ ℝ^D of the simulation is a deterministic function 𝒮; that is, 𝒫(d|θ, o, ψ) = δ_d^D(𝒮(θ, o, ψ)), where, again, δ_d^D denotes the Dirac measure centred on d. Including the compression 𝒞 of the full data d to a summary statistic Φ, the hidden-box model can therefore be expressed as ℬ ≡ 𝒞° 𝒮, as illustrated in Figure 2. Marginalising over o and ψ, the exact likelihood given by Equation (4) reads

$\begin{matrix} L (θ) & = \int P (Φ_{O} | θ, o, ψ) P (o) P (ψ) d o d ψ \\ = \int δ_{Φ_{O}}^{D} (B (θ, o, ψ)) P (o) P (ψ) d o d ψ, \end{matrix}$ $\begin{aligned} L(\boldsymbol{\theta })&= \int \mathcal{P} (\boldsymbol{\Phi }_{\rm O}|\boldsymbol{\theta }, \mathbf o , \boldsymbol{\psi }) \mathcal{P} (\mathbf o ) \mathcal{P} (\boldsymbol{\psi }) \, \mathrm{d}\mathbf o \,\mathrm{d}\boldsymbol{\psi } \nonumber \\&= \int \delta ^{\mathrm{D}}_{\boldsymbol{\Phi }_{\rm O}}(\mathcal{B} (\boldsymbol{\theta }, \mathbf o , \boldsymbol{\psi })) \mathcal{P} (\mathbf o ) \mathcal{P} (\boldsymbol{\psi }) \, \mathrm{d}\mathbf o \,\mathrm{d}\boldsymbol{\psi }, \end{aligned}$ (10)

which involves an intractable integral.

To overcome this difficulty, an inefficient yet insightful approach is to condition the probabilities on an ensemble of θ-dependent data realisations. For a given θ, consider an ensemble of K simulated summaries $Φ_{θ}^{(i)} = B (θ, o^{(i)}, ψ^{(i)})$ $\boldsymbol{\Phi}_{\boldsymbol{\theta}}^{(i)} = \mathcal{B}(\boldsymbol{\theta}, \mathbf{o}^{(i)}, \boldsymbol{\psi}^{(i)})$ with i ∈ ⟦1, K⟧, where K is arbitrary for now. Postulating an effective Gaussian likelihood yields ${\tilde{L}}^{K} (θ) = exp [{\tilde{ℓ}}^{K} (θ)]$ $\widetilde{L}^K(\boldsymbol{\theta}) = \exp\left[ \tilde{\ell}^K(\boldsymbol{\theta}) \right]$ with the effective log-likelihood ${\tilde{ℓ}}^{K} (θ)$ $\tilde{\ell}^K(\boldsymbol{\theta})$ defined as

$\begin{matrix} - 2 ℓ^{K} (θ) = log | 2 π {\hat{Σ}}_{θ}^{'} | + | | Φ_{O} - {\hat{Φ}}_{θ} | |_{{\hat{Σ}}_{θ}^{' - 1}}^{2}, \end{matrix}$ $\begin{aligned} -2 {\ell }^K(\boldsymbol{\theta }) = \log \left| 2\pi \boldsymbol{\hat{\Sigma }}_{\boldsymbol{\theta }}^{\prime } \right| + \left||\boldsymbol{\Phi }_{\rm O} - \boldsymbol{\hat{\Phi }}_{\boldsymbol{\theta }}\right||^2_{\boldsymbol{\hat{\Sigma }}_{\boldsymbol{\theta }}^{\prime -1}}, \end{aligned}$ (11)

where ${\hat{Φ}}_{θ}$ $\boldsymbol{\hat{\Phi}}_{\boldsymbol{\theta}}$ is the empirical mean of the simulated summaries, defined by

$\begin{matrix} {\hat{Φ}}_{θ} \equiv E^{K} [Φ_{θ}] = \frac{1}{K} \sum_{i = 1}^{K} Φ_{θ}^{(i)}, \end{matrix}$ $\begin{aligned} \boldsymbol{\hat{\Phi }}_{\boldsymbol{\theta }} \equiv \mathrm{E}^K \left[ \boldsymbol{\Phi }_{\boldsymbol{\theta }} \right] = \frac{1}{K}\sum _{i = 1}^{K} \boldsymbol{\Phi }_{\boldsymbol{\theta }}^{(i)}, \end{aligned}$ (12)

and the estimators of the covariance matrix of ${\tilde{L}}^{K} (θ)$ $\widetilde{L}^K(\boldsymbol{\theta})$ and its inverse are given by:

$\begin{matrix} \frac{K - 1}{K + 1} {\hat{Σ}}_{θ}^{'} & \equiv E^{K} [(Φ_{θ} - {\hat{Φ}}_{θ}) {(Φ_{θ} - {\hat{Φ}}_{θ})}^{⊺}] \\ = \frac{1}{K} \sum_{i = 1}^{K} (Φ_{θ}^{(i)} - {\hat{Φ}}_{θ}) {(Φ_{θ}^{(i)} - {\hat{Φ}}_{θ})}^{⊺} and \\ \frac{K + 1}{K} {\hat{Σ}}_{θ}^{' - 1} & \equiv α {\hat{Σ}}_{θ}^{- 1}, \end{matrix}$ $\begin{aligned} \frac{K-1}{K+1}\boldsymbol{\hat{\Sigma }}_{\boldsymbol{\theta }}^\prime&\equiv \mathrm{E}^K \left[ (\boldsymbol{\Phi }_{\boldsymbol{\theta }} - \boldsymbol{\hat{\Phi }}_{\boldsymbol{\theta }})(\boldsymbol{\Phi }_{\boldsymbol{\theta }} - \boldsymbol{\hat{\Phi }}_{\boldsymbol{\theta }})^\intercal \right] \nonumber \\&= \frac{1}{K} \sum _{i = 1}^K (\boldsymbol{\Phi }_{\boldsymbol{\theta }}^{(i)} - \boldsymbol{\hat{\Phi }}_{\boldsymbol{\theta }}) (\boldsymbol{\Phi }_{\boldsymbol{\theta }}^{(i)} - \boldsymbol{\hat{\Phi }}_{\boldsymbol{\theta }})^\intercal \,\mathrm{and} \nonumber \\ \frac{K+1}{K}\boldsymbol{\hat{\Sigma }}_{\boldsymbol{\theta }}^{\prime -1}&\equiv \alpha \,\boldsymbol{\hat{\Sigma }}_{\boldsymbol{\theta }}^{-1}, \end{aligned}$ (13)

respectively. The factor $α \equiv \frac{K - P - 2}{K - 1}$ $\alpha \equiv \dfrac{K-P-2}{K-1}$ arises from the assumption that $Σ_{θ}^{- 1}$ $\boldsymbol{\Sigma}_{\boldsymbol{\theta}}^{-1}$ follows an inverse-Wishart distribution (Hartlap et al. 2007); the $\frac{K + 1}{K}$ $\dfrac{K+1}{K}$ factor originates from a derivation of ${\tilde{L}}^{K} (θ)$ $\widetilde{L}^K(\boldsymbol{\theta})$ based on a surrogate signal proposed in Leclercq et al. (2019), which motivates the Gaussian parametric form.

At this point, the numerical cost of computing ${\tilde{L}}^{K} (θ)$ $\widetilde{L}^K(\boldsymbol{\theta})$ as given by Equation (11) is still prohibitively large when the full parameter space has to be explored: it requires K simulations per θ where an evaluation of the approximate likelihood is sought. The SELFI algorithm solves this issue by Taylor-expanding the full forward model at first order around an expansion point θ₀, leveraging prior knowledge of the initial matter power spectrum so that the linearised model remains asymptotically exact around θ₀. We define the expansion point as

$\begin{matrix} θ_{0} \equiv T (ω_{0}), \end{matrix}$ $\begin{aligned} \boldsymbol{\theta }_0 \equiv \mathcal{T} (\boldsymbol{\omega }_0), \end{aligned}$ (14)

where ω₀ is a vector of fiducial cosmological parameters.

The aforementioned approach translates into a linearisation of the mean data model around the initial matter power spectrum θ₀. Namely, let o and ψ denote cosmological and nuisance parameters, δθ be a small vector in the parameter space; let X₀ = (θ₀, o, ψ), H = (δθ,0, 0), X = X₀ + H. At first order in its first variable, ℬ(X) ≡ 𝒞° 𝒮(X) can be approximated by

$\begin{matrix} B_{j} (X) & ≃ B_{j} (X_{0}) + \nabla B_{j} (X_{0}) \cdot H \\ ≃ B_{j} (X_{0}) + \partial_{θ} B_{j} (X_{0}) \cdot δ θ, \end{matrix}$ $\begin{aligned} \mathcal{B} _j\left(\boldsymbol{\mathrm{X}}\right)&\simeq \mathcal{B} _j\left(\boldsymbol{\mathrm{X}}_0\right) + \boldsymbol{\nabla } \mathcal{B} _j\left(\boldsymbol{\mathrm{X}}_0\right) \cdot \boldsymbol{\mathrm{H}} \nonumber \\&\simeq \mathcal{B} _j\left(\boldsymbol{\mathrm{X}}_0\right) + \partial _\theta \mathcal{B} _j\left(\boldsymbol{\mathrm{X}}_0\right) \cdot \delta \boldsymbol{\theta }, \end{aligned}$ (15)

where ℬ_j(X) is the j-th component of ℬ(X) and ∂_θ denotes the partial derivative with respect to the initial matter powerspectrum. Formally treating the cosmological parameters o intervening in the second layer of the BHM as a nuisance and marginalising over all nuisances yields

$\begin{matrix} E [Φ_{θ}] & \equiv \int B (X) P (o) P (ψ) d o d ψ \\ ≃ \int [B (X_{0}) + \partial_{θ} B (X_{0}) δ θ] P (o) P (ψ) d o d ψ \\ ≃ E [B (X_{0})] + \partial_{θ} E [B (X_{0})] \cdot δ θ, \end{matrix}$ $\begin{aligned} \mathbb{E} \left[ \boldsymbol{\Phi }_{\boldsymbol{\theta }} \right]&\equiv \int \mathcal{B} \left(\boldsymbol{\mathrm{X}}\right) \mathcal{P} (\mathbf o ) \mathcal{P} (\boldsymbol{\psi }) \, \mathrm{d}\mathbf o \,\mathrm{d}\boldsymbol{\psi } \nonumber \\&\simeq \int \left[\mathcal{B} \left(\boldsymbol{\mathrm{X}}_0\right) + \partial _\theta \mathcal{B} \left(\boldsymbol{\mathrm{X}}_0\right)\delta \boldsymbol{\theta }\right] \mathcal{P} (\mathbf o ) \mathcal{P} (\boldsymbol{\psi }) \, \mathrm{d}\mathbf o \,\mathrm{d}\boldsymbol{\psi } \nonumber \\&\simeq \mathbb{E} \left[ \mathcal{B} \left(\boldsymbol{\mathrm{X}}_0\right) \right] + \partial _\theta \mathbb{E} \left[ \mathcal{B} \left(\boldsymbol{\mathrm{X}}_0\right) \right] \cdot \delta \boldsymbol{\theta }, \end{aligned}$ (16)

under the assumption that the 𝔼 and ∂_θ operators can be permuted in Equation (16), and where the expectation is taken with respect to the joint distribution of o and ψ. For any θ, the ensemble mean 𝔼[Φ_θ] can therefore be approximated by

$\begin{matrix} f (θ) \equiv f_{0} + \nabla_{θ} f_{0} \cdot δ θ, \end{matrix}$ $\begin{aligned} \mathbf f (\boldsymbol{\theta }) \equiv \mathbf f _0 + \boldsymbol{\nabla }_{\theta } \mathbf f _0 \cdot \delta \boldsymbol{\theta }, \end{aligned}$ (17)

where δθ = θ − θ₀, f₀ ≡ E^N₀[Φ_θ₀] is the empirical mean of the data model at the expansion point θ₀ using N₀ simulations for different phase and other nuisance realisations, and ∇_θf₀ is the empirical gradient of ℬ.

Further assuming the covariance matrix to be independent of θ close to the expansion point, that is ${\hat{Σ}}_{θ}' ≃ {\hat{Σ}}_{θ_{0}}' \equiv C_{0}$ $\boldsymbol{\hat{\Sigma}}_{\boldsymbol{\theta}}{\prime}\simeq \boldsymbol{\hat{\Sigma}}_{\boldsymbol{\theta}_0}{\prime}\equiv \mathbf{C}_0$ , and replacing the exact data model in Equation (11) with the linearised data model f, the SELFI effective likelihood is given by ${\hat{L}}^{N_{0}} (θ) = exp [{\hat{ℓ}}_{θ}^{N_{0}} (θ)]$ $\widehat{L}^{N_0}(\boldsymbol{\theta}) = \exp\left[ \hat{\ell}^{N_0}_{\theta}(\boldsymbol{\theta}) \right]$ with

$\begin{matrix} - 2 {\hat{ℓ}}_{θ}^{N_{0}} (θ) \equiv log | 2 π C_{0} | + | | Φ_{O} - f (θ) | |_{C_{0}^{- 1}}^{2} . \end{matrix}$ $\begin{aligned} -2 \hat{\ell }^{N_0}_{\theta }(\boldsymbol{\theta }) \equiv \log \left| 2\pi \mathbf C _0 \right| + \left||\boldsymbol{\Phi }_{\rm O} - \mathbf f (\boldsymbol{\theta })\right||^2_\mathbf{C _0^{-1}}. \end{aligned}$ (18)

At the end of the day, SELFI approximates the average hidden-box model with its linearisation f around θ₀ marginalised over the nuisance parameters. These assumptions, alongside the Gaussian effective likelihood assumption, solely impact the inference of the initial matter power spectrum θ, and do not affect the inference of the target cosmological parameters ω discussed later, except through data compression.

The SELFI likelihood defined by Equation (18) is fully characterised by f₀, C₀, and ∇_θf₀, which can all be evaluated through forward simulations. The numerical computation requires N₀ simulations at the expansion point to evaluate f₀ and C₀. The gradient ∇_θf₀ can be evaluated using an analytical formula or auto-differentiation when available. In this work, we perform N_ssimulations in each direction of the parameter space to empirically estimate ∇_θf₀ via first-order forward finite differences. The total number of simulations required is therefore N_tot = N₀ + N_s × S simulations. N₀ and N_s should be of the order of the dimensionality of the data space P, giving a total cost of 𝒪(≳P(S + 1)) model evaluations. If not chosen in advance, a precise value for N₀ and N_s can be obtained by ensuring sufficient convergence of the covariance matrix and the gradients (see Appendix A) or by monitoring the convergence of the SELFI posterior.

To fully characterise the Bayesian problem, any prior 𝒫(θ) can be used, such as the prior naturally given by Equation (6). Any numerical technique such as MCMC can then be employed to explore the posterior using Equation (18), which does not require any additional evaluation of the hidden-box model. Remarkably, if the prior is Gaussian, then the SELFI effective posterior for the initial matter power spectrum is Gaussian and reads:

$\begin{matrix} - 2 log P (θ | Φ_{O}) ≃ log | 2 π Γ | + | | θ - γ | |_{Γ^{- 1}}, \end{matrix}$ $\begin{aligned} -2\log \mathcal{P} (\boldsymbol{\theta }|\boldsymbol{\Phi }_{\rm O}) \simeq \log \left| 2\pi \boldsymbol{\Gamma } \right| + \left||\boldsymbol{\theta }-\boldsymbol{\gamma }\right||_{\boldsymbol{\Gamma }^{-1}}, \end{aligned}$ (19)

with mean and covariance matrix given by

$\begin{matrix} γ & \equiv θ_{0} + Γ {(\nabla_{θ} f_{0})}^{⊺} C_{0}^{- 1} (Φ_{O} - f_{0}) + Γ S^{- 1} Δ, \end{matrix}$ $\begin{aligned} \boldsymbol{\gamma }&\equiv \boldsymbol{\theta }_0 + \boldsymbol{\Gamma } \, (\boldsymbol{\nabla }_{\theta } \mathbf f _0)^\intercal \, \mathbf C _0^{-1} (\boldsymbol{\Phi }_{\rm O} - \mathbf f _0) + \boldsymbol{\Gamma }\mathbf S ^{-1}\boldsymbol{\Delta }, \end{aligned}$ (20)

$\begin{matrix} Γ & \equiv {[{(\nabla_{θ} f_{0})}^{⊺} C_{0}^{- 1} \nabla_{θ} f_{0} + S^{- 1}]}^{- 1} . \end{matrix}$ $\begin{aligned} \boldsymbol{\Gamma }&\equiv \left[ (\boldsymbol{\nabla }_{\theta } \mathbf f _0)^\intercal \, \mathbf C _0^{-1} \boldsymbol{\nabla }_{\theta } \mathbf f _0 + \mathbf S ^{-1} \right]^{-1}. \end{aligned}$ (21)

$Δ = {\hat{θ}}_{ω} - θ_{0}$ $\boldsymbol{\Delta}=\boldsymbol{\hat{\theta}}_{\boldsymbol{\omega}} - \boldsymbol{\theta}_0$ is the difference between the prior mean and the expansion point. This result was derived by Leclercq et al. (2019) for the special case where the mean of the prior is equal to the expansion point; that is, Δ = 0. We provide the general derivation in Appendix B. To avoid inverting S in case it is ill-conditioned, one may equivalently opt for the alternative form:

$\begin{matrix} Γ S^{- 1} Δ & = {[S {(\nabla_{θ} f_{0})}^{⊺} C_{0}^{- 1} \nabla_{θ} f_{0} + I]}^{- 1} Δ, \end{matrix}$ $\begin{aligned} \boldsymbol{\Gamma }\mathbf S ^{-1}\boldsymbol{\Delta }&= \left[ \mathbf S (\boldsymbol{\nabla }_{\theta } \mathbf f _0)^\intercal \, \mathbf C _0^{-1} \boldsymbol{\nabla }_{\theta } \mathbf f _0 + \mathbf I \right]^{-1}\boldsymbol{\Delta }, \end{aligned}$ (22)

$\begin{matrix} Γ & = {[S {(\nabla_{θ} f_{0})}^{⊺} C_{0}^{- 1} \nabla_{θ} f_{0} + I]}^{- 1} S . \end{matrix}$ $\begin{aligned} \boldsymbol{\Gamma }&= \left[ \mathbf S (\boldsymbol{\nabla }_{\theta } \mathbf f _0)^\intercal \, \mathbf C _0^{-1} \boldsymbol{\nabla }_{\theta } \mathbf f _0 + \mathbf I \right]^{-1}\mathbf S . \end{aligned}$ (23)

2.2.3. Check for model misspecification

A sensitivity analysis of how systematic effects in the data model influence the posterior θ can be performed by recalculating the SELFI posterior mean γ using Equation (20) whilst varying the model of the systematic effect under investigation. This approach enables qualitative assessments of the impact of model misspecification on the posterior initial matter power spectrum, which can be interpreted in light of theoretical considerations. For a given posterior, as a simple quantitative check for model misspecification, we compute the Mahalanobis distance between the reconstruction γ and the prior distribution 𝒫(θ):

$\begin{matrix} d_{M} (γ, θ_{0} | S) \equiv | | γ - θ_{0} | |_{S^{- 1}} . \end{matrix}$ $\begin{aligned} d_{\rm M}(\boldsymbol{\gamma }, \boldsymbol{\theta }_0 | \mathbf S ) \equiv \left||\boldsymbol{\gamma }-\boldsymbol{\theta }_0\right||_\mathbf{S ^{-1}}. \end{aligned}$ (24)

We compare it to an ensemble of values of d_M(θ, θ₀|S) for simulations θ = 𝒯(ω), where ω is drawn from 𝒫(ω).

2.3. Second step: Implicit likelihood inference of the cosmological parameters

2.3.1. Score compression

The second step of the framework focuses on inferring the cosmological parameters ω based on the observations Φ_O. As usual in ILI, the simulated and observed summaries must undergo an additional compression step to reduce their dimensionality. In essence, the compressed summaries $\tilde{C} (Φ)$ $\widetilde{\mathcal{C}}(\boldsymbol{\Phi})$ should closely approximate sufficient statistics of Φ, such that $P (ω ∣ | \tilde{C} (Φ)) ≃ P (ω | Φ)$ $\mathcal{P}\left(\boldsymbol{\omega}\mid|\widetilde{\mathcal{C}}(\boldsymbol{\Phi})\right) \simeq \mathcal{P}(\boldsymbol{\omega}|\boldsymbol{\Phi})$ . Following the procedure described by Leclercq (2022), we assume, for compression only, that 𝒫(Φ|ω) is Gaussian-distributed: $P (Φ_{O} | ω) \equiv exp [{\hat{ℓ}}_{ω} (ω)]$ $\mathcal{P}(\boldsymbol{\Phi}_{\mathrm{O}}|\boldsymbol{\omega}) \equiv \exp \left[ \hat{\ell}_\omega(\boldsymbol{\omega}) \right]$ with ${\hat{ℓ}}_{ω} (ω) \equiv \hat{ℓ} (T (ω))$ $\hat{\ell}_\omega(\boldsymbol{\omega}) \equiv \hat{\ell}(\mathcal{T}(\boldsymbol{\omega}))$ , where

$\begin{matrix} - 2 {\hat{ℓ}}_{ω} (ω) \equiv log | 2 π C_{0} | + | | Φ_{O} - f [T (ω)] | |_{C_{0}^{- 1}}^{2} \end{matrix}$ $\begin{aligned} -2 \hat{\ell }_\omega (\boldsymbol{\omega }) \equiv \log \left| 2\pi \mathbf C _0 \right| + \left||\boldsymbol{\Phi }_{\rm O} - \mathbf f \left[\mathcal{T} (\boldsymbol{\omega })\right]\right||^2_\mathbf{C _0^{-1}} \end{aligned}$ (25)

is the effective log-likelihood defined in the first part of the framework by Equation (18), which corresponds to the exact log-likelihood under the Gaussian assumption.

We consider the score function $\nabla_{ω} {\hat{ℓ}}_{ω}$ $\boldsymbol{\nabla}_{\omega} \hat{\ell}_\omega$ , which is the gradient of the log-likelihood with respect to the parameters ω at a fiducial point in parameter space. This function is a sufficient statistic for the log-likelihood given by Equation (25) to linear order (Alsing & Wandelt 2018), making it a natural choice for data compression. Using ω₀ as the fiducial point, a quasi maximum-likelihood estimator for the parameters is given by ${\tilde{ω}}_{O} = ω_{0} + F_{0}^{- 1} \nabla_{ω} {\hat{ℓ}}_{ω} (ω_{0})$ $\widetilde{\boldsymbol{\omega}}_{\mathrm{O}} = \boldsymbol{\omega}_0 + \mathbf{F}^{-1}_0 \boldsymbol{\nabla}_{\boldsymbol{\omega}} \hat{\ell}_\omega({\boldsymbol{\omega}_0})$ where the Fisher matrix $F_{0} = E [\nabla {\hat{ℓ}}_{ω} (ω_{0}) \nabla^{⊺} {\hat{ℓ}}_{ω} (ω_{0})]$ $\mathbf{F}_0=\mathbb{E}\left[\boldsymbol{\nabla} \hat{\ell}_\omega({\boldsymbol{\omega}_0}) \boldsymbol{\nabla}^\intercal \hat{\ell}_\omega({\boldsymbol{\omega}_0})\right]$ represents the expected observed information, and the gradient of the log-likelihood is evaluated at ω₀. Compression of Φ_O to ${\tilde{ω}}_{O}$ $\widetilde{\boldsymbol{\omega}}_{\mathrm{O}}$ yields N compressed statistics, which are optimal in the sense that they saturate the Fisher information content of the data at the expansion point under lenient assumptions provided by Alsing & Wandelt (2018).

Further assuming that the covariance C₀ does not depend on the parameters (∇_ωC₀ = 0) around the expansion point, and using Equation (14), the compressor can be approximated by

$\begin{matrix} \tilde{C} (Φ) \equiv \tilde{ω} \equiv ω_{0} + F_{0}^{- 1} [{(\nabla_{ω} f_{0})}^{⊺} C_{0}^{- 1} (Φ - f_{0})], \end{matrix}$ $\begin{aligned} \widetilde{\mathcal{C} }(\boldsymbol{\Phi }) \equiv \boldsymbol{\widetilde{\omega }} \equiv \boldsymbol{\omega }_0 + \mathbf F ^{-1}_0 \left[ (\boldsymbol{\nabla }_{\omega } \mathbf f _0)^\intercal \mathbf C _0^{-1} (\boldsymbol{\Phi } - \mathbf f _0) \right], \end{aligned}$ (26)

where the Fisher matrix is given by

$\begin{matrix} F_{0} \equiv - E [\nabla_{ω} \nabla_{ω} {\hat{ℓ}}_{ω} (ω_{0})] = {(\nabla_{ω} f_{0})}^{⊺} C_{0}^{- 1} \nabla_{ω} f_{0}, \end{matrix}$ $\begin{aligned} \mathbf F _0 \equiv -\mathbb{E} \left[ \boldsymbol{\nabla }_{\omega } \boldsymbol{\nabla }_{\omega }\hat{\ell }_\omega ({\boldsymbol{\omega }_0}) \right] = (\boldsymbol{\nabla }_{\omega } \mathbf f _0)^\intercal \mathbf C _0^{-1} \boldsymbol{\nabla }_{\omega } \mathbf f _0, \end{aligned}$ (27)

with ∇_ωf₀ = ∇f₀ ⋅ ∇_ω𝒯(ω₀) (Heavens et al. 2000; Alsing & Wandelt 2018). Alsing & Wandelt (2018) provide a more general data compression scheme which includes the case when the dependence of C₀ on the parameter cannot be neglected.

At this point, C₀ and ∇f₀ have already been computed for the initial matter power spectrum inference with SELFI; and the gradient ∇_ω𝒯(ω₀) derived from the Boltzmann solver can easily be computed for instance by finite difference or using auto-differentiation (Hahn et al. 2024). We now have a BHM that maps the N target parameters, ω, to the same number N of compressed summaries, $\tilde{ω}$ $\widetilde{\boldsymbol{\omega}}$ , as summarised in Figure 1, which can be used to infer the target cosmological parameters using standard ILI techniques.

2.3.2. Implicit likelihood inference with a population Monte Carlo sampler

Every ILI method, such as ABC, requires a discrepancy measure to compare the compressed simulated summaries $\tilde{ω}$ $\widetilde{\boldsymbol{\omega}}$ with the observed summaries ${\tilde{ω}}_{O}$ $\widetilde{\boldsymbol{\omega}}_{\mathrm{O}}$ (Beaumont et al. 2002; Sunnåker et al. 2013). For this purpose, we re-used the Fisher matrix from Equation (27) to compute the Fisher-Rao distance between the compressed observed and simulated data, defined by:

$\begin{matrix} d_{FR} (\tilde{ω}, {\tilde{ω}}_{O}) \equiv | | \tilde{ω} - {\tilde{ω}}_{O} | |_{F_{0}} . \end{matrix}$ $\begin{aligned} d_{\rm FR}(\widetilde{\boldsymbol{\omega }}, \widetilde{\boldsymbol{\omega }}_{\rm O}) \equiv \left||\widetilde{\boldsymbol{\omega }}-\widetilde{\boldsymbol{\omega }}_{\rm O}\right||_\mathbf{F _0}. \end{aligned}$ (28)

To infer the target parameters $\tilde{ω}$ $\widetilde{\boldsymbol{\omega}}$ given the observed summaries, ${\tilde{ω}}_{O}$ $\widetilde{\boldsymbol{\omega}}_{\mathrm{O}}$ , we relied on a simple yet theoretically well-motivated particle filter method, which is a variant introduced by Simola et al. (2021) of ABC-PMC. Unlike the basic ABC sampling strategy, known as implicit likelihood rejection sampling (Neumann 1951; Beaumont et al. 2002), and which does not take advantage of the information acquired during the sampling process, particle filters iteratively refine pools of candidates–referred to as particles, bringing them closer and closer to the true posterior distribution. Specifically, ABC-PMC methods gradually enhance the quality of the samples by updating the proposal distribution.

The ABC-PMC algorithm variant introduced by Simola et al. (2021) and employed here proceeds as follows. First, we set an initial acceptance threshold ϵ₀, and generated an initial pool of J particles $Ξ^{(0)} \equiv {ω_{i}^{(0)}}_{i \in 〚 1, J 〛}$ $\Xi^{(0)} \equiv \left\{\boldsymbol{\omega}_i^{(0)}\right\}_{i\in [ [ 1,\,J ]] }$ by drawing cosmological parameters from the proposal distribution π⁽⁰⁾(ω)≡𝒫(ω) until they satisfied $d_{FR} (\tilde{C} ° B (ω_{i}^{(0)}), {\tilde{ω}}_{O}) < ϵ_{0}$ $d_{\mathrm{FR}}\left(\widetilde{\mathcal{C}} \circ \mathcal{B}\left(\boldsymbol{\omega}_i^{(0)}\right), \widetilde{\boldsymbol{\omega}}_{\mathrm{O}}\right) < \epsilon_0$ . This initial step is equivalent to an implicit likelihood rejection sampling stage with J accepted samples and a tolerance ϵ₀.

To construct an intermediate proposal distribution π⁽¹⁾(ω) and the corresponding approximate posterior, the method uses a Gaussian mixture distribution. We set initial weights w_i⁽⁰⁾ ≡ J⁻¹. The initial multivariate Gaussian kernel functions are centred on ω_i⁽⁰⁾, with a common covariance matrix defined as twice the weighted sample covariance of the particles Ξ⁽⁰⁾; that is,

$\begin{matrix} Σ^{(0)} = 2 \sum_{i = 1}^{J} w_{i}^{(0)} (ω_{i}^{(0)} - {\hat{ω}}^{(0)}) {(ω_{i}^{(0)} - {\hat{ω}}^{(0)})}^{⊺}, \end{matrix}$ $\begin{aligned} \boldsymbol{\Sigma }^{(0)} = 2\,\sum _{i = 1}^{J} {w}_i^{(0)} \left(\boldsymbol{\omega }_i^{(0)} - \boldsymbol{\widehat{\boldsymbol{\omega }}}^{(0)}\right)\left(\boldsymbol{\omega }_i^{(0)} - \boldsymbol{\widehat{\boldsymbol{\omega }}}^{(0)}\right)^\intercal , \end{aligned}$ (29)

where ${\hat{ω}}^{(0)}$ $\boldsymbol{\widehat{\boldsymbol{\omega}}}^{(0)}$ is the sample mean of the particles Ξ⁽⁰⁾. The factor 2 arises from minimising the Kullback–Leibler divergence (Kullback & Leibler 1951) between the proposal and the target distribution (Beaumont et al. 2009). This leads to the updated proposal distribution

$\begin{matrix} π^{(1)} (ω) = \sum_{i = 1}^{J} w_{i}^{(0)} G (ω | ω_{i}^{(0)}, Σ^{(0)}), \end{matrix}$ $\begin{aligned} \pi ^{(1)}(\boldsymbol{\omega })=\sum _{i = 1}^{J} w_i^{(0)}\mathcal{G} \left(\boldsymbol{\omega }\left|\boldsymbol{\omega }_i^{(0)},\boldsymbol{\Sigma }^{(0)}\right.\right), \end{aligned}$ (30)

where 𝒢(ω|μ,Σ) denotes the probability distribution function (PDF) of a Gaussian centred on μ with covariance matrix Σ, evaluated at ω.

In subsequent iterations t, the algorithm proceeds by drawing particles from the approximate posterior proposal distribution π^(t)(ω) until they satisfy $d_{FR} (\tilde{C} ° B (ω_{i}^{(t)}), {\tilde{ω}}_{O}) < ϵ_{t}$ $d_{\mathrm{FR}}\left(\widetilde{\mathcal{C}} \circ \mathcal{B}\left(\boldsymbol{\omega}_i^{(t)}\right), \widetilde{\boldsymbol{\omega}}_{\mathrm{O}}\right) < \epsilon_t$ for some ϵ_t < ϵ_t − 1. This results in the updated pool of particles $Ξ^{(t)} = {ω_{i}^{(t)}}_{i \in 〚 1, J 〛}$ $\Xi^{(t)}=\left\{\boldsymbol{\omega}_i^{(t)}\right\}_{i\in [ [ 1,\,J ] ] }$ , along with an updated proposal π^(t + 1)(ω) given by the Gaussian mixture distribution

$\begin{matrix} π^{(t + 1)} (ω) = \sum_{i = 1}^{J} w_{i}^{(t)} G (ω | ω_{i}^{(t)}, Σ^{(t)}), \end{matrix}$ $\begin{aligned} \pi ^{(t+1)}(\boldsymbol{\omega })=\sum _{i = 1}^{J} {w}_i^{(t)} \mathcal{G} \left(\boldsymbol{\omega }\left|\boldsymbol{\omega }_i^{(t)},\boldsymbol{\Sigma }^{(t)}\right.\right), \end{aligned}$ (31)

where the unnormalised importance weights are

$\begin{matrix} w_{i}^{(t)} \equiv \frac{π^{(t)} (ω_{i}^{(t)})}{\sum_{j = 1}^{J} w_{j}^{(t - 1)} G (ω_{i}^{(t)} | ω_{j}^{(t - 1)}, Σ^{(t - 1)})}, \end{matrix}$ $\begin{aligned} {w}_i^{(t)}\equiv \dfrac{\pi ^{(t)}(\boldsymbol{\omega }_i^{(t)})}{\displaystyle \sum _{j = 1}^J{w}_j^{(t-1)}\mathcal{G} \left(\boldsymbol{\omega }_i^{(t)}\left|\boldsymbol{\omega }_j^{(t-1)},\boldsymbol{\Sigma }^{(t-1)}\right.\right)}, \end{aligned}$ (32)

so that $w_{i}^{(t)} = {\tilde{w}}_{i}^{(t)} / \sum_{j} {\tilde{w}}_{j}^{(t)}$ ${w}_i^{(t)}=\tilde{w}_i^{(t)}/\sum_j{\tilde{w}_j^{(t)}}$ . The covariance matrix Σ^(t) is, again, defined as twice the weighted sample covariance of the particles Ξ^(t). The resulting approximate posterior can thus be expressed as a kernel density estimator:

$\begin{matrix} {\hat{π}}_{ϵ_{t}, h} (ω | {\tilde{ω}}_{O}) = \sum_{i = 1}^{J} w_{i}^{(t)} k_{h} (ω - ω_{i}^{(t)}), \end{matrix}$ $\begin{aligned} \hat{\pi }_{\epsilon _t,h}\left(\boldsymbol{\omega }|\widetilde{\boldsymbol{\omega }}_{\rm O}\right)=\displaystyle \sum _{i = 1}^J {w}_i^{(t)} k_h\left(\boldsymbol{\omega }-\boldsymbol{\omega }_i^{(t)}\right), \end{aligned}$ (33)

where k_h is the kernel (e.g. Rosenblatt 1956; Parzen 1962), and where we replaced the exponent (t) with ϵ_t to emphasise the dependence on the acceptance threshold. In this study, we used an isotropic Gaussian kernel with a smoothing parameter h. The particles were drawn from sequentially improving proposal distributions, with decreasing tolerances ϵ₀ > ϵ₁ > … > ϵ_T. The algorithm starts with an initial acceptance quantile q₀, which is fixed by the initial tolerance ϵ₀, and terminates when q_T falls below a predetermined target quantile q. The target q sets the maximum allowable pointwise difference between posteriors at successive stages for the algorithm to be deemed converged. The rule for determining T and the choice of the decreasing sequence of tolerances are discussed in detail in Simola et al. (2021). The sequence is designed to ensure efficient sampling and to provide good exploration of the relevant regions of the parameter space. Starting from the initial acceptance quantile q₀, subsequent tolerances were determined as follows:

$\begin{matrix} ϵ_{t} & = min_{i \in [[1, J]]} {d_{FR} (\tilde{C} ° B (ω_{i}^{(t - 1)}), {\tilde{ω}}_{O})}, \end{matrix}$ $\begin{aligned} \epsilon _t&= \displaystyle \min _{i \in [\![1, J ]\!]} \left\{ d_{\rm FR}\left(\widetilde{\mathcal{C} } \circ \mathcal{B} \left(\boldsymbol{\omega }_i^{(t-1)}\right), \widetilde{\boldsymbol{\omega }}_{\rm O}\right) \right\} , \end{aligned}$ (34)

$\begin{matrix} q_{t} & = {(sup_{ω} \frac{{\tilde{π}}_{ϵ_{t}} (ω | {\tilde{ω}}_{O})}{{\tilde{π}}_{ϵ_{t - 1}} (ω | {\tilde{ω}}_{O})})}^{- 1}, \end{matrix}$ $\begin{aligned} q_t&= \left( \sup _{\boldsymbol{\omega }} \dfrac{\widetilde{\pi }_{\epsilon _t}\left(\boldsymbol{\omega }|\widetilde{\boldsymbol{\omega }}_{\rm O}\right)}{\widetilde{\pi }_{\epsilon _{t-1}}\left(\boldsymbol{\omega }|\widetilde{\boldsymbol{\omega }}_{\rm O}\right)} \right)^{-1}, \end{aligned}$ (35)

where ${\tilde{π}}_{ϵ_{t}} / {\tilde{π}}_{ϵ_{t - 1}}$ $\widetilde{\pi}_{\epsilon_t}/\widetilde{\pi}_{\epsilon_{t-1}}$ is a density ratio estimate of the posterior change between steps t − 1 and t. Notably, substantial differences between the approximate posteriors at stages t − 1 and t lead to a large shrinkage of the proposed tolerance ϵ_t + 1. The shrinkage in tolerance gradually decreases as the ABC posterior converges.

3. Data model of galaxy surveys

In the following, Model A refers to the correct, well-specified model used to generate the synthetic observations, whilst Model B refers to a misspecified model.

3.1. Initial power spectrum parametrisation

Throughout this work, we defined the initial density fluctuations and galaxy overdensity fields on a cubic equidistant grid with a comoving side length of 3.6 Gpc/h and 512³ voxels, spanning scales between k_s, min = 1.75 × 10⁻³h/Mpc and k_s, max = 7.74 × 10⁻¹ h/Mpc. We discretised the parameter space by evaluating the continuous normalised power spectra at S = 64 support wave numbers k_s, yielding the vector θ defined by Equation (5). Between two consecutive support wave numbers, we interpolated power spectra P(k) using quintic splines. The first eight support wave numbers match the largest modes in the Fourier grid of our set-up. We manually distributed the remaining support wave numbers up to k_s, max to ensure a small maximum relative error in the representation of initial matter power spectra across all wave numbers of the Fourier grid. We verified that, for all k below the grid’s Nyquist frequency, this set-up yielded a relative interpolation error below 0.14% for the fiducial power spectrum.

3.2. Gravitational evolution

The data model is a non-linear process meant to approximate the large variety of physical and observational phenomena at play in actual galaxy surveys. To generate dark matter overdensity fields for a given set of cosmological parameters, we essentially followed the procedure described by Leclercq et al. (2019). We assumed a flat ΛCDM cosmology (Blumenthal et al. 1984) and used the best fit of the Planck 2018 data, given in Table 2, as the fiducial cosmological parameters ω₀.

Table 2.

Fiducial cosmological parameters ω₀.

Given an initial matter power spectrum P(k), we generated a corresponding realisation of the initial matter density contrast field δⁱ. For the gravitational evolution, we used theSIMBELMYNë cosmological solver (Leclercq et al. 2015). We populated the initial grid of 512³ voxels with 1024³ dark matter particles arranged on a regular lattice. These particles were evolved via second order Lagrangian perturbation theory (2LPT) (Moutarde et al. 1991; Bouchet et al. 1995) up to redshift z_LPT = 19, followed by N-body evolution with COLA (COmoving Lagrangian Acceleration, Tassev et al. 2013) from z_LPT to z₁ = 0.1500, both implemented within SIMBELMYNë. We performed the COLA evolution on a particle-mesh grid of 1024³ voxels. We used 20 non-uniform time steps, which we distributed to ensure approximately linear scaling in the scale factor a while meeting the requirements imposed by our choice of mock galaxy populations. Specifically, the first 14 time steps were linearly spaced in a up to redshift z₃ = 0.8182, followed by four time steps linearly spaced in a up to redshift z₂ = 0.4925, and finally two additional time steps to reach z₁ = 0.1500, corresponding to a scale factor of a = 0.8696.

The observer is positioned at one corner of the box, whereby they see one octant of the sky. Using the radial component v_r of their final peculiar velocities with respect to the observer, the dark matter particles are placed in redshift space according to the non-linear mapping defined for each particle by

$\begin{matrix} 1 + z_{obs} = (1 + z_{cosmo}) (1 + z_{pec}), z_{pec} \equiv - \frac{v_{r}}{c}, \end{matrix}$ $\begin{aligned} 1+z_{\rm obs} = (1+z_{\rm cosmo})(1+z_{\rm pec}), z_{\rm pec} \equiv -\frac{v_r}{\mathrm{c}}, \end{aligned}$ (36)

where z_cosmo is the true cosmological redshift, z_pec is the redshift due to the peculiar velocity, z_obs is the “observed” redshift of the particle and c is the speed of light. The particles are then assigned to a 512³-voxels grid using the cloud-in-cell scheme (Hockney & Eastwood 1988) to give the density contrast fields at redshift z_i, denoted δ^z_i, illustrated in the upper and middle rows of Figure 3.

Fig. 3.

Illustration of the galaxy survey data model used in this study. (Upper and middle rows) Slices through a realisation of the dark matter overdensity fields in real space (top) and redshift space (middle). The fields are evolved in time using N-body simulations, starting from a grid of Gaussian initial dark matter density fluctuations with power spectrum θ_gt, up to the redshift indicated above the corresponding panels. (Lower row) Observed galaxy number count fields for the three galaxy populations, each defined on a grid of 512³ cells. Light-cone effects are accounted for by using one effective redshift per galaxy population. Each slice passes through the observer located at the (0, 0, 0) corner of the box.

3.3. Galaxy biasing

Galaxies are biased tracers of the underlying dark matter distribution (Kaiser 1984; Bardeen et al. 1986). The mapping from local dark matter overdensities to the galaxy density field is a crucial element in cosmological inferences, involving numerous physical processes. Multiple questions, such as halo formation and mergers, gas cooling, and stellar feedback (Desjacques et al. 2018), have to be addressed. Despite its inherent complexity, the highly non-linear process of galaxy formation can be accurately described by a small number of bias parameters on linear and quasi-linear scales, through a bias expansion in the matter density contrast and tidal fields. The parameters of this expansion often exhibit degeneracies with cosmological parameters, making the bias model critical in cosmological inferences, with the potential to either significantly enhance or severely bias the constraints on cosmological parameters (Barreira 2020; Barreira et al. 2020). Consequently, exploiting galaxy survey data beyond baryon acoustic oscillations (BAOs) requires careful modelling of galaxy biases (Barreira et al. 2021; Beyond-2pt Collaboration 2024), and there remains considerable scope for improving the bias models employed in cosmological analyses (Bartlett et al. 2024).

In this study, we focus on linear galaxy bias parameters, usually denoted b₁ in the literature, and use our framework to assess the impact of these bias parameters on the SELFI posterior on the initial matter power spectrum. The same statistical framework can be applied to study the impact of higher-order local-in-matter density (LIMD) bias parameters or tidal biasparameters on the initial matter power spectrum reconstruction. We leave this exploration for future research. Hereafter, the first-order LIMD bias parameter for the i-th galaxy population g_i is denoted b_{g_i}, to avoid confusion of b_g₂ and b_g₃ with higher order bias parameters.

We simulated three mock galaxy populations: one population of nearby bright galaxies and two populations of luminous red galaxies (LRGs). LRGs are slowly evolving galaxies that have been extensively studied; due to their high spatial densities and intrinsic brightness, they are among the most valuable tracers of the dark matter distribution (Eisenstein et al. 2001) in the Universe. Notably, subsamples of the 1.5 million spectra from the Baryon Oscillation Spectroscopic Survey (Eisenstein 2015) have been prominently used to study the large-scale structure of the Universe, proving able to measure the BAOs in their large-scale distribution (Eisenstein et al. 2005; Anderson et al. 2014). LRGs are notably the primary target of DESI within the redshift range 0.4 < z < 1.0 (Zhou et al. 2023).

We approximated light cone effects by using snapshots from the dark matter overdensity field at three distinct redshifts z₁, z₂, and z₃, as input to the linear bias model to define the three populations of galaxies, where z_i represents the effective redshift of the i-th galaxy population g_i. According to the linear bias model, the normalised galaxy density ρ^g_i for the galaxy population g_i is therefore given in any cell x by:

$\begin{matrix} ρ_{x}^{g_{i}} = \bar{N} (1 + b_{g_{i}} δ_{x}^{z_{i}}) . \end{matrix}$ $\begin{aligned} \rho _x^{{g}_i} = \bar{N}\,(1+ b_{{g}_i} \, \delta _x^{z_i}). \end{aligned}$ (37)

For simplicity, we used fixed values for the b_{g_i} parameters, given in Table 3. We emphasise that, given any prior distribution for these parameters, they could be treated as additional nuisances alongside ψ and marginalised over, without affecting the statistical framework presented in this article. To showcase the efficiency of our method for diagnosing systematic effects with high signal-to-noise ratio data, we made the simplifying assumption that $\bar{N} = 1$ $\bar{N} = 1$ , effectively adopting a unit system where the mean number of galaxies per cell equals unity, which in turn affects the stochastic properties of the galaxy count fields, as detailed in Section 3.4. We further assumed that the contrast,

$\begin{matrix} δ_{x}^{g_{i}} = b_{g_{i}} δ_{x}^{z_{i}}, \end{matrix}$ $\begin{aligned} \delta _x^{{g}_i}=b_{{g}_i} \, \delta _x^{z_i}, \end{aligned}$ (38)

Table 3.

Linear galaxy bias parameters used in this study.

could be observed directly, implying in particular that the mean number of galaxies per cell is perfectly known.

Table 4.

Values of the selection function parameters.

To simulate LRG populations, we based our approach on the measurements reported by Gil-Marıin et al. (2015), where the authors found b₁^1.40(z_eff)σ₈(z_eff) = 1.672 ± 0.060 and fσ₈(z_eff) = 0.597 for z_eff = 0.57 (best fit of their full sample), yielding b₁^1.40(z_eff) = 2.80 ± 0.10. Motivated by the (b₁, D) degeneracy between the linear galaxy bias and the growth factor at large scales, we assumed constant b₁(z)D(z) within the relevant redshift range, yielding:

$\begin{matrix} b_{1} (z) ≃ \frac{b_{1} (z_{eff}) \times D (z_{eff})}{D (z)} ≃ \frac{2.086 \times D (z_{eff})}{D (z)} . \end{matrix}$ $\begin{aligned} b_1(z) \simeq \frac{b_1(z_{\rm eff})\times D(z_{\mathrm{eff}})}{D(z)} \simeq \frac{2.086\times D(z_{\mathrm{eff}})}{D(z)}. \end{aligned}$ (39)

For the two LRGs populations, g₂ and g₃, we applied Equation (39) to the mean redshift of each bin, used as the effective redshift to estimate an approximate ground truth linear bias (see Figure 4 and Table 3). For nearby galaxies, we used the measurements $f σ_{8} (z_{1} = 0.15) = 0 . 43_{- 0.14}^{+ 0.15}$ $f\sigma_8(z_1 = 0.15) = 0.43^{+0.15}_{-0.14}$ and b₁σ₈(z₁ = 0.15) = 1.20 ± 0.15 provided for a sample of high-bias galaxies at z < 0.2 by Howlett et al. (2015). Assuming $f = Ω_{m}^{0.55}$ $f=\Omega_{\mathrm{m}}^{0.55}$ (Bouchet et al. 1995), we obtained b₁(z₁ = 0.15)≃1.45. For Model A, we randomly selected the ground truth values in a ±0.02 interval around those given by the above equations. For Model B, the misspecified values were chosen manually. They are reported in Table 3.

Fig. 4.

Radial selection functions modelled as log-normal distributions in redshift (upper x axis) for the three mock galaxy populations. In the figure, the distributions are represented with respect to comoving distances (lower x axis) using the redshift-distance relation. The continuous lines represent the selection functions for the well-specified Model A, while the dotted lines represent the misspecified Model B, both defined by the values provided in Table 4. The small, percent-order difference between Models A and B renders the dotted curves nearly indistinguishable from the solid ones. The vertical dashed lines indicate the means of the log-normal distributions.

3.4. Observational processes

The last step involves a virtual observation of the galaxy fields, accounting for observational effects expected in actual surveys. One such effect is dust extinction. Dust particles in the Milky Way and the intergalactic medium, presumably ejected by stars, absorb a significant portion of UV-to-near-infrared light, contaminating observations, causing reddening that must be corrected for (Galametz et al. 2017), and affecting the noise properties of the observed galaxies (Ho et al. 2015). The presence of gas and plasma in the intra- and extra-galactic medium introduces another known source of attenuation, as these media absorb, scatter, and re-emit a portion of incident radiation at longer wavelengths (Ménard et al. 2008; More et al. 2009).

These attenuation phenomena, along with contamination from bright stars, constitute complex and often correlated systematic effects (Boulanger et al. 1996) which can significantly impact nearly all cosmological measurements, from supernova distance estimates (Brout & Riess 2023) to photometric and spectroscopic galaxy surveys (Corasaniti 2006; Leistedt & Peiris 2014; Ho et al. 2015; Bovy et al. 2016). Consequently, considerable effort is devoted to accurately modelling and accounting for dust extinction in cosmological inference pipelines (e.g. Huterer et al. 2013; Jasche & Lavaux 2017; Karchev et al. 2024).

To obtain the simulated data from the galaxy contrast fields, we computed the three-dimensional survey response operators W_i. For each galaxy population g_i, the corresponding operator is defined as the product of the radial selection function, R_i(r), and the angular survey mask and completeness function $C (\hat{n})$ $C(\hat{\mathbf{n}})$ for any line of sight $\hat{n}$ $\hat{\mathbf{n}}$ ; that is,

$\begin{matrix} W_{i} (\hat{n}, r) \equiv R_{i} (r) C (\hat{n}) . \end{matrix}$ $\begin{aligned} \mathbf W _i(\hat{\mathbf{n }},r) \equiv R_i(r) \, C(\hat{\mathbf{n }}). \end{aligned}$ (40)

For the angular completeness $C (\hat{n})$ $C(\hat{\mathbf{n}})$ , we used a mask mimicking the wide sky coverage of stage-IV experiments, shown in Figure 5. We introduced additional linear extinction near the galactic plane up to 60° galactic latitude as a simple model of dust extinction. We also introduced 256 randomly positioned holes of angular diameter ∼0.1°, representing the masking of bright stars or other point sources contamination expected in galaxy surveys. Then, we projected the octant delineated by the light orchid triangle in Figure 5 onto the observed 512³ grid to define the survey mask. We modelled the radial selection functions as log-normal distributions in redshift, which approximately captures the expected behaviour over a range of redshifts (as in e.g. Gavazzi & Jaffe 1986). They are defined by

Fig. 5.

Survey mask for the well-specified Model A. We added extinction linearly decreasing from 100% at the galactic plane to 0% at 60° galactic latitude (dash-dotted orchid line), along with 256 holes of approximately 0.1-degree radius (white crosses). The observed octant is delineated by the light orchid triangle. The colour scale represents the completeness function $C (\hat{n})$ $C(\hat{\mathbf{n}})$ , with 0 indicating masked regions and 1 indicating unmasked regions, and any value in between indicating partial completeness. In the misspecified Model B, the same mask is used, except for the linear extinction which extends from 0 to 59° galactic latitude.

$\begin{matrix} R_{i} (z) = \frac{c_{i}}{z σ_{i} \sqrt{2 π}} [exp (- \frac{{(ln z - μ_{i})}^{2}}{2 σ_{i}^{2}})], \end{matrix}$ $\begin{aligned} R_i(z)=\dfrac{c_i}{z\sigma _i\sqrt{2\,\pi }}\left[\exp \left({-\dfrac{(\ln {z}-\mu _i)^2}{2\,\sigma _i^2}}\right)\right], \end{aligned}$ (41)

where the constants σ_i and μ_i take different values for the three galaxy populations, and c_i scales the amplitude of the signal. By analogy with the scaling of redshift uncertainties in typical photometric survey models (e.g. Laureijs et al. 2011), we defined the variance s_i of the log-normal selection functions in our data model as

$\begin{matrix} s_{i} = s (1 + z_{i}), \end{matrix}$ $\begin{aligned} s_{i}=s\,(1+z_i), \end{aligned}$ (42)

with an overall factor s = 0.1, where z_i is the effective redshift of the i-th galaxy population. The R_i are defined in redshift using $μ_{i} \equiv ln (z_{i}^{2} / \sqrt{z_{i}^{2} + s_{i}^{2}})$ $\mu_i\equiv\ln\left(z_i^2/\sqrt{z_i^2+s_{i}^2}\right)$ and σ_i² = ln(1+s_i²/z_i²), and then mapped to distance space using tabulated values of the distance-redshift relation. The values of z_i, s_i, and c_i used for Models A and B are provided in Table 4; the corresponding profiles of the selection functions are illustrated in Figure 4.

We emulated the survey by generating galaxy excess number counts N_x^g_i for each cell x of the box across the three galaxy populations. To account for instrumental noise, the N_x^g_i were drawn from Gaussian distributions with means μ_x^g_i and standard deviations ς_i, x,

$\begin{matrix} N_{x}^{g_{i}} G (μ_{x}^{g_{i}}, ς_{i, x}) . \end{matrix}$ $\begin{aligned} N_x^{\mathrm{g}_i} \mathcal{G} (\mu ^{\mathrm{g}_i}_x,\varsigma _{i,x}). \end{aligned}$ (43)

The mean μ_x^g_i ≡ W_i, xb_{g_i} δ_x^z_i characterises the expected number of galaxies based on Equation (38), where W_i, x is the value in the cell x of the three-dimensional response operator defined by Equation (40). The standard deviation is defined as $ς_{x} \equiv ς \sqrt{W_{i, x}}$ $\varsigma_x \equiv \varsigma \sqrt{W_{i,x}}$ , where ς = 1 is the overall shot noise level. Setting $\bar{N} = 1$ $\bar{N} = 1$ in Equation (37) yields a mean galaxy number density of approximately 6 × 10⁻³ h³ Mpc⁻³ for the mock populations of LRGs, which is about an order of magnitude higher than the expected galaxy number density in actual surveys, for instance 5 × 10⁻⁴ h³ Mpc⁻³ in Berti et al. (2023). A slice through one realisation of the galaxy counts fields N^g_i for each population g_i is shown in Figure 3.

3.5. Summary statistics

The full data vectors d, consisting of the three 512³-dimensional galaxy count fields, are compressed into a summary statistic believed to contain relevant information about the cosmological parameters. In this study, we use the estimated power spectra of the observed and simulated fields, which is a standard choice in cosmological data analysis that has proven to be a powerful tool for constraining cosmological parameters (e.g. Tegmark et al. 2004; Ross et al. 2013). The transformation from the full data space to the space of observed summaries corresponds to the data compressor 𝒞 in the BHM of Figure 2.

For each galaxy population g_i, following Leclercq et al. (2019), we define the binned data power spectrum as

$\begin{matrix} P^{g_{i}} (k_{r}) = \sum_{| k | \in [k_{r} \pm δ r]} \frac{| {\hat{N}}_{k}^{g_{i}} |^{2}}{N_{k_{r}} - 2}, \end{matrix}$ $\begin{aligned} P^{\mathrm{g}_i}(k_r) = \sum _{|\mathbf k | \in \left[k_r\pm \delta r\right]} \frac{\vert \widehat{N}_k^{\mathrm{g}_i}\vert ^2}{N_{k_r}-2}, \end{aligned}$ (44)

for k_r − δr > 0, where ${\hat{N}}^{g_{i}}$ $\widehat{\mathbf{N}}^{\mathrm{g}_i}$ denotes the discrete Fourier transform of N^g_i. N_{k_r} represents the total number of Fourier modes within the wave number shell around k_r. The −2 term arises from the assumption that the data power spectrum follows an inverse-Γ distribution with shape parameter N_{k_r}/2 and scale parameter ${| {\hat{N}}_{k}^{g_{i}} |}^{2} / 2$ $\left| \widehat{N}^{\mathrm{g}_i}_k \right|^2/2$ , corresponding to the Jeffreys prior for power spectra (Jasche et al. 2010). The summaries Φ are then defined as the concatenation of the three vectors P^g_i, i = 1, 2, 3, corresponding to the three galaxy populations.

3.6. Synthetic observations

The ground truth cosmological parameters ω_gt are drawn from the Planck 2018 prior (Planck Collaboration VI 2020). They are given up to the fourth decimal place in Table 5. We generate synthetic observations Φ_O with the well-specified Model A, using

$\begin{matrix} θ_{gt} = T (ω_{gt}), \end{matrix}$ $\begin{aligned} \boldsymbol{\theta }_{\rm gt}=\mathcal{T} (\boldsymbol{\omega }_{\rm gt}), \end{aligned}$ (45)

Table 5.

Ground truth cosmological parameters.

as the input initial matter power spectrum, with ω_gt given in Table 5, whilst fixing o = ω_gt in the simulator. The synthetic observations are therefore obtained as

$\begin{matrix} Φ_{O} | ω_{gt} \int δ^{D} (B (θ_{gt}, ω_{gt}, ψ)) P (ψ) d ψ . \end{matrix}$ $\begin{aligned} \boldsymbol{\Phi }_{\rm O}\vert \boldsymbol{\omega }_{\rm gt} \int \delta ^{\mathrm{D}}(\mathcal{B} (\boldsymbol{\theta }_{\rm gt}, \boldsymbol{\omega }_{\rm gt}, \boldsymbol{\psi }))\mathcal{P} (\boldsymbol{\psi }) \, \,\mathrm{d}\boldsymbol{\psi }. \end{aligned}$ (46)

4. Results

4.1. First step: Check for model misspecification

4.1.1. SELFI posteriors with the well- and misspecified models

We ran N₀ = 500 and N_s × S = 640 simulations at and around the expansion point θ₀ defined by Equation (14) and Table 2 to estimate f₀ and ∇_θf₀, respectively. The simulated summaries used for estimating f₀, along with their full covariance matrix, are displayed in Figure 6. Using these estimates, we computed the SELFI effective posterior 𝒫(θ|Φ_O) based on Equations (20) and (21). From the simulated summaries and their covariance matrices alone, it is difficult to distinguish the well- and misspecified models, let alone identify the source of misspecification in Model B. More insight can be gained by examining the SELFI posteriors.

Fig. 6.

Data intervening in the computation of the SELFI posterior. (Left panels) Observed and simulated summary statistics for Model A (well-specified; upper panel) and B (misspecified; lower panel). For each population, the simulated summaries are shown in grey and their means are represented as dotted coloured lines. The shaded areas correspond to ±2σ around their mean. The solid black line corresponds to the observations Φ_O. The binning is indicated by the vertical dashed lines. Right panels: covariance matrices for Models A and B. For each (k, k′) entry, the colour scale represents the covariance between the k-th and k′-th modes. The diagonal blocs of the full covariance matrix correspond to the intra-population covariance; the extra-diagonal blocs correspond to the inter-populations covariances.

The posterior initial matter power spectrum inferred with the well-specified Model A is represented in green in Figure 7. It is compatible with the truth across all scales, demonstrating the self-consistency of the SELFI assumptions. We further verify that this holds true for a wide range of observed data vectors derived from different sets of cosmological parameters, confirming the unbiasedness of the method (Appendix C.1). The posterior obtained with the misspecified Model B, in red in Figure 7, is implausible: it exhibits an excess of power of about ∼2σ with respect to the prior mean at the largest scales and a lack of power of similar amplitude at the smallest scales, where non-linear physics dominate.

Fig. 7.

Prior and SELFI posteriors for the initial matter power spectrum θ given the observations Φ_O. The ground truth θ_gt is indicated by the dashed blue line. The prior and posterior means, θ₀, γ_A and γ_B, are represented respectively by the yellow, green and red lines, along with their 2σ credible regions (shaded yellow, green, and red areas). The vertical dashed lines indicate the support wave numbers for the initial matter power spectrum representation. The posterior obtained with the well-specified Model A is unbiased over all scales, whilst the posterior obtained with the misspecified Model B exhibits an excess and a lack of power at the largest and smallest scales, respectively.

As a quantitative check, Figure 8 depicts the Mahalanobis distances between 𝒫(θ) and the reconstructed mean power spectra γ for each model, computed using Equation (24), in comparison to those between 𝒫(θ) and 5000 random samples θ_n = 𝒯(ω_n). Each ω_n is drawn from the prior 𝒫(ω) on the cosmological parameters. The Mahalanobis distances aremultivariate analogues of the standard scores, accounting for correlations between the components, and measure the deviation of the reconstructions from the prior distribution. Large deviations suggest significant disagreement, prompting further investigation. We found that the distance for Model A, d_M(γ_A, θ₀|S) = 1.816, is much smaller than the distance for Model B, d_M(γ_B, θ₀|S) = 2.827. A larger distance does not necessarily imply a worse model, but hints at the presence of systematic effects. For comparison, the mean Mahalanobis distance from 𝒫(θ) to the 5000 θ_n samples is ⟨d_M(θ_n,θ₀|S)⟩_n = 2.431. The disagreement between d_M(γ_B, θ₀|S) and the value typically observed from random samples, ⟨d_M(θ_n,θ₀|S)⟩_n, highlights how percent-order variations in the modelling assumptions strongly influence the reconstruction of the initial matter power spectrum θ, pulling it away from the prior. This motivates a careful investigation of systematic effects, which we undertake in the following sections.

Fig. 8.

Histogram of the Mahalanobis distances d_M(θ_n, θ₀|S) between the prior and 5000 power spectra θ_n ≡ 𝒯(ω_n) sampled from Equation (6). The mean value is indicated by the vertical solid black line. The green and red lines indicate the distances from γ_A and γ_B to the prior, respectively.

Additionally, since the prior given by Equation (9) embeds substantial information about the functional shape of the initial matter power spectrum, one might question how the SELFI algorithm performs with a less informative prior. Leclercq et al. (2019) demonstrated that a wiggle-less prior centred on θ₀ ≡ 1_ℝ^S enables precise recovery of initial features such as the BAOs up to mildly non-linear scales. Given the enhanced complexity of the data model employed here–incorporating extinction, point-source contamination, and radial selection effects, whilst using a lower mass-resolution–we successfully verified that the BAOs are still retrievable in this set-up using a prior agnostic to the BAOs, as is shown in Appendix C.2.

4.1.2. Impact of galaxy biases on the initial power spectrum reconstruction

Due to the non-linearity of the data model, the impact of the bias model on the posterior is in general out of reach of theoretical considerations. Linear galaxy biases, however, are expected to have a mostly scale-independent effect on the SELFI reconstruction γ of the initial matter power spectrum, as demonstrated in Appendix D based on qualitative considerations. This effect is observed on Figure 9a, which illustrates the impact of the linear galaxy bias parameters on the SELFI posterior distribution for the initial matter power spectrum. We considered a constant relative error in the ±2% range for the linear galaxy bias parameters of the three populations, as indicated by the colour scale. All scales experience a similar shift in standard deviations, which is consistent with the analytical expectations.

Fig. 9.

Ensemble of SELFI posteriors for the initial matter power spectrum θ conditional on the observed data Φ_O for different observational models, used to diagnose systematic effects. All the posteriors are derived from a single, common set of N-body simulations. They make it possible to disentangle individual systematic contributions by comparing their differential impact across the range of wave numbers spanned by the power spectra. The colour scales indicate the relative error associated with the systematic effects considered in each sub-figure. The solid yellow line denotes the prior mean θ₀ and the ground truth θ_gt is indicated by the dashed blue line. The vertical dashed lines indicate the support wave numbers for the initial matter power spectrum representation. The 2σ credible regions for the prior and the posterior with the well-specified Model A correspond to the shaded yellow and green areas, respectively. The posterior means γ are represented by the coloured continuous lines for varying degrees of misspecification. For clarity, their corresponding credible regions have been omitted. (a) Impact of misspecified linear galaxy biases. (b) Impact of misspecified linear extinction. (c) Impact of misspecified redshifts. (d) Impact of misspecified selection function variances. (e) Impact of misspecified number of holes. (f) Impact of misspecified selection function variances (varying) under misspecified linear extinction (fixed).

This demonstrates the ability of our framework to accurately diagnose the impact of systematic effects using the reconstructed initial matter power spectrum. We retrieve that the effect of misspecified linear galaxy bias parameters closely resembles a simple rescaling of the initial matter power spectrum by a constant factor, highlighting the well-known degeneracy between the linear galaxy bias parameters and the value of σ₈ (Beutler et al. 2012; Arnalte-Mur et al. 2016; Desjacques et al. 2018; Repp & Szapudi 2020).

4.1.3. Impact of linear extinction, selection functions, punctual contaminations, and inaccurate redshifts

Even when the internal mechanisms of the forward model are known and well understood, assessing how a particular systematic effect influences the reconstruction solely through analytical considerations is often impractical. If an analytical expression can be derived, it may not be accurate due to correlations with other systematic effects, and its validity is likely to depend on the specificities of the hidden-box forward model. In such cases, we show that our framework makes it possible to conduct a thorough numerical investigation by reconstructing γ with Equation (20), whilst varying the values of the parameters involved in the model of the systematic effect. As a demonstration, we examine the impact of misspecified extinction near the galactic plane, misspecified selection functions, inaccurate redshifts, and punctual contaminations. All these numerical investigations rely on the same, common set of N-body simulations previously computed to obtain the posterior with Model A and already used to evaluate the effect of linear galaxy bias parameters in Section 4.1.2.

Figure 9b illustrates the effect of over- or underestimating the amount of extinction near the galactic plane. The colour scale represents the relative mean visibility between the misspecified and well-specified models, averaged over the three-dimensional survey window function. The extinction reduces the amplitude of small scales correlations; thus, overestimating extinction in the misspecified mask leads to a power deficit at small scales in the simulated summaries, shifting the reconstructed spectrum upwards. Conversely, underestimating extinction shifts the reconstructed spectrum downwards at small scales. At larger scales, excessive extinction in the misspecified mask introduces spurious fluctuations which are not retrieved in the observations, adding power to the largest k-modes and pushing the reconstruction downwards. As is shown in Figure 9b, the transition between these two regimes occurs at a constant scale, regardless of the degree of error in the extinction. This characteristic scale provides a means to distinguish the effects of extinction from other sources of systematic errors.

We emulated a systematic effect analogous to inaccurate redshifts due to line interlopers (e.g. Pullen et al. 2016; Addison et al. 2019; Massara et al. 2021) by increasing (or decreasing) the observed density of the central spectroscopic galaxy bin whilst correspondingly decreasing (or increasing) the densities of the two other populations, ensuring that the total density remained unchanged. An example of the resulting misspecified radial selection functions is shown in Figure 10. The resulting reconstructions, displayed in Figure 9c, demonstrate that even small, percent-level variations significantly affect the posterior on the initial matter power spectrum.

Fig. 10.

Profiles of the well- and misspecified selection functions R_i for the three mock galaxy populations. The continuous lines represent the selection functions used in Model A. The dashed lines are derived by underestimating n_1, 2(z) by 10% whilst maintaining n_tot(z) constant.

Figure 9d shows the impact of another source of systematic error associated with selection functions, specifically the impact of misspecified variances in the radial log-normal distributions. The result is comparable to that of misspecified galaxy bias parameters, though it is slightly more pronounced at smaller scales.

In complex data models displaying multiple putative sources of systematic effects alongside potentially many corresponding model parameters, it is crucial to know which effects are most likely to have a significant impact on the initial matter power spectrum, and should be prioritised for further investigation. Notably, some systematic effects may have a negligible impact on the initial matter power spectrum. Figure 9e illustrates the effect of misspecifying the number of holes in the survey mask in our set-up, using 0, 256, or all 512 holes defined at the beginning of the study, which represent the masking of bright stars or other point sources of contamination expected in cosmological surveys. The reconstruction appears largely unaffected by the number of holes, as their small angular size causes them to be overwhelmed by small-scale noise.

4.1.4. Joint impact of multiple systematic effects

Naturally, one is not restricted to studying individual systematic effects in isolation, and the SELFI posterior can be employed to explore how jointly misspecified observational effects influence the reconstructed initial matter power spectrum. This approach would be particularly beneficial for designing the data model of a real survey, enabling the assessment of the joint impact of different systematic effects on the cosmological inference whilst accounting for potential correlations between them. For example, Figure 9f shows the combined impact on the initial matter power spectrum of misspecified variances in the log-normal selection functions alongside incorrectly specified linearextinction.

Conversely, if a comprehensive study of multiple individual systematic effects has already been conducted, one can use it to disentangle their contributions when faced with a surprising posterior that warrants further investigation. For instance, if we obtain one of the reconstructions shown in Figure 9f, assuming unknown additive contributions of the five effects described in Section 4.1.3, we can easily disentangle individual contributions as follows. Shifting the reconstruction until it aligns with one of the characteristic scales associated with misspecified extinction or redshifts, we deduce that the systematic error most consistent with the observed reconstruction is a −10% error in the extinction, along with the corresponding error in the variance of the selection functions. As the model complexity increases, greater caution is required when interpreting the results, since multiple systematic effects can have degenerate impacts on the reconstructed power spectra. This underscores the importance of thoroughly investigating the influence of each individual systematic effect beforehand.

If known and unknown systematic effects are entangled, it remains possible to first identify the former using the characterisation presented in Figure 9. Running the framework while accounting for all known sources of systematic effects ensures that only unknown systematic effects and/or new physics remain to be characterised in the inferred initial matter power spectrum. Consequently, our framework has significant potential for detecting previously unknown systematic effects. This approach is conceptually distinct from that introduced by Porqueres et al. (2019), who identify and marginalise over unknown systematic effects within explicit likelihood inference.

4.1.5. Impact of the gravity solver

The gravity solver is a critical component of the data model. There is some flexibility in selecting the total number of time steps n_steps, dark matter particles N_p, and the resolution of the particle-mesh grid used to compute the gravitational forces. Often, a trade-off must be made between the numerical cost of the resolution and the precision required by the observations and scientific objectives. The standard approach is to specify a target precision for the first or second-order statistics of the final dark matter overdensity fields and adjust the gravity solver parameters accordingly, based on convergence tests or comparisons with higher resolution simulations.

Unlike the systematic effects discussed in Sections 4.1.2, 4.1.3, and 4.1.4, obtaining multiple posteriors on the initial matter power spectrum for various configurations of the gravity solver implemented in the forward model requires re-running N-body simulations. However, the numerical cost of obtaining a single SELFI posterior for the initial matter power spectrum–𝒪(10³) N-body simulations in this study–remains orders of magnitude lower than for the cosmological parameters–𝒪(10⁵) N-body simulations in this study. Consequently, our framework presents further value for investigating the error caused by misspecified gravity models on the posterior initial matter power spectrum. Similarly, our framework could be employed to diagnose the effect of baryons in the hidden-box model based on the inferred initial matter power spectrum, provided one is willing to incur the numerical cost of incorporating baryonic physics into the simulator.

We refer to the error on the reconstructed initial matter power spectrum as the inverse error, as opposed to the direct error on the measured galaxy power spectra. We show that measuring this inverse error is particularly relevant for large-scale galaxy surveys: it highlights the sensitivity of the inference to the gravity solver parameters, which is not apparent from the direct error on the galaxy count fields.

Figures 11a and b illustrate the effect of using 10 instead of 20 time steps for the gravitational evolution with COLA. Whilst the direct error in the measured galaxy power spectra compared to Model A remains below 1% across all scales, the inverse error in the SELFI posterior exceeds 2% for k ≤ 4 ⋅ 10⁻³. This discrepancy arises due to the non-linearity of the inversion: small differences in the predicted galaxy power spectra between Models A and B can propagate into substantial biases in the posteriors after solving the inverse problem. As is shown in Figure 11b, the posterior rejects the ground truth by more than 2σ across most scales. The SELFI posterior is therefore highly sensitive to the time-stepping in the gravity solver. To achieve, for example, a 1% precision on the posterior, it is necessary to ensure an even higher precision in the galaxy count powerspectra.

Fig. 11.

Impact of the gravity solver parameters on the SELFI posterior. Panels (a, c, e) display the summary statistics Φ for the three galaxy count fields, comparing the well-specified Model A (dashed lines) with the misspecified model (solid lines). The lower parts of these panels show the relative error between the two models, with shaded regions corresponding to a 1% error margin. Panels (b, d, f) show the corresponding SELFI posteriors for the initial matter power spectrum θ, with the well-specified and misspecified models represented by green and indigo lines, respectively. (a) Example of summary statistics for a single realisation with n_steps = 10 (solid lines) as opposed to n_steps = 20 time steps for Model A (dashed lines) in the gravitational evolution with COLA. Although the direct error remains well below 1% across all scales, the inverse error on the SELFI posterior (b) reaches 2% at the largest scales, corresponding to a deviation greater than 1σ from the well-specified Model A. (c) Summary statistics using only the final snapshot at redshift z₁ = 0.15 for the three mock galaxy populations, thereby neglecting light cone effects (solid lines). The forward error reaches 74% at the largest scales for the most distant galaxy population, and the corresponding SELFI posterior (d) significantly rejects the ground truth across all scales. (e) Summary statistics using a pure 2LPT gravity model instead of N-body simulations to define the galaxy populations (solid lines). The forward error reaches ∼7% at the smallest scale, and the ground truth is significantly rejected by the SELFI posterior (f), which deviates by more than 2σ compared to Model A at the largest scales.

Similarly, Figures 11c and 11d highlight the importance of accounting for light cone effects in the forward model: the SELFI posterior consistently rejects the ground truth when all three galaxy populations are defined from the snapshot at redshift z₁ = 0.15, compared to using z₁, z₂, and z₃ in Model A. Since the cosmological structure is less evolved at higher redshifts, this results in overestimating the measured power spectrum amplitude for the two most distant galaxy populations, which pushes the SELFI posterior downwards at all scales. We note that this effect is highly degenerate with that of misspecified galaxy biases. Both risk biasing the value of σ₈ inferred from the data if not properly accounted for.

Strikingly, Figure 11e and 11f show that using 2LPT instead of COLA in the forward model significantly biases the posterior on the initial matter power spectrum. This arises because the range of scales considered in this study extends well beyond k_lin ≃ 0.15 h/Mpc, commonly considered as the smallest scale of the linear regime (e.g. Colless et al. 2001), up to k_max ≃ 0.4 h/Mpc. Within the SELFI framework, all scales are coupled by the non-linearity. Over-prediction of small-scale power by 2LPT (due to the mass resolution) results in under-prediction of large scale power to fit the data. Furthermore, since the number of small-scale modes vastly exceeds that of large-scale modes, the posterior computation sacrifices large scale accuracy to fit small scales. The forward error on the galaxy count power spectra reaches ∼7% at the smallest scales, and the SELFI posterior rejects the ground truth by almost 2σ across most scales. This underscores the necessity of employing fully non-linear gravity solvers to accurately extract small-scale information from large-scale galaxy surveys.

4.2. Second step: Cosmological parameters

After checking for model misspecification, the second step of the framework consists of inferring the target cosmological parameters, ω, given the compressed summaries ${\tilde{ω}}_{O}$ $\boldsymbol{\widetilde{\omega}}_{\mathrm{O}}$ . For score compression, we re-used the estimates f₀ and ∇_θf₀ computed in the first step, and estimate the gradient ∇𝒯 of CLASS at the fiducial cosmological parameters ${\tilde{ω}}_{0}$ $\boldsymbol{\widetilde{\omega}}_0$ using fifth order finite differences. To perform the inference, we use the ELFIpackage (Lintusaari et al. 2018), which implements the variant of ABC-PMC proposed by Simola et al. (2021) and discussed in Section 2.3.2.

To reduce computational costs, in this study, we inferred only Ω_m, Ω_b, n_S, and σ₈ whilst fixing h to the ground truth given in Table 5. This choice was further motivated by the fact that our data model carries no additional information beyond the prior concerning the value of h. For practical purposes, we used nine distinct ABC-PMC populations of 64 particles each, aggregating the J = 576 final samples and corresponding weights to obtain a posterior distribution of (Ω_m, Ω_b, σ₈, n_S). Figure 12 shows the posteriors obtained using a kernel density estimate with the J weighted accepted samples. The shaded yellow area corresponds to 2σ credible contours on the prior. The posterior with the well-specified Model A is shown in green; it is unbiased for all cosmological parameters.

Fig. 12.

Prior 𝒫(ω) (yellow) and posterior $P (ω | {\tilde{ω}}_{O})$ $\mathcal{P}(\boldsymbol{\omega}|\boldsymbol{\widetilde{\omega}}_{\mathrm{O}})$ distributions of (Ω_m, Ω_b, σ₈, n_S), estimated using Population Monte Carlo based on the Fisher-Rao distance $d_{FR} (\tilde{ω}, {\tilde{ω}}_{O})$ $d_{\mathrm{FR}}(\boldsymbol{\widetilde{\omega}}, \boldsymbol{\widetilde{\omega}}_{\mathrm{O}})$ between simulated and observed summaries. The posterior with the well-specified Model A is represented by the green contours, whilst the red contours show the results with the misspecified Model B, both depicting the 1σ and 2σ credible regions. The percent-level misspecification in Model B introduces a bias exceeding 2σ in the (Ω_m, σ₈) plane. The one-dimensional marginal posterior distributions of the parameters are displayed on the diagonal, following the same colour scheme. The dashed line indicates the ground truth parameters ω_gt.

Replacing the accurate data Model A with the misspecified Model B for ABC-PMC yields the posterior in red. This substitution results in over-concentrating the posterior distributions of σ₈, Ω_m, and n_S, and introduces a bias exceeding 2σ in the (σ₈, Ω_m) plane. The bias in the marginal posterior of σ₈ is to be expected in light of the well-known degeneracy between σ₈ and galaxy bias parameters (Beutler et al. 2012; Arnalte-Mur et al. 2016; Desjacques et al. 2018; Repp & Szapudi 2020).

5. Discussion and conclusion

We have presented a method for thoroughly assessing individual or combined systematic effects in galaxy surveys using the initial matter power spectrum. We provide a practical guide for evaluating the impact of various systematic effects, demonstrated using a spectroscopic galaxy survey model through a quantitative assessment of the effect of misspecified linear galaxy biases, extinction near the galactic plane, selection functions, punctual contaminations, inaccurate redshifts, and imprecise gravity solver.

The method introduced in this study offers a Bayesian framework for addressing model misspecification in field-based implicit-likelihood cosmological inference. Notably, we have shown that even small, percent-level errors can significantly skew cosmological inferences derived from galaxy clustering probes. This was illustrated by a bias exceeding 2σ in the (σ₈, Ω_m) plane (Figure 12), which we were able to detect and correct (Figures 7 and 9) prior to inferring the cosmological parameters. Correct identification of systematic effects is crucial for accurately measuring primordial features of the Universe, such as non-Gaussianities–a major scientific goal of next-generation galaxy surveys (Andrews et al. 2023). Our method could serve as a critical tool for field-based ILI of these primordial features.

Our approach primarily centres on systematic effects that influence the initial matter power spectrum reconstruction. While we have assumed that the inferred spectrum serves as a reliable proxy for cosmological parameters, there may be systematic effects that do not affect the initial matter power spectrum yet still introduce biases in cosmological inferences. Such effects fall outside the current scope of this framework. In this paper, we assumed fixed deterministic linear galaxy bias parameters, but scale-dependent or non-linear biasmodels could be incorporated within the same framework. Addressing model misspecification for more sophisticated bias models would be particularly relevant for the inference of primordial non-Gaussianities. If a prior distribution of the bias values is available, they can be treated as additional nuisance parameters alongside ψ and marginalised over without altering the statistical framework presented here. Alternatively, they could be treated as additional input parameters alongside θ and inferred jointly with the initial matter power spectrum. These avenues remain for future investigation.

Potential extensions of the method include incorporating higher-order summary statistics from the full galaxy count fields into the inference (Gil-Marıin et al. 2015; Philcox & Ivanov 2022), which provide valuable information, for instance for constraining dark energy (Fazolo et al. 2022). Other options include using wavelet scattering transform coefficients (Régaldo-Saint Blancard et al. 2024), and additional cosmological probes such as peculiar velocity tracers (Prideaux-Ghee et al. 2023). The compression step could be improved by using information-maximising neural networks (Charnock et al. 2018; Makinen et al. 2024; Lanzieri et al. 2024), which complement physically motivated summary statistics. These beyond 2-pt approaches are particularly relevant for extracting the maximum amount of information from upcoming galaxy surveys and could reveal additional sources of systematic effects which should be accounted for in cosmological inferences based on these summaries (Beyond-2pt Collaboration 2024).

In conclusion, we have proposed a method to address the issue of model misspecification in implicit likelihood, field-based cosmological inferences. It enables a comprehensive investigation of both simulation- and observation-related systematic effects in galaxy surveys, leveraging both theoretical insights and prior knowledge of the initial matter power spectrum. We demonstrated the effectiveness of this method by quantifying the impact of various systematic effects on the inferred initial matter power spectrum. Furthermore, we emphasised the importance of using an accurate model for gravitational evolution within the hidden-box model: our results indicate that pure 2LPT is insufficient for the range of scales considered in this study, which is k ∈ [1.75×10⁻³, 4.47×10⁻¹] h/Mpc. Our framework can be used with any hidden-box model of galaxy surveys developed for cosmological inference pipelines, making it a versatile tool for addressing model misspecification in cosmological data analysis. It has the potential to significantly enhance the robustness of cosmological inferences from upcoming galaxy surveys such as DESI, Euclid, and LSST, for which the complexity of data models and the number of systematic effects will be substantial.

Data availability

This study relies on the pySELFI implementation of the SELFI (Leclercq et al. 2019) algorithm, which is publicly available at pyselfi.florent-leclercq.eu.. The gravitational evolution is performed using a modified version of the publicly available SIMBELMYNë cosmological solver (https://bitbucket.org/florent-leclercq/simbelmyne, Leclercq et al. 2015). The code and data underlying this paper, including the hidden-box model of galaxy surveys described in the article and routines to perform the inference using our framework in a cluster environment, are publicly available at github.com/hoellin/selfisys_public.

¹

If the likelihood of the model has an explicit form, explicit likelihood techniques may also be used for the second step.

Acknowledgments

We thank Chloé Barjou-Delayre, Guilhem Lavaux, Gary Mamon, Nhat-Minh Nguyen and Fabian Schmidt for useful discussions and feedback to enhance this manuscript. The work has made use of the Infinity Cluster hosted by the Institut d’Astrophysique de Paris. We appreciate the efforts of Stéphane Rouberol for running the cluster smoothly. FL and TH acknowledge financial support from the Agence Nationale de la Recherche (ANR) through the grant INFOCW, under reference ANR-23-CE46-0006-01. FL and TH acknowledge financial support from the Simons Collaboration on “Learning the Universe”. This work was done within the Aquila Consortium (https://www.aquila-consortium.org/).

References

Addison, G. E., Bennett, C. L., Jeong, D., Komatsu, E., & Weiland, J. L. 2019, ApJ, 879, 15 [NASA ADS] [CrossRef] [Google Scholar]
Albrecht, A., Bernstein, G., Cahn, R., et al. 2006, ArXiv e-prints [arXiv:astro-ph/0609591] [Google Scholar]
Alsing, J., & Wandelt, B. 2018, MNRAS, 476, L60 [NASA ADS] [CrossRef] [Google Scholar]
Alsing, J., Heavens, A., Jaffe, A. H., et al. 2016, MNRAS, 455, 4452 [Google Scholar]
Alsing, J., Wandelt, B., & Feeney, S. 2018, MNRAS, 477, 2874 [NASA ADS] [CrossRef] [Google Scholar]
Alsing, J., Charnock, T., Feeney, S., & Wandelt, B. 2019, MNRAS, 488, 4440 [Google Scholar]
Anderson, L., Aubourg, É., Bailey, S., et al. 2014, MNRAS, 441, 24 [Google Scholar]
Andrews, A., Jasche, J., Lavaux, G., & Schmidt, F. 2023, MNRAS, 520, 5746 [Google Scholar]
Arnalte-Mur, P., Vielva, P., Martínez, V. J., et al. 2016, JCAP, 2016, 005 [Google Scholar]
Ayçoberry, E., Ajani, V., Guinot, A., et al. 2023, A&A, 671, A17 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Balkenhol, L., Dutcher, D., Spurio Mancini, A., et al. 2023, Phys. Rev. D, 108, 023510 [Google Scholar]
Bardeen, J. M., Bond, J. R., Kaiser, N., & Szalay, A. S. 1986, ApJ, 304, 15 [Google Scholar]
Barreira, A. 2020, JCAP, 2020, 031 [CrossRef] [Google Scholar]
Barreira, A., Cabass, G., Lozanov, K. D., & Schmidt, F. 2020, JCAP, 2020, 049 [Google Scholar]
Barreira, A., Lazeyras, T., & Schmidt, F. 2021, JCAP, 2021, 029 [CrossRef] [Google Scholar]
Bartlett, D. J., Ho, M., & Wandelt, B. D. 2024, ApJ, 977, L44 [Google Scholar]
Beaumont, M. A., Zhang, W., & Balding, D. J. 2002, Genetics, 162, 2025 [Google Scholar]
Beaumont, M. A., Cornuet, J.-M., Marin, J.-M., & Robert, C. P. 2009, Biometrika, 96, 983 [CrossRef] [Google Scholar]
Berti, A. M., Dawson, K. S., & Dominguez, W. 2023, ApJ, 954, 131 [Google Scholar]
Beutler, F., Blake, C., Colless, M., et al. 2012, MNRAS, 423, 3430 [NASA ADS] [CrossRef] [Google Scholar]
Beyond-2pt Collaboration 2024, ArXiv e-prints [arXiv:2405.02252] [Google Scholar]
Blas, D., Lesgourgues, J., & Tram, T. 2011, JCAP, 2011, 034 [Google Scholar]
Blumenthal, G. R., Faber, S. M., Primack, J. R., & Rees, M. J. 1984, Nature, 311, 517 [Google Scholar]
Bouchet, F. R., Colombi, S., Hivon, E., & Juszkiewicz, R. 1995, A&A, 296, 575 [NASA ADS] [Google Scholar]
Boulanger, F., Abergel, A., Bernard, J. P., et al. 1996, A&A, 312, 256 [Google Scholar]
Bovy, J., Rix, H.-W., Green, G. M., Schlafly, E. F., & Finkbeiner, D. P. 2016, ApJ, 818, 130 [Google Scholar]
Brehmer, J., Louppe, G., Pavez, J., & Cranmer, K. 2018, ArXiv e-prints [arXiv:1805.12244] [Google Scholar]
Brout, D., & Riess, A. 2023, ArXiv e-prints [arXiv:2311.08253] [Google Scholar]
Campagne, J.-E., Lanusse, F., Zuntz, J., et al. 2023, Open J. Astrophys., 6, 15 [Google Scholar]
Carreres, B., Bautista, J. E., Feinstein, F., et al. 2023, A&A, 674, A197 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Charnock, T., Lavaux, G., & Wandelt, B. D. 2018, Phys. Rev. D, 97, 083004 [NASA ADS] [CrossRef] [Google Scholar]
Colless, M., Dalton, G., Maddox, S., et al. 2001, MNRAS, 328, 1039 [Google Scholar]
Corasaniti, P. S. 2006, MNRAS, 372, 191 [Google Scholar]
Cranmer, K., Brehmer, J., & Louppe, G. 2020, Proc. Natl. Acad. Sci., 117, 30055 [Google Scholar]
Davis, T. M., Hui, L., Frieman, J. A., et al. 2011, ApJ, 741, 67 [NASA ADS] [CrossRef] [Google Scholar]
DESI Collaboration 2016, ArXiv e-prints [arXiv:1611.00036] [Google Scholar]
Desjacques, V., Jeong, D., & Schmidt, F. 2018, Phys. Rep., 733, 1 [Google Scholar]
Ding, S., Lavaux, G., & Jasche, J. 2024, A&A, 690, A236 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Doeser, L., Jamieson, D., Stopyra, S., et al. 2024, MNRAS, 535, 1258 [Google Scholar]
Duane, S., Kennedy, A. D., Pendleton, B. J., & Roweth, D. 1987, Phys. Lett. B, 195, 216 [CrossRef] [Google Scholar]
Eisenstein, D. 2015, APS Meeting Abstracts, 2015, Z2.001 [Google Scholar]
Eisenstein, D. J., & Hu, W. 1998, ApJ, 496, 605 [Google Scholar]
Eisenstein, D. J., Annis, J., Gunn, J. E., et al. 2001, AJ, 122, 2267 [Google Scholar]
Eisenstein, D. J., Zehavi, I., Hogg, D. W., et al. 2005, ApJ, 633, 560 [Google Scholar]
Euclid Collaboration (Blanchard, A., et al.) 2020, A&A, 642, A191 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Euclid Collaboration (Mellier, Y., et al.) 2025, A&A, 697, A1 [Google Scholar]
Fazolo, R. E., Amendola, L., & Velten, H. 2022, Phys. Rev. D, 105, 103521P [Google Scholar]
Frazier, D. T., Robert, C. P., & Rousseau, J. 2017, ArXiv e-prints [arXiv:1708.01974] [Google Scholar]
Galametz, A., Saglia, R., Paltani, S., Apostolakos, N., & Dubath, P. 2017, A&A, 598, A20 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Gavazzi, G., & Jaffe, W. 1986, ApJ, 310, 53 [Google Scholar]
Gil-Marıin, H., Noreña, J., Verde, L., et al. 2015, MNRAS, 451, 539 [Google Scholar]
Glanville, A., Howlett, C., & Davis, T. M. 2021, MNRAS, 503, 3510 [NASA ADS] [CrossRef] [Google Scholar]
Goldstein, S., Park, M., Raveri, M., Jain, B., & Samushia, L. 2023, Phys. Rev. D, 107, 063530 [Google Scholar]
Hahn, O., List, F., & Porqueres, N. 2024, JCAP, 2024, 063 [Google Scholar]
Hartlap, J., Simon, P., & Schneider, P. 2007, A&A, 464, 399 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Heavens, A. F., Jimenez, R., & Lahav, O. 2000, MNRAS, 317, 965 [NASA ADS] [CrossRef] [Google Scholar]
Ho, S., Agarwal, N., Myers, A. D., et al. 2015, JCAP, 2015, 040 [Google Scholar]
Ho, M., Bartlett, D. J., Chartier, N., et al. 2024, Open J. Astrophys., 7, 54 [NASA ADS] [CrossRef] [Google Scholar]
Hockney, R. W., & Eastwood, J. W. 1988, Computer Simulation Using Particles (Taylor& Francis Group) [Google Scholar]
Howlett, C., Ross, A. J., Samushia, L., Percival, W. J., & Manera, M. 2015, MNRAS, 449, 848 [NASA ADS] [CrossRef] [Google Scholar]
Huterer, D., Cunha, C. E., & Fang, W. 2013, MNRAS, 432, 2945 [Google Scholar]
Ishida, E. E., Vitenti, S. D., Penna-Lima, M., et al. 2015, Astron. Comput., 13, 1 [Google Scholar]
Ivezić, Ž., Kahn, S. M., Tyson, J. A., et al. 2019, ApJ, 873, 111 [Google Scholar]
Jasche, J., & Lavaux, G. 2017, A&A, 606, A37 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Jasche, J., & Lavaux, G. 2019, A&A, 625, A64 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Jasche, J., Kitaura, F. S., Wandelt, B. D., & Enßlin, T. A. 2010, MNRAS, 406, 60 [NASA ADS] [CrossRef] [Google Scholar]
Jasche, J., & Wandelt, B. D. 2013, MNRAS, 432, 894 [Google Scholar]
Kaiser, N. 1984, ApJ, 284, L9 [NASA ADS] [CrossRef] [Google Scholar]
Karchev, K., Grayling, M., Boyd, B. M., et al. 2024, MNRAS, 530, 3881 [Google Scholar]
Kim, A. G., Linder, E. V., Miquel, R., & Mostek, N. 2004, MNRAS, 347, 909 [Google Scholar]
Kitching, T. D., Taylor, A. N., & Heavens, A. F. 2008, MNRAS, 389, 173 [Google Scholar]
Kitching, T. D., Verde, L., Heavens, A. F., & Jimenez, R. 2016, MNRAS, 459, 971 [Google Scholar]
Kostić, A., Nguyen, N.-M., Schmidt, F., & Reinecke, M. 2023, JCAP, 2023, 063 [Google Scholar]
Kullback, S., & Leibler, R. A. 1951, Ann. Math. Stat., 22, 79 [CrossRef] [Google Scholar]
Laureijs, R., Amiaux, J., Arduini, S., et al. 2011, ArXiv e-prints [arXiv:1110.3193] [Google Scholar]
Lanzieri, D., Zeghal, J., Makinen, T. L., et al. 2024, ArXiv e-prints [arXiv:2407.10877] [Google Scholar]
Leclercq, F. 2018, Phys. Rev. D, 98, 063511 [NASA ADS] [CrossRef] [Google Scholar]
Leclercq, F. 2022, Phys. Sci. Forum, 5, 4 [Google Scholar]
Leclercq, F., Enzi, W., Jasche, J., & Heavens, A. 2019, MNRAS, 490, 4237 [NASA ADS] [CrossRef] [Google Scholar]
Leclercq, F., Jasche, J., & Wandelt, B. 2015, JCAP, 2015, 015 [CrossRef] [Google Scholar]
Leistedt, B., & Peiris, H. V. 2014, MNRAS, 444, 2 [Google Scholar]
Lewis, A., & Challinor, A. 2011, Astrophysics Source Code Library [record ascl:1102.026] [Google Scholar]
Lintusaari, J., Gutmann, M. U., Dutta, R., Kaski, S., & Corander, J. 2017, Syst. Biol., 66, e66 [Google Scholar]
Lintusaari, J., Vuollekoski, H., Kangasrääsiö, A., et al. 2018, J. Mach. Learn. Res., 19, 1 [Google Scholar]
Liu, D. C., & Nocedal, J. 1989, Math. Progr., 45, 503 [CrossRef] [Google Scholar]
Loureiro, A., Whiteway, L., Sellentin, E., et al. 2023, Open J. Astrophys., 6, 6 [CrossRef] [Google Scholar]
LSST Dark Energy Science Collaboration 2012, ArXiv e-prints [arXiv:1211.0310] [Google Scholar]
Makinen, T. L., Charnock, T., Alsing, J., & Wandelt, B. D. 2021, JCAP, 2021, 049 [CrossRef] [Google Scholar]
Makinen, T. L., Heavens, A., Porqueres, N., et al. 2024, ArXiv e-prints [arXiv:2407.18909] [Google Scholar]
Massara, E., Ho, S., Hirata, C. M., et al. 2021, MNRAS, 508, 4193 [Google Scholar]
Meiksin, A., White, M., & Peacock, J. A. 1999, MNRAS, 304, 851 [NASA ADS] [CrossRef] [Google Scholar]
Ménard, B., Nestor, D., Turnshek, D., et al. 2008, MNRAS, 385, 1053 [Google Scholar]
Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., & Teller, E. 1953, J. Chem. Phys., 21, 1087 [Google Scholar]
Mishra-Sharma, S., Alonso, D., & Dunkley, J. 2018, Phys. Rev. D, 97, 123544 [Google Scholar]
Modi, C., Lanusse, F., & Seljak, U. 2021, Astron. Comput., 37, 100505 [NASA ADS] [CrossRef] [Google Scholar]
Mootoovaloo, A., Jaffe, A. H., Heavens, A. F., & Leclercq, F. 2022, Astron. Comput., 38, 100508 [NASA ADS] [CrossRef] [Google Scholar]
More, S., Bovy, J., & Hogg, D. W. 2009, ApJ, 696, 1727 [Google Scholar]
Moutarde, F., Alimi, J. M., Bouchet, F. R., Pellat, R., & Ramani, A. 1991, ApJ, 382, 377 [Google Scholar]
Neumann, V. 1951, Notes by GE Forsythe, 36 [Google Scholar]
Nguyen, N.-M., Schmidt, F., Lavaux, G., & Jasche, J. 2021, JCAP, 2021, 058 [Google Scholar]
Nguyen, N.-M., Schmidt, F., Tucci, B., Reinecke, M., & Kostić, A. 2024, Phys. Rev. Lett., 133, 221006 [Google Scholar]
Parzen, E. 1962, Ann. Math. Stat., 33, 1065 [Google Scholar]
Peebles, P. J. E. 1980, The Large-scale Structure of the Universe (Princeton: Princeton University Press) [Google Scholar]
Philcox, O. H. E., & Ivanov, M. M. 2022, Phys. Rev. D, 105, 043517 [CrossRef] [Google Scholar]
Planck Collaboration VI. 2020, A&A, 641, A6 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Porqueres, N., Kodi Ramanah, D., Jasche, J., & Lavaux, G. 2019, A&A, 624, A115 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Porqueres, N., Heavens, A., Mortlock, D., & Lavaux, G. 2022, MNRAS, 509, 3194 [Google Scholar]
Prideaux-Ghee, J., Leclercq, F., Lavaux, G., Heavens, A., & Jasche, J. 2023, MNRAS, 518, 4191 [Google Scholar]
Pullen, A. R., Hirata, C. M., Doré, O., & Raccanelli, A. 2016, PASJ, 68, 12 [Google Scholar]
Ramanah, D. K., Lavaux, G., Jasche, J., & Wandelt, B. D. 2019, A&A, 621, A69 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Régaldo-Saint Blancard, B., Hahn, C., Ho, S., et al. 2024, Phys. Rev. D, 109, 083535 [CrossRef] [Google Scholar]
Repp, A., & Szapudi, I. 2020, MNRAS, 498, L125 [NASA ADS] [CrossRef] [Google Scholar]
Rosenblatt, M. 1956, Ann. Math. Stat., 27, 832 [Google Scholar]
Ross, A. J., Percival, W. J., Carnero, A., et al. 2013, MNRAS, 428, 1116 [Google Scholar]
Said, K., Colless, M., Magoulas, C., Lucey, J. R., & Hudson, M. J. 2020, MNRAS, 497, 1275 [NASA ADS] [CrossRef] [Google Scholar]
Salvati, L., Douspis, M., & Aghanim, N. 2020, A&A, 643, A20 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Sellentin, E., Loureiro, A., Whiteway, L., et al. 2023, Open J. Astrophys., 6, 31 [Google Scholar]
Simola, U., Cisewski-Kehe, J., Gutmann, M. U., & Corander, J. 2021, Bayesian Anal., 16, 397 [Google Scholar]
Spurio Mancini, A., Piras, D., Alsing, J., Joachimi, B., & Hobson, M. P. 2022, MNRAS, 511, 1771 [NASA ADS] [CrossRef] [Google Scholar]
Sunnåker, M., Busetto, A. G., Numminen, E., et al. 2013, PLoS Computational Biology, 9, e1002803 [Google Scholar]
Tassev, S., Zaldarriaga, M., & Eisenstein, D. J. 2013, JCAP, 2013, 036 [Google Scholar]
Taylor, P. L., Kitching, T. D., Alsing, J., et al. 2019, Phys. Rev. D, 100, 023519 [NASA ADS] [CrossRef] [Google Scholar]
Tegmark, M., Strauss, M. A., Blanton, M. R., et al. 2004, Phys. Rev. D, 69, 103501 [CrossRef] [Google Scholar]
Thomas, O., Sá-Leão, R., de Lencastre, H., et al. 2020, ArXiv e-prints [arXiv:2002.09377] [Google Scholar]
Tucci, B., & Schmidt, F. 2024, JCAP, 2024, 063 [Google Scholar]
Wandelt, B. D., Larson, D. L., & Lakshminarayanan, A. 2004, Phys. Rev. D, 70, 083511 [Google Scholar]
Wang, H., Mo, H. J., Yang, X., Jing, Y. P., & Lin, W. P. 2014, ApJ, 794, 94 [NASA ADS] [CrossRef] [Google Scholar]
Wang, H., Mo, H. J., Yang, X., et al. 2016, ApJ, 831, 164 [Google Scholar]
Weyant, A., Schafer, C., & Wood-Vasey, W. M. 2013, ApJ, 764, 116 [Google Scholar]
Zeghal, J., Lanzieri, D., Lanusse, F., et al. 2025, A&A, in press, https://doi.org/10.1051/0004-6361/202452410 [Google Scholar]
Zhou, A. J., Li, X., Dodelson, S., & Mandelbaum, R. 2024, Phys. Rev. D, 110, 023539 [Google Scholar]
Zhou, R., Dey, B., Newman, J. A., et al. 2023, AJ, 165, 58 [NASA ADS] [CrossRef] [Google Scholar]

Appendix A: Convergence of the posterior

Considerations about the degrees of freedom in the estimated covariance matrix given by Equation (13) impose the required number of simulations at the expansion point to be at least N₀ ≥ P + 3 = 114. However, the convergence may require more simulations depending for instance on the observational noise level. We checked based on the evolution of the spectral and Frobenius norms of C₀ that sufficient convergence was reached after ≃450 simulations (see Figure A.1), and we fixed the number of simulations at the expansion point to linearise the hidden-box to N₀ = 500. We also verified that using in between ≃350 and ≃500 simulations yielded vanishing differences in the corresponding initial matter power spectrum posteriors.

Similarly, we ensured sufficient convergence for the gradients by looking at the evolution of (∇f₀)^⊺C₀⁻¹∇f₀ where C₀ was estimated using all the N₀ = 500 simulations available at the expansion point. We fixed N_S = 10 to obtain the SELFI posteriors presented in this article, and checked using Model A that restricting to any number of simulations between 2 and 30 per directional derivative yielded negligible difference in the posterior.

Fig. A.1.

Convergence history of the covariance. We used the correlation matrix rather than C₀ to compute the norms.

Appendix B: Effective posterior for the initial matter power spectrum when the prior mean differs from the expansion point

We generalise the derivation of the SELFI equations from Leclercq et al. (2019), to the case where the prior mean ${\hat{θ}}_{ω}$ $\boldsymbol{\hat{\theta}}_{\boldsymbol{\omega}}$ differs from the expansion point θ₀. The SELFI effective log-likelihood given by Equation (18) can be rewritten in canonical form as

$\begin{matrix} - 2 {\hat{ℓ}}_{θ}^{N_{0}} (θ) = log | 2 π C_{0} | + μ_{0}^{⊺} N_{0} μ_{0} - 2 μ_{0} θ + θ^{⊺} N_{0}^{- 1} θ, \end{matrix}$ $\begin{aligned} -2 \hat{\ell }^{N_0}_{\theta }(\boldsymbol{\theta }) = \log \left| 2\pi \mathbf C _0 \right| + \boldsymbol{\mu }_0^\intercal \mathbf N _0 \boldsymbol{\mu }_0 - 2\boldsymbol{\mu }_0 \boldsymbol{\theta } + \boldsymbol{\theta }^\intercal \mathbf N _0^{-1} \boldsymbol{\theta }, \end{aligned}$ (B.1)

with $μ_{0} \equiv N_{0}^{- 1} y_{0} = N_{0}^{- 1} θ_{0} + {(\nabla f_{0})}^{⊺} C_{0}^{- 1} (Φ_{O} - f_{0})$ $\boldsymbol{\mu}_0 \equiv \mathbf{N}_0^{-1} \mathbf{y}_0 = \mathbf{N}_0^{-1}\boldsymbol{\theta}_0 + (\nabla \mathbf{f}_0)^\intercal \mathbf{C}_0^{-1} (\boldsymbol{\Phi}_{\mathrm{O}}-\mathbf{f}_0)$ , where we have defined

$\begin{matrix} N_{0} & \equiv & {[{(\nabla f_{0})}^{⊺} C_{0}^{- 1} \nabla f_{0}]}^{- 1}, \\ y_{0} & \equiv & θ_{0} + {(\nabla f_{0})}^{- 1} \cdot (Φ_{O} - f_{0}) . \end{matrix}$ $\begin{aligned} \mathbf N _0&\equiv&\left[ (\nabla \mathbf f _0)^\intercal \mathbf C _0^{-1} \nabla \mathbf f _0 \right]^{-1},\nonumber \\ \mathbf y _0&\equiv&\boldsymbol{\theta }_0 + (\nabla \mathbf f _0)^{-1} \cdot (\boldsymbol{\Phi }_{\rm O}-\mathbf f _0). \end{aligned}$ (B.2)

Similarly, the prior given by Equation (7) can be written

$\begin{matrix} - 2 log P (θ) = log | 2 π S | + η^{⊺} S η - 2 η^{⊺} θ + θ^{⊺} S^{- 1} θ \end{matrix}$ $\begin{aligned} -2\log \mathcal{P} (\boldsymbol{\theta }) = \log \left| 2\pi \mathbf S \right| + \boldsymbol{\eta }^\intercal \mathbf S \boldsymbol{\eta } -2 \boldsymbol{\eta }^\intercal \boldsymbol{\theta } + \boldsymbol{\theta }^\intercal \mathbf S ^{-1} \boldsymbol{\theta } \end{aligned}$ (B.3)

in canonical form, where $η \equiv S^{- 1} {\hat{θ}}_{ω}$ $\boldsymbol{\eta} \equiv \mathbf{S}^{-1} \boldsymbol{\hat{\theta}}_{\boldsymbol{\omega}}$ . The effective posterior therefore verifies

$\begin{matrix} - 2 log P (θ | Φ_{O}) & = - 2 {\hat{ℓ}}_{θ}^{N_{0}} (θ) - 2 log P (θ) \\ = - 2 {(μ_{0} + η)}^{⊺} θ + θ^{⊺} (N_{0}^{- 1} + S^{- 1}) θ \\ + constant terms, \end{matrix}$ $\begin{aligned} -2\log \mathcal{P} (\boldsymbol{\theta }|\boldsymbol{\Phi }_{\rm O})&= -2 \hat{\ell }^{N_0}_{\theta }(\boldsymbol{\theta }) -2\log \mathcal{P} (\boldsymbol{\theta }) \nonumber \\&= -2(\boldsymbol{\mu }_0+\boldsymbol{\eta })^\intercal \boldsymbol{\theta } + \boldsymbol{\theta }^\intercal (\mathbf N _0^{-1} + \mathbf S ^{-1}) \boldsymbol{\theta }\nonumber \\&+ \mathrm{constant~terms}, \end{aligned}$ (B.4)

where we recognise the canonical form of a Gaussian distribution with covariance matrix

$\begin{matrix} Γ \equiv {(N_{0}^{- 1} + S^{- 1})}^{- 1} = {[{(\nabla f_{0})}^{⊺} C_{0}^{- 1} \nabla f_{0} + S^{- 1}]}^{- 1}, \end{matrix}$ $\begin{aligned} \boldsymbol{\Gamma } \equiv \left(\mathbf N _0^{-1} + \mathbf S ^{-1}\right)^{-1}= \left[ (\nabla \mathbf f _0)^\intercal \mathbf C _0^{-1} \nabla \mathbf f _0 + \mathbf S ^{-1} \right]^{-1}, \end{aligned}$ (B.5)

and mean

$\begin{matrix} γ & \equiv & Γ (μ_{0} + η) = Γ [{(\nabla f_{0})}^{⊺} C_{0}^{- 1} \nabla f_{0} θ_{0} + {(\nabla f_{0})}^{⊺} C_{0}^{- 1} \nabla f_{0} {(\nabla f_{0})}^{- 1} \\ \cdot (Φ_{O} - f_{0}) + S^{- 1} {\hat{θ}}_{ω}] \\ = Γ {[{(\nabla f_{0})}^{⊺} C_{0}^{- 1} \nabla f_{0} + S^{- 1}] θ_{0} - S^{- 1} θ_{0} \\ + {(\nabla f_{0})}^{⊺} C_{0}^{- 1} \cdot (Φ_{O} - f_{0}) + S^{- 1} {\hat{θ}}_{ω}} \\ = θ_{0} + Γ S^{- 1} Δ + Γ {(\nabla f_{0})}^{⊺} C_{0}^{- 1} \cdot (Φ_{O} - f_{0}), \end{matrix}$ $\begin{aligned} \boldsymbol{\gamma }&\equiv&\boldsymbol{\Gamma } (\boldsymbol{\mu }_0+\boldsymbol{\eta }) = \boldsymbol{\Gamma } \left[ (\nabla \mathbf f _0)^\intercal \mathbf C _0^{-1} \nabla \mathbf f _0\boldsymbol{\theta }_0 + (\nabla \mathbf f _0)^\intercal \mathbf C _0^{-1} \nabla \mathbf f _0 (\nabla \mathbf f _0)^{-1}\right. \nonumber \\&\left. \cdot (\boldsymbol{\Phi }_{\rm O}-\mathbf f _0) + \mathbf S ^{-1} \boldsymbol{\hat{\theta }}_{\boldsymbol{\omega }} \right]\nonumber \\&= \boldsymbol{\Gamma } \left\{ \left[(\nabla \mathbf f _0)^\intercal \mathbf C _0^{-1} \nabla \mathbf f _0 + \mathbf S ^{-1}\right] \boldsymbol{\theta }_0 - \mathbf S ^{-1} \boldsymbol{\theta }_0 \right. \nonumber \\&\left. + (\nabla \mathbf f _0)^\intercal \mathbf C _0^{-1} \cdot (\boldsymbol{\Phi }_{\rm O}-\mathbf f _0) + \mathbf S ^{-1} \boldsymbol{\hat{\theta }}_{\boldsymbol{\omega }} \right\} \nonumber \\&= \boldsymbol{\theta }_0 + \boldsymbol{\Gamma } \mathbf S ^{-1} \boldsymbol{\Delta } + \boldsymbol{\Gamma } (\nabla \mathbf f _0)^\intercal \mathbf C _0^{-1} \cdot (\boldsymbol{\Phi }_{\rm O}-\mathbf f _0), \end{aligned}$ (B.6)

where $Δ \equiv (θ_{0} - {\hat{θ}}_{ω})$ $\boldsymbol{\Delta}\equiv \left(\boldsymbol{\theta}_0 - \boldsymbol{\hat{\theta}}_{\boldsymbol{\omega}}\right)$ , giving Equation (21).

Appendix C: Consistency checks

C.1. Unbiasedness of the posterior initial power spectrum

To consistently assess the unbiasedness of the method, we sampled ten different ground truth set of cosmological parameters from the prior, computed the corresponding observed vector $Φ_{O}$ $\boldsymbol\Phi_{\mathrm{O}}$ for each case, and confirmed both visually and quantitatively that the resulting SELFI posterior remained consistently unbiased. This demonstrates the robustness of the method with respect to the value of the ground truth cosmological parameters.

C.2. Inference using a wiggle-less prior

As a consistency check, we verified that SELFI successfully retrieves the BAOs within the set-up described in this article, even when the prior does not provide any information about the oscillations. We postulate a Gaussian prior given by

$\begin{matrix} - 2 log P (θ) \equiv log | 2 π S | + | | θ - θ_{0} | |_{S^{- 1}}^{2}, \end{matrix}$ $\begin{aligned} -2\log \mathcal{P} (\boldsymbol{\theta }) \equiv \log \left| 2\pi \mathbf S \right| + \left||\boldsymbol{\theta }-\boldsymbol{\theta }_0\right||^2_\mathbf{S ^{-1}}, \end{aligned}$ (C.1)

where the expansion point θ₀ and the covariance matrix S differ from those of Equation (7). Namely, we set the mean to θ₀ ≡ 1_ℝ^S, corresponding to the BBKS power spectrum for the fiducial cosmological parameters, and, following the procedure described by Leclercq et al. (2019), we define the prior covariance matrix as

$\begin{matrix} S \equiv θ_{norm}^{2} u u^{⊺} ° K, \end{matrix}$ $\begin{aligned} \mathbf S \equiv \theta _{\rm norm}^2 \, \mathbf u \mathbf u ^\intercal \circ \mathbf K , \end{aligned}$ (C.2)

where ° denotes the Hadamard product and K is a radial basis function defined by

$\begin{matrix} {(K)}_{s s^{'}} \equiv \exp [- \frac{1}{2} {(\frac{k_{s} - k_{s^{'}}}{k_{corr}})}^{2}], \end{matrix}$ $\begin{aligned} \left(\mathbf K \right)_{ss^{\prime }} \equiv \mathrm{exp} \left[ -\frac{1}{2} \left(\frac{k_s - k_{s^{\prime }}}{k_{\rm corr}}\right)^2 \right], \end{aligned}$ (C.3)

and u is a correction accounting for cosmic variance,

$\begin{matrix} {(u)}_{s} \equiv 1 + σ_{s} = 1 + \frac{α_{cv}}{k_{s}^{3 / 2}} . \end{matrix}$ $\begin{aligned} (\mathbf u )_s \equiv 1 + \sigma _s = 1+\frac{\alpha _{\rm cv}}{k_s^{3/2}}. \end{aligned}$ (C.4)

k_corr determines the length scale on which power spectrum amplitudes of different wave number correlate with each other. α_cv characterises the “strength” of cosmic variance given the effective volume considered and θ_norm² captures the overall amplitude of the covariance matrix S. The strength of cosmic variance within our simulation volume shall satisfy $α_{cv} = \sqrt{k^{3} / \tilde{N_{k}}}$ $\alpha_{\mathrm{cv}} = \sqrt{k^3/\tilde{N_k}}$ at all scales k, where $\tilde{N_{k}}$ $\tilde{N_k}$ is the effective number of modes at scale k, accounting for the window function. As in Leclercq et al. (2019), the hyperparameters k_corr and θ_norm are optimised to reproduce the shape of a fiducial wiggle vector θ_fid, using the effective posterior distribution obtained in Section 2.2.2 as the likelihood; that is,

$\begin{matrix} - 2 log P (k_{corr}, θ_{norm} | θ) \equiv log | 2 π Γ | + | | θ_{fid} - γ | |_{Γ^{- 1}}, \end{matrix}$ $\begin{aligned} -2\log \mathcal{P} (k_{\rm corr},\theta _{\rm norm}\vert \boldsymbol{\theta }) \equiv \log \left| 2\pi \boldsymbol{\Gamma } \right| + \left||\boldsymbol{\theta }_{\rm fid}-\boldsymbol{\gamma }\right||_{\boldsymbol{\Gamma }^{-1}}, \end{aligned}$ (C.5)

where γ and Γ inherit their dependence on k_corr and θ_norm through S. We use Gaussian hyperpriors on (k_corr, θ_norm), namely k_corr ∼ 𝒢(0.020, 0.015²) [h Mpc⁻¹] and θ_norm ∼ 𝒢(0.2, 0.3²). We estimated the maximum a posteriori using L-BFGS (Liu & Nocedal 1989) and found $k_{corr} = 0.012458 h {Mpc}^{- 1}$ $k_{\mathrm{corr}} = 0.012458\,h\mathrm{Mpc}^{-1}$ and θ_norm = 0.034743, values that were used to obtain the prior on θ corresponding to the shaded yellow area in Figure C.1.

Fig. C.1.

Wiggle-less prior and corresponding SELFI posterior on the initial matter power spectrum θ. The prior mean θ₀, also used as expansion point, is indicated by the yellow line; its 2σ credible region corresponds to the yellow-shaded area. The posterior mean γ and 2σ credible region correspond to the green line and shaded area. The ground truth θ_gt is indicated by the solid blue line. The dashed red line denotes the Nyquist frequency. Even though the prior contains no information about the BAOs, the SELFI posterior is consistently unbiased across all scales and accurately retrieves the oscillations up to k ≃ 0.2 h/Mpc.

The posterior obtained with the wiggle-less prior is in green in Figure C.1. It is consistent with the ground truth. Notably, it accurately retrieves the oscillations up to k ≃ 0.2 h/Mpc, even though the prior is completely agnostic to the BAOs, except through its hyperparameter optimisation.

Appendix D: Impact of linear galaxy bias parameters

The linear galaxy bias parameters have a quadratic scale-independent impact on the galaxy overdensity power spectrum. We show qualitatively that this is expected to hold true for the SELFI posterior, to some extent. Indeed, neglecting the term ΓS⁻¹Δ, which we measured to be smaller than 10⁻³ at all scales in our set-up, the SELFI filter equation (20) for the mean of the posterior on the matter power spectrum can be rewritten:

$\begin{matrix} γ = θ_{0} + Γ {(\nabla_{θ} f_{0})}^{⊺} C_{0}^{- 1} Φ_{O} - Γ {(\nabla_{θ} f_{0})}^{⊺} C_{0}^{- 1} f_{0} . \end{matrix}$ $\begin{aligned} \boldsymbol{\gamma } = \boldsymbol{\theta }_0 + \boldsymbol{\Gamma } \, (\boldsymbol{\nabla }_{\theta } \mathbf f _0)^\intercal \, \mathbf C _0^{-1}\boldsymbol{\Phi }_{\rm O}- \boldsymbol{\Gamma } \, (\boldsymbol{\nabla }_{\theta } \mathbf f _0)^\intercal \, \mathbf C _0^{-1}\mathbf f _0. \end{aligned}$ (D.1)

If we consider for simplicity the case where the overall noise level is σ = 0, for a fixed phase, P^g_i(k_r) becomes a deterministic function of b_{g_i} given by

$\begin{matrix} P^{g_{i}} (k_{r}) & \propto b_{g_{i}}^{2} \sum_{| k | = k_{r}} {| \hat{W_{i} δ^{z_{i}}} (k) |}^{2}, \end{matrix}$ $\begin{aligned} P^{\mathrm{g}_i}(k_r)&\propto b_{\mathrm{g}_i}^2\sum _{|\mathbf k |=k_r} \left| \widehat\mathbf{W _i\boldsymbol{\delta }^{z_i}}(k) \right|^2, \end{aligned}$

where $\hat{h}$ $\widehat{\mathbf{h}}$ denotes the discrete Fourier transform of h. Hence, in the absence of noise, a constant relative error on all the three squared linear galaxy biases has a scale-independent impact on any simulated summary Φ obtained with the hidden-box simulator. By linearity of the 𝔼 and ∂_θ operators, the relative impacts on f₀ and ∇_θf₀ are scale-independent as well and quadratic in b_{g_i}. Similarly, from Equation (13), C₀⁻¹ scales as b_{g_i}⁻⁴, so that, from Equation (21), Γ and therefore Γ (∇_θf₀)^⊺ C₀⁻¹f₀ are left unaffected and only the $Γ {(\nabla_{θ} f_{0})}^{⊺} C_{0}^{- 1} Φ_{O}$ $\boldsymbol{\Gamma} \, (\boldsymbol{\nabla}_{\theta} \mathbf{f}_0)^\intercal \, \mathbf{C}_0^{-1}\boldsymbol{\Phi}_{\mathrm{O}}$ term in Equation (D.1) is impacted by misspecified galaxy biases. If the misspecified biases are such that the relative error (b_i^mis)²/b_{g_i}² = ϵ is independent of the population g_i, then every component of the vector ${(\nabla_{θ} f_{0})}^{⊺} C_{0}^{- 1} Φ_{O}$ $(\boldsymbol{\nabla}_{\theta} \mathbf{f}_0)^\intercal \, \mathbf{C}_0^{-1}\boldsymbol{\Phi}_{\mathrm{O}}$ is affected by a factor ϵ, and the overall effect inherits its scale-dependence solely through the covariance matrix Γ. This is the behaviour observed on Figure 9a, where, for a misspecified bias, all the scales undergo about the same shift in number of standard deviations.

Appendix E: Implicit likelihood inference of cosmological parameters

E.1. Numerical set-up

For the ILI of cosmological parameters, we used the variant of ABC-PMC from Simola et al. (2021) implemented in the ELFI package (https://github.com/elfi-dev/elfi, Lintusaari et al. 2018), with minor technical modifications which do not affect the statistical method.

E.2. Population Monte Carlo hyperparameters

The Population Monte Carlo algorithm automatically selects a tolerance sequence based on the procedure described by Simola et al. (2021). It is not completely ϵ-free as one must choose a stopping rule and an initial acceptance threshold for the rejection sampler, the choice of the latter being crucial to prevent the algorithm from getting stuck in local overdense regions of the parameter space. In this study, we set the initial quantile used to determine the corresponding acceptance threshold for the rejection sampling stage to q₀ = 0.2.

All Tables

Table 1.

Main variables, their generic role in the statistical framework, and their physical interpretation in this study.

In the text

Table 2.

Fiducial cosmological parameters ω₀.

In the text

Table 3.

Linear galaxy bias parameters used in this study.

In the text

Table 4.

Values of the selection function parameters.

In the text

Table 5.

Ground truth cosmological parameters.

In the text

All Figures

Fig. 1.

Hierarchical representation of the Bayesian framework employed for diagnosing model misspecification and inferring the target parameters ω. The rounded green boxes denote probability distributions, whilst the purple squares represent deterministic functions. 𝒫(ω) is the prior on ω. 𝒯 is a deterministic function linking ω to θ. 𝒫(Φ|θ) denotes the probabilistic data model that maps the space of latent vectors θ to the survey space. The final layer, $\tilde{C}$ $\widetilde{\mathcal{C}}$ , is a deterministic compression step required for the ILI of the target parameters ω. In this study, ω is the vector of cosmological parameters, 𝒯 is derived from the Boltzmann solver CLASS, θ is the initial matter power spectrum (after recombination), and Φ corresponds to the power spectra of multiple galaxy number count fields.

In the text

Fig. 2.

Detailed representation of the BHM used in this study for the inference of cosmological parameters ω from galaxy surveys. The hidden-box model corresponds to the second dashed orange rectangle and defines the true (unknown) likelihood L(θ) when Φ = Φ_O. The variables are ω (the target cosmological parameters), θ (the initial matter power spectrum), ψ (the nuisance parameters), o (the cosmological parameters, formally treated as nuisance parameters within the hidden-box), d (the full data vector containing the full galaxy count fields), Φ (the summary statistics), and $\tilde{ω}$ $\widetilde{\boldsymbol{\omega}}$ (the compressed summaries). We provide further details about the roles and physical meanings of these variables in Table 1.

In the text

Fig. 3.

Illustration of the galaxy survey data model used in this study. (Upper and middle rows) Slices through a realisation of the dark matter overdensity fields in real space (top) and redshift space (middle). The fields are evolved in time using N-body simulations, starting from a grid of Gaussian initial dark matter density fluctuations with power spectrum θ_gt, up to the redshift indicated above the corresponding panels. (Lower row) Observed galaxy number count fields for the three galaxy populations, each defined on a grid of 512³ cells. Light-cone effects are accounted for by using one effective redshift per galaxy population. Each slice passes through the observer located at the (0, 0, 0) corner of the box.

In the text

Fig. 4.

Radial selection functions modelled as log-normal distributions in redshift (upper x axis) for the three mock galaxy populations. In the figure, the distributions are represented with respect to comoving distances (lower x axis) using the redshift-distance relation. The continuous lines represent the selection functions for the well-specified Model A, while the dotted lines represent the misspecified Model B, both defined by the values provided in Table 4. The small, percent-order difference between Models A and B renders the dotted curves nearly indistinguishable from the solid ones. The vertical dashed lines indicate the means of the log-normal distributions.

In the text

Fig. 5.

Survey mask for the well-specified Model A. We added extinction linearly decreasing from 100% at the galactic plane to 0% at 60° galactic latitude (dash-dotted orchid line), along with 256 holes of approximately 0.1-degree radius (white crosses). The observed octant is delineated by the light orchid triangle. The colour scale represents the completeness function $C (\hat{n})$ $C(\hat{\mathbf{n}})$ , with 0 indicating masked regions and 1 indicating unmasked regions, and any value in between indicating partial completeness. In the misspecified Model B, the same mask is used, except for the linear extinction which extends from 0 to 59° galactic latitude.

In the text

Fig. 6.

Data intervening in the computation of the SELFI posterior. (Left panels) Observed and simulated summary statistics for Model A (well-specified; upper panel) and B (misspecified; lower panel). For each population, the simulated summaries are shown in grey and their means are represented as dotted coloured lines. The shaded areas correspond to ±2σ around their mean. The solid black line corresponds to the observations Φ_O. The binning is indicated by the vertical dashed lines. Right panels: covariance matrices for Models A and B. For each (k, k′) entry, the colour scale represents the covariance between the k-th and k′-th modes. The diagonal blocs of the full covariance matrix correspond to the intra-population covariance; the extra-diagonal blocs correspond to the inter-populations covariances.

In the text

Fig. 7.

Prior and SELFI posteriors for the initial matter power spectrum θ given the observations Φ_O. The ground truth θ_gt is indicated by the dashed blue line. The prior and posterior means, θ₀, γ_A and γ_B, are represented respectively by the yellow, green and red lines, along with their 2σ credible regions (shaded yellow, green, and red areas). The vertical dashed lines indicate the support wave numbers for the initial matter power spectrum representation. The posterior obtained with the well-specified Model A is unbiased over all scales, whilst the posterior obtained with the misspecified Model B exhibits an excess and a lack of power at the largest and smallest scales, respectively.

In the text

	Fig. 8. Histogram of the Mahalanobis distances d_M(θ_n, θ₀\|S) between the prior and 5000 power spectra θ_n ≡ 𝒯(ω_n) sampled from Equation (6). The mean value is indicated by the vertical solid black line. The green and red lines indicate the distances from γ_A and γ_B to the prior, respectively.
In the text

Fig. 9.

Ensemble of SELFI posteriors for the initial matter power spectrum θ conditional on the observed data Φ_O for different observational models, used to diagnose systematic effects. All the posteriors are derived from a single, common set of N-body simulations. They make it possible to disentangle individual systematic contributions by comparing their differential impact across the range of wave numbers spanned by the power spectra. The colour scales indicate the relative error associated with the systematic effects considered in each sub-figure. The solid yellow line denotes the prior mean θ₀ and the ground truth θ_gt is indicated by the dashed blue line. The vertical dashed lines indicate the support wave numbers for the initial matter power spectrum representation. The 2σ credible regions for the prior and the posterior with the well-specified Model A correspond to the shaded yellow and green areas, respectively. The posterior means γ are represented by the coloured continuous lines for varying degrees of misspecification. For clarity, their corresponding credible regions have been omitted. (a) Impact of misspecified linear galaxy biases. (b) Impact of misspecified linear extinction. (c) Impact of misspecified redshifts. (d) Impact of misspecified selection function variances. (e) Impact of misspecified number of holes. (f) Impact of misspecified selection function variances (varying) under misspecified linear extinction (fixed).

In the text

	Fig. 10. Profiles of the well- and misspecified selection functions R_i for the three mock galaxy populations. The continuous lines represent the selection functions used in Model A. The dashed lines are derived by underestimating n_1, 2(z) by 10% whilst maintaining n_tot(z) constant.
In the text

Fig. 11.

Impact of the gravity solver parameters on the SELFI posterior. Panels (a, c, e) display the summary statistics Φ for the three galaxy count fields, comparing the well-specified Model A (dashed lines) with the misspecified model (solid lines). The lower parts of these panels show the relative error between the two models, with shaded regions corresponding to a 1% error margin. Panels (b, d, f) show the corresponding SELFI posteriors for the initial matter power spectrum θ, with the well-specified and misspecified models represented by green and indigo lines, respectively. (a) Example of summary statistics for a single realisation with n_steps = 10 (solid lines) as opposed to n_steps = 20 time steps for Model A (dashed lines) in the gravitational evolution with COLA. Although the direct error remains well below 1% across all scales, the inverse error on the SELFI posterior (b) reaches 2% at the largest scales, corresponding to a deviation greater than 1σ from the well-specified Model A. (c) Summary statistics using only the final snapshot at redshift z₁ = 0.15 for the three mock galaxy populations, thereby neglecting light cone effects (solid lines). The forward error reaches 74% at the largest scales for the most distant galaxy population, and the corresponding SELFI posterior (d) significantly rejects the ground truth across all scales. (e) Summary statistics using a pure 2LPT gravity model instead of N-body simulations to define the galaxy populations (solid lines). The forward error reaches ∼7% at the smallest scale, and the ground truth is significantly rejected by the SELFI posterior (f), which deviates by more than 2σ compared to Model A at the largest scales.

In the text

Fig. 12.

Prior 𝒫(ω) (yellow) and posterior $P (ω | {\tilde{ω}}_{O})$ $\mathcal{P}(\boldsymbol{\omega}|\boldsymbol{\widetilde{\omega}}_{\mathrm{O}})$ distributions of (Ω_m, Ω_b, σ₈, n_S), estimated using Population Monte Carlo based on the Fisher-Rao distance $d_{FR} (\tilde{ω}, {\tilde{ω}}_{O})$ $d_{\mathrm{FR}}(\boldsymbol{\widetilde{\omega}}, \boldsymbol{\widetilde{\omega}}_{\mathrm{O}})$ between simulated and observed summaries. The posterior with the well-specified Model A is represented by the green contours, whilst the red contours show the results with the misspecified Model B, both depicting the 1σ and 2σ credible regions. The percent-level misspecification in Model B introduces a bias exceeding 2σ in the (Ω_m, σ₈) plane. The one-dimensional marginal posterior distributions of the parameters are displayed on the diagonal, following the same colour scheme. The dashed line indicates the ground truth parameters ω_gt.

In the text

	Fig. A.1. Convergence history of the covariance. We used the correlation matrix rather than C₀ to compute the norms.
In the text

Fig. C.1.

Wiggle-less prior and corresponding SELFI posterior on the initial matter power spectrum θ. The prior mean θ₀, also used as expansion point, is indicated by the yellow line; its 2σ credible region corresponds to the yellow-shaded area. The posterior mean γ and 2σ credible region correspond to the green line and shaded area. The ground truth θ_gt is indicated by the solid blue line. The dashed red line denotes the Nyquist frequency. Even though the prior contains no information about the BAOs, the SELFI posterior is consistently unbiased across all scales and accurately retrieves the oscillations up to k ≃ 0.2 h/Mpc.

In the text

Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.

Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.

Initial download of the metrics may take a while.

[1] Addison, G. E., Bennett, C. L., Jeong, D., Komatsu, E., & Weiland, J. L. 2019, ApJ, 879, 15 [NASA ADS] [CrossRef] [Google Scholar]

[2] Albrecht, A., Bernstein, G., Cahn, R., et al. 2006, ArXiv e-prints [arXiv:astro-ph/0609591] [Google Scholar]

[3] Alsing, J., & Wandelt, B. 2018, MNRAS, 476, L60 [NASA ADS] [CrossRef] [Google Scholar]

[4] Alsing, J., Heavens, A., Jaffe, A. H., et al. 2016, MNRAS, 455, 4452 [Google Scholar]

[5] Alsing, J., Wandelt, B., & Feeney, S. 2018, MNRAS, 477, 2874 [NASA ADS] [CrossRef] [Google Scholar]

[6] Alsing, J., Charnock, T., Feeney, S., & Wandelt, B. 2019, MNRAS, 488, 4440 [Google Scholar]

[7] Anderson, L., Aubourg, É., Bailey, S., et al. 2014, MNRAS, 441, 24 [Google Scholar]

[8] Andrews, A., Jasche, J., Lavaux, G., & Schmidt, F. 2023, MNRAS, 520, 5746 [Google Scholar]

[9] Arnalte-Mur, P., Vielva, P., Martínez, V. J., et al. 2016, JCAP, 2016, 005 [Google Scholar]

[10] Ayçoberry, E., Ajani, V., Guinot, A., et al. 2023, A&A, 671, A17 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[11] Balkenhol, L., Dutcher, D., Spurio Mancini, A., et al. 2023, Phys. Rev. D, 108, 023510 [Google Scholar]

[12] Bardeen, J. M., Bond, J. R., Kaiser, N., & Szalay, A. S. 1986, ApJ, 304, 15 [Google Scholar]

[13] Barreira, A. 2020, JCAP, 2020, 031 [CrossRef] [Google Scholar]

[14] Barreira, A., Cabass, G., Lozanov, K. D., & Schmidt, F. 2020, JCAP, 2020, 049 [Google Scholar]

[15] Barreira, A., Lazeyras, T., & Schmidt, F. 2021, JCAP, 2021, 029 [CrossRef] [Google Scholar]

[16] Bartlett, D. J., Ho, M., & Wandelt, B. D. 2024, ApJ, 977, L44 [Google Scholar]

[17] Beaumont, M. A., Zhang, W., & Balding, D. J. 2002, Genetics, 162, 2025 [Google Scholar]

[18] Beaumont, M. A., Cornuet, J.-M., Marin, J.-M., & Robert, C. P. 2009, Biometrika, 96, 983 [CrossRef] [Google Scholar]

[19] Berti, A. M., Dawson, K. S., & Dominguez, W. 2023, ApJ, 954, 131 [Google Scholar]

[20] Beutler, F., Blake, C., Colless, M., et al. 2012, MNRAS, 423, 3430 [NASA ADS] [CrossRef] [Google Scholar]

[21] Beyond-2pt Collaboration 2024, ArXiv e-prints [arXiv:2405.02252] [Google Scholar]

[22] Blas, D., Lesgourgues, J., & Tram, T. 2011, JCAP, 2011, 034 [Google Scholar]

[23] Blumenthal, G. R., Faber, S. M., Primack, J. R., & Rees, M. J. 1984, Nature, 311, 517 [Google Scholar]

[24] Bouchet, F. R., Colombi, S., Hivon, E., & Juszkiewicz, R. 1995, A&A, 296, 575 [NASA ADS] [Google Scholar]

[25] Boulanger, F., Abergel, A., Bernard, J. P., et al. 1996, A&A, 312, 256 [Google Scholar]

[26] Bovy, J., Rix, H.-W., Green, G. M., Schlafly, E. F., & Finkbeiner, D. P. 2016, ApJ, 818, 130 [Google Scholar]

[27] Brehmer, J., Louppe, G., Pavez, J., & Cranmer, K. 2018, ArXiv e-prints [arXiv:1805.12244] [Google Scholar]

[28] Brout, D., & Riess, A. 2023, ArXiv e-prints [arXiv:2311.08253] [Google Scholar]

[29] Campagne, J.-E., Lanusse, F., Zuntz, J., et al. 2023, Open J. Astrophys., 6, 15 [Google Scholar]

[30] Carreres, B., Bautista, J. E., Feinstein, F., et al. 2023, A&A, 674, A197 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[31] Charnock, T., Lavaux, G., & Wandelt, B. D. 2018, Phys. Rev. D, 97, 083004 [NASA ADS] [CrossRef] [Google Scholar]

[32] Colless, M., Dalton, G., Maddox, S., et al. 2001, MNRAS, 328, 1039 [Google Scholar]

[33] Corasaniti, P. S. 2006, MNRAS, 372, 191 [Google Scholar]

[34] Cranmer, K., Brehmer, J., & Louppe, G. 2020, Proc. Natl. Acad. Sci., 117, 30055 [Google Scholar]

[35] Davis, T. M., Hui, L., Frieman, J. A., et al. 2011, ApJ, 741, 67 [NASA ADS] [CrossRef] [Google Scholar]

[36] DESI Collaboration 2016, ArXiv e-prints [arXiv:1611.00036] [Google Scholar]

[37] Desjacques, V., Jeong, D., & Schmidt, F. 2018, Phys. Rep., 733, 1 [Google Scholar]

[38] Ding, S., Lavaux, G., & Jasche, J. 2024, A&A, 690, A236 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[39] Doeser, L., Jamieson, D., Stopyra, S., et al. 2024, MNRAS, 535, 1258 [Google Scholar]

[40] Duane, S., Kennedy, A. D., Pendleton, B. J., & Roweth, D. 1987, Phys. Lett. B, 195, 216 [CrossRef] [Google Scholar]

[41] Eisenstein, D. 2015, APS Meeting Abstracts, 2015, Z2.001 [Google Scholar]

[42] Eisenstein, D. J., & Hu, W. 1998, ApJ, 496, 605 [Google Scholar]

[43] Eisenstein, D. J., Annis, J., Gunn, J. E., et al. 2001, AJ, 122, 2267 [Google Scholar]

[44] Eisenstein, D. J., Zehavi, I., Hogg, D. W., et al. 2005, ApJ, 633, 560 [Google Scholar]

[45] Euclid Collaboration (Blanchard, A., et al.) 2020, A&A, 642, A191 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[46] Euclid Collaboration (Mellier, Y., et al.) 2025, A&A, 697, A1 [Google Scholar]

[47] Fazolo, R. E., Amendola, L., & Velten, H. 2022, Phys. Rev. D, 105, 103521P [Google Scholar]

[48] Frazier, D. T., Robert, C. P., & Rousseau, J. 2017, ArXiv e-prints [arXiv:1708.01974] [Google Scholar]

[49] Galametz, A., Saglia, R., Paltani, S., Apostolakos, N., & Dubath, P. 2017, A&A, 598, A20 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[50] Gavazzi, G., & Jaffe, W. 1986, ApJ, 310, 53 [Google Scholar]

[51] Gil-Marıin, H., Noreña, J., Verde, L., et al. 2015, MNRAS, 451, 539 [Google Scholar]

[52] Glanville, A., Howlett, C., & Davis, T. M. 2021, MNRAS, 503, 3510 [NASA ADS] [CrossRef] [Google Scholar]

[53] Goldstein, S., Park, M., Raveri, M., Jain, B., & Samushia, L. 2023, Phys. Rev. D, 107, 063530 [Google Scholar]

[54] Hahn, O., List, F., & Porqueres, N. 2024, JCAP, 2024, 063 [Google Scholar]

[55] Hartlap, J., Simon, P., & Schneider, P. 2007, A&A, 464, 399 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[56] Heavens, A. F., Jimenez, R., & Lahav, O. 2000, MNRAS, 317, 965 [NASA ADS] [CrossRef] [Google Scholar]

[57] Ho, S., Agarwal, N., Myers, A. D., et al. 2015, JCAP, 2015, 040 [Google Scholar]

[58] Ho, M., Bartlett, D. J., Chartier, N., et al. 2024, Open J. Astrophys., 7, 54 [NASA ADS] [CrossRef] [Google Scholar]

[59] Hockney, R. W., & Eastwood, J. W. 1988, Computer Simulation Using Particles (Taylor& Francis Group) [Google Scholar]

[60] Howlett, C., Ross, A. J., Samushia, L., Percival, W. J., & Manera, M. 2015, MNRAS, 449, 848 [NASA ADS] [CrossRef] [Google Scholar]

[61] Huterer, D., Cunha, C. E., & Fang, W. 2013, MNRAS, 432, 2945 [Google Scholar]

[62] Ishida, E. E., Vitenti, S. D., Penna-Lima, M., et al. 2015, Astron. Comput., 13, 1 [Google Scholar]

[63] Ivezić, Ž., Kahn, S. M., Tyson, J. A., et al. 2019, ApJ, 873, 111 [Google Scholar]

[64] Jasche, J., & Lavaux, G. 2017, A&A, 606, A37 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[65] Jasche, J., & Lavaux, G. 2019, A&A, 625, A64 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[66] Jasche, J., Kitaura, F. S., Wandelt, B. D., & Enßlin, T. A. 2010, MNRAS, 406, 60 [NASA ADS] [CrossRef] [Google Scholar]

[67] Jasche, J., & Wandelt, B. D. 2013, MNRAS, 432, 894 [Google Scholar]

[68] Kaiser, N. 1984, ApJ, 284, L9 [NASA ADS] [CrossRef] [Google Scholar]

[69] Karchev, K., Grayling, M., Boyd, B. M., et al. 2024, MNRAS, 530, 3881 [Google Scholar]

[70] Kim, A. G., Linder, E. V., Miquel, R., & Mostek, N. 2004, MNRAS, 347, 909 [Google Scholar]

[71] Kitching, T. D., Taylor, A. N., & Heavens, A. F. 2008, MNRAS, 389, 173 [Google Scholar]

[72] Kitching, T. D., Verde, L., Heavens, A. F., & Jimenez, R. 2016, MNRAS, 459, 971 [Google Scholar]

[73] Kostić, A., Nguyen, N.-M., Schmidt, F., & Reinecke, M. 2023, JCAP, 2023, 063 [Google Scholar]

[74] Kullback, S., & Leibler, R. A. 1951, Ann. Math. Stat., 22, 79 [CrossRef] [Google Scholar]

[75] Laureijs, R., Amiaux, J., Arduini, S., et al. 2011, ArXiv e-prints [arXiv:1110.3193] [Google Scholar]

[76] Lanzieri, D., Zeghal, J., Makinen, T. L., et al. 2024, ArXiv e-prints [arXiv:2407.10877] [Google Scholar]

[77] Leclercq, F. 2018, Phys. Rev. D, 98, 063511 [NASA ADS] [CrossRef] [Google Scholar]

[78] Leclercq, F. 2022, Phys. Sci. Forum, 5, 4 [Google Scholar]

[79] Leclercq, F., Enzi, W., Jasche, J., & Heavens, A. 2019, MNRAS, 490, 4237 [NASA ADS] [CrossRef] [Google Scholar]

[80] Leclercq, F., Jasche, J., & Wandelt, B. 2015, JCAP, 2015, 015 [CrossRef] [Google Scholar]

[81] Leistedt, B., & Peiris, H. V. 2014, MNRAS, 444, 2 [Google Scholar]

[82] Lewis, A., & Challinor, A. 2011, Astrophysics Source Code Library [record ascl:1102.026] [Google Scholar]

[83] Lintusaari, J., Gutmann, M. U., Dutta, R., Kaski, S., & Corander, J. 2017, Syst. Biol., 66, e66 [Google Scholar]

[84] Lintusaari, J., Vuollekoski, H., Kangasrääsiö, A., et al. 2018, J. Mach. Learn. Res., 19, 1 [Google Scholar]

[85] Liu, D. C., & Nocedal, J. 1989, Math. Progr., 45, 503 [CrossRef] [Google Scholar]

[86] Loureiro, A., Whiteway, L., Sellentin, E., et al. 2023, Open J. Astrophys., 6, 6 [CrossRef] [Google Scholar]

[87] LSST Dark Energy Science Collaboration 2012, ArXiv e-prints [arXiv:1211.0310] [Google Scholar]

[88] Makinen, T. L., Charnock, T., Alsing, J., & Wandelt, B. D. 2021, JCAP, 2021, 049 [CrossRef] [Google Scholar]

[89] Makinen, T. L., Heavens, A., Porqueres, N., et al. 2024, ArXiv e-prints [arXiv:2407.18909] [Google Scholar]

[90] Massara, E., Ho, S., Hirata, C. M., et al. 2021, MNRAS, 508, 4193 [Google Scholar]

[91] Meiksin, A., White, M., & Peacock, J. A. 1999, MNRAS, 304, 851 [NASA ADS] [CrossRef] [Google Scholar]

[92] Ménard, B., Nestor, D., Turnshek, D., et al. 2008, MNRAS, 385, 1053 [Google Scholar]

[93] Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., & Teller, E. 1953, J. Chem. Phys., 21, 1087 [Google Scholar]

[94] Mishra-Sharma, S., Alonso, D., & Dunkley, J. 2018, Phys. Rev. D, 97, 123544 [Google Scholar]

[95] Modi, C., Lanusse, F., & Seljak, U. 2021, Astron. Comput., 37, 100505 [NASA ADS] [CrossRef] [Google Scholar]

[96] Mootoovaloo, A., Jaffe, A. H., Heavens, A. F., & Leclercq, F. 2022, Astron. Comput., 38, 100508 [NASA ADS] [CrossRef] [Google Scholar]

[97] More, S., Bovy, J., & Hogg, D. W. 2009, ApJ, 696, 1727 [Google Scholar]

[98] Moutarde, F., Alimi, J. M., Bouchet, F. R., Pellat, R., & Ramani, A. 1991, ApJ, 382, 377 [Google Scholar]

[99] Neumann, V. 1951, Notes by GE Forsythe, 36 [Google Scholar]

[100] Nguyen, N.-M., Schmidt, F., Lavaux, G., & Jasche, J. 2021, JCAP, 2021, 058 [Google Scholar]

[101] Nguyen, N.-M., Schmidt, F., Tucci, B., Reinecke, M., & Kostić, A. 2024, Phys. Rev. Lett., 133, 221006 [Google Scholar]

[102] Parzen, E. 1962, Ann. Math. Stat., 33, 1065 [Google Scholar]

[103] Peebles, P. J. E. 1980, The Large-scale Structure of the Universe (Princeton: Princeton University Press) [Google Scholar]

[104] Philcox, O. H. E., & Ivanov, M. M. 2022, Phys. Rev. D, 105, 043517 [CrossRef] [Google Scholar]

[105] Planck Collaboration VI. 2020, A&A, 641, A6 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[106] Porqueres, N., Kodi Ramanah, D., Jasche, J., & Lavaux, G. 2019, A&A, 624, A115 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[107] Porqueres, N., Heavens, A., Mortlock, D., & Lavaux, G. 2022, MNRAS, 509, 3194 [Google Scholar]

[108] Prideaux-Ghee, J., Leclercq, F., Lavaux, G., Heavens, A., & Jasche, J. 2023, MNRAS, 518, 4191 [Google Scholar]

[109] Pullen, A. R., Hirata, C. M., Doré, O., & Raccanelli, A. 2016, PASJ, 68, 12 [Google Scholar]

[110] Ramanah, D. K., Lavaux, G., Jasche, J., & Wandelt, B. D. 2019, A&A, 621, A69 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[111] Régaldo-Saint Blancard, B., Hahn, C., Ho, S., et al. 2024, Phys. Rev. D, 109, 083535 [CrossRef] [Google Scholar]

[112] Repp, A., & Szapudi, I. 2020, MNRAS, 498, L125 [NASA ADS] [CrossRef] [Google Scholar]

[113] Rosenblatt, M. 1956, Ann. Math. Stat., 27, 832 [Google Scholar]

[114] Ross, A. J., Percival, W. J., Carnero, A., et al. 2013, MNRAS, 428, 1116 [Google Scholar]

[115] Said, K., Colless, M., Magoulas, C., Lucey, J. R., & Hudson, M. J. 2020, MNRAS, 497, 1275 [NASA ADS] [CrossRef] [Google Scholar]

[116] Salvati, L., Douspis, M., & Aghanim, N. 2020, A&A, 643, A20 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[117] Sellentin, E., Loureiro, A., Whiteway, L., et al. 2023, Open J. Astrophys., 6, 31 [Google Scholar]

[118] Simola, U., Cisewski-Kehe, J., Gutmann, M. U., & Corander, J. 2021, Bayesian Anal., 16, 397 [Google Scholar]

[119] Spurio Mancini, A., Piras, D., Alsing, J., Joachimi, B., & Hobson, M. P. 2022, MNRAS, 511, 1771 [NASA ADS] [CrossRef] [Google Scholar]

[120] Sunnåker, M., Busetto, A. G., Numminen, E., et al. 2013, PLoS Computational Biology, 9, e1002803 [Google Scholar]

[121] Tassev, S., Zaldarriaga, M., & Eisenstein, D. J. 2013, JCAP, 2013, 036 [Google Scholar]

[122] Taylor, P. L., Kitching, T. D., Alsing, J., et al. 2019, Phys. Rev. D, 100, 023519 [NASA ADS] [CrossRef] [Google Scholar]

[123] Tegmark, M., Strauss, M. A., Blanton, M. R., et al. 2004, Phys. Rev. D, 69, 103501 [CrossRef] [Google Scholar]

[124] Thomas, O., Sá-Leão, R., de Lencastre, H., et al. 2020, ArXiv e-prints [arXiv:2002.09377] [Google Scholar]

[125] Tucci, B., & Schmidt, F. 2024, JCAP, 2024, 063 [Google Scholar]

[126] Wandelt, B. D., Larson, D. L., & Lakshminarayanan, A. 2004, Phys. Rev. D, 70, 083511 [Google Scholar]

[127] Wang, H., Mo, H. J., Yang, X., Jing, Y. P., & Lin, W. P. 2014, ApJ, 794, 94 [NASA ADS] [CrossRef] [Google Scholar]

[128] Wang, H., Mo, H. J., Yang, X., et al. 2016, ApJ, 831, 164 [Google Scholar]

[129] Weyant, A., Schafer, C., & Wood-Vasey, W. M. 2013, ApJ, 764, 116 [Google Scholar]

[130] Zeghal, J., Lanzieri, D., Lanusse, F., et al. 2025, A&A, in press, https://doi.org/10.1051/0004-6361/202452410 [Google Scholar]

[131] Zhou, A. J., Li, X., Dodelson, S., & Mandelbaum, R. 2024, Phys. Rev. D, 110, 023539 [Google Scholar]

[132] Zhou, R., Dey, B., Newman, J. A., et al. 2023, AJ, 165, 58 [NASA ADS] [CrossRef] [Google Scholar]