A&A, Volume 657, January 2022
Article Number: L17
Number of page(s): 9
Section: Letters to the Editor
DOI: https://doi.org/10.1051/0004-6361/202141706
Published online: 24 January 2022
Letter to the Editor
Optimal machine-driven acquisition of future cosmological data
¹ Max-Planck-Institut für Astrophysik, Karl-Schwarzschild-Str. 1, 85748 Garching, Germany
e-mail: akostic@mpa-garching.mpg.de
² Ludwig Maximilians University, Geschwister-Scholl-Platz 1, 80539 München, Germany
³ The Oskar Klein Centre for Cosmoparticle Physics, Department of Physics, Stockholm University, AlbaNova, Stockholm 106 91, Sweden
⁴ DARK, Niels Bohr Institute, University of Copenhagen, Jagtvej 128, 2200 Copenhagen, Denmark
⁵ Sorbonne Université, CNRS, UMR 7095, Institut d’Astrophysique de Paris, 98 bis bd Arago, 75014 Paris, France
Received: 2 July 2021 / Accepted: 10 December 2021
We present a set of maps classifying regions of the sky according to their information gain potential, as quantified by the Fisher information. These maps can guide the optimal retrieval of relevant physical information with targeted cosmological searches. Specifically, we calculated the response of observed cosmic structures to perturbative changes in the cosmological model and we charted their respective contributions to the Fisher information. Our physical forward-modeling machinery transcends the limitations of contemporary analyses based on statistical summaries to yield detailed characterizations of individual 3D structures. We demonstrate this advantage using galaxy count data and we showcase the potential of our approach by studying the information gain of the Coma cluster. We find that regions in the vicinity of the filaments and cluster core, where mass accretion ensues from gravitational infall, are the most informative with regard to our physical model of structure formation in the Universe. Hence, collecting data in those regions would be optimal for testing our model predictions. The results presented in this work are the first of their kind to elucidate the inhomogeneous distribution of cosmological information in the Universe. This study paves the way for efficient targeted searches for the fundamental physics of the Universe, where search strategies are progressively refined with new cosmological data sets within an active learning framework.
Key words: galaxies: statistics / cosmology: observations / methods: data analysis / methods: statistical / large-scale structure of Universe
© A. Kostić et al. 2022
Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Open Access funding provided by Max Planck Society.
1. Introduction
One of the most outstanding questions of astrophysical research is where we might look next to find something new in the Universe. This reflects the overall wish to confirm our current physical understanding of the world and to find new evidence that could shift prevailing scientific paradigms. The latest advances in cosmological observations and data analysis now make it possible to perform machine-aided targeted searches for cosmological physics across the Universe. This has become feasible due to the availability of large-scale inferences that are informed by the physics and causality of the cosmic large-scale structures and their evolution with time.
State-of-the-art cosmological surveys gather scientific information from tracers of the cosmic large-scale structures, such as galaxies, via large-area homogeneous scans of the sky. The historical origin of such search strategies is inherently related to the theoretical formulation of cosmology. Without any detailed knowledge about the actual spatial distribution of cosmic matter, theorists have resorted to predicting the mean value of statistical summaries, constituting ensemble averages over different realizations of the unknown cosmic matter distribution. To date, most cosmological analyses have been performed on the basis of two-point statistics (Tegmark et al. 2004; Percival et al. 2007; Porredon et al. 2021), although there are ongoing efforts to go beyond two- or three-point analyses. The initial need to use statistical summaries was twofold. First, calculating detailed matter field realizations by evaluating gravitational structure growth via simulations was costly and complex. Therefore, cosmological perturbation theory provided an efficient alternative to make predictions on the average statistical properties of the cosmic structures (Buchert et al. 1994; Bouchet et al. 1995). Second, little was known about the particular realization of the cosmic matter distribution of our Universe. The best way forward was to test statements that would be true, on average, using the quantitative diagnostics provided by statistical summaries.
This situation has improved dramatically in the era of modern cosmology. The abundance of cosmological observations and the availability of computational resources now make it possible to construct a much more refined picture of the actual spatial configuration of the cosmic structures than characterizations made via statistical summaries. In particular, we have proposed the Bayesian physical forward-modeling framework (Jasche & Wandelt 2013; Lavaux & Jasche 2016; Jasche & Lavaux 2019) as a novel method enabling us to reconstruct, with high fidelity, the full 3D density field underlying the galaxies observed in surveys. This machinery performs a causal inference that is informed by the physics of the cosmic large-scale structures, their initial conditions, and their dynamical evolution. By characterizing the full 3D density field, our inference framework exploits the information on the phase distribution and higher-order statistics in the data, which is inaccessible to traditional two-point analyses. While analyses exploiting the information contained in the phase distribution of the large-scale structures have been proposed for a long time (see, e.g., Chiang & Coles 2000; Byun et al. 2020), this information was not propagated in the same manner as in the aforementioned physical forward-modeling machinery. The inference carried out in previous studies by Jasche & Wandelt (2013), Lavaux & Jasche (2016), and Jasche & Lavaux (2019) therefore transcends the limitations of contemporary analyses based on statistical summaries to yield detailed characterizations of individual 3D structures. Presently, apart from our framework, several research groups are developing the technology to perform full 3D characterizations of the cosmic structures probed by galaxy surveys (Wang et al. 2014; Modi et al. 2018; Kitaura et al. 2021; Porqueres et al. 2020, 2021).
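The statement that two-point statistics cannot see the phase distribution can be illustrated with a minimal NumPy sketch (an illustration of ours, not part of the BORG pipeline). Randomizing the Fourier phases of a toy 3D field while keeping its Fourier amplitudes leaves every |δ_k|², and hence any power-spectrum estimate built from them, unchanged, even though the spatial structures are rearranged. The toy "blob" field and the helper names `randomize_phases` and `power_spectrum_modes` are hypothetical.

```python
import numpy as np


def randomize_phases(field, seed=0):
    """Return a real field with the same Fourier amplitudes as `field` but random phases."""
    rng = np.random.default_rng(seed)
    amplitudes = np.abs(np.fft.fftn(field))
    # Phases taken from the FFT of real white noise are Hermitian-symmetric,
    # so the inverse transform below is real up to round-off.
    noise_k = np.fft.fftn(rng.standard_normal(field.shape))
    phases = noise_k / np.abs(noise_k)
    return np.fft.ifftn(amplitudes * phases).real


def power_spectrum_modes(field):
    """Squared Fourier amplitudes |delta_k|^2, the raw input of any two-point estimator."""
    return np.abs(np.fft.fftn(field)) ** 2


# Toy "structured" field: a few Gaussian blobs on a 32^3 grid (hypothetical data).
rng = np.random.default_rng(1)
n = 32
x = np.indices((n, n, n)).astype(float)
field = np.zeros((n, n, n))
for center in rng.uniform(0, n, size=(5, 3)):
    r2 = sum((x[i] - center[i]) ** 2 for i in range(3))
    field += np.exp(-r2 / 8.0)

shuffled = randomize_phases(field)
print(np.allclose(power_spectrum_modes(field), power_spectrum_modes(shuffled)))  # True
print(np.allclose(field, shuffled))  # False
```

Both fields are indistinguishable to a power-spectrum analysis; only a method that models the field itself, rather than its two-point summary, can tell them apart.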
So far, most of the work in the literature has focused on constraining cosmological physics from existing data sets, such as observations of the cosmic microwave background (CMB) (see, e.g., Planck Collaboration VI 2020, and references therein) and the large-scale structure (e.g., Einasto et al. 2011; Byun et al. 2020; Porredon et al. 2021). Here, we want to go beyond the standard task of constraining models and address the question of how to optimally acquire future data that will be most informative for updating our cosmological knowledge and studying physics. In doing so, we use existing information on cosmic structures to identify regions in the sky that promise the highest discovery potential, as quantified by the Fisher information. Given the advent of new data analysis technologies, we think it is especially timely to start a debate on whether the standard cosmological search strategy should be revised.
We note that some past studies have investigated whether certain regions of the sky are particularly informative with regard to cosmology (e.g., Mukherjee & Wandelt 2018) or whether such information could be used to design better survey geometries in the future (Bassett 2005). The method proposed here goes beyond these studies by using the full forward model of the 3D matter field to quantify the information content, weighted by the posterior of plausible realizations of the large-scale structures in the Universe.
In summary, we would like to address the question of whether a homogeneous scan of the sky is indeed the optimal search strategy for testing fundamental physics with cosmological surveys. We also consider whether some particular regions in the sky are more informative than others with respect to specific research questions and whether certain research questions can be answered faster with small, cheap, and targeted searches. These questions are relevant since, in turn, they bear on the optimal use of scientific resources. For example, two upcoming large-scale cosmological surveys, namely, the Vera Rubin Observatory (Ivezić et al. 2019) and Euclid (Racca et al. 2016), constitute a total financial investment of over a billion dollars. To optimize the scientific returns of these missions, we must therefore ensure that the limited survey resources are adequately managed and optimally utilized.
With the ideas outlined in this work, we wish to trigger a discussion in the cosmological community on whether scientific progress in terms of information gain on fundamental physics can be sped up by using more refined cosmological searches, akin to active learning strategies. We also hope to initiate new technological developments for targeted searches of cosmological physics. To make advances in this direction, we address the question of which parts of the Universe ought to be observed to optimally retrieve information relevant to a particular research focus. Specifically, in this work, we are interested in identifying the regions in the Universe that are most relevant for obtaining new information about the cosmological parameters underlying our standard model of cosmology. Nonetheless, our approach is equally applicable to other research questions of interest.
We present, for the first time, maps detailing the expected information gain on cosmological parameters provided by cosmic structures of the nearby Universe, based on reconstructions of the 2M++ galaxy catalog (Lavaux & Hudson 2011). Our proposed methodology is not limited to the nearby Universe, however, and is applicable to other cosmological data sets as well.
2. Results
In our quest to map out the cosmologically sensitive large-scale structures in the sky, we adopted the Fisher information (Fisher 1925) methodology to quantify how much information the observed large-scale structures carry about the cosmological parameters. Although computing the Fisher information in this context is a highly nontrivial task, it is rendered feasible by employing a physical forward-modeling machinery, such as our BORG algorithm (Jasche & Wandelt 2013; Jasche & Lavaux 2019), which constitutes a causal model of structure formation with a fully nonlinear treatment of dark matter clustering. It connects the cosmic initial conditions to the observed galaxy distribution. A schematic view of the distinct components of the BORG forward model is illustrated in Fig. 1. In our study, we marginalize over the ensemble of plausible realizations of the 3D primordial matter fluctuations of the very early Universe, as inferred by BORG via a hierarchical Bayesian statistical inference framework from the 2M++ galaxy catalog (Lavaux & Hudson 2011), which traces the matter distribution of the nearby Universe. We refer the reader to the appendices for more details on the BORG algorithm, the 2M++ galaxy catalog, the mathematical formalism underlying the Fisher information map, and its numerical implementation. We stress that our approach for deriving the Fisher information map, based on constrained realizations of cosmic structures conditional on galaxy observations, stands in stark contrast to standard Fisher analyses for forecasting cosmological constraints from forthcoming galaxy surveys.
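For orientation, the standard definition of the Fisher matrix is recalled below, together with the form it takes under a simplifying assumption of ours: independent Poisson galaxy counts n(x) in each voxel, with mean λ(x; θ) predicted by the forward model. Whether this coincides with the likelihood behind the formalism in the appendices is not established here. The derivation only uses E[n] = Var[n] = λ, and the per-voxel summand is one natural candidate for a "Fisher information map".

```latex
\begin{aligned}
\mathcal{I}_{ij}(\theta) &= -\,\mathbb{E}\left[\frac{\partial^2 \ln \mathcal{L}(d \mid \theta)}{\partial\theta_i\,\partial\theta_j}\right], \\
\ln \mathcal{L}(d \mid \theta) &= \sum_{\mathbf{x}} \Big[ n(\mathbf{x}) \ln \lambda(\mathbf{x};\theta) - \lambda(\mathbf{x};\theta) - \ln n(\mathbf{x})! \Big], \\
\frac{\partial^2 \ln \mathcal{L}}{\partial\theta_i\,\partial\theta_j}
&= \sum_{\mathbf{x}} \left[ \left(\frac{n}{\lambda} - 1\right) \frac{\partial^2 \lambda}{\partial\theta_i\,\partial\theta_j}
- \frac{n}{\lambda^2}\,\frac{\partial \lambda}{\partial\theta_i}\,\frac{\partial \lambda}{\partial\theta_j} \right], \\
\mathcal{I}_{ij}(\theta) &= \sum_{\mathbf{x}} \frac{1}{\lambda(\mathbf{x};\theta)}\,
\frac{\partial \lambda(\mathbf{x};\theta)}{\partial\theta_i}\,
\frac{\partial \lambda(\mathbf{x};\theta)}{\partial\theta_j}
\qquad \text{(using } \mathbb{E}[n] = \lambda\text{)}.
\end{aligned}
```

The only model-dependent ingredient is the gradient ∂λ/∂θ, which is why the response of the structures to the cosmological parameters is central to the analysis.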
Fig. 1. Schematic of the Bayesian physical forward-modeling framework of BORG, which solves a large-scale Bayesian inverse problem by fitting a dynamical structure formation model to galaxy observations and subsequently inferring the primordial initial conditions (ICs) that lead to the formation of the presently observed cosmic structures via gravitational evolution. The BORG forward-modeling approach naturally marginalizes over unknown galaxy bias and accounts for all relevant physical effects, such as redshift space distortions (RSDs) resulting from the peculiar velocities of galaxies, as well as instrumental selection effects.
The availability of such inferred primordial matter fluctuations enables us to test the causal sensitivity of cosmic structures when they are forward-modeled with perturbed cosmological parameters. To illustrate this possibility, in Fig. 2 we show the response of cosmic structures in the Universe to perturbative changes in six parameters of the so-called concordance ΛCDM cosmological model, namely, the matter density, Ω_{m}, the baryon density, Ω_{b}, the cosmic curvature, Ω_{k}, the Hubble constant, h, the amplitude of matter fluctuations, σ_{8}, and the scalar spectral index, n_{s}, of the primordial power spectrum. This entails computing the gradient of the 3D galaxy field with respect to the cosmological parameters, which quantifies the sensitivity of the galaxy distribution to changes in these parameters. The gradient is therefore described by a 6D vector for each volume element in the 3D grid. We computed the gradient using finite differencing, by executing BORG forward model evaluations on the ensemble of data-constrained realizations of the cosmic initial conditions from the BORG 2M++ analysis (Jasche & Lavaux 2019), while varying the cosmological parameters about their corresponding fiducial values, given by the latest best-fit values from the Planck Collaboration (Planck Collaboration VI 2020).
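The finite-differencing step can be sketched as follows, assuming a generic callable `forward_model(theta, initial_conditions)` that returns a 3D field. The `toy_forward_model` below is a hypothetical, analytically differentiable stand-in (it is not BORG), the fiducial vector only approximates the Planck 2018 best-fit values, and the step sizes are illustrative choices. Central differences have an error of second order in the step; an absolute floor on the step is needed because the fiducial Ω_{k} is zero.

```python
import numpy as np

PARAM_NAMES = ["Omega_m", "Omega_b", "Omega_k", "h", "sigma_8", "n_s"]
# Approximate Planck 2018 values with a flat fiducial; illustrative only.
FIDUCIAL = np.array([0.315, 0.049, 0.0, 0.674, 0.811, 0.965])


def toy_forward_model(theta, ic):
    """Hypothetical stand-in for a structure-formation simulator (NOT BORG).

    Mean counts lambda(x) = 5 * exp(sum_p theta_p * w_p(x)), where the fixed
    "response templates" w_p are shifted, rescaled copies of the initial field.
    """
    templates = np.stack([np.roll(ic, p, axis=0) * 0.1 for p in range(6)])
    return 5.0 * np.exp(np.tensordot(theta, templates, axes=1))


def finite_difference_gradient(forward_model, theta0, initial_conditions,
                               rel_step=1e-2, abs_step=1e-3):
    """Central-difference gradient of the output field w.r.t. each parameter.

    Returns shape (n_params, *field_shape): one 6D vector per voxel for six parameters.
    """
    theta0 = np.asarray(theta0, dtype=float)
    grads = []
    for p in range(theta0.size):
        # Relative step with an absolute floor (needed when the fiducial value is 0).
        step = max(rel_step * abs(theta0[p]), abs_step)
        plus, minus = theta0.copy(), theta0.copy()
        plus[p] += step
        minus[p] -= step
        diff = forward_model(plus, initial_conditions) - forward_model(minus, initial_conditions)
        grads.append(diff / (2.0 * step))
    return np.stack(grads)


rng = np.random.default_rng(0)
ic = rng.standard_normal((16, 16, 16))      # one toy "realization" of initial conditions
grad = finite_difference_gradient(toy_forward_model, FIDUCIAL, ic)
print(grad.shape)  # (6, 16, 16, 16)
```

In the setting described in the text, this evaluation would be repeated for every posterior sample of the initial conditions, which is where the computational cost lies.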
Fig. 2. Components of the gradient of the matter density field with respect to the cosmological parameters for a spherical slice of thickness ∼2.65 h^{−1} Mpc at a comoving distance of 100 h^{−1} Mpc from the observer, with the corresponding galaxies from the 2M++ catalog denoted via black dots. Visually, we find that the baryon density Ω_{b} has the largest influence on the spatial distribution of the cosmic structures, with the regions surrounding the filamentary galaxy distribution being particularly sensitive to changes in the baryon density. Conversely, the least significant response emanates from the cosmic curvature Ω_{k}, with only the vicinity of the dense galaxy clusters reacting to changes in the geometry of the Universe. 
In Fig. 2, we visualize the individual gradient components for the six cosmological parameters by representing them as “cosmological sensitivity maps”. To this end, we consider a spherical slice of thickness ∼2.65 h^{−1} Mpc at a comoving distance of 100 h^{−1} Mpc from the observer, projected onto a HEALPix map (Górski et al. 2005). The observed galaxies from the 2M++ catalog lying in the projected spherical slice are also indicated. From a visual comparison, we find that the density of baryonic matter, as characterized by the Ω_{b} parameter, induces the most significant response in the observed distribution of cosmic structures, with the Ω_{m} and σ_{8} parameters also having a notable influence. The regions surrounding the filamentary patterns traced by the galaxies respond strongly to changes in these parameters. In contrast, the cosmic matter distribution displays minimal sensitivity to the Ω_{k} parameter, with the most noticeable effect of the cosmic curvature on the large-scale structures appearing in the densest regions, such as galaxy clusters. The underdense regions of the matter distribution, the so-called cosmic voids, are most impacted by changes in the Ω_{b} and σ_{8} parameters. We clarify that our structure formation model does not account for baryonic physics, so that Ω_{b} only modifies the amplitude of growth of matter perturbations and the shape of the linear power spectrum. Nevertheless, it is straightforward to repeat our analysis with hydrodynamical simulations to properly account for the presence of baryons. Given the significantly higher computational cost of such simulations, one possible option would be to use physics emulators, which are now being developed on a large scale (Villaescusa-Navarro et al. 2021).
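Projecting a spherical shell of a gridded 3D field onto the sky can be sketched with NumPy alone. As a labeled simplification, the sketch bins voxels on a regular (θ, φ) grid rather than on HEALPix pixels (HEALPix uses equal-area pixels and would normally be handled with a dedicated library). The function name `shell_sky_map`, the observer placed at the box center, and the grid sizes are assumptions made for illustration.

```python
import numpy as np


def shell_sky_map(field, box_size, r_shell, thickness, n_theta=18, n_phi=36):
    """Average a cubic 3D field over a spherical shell around the box center.

    NOTE: a regular (theta, phi) grid is used as a dependency-free stand-in
    for HEALPix pixelization; sky pixels with no voxel in the shell are NaN.
    """
    n = field.shape[0]
    cell = box_size / n
    # Comoving coordinates of voxel centers relative to an observer at the box center.
    coords = (np.indices(field.shape) + 0.5) * cell - 0.5 * box_size
    r = np.sqrt((coords ** 2).sum(axis=0))
    in_shell = np.abs(r - r_shell) <= 0.5 * thickness
    x, y, z = (c[in_shell] for c in coords)
    theta = np.arccos(np.clip(z / r[in_shell], -1.0, 1.0))  # polar angle in [0, pi]
    phi = np.mod(np.arctan2(y, x), 2.0 * np.pi)             # azimuth in [0, 2 pi)
    bins = [np.linspace(0.0, np.pi, n_theta + 1), np.linspace(0.0, 2.0 * np.pi, n_phi + 1)]
    total, _, _ = np.histogram2d(theta, phi, bins=bins, weights=field[in_shell])
    counts, _, _ = np.histogram2d(theta, phi, bins=bins)
    with np.errstate(invalid="ignore", divide="ignore"):
        return total / counts


# Usage: a toy field equal to 1 in the upper half of the box (z > 0) and 0 below.
field = np.zeros((64, 64, 64))
field[:, :, 32:] = 1.0
sky = shell_sky_map(field, box_size=256.0, r_shell=100.0, thickness=8.0)
print(sky.shape)  # (18, 36)
```

With 18 polar bins of 10 degrees each, the first nine rows of `sky` cover the northern hemisphere (value 1) and the remaining rows the southern one (value 0).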
In addition to displaying the relative strength of the causal response of the cosmic structures to changes in our underlying cosmological model, the array of sensitivity maps in Fig. 2 reveals some interesting features. The σ_{8} map, for example, illustrates how the clustering of matter occurs at the expense of the neighbouring regions, while the h map shows that a modified Hubble flow yields different clustering features. Moreover, the Ω_{m} and Ω_{b} maps display distinct anticorrelated signatures. This may be attributed to the fact that the sound horizon scale imprinted by baryon acoustic oscillations (BAOs) depends on both Ω_{m} and Ω_{b} in the same direction, with the effect induced by Ω_{b} being much stronger, such that a fixed BAO scale encoded in the data yields this anticorrelation between the two maps. Similarly, the Ω_{m} and σ_{8} maps depict a striking anticorrelation. This can be naturally explained by the fact that the primary constraining power emanates from the combination fσ_{8} of the growth rate, f, of cosmological perturbations and σ_{8}, where $f \approx \Omega_{m}^{0.55}$ for the ΛCDM model: at fixed fσ_{8}, an increase in Ω_{m} must be compensated by a decrease in σ_{8}.
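The degeneracy between Ω_{m} and σ_{8} can be made concrete with a short worked example. It assumes the common approximation f ≈ Ω_{m}^{0.55} for ΛCDM, evaluated at z ≈ 0 for the nearby Universe, and fixes fσ_{8} at the value implied by illustrative fiducial numbers (Ω_{m} = 0.315, σ_{8} = 0.811). The function name is hypothetical.

```python
import numpy as np

GAMMA = 0.55  # growth index: f ~ Omega_m**GAMMA in LambdaCDM (approximation)


def sigma8_at_fixed_fsigma8(omega_m, fsigma8):
    """sigma_8 that keeps f * sigma_8 constant when f = Omega_m**GAMMA."""
    return fsigma8 / np.power(omega_m, GAMMA)


# Fix f*sigma_8 at the value implied by illustrative fiducial numbers.
fs8 = 0.315 ** GAMMA * 0.811
omega_m = np.array([0.25, 0.30, 0.315, 0.35])
sigma_8 = sigma8_at_fixed_fsigma8(omega_m, fs8)
for om, s8 in zip(omega_m, sigma_8):
    print(f"Omega_m = {om:.3f} -> sigma_8 = {s8:.3f}")
```

Along this curve σ_{8} falls as Ω_{m} rises, which is the sign of the anticorrelation seen between the two sensitivity maps.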
A crucial ingredient in the Fisher information formalism derived in our study, as described by Eq. (C.12), is the computation of the above gradient. The resulting 3D Fisher information field, obtained using Eq. (C.12), can be represented as a “Fisher information map” in the same way as the cosmological sensitivity maps. This map represents the combined information gain on all six cosmological parameters considered in this study. The Fisher information map for a spherical slice of thickness ∼2.65 h^{−1} Mpc at a distance of 100 h^{−1} Mpc from the observer is displayed in Fig. 3, along with the observed galaxy distribution in this slice. The map indicates that the regions with the highest information gain lie in the vicinity of the filamentary distribution traced by galaxies. One plausible interpretation is that the regions of gravitational infall, where matter accumulates, are the most informative with regard to our cosmological model. This is particularly interesting because these infalling regions remain relatively poorly understood and must be properly modeled to make further progress.
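Equation (C.12) itself is not reproduced in this section. As a minimal sketch, assuming independent Poisson galaxy counts with mean λ(x) in each voxel, the per-voxel Fisher matrix is I_ij(x) = ∂_i λ ∂_j λ / λ, which can be assembled directly from the gradient field. Collapsing each 6×6 matrix to one number via its trace is a hypothetical choice made here for illustration; the combined measure used for Fig. 3 may differ.

```python
import numpy as np


def fisher_information_field(mean_counts, gradient):
    """Per-voxel Fisher matrices for independent Poisson counts (assumption).

    mean_counts: array (*grid) of expected counts lambda(x), assumed > 0.
    gradient:    array (n_params, *grid) of d lambda / d theta_p.
    Returns an array of shape (*grid, n_params, n_params).
    """
    inv_lam = 1.0 / mean_counts
    # I_ij(x) = (1 / lambda) * d_i lambda * d_j lambda
    return np.einsum("i...,j...->...ij", gradient, gradient) * inv_lam[..., None, None]


def fisher_scalar_map(fisher_field):
    """Collapse each voxel's matrix to a scalar via its trace (hypothetical summary)."""
    return np.trace(fisher_field, axis1=-2, axis2=-1)


# Usage with hypothetical inputs on a small grid.
rng = np.random.default_rng(0)
lam = rng.uniform(0.5, 5.0, size=(16, 16, 16))   # expected counts per voxel
grad = rng.normal(size=(6, 16, 16, 16))          # d lambda / d theta_p for six parameters
fisher_field = fisher_information_field(lam, grad)
fisher_map = fisher_scalar_map(fisher_field)     # one number per voxel
total_fisher = fisher_field.sum(axis=(0, 1, 2))  # 6x6 matrix for the whole volume
```

The trace depends on how the parameters are scaled (Ω_{b} and h have very different natural ranges), so a trace after normalizing each parameter by a prior width may be preferable. Each per-voxel matrix is rank one, so a determinant-based summary only becomes informative after summing over a region.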
Fig. 3. Fisher information map for the same spherical slice as in Fig. 2. The observed galaxy distribution from the 2M++ catalog, lying in the corresponding spherical shell centered around the observer, is represented by the red dots. We find that the regions in the vicinity of the massive cosmic structures, as traced by the galaxy distribution, are the most informative according to the Fisher information map. These regions correspond to the regime of gravitational infall of the galaxy clusters. 
The BORG analysis of the 2M++ galaxy catalog demonstrated the capacity of the physical forward-modeling machinery to resolve the key features of prominent cosmic structures in the present Universe (Jasche & Lavaux 2019). The inferred mass profile of the Coma cluster, in particular, was found to be in remarkable agreement with state-of-the-art weak lensing measurements. It is therefore also interesting to study where the cosmological information originates for this well-known cluster. Figure 4 displays the Fisher information map of the Coma cluster and its corresponding mass density. We note that the figure depicts the central slice of thickness ∼5 h^{−1} Mpc through a 3D patch extending over 40 h^{−1} Mpc. The features present in the Fisher map of the Coma cluster are consistent with those of the sky-projected Fisher map in Fig. 3, supporting the interpretation that the regions surrounding the filaments, namely the gravitationally infalling regions of the cluster, provide the most significant information gain. Recent studies relying on conceptually distinct methodologies have also indicated that accretion filaments and the surroundings of voids are highly sensitive to the predictions of dark energy (Leclercq et al. 2016) and gravity (Lam et al. 2012) models. In contrast, the central core of the cluster provides a relatively lower information gain, as quantified by the Fisher information, and the regions devoid of matter carry insubstantial information. A key point worth stressing here is that our relatively simple physics-informed algorithm was capable of pinpointing potential regions of cosmological interest in the sky.
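The "core versus periphery" comparison can be quantified with an azimuthally averaged profile of a 2D Fisher map slice around the cluster center. The sketch below uses a synthetic map sized like the 40 h^{−1} Mpc patch of Fig. 4 with a toy ring of high values 9 h^{−1} Mpc from the center; this map, the pixel size, and the helper name `radial_profile` are hypothetical and are not the actual Coma results.

```python
import numpy as np


def radial_profile(slice_2d, center, cell_size, n_bins=10, r_max=None):
    """Mean of a 2D map in concentric annuli around `center` (pixel units, (x, y))."""
    ny, nx = slice_2d.shape
    yy, xx = np.indices((ny, nx))
    r = np.hypot(xx - center[0], yy - center[1]) * cell_size
    if r_max is None:
        r_max = r.max()
    edges = np.linspace(0.0, r_max, n_bins + 1)
    which = np.digitize(r.ravel(), edges) - 1   # annulus index of each pixel
    values = slice_2d.ravel()
    profile = np.array([values[which == b].mean() if np.any(which == b) else np.nan
                        for b in range(n_bins)])
    return 0.5 * (edges[1:] + edges[:-1]), profile


# Synthetic 40 h^-1 Mpc patch: 81 x 81 pixels of 0.5 h^-1 Mpc, center pixel (40, 40).
n, cell = 81, 0.5
yy, xx = np.indices((n, n))
r_toy = np.hypot(xx - 40, yy - 40) * cell
toy_map = np.exp(-((r_toy - 9.0) / 2.0) ** 2)   # toy "infall ring" of high Fisher values
centers, profile = radial_profile(toy_map, center=(40, 40), cell_size=cell,
                                  n_bins=10, r_max=20.0)
print(centers[np.argmax(profile)])  # 9.0
```

Applied to a real Fisher slice, a profile peaking away from the center would express, in one curve, the statement that the core carries less information than the surrounding infall regions.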
Fig. 4. Fisher information map of the Coma cluster, with its corresponding mass density overlaid as a contour, for the central slice of thickness ∼5 h^{−1} Mpc through a 3D region centered on the cluster that extends over 40 h^{−1} Mpc. According to the Fisher information map, the central region containing the core of the cluster encodes fairly limited information gain, whilst the peripheral regions of the filaments and cluster core, where mass accretion occurs via gravitational infall, constitute the greatest proportion of information gain. 
The maps displayed in Figs. 3 and 4 can now be used as a guide for where to look in order to optimally collect data for testing our physical model of structure formation. In the particular case demonstrated here, the data consist of galaxy counts, and the corresponding information gain predictions are geared toward galaxy clustering data. The idea behind the targeted search approach is to iteratively search for galaxies in the high information gain regions, as quantified by the Fisher map, then to use the newly acquired data to constrain the model parameters, and to repeat the procedure. This is schematically depicted in Fig. 5. The justification for why this search strategy is optimal is elaborated further in Appendix D. This approach can be extended similarly to other observables to construct the corresponding information gain maps.
Fig. 5. Flowchart of the targeted search approach. The core idea is to use the existing knowledge, as quantified by the posterior $p(\theta, \phi \mid d)$, to calculate the Fisher map marginalized over this posterior, $\langle \mathcal{I}(\theta \mid d_0) \rangle_{(\phi \mid \theta, d_0)}$, for the observables of interest. This Fisher map then provides the regions of the sky with the highest information gain potential, where acquiring new data is optimal for testing our model predictions ($d_0 \rightarrow d$). These data are in turn used to update our knowledge about the model through the posterior, and the procedure is repeated until the information content is fully depleted.
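The loop in Fig. 5 can be illustrated with a deliberately small one-parameter toy. All ingredients are hypothetical assumptions: each sky pixel j has Poisson counts with mean λ_j(θ) = B + a_j θ, the posterior is tracked on a grid, the Fisher map is averaged over the current posterior, the most informative unobserved pixels are "observed", and the posterior is updated. The real procedure would instead use the BORG posterior and the six-parameter Fisher map.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical toy sky: mean counts lambda_j(theta) = B + a_j * theta in pixel j.
N_PIX, B = 500, 2.0
a = rng.uniform(0.0, 5.0, size=N_PIX)       # per-pixel sensitivity (toy values)
THETA_TRUE = 0.8                             # "true" parameter used to simulate data
theta_grid = np.linspace(0.01, 2.0, 400)    # grid on which the 1D posterior is tracked


def mean_counts(theta):
    """lambda_j(theta) for all pixels; theta may be a scalar or a column vector."""
    return B + a * theta


def fisher_per_pixel(theta):
    """Poisson Fisher information for one parameter: (d lambda / d theta)^2 / lambda."""
    return a ** 2 / mean_counts(theta)


def normalize(log_post):
    post = np.exp(log_post - log_post.max())
    return post / post.sum()


def targeted_search(n_rounds=5, per_round=20, select="fisher"):
    """Iterate: marginalized Fisher map -> choose pixels -> observe -> update posterior."""
    log_post = np.zeros_like(theta_grid)     # flat prior on the grid
    observed = np.zeros(N_PIX, dtype=bool)
    for _ in range(n_rounds):
        post = normalize(log_post)
        # Fisher map averaged over the current posterior, <I_j(theta)>_post.
        fisher_map = post @ fisher_per_pixel(theta_grid[:, None])
        fisher_map[observed] = -np.inf       # never re-observe a pixel
        if select == "fisher":
            chosen = np.argsort(fisher_map)[-per_round:]
        else:                                # baseline: uniformly random pointings
            chosen = rng.choice(np.flatnonzero(~observed), size=per_round, replace=False)
        observed[chosen] = True
        counts = rng.poisson(mean_counts(THETA_TRUE)[chosen])    # "acquire" new data
        lam = mean_counts(theta_grid[:, None])[:, chosen]        # (n_grid, per_round)
        log_post += (counts * np.log(lam) - lam).sum(axis=1)     # Poisson likelihood update
    return normalize(log_post), observed


for mode in ("fisher", "random"):
    post, observed = targeted_search(select=mode)
    mean = post @ theta_grid
    std = np.sqrt(post @ (theta_grid - mean) ** 2)
    info = fisher_per_pixel(THETA_TRUE)[observed].sum()
    print(f"{mode:>6}: theta = {mean:.3f} +/- {std:.3f}, Fisher info at truth = {info:.1f}")
```

Because B is the same for every pixel in this toy, the Fisher ranking does not depend on θ and the targeted strategy reduces to observing the most sensitive pixels first. In realistic settings the ranking depends on the posterior, which is why the marginalization in Fig. 5 matters.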
3. Discussion
The results presented in this work demonstrate the feasibility of machine-aided targeted searches for cosmological physics signals. Such searches have become feasible through recent developments of physics-informed causal inference frameworks to study the 3D cosmic large-scale structures, their origin, and their evolution over time. We used 3D initial conditions inferred with the BORG algorithm, together with a physics simulator of cosmological structure formation, to chart the response of observed structures in the Universe and their corresponding information gain with respect to cosmological parameters. Our results are the first of their kind and elucidate the inhomogeneous distribution of cosmological information in the Universe. This study paves the way for efficient targeted searches for the fundamental physics of the Universe, where search strategies are progressively refined with new cosmological data sets within an active learning framework.
We have further illuminated the response of the cosmic large-scale structures to individual cosmological parameters. These results suggest that different features of the cosmic structures respond differently to perturbations in the physics determined by the cosmological parameters. For instance, we find that the vicinity of the filamentary cosmic structures, corresponding to gravitationally infalling regions, is highly sensitive to changes in the baryon density, whereas the cosmic curvature impacts only the surroundings of the dense galaxy clusters. Cosmic voids are primarily affected by the amplitude of matter fluctuations and the baryon density. Our results demonstrate the value of going beyond state-of-the-art analyses of the cosmic large-scale structures that are limited to summary statistics and ignore this richness of the 3D cosmic structures. Even though we consider one particular observable in this study, namely galaxy counts, our proposed framework can be seamlessly applied to other tracers, such as the Lyman-α forest (Porqueres et al. 2019a, 2020), which would bear complementary information.
Our findings further suggest that optimal targeted searches for specific research questions have become feasible. Given a specific research question, our approach enables us to propose targets for optimal information retrieval. This raises the question of whether traditional survey strategies should be revised and whether significant scientific progress could also be driven by targeted searches with smaller, cheaper, and faster instrumentation. We hope that our contribution will trigger a discussion and new technological advances in the field.
While we studied the influence of cosmological parameters on the Universe in this work, our proof of concept points to potentially far-reaching implications for studies of the physical effects on the cosmic large-scale structures induced by modified gravity (Koyama 2016), dynamical dark energy (Zhao et al. 2017), exotic dark matter models, such as self-interacting (Carlson et al. 1992) and fuzzy dark matter (Hu et al. 2000), or massive neutrinos (Villaescusa-Navarro et al. 2014). Once we have identified the astrophysical region(s) of interest for a particular model based on the Fisher information map, we may compute accurate theoretical predictions for the spatial distribution of matter and luminous tracers in that model via cosmological N-body or hydrodynamical simulations with extremely high resolution, thereby capturing highly detailed physical features. The final step would then entail observing the relevant region(s) and comparing the theoretical predictions with the galaxy observations in a likelihood or posterior analysis. In essence, we are proposing a novel way of doing science that optimizes scarce resources for maximal scientific returns via an efficient observational strategy.
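The final comparison step can be sketched, under simplifying assumptions of ours, as a Poisson log-likelihood ratio between the count predictions of two candidate models, evaluated only inside the targeted region. The model predictions, the region mask (here the top 10% of a placeholder Fisher map), and the function names are hypothetical. A real analysis would also need to handle galaxy bias, selection effects, and model uncertainties.

```python
import numpy as np


def poisson_log_likelihood(counts, mean_counts):
    """Poisson log-likelihood, dropping the model-independent -sum(log n!) term."""
    mean_counts = np.asarray(mean_counts, dtype=float)
    return float(np.sum(counts * np.log(mean_counts) - mean_counts))


def log_likelihood_ratio(counts, pred_a, pred_b, region_mask):
    """ln L_A - ln L_B, restricted to the targeted region (boolean mask)."""
    n = counts[region_mask]
    return (poisson_log_likelihood(n, pred_a[region_mask])
            - poisson_log_likelihood(n, pred_b[region_mask]))


# Usage with hypothetical inputs on a 32^3 grid.
rng = np.random.default_rng(3)
shape = (32, 32, 32)
pred_a = rng.uniform(1.0, 6.0, size=shape)            # counts predicted by model A
pred_b = pred_a * (1.0 + 0.1 * rng.normal(size=shape)).clip(0.5)  # perturbed model B
fisher_map = rng.random(shape)                         # placeholder information map
region = fisher_map >= np.quantile(fisher_map, 0.9)    # observe only the top 10%
observed = rng.poisson(pred_a)                         # mock data drawn from model A
print(log_likelihood_ratio(observed, pred_a, pred_b, region))
```

Restricting the sum to the mask mirrors the idea of observing only the selected region; the dropped log n! term cancels in the ratio because both models are evaluated on the same counts.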
Acknowledgments
We would like to express our appreciation to Fabian Schmidt, Benjamin Wandelt, Eleni Tsaprazi, Natalia Porqueres, Minh Nguyen, Radek Wojtak, Florent Leclercq and Harry Desmond for their remarks on our manuscript. A.K. acknowledges support from the Starting Grant (ERC-2015-STG 678652) “GrInflaGal” of the European Research Council at MPA. J.J. acknowledges support by the Swedish Research Council (VR) under the project 2020-05143 – “Deciphering the Dynamics of Cosmic Structure”. DKR is a DARK fellow supported by a Semper Ardens grant from the Carlsberg Foundation (reference CF15-0384). This work was supported by the ANR BIG4 project, grant ANR-16-CE23-0002 of the French Agence Nationale de la Recherche. This work has made use of the Horizon Cluster hosted by Institut d’Astrophysique de Paris. This work has been done within the activities of the Domaine d’Intérêt Majeur (DIM) Astrophysique et Conditions d’Apparition de la Vie (ACAV), and received financial support from Région Ile-de-France. We thank Cepheid Studio (https://www.cepheidstudio.com) for providing us with the telescope illustration used in Fig. 1. This work is done within the Aquila Consortium (https://aquila-consortium.org). Author contributions. A.K.: led the project; methodology; software; obtained, validated, and interpreted results; writing – editing. J.J.: project conceptualization; methodology; validation and interpretation; supervision; writing – editing. D.K.R.: methodology; visualization; validation and interpretation; writing – editing. G.L.: methodology; validation and interpretation; supervision; provided resources; funding acquisition; editing.
References
Abazajian, K. N., Adelman-McCarthy, J. K., Agüeros, M. A., et al. 2009, ApJS, 182, 543
Bartlett, D. J., Desmond, H., & Ferreira, P. G. 2021, Phys. Rev. D, 103, 023523
Bassett, B. A. 2005, Phys. Rev. D, 71, 083517
Bouchet, F. R., Colombi, S., Hivon, E., & Juszkiewicz, R. 1995, A&A, 296, 575
Buchert, T., Melott, A. L., & Weiss, A. G. 1994, A&A, 288, 349
Byun, J., Franco, F. O., Howlett, C., Bonvin, C., & Obreschkow, D. 2020, MNRAS, 497, 1765
Carlson, E. D., Machacek, M. E., & Hall, L. J. 1992, ApJ, 398, 43
Charnock, T., Lavaux, G., Wandelt, B. D., et al. 2020, MNRAS, 494, 50
Chiang, L.-Y., & Coles, P. 2000, MNRAS, 311, 809
Desmond, H., & Ferreira, P. G. 2020, Phys. Rev. D, 102, 104060
Desmond, H., Ferreira, P. G., Lavaux, G., & Jasche, J. 2018, Phys. Rev. D, 98, 064015
Desmond, H., Ferreira, P. G., Lavaux, G., & Jasche, J. 2019, MNRAS, 483, L64
Einasto, J., Hütsi, G., Saar, E., et al. 2011, A&A, 531, A75
Elsner, F., Schmidt, F., Jasche, J., Lavaux, G., & Nguyen, N.-M. 2020, JCAP, 2020, 029
Fisher, R. A. 1925, Theory of Statistical Estimation (Cambridge University Press)
Górski, K. M., Hivon, E., Banday, A. J., et al. 2005, ApJ, 622, 759
Hu, W., Barkana, R., & Gruzinov, A. 2000, Phys. Rev. Lett., 85, 1158
Huchra, J. P., Macri, L. M., Masters, K. L., et al. 2012, ApJS, 199, 26
Ivezić, Ž., Kahn, S. M., Tyson, J. A., et al. 2019, ApJ, 873, 111
Jasche, J., & Kitaura, F. S. 2010, MNRAS, 407, 29
Jasche, J., & Lavaux, G. 2019, A&A, 625, A64
Jasche, J., & Wandelt, B. D. 2013, MNRAS, 432, 894
Jones, D. H., Read, M. A., Saunders, W., et al. 2009, MNRAS, 399, 683
Kitaura, F.-S., Ata, M., Rodríguez-Torres, S. A., et al. 2021, MNRAS, 502, 3456
Kodi Ramanah, D., Lavaux, G., Jasche, J., & Wandelt, B. D. 2019, A&A, 621, A69
Koyama, K. 2016, Rep. Prog. Phys., 79, 046902
Lam, T. Y., Nishimichi, T., Schmidt, F., & Takada, M. 2012, Phys. Rev. Lett., 109, 051301
Lavaux, G., & Hudson, M. J. 2011, MNRAS, 416, 2840
Lavaux, G., & Jasche, J. 2016, MNRAS, 455, 3169
Leclercq, F., Lavaux, G., Jasche, J., & Wandelt, B. 2016, JCAP, 2016, 027
Modi, C., Feng, Y., & Seljak, U. 2018, JCAP, 2018, 028
Mukherjee, S., & Wandelt, B. D. 2018, JCAP, 2018, 042
Mukherjee, S., Lavaux, G., Bouchet, F. R., et al. 2021, A&A, 646, A65
Neyrinck, M. C., Aragón-Calvo, M. A., Jeong, D., & Wang, X. 2014, MNRAS, 441, 646
Nguyen, N.-M., Jasche, J., Lavaux, G., & Schmidt, F. 2020, JCAP, 2020, 011
Nguyen, N.-M., Schmidt, F., Lavaux, G., & Jasche, J. 2021, JCAP, 2021, 058
Pardo, K., Desmond, H., & Ferreira, P. G. 2019, Phys. Rev. D, 100, 123006
Percival, W. J., Nichol, R. C., Eisenstein, D. J., et al. 2007, ApJ, 657, 645
Planck Collaboration VI. 2020, A&A, 641, A6
Porqueres, N., Jasche, J., Lavaux, G., & Enßlin, T. 2019a, A&A, 630, A151
Porqueres, N., Kodi Ramanah, D., Jasche, J., & Lavaux, G. 2019b, A&A, 624, A115
Porqueres, N., Hahn, O., Jasche, J., & Lavaux, G. 2020, A&A, 642, A139
Porqueres, N., Heavens, A., Mortlock, D., & Lavaux, G. 2021, MNRAS, 502, 3035
Porredon, A., Crocce, M., Elvin-Poole, J., et al. 2021, ArXiv e-prints [arXiv:2105.13546]
Racca, G. D., Laureijs, R., Stagnaro, L., et al. 2016, in Proc. SPIE, SPIE Conf. Ser., 9904, 99040O
Saunders, W., Sutherland, W. J., Maddox, S. J., et al. 2000, MNRAS, 317, 55
Schmidt, F., Cabass, G., Jasche, J., & Lavaux, G. 2020, JCAP, 2020, 008
Skrutskie, M. F., Cutri, R. M., Stiening, R., et al. 2006, AJ, 131, 1163
Tegmark, M., Blanton, M. R., Strauss, M. A., et al. 2004, ApJ, 606, 702
 VillaescusaNavarro, F., Marulli, F., Viel, M., et al. 2014, JCAP, 2014, 011 [CrossRef] [Google Scholar]
 VillaescusaNavarro, F., AnglésAlcázar, D., Genel, S., et al. 2021, ApJ, 915, 71 [NASA ADS] [CrossRef] [Google Scholar]
 Wang, H., Mo, H. J., Yang, X., Jing, Y. P., & Lin, W. P. 2014, ApJ, 794, 94 [NASA ADS] [CrossRef] [Google Scholar]
 Zhao, G.B., Raveri, M., Pogosian, L., et al. 2017, Nat. Astron., 1, 627 [NASA ADS] [CrossRef] [Google Scholar]
Appendix A: 2M++ galaxy catalog
The 2M++ catalog (Lavaux & Hudson 2011) is a compilation of galaxy redshifts derived from the Two-Micron-All-Sky-Survey (2MASS) Redshift Survey (2MRS) (Huchra et al. 2012), the Six-Degree-Field Galaxy Redshift Survey Data Release 3 (6dFGRS-DR3) (Jones et al. 2009), and the Sloan Digital Sky Survey Data Release 7 (SDSS-DR7) (Abazajian et al. 2009). The resulting catalog has a greater depth and a higher sampling rate than the Infrared Astronomical Satellite (IRAS) Point Source Catalog Redshift Survey (PSCz) (Saunders et al. 2000). The photometry is based on the 2MASS Extended Source Catalog (2MASS-XSC) (Skrutskie et al. 2006), an all-sky survey in the J, H, and K_{S} bands, with the redshifts of the K_{S}-selected 2MRS complemented by those from SDSS-DR7 and 6dFGRS-DR3.
Since the 2M++ catalog combines several surveys, the galaxy magnitudes from all sources were first recomputed by measuring the apparent magnitude in the K_{S} band within a circular isophote at 20 mag arcsec^{−2}. The apparent K_{S}-band magnitudes were subsequently corrected for Galactic extinction, cosmological surface brightness dimming, stellar evolution, and k-corrections, while the Galactic plane was masked. To account for the incompleteness due to fibre collisions in 6dFGRS and SDSS, the redshifts of nearby galaxies within each survey region were cloned. The final 2M++ catalog contains 69 190 galaxies in total and fairly samples the galaxies of the cosmic volume up to a distance of 200 h^{−1} Mpc in the area covered by 6dFGRS and SDSS, and up to 125 h^{−1} Mpc in the region mapped by 2MRS. For a more in-depth description of the construction of the 2M++ catalog, we refer the interested reader to the original compilation (Lavaux & Hudson 2011); the computation of the radial selection functions and target selection completeness required for the BORG 2M++ analysis is detailed in Sect. 4.1 of Jasche & Lavaux (2019).
Appendix B: BORG
The Bayesian Origin Reconstruction from Galaxies (BORG) algorithm (Jasche & Wandelt 2013; Jasche & Lavaux 2019) constitutes a hierarchical Bayesian statistical inference framework to infer the initial conditions of the Universe from the observed galaxy positions in the sky as provided by galaxy redshift surveys. By encoding a physical forward model at its core, as illustrated in Fig. 1, BORG allows for a detailed 3D reconstruction and characterization of observed cosmic large-scale structures.
The forward model entails the sequential application of several components that causally relate the cosmic initial conditions to the observed galaxy distribution, starting with a physical description of the nonlinear dynamics involved in gravitational structure formation. While the BORG machinery incorporates several models of structure growth, such as the approximate first- and second-order Lagrangian perturbation theory models and fully nonlinear particle-mesh models, we employ the latter option to evolve the primordial density fluctuations into their corresponding 3D dark matter distribution via cosmological N-body simulations. The predicted dark matter fields must subsequently be connected to a galaxy population via an adequate galaxy bias model. Although galaxy biasing remains a challenging and as yet unresolved issue in cosmology, the 2M++ analysis made use of a local but nonlinear truncated power-law bias model (Neyrinck et al. 2014), as motivated by numerical simulations. In addition, BORG inherently accounts for physical effects such as the geometric distortion due to cosmic expansion and the redshift-space distortions due to the peculiar velocities of galaxies. The final component of the forward model deals with all the relevant observational and instrumental effects, such as survey geometry, selection effects, and foreground contamination, to yield the predicted galaxy distribution, which can be compared to observations via a suitable likelihood; a Poisson distribution was adopted in the 2M++ analysis (Jasche & Lavaux 2019).
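As a heavily simplified illustration of the last stages of this chain, the Python sketch below maps an already evolved density contrast to expected galaxy counts through a truncated power-law bias and a survey selection function, and scores observed counts with a Poisson log-likelihood. Gravity, redshift-space distortions, and foregrounds are omitted; the bias form n̄(1+δ)^β exp[−ρ_ε(1+δ)^{−ε}] is our assumed reading of the Neyrinck et al. (2014) model, and all function and parameter names are hypothetical, not part of the BORG code.

```python
import math


def truncated_power_law_bias(delta, nbar, beta, rho_eps, eps):
    """Expected galaxy count in one voxel for matter density contrast delta.

    Assumed functional form (parameter names are hypothetical):
    nbar * rho**beta * exp(-rho_eps * rho**(-eps)), with rho = 1 + delta > 0.
    """
    rho = 1.0 + delta
    return nbar * rho ** beta * math.exp(-rho_eps * rho ** (-eps))


def expected_counts(delta_field, selection, bias_params):
    """Poisson rates: biased density times the survey selection in each voxel."""
    return [s * truncated_power_law_bias(d, **bias_params)
            for d, s in zip(delta_field, selection)]


def poisson_log_likelihood(counts, rates):
    """Sum over voxels of N ln(lambda) - lambda - ln(N!).

    Voxels with zero rate (outside the survey mask) carry no data and are skipped.
    """
    total = 0.0
    for n, lam in zip(counts, rates):
        if lam == 0.0:
            continue
        total += n * math.log(lam) - lam - math.lgamma(n + 1)
    return total


# Toy usage: four voxels of an already evolved density field.
delta = [-0.5, 0.0, 1.0, 3.0]
selection = [1.0, 1.0, 0.5, 0.0]  # last voxel masked, e.g. by the Galactic plane
params = {"nbar": 2.0, "beta": 1.5, "rho_eps": 0.4, "eps": 1.0}
rates = expected_counts(delta, selection, params)
log_like = poisson_log_likelihood([0, 2, 3, 0], rates)
```

Masked voxels have zero expected rate and are skipped, so they contribute nothing to the likelihood.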
The above physical forward model results in a highly nontrivial Bayesian inverse problem. To efficiently sample the high-dimensional and nonlinear parameter space of plausible initial conditions at an earlier epoch, with typically 𝒪(10^{7}) free parameters corresponding to the discretized volume elements of the observed domain, BORG relies on a Hamiltonian Monte Carlo (HMC) technique (Jasche & Kitaura 2010; Jasche & Wandelt 2013; Jasche & Lavaux 2019). In essence, BORG performs the joint inference of the respective posterior distributions of the initial conditions and the bias parameters given the observed galaxy distribution via a modular Markov Chain Monte Carlo (MCMC) sampling framework. Feeding the MCMC samples of the initial conditions to the forward model therefore yields physically plausible realizations of the 3D cosmic structures, namely, nonlinearly evolved density fields and associated velocity fields underlying the spatial distribution of observed galaxies.
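The core of HMC fits in a few lines. The minimal pure-Python version below is our own illustration, not BORG's sampler: it assumes a unit mass matrix, a fixed step size and number of leapfrog steps, and a two-dimensional Gaussian target standing in for the 𝒪(10^{7})-dimensional posterior. Each iteration draws Gaussian momenta, integrates Hamilton's equations with the leapfrog scheme using the gradient of the log-posterior, and accepts or rejects the end point with a Metropolis test on the total energy.

```python
import math
import random


def leapfrog(q, p, grad_log_prob, step, n_steps):
    """Leapfrog integration of Hamilton's equations for H = -log p(q) + |p|^2 / 2."""
    g = grad_log_prob(q)
    p = [pi + 0.5 * step * gi for pi, gi in zip(p, g)]  # initial half kick
    for s in range(n_steps):
        q = [qi + step * pi for qi, pi in zip(q, p)]  # full drift
        g = grad_log_prob(q)
        if s < n_steps - 1:
            p = [pi + step * gi for pi, gi in zip(p, g)]  # full kick
    p = [pi + 0.5 * step * gi for pi, gi in zip(p, g)]  # final half kick
    return q, p


def hmc(log_prob, grad_log_prob, q0, n_samples, step=0.2, n_steps=8, seed=0):
    """Draw samples with Hamiltonian Monte Carlo; returns (samples, acceptance rate)."""
    rng = random.Random(seed)
    q = list(q0)
    samples, accepted = [], 0
    for _ in range(n_samples):
        p0 = [rng.gauss(0.0, 1.0) for _ in q]  # fresh Gaussian momenta
        q_new, p_new = leapfrog(q, p0, grad_log_prob, step, n_steps)
        h_old = -log_prob(q) + 0.5 * sum(x * x for x in p0)
        h_new = -log_prob(q_new) + 0.5 * sum(x * x for x in p_new)
        # Metropolis correction for the integration error in the total energy.
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            q, accepted = q_new, accepted + 1
        samples.append(list(q))
    return samples, accepted / n_samples


# Toy target: two independent Gaussians with variances 1 and 4.
def log_prob(q):
    return -0.5 * (q[0] ** 2 + q[1] ** 2 / 4.0)


def grad_log_prob(q):
    return [-q[0], -q[1] / 4.0]


samples, acceptance = hmc(log_prob, grad_log_prob, [0.0, 0.0], 4000, seed=1)
```

Because the proposals follow the gradient of the log-posterior, acceptance rates stay high even when all parameters are updated jointly, which is the usual motivation for HMC in very high-dimensional problems.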
Recent extensions to the BORG framework have led to substantial improvements in cosmological parameter inference (Kodi Ramanah et al. 2019; Elsner et al. 2020; Schmidt et al. 2020), Lyman-α (Porqueres et al. 2019a, 2020) and cosmic shear (Porqueres et al. 2021) reconstructions, with the field-level treatment transcending the capabilities of conventional cosmological analyses. Novel sophisticated additions to the forward model include a robust likelihood to account for unknown foreground contamination and systematics (Porqueres et al. 2019b) and machine-learning-based galaxy bias models (Charnock et al. 2020). The reconstructed density and velocity fields from BORG analyses have been employed in investigating extensions of the standard model of particle physics and cosmology (Desmond et al. 2018, 2019; Pardo et al. 2019; Desmond & Ferreira 2020; Bartlett et al. 2021), for unbiased and accurate measurements of the Hubble constant from gravitational wave events (Mukherjee et al. 2021), and for measurements of the kinematic Sunyaev-Zel'dovich effect from CMB observations (Nguyen et al. 2020).
Appendix C: Fisher information maps
The proposed targeted search approach requires identifying regions of the observational domain whose observations would provide the largest information gain to update existing knowledge. To quantify this information gain, we chose to employ the Fisher information (Fisher 1925). This approach measures the amount of information that as yet unseen observable data can carry about uncertain model parameters. More explicitly, the Fisher information is the squared norm of the likelihood score (the derivative of the log-likelihood with respect to the model parameters), averaged over all possible future data realizations permitted by the likelihood. It therefore measures the expected strength with which the likelihood will respond to changes in the model parameters once new data become available. For the specific case considered in this work, we assume that prior knowledge of the spatial cosmic matter configuration is available from previous observations. More specifically, we assume that the white noise realizations (the phases) of the initial conditions ϕ are provided by the BORG reconstruction of the 2M++ survey (Jasche & Lavaux 2019). Assuming this realization of the initial conditions, we can express the conditional Fisher information as:
$$\mathcal{I}(\boldsymbol{\theta} \mid \boldsymbol{\phi}) = \left\langle \left( \frac{\partial \ln \mathcal{P}(\mathbf{d} \mid \boldsymbol{\theta}, \boldsymbol{\phi})}{\partial \boldsymbol{\theta}} \right)^{2} \right\rangle_{\mathcal{P}(\mathbf{d} \mid \boldsymbol{\theta}, \boldsymbol{\phi})}, \tag{C.1}$$
where θ corresponds to the set of cosmological parameters parametrizing our forward model, the galaxy observations within the voxels are denoted by d = [N_{i}]_{i = 1, …, N_{box}}, and N_{box} is the number of voxels inside the 3D volume considered in the 2M++ BORG reconstruction (see Jasche & Lavaux 2019, for further details). We note that we express the vector nature of these random variables using boldface symbols. For the sake of this work, we assume a Poisson likelihood to describe the galaxy clustering data:
$$\mathcal{P}(\mathbf{d} \mid \boldsymbol{\theta}, \boldsymbol{\phi}) = \prod_{i=1}^{N_\mathrm{box}} \frac{\lambda_i(\boldsymbol{\theta}; \boldsymbol{\phi})^{N_i}\, e^{-\lambda_i(\boldsymbol{\theta}; \boldsymbol{\phi})}}{N_i!}, \tag{C.2}$$
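where λ_{i}(θ; ϕ) denotes the rate of the Poisson process in the ith cell of the 3D volume, as dictated by the initial phases ϕ and the cosmological parameters θ, while N_{i} represents the galaxy counts within that cell. Taking the derivative of the logarithm of this likelihood with respect to the cosmological parameters θ gives:
$$\frac{\partial \ln \mathcal{P}(\mathbf{d} \mid \boldsymbol{\theta}, \boldsymbol{\phi})}{\partial \boldsymbol{\theta}} = \sum_{i=1}^{N_\mathrm{box}} \left( \frac{N_i}{\lambda_i} - 1 \right) \frac{\partial \lambda_i}{\partial \boldsymbol{\theta}}. \tag{C.3}$$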
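The square of the above derivative is given by:
$$\left( \frac{\partial \ln \mathcal{P}(\mathbf{d} \mid \boldsymbol{\theta}, \boldsymbol{\phi})}{\partial \boldsymbol{\theta}} \right)^{2} = \sum_{i} \left( \frac{N_i}{\lambda_i} - 1 \right)^{2} \left( \frac{\partial \lambda_i}{\partial \boldsymbol{\theta}} \right)^{2} + \sum_{i} \sum_{j \neq i} \left( \frac{N_i}{\lambda_i} - 1 \right) \left( \frac{N_j}{\lambda_j} - 1 \right) \frac{\partial \lambda_i}{\partial \boldsymbol{\theta}} \frac{\partial \lambda_j}{\partial \boldsymbol{\theta}}, \tag{C.4}$$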
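where we split the sum into diagonal and off-diagonal terms. Here, the indices i and j denote the voxels of the 3D volume. Since, for a given density field, the individual Poisson realizations are independent, the off-diagonal terms vanish when computing the Fisher information as given by Eq. (C.1). To show this, we consider the following data average of the off-diagonal terms in Eq. (C.4):
$$\left\langle \left( \frac{N_i}{\lambda_i} - 1 \right) \left( \frac{N_j}{\lambda_j} - 1 \right) \right\rangle_{\mathcal{P}(\mathbf{d} \mid \boldsymbol{\theta}, \boldsymbol{\phi})} = \frac{\langle N_i N_j \rangle}{\lambda_i \lambda_j} - \frac{\langle N_i \rangle}{\lambda_i} - \frac{\langle N_j \rangle}{\lambda_j} + 1 = 0 \quad (i \neq j), \tag{C.5}$$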
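where we used the following identities:
$$\langle N_i N_j \rangle = \langle N_i \rangle \langle N_j \rangle = \lambda_i \lambda_j \quad (i \neq j), \qquad \left\langle (N_i - \lambda_i)^{2} \right\rangle = \lambda_i. \tag{C.6}$$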
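The first identity follows from the conditional independence of the Poisson realizations within each grid cell, which always holds once the underlying density field λ(θ; ϕ) is given; the second is the variance of the Poisson distribution. Therefore, only the diagonal terms contribute:
$$\mathcal{I}(\boldsymbol{\theta} \mid \boldsymbol{\phi}) = \sum_{i} \frac{\left\langle (N_i - \lambda_i)^{2} \right\rangle}{\lambda_i^{2}} \left( \frac{\partial \lambda_i}{\partial \boldsymbol{\theta}} \right)^{2} = \sum_{i} \frac{1}{\lambda_i} \left( \frac{\partial \lambda_i}{\partial \boldsymbol{\theta}} \right)^{2}. \tag{C.7}$$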
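From the above equation, we see that the contribution of a particular volume element to the Fisher information is given by:
$$\mathcal{I}_i(\boldsymbol{\theta} \mid \boldsymbol{\phi}) = \frac{1}{\lambda_i(\boldsymbol{\theta}; \boldsymbol{\phi})} \left( \frac{\partial \lambda_i(\boldsymbol{\theta}; \boldsymbol{\phi})}{\partial \boldsymbol{\theta}} \right)^{2}. \tag{C.8}$$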
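The steps from Eq. (C.1) to Eq. (C.8) can be checked numerically. The sketch below uses hypothetical rates λ_i and derivatives ∂λ_i/∂θ for three voxels (no simulator involved), averages the squared score of Eq. (C.3) over Poisson data realizations, and compares the result with the sum of the per-voxel contributions. The Poisson sampler is Knuth's multiplication method, chosen only to stay within the Python standard library.

```python
import math
import random


def poisson_draw(lam, rng):
    """Poisson variate via Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def score(counts, lam, dlam):
    """Eq. (C.3): sum_i (N_i / lambda_i - 1) * dlambda_i/dtheta."""
    return sum((n / l - 1.0) * d for n, l, d in zip(counts, lam, dlam))


def fisher_per_voxel(lam, dlam):
    """Eq. (C.8): (dlambda_i/dtheta)^2 / lambda_i for every voxel."""
    return [d * d / l for l, d in zip(lam, dlam)]


def fisher_monte_carlo(lam, dlam, n_draws=20000, seed=0):
    """Eq. (C.1): average the squared score over Poisson data realizations."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        counts = [poisson_draw(l, rng) for l in lam]
        total += score(counts, lam, dlam) ** 2
    return total / n_draws


lam = [0.5, 2.0, 5.0]    # hypothetical Poisson rates in three voxels
dlam = [0.3, -1.0, 2.0]  # hypothetical derivatives dlambda_i/dtheta
analytic = sum(fisher_per_voxel(lam, dlam))  # 0.18 + 0.5 + 0.8 = 1.48
estimate = fisher_monte_carlo(lam, dlam)
```

The Monte Carlo estimate is expected to match the analytic value of 1.48 up to sampling noise (a standard error of roughly 0.015 for these numbers), because the cross terms of Eq. (C.4) average to zero; the per-voxel terms returned by `fisher_per_voxel` are exactly the contributions that make up the 3D map.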
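This allows us to obtain a 3D Fisher map. We now also have to specify the Poisson intensity in terms of the output of our physics simulator. This relation can be written as:
$$\lambda_i(\boldsymbol{\theta}; \boldsymbol{\phi}) = B\big(G_i(\boldsymbol{\theta}; \boldsymbol{\phi})\big), \tag{C.9}$$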
where B(x) is an arbitrary non-negative bias function and G(θ; ϕ) is a physics simulator of the cosmic large-scale structures whose output δ corresponds to the 3D matter density contrast amplitudes. For illustrative purposes, we assume B(x) = 1 + x. We note that any bias model monotonic in the density will change the quantitative results, but not the qualitative results, as can also be seen from Eq. (C.8). We then obtain:
$$\lambda_i(\boldsymbol{\theta}; \boldsymbol{\phi}) = 1 + G_i(\boldsymbol{\theta}; \boldsymbol{\phi}), \tag{C.10}$$
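which leads to the following expression for the derivative with respect to the cosmological parameters θ:
$$\frac{\partial \lambda_i(\boldsymbol{\theta}; \boldsymbol{\phi})}{\partial \boldsymbol{\theta}} = \frac{\partial G_i(\boldsymbol{\theta}; \boldsymbol{\phi})}{\partial \boldsymbol{\theta}}. \tag{C.11}$$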
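We can therefore approximate the Fisher information elements through finite differencing as:
$$\mathcal{I}_i(\boldsymbol{\theta}_0 \mid \boldsymbol{\phi}) \approx \frac{1}{1 + G_i(\boldsymbol{\theta}_0; \boldsymbol{\phi})} \left( \frac{G_i(\boldsymbol{\theta}_0 + \Delta\boldsymbol{\theta}; \boldsymbol{\phi}) - G_i(\boldsymbol{\theta}_0; \boldsymbol{\phi})}{\Delta\boldsymbol{\theta}} \right)^{2}. \tag{C.12}$$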
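For a single, fixed phase realization, Eqs. (C.9) to (C.12) amount to two simulator calls and a voxel-wise formula. The sketch below assumes a hypothetical one-parameter toy simulator (a lognormal transform of the white noise with amplitude θ) in place of the particle-mesh forward model. It accepts any bias function B, defaulting to the B(x) = 1 + x used in the text, and uses a one-sided finite difference.

```python
import math


def toy_simulator(theta, phases):
    """Hypothetical stand-in for G(theta; phi): a lognormal transform of the
    white-noise phases with amplitude theta, so that delta > -1 everywhere."""
    return [math.exp(theta * p - 0.5 * theta ** 2) - 1.0 for p in phases]


def linear_bias(x):
    """B(x) = 1 + x, the illustrative bias adopted in the text."""
    return 1.0 + x


def fisher_fd(simulator, theta0, dtheta, phases, bias=linear_bias):
    """Per-voxel Fisher information for ONE fixed phase realization, Eq. (C.12):
    [B(G(theta0 + dtheta)) - B(G(theta0))]^2 / (dtheta^2 * B(G(theta0)))."""
    g0 = simulator(theta0, phases)
    g1 = simulator(theta0 + dtheta, phases)
    result = []
    for a, b in zip(g0, g1):
        lam0, lam1 = bias(a), bias(b)
        result.append(((lam1 - lam0) / dtheta) ** 2 / lam0)
    return result


phases = [-1.0, 0.0, 0.5, 2.0]
fd = fisher_fd(toy_simulator, 0.8, 1e-6, phases)
# For this toy model the exact result is (phi_i - theta)^2 * exp(theta*phi_i - theta^2/2).
exact = [(p - 0.8) ** 2 * math.exp(0.8 * p - 0.32) for p in phases]
```

Since this toy model has an analytic derivative, the finite-difference values can be compared directly with `exact`; with the step used here they agree to high relative accuracy.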
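We note, however, that the above calculation is done for a fixed realization of the phases ϕ. Since we do not know which particular realization is compatible with our Universe, we need to marginalize over them:
$$\left\langle \mathcal{I}_i(\boldsymbol{\theta}_0 \mid \mathbf{d}) \right\rangle = \int \mathrm{d}\boldsymbol{\phi}\, \mathcal{P}(\boldsymbol{\phi} \mid \mathbf{d})\, \mathcal{I}_i(\boldsymbol{\theta}_0 \mid \boldsymbol{\phi}). \tag{C.13}$$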
This task is made possible by the BORG algorithm, as described in Appendix B. In this way, we are able to use the information content of the previously obtained constraints to update the Fisher information. Projecting this 3D Fisher map onto a HEALPix grid and visualizing a particular spherical slice through it, we obtain an all-sky map, which we refer to in our study as the "Fisher information map". A specific example is provided in Fig. 3.
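The maps in Figs. 2 and 3 are spherical slices through 3D fields. A minimal sketch of extracting such a slice, assuming a cubic grid centred on the observer and a flat index ordering of our own choosing, averages all voxels whose centres fall within a shell of given radius and thickness, grouped by sky direction. For simplicity it bins directions into equal-angle longitude and latitude cells. This is not HEALPix: those cells have unequal areas, unlike HEALPix pixels.

```python
import math


def shell_map(field, n, box_size, radius, thickness, n_lon=36, n_lat=18):
    """Average a 3D voxel map over a spherical shell, binned by sky direction.

    field: flat list of n**3 values, index (ix * n + iy) * n + iz, on a cube of
    side box_size centred on the observer (an assumed layout). Bins are
    equal-angle longitude/latitude cells, a simple stand-in for HEALPix.
    Empty bins are returned as NaN.
    """
    cell = box_size / n
    sums = [[0.0] * n_lon for _ in range(n_lat)]
    counts = [[0] * n_lon for _ in range(n_lat)]
    centres = [(i + 0.5) * cell - 0.5 * box_size for i in range(n)]
    for ix, x in enumerate(centres):
        for iy, y in enumerate(centres):
            for iz, z in enumerate(centres):
                r = math.sqrt(x * x + y * y + z * z)
                if r == 0.0 or abs(r - radius) > 0.5 * thickness:
                    continue  # voxel centre lies outside the shell
                lon = math.atan2(y, x) % (2.0 * math.pi)
                lat = math.asin(max(-1.0, min(1.0, z / r)))
                i_lon = min(int(lon / (2.0 * math.pi) * n_lon), n_lon - 1)
                i_lat = min(int((lat + 0.5 * math.pi) / math.pi * n_lat), n_lat - 1)
                sums[i_lat][i_lon] += field[(ix * n + iy) * n + iz]
                counts[i_lat][i_lon] += 1
    return [[s / c if c else float("nan") for s, c in zip(s_row, c_row)]
            for s_row, c_row in zip(sums, counts)]


# Sanity check on a constant field: every filled sky bin should return that constant.
n = 32
sky = shell_map([2.0] * n ** 3, n, box_size=200.0, radius=80.0, thickness=10.0)
```

In practice, the binning step would be replaced by a HEALPix pixel assignment, for example with the healpy package's `vec2pix` function.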
Now, in order to calculate the G_{i}(θ_{0} + Δθ; ϕ) term above, we must first perform a forward model evaluation with the cosmological parameters set to their fiducial values, θ_{0} ≡ {Ω_{m} = 0.3111, Ω_{b} = 0.049, Ω_{k} = 7 × 10^{−4}, h = 0.6766, σ_{8} = 0.8102, n_{s} = 0.9665}, corresponding to the latest Planck best-fit ΛCDM cosmology (Planck Collaboration VI 2020), for all the MCMC realizations of the initial conditions from the BORG 2M++ analysis. We then compute the mean galaxy field, marginalized over the forward model output realizations. We subsequently repeat this procedure for a new set of perturbed cosmological parameters, θ′ ≡ θ_{0} + Δθ, ensuring that Δθ stays within the 1σ width of a Gaussian centered on the fiducial cosmology θ_{0}, as characterized by the Planck constraints. Using the respective means of the fiducial and perturbed galaxy fields, we compute the gradient of the forward model output with respect to the cosmological parameters, Δ_{G}/Δ_{θ}, using a finite differencing scheme, with:
$$\frac{\Delta_G}{\Delta_\theta} = \frac{\left\langle G(\boldsymbol{\theta}'; \boldsymbol{\phi}) \right\rangle_{\boldsymbol{\phi}} - \left\langle G(\boldsymbol{\theta}_0; \boldsymbol{\phi}) \right\rangle_{\boldsymbol{\phi}}}{\Delta\boldsymbol{\theta}}, \tag{C.14}$$
and Δθ = θ′−θ_{0}. The distinct gradient components correspond to the cosmological sensitivity maps displayed in Fig. 2. The gradient is then squared and divided by the mean fiducial galaxy field and marginalized over phase realizations, as required by Eq. (C.12), to finally yield the desired Fisher information elements. The resulting Fisher information map for a particular spherical slice through the 3D field is illustrated in Fig. 3.
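The procedure described in the two preceding paragraphs can be sketched as follows. This is an illustration under explicit assumptions, not the authors' pipeline: a hypothetical two-parameter toy simulator replaces the BORG forward model, two hand-picked phase vectors replace the MCMC samples, the parameter names and 1σ widths are invented, and each parameter is perturbed on its own by half its 1σ width, which respects the bound stated in the text.

```python
import math


def toy_simulator(params, phases):
    """Hypothetical two-parameter stand-in for the forward model G(theta; phi)."""
    a, s = params["amp"], params["shift"]
    return [(1.0 + s) * math.exp(a * p - 0.5 * a * a) - 1.0 for p in phases]


def mean_field(params, phase_samples):
    """Average the simulator output over phase samples (e.g. posterior draws)."""
    fields = [toy_simulator(params, ph) for ph in phase_samples]
    return [sum(column) / len(fields) for column in zip(*fields)]


def fisher_maps(theta0, sigma, phase_samples, frac=0.5):
    """One Fisher map per parameter.

    Each parameter is perturbed on its own by frac * sigma (frac <= 1 keeps the
    step within the 1-sigma width); the squared finite-difference gradient of the
    mean field is divided by the mean fiducial galaxy field 1 + <G(theta0)>.
    """
    g0 = mean_field(theta0, phase_samples)
    maps = {}
    for name in theta0:
        dtheta = frac * sigma[name]
        perturbed = dict(theta0)
        perturbed[name] += dtheta
        g1 = mean_field(perturbed, phase_samples)
        maps[name] = [((b - a) / dtheta) ** 2 / (1.0 + a) for a, b in zip(g0, g1)]
    return maps


theta0 = {"amp": 0.8, "shift": 0.0}   # hypothetical fiducial values
sigma = {"amp": 0.1, "shift": 0.05}   # hypothetical 1-sigma widths
phase_samples = [[0.0, 1.0, -0.5], [0.2, 1.2, -0.3]]  # two toy phase samples
maps = fisher_maps(theta0, sigma, phase_samples)
```

Each entry of the returned dictionary plays the role of one Fisher information map. For the "shift" parameter the toy model is linear, so the finite difference is exact and the map reduces to the sample mean of exp(aϕ_i − a²/2).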
In this approach, we do not explicitly enforce ∑_{i}Ω_{i} = 1 for θ′, since we are interested in infinitesimal and independent variations of the cosmological parameters, as required by the Fisher information. Therefore, although this is marginally inconsistent from a cosmological perspective, it is perfectly consistent from an information-theoretic perspective. Furthermore, enforcing ∑_{i}Ω_{i} = 1 in our forward model would introduce dependencies between the variations required for the evaluation of the Fisher information, which renders the interpretation more convoluted. Nonetheless, we also tested the impact of enforcing ∑_{i}Ω_{i} = 1 in our forward model and found that it has a negligible effect on the derived Fisher information map. In addition, we verified that, as expected, our results are robust to the choice of fiducial cosmology.
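The distinction between independent and closure-preserving variations can be made explicit in a few lines. The sketch assumes, for illustration only, that the sum runs over Ω_m, Ω_k, and a dark energy density Ω_Λ (named `Omega_L` below), with radiation neglected and Ω_Λ treated as the derived quantity; which parameter the forward model actually adjusts is not specified here, and the function names are hypothetical.

```python
def omega_lambda(omega_m, omega_k):
    """Closure relation, neglecting radiation: Omega_L = 1 - Omega_m - Omega_k."""
    return 1.0 - omega_m - omega_k


def perturb(theta, name, step, enforce_closure):
    """Return a copy of theta with one parameter shifted by step.

    With enforce_closure=True, Omega_L is re-derived so the density parameters
    still sum to one; the step in theta[name] then also moves Omega_L, so the
    variation is no longer independent of the other parameters.
    """
    new = dict(theta)
    new[name] += step
    if enforce_closure:
        new["Omega_L"] = omega_lambda(new["Omega_m"], new["Omega_k"])
    return new


theta0 = {"Omega_m": 0.3111, "Omega_k": 7e-4}
theta0["Omega_L"] = omega_lambda(theta0["Omega_m"], theta0["Omega_k"])
independent = perturb(theta0, "Omega_m", 0.01, enforce_closure=False)
closed = perturb(theta0, "Omega_m", 0.01, enforce_closure=True)
```

With the closure enforced, a step in Ω_m is accompanied by an equal and opposite step in Ω_Λ, so the two variations become coupled; without it, only Ω_m moves.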
Appendix D: Marginalizing phase realizations and targeted searches
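The derivation presented in the previous section explicitly kept, where needed, the conditional dependence on the specific realization of the phases of the initial density field, also referred to as the initial white noise field. The reason is that, in practice, the exact phase information of the initial density field is not available, so the Fisher information has to be averaged over the possible phase realizations:
$$\left\langle \mathcal{I}(\boldsymbol{\theta}) \right\rangle = \int \mathrm{d}\boldsymbol{\phi}\, \mathcal{P}(\boldsymbol{\phi})\, \mathcal{I}(\boldsymbol{\theta} \mid \boldsymbol{\phi}), \tag{D.1}$$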
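where 𝒫(ϕ) is a Gaussian prior with zero mean and unit variance for the initial white noise field. We effectively use a Monte Carlo approximation of the integral by drawing random realizations from the white noise prior and evaluating the corresponding averaged Fisher information as:
$$\left\langle \mathcal{I}(\boldsymbol{\theta}) \right\rangle \approx \frac{1}{N_\mathrm{s}} \sum_{k=1}^{N_\mathrm{s}} \mathcal{I}(\boldsymbol{\theta} \mid \boldsymbol{\phi}_k), \qquad \boldsymbol{\phi}_k \sim \mathcal{P}(\boldsymbol{\phi}), \tag{D.2}$$
where the ϕ_{k} are N_{s} independent white noise realizations. Similarly, we can decompose the averaged Fisher information into the individual components per volume element:
$$\left\langle \mathcal{I}(\boldsymbol{\theta}) \right\rangle \approx \sum_{i=1}^{N_\mathrm{box}} h_i(\boldsymbol{\theta}), \qquad h_i(\boldsymbol{\theta}) = \frac{1}{N_\mathrm{s}} \sum_{k=1}^{N_\mathrm{s}} \mathcal{I}_i(\boldsymbol{\theta} \mid \boldsymbol{\phi}_k). \tag{D.3}$$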
Since we marginalize over all possible white noise realizations, and the white noise prior treats every location in the volume identically, there cannot be any systematic variation in the h_{i}(θ) from one volume element to another. Thus, without any further prior information on the phases, every point in the Universe is, on average, expected to contribute exactly the same amount of information.
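The homogeneity argument can be checked on a toy model. The sketch below assumes a hypothetical lognormal simulator, λ_i = exp(θϕ_i − θ²/2), for which Eq. (C.8) gives (ϕ_i − θ)² λ_i per voxel, and estimates h_i(θ) by averaging over white-noise draws from the prior. For this model the prior average equals exactly 1 in every voxel, because the factor exp(θϕ − θ²/2) turns a standard normal density into a normal density with mean θ and unit variance.

```python
import math
import random


def fisher_lognormal(theta, phases):
    """Per-voxel Fisher information, Eq. (C.8), for a hypothetical toy model with
    lambda_i = exp(theta * phi_i - theta^2 / 2), i.e. B(x) = 1 + x applied to a
    lognormal field: (dlambda_i/dtheta)^2 / lambda_i = (phi_i - theta)^2 * lambda_i."""
    return [(p - theta) ** 2 * math.exp(theta * p - 0.5 * theta ** 2) for p in phases]


def prior_averaged_fisher(theta, n_voxels, n_samples, seed=0):
    """Monte Carlo estimate of h_i(theta): average over white-noise draws phi_k ~ N(0, 1)."""
    rng = random.Random(seed)
    h = [0.0] * n_voxels
    for _ in range(n_samples):
        phases = [rng.gauss(0.0, 1.0) for _ in range(n_voxels)]
        for i, f in enumerate(fisher_lognormal(theta, phases)):
            h[i] += f / n_samples
    return h


rng = random.Random(1)
single = fisher_lognormal(0.5, [rng.gauss(0.0, 1.0) for _ in range(10)])
averaged = prior_averaged_fisher(0.5, 10, 20000)
```

A single realization (`single`) produces a strongly varying map, whereas the prior-averaged h_i are all close to 1, as expected when no information about the phases is available.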
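This is one of the logical reasons for performing homogeneous cosmological surveys. However, cosmology is no longer in a regime of complete ignorance about the Universe. We have access to data that can inform us about the specific realization of our Universe. Accounting for this fact, we can condition the average Fisher information on the existing data:
$$\left\langle \mathcal{I}(\boldsymbol{\theta} \mid \mathbf{d}) \right\rangle = \int \mathrm{d}\boldsymbol{\phi}\, \mathcal{P}(\boldsymbol{\phi} \mid \mathbf{d})\, \mathcal{I}(\boldsymbol{\theta} \mid \boldsymbol{\phi}). \tag{D.4}$$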
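Here, 𝒫(ϕ|d) is the data-constrained posterior distribution of the initial phases. Obtaining this posterior distribution is a nontrivial task, but one that is solved by our BORG algorithm. In particular, BORG provides a Markov chain Monte Carlo approximation to the high-dimensional posterior distribution 𝒫(ϕ|d), such that we can approximate the integral as:
$$\left\langle \mathcal{I}(\boldsymbol{\theta} \mid \mathbf{d}) \right\rangle \approx \frac{1}{N_\mathrm{s}} \sum_{k=1}^{N_\mathrm{s}} \mathcal{I}(\boldsymbol{\theta} \mid \boldsymbol{\phi}_k), \qquad \boldsymbol{\phi}_k \sim \mathcal{P}(\boldsymbol{\phi} \mid \mathbf{d}), \tag{D.5}$$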
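with this inferred posterior distribution of initial phases being fairly robust to the details of the adopted physical model (Nguyen et al. 2021). Similarly, we can use the results of BORG to estimate the Fisher information per volume element as:
$$\left\langle \mathcal{I}_i(\boldsymbol{\theta} \mid \mathbf{d}) \right\rangle \approx \frac{1}{N_\mathrm{s}} \sum_{k=1}^{N_\mathrm{s}} \mathcal{I}_i(\boldsymbol{\theta} \mid \boldsymbol{\phi}_k), \qquad \boldsymbol{\phi}_k \sim \mathcal{P}(\boldsymbol{\phi} \mid \mathbf{d}). \tag{D.6}$$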
The key point here is that the result does not depend on the white noise phases ϕ, as they have been marginalized out. More precisely, this has been achieved by evaluating Eq. (D.6) with samples characterizing the posterior 𝒫(ϕ|d).
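Conditioning on data changes this picture. Continuing with the same hypothetical lognormal toy model, the sketch below replaces the BORG posterior by draws scattered tightly (standard deviation 0.05) around an invented "true" phase field and evaluates the per-voxel average of Eq. (D.6). Both the true field and the scatter are assumptions made only for illustration.

```python
import math
import random


def fisher_lognormal(theta, phases):
    """Hypothetical toy model: per-voxel Fisher (phi_i - theta)^2 * lambda_i,
    with lambda_i = exp(theta * phi_i - theta^2 / 2)."""
    return [(p - theta) ** 2 * math.exp(theta * p - 0.5 * theta ** 2) for p in phases]


def posterior_averaged_fisher(theta, phase_samples):
    """Eq. (D.6): average the per-voxel Fisher information over posterior samples."""
    maps = [fisher_lognormal(theta, ph) for ph in phase_samples]
    return [sum(column) / len(maps) for column in zip(*maps)]


# Toy "posterior": draws scattered tightly around a hypothetical true phase field,
# standing in for the BORG samples constrained by data.
rng = random.Random(2)
true_phases = [-1.5, 0.0, 0.5, 2.5]
posterior_samples = [[p + 0.05 * rng.gauss(0.0, 1.0) for p in true_phases]
                     for _ in range(500)]
h_post = posterior_averaged_fisher(0.5, posterior_samples)
```

Unlike the prior-averaged case, the posterior-averaged map inherits the structure of the specific realization: for these numbers it ranges from below 10^{−2} in one voxel to above 10 in another, so the information is concentrated in particular locations.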
All Figures
Fig. 1. Schematic of the Bayesian physical forward-modeling framework of BORG, which solves a large-scale Bayesian inverse problem by fitting a dynamical structure formation model to galaxy observations and subsequently inferring the primordial initial conditions (ICs) that lead to the formation of the presently observed cosmic structures via gravitational evolution. The BORG forward-modeling approach naturally marginalizes over unknown galaxy bias and accounts for all relevant physical effects, such as redshift-space distortions (RSDs) resulting from the peculiar velocities of galaxies, as well as instrumental selection effects.

Fig. 2. Components of the gradient of the matter density field with respect to the cosmological parameters for a spherical slice of thickness ∼2.65 h^{−1} Mpc at a comoving distance of 100 h^{−1} Mpc from the observer, with the corresponding galaxies from the 2M++ catalog denoted via black dots. Visually, we find that the baryon density Ω_{b} has the largest influence on the spatial distribution of the cosmic structures, with the regions surrounding the filamentary galaxy distribution being particularly sensitive to changes in the baryon density. Conversely, the least significant response emanates from the cosmic curvature Ω_{k}, with only the vicinity of the dense galaxy clusters reacting to changes in the geometry of the Universe. 

Fig. 3. Fisher information map for the same spherical slice as in Fig. 2. The observed galaxy distribution from the 2M++ catalog, lying in the corresponding spherical shell centered around the observer, is represented by the red dots. We find that the regions in the vicinity of the massive cosmic structures, as traced by the galaxy distribution, are the most informative according to the Fisher information map. These regions correspond to the regime of gravitational infall of the galaxy clusters. 

Fig. 4. Fisher information map of the Coma cluster, with its corresponding mass density overlaid as a contour, for the central slice of thickness ∼5 h^{−1} Mpc through a 3D region centered on the cluster that extends over 40 h^{−1} Mpc. According to the Fisher information map, the central region containing the core of the cluster encodes fairly limited information gain, whilst the peripheral regions of the filaments and cluster core, where mass accretion occurs via gravitational infall, constitute the greatest proportion of information gain. 

Fig. 5. Flowchart of the targeted search approach. The core idea is to use the existing knowledge, as quantified by the posterior p(θ, ϕ|d), to calculate the Fisher map marginalized over this posterior, ⟨ℐ(θ|d_{0})⟩_{(ϕ|θ, d_{0})}, while keeping the observables of interest. This Fisher map identifies the regions of the sky with the highest information gain potential, where acquiring new data is optimal for testing our model predictions, ⟨ℐ(θ|d_{0})⟩_{(ϕ|θ, d_{0})}: d_{0} → d. These data will in turn be used to update our knowledge about the model through an updated posterior, and the procedure is repeated until the information content is fully depleted.
