Issue 
A&A
Volume 663, July 2022



Article Number  A124  
Number of page(s)  22  
Section  Cosmology (including clusters of galaxies)  
DOI  https://doi.org/10.1051/00046361/202142526  
Published online  19 July 2022 
Covariances of density probability distribution functions. Lessons from hierarchical models
^{1}
Université ParisSaclay, CNRS, CEA, Institut de physique théorique, 91191 GifsurYvette, France
email: francis.bernardeau@cea.fr
^{2}
Institut d’Astrophysique de Paris, CNRS and Sorbonne Université, UMR 7095, 98 bis bd Arago, 75014 Paris, France
Received:
26
October
2021
Accepted:
11
March
2022
Context. Statistical properties of the cosmic density fields are to a large extent encoded in the shape of the onepoint density probability distribution functions (PDF) as measured in surveys. In order to successfully exploit such observables, a detailed functional form of the covariance matrix of the onepoint PDF is needed.
Aims. The objectives are to model the properties of this covariance for general stochastic density fields and for stochastic fields that reproduce the properties expected in cosmology. The accuracy of the proposed forms is evaluated in specific cases.
Methods. The study was conducted in a cosmological context and determined whether the density is defined absolutely or relatively to the sample mean density. Leading and subleading contributions were identified within a large class of models, the socalled hierarchical models. They come from either large or short separation contributions. The validity of the proposed forms for the covariance matrix was assessed with the help of a toy model, the minimum tree model, for which a corpus of exact results could be obtained (forms of the one and twopoint PDF, largescale densitybias functions, and full covariance matrix of the onepoint PDF).
Results. It is first shown that the covariance matrix elements are directly related to the spatial average of the twopoint density PDF within the sample. The dominant contribution to this average is explicitly given for hierarchical models (coming from large scale contribution), which leads to the construction of specific densitybias functions. However, this contribution alone cannot be used to construct an operational likelihood function. Subdominant largescale effects are found to provide corrective terms, but also a priori lead to limited information on the covariance matrix. Short distance effects are found to be more important but more difficult to derive as they depend more on the details of the model. However, a simple and generic form of these contributions is proposed. Detailed comparisons in the context of the RayleighLevy flight model show that the largescale effects capture the bulk of the supersample effects and that, by adding the shortdistance contributions, a qualitatively correct model of the likelihood function can be obtained.
Key words: largescale structure of Universe / cosmology: theory / methods: statistical
© F. Bernardeau 2022
Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1. Introduction
In the context of cosmological studies, the concept of countsincells statistics has been put forward for a long time as a unique way to quantify the statistical properties of the cosmological fields (White 1979; Colombi et al. 1995; Balian & Schaeffer 1989; Bernardeau & Schaeffer 1999). It was then shown in particular that countsincells statistics, which represents a discrete representation of the local density probability distribution function (PDF), could be directly related to the correlation hierarchy of the density field.
Interest in these types of observables was recently renewed for several reasons. The size of the surveys makes accurately measuring these quantities more realistic. This is already the case for surveys such as the Dark Energy Survey (DES collaboration; Abbott et al. 2018), the KiloDegree Survey (KIDS; Heymans et al. 2021), and the Hyper Suprime Cam (HSC; Hikage et al. 2019). The future promises even larger and more powerful surveys such as Euclid (Laureijs et al. 2011; Amendola et al. 2018) and the Rubin Observatory (Ivezić et al. 2019). Moreover, the theoretical foundations for these constructions (at least in the cosmological context) has been considerably strengthen with the realization that the largedeviation theory (LDT; for a general review, see Touchette 2011) could successfully be invoked, as shown in Bernardeau & Reimberg (2016). It clarifies the applicability of the theory to the cosmological density field and places previous works on a much more solid foundation (Valageas 2002; Bernardeau et al. 2014). The ability of density PDF to constrain cosmology was emphasized in Codis et al. (2016b) and completed in Friedrich et al. (2020) and in Uhlemann et al. (2020), who showed that these observable could efficiently constrain the neutrino mass or primordial nonGaussianities. Finally, although the matter PDF is not a direct observable, as is matter density, it can be closely approached with the help of luminous tracer statistics (Repp & Szapudi 2020), more convincingly in weaklensing fields, as advocated in numerous recent papers (Barthelemy et al. 2021; Bernardeau & Valageas 2000), or with combined approaches such as densitysplit statistics (Gruen et al. 2018; Friedrich et al. 2018; Brouwer et al. 2018), which proved to be particularly promising.
The construction of a full theory of these observable requires a detailed analysis of its global error budget, however, due to finitesize surveys, imperfect tracers, and so on. Some of these aspects have been explored in early studies such as Szapudi & Colombi (1996) and Szapudi et al. (1999), but a full theory is still lacking. The developments presented in this paper are made in this context. More precisely, the purpose of this study is to explore what determines the expression of the covariance of data vectors whose elements are local quantities, such as the density contrast and density profiles, in cosmological contexts, that is, in classical random fields with longrange correlations. Derivations were made furthermore assuming statistical homogeneity and isotropy. The domain of application encompasses both countsincells statistics, basically 2D or 3D counts of density tracers, or proxies to projected densities such as mass maps for weaklensing tomographic observations.
In order to gain insights into the different contributions and the effects that might contribute to the covariance, we rely on the use of the hierarchical models to derive results we think rather general. The immense advantage of using such models is that they naturally incorporate many of the features expected in density cosmological fields (e.g., the magnitude of the highorder correlation functions), and there are also models for which many exact results can be obtained in particular for countsincells statistics. The goal of these constructions is to eventually infer precisely what the performance of PDF measurements would be on the determination of cosmological parameters, taking advantage of results such as those found in Boyle et al. (2021), which give the response function of these observable to various cosmological parameters
Section 2 is devoted to the presentation of the general framework. The subsequent section explores different contributions, from largescale effects with the derivation of several bias functions to shortdistance contributions. Results are derived in a framework as general as possible, including discrete noise associated with the use of a finite number of tracers. Section 4 presents the general hierarchical models, and more specifically, the RayleighLevy flight model that we use as a toy model to evaluate the performances of approximate schemes. In Sect. 5, simplified models for the covariance matrix are presented and evaluated with the help of a set of numerical experiments. Section 6 summarizes the results that have be found and specifies their expected range of application.
The text is complemented by appendices that contain a large amount of material. They present the hierarchical models, their mathematical description, and the meanfield approximation that is used throughout for explicit computations. Appendix C is more specifically devoted to the minimal tree model and the construction of the exact meanfield covariance matrix.
2. General framework. Construction of covariance matrices
The purpose of this section is to show how the elements of the covariance matrix are related to the joint density PDFs within a given survey. We first formalize this relation in a general framework before we explore its consequence in the context we are interested in. We assume we are interested in the PDF of some local quantity, μ, that can be evaluated within a survey, thus defining a field μ(x) throughout the survey. The a priori typical example of this quantity is the density (see below for a more precise illustration of what this quantity could be). The value of μ is assumed to lie in some ensemble ℳ (that can be simply the real numbers), and the data vector we are interested in consists of the probabilities p_{i} that μ lie within the subsets ℳ_{i} (which are a priori nonzero within ℳ). The onepoint PDF of μ is then given by
if ℘(μ,x) dμ is the PDF of μ at location x. p_{i}(x) is then assumed to be independent of x in the context we are interested in, for which statistical homogeneity is assumed. More formally, we can define the characteristic function χ_{i}(x), which takes the value 1 where μ(x) ∈ ℳ_{i} and 0 otherwise.
An estimation of p_{i} would then be given by the volume fraction of the survey where μ(x) ∈ ℳ_{i}. We note this estimate as P_{i} ^{1},
which is then itself a random variable, the properties of which we are interested in. More precisely, we would like to derive an operational form for the likelihood function of a set of P_{i} variable. We limit our investigation here to the construction of the likelihood from the covariance matrix, assuming that the likelihood of P_{i} is close enough to a Gaussian distribution^{2}.
The ensemble average of P_{i} is
We can further define a joint PDF of the same field, ℘(μ,x;μ′,x′), which is the joint PDF of μ and μ′ in locations x and x′. Defining p_{ij}(x, x′) as the joint ensemble average of ℘(μ,x;μ′,x′), we have
The elements of the covariance matrix of P_{i} are then formally
This gives the relation between the covariances and joint PDF. If p_{ij}(x, x′) depends only on the relative distance between x and x′, this expression can be recast in terms of the distribution of such distances, P_{s}(r_{d}), in the form
The precise form of P_{s}(r_{d}) depends on the detail of the survey. Explicit forms can be given in case of simple regular surveys such as square surveys^{3}. In the context of statistically homogeneous and isotropic random fields, this latter expression is used. In particular, we wish to determine the configurations that contribute most to p̄_{ij}. They obviously depend on both the random processes we consider and on the definition of ℳ_{i} and ℳ_{j}. In order to be more specific, we assume in the following that μ is a local density assigned to be in bins (i) centered on ρ_{i} and with width dρ_{i} (assumed a priori to be arbitrarily small), so that
If necessary, local densities could be obtained after the field μ(x) has been convolved with a window function W_{R}(x), associated with a scale R that is
It is then assumed R is small compared to the sample size in order to identify what the leading contributions to the joint PDFs might be. In practice, W_{R} might also be a simple tophat window function, but this is not necessarily the case. It could be more elaborated filters, such as compensated filters (of zero average), such as those introduced for cosmic shear analysis (Schneider 1996; Kaiser 1998; Bernardeau & Valageas 2000).
We furthermore allow the estimated densities ρ_{i} to be defined with respect the overall density of the sample ρ_{s},
For instance, we could be interested in or which are frequently encountered situations in praxis. Then ρ_{s} is itself a random variable whose correlation with ρ(x) ought to be taken into account. We then need to explore the properties of either P(ρ_{i}, ρ_{j}; x, x′) or P(ρ_{s}, ρ_{i}, ρ_{j}; x, x′) from which functions of interest can be built, that is,
from which the covariance elements such as
can be derived and whose properties we wish to explore. We wish in particular to build a model of the likelihood function from such a covariance, requiring full knowledge of its eigenvalues and eigendirections. In this respect, it is implicit that the number of bins (i) to be used is finite. We nonetheless present at least in this first section the results in the continuous limit for ρ_{i}. It is finally to be noted that as stated before, we restrict our analysis to covariance matrices, but higherorder correllators might also be considered by generalizing the relation (5) to a higher number of variables.
3. PDF covariances in the context of cosmological models
3.1. Modeling the joint PDF
To make progress, we need to make further assumptions about the mathematical structure of the joint PDF. In the following, we assume in particular that joint PDFs can be obtained from their cumulant generating functions (CGF)^{4}, φ(λ_{i}, λ_{j}; r_{d}). The latter is defined as
and it is assumed that this relation can be inverted to give the joint PDF from Laplace inverse transformations,
where the integrations are made a priori along the imaginary axis. The CGFs themselves are closely related to the averaged correlation functions of the underlying field. In the following, we develop models for which these correlation functions can be computed precisely.
3.2. Largedistance contributions to the joint density PDF
We start by assuming that covariances are dominated by longrange correlation and not by proximity effects (e.g., densities taken in nearby regions). Whether this assumption is correct obviously depends on the particular model and setting we consider, as we detail below. There are large sets of models for which general expressions can be given in this regime. They are the socalled hierarchical models, originally introduced in Peebles (1980), discussed in more detail in Fry (1984a,b), Balian & Schaeffer (1989), Bernardeau & Schaeffer (1992), and further formalized in Bernardeau & Schaeffer (1999) as described below; it is also true in the quasilinear regime as originally pointed out in Bernardeau (1996) and derived in more detail in Codis et al. (2016a). In these regimes, we obtain the following functional form (see previous references and the detailed derivation in Appendix B):
where ξ(x, x′) is the twopoint correlation function of the density field at positions x and x′, and φ_{0}(λ) and φ_{1}(λ) are specific functions of λ that depend on the details of the model.
Then, setting λ_{s} to zero, we can easily obtain the expression of the joint PDF at leading order in ξ(r_{d}),
Here P(ρ_{i}) is the onepoint density PDF (i.e., implicitly at scale R), and b(ρ_{i}) is the densitybias function (to be distinguished from the standard halobias function). It also depends on ρ_{i} (and on the scale R) so that in the previous expression, the dependence on ρ_{i}, ρ_{j}, and r_{d} can be factorized out.
To be more precise, P(ρ_{i}) is given by the inverse Laplace transform of the CGF (see, e.g., Balian & Schaeffer 1989 and Bernardeau 2013 for a detailed derivation of this inversion),
where φ_{0}(λ) is the CGF of the density taken at scale R (i.e., for the filter W_{R}). The function b(ρ_{i}) is defined through a similar relation,
The function φ_{1}(λ) can be explicitly computed in the context of perturbation theory calculations (Codis et al. 2016a). This is the case also for models in the socalled hierarchical models (see appendices). In the latter case, these calculations can be extended to higher order, as we describe below, providing ways to better assess the domain of validity of this expansion.
According to the previous relation, this implies that this form translates into the expression of the covariance coefficients for the density PDF. More precisely, we expect
where is the average value of the twopoint correlation function ξ(r_{d}) within the sample.
It is to be noted, however, that this is true if

The term in is indeed the leading contribution of the expansion (15). This is obviously not the case for samples with periodic boundary conditions, for which vanishes by construction;

The density is defined regardless of the density of the sample. Its expectation value therefore does not coincide with ρ_{s} for a given sample.
It can also be noted that in the Gaussian limit, we have b(ρ_{i})=δ_{i}/ξ. Applying the relation (16) to the density within one cell and to the sample density ρ_{s} = 1 + δ_{s} leads then to the following expression for the conditional expression of density PDF,
This leads to the interpretation of the function b(ρ_{i}) as the response function of the density PDF to the sample density. This means that although the densitybias function cannot be derived from the density PDF alone, we should be able to derive it if we are in possession of an operational method to compute the density PDF for arbitrary cosmological parameters (in a way similar to the derivation of halobias function as pioneered in Mo & White 1996). Undoubtedly, this result is closely related to the socalled supersample effects (as described for the power spectra covariance in Takada & Hu 2013), that is, the impact of modes of scale comparable to or larger than the sample size. This is not necessarily their only contribution (because subdominant largescale contributions can also contribute to the covariance), however, but likely to be the most important contribution, as described below.
The densitybias function obeys the following consistency relations:
as initially pointed out in Bernardeau & Schaeffer (1992).
3.3. Case of relative densities
The previous formula applies to the local densities, evaluated regardless of the sample density. It does not apply in particular to standard settings (e.g., densities measured out of galaxy counts) where the density is defined with respect to the mean density of the sample. To address this case in particular, we should consider
as the observable for which the covariance is to be computed. In this case, the formal derivation of the PDFs is presented in the appendix, and it leads to
We then use relation (15) to compute the form of this function. At this stage, it is to be noted that the expressions , ∫dx_{0} ξ(x_{0}, x_{1}) and ξ_{12} all take the same averaged value when integrated over the sample. We note here this common value as . Inserting the resulting expressions of the CGF in both the expressions of and and expanding all terms at linear oder in , we obtain
This leads to the definition of the first samplebias function,
which can be reexpressed in terms of the densitybias function defined in Eq. (18) and the derivative of with respect to
In this case, the covariance matrix elements are then expected to be given by
Remarkably, b_{s1}(ρ) can entirely be expressed in terms of b(ρ). For the sake of completeness, we also consider the case of . In this case, it is easy to show that
Following the same approach as for the previous case, the leadingorder expression in of the connected joint PDF is
It leads to the definition of the second samplebias function,
so that
The three bias functions are therefore closely related. Although the densitybias function b(ρ) cannot be derived from the shape of P(ρ) alone, as mentioned before, the relations between b(ρ) and either b_{s1}(ρ) and b_{s2}(ρ) depend on the PDF alone. Furthermore, the two relative density bias functions obey the following consistency relations:
The second relation is at variance with the corresponding relation (22) for the densitybias function. It indicates that for typical values of ρP(ρ), the sample bias functions, b_{s#}(ρ), are likely to be smaller than the densitybias function b(ρ).
3.4. Structure of the covariance matrix
The consequences of these formulae on the structure of the covariance matrix are illustrated below with the help of the RayleighLevy flight model. Figure 3 compares the results from exact derivations of the covariance matrix with these prescriptions. The diagonal parts of the covariance matrices are well accounted for by these formulae. The root mean square of the measured local density PDF in particular exhibits the expected density dependence, at least for mild values of the density.
In all the formulae (19), (29), and (34), the expression of the covariance exhibits a simple structure, as it is factorizable in the two densities. This implies, for instance, that the reduced covariance matrix
has an extremely simple structure: it is given by the sign of the product of the bias functions (i.e., sign[b(ρ_{i})b(ρ_{j})], sign[b_{s1}(ρ_{i})b_{s1}(ρ_{j})], and sign[b_{s2}(ρ_{i})b_{s2}(ρ_{j})] for the three different measurement strategies). This leads to the butterflylike structure in the plotted matrices, as illustrated in Fig. 4. This simple form betrays the fact that the density covariance is only poorly known. To be more specific, formulae (19), (29), and (34) give only a single eigendirection of the covariance matrix (namely b(ρ_{i})P(ρ_{i})) and the amplitude of a single eigenvalue associated with it. The numerical calculations suggest that it is the leading one when does not vanish, as illustrated on Fig. 7. These formulae do not offer any indication of the amplitude of the covariance in orthogonal directions, however. Taken at face value, they imply that the other eigenvalues all vanish, preventing the covariance matrix from being invertible. These formulae therefore cannot be used alone to model the covariance for practical purposes, and complementary contributions have to be derived from other (and a priori subdominant) effects.
3.5. Beyond leadingorder effects
In the previous subsection, we identified the longdistance leading contributions. As mentioned before, this leads to only limited information of the covariance structure. This difficulty is even more acute for covariances evaluated in numerical experiments consisting of a collection of independent samples, each of them with periodic boundary conditions (this does not have to be so, but it is often the case in practice). By construction, the mean correlation function within the sample then vanishes, making the term we have computed identically zero. All these considerations indicate that further contributions need to be identified. The identification of the nexttoleading order effects in Eq. (16) is difficult to do a priori, however:

One natural nexttoleading contribution is obtained by taking into account secondorder terms in ξ(d) in Eq. (16), that is, by considering doubles lines between cells in a diagrammatic representation. This would induce a term of about ξ(d)^{2}, whose average never vanishes^{5}. As shown in the appendix, these contributions can be formally derived in the context of the hierarchical models. This leads to correction terms that can be organized in a sum of factorized terms. Therefore, although it can indeed provide corrective terms to the covariance matrix, only a limited number of eigendirections can be generated.

Other contributions naturally come from proximity effects due to the fact that cells are finite, and could even overlap, which makes the expansion in ξ(d) ineffective. In a diagrammatic point of view, they are due to the fact that many more diagrams contributed when cells are too close. This has dramatic effects for overlapping cells. For hierarchical models, an approximate form can be used to help model these effects, which we use below.

Finally, effects due to the fact that discrete tracers are used in countincells statistics might also play a role at short distances. They are also tentatively modeled below.
In the following, we propose some modeling of these effects and explore how they depend on the properties of the survey.
3.5.1. Joint PDF at short distances
There are no general forms for the joint PDF at close distance. The hierarchical models suggest the following form (derived from the saddle point approximation, which is valid for moderate values of and of the density contrast), however:
where ρ_{m} = (ρ_{i} + ρ_{j})/2 and δ_{ρ} = (ρ_{i} − ρ_{j})/2, and where α is modeldependent parameter. In other words, the PDF of the difference between ρ_{i} and ρ_{j} can be described by a simple Gaussian with a known width driven by the expression of provided it is small compared to . We note that Δ_{ξ}(d) obviously vanishes at d = 0, it then leads to a Kronecker δ function at zero separation as expected, and generically scales like d^{2} at short distances^{6}. Interestingly, for the minimal tree model the form (38) is exact for α = 1 (see appendix). In general, this is also the expected form based on the saddle point approximation (valid when is small) for generic hierarchical tree models. The value of α can be identified from smallorder cumulants,
where S_{3} is the reduced thirdorder cumulant,
This form is probably not very accurate in general. It can be used to model the impact of close distances to the covariance matrix, however, as shown below.
3.5.2. Poisson noise and minimum separation
A further contribution to this joint PDF can come from discrete effects that arise because the density is evaluated from the counting of discrete tracers (as explored in Szapudi & Colombi 1996 or more recently in Repp & Szapudi 2021). In this subsection, we assume that the density corresponds to the density obtained after application of a tophat filter and that tracers are Poisson realizations of continuous fields (although it is possible to encounter sub or overPoissonnian noises; Friedrich et al. 2018). The use of other filters can be explored but would require specific developments that we do not pursue here. Within such hypotheses then, the joint distribution of countsincells N_{i} is given by the convolution of the joint density PDF, ℘({ρ_{i}}), in the continuous limit convolved by Poisson countsincells as
where P_{Poisson}(N;N̄) is more precisely the probability of having N tracers in a cell where the mean density of tracers is N̄. In practice, N̄_{i} is given by nV_{i}, where n is the number density of tracers and V_{i} is the volume of the cell V_{i}.
Discrete effects would then induce further scatter between the estimated values of ρ_{i} and ρ_{j}. The latter are given by Poisson noise induced by the nonoverlapping parts of the cells, as shown in Szapudi & Colombi (1996), further contributing to the scatter. The scatter in the difference in the number of points is
It can be incorporated as a contribution to the variance of the PDF of δ_{ρ} of the form
where f_{e}(d) is the fraction of the volume of each cell that does not overlap with the other as a function of the cell distance. For short distances (i.e., for about d ≲ R), it is in the 2D case given by
The expression (43) is then a priori to be added to the variance term that appears in Eq. (38) so that the total variance for the density difference reads
Equation (38) then fully encodes the fact that nearby cells are likely to have similar densities. This encodes, for instance, that nearby cells are within the same haloes. This contribution is expected to enhance the covariance terms. It shows that the amount of information is limited at small scales: there is therefore a minimum separation between cells smaller than which no gain in precision is expected of PDF measurements. The minimum distance depends on the bin size: d_{min} is the distance such that the densities in two cells separated by less than d_{min} are almost certainly in the sale density bin. d_{min} therefore depends on the bin width Δ_{i}. From Eq. (45), it is possible to infer this value. We desire
This suggests that a minimum distance between cells can be adopted, given by
The other upper limit comes on d from the expression of Δ_{ξ} as a function of d. The latter depends on both the shape of the power spectrum and on the filter that is used. In general (e.g., for a Gaussian filter), Δ_{ξ}(d) scales like d^{2}/R^{2}, where R is the filtering radius, with a coefficient c_{ns} that depends on the power spectrum index n_{s} and is proportional to its amplitude. Tophat filters have different analytical properties. We give here the formal expression of Δ_{ξ}(d) at 2D for a powerlaw spectrum of index n_{s},
This is the situation we encounter below in the numerical experiments we perform. This leads to the following form:
It is to be noted that it can be in practice a rather short distance, shorter than the filtering scale R. For instance, for a bin width of 1/4, a variance of about unity, d_{minhalo} is about R/5.
Equation (38), together with the expressions of the bias functions described above, is the main results of this paper. We illustrate below how they can be used to compute the covariance matrices.
4. Hierarchical models
In order to illustrate the previous findings, we make use of toy models for which explicit computations can be made.
4.1. General formalism
Hierarchical models are a class of nonGaussian fields whose correlation functions follow the socalled hierarchical ansatz,
where the sum is made over all possible trees that join the p points (diagram without loops), and the tree value is obtained by the product of a fixed weight (that depends only on the tree topology) and the product of the twopoint correlation functions for all pairs that are connected together in the given tree. This construction ensures that the average ppoint function, , scales like the , where is the averaged twopoint function. More precisely, there are S_{p} parameters such that
The precise value of the S_{p} parameters depend on the Q_{p} parameters and on the averages of the product of ξ(r_{ij}) functions. A very good approximation is to assume that the average of the products of this function is given by the product of these averages. Then the S_{p} coefficients depend solely on Q_{p},
4.2. The (minimal) tree model
The tree models are based on a further assumption on the Q_{p} parameters. It is basically assumed that tree expressions can be computed locally^{7}, that is,
where ν_{p} is a weight attributed to all vertices with p incoming lines (ν_{0} = ν_{1} = 1 for completion). In this formalism, the vertex generating function is generally introduced,
The minimal tree model is a model in which ν_{2} alone does not vanish. In the minimal model^{8}, its value is fixed and is given by ν_{2} = 1/2, so that
Together with the Gaussian case (which corresponds to ζ(τ)=1 + τ), this is the only case for which we are sure that it can be effectively built (in the sense that other models may be unphysical).
In this model, it is possible to build the cumulant generating function of the local density. For the onepoint case, assuming the meanfield approximation, the CGF is given by
with
This is not the result of large deviation principle calculations, but of mere combinatorics, although it leads to the same formal transformation between the CGF and the vertexgenerating function. In case of the minimal model, Eq. (57) takes a simple form that can be easily solved. We finally have
The onepoint PDF of the density can then be computed explicitly (see appendix),
as can the densitybias function,
where is the averaged twopoint correlation function within the cell.
4.3. RayleighLevy flight model
The minimal tree model can be implemented with RayleighLevy random walks (or rather RayleighLevy flights, as described in Peebles 1980). This is a Markov random walk where the PDF of the step length ℓ follows a power law,
with a regularizing cutoff at small separation, and where α satisfies
The sample points are then all the step points reached by the walker.
More precisely, the cumulative distribution function of step of length ℓ is
where ℓ_{0} is a smallscale parameter. The two and higherorder correlation functions can then be explicitly computed. Starting with a first point at position r_{0}, the density of the subsequent point (first descendant) at position r is given by
In the following, the dimension of space is denoted D. The density of the descendants, assuming there are an infinity of them, of a point at position r_{0} is then given by a series of convolutions,
with subsequent convolutions and where the integral is done in the whole space. Defining the Fourier transform of f_{1}(r) as ψ(k),
which is then a function of k only, it is easy to see that
where we take advantage of the expression of convolutions in Fourier space and their resummations. The twopoint correlation function is then given by two possible configurations: a neighbor can either be an ascendant or a descendant, so that the twopoint correlation functions between positions r_{1} and r_{2} are given by
where n is the number density of points in the sample that can be associated with a typical length ℓ_{n,}
At large scale, this expression causes the power spectra to be power laws. They scale like k^{−α}, and the resulting twopoint correlation function then takes the form in the large separation limit,
It is to be noted, however, that this expression does not take into account the boundary conditions, in particular if they are assumed to be periodic. This case is examined in some detail in the next paragraph. It is to be noted, however, that in this case, the function ξ(r) has a more complex form. It is in particular no more isotropic.
Higherorder correlation functions can also be computed in this model: n points are correlated when they are embedded in a chronological sequence that can be run in one direction or the other. Thus the threepoint function is simply given by
with five other terms obtained by all combinations of the indices. Expressing the result in terms of the twopoint function, we have
corresponding to a tree structure with ν_{2} = 1/2.
Higherorder correlation functions can be computed similarly. They follow a tree structure in the sense above, with ν_{2} = 1/2 and ν_{p} = 0 for p ≥ 3.
4.4. Periodic boundary conditions
We briefly explore here the case of periodic boundary conditions. Then the multipoint density field g^{PBC}(r_{i}) for periodic boundary conditions can be expressed in terms of the former density field g(r_{i}) as sums of all copies of the sample, that is,
where n_{i} are vectors whose components are integers, and the sums run over all integer values for all i; L is the size of the sample (assumed to be the same in all directions).
When it is applied in this context, we can construct the npoint density function out of the density function f computed previously. Thus the twopoint density function is given by
where n_{12} = n_{2} − n_{1} and n^{PBC} is the resulting onepoint (and therefore homogeneous) density in the sample. This expression is therefore written in terms of the function
We can now compute its expression in terms of the power spectra, or more specifically, the function ψ(k) defined previously. We have
and the latter sum ensures that the contributing wave modes k are only those that are periodic in all three directions, that is, those whose components are multiples of 2π/L so that
with
and where the sum is all over possible integer triplets but n = (0, 0, 0). The twopoint correlation function is now given by
A similar result can be obtained for the threepoint correlation function with
As a consequence, the functional relation between the threepoint correlation function and the twopoint correlation function is left unchanged. This is also the case at all orders.
4.5. Covariance matrix of the minimal tree model
Remarkably, in case of the minimal tree model, the derivation of the CGF can also be made for multiple cells, and in particular, for two cells. Its expression is derived in the appendix. It takes the form
In this case, it is then possible to expand its expression in powers of ξ_{12} for distant cells or in powers of for close cells, and in both cases, closed forms can be obtained to any order. It leads to the possibility of computing the joint PDF for any configuration (see the appendix for details) and finally to evaluate the covariance matrix directly. This is even possible for any of the thre sets of variables we consider, {ρ_{i}}, , or .
We performed these computations for the minimal tree model with a powerlaw behavior ξ(r)∼r^{−1.5} (α = 0.5), a 2D survey with a size of 200^{2} pixels, and a tophat smoothing radius of 4.25 pixels. The amplitude of the correlation function was fixed to give at the smoothing scale. It precisely corresponds to the setting of the numerical simulations of RayleighLevy flights we also performed, as described in the next section. It allows us to compare the two approaches. These analytic results have two limitations: the results are based on the meanfield approximation for the computation of the twovariable GCF, and the covariance elements are computed ignoring the bin sizes (i.e., by evaluating the expression of the covariance for their central values). Although in most cases this should not be an issue, it still might have a nonnegligible impact when the PDF varies rapidly with the density.
5. Simplified models of the covariance matrix
The purpose of this section is then to propose two levels of modeling of the covariance matrix based on the previous results and to compare these propositions with results of either the full analytic results presented before or with the results of numerical experiments based on RayleighLevy flights.
5.1. Modeling the covariance matrix
More specifically, we considered two approximate forms for the full covariance. The first approximation is fully analytic. It makes use of the largescale contributions and those from the short distance expression (38). It reads as the sum of the two contributions
In this expression, the only free parameter is r_{max}. This is indeed a crucial parameter as it determines to a large extent the amplitude of the shortdistance effects. In the following, we take r_{max} = R, that is, the filtering scale. It is found to give a good result for the 2D case and for n_{s} = −1/2, but this choice is likely to depend on the shape of the power spectrum. In general, this formula is intended to give a good account of the general properties of the covariance matrix, it cannot provide reliable quantitative results a priori.
The other form we propose is intended to be much more precise quantitatively. Is is given by the following expression:
where Cov^{PBC}(ρ_{i}, ρ_{j}) is the expression of the covariant matrix for periodic boundary conditions. It is obtained here simply by replacing by before integrating over r_{d} so that the averaged joint correlations vanish identically. The rationale for this proposition is that Cov^{PBC}(ρ_{i}, ρ_{j}) could be more easily estimated from specific numerical experiments. In both cases, the shortdistance contributions are the same for the three types of observables ρ_{i}, , and . These forms are then compared to numerical results.
5.2. Numerical experiments with the RayleighLevy flight model
A series of experiments of 2D walks with a large number of samples were performed as described below. We restricted our analysis to α = 0.5 with l_{0} = 0.003 pixel size (the dependence on l_{0} was tested as illustrated on Fig. 2, where l_{0} = .006 was also used, but the analyses were made for a fixed value of l_{0}). Figure 1 illustrates how points are distributed in these samples. The point distribution does not show the filamentary structure of realistic cosmological simulations. It exhibits the presence of concentrated halos surrounded by empty regions, however, which are reminiscent of the structure of the largest matter concentrations of the cosmic web.
Fig. 1. Example of a realization of a RayleighLevy walk. Points mark the end point of each displacement. They are clearly correlated. 
Two different setting were employed to explore different aspects of the results that were found:

Set 𝒜: 1600 samples extracted from a single numerical realization (with periodic boundary conditions) with a size of 8000 × 8000 pixels^{2} containing 64 × 10^{6} points. Each sample then has 200 × 200 pixels^{2} containing an average of 200^{2} points each. For this set of samples, the average and covariance of the PDF were extracted following the three procedures mentioned before: either the density was taken with respect to the mean density of the realization, with respect to the density of each sample, or by subtracting the sample density. It therefore corresponds to an evaluation of the mean and covariance of the PDF of ρ_{i}, , and , respectively.

Set 𝓑: 1600 samples, each with periodic boundary conditions, with a size of 200 × 200 pixels^{2} containing 200^{2} points each. By construction, the average twopoint function in the sample, , vanishes in this case, and covariance is entirely due to proximity effects.
In each case, the local density was obtained after a filtering procedure. The point positions were first pixelized, that is, each point was attributed to a pixel so that the mean number of points per pixel was one. The field was then filtered by a (quasi) circular tophat functions. In practice, the number of pixels in the window function was 57. This makes the effective smoothing radius about 4.25 in pixel units. The resulting density was then measured at each pixel location. Their histograms were then computed after density binning. To avoid large undue discrete effects, the bin width was chosen to be a multiple of 1/57, and in order to ensure that the requirement (49) was met at the pixel distance, we chose a bin width of about 1/4, more precisely, of 14/57.
Figure 2 shows the resulting PDF as measured in the simulations and how it compares to the theoretical prediction, Eq. (59), for two different choices of l_{0}. The expected scaling for is recovered. The measured PDF also follows theoretical predictions for a wide range of probabilities remarkably well. It gives us confidence in the whole procedure and in the approaches used to compute PDFs in this model. The detailed comparisons were made for l_{0} = 0.003, leading to and a sample density variance in sets 𝒜 given by .
Fig. 2. Onepoint density PDF obtained with tophat filters compared with the theoretical predictions, Eq. (59). The values of are 0.8 and 1.09 for the blue and red curves, respectively, corresponding to two different values of l_{0}. The bottom panel shows the residuals. Departure from theory might be due to binning and/or to the finite number of samples. 
The measured variance of the density PDF is obtained from 1600 samples in each case. The resulting shapes are presented in Figs. 3–5 for the different cases, density in a supersample realization, and in samples with periodic boundary conditions. The results show the comparison between results obtained in the numerical simulations with yellow symbols, and results derived from the analytic prescriptions as blue dots, based on the meanfield approximation. The agreement between the two is very good. The overall shape of the variance and its dependence on the density is well reproduced. Discrepancies can be observed for densities above 4 or 5, however, where the theoretical predictions are seen to underestimate the results. The reasons for these discrepancies are not clear at this stage. A possible explanation might be the finite number of samples that is used to infer the variances^{9}. The variance of the density PDF is also compared with the largescale contributions (19), (29), and (34) for set 𝒜 depending on the cases (at this order, the covariance vanishes for set ). It shows that this formula captures some features of the variance (especially at low and moderate densities), but does not account for all. This is also illustrated in Fig. 4, which shows the reduced covariance. The fact that the covariance is determined to a large extent by its leading largescale contribution leads to values of the reduced covariance close to 1 or −1, leading to these butterfly patterns. Proximity effects, not captured in these forms, also contribute to the covariances at a significant level, however. This is already apparent in Fig. 3.
Fig. 3. Measured variance of the density PDF, i.e., diagonal elements of the covariance matrix, in sets 𝒜 for α = 0.5 and different prescription of the measured density. From left to right, raw density ρ_{i}, scaled density , and scaled density . The blue dots and solid lines are from the meanfield analytical expressions, and the large gold symbols are from the numerical simulations. The dashed black lines are what is expected from the largescale leading contribution. The variance at cell scale is about 1.09, and the variance at sample scale, , is about 0.09. 
Fig. 4. Resulting reduced covariance matrix for the three types of observables for set 𝒜. The covariance matrix is dominated by its leading eigenvalue and direction, leading to this typical butterfly shape of the reduced covariance matrix. 
5.3. Testing models of covariance matrices
Expressions (85) and (86) are precise propositions to show how the largescale contributions can be completed to account for the full form of the covariance. The comparisons between the predicted form and those obtained from the numerical experiments are explored in detail at different levels and using the following criteria:

Amplitude of the PDF variance,

Density dependence of the first eigenvalue of the covariance matrix,

Amplitude of the eigenvalues of the covariance matrix, and

Resulting χ^{2} distribution of a set of data vectors drawn from the original covariance.
These comparisons are shown in Figs. 6–8. For model (86), the term Cov^{PBC}(ρ_{i}, ρ_{j}) is taken from the measured covariance of set . Figures 6 and 7 show that these two prescriptions give a good account of the leading behavior of the covariance matrix. The conclusion is quite sensitive for the choice of r_{max} for prescription (85). On the other hand, there is no free parameter that can be adjusted for model (86). Interestingly, Fig. 7 shows that the PDF variance also departs significantly from the largescale term. The first eigenvector reproduces the functional form of the largescale densitybias functions very faithfully.
Fig. 6. Measured variance of the density PDF, i.e., diagonal elements of the covariance matrix, in sets 𝒜 and comparisons with proposed approximate forms. The yellow line and symbols are the results obtained in the numerical experiments. The dotdashed line is the prediction derived from relation (86), and the dashed gray line shows the prediction from Eq. (85). The dotdashed black lines correspond to the largescale contributions. 
Fig. 7. Behavior of the first eigenvector with the same colorcoding as in Fig. 6. The dashed black lines are the largescale prediction, b_{#}(ρ_{i})P(ρ_{i}) appropriately normalized. The size of the data vector is 30. 
Fig. 8. Performances of the approximate forms of the covariance matrix in terms of rigenvalues and χ^{2}distributions. Top panel: eigenvalues of the covariance matrices (rebinned into six bins) compared to what can be obtained from the proposed approximate forms; same colorcoding as for Fig. 6. The χ^{2} distributions are shown in the bottom panel. Model (86) reproduces the very same χ^{2} distributions. Model (85), in gray, is not as accurate and tends to slightly overestimate the χ^{2}. This latter behavior is amplified when a larger number of bins is used. 
The last two criteria are designed to verify that the reconstructed covariances also capture the subleading behavior of the matrix and can eventually be safely inverted and used as a model of likelihood. To avoid numerical uncertainties and make the comparison tractable, we chose to reduce the binning to six bins (through a rebinning of the histograms and densities ranging from 0.5 to 6.5). The resulting eigenvalues are shown in the top panel of Fig. 8. It shows that the eigenvalues decrease rapidly in amplitude, suggesting that the eigendirections are well sequenced and that the approximate form captures their values rather accurately. Form (86) in particular reproduces all six eigenvalues almost exactly.
Finally, χ^{2} distributions were computed from a set of random values drawn in each case from a Gaussian likelihood built from the measured covariance (with six bins). The values of were then computed for each data vector, and their histogram was computed from each of the proposed models (including the original model for reference),
where is the inverse of the covariance matrix, either computed from Eq. (85) or from Eq. (86). For the original model, the expected distribution of the χ^{2} values is then expected to be precisely that of a χ^{2} distribution with six degrees of freedom. This is indeed what is almost exactly obtained for model (86). Results obtained from prescription (85) are not quite as good. This is expected as the shortdistance effects are estimated rather crudely in Eq. (85). The performance of this prescription deteriorates when the dimension of the data vector (i.e., the number of bins) increases.
6. Conclusions and lessons
We presented key relations that give the largescale behavior of the joint PDF, and hence the leading behavior of the covariance matrix of the density PDF. These contributing terms do not give the expression of a covariance matrix that can be used to build a likelihood function, however, as it is not invertible. Further significant contributions are found to be due to small separation effects, and an approximate form is proposed in Eq. (38). The latter is found to encapsulate most of the proximity effects, that is, it informs on the fact that nearby regions are likely to be correlated. They also give an indication on the minimal grid size that can be used the maximum bin size that can be used without information loss for a given bin width.
We then used a toy model for which numerical experiments can easily be performed and for which the exact PDF and largescale covariance can be derived. It allows us to evaluate the efficiency of approximate schemes precisely. The conclusions of these comparisons are listed below.

The theoretical forms Eqs. (19), (29), and (34) give the leadingorder expression of the covariance elements when supersample effects are taken into account. It gives an accurate prediction of the leading eigenvalue and eigendirection of the covariance matrix.

Whether subdominant effects can be accounted for by subsequent terms depends on the behavior of the twopoint function: if the rms of the twopoint function is dominated by large separations, then nexttoleadingorder effect need to be taken into account; otherwise, shortdistance effects will be the dominant contributor.

In case shortdistance effects dominate, the covariance matrix can be accessed from small simulations provided the relevant dominant largescale contributions are added.

This suggests that in realistic situations, the supersample effects, that is, the effects due to modes whose wavelength is larger than the size of the survey, have limited impact on the structure of the covariance matrix and that they can be captured by the only leading largescale contribution. This is supported by a further analysis of the behavior of ξ(r_{d}) in realistic cosmological settings. For the standard model of cosmology (as derived from cosmic microwave background observations, Planck Collaboration VI 2020), the behavior of the matter correlation function can be derived. This is illustrated in Fig. 9, which illustrates the scales that are the main contributors to the first two moments of the twopoint correlation function. Whether in 2D or in 3D, the first moment is dominated by largescale contributions, whereas the second moment is dominated by smallscale contributions.
Fig. 9. Scale dependence of the matter correlation functions for a realistic cosmological model (cosmological parameters derived from Plank, Planck Collaboration VI 2020) for the 3D density and the projected density (for a uniformly sampled survey with a depth of about 800 h^{−1}Mpc between z = 0.75 and z = 1.25). The top panel shows (solid blue line) and (dashed red line) for the 3D density field, and the bottom panel shows and for the projected density. In both cases, the average value of the first moment of the twopoint correlation function is dominated by largedistance contributions, whereas shortdistance contributions dominate the second moment, assuming survey sizes of about 100 h^{−1} Mpc or above.

In the context of this study, we assumed that the measured P_{i} were Gaussian distributed. Although it is difficult to assess the accuracy of this hypothesis, the structure uncovered in Sect. 3 can be used to make such an attempt. In tree models, higherorder expressions of the joint density PDFs are expected to preserve the tree structure; see Bernardeau & Schaeffer (1999). The connected part of the threepoint joint density PDF is then expected to take the form
where b_{2}(ρ) is the twoline bias function of amplitude similar to b^{2}(ρ). This implies in particular that the thirdorder cumulant is about , much smaller than , making the distribution of the measured values of P(ρ) (quasi) Gaussian distributed. There might be some combination of ρ_{i} and values of , however, for which a higherorder term could play a role in the expression of the likelihood function.
For the application of these formulae in practical cases, some limitations have to be noted. We list them below.

In the proposed form, the fact that in practice, PDFs are generally measured on a grid, that is, on a finite set of locations, is not taken into account. For instance, the exclusion of nonoverlapping cells is not considered. this is expected to introduce additional noise in the PDF estimates. The covariance matrix for these constructions cannot then be derived from general formulae (6) even when the integral in r_{d} is restricted above a given threshold.

Relation (38) has been derived in a specific regime (using saddle point approximations) for tree hierarchical models. They are expected to capture the phenomenon at play for “typical” values of the densities, but they may not perform so well in the rare event tails (the exception being the minimum model, for which it is exact). Further checks of the validity of (38) should therefore certainly be done.

The general formulae (19), (29), and (34) are valid for any type of filtering schemes, even for a compensated filter. This is not the case for relation (38). The proximity effects for compensated filters ought to be considered specifically.

Prescription (86) is found to give a very precise account of the properties of the covariance matrix. It is based on the proposition that largescale (supersample) effects can be added separately from the proximity effects and that the latter can be evaluated with smallscale mocks in which supersample effects are absent (with periodic boundary conditions). This is not an exact result, however,. It relies in particular on the fact that the r.m.s. of the ξ_{s} is dominated by scales much smaller than the sample size.

Prescription (85) is less solid. It can be used for a quick assessment of the different contributing terms, or to build fully invertible covariance matrices, but it is unlikely to give reliable predictions at the χ^{2} level.
In all cases, prescriptions (85) and (86) can be the starting point of a more precise evaluation of the covariance from specific numerical experiments that can complement its evaluation following the approach presented in Friedrich & Eifler (2018), for instance. The authors also showed that some strategies could be adopted to limit the number of realizations required to reach a specific precision. This point is not discussed here.
This is an ideal estimate in the sense that μ is evaluated in an infinite number of locations. We therefore neglect here the impact of measuring μ on a finite number of locations when evaluating P_{i}. Regarding this aspect, a specific derivation that takes a finite number of measurements into account can be found in Codis et al. (2016a).
This is not necessarily so, as exemplified in Carron (2011), Carron & Neyrinck (2012).
it is minimal in the sense that it can be shown that ν_{2} cannot be smaller than 1/2 in the strongly nonlinear regime (Peebles 1980).
Acknowledgments
The author of this article is indebted to Cora Uhlemann, Alex Gough, Oliver Friedrich, Sandrine Codis, Aoife Boyle and Alexandre Barthelemy for many comments and careful examination of the preparatory notes of this manuscript.
References
 Abbott, T. M. C., Abdalla, F. B., Alarcon, A., et al. 2018, Phys. Rev. D, 98, 043526 [NASA ADS] [CrossRef] [Google Scholar]
 Amendola, L., Appleby, S., Avgoustidis, A., et al. 2018, Liv. Rev. Rel., 21, 2 [Google Scholar]
 Balian, R., & Schaeffer, R. 1989, A&A, 220, 1 [NASA ADS] [Google Scholar]
 Barthelemy, A., Codis, S., & Bernardeau, F. 2021, MNRAS, 503, 5204 [Google Scholar]
 Bernardeau, F. 1992, ApJ, 392, 1 [NASA ADS] [CrossRef] [Google Scholar]
 Bernardeau, F. 1996, A&A, 312, 11 [NASA ADS] [Google Scholar]
 Bernardeau, F. 2013, ArXiv eprints [arXiv:1311.2724] [Google Scholar]
 Bernardeau, F., & Reimberg, P. 2016, Phys. Rev. D, 94, 063520 [NASA ADS] [CrossRef] [Google Scholar]
 Bernardeau, F., & Schaeffer, R. 1992, A&A, 255, 1 [NASA ADS] [Google Scholar]
 Bernardeau, F., & Schaeffer, R. 1999, A&A, 349, 697 [NASA ADS] [Google Scholar]
 Bernardeau, F., & Valageas, P. 2000, A&A, 364, 1 [NASA ADS] [Google Scholar]
 Bernardeau, F., Pichon, C., & Codis, S. 2014, Phys. Rev. D, 90, 103519 [NASA ADS] [CrossRef] [Google Scholar]
 Boyle, A., Uhlemann, C., Friedrich, O., et al. 2021, MNRAS, 505, 2886 [NASA ADS] [CrossRef] [Google Scholar]
 Brouwer, M. M., Demchenko, V., HarnoisDéraps, J., et al. 2018, MNRAS, 481, 5189 [NASA ADS] [CrossRef] [Google Scholar]
 Carron, J. 2011, ApJ, 738, 86 [NASA ADS] [CrossRef] [Google Scholar]
 Carron, J., & Neyrinck, M. C. 2012, ApJ, 750, 28 [NASA ADS] [CrossRef] [Google Scholar]
 Codis, S., Pichon, C., Bernardeau, F., Uhlemann, C., & Prunet, S. 2016a, MNRAS, 460, 1549 [NASA ADS] [CrossRef] [Google Scholar]
 Codis, S., Bernardeau, F., & Pichon, C. 2016b, MNRAS, 460, 1598 [NASA ADS] [CrossRef] [Google Scholar]
 Colombi, S., Bouchet, F. R., & Schaeffer, R. 1995, ApJS, 96, 401 [NASA ADS] [CrossRef] [Google Scholar]
 Friedrich, O., & Eifler, T. 2018, MNRAS, 473, 4150 [NASA ADS] [CrossRef] [Google Scholar]
 Friedrich, O., Gruen, D., DeRose, J., et al. 2018, Phys. Rev. D, 98, 023508 [Google Scholar]
 Friedrich, O., Uhlemann, C., VillaescusaNavarro, F., et al. 2020, MNRAS, 498, 464 [NASA ADS] [CrossRef] [Google Scholar]
 Fry, J. N. 1984a, ApJ, 277, L5 [NASA ADS] [CrossRef] [Google Scholar]
 Fry, J. N. 1984b, ApJ, 279, 499 [NASA ADS] [CrossRef] [Google Scholar]
 Gruen, D., Friedrich, O., Krause, E., et al. 2018, Phys. Rev. D, 98 [Google Scholar]
 Heymans, C., Tröster, T., Asgari, M., et al. 2021, A&A, 646, A140 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Hikage, C., Oguri, M., Hamana, T., et al. 2019, PASJ, 71, 43 [Google Scholar]
 Ivezić, Ž., Kahn, S. M., Tyson, J. A., et al. 2019, ApJ, 873, 111 [Google Scholar]
 Kaiser, N. 1998, ApJ, 498, 26 [NASA ADS] [CrossRef] [Google Scholar]
 Laureijs, R., Amiaux, J., Arduini, S., et al. 2011, ArXiv eprints [arXiv:1110.3193] [Google Scholar]
 Mo, H. J., & White, S. D. M. 1996, MNRAS, 282, 347 [Google Scholar]
 Peebles, P. J. E. 1980, The Largescale Structure of the Universe (Princeton University Press) [Google Scholar]
 Planck Collaboration VI. 2020, A&A, 641, A6 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Repp, A., & Szapudi, I. 2020, MNRAS, 498, L125 [NASA ADS] [CrossRef] [Google Scholar]
 Repp, A., & Szapudi, I. 2021, MNRAS, 500, 3631 [Google Scholar]
 Schneider, P. 1996, MNRAS, 283, 837 [Google Scholar]
 Szapudi, I., & Colombi, S. 1996, ApJ, 470, 131 [NASA ADS] [CrossRef] [Google Scholar]
 Szapudi, I., Colombi, S., & Bernardeau, F. 1999, MNRAS, 310, 428 [NASA ADS] [CrossRef] [Google Scholar]
 Takada, M., & Hu, W. 2013, Phys. Rev. D, 87 [CrossRef] [Google Scholar]
 Touchette, H. 2011, ArXiv eprints [arXiv:1106.4146] [Google Scholar]
 Uhlemann, C., Friedrich, O., VillaescusaNavarro, F., Banerjee, A., & Codis, S. 2020, MNRAS, 495, 4006 [NASA ADS] [CrossRef] [Google Scholar]
 Valageas, P. 2002, A&A, 382, 412 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 White, S. D. M. 1979, MNRAS, 186, 145 [NASA ADS] [CrossRef] [Google Scholar]
Appendix A: Hierarchical tree models
In hierarchical tree models, the ppoint matter correlation functions are assumed to follow tree structures in the sense described in the main text. They are thus entirely defined by the twopoint functions ξ(r) and the vertexgenerating function ζ(τ). The exact generating function of multiple cell correlation functions can be built through simple transforms. We therefore consider a set of n cells V_{i}. These cells can overlap. The joint cumulants we consider are those of the average densities in cells V_{i} that can be expressed in terms of spatial averages^{10} of correlation functions,
We then wish to build the cumulantgenerating function,
This function represents the generating function of (averaged) tree diagrams where λ_{i} counts the number of points in each cells. As shown in Bernardeau & Schaeffer (1999), this is obtained with the help of the intermediate function τ(x) solution of the consistency equation^{11},
and then
This is an exact result based on pure combinatorics.
For cases of interest, it is possible to do a meanfield approximation that consists of assuming that τ(x) is constant within each cell. We then have the system of equations for τ_{i},
where
and
Bernardeau & Schaeffer (1999) found this to be very accurate, and we extensively use this approximation in the following, in particular for the minimal tree model.
Appendix B: Joint PDF, densitybias function in PT, and hierarchical tree models
Here we consider the joint distribution of densities in two cells whose centers are at distance d. The calculation is based on the inverse Laplace transform of the joint cumulantgenerating function φ(λ_{1}, λ_{2}),
where are the cumulants of the local density fields. They depend on the size and distance d between the cells. We assume in particular that the density correlation function between cells ξ(d) is small compared to unity and can serve as a small parameter.
B.1. Leading order in the meanfield approximation
Expanding with respect to ξ(d) then leads to the following form:
that is, a factorization of the linear term in ξ(d). This can explicitly be shown in case of tree models (as described in the main text). This is also the case in perturbation theory.
In case of the tree models, we have
where is the averaged correlation function within the cells. This is obtained assuming a meanfield approximation. We expect subleading corrections when d becomes comparable to the size of the cells.
B.2. Extending the previous case to the three variable case
In addition to the two variables ρ_{1} and ρ_{2}, we introduce here the variable ρ_{s}, which is the density within the sample. We assume that the correlation functions are built with the same model. Here the small parameter is the correlation between two cells V_{1} and V_{2} (at positions x_{1} and x_{2}) and the correlation function at sample size. It is natural in the context we are interested in to assume that these two quantities are on the same order.
We further assume we can use the meanfield approximation for the two cells V_{1} and V_{2}. This is not a priori the case for the density in the whole sample, however. We therefore derive the results without this approximation. The general expression is then
with the consistency relations
We therefore derive the expression of φ(λ_{s}, λ_{1}, λ_{2}) up to linear order jointly in , ξ(x_{s}, x_{1}), ξ(x_{s}, x_{1}) and ξ_{12} ≡ ξ(x_{1}, x_{2}). At zeroth order, τ(x_{s}) vanishes and
with a similar relation for . At linear order, we have
where
The resulting cumulantgenerating function reads
where the first three terms are at zeroth order and the last three are at linear order. Using the previous expression, we obtain
This relation is used to derive the expression of the sample bias functions in the next section.
B.3. Second order in the meanfield approximation
Results of Sect. B.1 can naturally be extended to any order in the crosscell correlation function in the context of the treehierarchical models (as illustrated on Fig B.1). Up to second order, it takes the form
Fig. B.1. Diagrammatic visualization of the bias functions. The function φ_{0}(λ) is the generating function of all trees within one cell, φ_{1}(λ) of all trees within one cell with one external lines, and φ_{2}(λ) with two external lines. The resulting connected diagrams up to second order in ξ_{12} are thus those presented here. Two φ_{2} generating functions cannot be conntected as that would induce a loop contribution. This reflects the underlying tree structure. 
where the function φ_{2}(λ) takes the form
This last expression can be directly obtained through a perturbative expansion as presented in the previous subsection^{12}.
In case of the minimal tree model, this perturbative expansion can be directly compared with exact meanfield results^{13}. This is shown in Fig. B.2. It shows that for a large regime in λ, the relation (B.15) provides a very accurate description of the joint cumulantgenerating function down to a distance corresponding to overlapping cells. For overlapping cells, relation (B.15) continues to be accurate except for high values of λ. In general,
Fig. B.2. Joint CGF as a function of d and for different values of λ_{1}, λ_{2}: from bottom to top, λ_{1} = λ_{2} = −0.1, λ_{1} = λ_{2} = 0.2, λ_{1} = λ_{2} = 0.5. The solid blue lines correspond to the twocell meanfield expression, (C.13). The predictions given in Eqs. (B.2) and (B.15) are shown as dashed red lines and dotted black lines. The shaded area is the region of overlapping cells. 
when d → 0 (more accurate results in case of the minimal tree model are given in Appendix D.).
Using (B.15), we then derive corrective perturbative terms to the joint density PDF. More specifically, we have
where
For a sample with periodic boundary conditions, the average of ξ_{12} vanishes, which a priori makes the other terms the leading contributors to the covariance elements. Equation (B.18) can be written as a sum of symmetric factorized terms,
showing the eigenstructure of the resulting matrix and showing that it defines three different eigendirections at most.
B.4. Relative density joint PDFs and bias functions
We wish to compute the joint PDF of the density when expressed in terms of the survey average density ρ_{s}. In order to do so, we consider the joint density P(ρ_{s}, ρ_{i}, ρ_{j}), where ρ_{s} is the density in the sample and ρ_{i} and ρ_{j} are the densities in two cells at distance d.
We wish to compute the joint probability distribution function of and , defined as
and the joint distribution functions of and , defined as
From these changes of variables, we have
and
Similarly, we also have
and
We continue the calculations by expressing the joint PDF with the help of inverse Laplace transforms,
As a result,
after integration over ρ_{s}. The latter expressions can be expressed as
In a similar manner, we can obtain the form of the joint PDF for ,
We then use the relation (B.14) to compute the form of these functions.
Noting that the expressions , ∫dx_{0} ξ(x_{0}, x_{1}) take all the same averaged value when integrated over the sample, which we note , then at linear order in ,
At the same order, we then have
Combining both the expressions of and and expanding all terms at linear order in , we obtain
This leads to the definition of the first sample bias function,
which can be reexpressed in terms of the densitybias function defined in Eq. (18) and the derivative of with respect to
The second samplebias function can be obtained in a similar manner. We indeed have
which eventually leads to
B.5. Response to a change in amplitude in ξ
A close notion related to the densitybias function is how the PDF is changed when the parameters of the simulations are changed. In particular for tree models, the statistical properties are entirely determined by the amplitude of the twopoint function, for instance, at cell size. This dependence can be made explicit by writing Eq. (17) as
after the change of variable and function,
where then the expression does not depend on (only on the functional form of ζ). It follows that
This expression can be used to defined the function b_{ξ}(ρ_{i}) as
It appears that b_{ξ}(ρ_{i}) is very similar to b_{s1}(ρ_{i}), but the two are not equal in general.
B.6. Close cell results
B.6.1. CGF for 2 close cells
Saddle point approximation and close cell results. In case of two cells, the general system in the meanfield approximation leads to
and
We are interested here in the behavior of φ(λ_{1}, λ_{2}) when the two cells are close, that is, when . When , τ_{1} and τ_{2} are also equal, making φ(λ_{1}, λ_{2}) a sole function of λ_{1} + λ_{2} and therefore forcing the joint PDF to be proportional to δ_{Dirac}(ρ_{i} − ρ_{j}). To be more precise, in this regime, , δ_{ρ} ≡ (ρ_{i} − ρ_{j})/2 is expected to be distributed with a width of about . This suggests that in this limit, the difference λ_{1} − λ_{2} should be treated as a large quantity of about . The limit behavior of the joint CGF can then be explicitly computed in terms of
In this limit, we obtain
leading to
and
The joint PDF of ρ_{m} = (ρ_{i} + ρ_{j})/2 and δ_{ρ} then reads
for which there is in general no closed form. We then need to rely on approximation schemes to complete the calculations.
B.6.2. Saddle point approximation
One of the approximations that can be used to evaluate Eq. (B.58) is to use the saddle point approximation. It has been used in the literature to compute the PDF (see Balian & Schaeffer 1989; Bernardeau 1992; Valageas 2002; Codis et al. 2016a). It is a priori valid when is small (and not for too high values of the density). In this approach, the expression under the exponential is approximated by a quadratic form at its minimum. In practice, the latter is obtained from the resolution of the system
which leads to the implicit or explicit values of λ, μ, and τ at the saddle point position (we hereafter denote this with the subscript s),
It is to be noted here that the value of τ_{s} is independent of δ_{ρ}. At the saddle point position, we then have
This then suggests that the joint PDF is given by the product of the onepoint PDF of ρ_{m} and a Gaussian distribution of δ_{ρ} of width . For usual models, as described above, ζ′(τ_{s})^{2} scales like a power of ρ_{m} so that one suggested form for the joint PDF is the following:
Interestingly, the value of α can be related to the reduced skewness of the density field from the computation of ⟨δ_{ρ}^{2}ρ_{m}⟩_{c}, and in the context of tree hierarchical models, it leads to
The validity of this form clearly ought to be checked. Its simplicity nonetheless offers a good grasp of the contribution of close cells to the covariance matrix.
Appendix C: Minimal tree model
In the previous section, general formulae were given. The aim of this section is to account for more precise results obtained in the case of a specific hierarchical model, namely the minimal tree model, as described below. It can then serve as a toy model for the construction of the approximate form for the covariance matrix. We first recall that this model describes the Rayleigh Levy flights model.
C.1. Onepoint results in the meanfield approximation
The RayleighLevy flight model makes it possible to build synthetic samples whose statistical properties follow the minimal model, that is, a hierarchical model with the following vertexgenerating function:
In the onecell meanfield approximation, we have the equation
which can be solved in
which leads to the following expression for the CGF:
The onepoint PDF of the density can then be computed explicitly. It takes the form in the continuous limit of
For this particular model, the void probability distribution (VPF) is nonzero even in the continuous limit. We recall here that the general expression of the VPF is given by exp(φ(−N̄)), which for the minimal model leads to
when N̄ → ∞.
The densitybias function can also be computed explicitly. For the minimal model, we have φ_{1}(λ)=φ(λ) so that
for ρ > 0. For this model, the fact that φ_{1}(λ)=φ(λ) implies that
This means that in the case of the minimal model, the densitybias function can be extracted from the functional form of the onepoint PDF as
This is a somewhat remarkable identity (which can be extended to higher orders, as shown below.)
In this case, the secondorder expansion leads to the form φ_{2}(λ) given by
and we note that φ_{p}(λ) all vanish for p ≥ 3.
C.2. Twocell results in the meanfield approximation
These meanfield calculations can be extended to the twocell case. In this case, we have the system
when the two cells are of the same size. This leads to the following expression for the joint CGF:
Remarkably, this expression can be written in terms of the onecell CGF as
This opens the possibility of computing the joint PDF to any order of ξ_{12}. The calculation of this expansion is made simple by the following observations: The corrective terms will make intervene functions of the forms
We further note that
with
We the define c_{n}(ρ) as
We then have on one side
and on the other side
which derives from the fact that
after integration by parts. As a result, the expression of the join PDF to any order can be written as polynomials making intervene P(ρ_{1}), b(ρ_{1}) P(ρ_{2}) and b(ρ_{1}) only.
C.3. Perturbative expansion for close cells
Another interesting result is when the cell centers are close (so that cells overlap), as described above. In this case, the limit behavior of the joint CGF is given by
with
It is then remarkable to see the result can be expressed with the sole onecell CGF,
In other words, the GFC of the variables ρ_{m} = (ρ_{1} + ρ_{2})/2 and δ_{ρ} = (ρ_{1} − ρ_{2})/2 is given by Eq. (C.26). It is possible to compute the joint PDF,
with the change of variable
The integral in leads to the onecell PDF of the density ρ_{m}, whereas the integral in μ can be done explicitly as it is a quadratic form in μ, leading to a Gaussian distribution in δ_{ρ}. The final PDF is given by
This shows that the joint PDF peaks for ρ_{1} ∼ ρ_{2} with a width that depends on the distance between the cells through the difference . Moreover, this form has no overlapping regime with the previous expansions of the joint PDF. It captures different aspects of the covariance calculations as listed below.

The previous expression says that close cells contribute more specifically to the covariance when ρ_{1} and ρ_{2} are close. This suggests that Eq. (C.29) contributes mostly to the near diagonal terms, whereas off diagonal terms could still be well described by perturbative expansions, as described before.

As noted before, perturbative expansions are closely related to supersample effects. They encode the way in which the local densities are jointly correlated with longwavelength modes. This is not the case in Eq. (C.29). It rather captures how a rare event, such as a peak, can contribute to the covariance elements: if there is a peak somewhere, nearby cells are likely to have a similar density up to distances for which remains small enough.
The above development can be pursued to any order in provided the following recipe is applied:
Then the joint density can be computed to any order in , making use of the very same expressions b_{n}(ρ).
The nexttoleading order in is thus given by
and the expansion can be extended in any (even) order in . Fig. C.1 illustrates the convergence properties of these expansions. Depending on , either the expansion in ξ_{12} or that in gives a very accurate estimate of the joint PDF. It opens the way to computing the covariance matrix starting in the twocell meanfield approximation (C.13).
Fig. C.1. Performances of the perturbative expansions of the joint PDF in the meanfield approximation either for the ξ_{12} expansion (open blue dots) or the expansion (red dots) up to 11th and 10th order, respectively. The comparisons are made for ρ_{1} = ρ_{2} = 2 and (top panel) and for ρ_{1} = 0.5, ρ_{2} = 3.5 and (bottom panel) and for ξ_{12} equalling 0.1, 0.3, 0.5, 0.7, and 0.9. 
C.4. Construction of the theoretical covariance matrix for the minimal tree model
The previous form can be used to compute the covariance matrix for the minimal tree model in simple implementations. It relies on analytic forms for both the twopoint cell correlation functions, which can formally be written as
for a given power spectrum. We also make use of the form P_{s}(r_{d}) given in footnote 2 to derive the PDF of cell distances. We then have all the required ingredients to compute the elements of the covariance matrix in the meanfield approximation,
In practice, is computed from the eighthorder expansion either in ξ_{12} when or in Δ_{ξ} when . This is used to explore the detailed properties of the covariance matrix and the validity of approximate schemes.
C.5. Joined PDF for relative densities
The minimal model allows us also to pursue the computation of the joint PDF for the variables or in all regimes. The first step is to extend Eq. (B.14) to a regime in which ξ_{12} is not assumed to be small. We find that
where φ_{c}(λ_{1}, λ_{2}) is given by
and we can note that
At leading order in ξ_{s}, that is, when we assume that the density fluctuations at sample size are much smaller than at smoothing scale, this expression then reduces to
We can then exploit this relation to compute the and from Eqs. (25) and (31), respectively. We then have at leading order in ξ_{s}
and
To complete the formal calculation of these expressions, we introduce the function
We can first note that Eq. (C.9) can be extended to
This comes from the observation that
can also be written
and that
The final expression of the PDF of the relative densities can then be obtained by noting that applying a multiplicative factor λ_{i} to the momentgenerating function is equivalent to the application of the operator ∂/∂ρ_{i} to the final expression this finally leads to the following forms:
for the and
for . These relations can then be applied to the expressions of the joint density such as found in the previous subsection.
All Figures
Fig. 1. Example of a realization of a RayleighLevy walk. Points mark the end point of each displacement. They are clearly correlated. 

In the text 
Fig. 2. Onepoint density PDF obtained with tophat filters compared with the theoretical predictions, Eq. (59). The values of are 0.8 and 1.09 for the blue and red curves, respectively, corresponding to two different values of l_{0}. The bottom panel shows the residuals. Departure from theory might be due to binning and/or to the finite number of samples. 

In the text 
Fig. 3. Measured variance of the density PDF, i.e., diagonal elements of the covariance matrix, in sets 𝒜 for α = 0.5 and different prescription of the measured density. From left to right, raw density ρ_{i}, scaled density , and scaled density . The blue dots and solid lines are from the meanfield analytical expressions, and the large gold symbols are from the numerical simulations. The dashed black lines are what is expected from the largescale leading contribution. The variance at cell scale is about 1.09, and the variance at sample scale, , is about 0.09. 

In the text 
Fig. 4. Resulting reduced covariance matrix for the three types of observables for set 𝒜. The covariance matrix is dominated by its leading eigenvalue and direction, leading to this typical butterfly shape of the reduced covariance matrix. 

In the text 
Fig. 5. Measured variance of the density PDF obtained for set ). Symbols are the same as in Fig. 3. 

In the text 
Fig. 6. Measured variance of the density PDF, i.e., diagonal elements of the covariance matrix, in sets 𝒜 and comparisons with proposed approximate forms. The yellow line and symbols are the results obtained in the numerical experiments. The dotdashed line is the prediction derived from relation (86), and the dashed gray line shows the prediction from Eq. (85). The dotdashed black lines correspond to the largescale contributions. 

In the text 
Fig. 7. Behavior of the first eigenvector with the same colorcoding as in Fig. 6. The dashed black lines are the largescale prediction, b_{#}(ρ_{i})P(ρ_{i}) appropriately normalized. The size of the data vector is 30. 

In the text 
Fig. 8. Performances of the approximate forms of the covariance matrix in terms of rigenvalues and χ^{2}distributions. Top panel: eigenvalues of the covariance matrices (rebinned into six bins) compared to what can be obtained from the proposed approximate forms; same colorcoding as for Fig. 6. The χ^{2} distributions are shown in the bottom panel. Model (86) reproduces the very same χ^{2} distributions. Model (85), in gray, is not as accurate and tends to slightly overestimate the χ^{2}. This latter behavior is amplified when a larger number of bins is used. 

In the text 
Fig. 9. Scale dependence of the matter correlation functions for a realistic cosmological model (cosmological parameters derived from Plank, Planck Collaboration VI 2020) for the 3D density and the projected density (for a uniformly sampled survey with a depth of about 800 h^{−1}Mpc between z = 0.75 and z = 1.25). The top panel shows (solid blue line) and (dashed red line) for the 3D density field, and the bottom panel shows and for the projected density. In both cases, the average value of the first moment of the twopoint correlation function is dominated by largedistance contributions, whereas shortdistance contributions dominate the second moment, assuming survey sizes of about 100 h^{−1} Mpc or above. 

In the text 
Fig. B.1. Diagrammatic visualization of the bias functions. The function φ_{0}(λ) is the generating function of all trees within one cell, φ_{1}(λ) of all trees within one cell with one external lines, and φ_{2}(λ) with two external lines. The resulting connected diagrams up to second order in ξ_{12} are thus those presented here. Two φ_{2} generating functions cannot be conntected as that would induce a loop contribution. This reflects the underlying tree structure. 

In the text 
Fig. B.2. Joint CGF as a function of d and for different values of λ_{1}, λ_{2}: from bottom to top, λ_{1} = λ_{2} = −0.1, λ_{1} = λ_{2} = 0.2, λ_{1} = λ_{2} = 0.5. The solid blue lines correspond to the twocell meanfield expression, (C.13). The predictions given in Eqs. (B.2) and (B.15) are shown as dashed red lines and dotted black lines. The shaded area is the region of overlapping cells. 

In the text 
Fig. C.1. Performances of the perturbative expansions of the joint PDF in the meanfield approximation either for the ξ_{12} expansion (open blue dots) or the expansion (red dots) up to 11th and 10th order, respectively. The comparisons are made for ρ_{1} = ρ_{2} = 2 and (top panel) and for ρ_{1} = 0.5, ρ_{2} = 3.5 and (bottom panel) and for ξ_{12} equalling 0.1, 0.3, 0.5, 0.7, and 0.9. 

In the text 
Current usage metrics show cumulative count of Article Views (fulltext article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.
Data correspond to usage on the plateform after 2015. The current usage metrics is available 4896 hours after online publication and is updated daily on week days.
Initial download of the metrics may take a while.