Open Access
A&A, Volume 666, October 2022
Article Number: A129
Number of pages: 17
Section: Cosmology (including clusters of galaxies)
DOI: https://doi.org/10.1051/0004-6361/202244065
Published online: 14 October 2022

© E. Keihänen et al. 2022



1. Introduction

The next generation of telescopes for cosmology surveys, such as Euclid (Laureijs et al. 2011), the Vera Rubin Observatory (Ivezić et al. 2019), DESI (DESI Collaboration 2016), or the Nancy Grace Roman Space Telescope (Akeson et al. 2019), will soon greatly improve the quality and quantity of data for galaxy clustering and lensing measurements. Their main aim is to illuminate the dark sector of cosmology, to test Einstein’s gravity on large scales, and to find signatures of the physics of inflation such as primordial non-Gaussianities.

Galaxy clustering is one of our most powerful cosmological probes (Cole et al. 2005; Eisenstein et al. 2005; Alam et al. 2005; Alam et al. 2021). However, galaxies are biased tracers of the nonlinear density field, and their selection is subject to several different effects, such as fluctuations in exposure time, noise level, Milky Way extinction, photometric calibration errors, and sample contamination, among others (e.g., Jasche & Lavaux 2017; Monaco et al. 2019; Kalus et al. 2019; Merz et al. 2021). Thus, their clustering will contain entangled information on matter clustering, galaxy bias, and observational systematics. The uncertainty will be represented by a covariance matrix. The inverse of the covariance matrix, the precision matrix, will be used in the likelihood analysis to infer cosmological parameters and their covariance. An accurate quantification of the clustering covariance under all the sources of uncertainty is therefore of paramount importance for the success of a survey.

It is customary to construct covariance matrices of galaxy clustering by using large samples of hundreds, if not thousands, of mock galaxy catalogs in the past light cone (e.g., Manera et al. 2013; Kitaura et al. 2016). Any known selection effects are applied to the mock catalogs, after which the clustering signal is measured with the same procedure as the one used for the actual data catalog. This sample of clustering measurements makes it possible to construct a brute-force numerical sample covariance. This approach has many advantages. It is conceptually simple, and the covariance built this way is positive-definite by construction. The estimation error of the covariance and its propagation to parameter estimation are well understood (Taylor et al. 2013; Taylor & Joachimi 2014; Hartlap et al. 2007; Dodelson & Schneider 2013; Percival et al. 2014, 2022; Sellentin & Heavens 2016). The sample covariance, however, is computationally expensive to construct; the variance of the covariance estimate decreases proportionally to 1/N, where N is the number of simulations, so getting the error down to a 10% level requires about 100 independent realizations, and ∼10 000 realizations for a 1% level.

This raises two related problems: on the one hand, the production of such mocks, which is typically addressed with approximate methods to bypass the high cost of N-body simulations (Monaco 2016); on the other hand, the measurement of the galaxy clustering of thousands of mocks, which can become a bottleneck for a processing pipeline. Several strategies have been proposed to reduce the cost of covariance estimation. These include precision matrix expansion (Friedrich & Eifler 2018), tapering (Paz & Sánchez 2015), eigenvalue decomposition (Gaztañaga & Scoccimarro 2005), linear shrinkage (Pope & Szapudi 2008), sparse precision matrix estimation (Padmanabhan et al. 2016), and nonlinear shrinkage (Joachimi 2017).

In this work we focus on the estimation of the two-point correlation function (2PCF) and its covariance. In the special case of Gaussian fluctuations, the 2PCF contains all the information on the statistical properties of the galaxy distribution. A concrete example is the European Space Agency's Euclid cosmology mission, and in particular its spectroscopic sample of Hα-emitting galaxies (Euclid Collaboration 2022). This galaxy sample is expected to be as large as 20–30 million galaxies in the redshift range 0.9–1.8. The Euclid Consortium plans to represent the covariance matrix of the 2PCF with a few thousand mock catalogs. The time needed to measure the 2PCF of such a large number of mocks will be a major contributor to the computational cost of the whole pipeline, from raw images to cosmological parameter inference.

The Landy–Szalay estimator (Landy & Szalay 1993) has become the standard estimator in galaxy clustering science. In addition to the actual galaxy catalog, the Landy–Szalay estimator requires a random catalog, which represents a uniform distribution of points within the survey volume considered, modulated with the same weighting and selection as the data catalog. The 2PCF is then built as a combination of data-data (DD), data-random (DR), and random-random (RR) pair counts. The estimator is unbiased in the limit of an infinite random catalog, and, when the fluctuations are small, it yields the minimum-variance solution for the correlation function.

Since the random catalog is usually much larger (in number of objects) than the data catalog, the computational cost of the estimator is dominated by the cost of the RR counts. Glass-like random catalogs (Dávila-Kurbán et al. 2021) have been proposed as a way of reducing the required random catalog size. A straightforward way to reduce the cost is to coadd RR pair counts from a collection of small subcatalogs, rather than counting the pairs in one large catalog, thus omitting pairs between subcatalogs. This natural idea has been applied in many studies without explicit mention, or without assigning a name to it. We refer to this approach as the “split” method, so named because of the idea of “splitting” a large random catalog into several small ones. The term was coined in Keihänen et al. (2019), where the effect of the size of the random catalog on the estimator error was studied in a systematic way, and the effect of the splitting on the estimation error was shown to be negligible. It was also shown that the optimal balance between accuracy and computational cost is achieved when the subcatalogs have the same number of objects as the data catalog.

Even with a split random catalog, most of the computation time goes into the counting of the RR and DR pairs, while the estimation error is dominated by the scatter of the data points. The same applies to the sampling of the covariance matrix, the cost of which is N times that of a single 2PCF estimation. Using a single random catalog for all measurements can reduce the cost of the RR counts, but then counting the DR pairs becomes the next bottleneck.

In this paper we introduce a way of speeding up the covariance estimation, specific to the Landy–Szalay estimator. We aim to show that the covariance matrix for a 2PCF estimate can be constructed using a significantly smaller random catalog than what was used in the construction of the 2PCF itself.

The paper is organized as follows. In Sect. 2 we present the method and its theoretical background. In Sect. 3 we discuss the accuracy of the method, derive a covariance of covariance, and discuss implications for parameter estimation. In Sect. 4 we describe the simulations we used for the validation of the method. In Sect. 5 we present our results, comparing the accuracy and speed of the new method to those of the sample covariance. We give our conclusions in Sect. 6.

This work has made use of the 2PCF code developed by the Euclid Consortium.

2. Method

2.1. Landy–Szalay estimator

We denote the number of objects in the data catalog by Nd, and in the random catalog by Nr. We assume that the two-point correlation function is estimated with the Landy–Szalay estimator, with the additional twist of the “split” option, as follows. The random catalog is split into M subcatalogs, where M is called the split factor. RR pairs are counted within each subcatalog and coadded, but pairs of objects in two distinct subcatalogs are omitted. Each subcatalog must have the same statistical properties and the same sky coverage as the full catalog. In other words, each subcatalog must itself be a valid random catalog. Splitting the random catalog reduces the computational cost of 2PCF estimation significantly, for a negligible loss of accuracy (Keihänen et al. 2019). The optimal split factor has been shown to be M = Nr/Nd, that is, the random catalog is split into subcatalogs of the same size as the data catalog. For fixed Nr, this minimizes the variance of the correlation function for a given computation time, or minimizes the computation time required to reach a given target variance. From here on we parametrize the size of the random catalog as Nr = MNd.

The random catalog is usually constructed to be significantly larger than the data catalog, so that the estimation error is dominated by the scatter of the data points rather than that of the random points. In this work we use as our baseline M = 50, the value adopted for the Euclid galaxy clustering study.
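As an illustration of the splitting described above, the following is a minimal Python sketch (ours, not the Euclid pipeline code; the names are illustrative) that divides one large random catalog into M equal-sized subcatalogs, each of which retains the full volume coverage:

```python
import numpy as np

def split_random_catalog(randoms, M, seed=0):
    """Split a random catalog into M subcatalogs of (nearly) equal size.

    randoms : array of shape (Nr, 3) with the random point coordinates
    M       : split factor; for the optimal choice M = Nr/Nd each
              subcatalog has as many objects as the data catalog
    Because the points are uniformly distributed, a random permutation
    guarantees that every subcatalog is itself a valid random catalog
    covering the full survey volume.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(randoms))
    return np.array_split(randoms[idx], M)
```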

The Landy–Szalay estimator is

$$ \begin{aligned} \hat{\xi }(\mathbf r ) := \frac{\mathrm{dd} (\mathbf r )-2\,\mathrm{dr} (\mathbf r )}{\mathrm{rr} (\mathbf r )}+1 , \end{aligned} $$(1)

where dd(r), dr(r), and rr(r) denote the normalized data-data, data-random, and random-random pair counts in a separation bin. Following the notation of Keihänen et al. (2019), we use the vector r to denote the generalized separation vector. Vector r may refer to a physical separation distance, or, more generally, to an arbitrary bin in 1D, 2D, or 3D space. The normalized data-data pair count is

$$ \begin{aligned} \mathrm{dd} (\mathbf r ) := \frac{\mathrm{DD} (\mathbf r )}{N_{\rm d}(N_{\rm d}-1)/2} , \end{aligned} $$(2)

where DD(r) is the unnormalized pair count. This is unaffected by the split. Similarly, the normalized data-random count is given by

$$ \begin{aligned} \mathrm{dr} (\mathbf r ) := M^{-1}\sum _{i=1}^M \frac{\mathrm{DR} _i(\mathbf r )}{N_{\rm d}^2} = M^{-1}\sum _{i=1}^M \mathrm{dr} _i(\mathbf r ) , \end{aligned} $$(3)

where DRi is the pair count between the data catalog and the ith random subcatalog. Since the dependence on the random catalog is linear, this too is unaffected by the split.

The normalized random-random count with split can be written as

$$ \begin{aligned} \mathrm{rr} (\mathbf r ) := M^{-1}\sum _{i=1}^M \frac{\mathrm{RR} _i(\mathbf r )}{N_{\rm d}(N_{\rm d}-1)/2} = M^{-1}\sum _{i=1}^M \mathrm{rr} _i(\mathbf r ) , \end{aligned} $$(4)

where RRi is the unnormalized pair count from the ith subcatalog. With this notation, the split Landy–Szalay estimator becomes

$$ \begin{aligned} \hat{\xi }(\mathbf r ) = \frac{\mathrm{dd} (\mathbf r )-\frac{2}{M}\sum _i \mathrm{dr} _i(\mathbf r )}{M^{-1}\sum _i \mathrm{rr} _i(\mathbf r )}+1 . \end{aligned} $$(5)

We use the hat ($\hat\xi$) to indicate that this is an estimate of the true correlation function ξ.

The computational cost of the estimator is roughly proportional to the total number of pairs counted. The cost of the split estimator is proportional to $\tfrac{1}{2}N_{\mathrm{d}}^2(1+3M)$, while without the split the cost grows proportionally to $\tfrac{1}{2}N_{\mathrm{d}}^2(1+2M+M^2)$.
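To make the estimator concrete, here is a minimal Python sketch of Eqs. (1)–(5), assembling the split Landy–Szalay estimate from precomputed, unnormalized pair counts. It assumes that each random subcatalog contains Nd objects (the optimal split); the function and argument names are ours:

```python
import numpy as np

def split_landy_szalay(DD, DR_list, RR_list, Nd):
    """Split Landy-Szalay estimate (Eq. (5)) from unnormalized pair counts.

    DD      : array of DD counts per separation bin
    DR_list : list of M arrays, DR counts against each random subcatalog
    RR_list : list of M arrays, RR counts within each random subcatalog
    Nd      : number of data objects; each subcatalog is assumed to hold
              Nd random points as well (M = Nr/Nd)
    """
    norm_dd = Nd * (Nd - 1) / 2.0
    dd = np.asarray(DD) / norm_dd                                       # Eq. (2)
    dr = np.mean([np.asarray(DR) / Nd**2 for DR in DR_list], axis=0)    # Eq. (3)
    rr = np.mean([np.asarray(RR) / norm_dd for RR in RR_list], axis=0)  # Eq. (4)
    return (dd - 2.0 * dr) / rr + 1.0                                   # Eq. (5)
```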

2.2. Covariance

We consider the estimated correlation function in two distance bins r1 and r2, which may or may not be the same. We want to estimate the covariance

$$ \begin{aligned} \mathrm{cov}\left[\hat{\xi }(\mathbf r _1),\hat{\xi }(\mathbf r _2)\right] := \left\langle \left[\hat{\xi }(\mathbf r _1) -\langle \hat{\xi }(\mathbf r _1)\rangle \right] \left[ \hat{\xi }(\mathbf r _2) -\langle \hat{\xi }(\mathbf r _2)\rangle \right] \right\rangle . \end{aligned} $$(6)

The brackets ⟨⟩ denote an average over an infinite ensemble of data realizations, for fixed cosmology and survey geometry. Since we consider the actual measured correlation function to represent one such realization, the covariance is a measure of the statistical uncertainty in the measured correlation function.

Assume we have N mock catalogs and corresponding random catalogs, with the same sky coverage, masking, etc. as the actual survey catalog. Let $\hat\xi_i(\mathbf r)$, $i=1\ldots N$, denote the set of correlation functions estimated from these mocks. An unbiased estimate of the covariance is given by the sample covariance

$$ \begin{aligned} \hat{{\mathcal{C} }}\left[\hat{\xi }(\mathbf r _1),\hat{\xi }(\mathbf r _2)\right] := \frac{1}{N-1} \sum _{i=1}^N \left[\hat{\xi }_i(\mathbf r _1)-\bar{\xi }(\mathbf r _1)\right] \left[\hat{\xi }_i(\mathbf r _2)-\bar{\xi }(\mathbf r _2)\right] , \end{aligned} $$(7)

where

$$ \begin{aligned} \bar{\xi }(\mathbf r ) := \frac{1}{N} \sum _{i=1}^{N} \hat{\xi }_i(\mathbf r ) \end{aligned} $$(8)

is the estimated mean. The required number of mocks depends on the accuracy requirement of the application in question. For few-percent-level accuracy in the parameter error bars, N ≳ 1000 is required.
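A minimal sketch of the sample covariance of Eqs. (7) and (8) for a stack of 2PCF estimates (one per mock); this is equivalent to numpy.cov with the unbiased 1/(N-1) normalization, and the names are illustrative:

```python
import numpy as np

def sample_covariance(xi):
    """Sample covariance (Eq. (7)) of N correlation-function estimates.

    xi : array of shape (N, nbins), one estimated 2PCF per mock catalog
    """
    xi = np.asarray(xi)
    xbar = xi.mean(axis=0)                   # estimated mean, Eq. (8)
    diff = xi - xbar
    return diff.T @ diff / (len(xi) - 1)     # unbiased 1/(N-1) normalization
```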

Throughout this paper we use a convention where the symbol $\hat{\mathrm{C}}$ (with a hat) denotes a numerical estimate of a covariance, and either cov or C denotes the true (ensemble average) covariance. Specifically, $\hat{\mathcal{C}}$ is reserved for the sample covariance estimate, constructed as in Eq. (7) (see Table 1).

Table 1.

Symbols used in this work.

The computational cost of constructing the sample covariance is, obviously, N times the cost of a single 2PCF estimate. In this paper we show that a given level of accuracy can be reached with a significantly lower CPU cost. To this end, we now break the covariance of the Landy–Szalay estimator into pair-count covariances.

Following the notation of the Landy–Szalay paper, we write

$$ \begin{aligned}&\mathrm{dd} (\mathbf r ) = \langle \mathrm{dd} (\mathbf r )\rangle \left[1+\alpha (\mathbf r )\right] , \nonumber \\&\mathrm{dr} _i(\mathbf r ) = \langle \mathrm{dr} (\mathbf r )\rangle \left[1+\beta _i(\mathbf r )\right] , \\&\mathrm{rr} _i(\mathbf r ) = \langle \mathrm{rr} (\mathbf r )\rangle \left[1+\gamma _i(\mathbf r )\right] . \nonumber \end{aligned} $$(9)

The brackets ⟨⟩ indicate an ensemble average. Thus α, βi, γi capture the variation of the pair counts around their average. By definition, ⟨α⟩ = ⟨β⟩ = ⟨γ⟩ = 0. Inserting these into the Landy–Szalay estimator yields

$$ \begin{aligned}&\hat{\xi }(\mathbf r ) = \frac{\langle \mathrm{dd} (\mathbf r ) \rangle \left[1+\alpha (\mathbf r )\right] }{\langle \mathrm{rr} (\mathbf r ) \rangle \left[1+M^{-1}\sum _i\gamma _i(\mathbf r )\right] } \nonumber \\&\quad \qquad -2\frac{\langle \mathrm{dr} (\mathbf r ) \rangle \left[1+M^{-1}\sum _i\beta _i(\mathbf r )\right] }{\langle \mathrm{rr} (\mathbf r ) \rangle \left[1+M^{-1}\sum _i\gamma _i(\mathbf r )\right] } +1 . \end{aligned} $$(10)

For the ensemble averages it holds that ⟨dr(r)⟩ = ⟨rr(r)⟩ and ⟨dd(r)⟩ = d(r)⟨rr(r)⟩, where we define

$$ \begin{aligned} d(\mathbf r ) := 1+\xi (\mathbf r ) . \end{aligned} $$(11)

If the RR counts are large, as is usually the case, then $\gamma_i(\mathbf r)\ll1$, and $\left[1 + M^{-1}\sum_i\gamma_i(\mathbf r)\right]^{-1} \approx 1 - M^{-1}\sum_i\gamma_i(\mathbf r)$. Assuming α, β, γ ≪ 1, we can drop the quadratic terms, and the estimator becomes

$$ \begin{aligned}&\hat{\xi }(\mathbf r ) \approx d(\mathbf r ) \left[1+\alpha (\mathbf r ) -M^{-1}{\textstyle {\sum }}_i\gamma _i(\mathbf r )\right] \nonumber \\&\qquad -2\left[1+M^{-1}{\textstyle {\sum }}_i\beta _i(\mathbf r ) -M^{-1}{\textstyle {\sum }}_i\gamma _i(\mathbf r )\right] +1 \end{aligned} $$(12)

and the ensemble average is $\langle\hat\xi(\mathbf r)\rangle = d(\mathbf r)-1$. The deviation from the ensemble average is

$$ \begin{aligned}&\hat{\xi }(\mathbf r ) -\langle \xi (\mathbf r )\rangle \\&\quad \approx d(\mathbf r ) \left[\alpha (\mathbf r ) -M^{-1}{\textstyle {\sum }}_i\gamma _i(\mathbf r ) \right] -2M^{-1}\left[ {\textstyle {\sum }}_i\beta _i(\mathbf r ) -{\textstyle {\sum }}_i\gamma _i(\mathbf r ) \right] . \nonumber \end{aligned} $$(13)

Inserting Eq. (13) into Eq. (6) yields a combination of cross-correlation terms between α, β, γ. We now consider each of them in turn, and make use of our knowledge of their statistical properties to calculate the expectation values.

Let us begin with the term

$$ \begin{aligned} M^{-2}\sum _{ij} \langle \gamma _i(\mathbf r _1)\gamma _j(\mathbf r _2)\rangle . \end{aligned} $$(14)

Indices i, j label independent random subcatalogs, all of which are statistically identical. We therefore have ⟨γi(r1)γj(r2)⟩ = 0 for i ≠ j, and ⟨γi(r1)γi(r2)⟩ = ⟨γ(r1)γ(r2)⟩, where we drop the subscript to indicate an ensemble average that is the same for all subcatalogs. The covariance element becomes

$$ \begin{aligned} M^{-2}\sum _{ij} \langle \gamma _i(\mathbf r _1)\gamma _j(\mathbf r _2)\rangle = M^{-1}\langle \gamma (\mathbf r _1)\gamma (\mathbf r _2)\rangle . \end{aligned} $$(15)

Based on similar arguments we can write

$$ \begin{aligned} M^{-1}\sum _{i} \langle \alpha (\mathbf r _1)\beta _i(\mathbf r _2)\rangle = \langle \alpha (\mathbf r _1)\beta (\mathbf r _2)\rangle \end{aligned} $$(16)

and

$$ \begin{aligned} M^{-1}\sum _{i} \langle \alpha (\mathbf r _1)\gamma _i(\mathbf r _2)\rangle = \langle \alpha (\mathbf r _1)\gamma (\mathbf r _2)\rangle \, . \end{aligned} $$(17)

Assuming that the random catalog and the data catalog are independent would allow us to drop the ⟨α(r1)γi(r2)⟩ terms. This is, however, not necessarily always true. If the characteristics of the observed data catalog (mask, selection function) are used in the generation of the random catalog, a correlation may arise between the data catalog and the random catalog. Although such correlations are likely to be small, the assumption of independence is not needed for the method we are developing, and we therefore do not make it.

When dealing with the terms involving β and γ, we split the sums into i = j and i ≠ j parts, to obtain

$$ \begin{aligned}&M^{-2}\sum _{ij} \langle \beta _i(\mathbf r _1)\beta _j(\mathbf r _2)\rangle \nonumber \\&\qquad =M^{-1}\langle \beta (\mathbf r _1)\beta (\mathbf r _2)\rangle +\left(1-M^{-1}\right) \langle \beta (\mathbf r _1)\beta (\mathbf r _2)\rangle _{\rm cr} . \end{aligned} $$(18)

Here the subscript cr (“cross”) indicates that we are dealing with DR counts that involve two distinct random subcatalogs, which are nevertheless correlated through the shared data catalog.

Based on similar arguments, we obtain:

$$ \begin{aligned}&M^{-2}\sum _{ij} \langle \beta _i(\mathbf r _1)\gamma _j(\mathbf r _2)\rangle \nonumber \\&\qquad =M^{-1}\langle \beta (\mathbf r _1)\gamma (\mathbf r _2)\rangle +\left(1-M^{-1}\right) \langle \beta (\mathbf r _1)\gamma (\mathbf r _2)\rangle _{\rm cr} . \end{aligned} $$(19)

As in the case of ⟨αγ⟩, assuming independence between the random catalog and the data catalog would allow us to drop the second term, but again this assumption is not needed for the method under discussion.

We introduce a more concise notation, where we drop the arguments r1 and r2, and each term is interpreted as a symmetrized version of itself. When d is involved, it is paired with the first element. For instance, $d\langle\alpha\beta\rangle$ is to be read as

$$ \begin{aligned} d\langle \alpha \beta \rangle = \frac{1}{2} \left[ d(\mathbf r _1)\langle \alpha (\mathbf r _1)\beta (\mathbf r _2)\rangle + d(\mathbf r _2)\langle \alpha (\mathbf r _2)\beta (\mathbf r _1)\rangle \right] , \end{aligned} $$(20)

and similarly for the other pairs. In this notation, the covariance takes the form

$$ \begin{aligned}&\mathrm{cov}\left[\hat{\xi }(\mathbf r _1),\hat{\xi }(\mathbf r _2);M\right] \nonumber \\&\quad = d^2\left[ \langle \alpha \alpha \rangle +M^{-1}\langle \gamma \gamma \rangle -2\langle \alpha \gamma \rangle \right] \nonumber \\&\qquad -4d\left[ \langle \alpha \beta \rangle -\langle \alpha \gamma \rangle -M^{-1}\langle \gamma \beta \rangle -(1-M^{-1})\langle \gamma \beta \rangle _{\rm cr} +M^{-1}\langle \gamma \gamma \rangle \right] \nonumber \\&\qquad +4 \bigg [ M^{-1}\langle \beta \beta \rangle +(1-M^{-1}) \langle \beta \beta \rangle _{\rm cr} \nonumber \\&\qquad -2M^{-1}\langle \beta \gamma \rangle -2(1\!-\!M^{-1})\langle \beta \gamma \rangle _{\rm cr} +M^{-1}\langle \gamma \gamma \rangle \bigg ] , \end{aligned} $$(21)

where the third argument (M) indicates the size of the random catalog.

We have expressed the Landy–Szalay covariance in terms of pair-count covariances. We now arrive at an observation that is central to the method we are developing. Every term in Eq. (21) is either independent of M, or proportional to $M^{-1}$. The covariance is thus of the form

$$ \begin{aligned} \mathrm{cov}\left[\hat{\xi }(\mathbf r _1),\hat{\xi }(\mathbf r _2);M\right] = \mathrm{A}(\mathbf r _1,\mathbf r _2)+\frac{1}{M} \mathrm{B}(\mathbf r _1,\mathbf r _2) . \end{aligned} $$(22)

Suppose we know the covariance for two distinct random-catalog sizes M = Ma and M = Mb > Ma. We readily see that Eq. (22) holds when

$$ \begin{aligned}&\mathrm{A}(\mathbf r _1,\mathbf r _2) = \frac{M_b}{M_b-M_a} \mathrm{cov}\left[\hat{\xi }(\mathbf r _1),\hat{\xi }(\mathbf r _2);M_b\right] \nonumber \\&\qquad \qquad -\frac{M_a}{M_b-M_a}\mathrm{cov}\left[\hat{\xi }(\mathbf r _1),\hat{\xi }(\mathbf r _2);M_a\right] \nonumber \\&\mathrm{B}(\mathbf r _1,\mathbf r _2) = \frac{M_aM_b}{M_b-M_a} \Big \{\mathrm{cov}\left[\hat{\xi }(\mathbf r _1),\hat{\xi }(\mathbf r _2);M_a\right] \nonumber \\&\qquad \qquad \qquad \quad -\mathrm{cov}\left[\hat{\xi }(\mathbf r _1),\hat{\xi }(\mathbf r _2);M_b\right] \Big \}. \end{aligned} $$(23)

To construct the covariance for an arbitrary value of M, it is sufficient to estimate the covariance for two smaller random-catalog sizes Ma and Mb. This is much cheaper than running the estimator with a large M. Equations (22) and (23) can then be used to construct the covariance for the actual random catalog size. This is the basic idea behind our proposed method.
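In code form, the decomposition of Eqs. (22) and (23) amounts to solving a small linear system elementwise. The sketch below (our own notation) recovers A and B from covariances estimated at two random-catalog sizes and extrapolates to any target M, including M → ∞:

```python
import numpy as np

def solve_components(cov_Ma, cov_Mb, Ma, Mb):
    """Solve Eq. (23) for the M-independent part A and the 1/M part B,
    given covariance matrices estimated with split factors Ma < Mb."""
    A = (Mb * cov_Mb - Ma * cov_Ma) / (Mb - Ma)
    B = Ma * Mb * (cov_Ma - cov_Mb) / (Mb - Ma)
    return A, B

def extrapolated_cov(A, B, M):
    """Covariance for an arbitrary split factor M (Eq. (22)).
    Passing M = np.inf gives the infinite-random-catalog limit A."""
    return A.copy() if np.isinf(M) else A + B / M
```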

2.3. Linear construction

We now consider how Ma and Mb should be chosen. The largest reduction in computational cost is obtained with Ma = 1 and Mb = 2. With Mb = M the method reduces to the conventional sample covariance.

We now argue in favor of selecting Mb = 2Ma. This allows us to use the random catalogs efficiently, as we now explain. For each mock data catalog we generate two independent random catalogs of the same size Ma. We evaluate the M = Ma covariance with each of the two sets separately and take the average. This is the Ma covariance that enters formula (23). For the 2Ma covariance, we take the combined data of the two Ma random catalogs. This procedure reduces the scatter of the estimate, compared to generating a new 2Ma random catalog, since correlated fluctuations cancel. We return to this point in Sect. 3, where we consider the accuracy of the estimated covariance quantitatively. To further save CPU time, we save the DR and RR pair counts from the M = Ma runs and construct the M = 2Ma 2PCF from the saved pair counts, avoiding the CPU cost of another run with M = 2Ma. Throughout the rest of this paper we set Mb = 2Ma.

Our method is summarized formally as follows. We denote by $\hat{\mathrm{C}}_{ij}$ an estimated covariance between two elements $\xi(\mathbf r_i), \xi(\mathbf r_j)$ of the correlation function. Indices i, j may refer to different distance bins, or to elements picked from different multipoles.

We denote the correlation function estimated from a data catalog D and a random catalog R as $\hat\xi(\mathrm{D},\mathrm{R})$, and the sample covariance over the set of mocks as $\hat{\mathcal{C}}_{ij}(\hat\xi)$. We now have one data catalog D and two random catalogs R1 and R2, both of size Ma. We construct estimates for the covariances with M = Ma and M = 2Ma (which we denote by $\hat{\mathrm{C}}^a_{ij}$ and $\hat{\mathrm{C}}^b_{ij}$, respectively) as

$$ \begin{aligned}&\hat{\mathrm{C}}^a_{ij} := {\textstyle \frac{1}{2}}\hat{{\mathcal{C} }}_{ij}\left[\hat{\xi }(\mathrm{D},\mathrm{R}_1)\right] +{\textstyle \frac{1}{2}}\hat{{\mathcal{C} }}_{ij}\left[\hat{\xi }(\mathrm{D},\mathrm{R}_2)\right] \nonumber \\&\hat{\mathrm{C}}^b_{ij} := \hat{{\mathcal{C} }}_{ij}\left[\hat{\xi }(\mathrm{D},\mathrm{R}_1\cup \mathrm{R}_2)\right] . \end{aligned} $$(24)

From these we construct two component matrices

$$ \begin{aligned}&\hat{\mathrm{A}}_{ij} := 2\hat{\mathrm{C}}^b_{ij} -\hat{\mathrm{C}}^a_{ij} \nonumber \\&\hat{\mathrm{B}}_{ij} := 2M_a\left[\hat{\mathrm{C}}^a_{ij} -\hat{\mathrm{C}}^b_{ij}\right] \end{aligned} $$(25)

and the final covariance as

$$ \begin{aligned} \hat{\mathrm{C}}^\mathrm{LC}_{ij}:= \hat{\mathrm{A}}_{ij} +M^{-1}\hat{\mathrm{B}}_{ij} . \end{aligned} $$(26)

We refer to the covariance of Eq. (26) as the linear construction (LC) covariance.
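The whole construction of Eqs. (24)–(26) then reads, in a schematic Python form (np.cov provides the sample covariance of Eq. (7); the ξ arrays and names are ours):

```python
import numpy as np

def lc_covariance(xi_R1, xi_R2, xi_R12, Ma, M):
    """Linear-construction (LC) covariance, Eqs. (24)-(26).

    xi_R1, xi_R2 : (N, nbins) arrays of 2PCFs estimated with the two
                   independent random catalogs R1 and R2 of size Ma
    xi_R12       : (N, nbins) array of 2PCFs built from the combined
                   pair counts of R1 and R2 (effective size 2*Ma)
    Ma, M        : small and target random-catalog sizes, in units of Nd
    """
    C_a = 0.5 * (np.cov(xi_R1, rowvar=False) + np.cov(xi_R2, rowvar=False))
    C_b = np.cov(xi_R12, rowvar=False)
    A = 2.0 * C_b - C_a                # Eq. (25)
    B = 2.0 * Ma * (C_a - C_b)
    return A + B / M                   # Eq. (26)
```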

The computational complexity of the Landy–Szalay estimator is roughly proportional to $\tfrac{1}{2}N_{\mathrm{d}}^2(1+3M)$, of which $\tfrac{1}{2}N_{\mathrm{d}}^2$ goes to DD pairs, $N_{\mathrm{d}}^2M$ to DR pairs, and $\tfrac{1}{2}N_{\mathrm{d}}^2M$ to RR pairs. The cost of our proposed approach is $2\cdot\tfrac{1}{2}N_{\mathrm{d}}^2(1+3M_a)$ per realization. This assumes the 2PCF estimation code is run twice, which involves counting the DD pairs twice. If that is avoided, the cost is further reduced to $\tfrac{1}{2}N_{\mathrm{d}}^2(1+6M_a)$. With M = 50 and Ma = 1, the gain with respect to the sample covariance is a factor of 151/7 ≈ 21.6, and it increases with increasing M. Moreover, our result readily yields an extrapolation to infinite M; that is, we know what the estimator variance would be if we could use an infinite number of random points, and how much the variance with a finite M differs from it.

It is important to note that the presented derivation is based on very general assumptions about how the Landy–Szalay estimator is built from pair counts, and on the definition of variance. We make no assumptions about the survey geometry, or about which physical processes cause the scatter in the random counts, nor do we assume Gaussianity. The proposed method is thus valid for a very wide class of galaxy distributions.

Another important aspect to note is that the same procedure can be applied to any linear function of the correlation function. The only requirement is that the decomposition of Eq. (22) remains valid. In particular, the method applies as such to the multipoles $\xi_\ell(r)$ of the correlation function and to the projected correlation function, since both are linear functions of the underlying two-dimensional correlation function. It also applies to a rebinned correlation function.

3. Error analysis

We note that $\hat{\mathrm{C}}^{\mathrm{LC}}$ is a noisy estimate of the underlying true covariance C. Thus it is itself a random variable, and we can define a covariance for it. In the following we analyze the error of the covariance estimate, and derive a covariance of covariance for both the sample covariance and the LC covariance.

3.1. Gaussian distribution

We consider first the general case of four random variables x, y, w, z. For each of these we assume that we have N independent realizations. An unbiased estimate of the covariance C(x, y) is obtained as

$$ \begin{aligned} \hat{{\mathcal{C} }}(x,{ y}) := \frac{1}{N-1} \sum _{i=1}^N (x_i-\bar{x})({ y}_i-\bar{ y}) , \end{aligned} $$(27)

where $\bar x = \frac{1}{N} \sum_{i=1}^{N} x_i$, and similarly for $\bar y$. It can be shown that

$$ \begin{aligned}&\mathrm{cov}\left[\hat{{\mathcal{C} }}(x,{ y}),\hat{{\mathcal{C} }}(z,{ w})\right] = \frac{1}{N} \langle x^{\prime }{ y}^{\prime }z^{\prime }{ w}^{\prime }\rangle -\frac{1}{N}\langle x^{\prime }{ y}^{\prime }\rangle \langle z^{\prime }{ w}^{\prime }\rangle \nonumber \\&\qquad \quad +\frac{1}{N(N-1)} \left(\langle x^{\prime }z^{\prime }\rangle \langle { y}^{\prime }{ w}^{\prime }\rangle +\langle x^{\prime }{ w}^{\prime }\rangle \langle { y}^{\prime }z^{\prime }\rangle \right) , \end{aligned} $$(28)

where x′, y′, z′, w′ represent deviations from the distribution mean, x′ = x − ⟨x⟩. This is a general result that does not assume Gaussianity. For a Gaussian distribution

$$ \begin{aligned} \langle x^{\prime }{ y}^{\prime }z^{\prime }{ w}^{\prime }\rangle =\langle x^{\prime }{ y}^{\prime }\rangle \langle z^{\prime }{ w}^{\prime }\rangle +\langle x^{\prime }z^{\prime }\rangle \langle { y}^{\prime }{ w}^{\prime }\rangle +\langle x^{\prime }{ w}^{\prime }\rangle \langle { y}^{\prime }z^{\prime }\rangle . \end{aligned} $$(29)

If x, y, z, w are Gaussian distributed, Eq. (28) simplifies into

$$ \begin{aligned}&\mathrm{cov}\left[\hat{{\mathcal{C} }}(x,{ y}),\hat{{\mathcal{C} }}(z,{ w})\right] \nonumber \\&\qquad = \frac{1}{N-1} \left[\mathrm{C}(x,z)\mathrm{C}({ y},{ w}) +\mathrm{C}(x,{ w})\mathrm{C}({ y},z) \right]. \end{aligned} $$(30)
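Equation (30) is easy to verify numerically. The toy check below (our own construction, not from the paper) draws repeated sets of N correlated Gaussian samples and compares the scatter of the sample covariance with the analytic prediction:

```python
import numpy as np

rng = np.random.default_rng(0)

# True 2x2 covariance of two Gaussian variables (x, y)
C_true = np.array([[1.0, 0.3],
                   [0.3, 2.0]])
N, ntrials = 200, 5000

chat_xy = np.empty(ntrials)
for t in range(ntrials):
    xy = rng.multivariate_normal(np.zeros(2), C_true, size=N)
    chat_xy[t] = np.cov(xy, rowvar=False)[0, 1]   # sample covariance C^(x, y)

# Eq. (30) with z=x, w=y: var[C^(x,y)] = [C(x,x)C(y,y) + C(x,y)^2] / (N-1)
var_measured = chat_xy.var(ddof=1)
var_predicted = (C_true[0, 0] * C_true[1, 1] + C_true[0, 1]**2) / (N - 1)
print(var_measured, var_predicted)   # the two numbers should agree closely
```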

3.2. Sample covariance

We can readily apply the results from above to the sample covariance. We take x, y, z, w to represent elements of the correlation function, as estimated through the Landy–Szalay estimator. We denote these elements by $\hat\xi_i,\hat\xi_j,\hat\xi_k,\hat\xi_l$. Different indices may refer to different distance bins, or to elements picked from different multipoles.

For brevity we denote the sample covariance by $\hat{\mathrm{C}}^{\mathrm{Smp}}_{ij}\equiv\hat{\mathcal{C}}(\hat\xi_i,\hat\xi_j)$. Assuming that the $\hat\xi$ estimates follow a Gaussian distribution, the covariance of the sample covariance is

$$ \begin{aligned}&\mathrm{cov}\left(\hat{\mathrm{C}}^\mathrm{Smp}_{ij},\hat{\mathrm{C}}^\mathrm{Smp}_{kl}\right) = \frac{1}{N-1} (\mathrm{C}_{ik}\mathrm{C}_{jl} +\mathrm{C}_{il}\mathrm{C}_{jk} ) \nonumber \\&\qquad \qquad \approx \frac{1}{N-1} \left(\hat{\mathrm{C}}^\mathrm{Smp}_{ik}\hat{\mathrm{C}}^\mathrm{Smp}_{jl} +\hat{\mathrm{C}}^\mathrm{Smp}_{il}\hat{\mathrm{C}}^\mathrm{Smp}_{jk}\right) . \end{aligned} $$(31)

In particular for the diagonal elements

$$ \begin{aligned} \mathrm{cov}\left(\hat{\mathrm{C}}^\mathrm{Smp}_{ii},\hat{\mathrm{C}}^\mathrm{Smp}_{kk}\right) =\frac{2}{N-1} \left[\mathrm{C}(\xi _i,\xi _k)\right]^2 \approx \frac{2\left(\hat{\mathrm{C}}^\mathrm{Smp}_{ik}\right)^2}{N-1} . \end{aligned} $$(32)

The 1σ uncertainty of a diagonal element of the sample covariance matrix is a fraction $\sqrt{2/(N-1)}$ of the diagonal element itself. For N = 5000 this gives a 2% error (1σ). The off-diagonal part inherits the correlated structure of the covariance.

3.3. Linear construction

We can make use of Eq. (30) to derive the covariance of covariance for the LC method as well. The basic data sets are now two sets of M = Ma estimates of the correlation function, which we denote by x and x′. Again we assume that x and x′ follow a Gaussian distribution. As described in Sect. 2.3, the M = Ma covariance is constructed as

$$ \begin{aligned} \hat{\mathrm{C}}^a_{ij} = {\textstyle \frac{1}{2}}\left[\hat{{\mathcal{C} }}(x_i,x_j)+\hat{{\mathcal{C} }}(x^{\prime }_i,x^{\prime }_j)\right] . \end{aligned} $$(33)

For M = 2Ma we construct the correlation function from the combined pair counts of the M = Ma case. For large pair counts (when α, β, γ ≪ 1) we have $\hat\xi_i \approx \frac{1}{2}(x_i + x^{\prime}_i)$, as we see from Eq. (12). The covariance is then

$$ \begin{aligned}&\hat{\mathrm{C}}^b_{ij} = \hat{{\mathcal{C} }}\left[{\textstyle \frac{1}{2}}(x_i+x^{\prime }_i),{\textstyle \frac{1}{2}}(x_j+x^{\prime }_j)\right] \nonumber \\&\quad = \frac{1}{4} \left[\hat{{\mathcal{C} }}(x_i,x_j)+\hat{{\mathcal{C} }}(x^{\prime }_i,x^{\prime }_j)+\hat{{\mathcal{C} }}(x_i,x^{\prime }_j) +\hat{{\mathcal{C} }}(x^{\prime }_i,x_j)\right] . \end{aligned} $$(34)

The LC covariance for arbitrary M is constructed as

$$ \begin{aligned} \hat{\mathrm{C}}^\mathrm{LC}_{ij} = \hat{\mathrm{A}}_{ij} +M^{-1}\hat{\mathrm{B}}_{ij} , \end{aligned} $$(35)

where now

$$ \begin{aligned}&\hat{\mathrm{A}}_{ij} = 2\hat{\mathrm{C}}^b_{ij} -\hat{\mathrm{C}}^a_{ij} = {\textstyle \frac{1}{2}}\hat{{\mathcal{C} }}(x_i,x^{\prime }_j)+{\textstyle \frac{1}{2}}\hat{{\mathcal{C} }}(x^{\prime }_i,x_j) \nonumber \\&\hat{\mathrm{B}}_{ij} = 2M_a(\hat{\mathrm{C}}^a_{ij}-\hat{\mathrm{C}}^b_{ij}) \nonumber \\&\quad \ = {\textstyle \frac{1}{2}}M_a\left[\hat{{\mathcal{C} }}(x_i,x_j)+\hat{{\mathcal{C} }}(x^{\prime }_i,x^{\prime }_j) -\hat{{\mathcal{C} }}(x_i,x^{\prime }_j) -\hat{{\mathcal{C} }}(x^{\prime }_i,x_j)\right] . \end{aligned} $$(36)

Here we see the importance of constructing $\hat{\mathrm{C}}^a_{ij}$ and $\hat{\mathrm{C}}^b_{ij}$ from the same pair counts: the auto-correlation terms in $\hat{\mathrm{A}}_{ij}$ cancel out. In terms of x, x′ the LC covariance is then

$$ \begin{aligned}&\hat{\mathrm{C}}^\mathrm{LC}_{ij} = {\textstyle \frac{1}{2}}\left(1-\frac{M_a}{M}\right) \left[\hat{{\mathcal{C} }}(x_i,x^{\prime }_j)+\hat{{\mathcal{C} }}(x^{\prime }_i,x_j)\right] \nonumber \\&\qquad \quad +\frac{M_a}{2M} \left[\hat{{\mathcal{C} }}(x_i,x_j)+\hat{{\mathcal{C} }}(x^{\prime }_i,x^{\prime }_j)\right] . \end{aligned} $$(37)

We are now ready to construct the covariance of the LC covariance. Correlating the four terms of Eq. (37) yields 16 terms, each of which, with the use of Eq. (30), splits further into two terms. Taking into account that x and x′ have identical statistical properties we finally arrive at

$$ \begin{aligned} \mathrm{cov}\left(\hat{\mathrm{C}}^\mathrm{LC}_{ij},\hat{\mathrm{C}}^\mathrm{LC}_{kl}\right) =\frac{1}{N-1} \left[ \mathrm{D}_{ik}\mathrm{D}_{jl} +\mathrm{D}_{il}\mathrm{D}_{jk} +\left(\frac{1}{2M_a}-\frac{1}{M}\right)^2 \left(\mathrm{B}_{ik}\mathrm{B}_{jl}+\mathrm{B}_{il}\mathrm{B}_{jk}\right) \right] , \end{aligned} $$(38)

where we have defined

$$ \begin{aligned} \mathrm{D}_{ik} := \mathrm{A} _{ik}+\frac{1}{2M_a} \mathrm{B} _{ik} , \end{aligned} $$(39)

and A, B are the ensemble-average versions of Eq. (36). Using $\mathrm{C}_{ij} = \mathrm{A}_{ij} + M^{-1}\mathrm{B}_{ij}$, the result can be worked into the alternative form

$$ \begin{aligned}&\mathrm{cov}\left(\hat{\mathrm{C}}^\mathrm{LC}_{ij},\hat{\mathrm{C}}^\mathrm{LC}_{kl}\right) = \frac{1}{N-1}\Big [ \mathrm{C}_{ik}\mathrm{C}_{jl} +\mathrm{C}_{il}\mathrm{C}_{jk} \nonumber \\&\qquad \quad +\left(\frac{1}{2M_a}-\frac{1}{M}\right) (\mathrm{D}_{ik}\mathrm{B}_{jl}+ \mathrm{B}_{ik}\mathrm{D}_{jl} +\mathrm{D}_{il}\mathrm{B}_{jk}+\mathrm{B}_{il}\mathrm{D}_{jk}) \Big ] . \end{aligned} $$(40)

This allows a direct comparison with the sample covariance (Eq. (31)). The first line equals the covariance of the sample covariance. The second line represents additional error due to the reduced random catalog. When M = 2Ma, the LC covariance becomes equivalent to the sample covariance.

For all of the elements of Eqs. (40) or (38) we already have an estimate: $\mathrm{C}_{ik}\approx\hat{\mathrm{C}}^{\mathrm{LC}}_{ik}$, $\mathrm{B}_{ik}\approx\hat{\mathrm{B}}_{ik}$, $\mathrm{D}_{ik}\approx\hat{\mathrm{A}}_{ik}+\hat{\mathrm{B}}_{ik}/(2M_a)$. Thus we have a practical way of estimating the error of the LC covariance estimate.
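For the diagonal elements (i = j = k = l), Eq. (38) then gives a simple recipe for 1σ error bars on the LC covariance. A sketch using the estimated components in place of their ensemble averages (the function name is ours):

```python
import numpy as np

def lc_cov_diag_error(A_hat, B_hat, Ma, M, N):
    """1-sigma error on the diagonal of the LC covariance, from Eq. (38)
    with i=j=k=l, substituting the estimated A_hat, B_hat (Eq. (25))
    for their ensemble averages."""
    D = np.diag(A_hat) + np.diag(B_hat) / (2.0 * Ma)   # Eq. (39), diagonal
    B = np.diag(B_hat)
    var = (2.0 * D**2 + 2.0 * (0.5 / Ma - 1.0 / M)**2 * B**2) / (N - 1)
    return np.sqrt(var)
```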

3.4. Precision matrix and parameter estimation

The covariance matrix provides an account of the uncertainty in the 2PCF estimate. In many applications one is more interested in the inverse covariance, or the precision matrix

$$ \begin{aligned} \Psi := \mathrm{C}^{-1} . \end{aligned} $$(41)

The precision matrix enters a likelihood model, and is an ingredient in a maximum-likelihood parameter estimate. The properties of the precision matrix, when computed from the sample covariance, are relatively well understood. The inverse sample covariance is biased, but the bias can be corrected for with a multiplicative correction factor that only depends on the length of the data vector and on the available number of samples (see Anderson 2003; Hartlap et al. 2007, and references therein).

The effect of the accuracy of the precision matrix on parameter estimation has been studied by Taylor et al. (2013), Taylor & Joachimi (2014), Dodelson & Schneider (2013), Percival et al. (2014, 2022), and Sellentin & Heavens (2016). Taylor et al. (2013) present a remarkably simple result for the variance of the trace of the precision matrix. Dodelson & Schneider (2013) and Percival et al. (2014) compute the expected increase of the estimated parameter error bars due to the propagation of the sampling error of the covariance matrix. The increase is captured in a multiplicative factor that depends on the length of the data vector and on the number of independent parameters. Sellentin & Heavens (2016) use a fully Bayesian approach to incorporate the uncertainty of the estimated covariance into the likelihood function, leading to a more realistic likelihood which takes the form of a t-distribution. To have a clear interpretation of parameter posteriors in the case of a sample covariance matrix, Percival et al. (2022) propose a formulation of Bayesian priors that makes the parameter posteriors match those of the frequentist approach of Dodelson & Schneider (2013) and Percival et al. (2014).

None of these results, unfortunately, generalizes to the LC covariance without further assumptions about the survey characteristics or the parametric model. However, we do have the covariance of covariance, which can be used to assess the impact of the covariance accuracy on a specific application, once the details are known. In the following we present some general observations.

One important aspect to note is that the LC covariance cannot be guaranteed to be positive-definite under all circumstances. This follows from the fact that the component matrix $\hat{\mathrm{A}}$ is constructed as a difference between two numerical covariances. If the actual covariance matrix is close to singular, random fluctuations in the numerical estimate may push the smallest eigenvalues to the negative side. We recommend verifying the eigenvalue spectrum of the matrix before inverting it.

The precision matrix can be expanded as a Taylor series as

$$ \begin{aligned} \hat{\mathrm{C}}^{-1} = (\mathrm{C}+\Delta )^{-1} \approx \mathrm{C}^{-1} -\mathrm{C}^{-1}\Delta \,\mathrm{C}^{-1} +\mathrm{C}^{-1}\Delta \,\mathrm{C}^{-1}\Delta \,\mathrm{C}^{-1} , \end{aligned} $$(42)

where C is the true covariance and Δ the deviation of the estimate from it. The last term is the source of bias in the precision matrix, which exists even if the covariance estimate is unbiased (⟨Δ⟩ = 0). A bias in the precision matrix does not, however, translate into a bias in parameter estimation. The maximum-likelihood parameter estimate (without prior) is given by

$$ \begin{aligned} \hat{p} =(\beta ^T\hat{\mathrm{C}}^{-1}\beta )^{-1}\beta ^T\hat{\mathrm{C}}^{-1}y , \end{aligned} $$(43)

where $\hat p$ (length np) is the vector of estimated parameters, y (length nb) is the data vector, $\hat{\mathrm{C}}$ is the covariance estimate, and

$$ \begin{aligned} \beta _{i\alpha } = \frac{\partial { y}_i}{\partial p_\alpha } \end{aligned} $$(44)

is the linearized data model connecting the parameters to the data. One readily sees that the parameter estimate of Eq. (43) is unbiased regardless of $\hat{\mathrm{C}}$; in particular, if the covariance is biased by a multiplicative factor, the estimate is unaffected. It is therefore more interesting to look at the parameter covariance than at the bias of the precision matrix alone.
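As a reference for the discussion above, the maximum-likelihood estimate of Eq. (43) and the matrix F of Eq. (46) amount to a few lines of linear algebra; this is a generic sketch, not tied to any specific data model:

```python
import numpy as np

def ml_parameter_estimate(y, beta, C_hat):
    """Maximum-likelihood parameter estimate of Eq. (43).

    y     : data vector of length nb
    beta  : (nb, np) matrix of model derivatives d y_i / d p_alpha, Eq. (44)
    C_hat : (nb, nb) covariance estimate (sample or LC)
    Returns the estimated parameters and F = beta^T C^-1 beta (Eq. (46)).
    """
    Cinv = np.linalg.inv(C_hat)
    F = beta.T @ Cinv @ beta
    p_hat = np.linalg.solve(F, beta.T @ Cinv @ y)
    return p_hat, F
```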

Following the example of Hartlap et al. (2007) we now insert the expansion of Eq. (42) into the parameter estimate of Eq. (43). We obtain for the parameter covariance

$$ \begin{aligned}&\langle \delta p\,\delta p^T\rangle = \mathsf{F }^{-1} + \mathsf{F }^{-1} \beta ^T\mathrm{C}^{-1} \langle \Delta \, \mathrm{C}^{-1} \Delta \rangle \mathrm{C}^{-1}\beta \mathsf{F }^{-1} \nonumber \\&\qquad \qquad - \mathsf{F }^{-1} \beta ^T\mathrm{C}^{-1}\langle \Delta \, \mathrm{C}^{-1}\beta \mathsf{F }^{-1} \beta ^T\mathrm{C}^{-1}\Delta \rangle \mathrm{C}^{-1} \beta \mathsf{F }^{-1} , \end{aligned} $$(45)

where

$$ \begin{aligned} \mathsf{F } := \beta ^T\mathrm{C}^{-1}\beta . \end{aligned} $$(46)

If the covariance of covariance is of the general form

$$ \begin{aligned} \langle \Delta _{ij}\Delta _{kl}\rangle = \frac{1}{N-1} (\mathrm{U} _{ik} \mathrm{V} _{jl} + \mathrm{U} _{il} \mathrm{V} _{jk}) \end{aligned} $$(47)

(as is the case for both sample covariance and LC) where U and V are arbitrary matrices, we find

$$ \begin{aligned} \langle \Delta \mathrm{X} \Delta \rangle _{ij} = \frac{1}{N-1}\big [ (\mathrm{U} \mathrm{X} ^T\mathrm{V} )_{ij} +\mathrm{U} _{ij} \mathrm{Tr} (\mathrm{X} ^T\mathrm{V} ) \big ] , \end{aligned} $$(48)

where again X is an arbitrary matrix. We use Eq. (40) in combination with Eqs. (45) and (48) to derive the following result for the parameter covariance:

$$ \begin{aligned}&\langle \delta p\,\delta p^T\rangle = \mathsf{F }^{-1} \left( 1+\frac{n_{\rm d}-n_{\rm p}}{N-1} \right) \nonumber \\&\qquad \qquad + \left({\textstyle \frac{1}{2}}-\frac{M_a}{M}\right) \frac{1}{N-1} \Big \{ \mathsf{F }^{-1}\mathsf{R }\mathsf{F }^{-1} +\mathsf{F }^{-1}\mathsf{R }^T\mathsf{F }^{-1} \nonumber \\&\qquad \qquad -\mathsf{F }^{-1}\mathsf{P }\mathsf{F }^{-1}\mathsf{Q }\mathsf{F }^{-1} -\mathsf{F }^{-1}\mathsf{Q }\mathsf{F }^{-1}\mathsf{P }\mathsf{F }^{-1} \nonumber \\&\qquad \qquad +\mathsf{F }^{-1}\mathsf{P }\mathsf{F }^{-1} \left[\mathrm{Tr}(\mathrm{C}^{-1}\mathrm{D}) -\mathrm{Tr}(\mathsf{F }^{-1}\mathsf{Q })\right] \nonumber \\&\qquad \qquad +\mathsf{F }^{-1}\mathsf{Q }\mathsf{F }^{-1} \left[\mathrm{Tr}(\mathrm{C}^{-1}\mathrm{B}) -\mathrm{Tr}(\mathsf{F }^{-1}\mathsf{P })\right] \Big \} , \end{aligned} $$(49)

where

$$ \begin{aligned}&\mathsf{P } := \beta ^T\mathrm{C}^{-1}\mathrm{B}\,\mathrm{C}^{-1}\beta , \nonumber \\&\mathsf{Q } := \beta ^T\mathrm{C}^{-1}\mathrm{D}\,\mathrm{C}^{-1}\beta , \nonumber \\&\mathsf{R } := \beta ^T\mathrm{C}^{-1}\mathrm{B}\,\mathrm{C}^{-1}\mathrm{D}\,\mathrm{C}^{-1}\beta . \end{aligned} $$(50)

Here $\mathsf{F}^{-1}$ is the parameter covariance in the case where the data covariance C is known exactly. The first term represents the parameter covariance for the sample covariance, a result in line with Dodelson & Schneider (2013). The rest is additional scatter specific to the LC method, and it depends on the parametric model. Again we see that the additional terms vanish when M = 2Ma. Once the parametric model and β are fixed, and one has an estimate of C in the form of the LC covariance, Eq. (49) provides a practical recipe for estimating the parameter covariance.

4. Simulations

4.1. Cosmological mocks

To validate the LC method, we applied it to the computation of the 2PCF covariance matrix of simulated dark matter halo catalogs, and compared it to their sample covariance. We used mock catalogs produced with version 4.3 of the PINOCCHIO (PINpointing Orbit Crossing Collapsed HIerarchical Objects) algorithm (Monaco et al. 2002; Munari et al. 2017). This code is based on Lagrangian perturbation theory, ellipsoidal collapse, and the excursion-set approach. It is able to generate catalogs of dark matter halos, both in periodic boxes and in the past light cone, that closely match the mass function and the clustering of simulated halos without running a full N-body simulation. The particular configuration we used (see Colavincenzo et al. 2019) was run with a ΛCDM cosmology using the parameter values presented in Table 2. The simulation box had sides of length L = 1500 h−1 Mpc sampled with 1000³ particles of mass 2.67 × 10¹¹ h−1 M⊙. The smallest identified halos consisted of 30 particles, which translates to masses of 8.01 × 10¹² h−1 M⊙. The mock catalogs we used correspond to a snapshot of the simulation in a periodic box at redshift z = 1. The 10 000 realizations were run with the same configuration, but with different seeds for the random numbers. As a consequence the number of halos in each PINOCCHIO realization is subject to sample variance. The mean number of halos in a box is 780 789 and varies from box to box by $^{+0.3}_{-0.4}\%$. This corresponds to a number density of 2.3 × 10⁻⁴ (h−1 Mpc)⁻³.

Table 2.

Parameter values for the PINOCCHIO simulation used in our analysis.

The PINOCCHIO mocks contain the halo positions in real space and their peculiar velocities, in a periodic box. To imitate a real survey more closely we mapped the halo positions into redshift space. We worked within the plane-parallel assumption; we constructed a periodic redshift-space box by shifting the halo positions along the x-axis according to the peculiar velocity component along the same axis. In order to compute the correlation function multipoles, we must define the location of the observer with respect to the simulation box. To preserve the plane-parallel assumption, we moved the observation point along the x-axis to a distance of 10⁶ h−1 Mpc from the box.

To mimic the geometry of a tomographic survey with limited redshift coverage we selected a slab-like subset of the full simulation box. The thickness of the slab is L/5 = 300 h−1 Mpc. This geometry is shown in Fig. 1. The mean number of halos in the slab is one fifth of that of the full box (156 158 objects) and varies by ±3%. For the corresponding random catalogs we generated random coordinates homogeneously inside the slab using the random.rand method of the numpy Python library. The number of random points in each slab is M times the number of halos, so the size of each random catalog also varies slightly from realization to realization.
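A minimal sketch of how such a slab random catalog can be generated (the paper uses numpy's random.rand; we assume here, purely for illustration, that the slab thickness is taken along the first coordinate axis, and the variable names are ours):

```python
import numpy as np

L = 1500.0          # box side length, h^-1 Mpc
slab = L / 5.0      # slab thickness, 300 h^-1 Mpc
M = 50              # random-to-data ratio
N_halo = 156158     # number of halos in this realization (varies per mock)

# Homogeneous random points inside the slab; the slab thickness is here
# (an illustrative assumption) taken along the first axis.
N_rand = M * N_halo
randoms = np.random.rand(N_rand, 3) * np.array([slab, L, L])
```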

Fig. 1. Geometry of the mock catalogs used in our analysis. Blue points show the full simulation box and orange points the slab we use for our analysis. Projected here is a slice of thickness 100 h−1 Mpc.

The area of the simulation slab corresponds to a solid angle of 1400 square degrees at z = 1, which is 9.4% of the 15 000 deg² sky coverage of the Euclid spectroscopic survey (Laureijs et al. 2011; Euclid Collaboration 2022). The thickness of the slab corresponds to a redshift bin of Δz ≈ 0.2. The mean number of objects in the slab corresponds to 5% of the survey (30 million objects). The small number of objects in the simulation made it possible to construct the sample covariance for a large number of realizations, and thus to compare the accuracy and efficiency of the LC method against those of the sample covariance.

We had 10 000 halo catalog realizations at our disposal. We divided them into two sets of 5000 realizations. We computed the sample covariance of one set and used it as the reference covariance, against which we compared the other estimates. The reference covariance represents the best knowledge we have of the true covariance. We used the other set of 5000 realizations to compute both the LC covariance and the sample covariance, which we then compared with the reference covariance. This way we were able to estimate how much of the difference between the LC covariance and the sample covariance was caused purely by the limited number of realizations.

We generated 10 000 random catalogs of size Nr = 50Nd: 5000 for the reference covariance, and 5000 for the sample covariance that the LC covariance was compared with. In addition we generated 10 000 random catalogs of size Nr = Nd. We used one set of 5000 to serve as catalog R1 in Eq. (24), and another set of 5000 as catalog R2.

4.2. Random mocks

In the case of the PINOCCHIO mocks we do not know the actual covariance exactly. We can only compare against the reference covariance, which is itself estimated from a finite data set. To have a test case where we know the true covariance, we ran another simulation using a purely random distribution of points as our data catalog. For this purpose we generated another set of 10 000 random mocks and used these in place of the data catalog. Otherwise the setup was exactly the same as with the PINOCCHIO mocks. We used the same slab geometry and point density, with the exception that each data catalog (and correspondingly each random catalog) realization has the same number of points: Nd = 2.3 × 10⁻⁴ (h−1 Mpc)⁻³ × 6.75 × 10⁸ (h−1 Mpc)³ = 155 250. The correlation function of the random distribution is zero, and for the covariance, as well as for the covariance of the covariance, an analytic result can be derived. This allowed us to directly compare the estimated covariance against the expected result.

4.3. Constructing the covariance

To validate the LC method, we computed the correlation function of the mock catalogs and constructed the LC covariance. Since we were looking for the maximal reduction in computational cost, we set Ma = 1, i.e., we used random catalogs of the same size as the data catalog.

We computed the correlation function of the simulated galaxy distribution using the 2PCF code developed for the Euclid mission. The code implements the Landy–Szalay estimator with a split random catalog, and stores as a by-product the DD, DR, and RR pair counts, which we need for the construction of the LC covariance. We used the Euclid code to compute the two-dimensional correlation function $\hat\xi(r,\mu)$, where r is the distance between a pair of galaxies, and μ is the cosine of the angle between the line of sight and the line segment connecting the galaxy pair. We used bin sizes Δr = 1 h−1 Mpc and Δμ = 0.01, and computed the correlation function for the distance range r ∈ [0, 200] h−1 Mpc. For some tests we also needed the one-dimensional correlation function, which we obtained by coadding the pair counts in the μ dimension. For each data catalog, we ran the code three times: once to construct the M = 50 correlation function, and twice with M = 1 random catalogs to produce the pair counts needed for the construction of the LC covariance.

We constructed the LC and sample covariance estimates with an external code, which takes as input the precomputed pair counts. To ensure consistency, we recomputed the correlation functions from the pair counts. Having the precomputed pair counts on disk also gave us the possibility of combining bins into wider ones. The run time of this external code is negligible; the CPU usage is fully dominated by the run time of the Euclid code.

We computed the correlation function multipoles from the two-dimensional correlation function as

$$ \begin{aligned} \xi _\ell (r) := \frac{2\ell +1}{2}\int _{-1}^1 \xi (r,\mu )\,P_\ell (\mu )\,\mathrm{d}\mu , \end{aligned} $$(51)

where $P_\ell(\mu)$ are the Legendre polynomials ($\ell = 0, 2, 4$). The M = 50 correlation function multipoles, as estimated from the simulation slabs, are depicted in Fig. 2. We show the mean over 5000 realizations, and a single realization. For this small survey size, a single realization deviates strongly from the ensemble mean, and the errors are strongly correlated between distance bins.
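A compact sketch of Eq. (51) for μ-binned measurements, assuming (as an illustration) that ξ(r, μ) is measured over μ ∈ [0, 1] and is symmetric in μ, so that the integral over [-1, 1] is twice that over [0, 1]; the function and argument names are ours:

```python
import numpy as np
from numpy.polynomial.legendre import Legendre

def multipoles(xi_rmu, mu_edges, ells=(0, 2, 4)):
    """Correlation-function multipoles (Eq. (51)) from binned xi(r, mu).

    xi_rmu   : array (n_r, n_mu), xi measured in bins of r and mu in [0, 1]
    mu_edges : mu bin edges, length n_mu + 1
    Assumes xi(r, mu) = xi(r, -mu), valid for the even multipoles used here.
    """
    dmu = np.diff(mu_edges)
    mu_mid = 0.5 * (mu_edges[:-1] + mu_edges[1:])
    result = {}
    for ell in ells:
        P_ell = Legendre.basis(ell)(mu_mid)   # Legendre polynomial P_ell(mu)
        # (2l+1)/2 * int_{-1}^{1} ... dmu = (2l+1) * int_0^1 ... dmu
        result[ell] = (2 * ell + 1) * np.sum(xi_rmu * P_ell * dmu, axis=1)
    return result
```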

Fig. 2. Correlation function multipoles: mean over 5000 PINOCCHIO realizations and a single realization. The shaded area around the single-realization curve is the 1σ error envelope, computed as the standard deviation over the available realizations.

The calculation of the covariance of covariance in Sect. 3 relies on the assumption that the elements of the correlation function follow a Gaussian distribution, at least approximately. To verify the validity of this assumption, we plot the distributions of selected correlation function elements in Fig. 3. The assumption of approximate Gaussianity appears well justified. We note that Gaussianity is only required for the covariance of covariance to be valid; the LC covariance itself does not rely on any particular distribution.

Fig. 3. Histograms of correlation function multipole values at r = 25 h−1 Mpc and r = 125 h−1 Mpc. Along with the histograms we show the corresponding best-fit Gaussian distributions in orange.

5. Results

5.1. Random mocks

We begin by examining the one-dimensional 2PCF of the random mocks. As explained above, we ran tests using randomly distributed points in place of the data catalog. This has the benefit that we know the expected correlation function exactly (it is zero), and we have an accurate analytic expression for the true covariance. From Keihänen et al. (2019) we have

$$ \mathrm{cov}\left[\hat{\xi}(\mathbf{r}_1),\hat{\xi}(\mathbf{r}_2)\right] = \frac{\delta_{12}}{G_{\rm p}(\mathbf{r}_2)} \left( \frac{2}{N_{\rm d}(N_{\rm d}-1)} + \frac{4}{N_{\rm d}N_{\rm r}} + \frac{2}{N_{\rm r}(N_{\rm d}-1)}\right) \approx \frac{\delta_{12}}{G_{\rm p}(\mathbf{r}_1)}\, \frac{2}{N_{\rm d}(N_{\rm d}-1)} \left( 1+3M^{-1}\right), $$(52)

where Nr is the number of random points and Nd the number of data points, and Gp(r) is the geometrical pair volume fraction defined as

$$ G_{\rm p}(\mathbf{r}) = \left[ \iint_V \mathrm{d}^3x_1\, \mathrm{d}^3x_2\right]^{-1} \iint_V W(x_1-x_2\in \mathbf{r})\, \mathrm{d}^3x_1\, \mathrm{d}^3x_2 , $$(53)

where the integrals cover the survey volume and W(x1 − x2 ∈ r) = 1 if the pair distance falls in the distance bin of r, and is zero otherwise. This allows us to directly compare the estimated covariance to the theoretical one.
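
For a simple geometry, Gp(r) and hence the diagonal of Eq. (52) can be checked numerically. The sketch below estimates Gp by Monte Carlo sampling of independent point pairs in a rectangular box; it is only an illustration with our own function names, not the procedure used for the figures, and bins with Gp = 0 must of course be excluded before dividing.

import numpy as np

def gp_montecarlo(box, r_edges, n_pairs=2_000_000, seed=0):
    # Monte Carlo estimate of Eq. (53): fraction of point pairs in the volume
    # whose separation falls into each radial bin; box = (lx, ly, lz)
    rng = np.random.default_rng(seed)
    x1 = rng.uniform(0.0, box, size=(n_pairs, 3))
    x2 = rng.uniform(0.0, box, size=(n_pairs, 3))
    r = np.linalg.norm(x1 - x2, axis=1)
    counts, _ = np.histogram(r, bins=r_edges)
    return counts / n_pairs

def cov_diag_random(gp, n_d, n_r):
    # diagonal of Eq. (52) for an unclustered data catalog of n_d points
    # and a random catalog of n_r points
    return (2.0 / (n_d * (n_d - 1)) + 4.0 / (n_d * n_r)
            + 2.0 / (n_r * (n_d - 1))) / gp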

Figure 4 shows the diagonal of the estimated covariance of the one-dimensional correlation function for M = 50, compared with the theoretical value of Eq. (52). We also show the M → ∞ limit from the LC method, which represents the optimal covariance we would have with an infinite random catalog. As expected, the M = ∞ curve lies slightly below the M = 50 curve; the difference is the additional uncertainty from the finite random catalog. Both the sample covariance and the LC covariance agree very well with the expected covariance. It is also evident that the LC method results in larger scatter. The lower panel shows the relative difference with respect to the theoretical value, together with 1σ error bars derived from Eqs. (31) and (38). The error of the LC covariance, measured as the standard deviation, is 2.7 times that of the sample covariance, implying that more than 7 times as many realizations are needed to reach the same level of accuracy. Fortunately, from the point of view of the LC method, this is an unrealistically pessimistic situation. This can be traced to the fact that correlations, which in a more realistic situation contribute significantly to the covariance, are nonexistent here. Thus the scatter of the random catalog, which in our method is large due to the small number of objects, contributes a large fraction of the total error. The situation looks very different when we move to realistic cosmological simulations with large correlations.

Fig. 4.

Diagonal covariance from random mocks, one-dimensional case. Top: sample covariance, LC, M = ∞ limit, and theoretical prediction. Bottom: the relative errors, and theoretical 1σ error estimates.

5.2. Cosmological mocks

As explained in Sect. 4.1, since we do not know the true covariance, we divided the available 10 000 realizations into two sets of 5000 realizations and used the sample covariance of the first half as a reference. We constructed the covariance for the ℓ = 0, 2, 4 multipoles both through the sample covariance and with the LC method, with M = 50.

First we examine the convergence with respect to the number of realizations, shown in Fig. 5. We show the squared-sum difference with respect to the reference matrix, for the sample covariance and for LC. Because the bins at the smallest scales contain only a few halos, we only include scales in the range 20–200 h−1 Mpc in the sum. We vary the number of realizations used for the covariance estimate under study, but the reference matrix is in all cases the same, based on the full set of 5000 realizations. All matrices have been normalized to the reference diagonal, in order to assign equal weights to all distance scales,

$$ \hat{\mathrm{C}}_{ij}^{(\mathrm{normalized})} = \frac{\hat{\mathrm{C}}_{ij}}{\sqrt{\hat{\mathrm{C}}_{ii}^{\mathrm{ref}}\hat{\mathrm{C}}_{jj}^{\mathrm{ref}}}} \, . $$(54)
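
The normalization of Eq. (54) and the squared-sum difference plotted in Fig. 5 amount to a few lines of code; the sketch below uses our own function names and assumes the covariance matrices are available as NumPy arrays.

import numpy as np

def normalize(cov, cov_ref):
    # Eq. (54): normalize a covariance matrix by the reference diagonal
    denom = np.sqrt(np.outer(np.diag(cov_ref), np.diag(cov_ref)))
    return cov / denom

def squared_difference(cov_est, cov_ref):
    # summed squared difference between the normalized estimate and the
    # normalized reference, the quantity shown in Fig. 5
    return np.sum((normalize(cov_est, cov_ref) - normalize(cov_ref, cov_ref)) ** 2)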

Fig. 5.

Convergence of the covariance of the correlation function multipoles, with respect to the number of realizations (left) and CPU time (right). We use PINOCCHIO mocks and include scales of r > 20 h−1 Mpc. Dashed lines show the theoretical prediction.

We show the difference as a function of the number of realizations, and as a function of CPU time. To further reduce the noise in the measurement we compute the convergence ten times and show the mean over these ten cases. Each case is obtained by randomly splitting the 10 000 PINOCCHIO realizations into two sets of 5000 realizations, one of which is used to compute the reference covariance and the other to compute the sample and LC covariances. The different splits overlap with each other, but even so the procedure significantly reduces the noise in the measured convergence. For the same number of realizations, the sample covariance gives a smaller uncertainty: one needs roughly 1.5 times as many realizations with LC to reach the same level of accuracy. In terms of CPU time spent, the situation is inverted. The LC covariance requires only 10% of the CPU cost of the sample covariance to reach the same accuracy.

Of the total wall time of constructing the sample covariance, 90% is spent on counting the pairs. In the case of LC, this fraction is somewhat lower, 76%. Loading the catalogs takes roughly the same fraction of time in both cases, so the difference in efficiency seems to lie in overheads such as code initialization. A possible optimization to reduce these overheads would be to compute all the thousands of 2PCF estimates during a single code execution, instead of invoking the executable separately for each realization.

In Fig. 6 we show the diagonal of the covariance matrix monopole block, for the sample covariance and the LC estimate, along with the reference. We also show the M = ∞ limit of the LC covariance. In the lower panel we show the relative difference with respect to the reference covariance, and the theoretical prediction for the error, as given by Eqs. (31) and (38). Since we are looking at the difference with respect to the reference, the error level shown is the square root of the sum of the variances of the reference and the estimate in question.

Fig. 6.

Diagonal of the covariance for the correlation function monopole, PINOCCHIO mocks. On the top we show the LC and sample covariance estimates, and the M = ∞ limit. On the bottom we show the relative errors, and predicted 1σ error level.

Again, the LC estimate has more internal scatter than the sample covariance, but the difference between the methods is significantly smaller than in the case of random mocks. A more striking phenomenon is that the deviation from the reference is strongly correlated in distance, and the general trend of the deviation is very similar for the two estimation methods. In other words, the estimation error is dominated by a correlated error component that is independent of the chosen estimation method, when both estimates are constructed from the same data set. The common component dominates over the additional noise added by the LC method. The amplitude of the component is consistent with the predicted error level, indicating that it represents a random fluctuation.

Figure 7 shows the monopole block of the full LC covariance matrix as a two-dimensional plot. For plotting purposes we normalize the matrix by the diagonal of the reference covariance. There is a significant off-diagonal component, showing that the error in the estimated Landy–Szalay correlation function is correlated from one distance bin to another, in line with Fig. 6. The middle panel shows the difference between the LC covariance and the reference. There is no obvious overall bias (which would show up as an over-representation of either the blue or the red color), but the region of correlated error is clearly visible around 100 h−1 Mpc. The bottom panel shows the difference between the LC and sample covariances from the same 5000 realizations. Here the structure is weaker, indicating that the correlated structure in the middle panel is largely common to the sample covariance and the LC estimate, as we already saw in Fig. 6.

Fig. 7.

Monopole block of the covariance matrix. Top: LC covariance matrix. Middle: difference between the LC covariance matrix and the reference. Bottom: difference between the LC and the sample covariance from same realizations. All are normalized by the diagonal elements of the reference matrix.

We proceed to examine the structure of the LC covariance further. We show the $\hat{\mathrm{A}}$ and $\hat{\mathrm{B}}$ components of the full multipole covariance in Fig. 8, again normalized with the reference diagonal. The full covariance is the combination $\hat{\mathrm{A}}+\hat{\mathrm{B}}/M$. We observe that the $\hat{\mathrm{B}}$ component is strongly diagonal-dominant, in contrast to the $\hat{\mathrm{A}}$ component, indicating that the finite random catalog mainly contributes uncorrelated noise to the 2PCF estimate, on top of the correlated error that arises from the data catalog. The unnormalized diagonals of all three multipole blocks, and their cross-components, are shown in Fig. 9.

Fig. 8.

Component matrices $\hat{\mathrm{A}}$ (top) and $\hat{\mathrm{B}}$ (bottom), for correlation function multipoles, measured from the PINOCCHIO mocks. The blocks from left to right and from the bottom row to the top row correspond to the ℓ = 0, 2, 4 multipoles, respectively. Both are normalized by the diagonal elements of the reference matrix.

Fig. 9.

Covariance diagonals for multipoles ℓ = 0, 2, 4, and their cross-correlation, for PINOCCHIO. Sample covariance and LC. Two bottom rows show the difference between the reference and the estimate scaled by r². To reduce scatter in the curves all the quantities have been rebinned to bins of width 10 h−1 Mpc.

The expectation value of the LC covariance in terms of pair-count covariances is given in Eq. (21). However, if we expand Eq. (26) (which defines the LC covariance) in terms of pair-count covariances, we find that the expansion includes more terms than Eq. (21). The expectation value of these additional terms vanishes, but when the covariance is estimated from a finite number of correlation function realizations, these terms randomly differ from zero. This raises the question of whether leaving some or all of these zero-expectation-value terms out, and constructing the covariance using the pair-count covariances directly, would reduce the noise in the covariance matrix estimate. We reconstructed the covariance matrix including all possible combinations of the zero-expectation-value terms, but it turned out that the most accurate combination is the one defined by Eq. (26). Even though the zero-expectation-value terms do not affect the expectation value of the covariance matrix estimate, they do reduce its variance. This can be understood as follows: the zero-expectation terms are negatively correlated with some of the nonzero terms, and thus they help to cancel out part of the estimation noise.

5.3. Predictions from covariance of covariance

We now proceed to examine the accuracy of the LC covariance estimate in a more quantitative way. Here we make use of predictions of the theoretical covariance of covariance from Sect. 3.

We measure the accuracy of the covariance estimate as the normalized sum-of-squares difference from the ensemble average, summed over all covariance elements,

$$ \chi^2_N := \frac{1}{N_{\rm bin}^2} \sum_{ij} \frac{1}{\hat{\mathrm{C}}^{\mathrm{ref}}_{ii}\hat{\mathrm{C}}^{\mathrm{ref}}_{jj}} \left( \hat{\mathrm{C}}_{ij}-\langle \mathrm{C}_{ij}\rangle \right)^2 . $$(55)

Here $\hat{\mathrm{C}}$ represents the covariance estimate, either LC or sample covariance, measured from N realizations (here N = 5000), and Nbin is the number of correlation function elements. In our baseline simulation Nbin = 540 (3 multipoles and 180 distance bins). We normalized the sum by the diagonal of the reference covariance to assign equal weights to all distance bins. Equation (55) expresses the accuracy of the covariance as a single number.

In terms of the covariance of covariance we have

$$ \langle \chi^2_N\rangle = \frac{1}{N_{\rm bin}^2}\sum_{ij} \frac{\mathrm{cov}(\hat{\mathrm{C}}_{ij},\hat{\mathrm{C}}_{ij})}{\hat{\mathrm{C}}^{\mathrm{ref}}_{ii}\hat{\mathrm{C}}^{\mathrm{ref}}_{jj}} . $$(56)

Since the covariance of covariance scales as 1/(N − 1), we can write this in terms of the N = 2 value as

$$ \langle \chi^2_N\rangle = \frac{\langle \chi^2_2\rangle}{N-1} . $$(57)

For the sample covariance we now have

$$ \langle \chi^2_2\rangle^{\mathrm{Smp}} = \frac{1}{N_{\rm bin}^2}\sum_{ij} \left( \tilde{\mathrm{C}}_{ii}\tilde{\mathrm{C}}_{jj} + \tilde{\mathrm{C}}_{ij}^2 \right) , $$(58)

where we have absorbed the normalization into the covariance and denoted the normalized covariance by $\tilde{\mathrm{C}}$. For LC we find

$$ \langle \chi^2_2\rangle^{\mathrm{LC}} = \frac{1}{N_{\rm bin}^2}\sum_{ij} \left[ \tilde{\mathrm{D}}_{ii}\tilde{\mathrm{D}}_{jj} + \tilde{\mathrm{D}}_{ij}^2 + \left(\tfrac{1}{2}-M^{-1}\right)^2 \left(\tilde{\mathrm{B}}_{ii}\tilde{\mathrm{B}}_{jj} + \tilde{\mathrm{B}}_{ij}^2\right) \right] . $$(59)

Here we have a practical way of predicting the estimation error for the LC and sample covariance, for different values of M. We can also easily predict the effect of rebinning the data into wider distance bins, simply by rebinning the covariance matrices and constructing the covariance of covariance from these.

In Table 3 we have collected statistics on the estimation methods, for selected random catalog sizes (different values of M) and for different rebinning schemes. We have used $\hat{\mathrm{A}}$ and $\hat{\mathrm{B}}$ in place of A and B, and $\hat{\mathrm{C}}\approx\hat{\mathrm{A}}+M^{-1}\hat{\mathrm{B}}$ in place of C. We show the computational cost of pair counting in each case, in units of the cost of counting the pairs of one data catalog of Nd objects. The cost of the LC method is the same in all cases, while the cost of the sample covariance scales as 1 + 3M. The cost estimate ignores parts of the computation other than pair counting, for instance disk I/O and various overheads, and thus exaggerates the difference between the methods. The estimator variance is expressed as $\chi^2_2$. The standard deviation of a covariance estimate is obtained from this as $\sqrt{\chi^2_2/(N-1)}$. We also show an inverse figure of merit (iFoM), constructed as the product of the pair-count cost and the $\chi^2_2$ value. Since the estimator variance decreases proportionally to the inverse of the number of realizations N, while the computation time grows proportionally to it, this is an N-independent measure of the estimator efficiency; a smaller value indicates a more efficient estimation. The value of the iFoM can be interpreted as the computational cost of reaching $\chi^2_2$ = 1. The last column shows the ratio of the sample-covariance iFoM to that of the LC covariance, and is a measure of the gain from the LC method.

Table 3.

Predictions for the variance of the covariance estimate, based on the covariance of covariance for PINOCCHIO mocks.
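
As an illustration of how the entries of Table 3 can be assembled, the sketch below evaluates Eqs. (58) and (59) and the iFoM from normalized component matrices. The function names are ours; the sample-covariance cost of 1 + 3M and the LC cost of about 7 pair-count units follow the scalings quoted in the text, and all other overheads are ignored.

import numpy as np

def chi2_2_sample(c_norm):
    # Eq. (58); c_norm is the normalized covariance (C tilde)
    d = np.diag(c_norm)
    return (np.outer(d, d).sum() + (c_norm ** 2).sum()) / c_norm.shape[0] ** 2

def chi2_2_lc(d_norm, b_norm, m):
    # Eq. (59); d_norm and b_norm are the normalized D and B components
    dd, bb = np.diag(d_norm), np.diag(b_norm)
    return (np.outer(dd, dd).sum() + (d_norm ** 2).sum()
            + (0.5 - 1.0 / m) ** 2 * (np.outer(bb, bb).sum() + (b_norm ** 2).sum())
            ) / d_norm.shape[0] ** 2

def ifom_ratio(c_norm, d_norm, b_norm, m, lc_cost=7.0):
    # inverse figure of merit = pair-count cost x chi2_2, and the gain ratio
    ifom_smp = (1.0 + 3.0 * m) * chi2_2_sample(c_norm)
    ifom_lc = lc_cost * chi2_2_lc(d_norm, b_norm, m)
    return ifom_smp, ifom_lc, ifom_smp / ifom_lc

The standard deviation expected for N realizations then follows from Eq. (57) as the square root of chi2_2 / (N − 1).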

The relative efficiency of the LC method increases with increasing M, as the computational cost of the sample covariance becomes larger. We also observe that, for a given M, the LC covariance becomes more efficient in comparison with the sample covariance if we combine the distance bins into wider ones. With M = 50 and 20 h−1 Mpc distance bins, the efficiency ratio is 17.9, while with narrow 1 h−1 Mpc bins the ratio is 11.9.

The full covariance of covariance is a four-dimensional data object, and is difficult to visualize in its entirety. In the following we examine a two-dimensional subset: we focus on the diagonal of the monopole block of the covariance estimate (plotted in Fig. 6). This is a one-dimensional data object, so its covariance is a two-dimensional matrix. In the following we refer to the covariance of the diagonal of the monopole block of a 2PCF covariance estimate as COVCOV for short. We plot the predicted COVCOV for the sample covariance and for LC in Fig. 10. In both cases there is a significant off-diagonal structure, which is visually very similar between the two methods. This confirms our earlier observation that the estimation error is correlated between distance bins, and that this correlation does not depend on the chosen method. Taking the difference between the COVCOV matrices, we see that the LC estimate has additional scatter compared to the sample covariance, but this additional error component is only weakly correlated from one distance bin to another.

Fig. 10.

Predicted covariance of the diagonal of the monopole block of the estimated 2PCF covariance (COVCOV). Top: COVCOV for sample covariance. Middle: COVCOV for LC. Bottom: difference of the two. From normalized covariances.

As a final validation test we applied the inverse square root of the COVCOV to the difference between the LC covariance matrix diagonal and the corresponding reference (the quantities plotted in Fig. 6). If the COVCOV correctly describes the errors in the estimated covariance, we expect to see an array of white noise with σ = 1. To account for the fact that the reference covariance has a covariance of its own, we took the COVCOV to be the sum of the reference and the LC COVCOV matrices. We computed the square root using the Schur method implemented in the scipy Python library. The resulting whitened data vector is shown in Fig. 11, along with a random realization of white noise. The data are visually indistinguishable from white noise, which is a valuable validation check. The similarity can also be confirmed by computing a normalized χ2 value

$$ \chi^2 = \frac{1}{N}\,\mathbf{v}^{\mathrm{T}}\mathbf{v} . $$(60)

Fig. 11.

Diagonal monopole difference treated with the inverse square root of COVCOV. For reference, a Gaussian random vector of the same length is shown.

Here v is the data vector, N is the number of bins, and for a vector of Gaussian white noise we expect a value close to 1. We computed this value for scales r > 20 h−1 Mpc and obtained χ2 = 0.95 for the whitened data vector and χ2 = 0.93 for the Gaussian random vector.
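
The whitening test can be reproduced along the following lines; the sketch is ours (including the array names) and simply combines scipy.linalg.sqrtm, which implements the Schur method mentioned above, with Eq. (60).

import numpy as np
from scipy.linalg import inv, sqrtm

def whiten_diagonal(diff, covcov_lc, covcov_ref):
    # diff: difference between the LC covariance diagonal and the reference
    # covcov_lc, covcov_ref: COVCOV matrices; their sum accounts for the
    # scatter of the reference itself
    w = inv(sqrtm(covcov_lc + covcov_ref))  # sqrtm uses the Schur method
    v = np.real(w) @ diff                   # imaginary residue is numerical noise
    chi2 = float(v @ v) / v.size            # Eq. (60); ~1 for white noise
    return v, chi2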

6. Conclusions

We have presented a method for speeding up the computation of the galaxy 2PCF covariance, which we call the Linear Construction (LC) method. We assume that the correlation function is estimated with the Landy–Szalay estimator, with a split random catalog. The proposed method applies both to the raw (one- or two-dimensional) correlation function and to its multipoles.

The proposed method provides an unbiased estimate of the covariance for a split random catalog, that is, for the case where the random–random pair count is constructed by coadding the counts of many small subcatalogs. Since we know that the splitting only weakly affects the 2PCF estimation error, we expect the LC covariance to provide a good approximation also when the RR pairs are counted from the full catalog. This can be traced to the fact that, for large random catalogs, the 2PCF estimation error is dominated by the variance of the galaxy sample, and the secondary error term is related to the data–random pair count, both of which are unaffected by splitting. The scatter of the random–random count plays a minor role.

The computational cost of the LC method per realization, for a random catalog M times the size of the data catalog, is a factor of (1 + 3M)/7 lower than that of the sample covariance. For M = 50 this yields a speedup factor of 21.6. However, a larger number of realizations is needed to compensate for the increased scatter in the estimate. In our simulations, 1.2–1.8 times as many realizations were needed to reach a given level of accuracy, depending on bin size. Taking this into account, the net cost reduction for M = 50 is a factor of 11.9–17.9; the efficiency increases with increasing bin width. In practice, we observed roughly half of this speedup, due to the heavy overhead associated with the handling of many small catalogs. A code specifically optimized for covariance computation could improve on this.
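
For concreteness, the raw pair-counting speedup for M = 50 is (1 + 3 × 50)/7 = 151/7 ≈ 21.6; dividing by the factor of 1.2–1.8 additional realizations gives roughly 21.6/1.8 ≈ 12 to 21.6/1.2 ≈ 18, in line with the net gains of 11.9–17.9 obtained from the $\chi^2_2$ ratios of Table 3.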

The computational cost of the LC covariance is independent of M. Thus the relative gain with respect to the sample covariance increases with increasing size of the random catalog. Since the cost of the covariance computation exceeds that of the actual 2PCF estimation by orders of magnitude, one might want to spend a bit more resources on obtaining a more accurate 2PCF estimate with a higher M, as the cost of the covariance is unaffected.

The LC covariance estimate is readily extrapolated to an arbitrary value of M, including the limit M → ∞. We thus have an estimate of the error budget for any number of random points, which is valuable information when planning an experiment.

At very small distances our method becomes less reliable, due to the small number of objects in a bin. At such small distances we recommend resorting to the sample covariance, which is computationally cheap there anyway.

Unlike the sample covariance, the LC covariance is not positive-definite by construction. For applications that require the covariance inverse, we recommend verifying the eigenspectrum of the constructed matrix and, for instance, rebinning the data into wider bins should the matrix turn out not to be positive definite.
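
A minimal check along these lines, with a function name of our own choosing, could look as follows.

import numpy as np

def smallest_eigenvalue(cov):
    # eigenspectrum check before inverting the LC covariance; if the smallest
    # eigenvalue is not positive, rebin to wider bins and rebuild the estimate
    eigvals = np.linalg.eigvalsh(0.5 * (cov + cov.T))  # symmetrize for safety
    return eigvals.min(), bool(eigvals.min() > 0.0)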

We further derived a covariance for the estimation error of the LC covariance, and showed that it can be constructed from the components of the covariance itself. Thus, along with the covariance one can readily obtain an estimate of its errors and their correlation. We also discussed the impact on maximum-likelihood parameter estimation.

In the LC method, the covariance is estimated for small random catalogs of size M = Ma and M = 2Ma, and the covariance for arbitrary M is constructed as a linear combination of these. We obtain the maximal reduction in the computational cost with Ma = 1, and we adopted this value in our validation tests. The increased uncertainty in the covariance estimate is compensated for with a larger number of catalog realizations. We have assumed that the mock catalogs are cheap to generate, so that they do not significantly contribute to the total CPU budget. Should this not be the case, the gain from the LC covariance is reduced in comparison with the sample covariance. In this case it may become beneficial to select Ma > 1, to reduce the variance of the covariance estimate. The selection of the optimal method is a trade-off between the cost of the 2PCF computation, the required level of accuracy, and the cost of constructing the mock catalogs.

Future large galaxy surveys, such as the one provided by Euclid, face the challenge of constructing the covariance for huge galaxy samples. We believe the methodology presented here provides a useful tool for meeting that challenge.


Acknowledgments

The 2PCF computations were done at the Euclid Science Data Center Finland (SDC-FI, urn:nbn:fi:research-infras-2016072529), for whose computational resources we thank CSC – IT Center for Science, the Finnish Grid and Cloud Computing Infrastructure (FGCI, urn:nbn:fi:research-infras-2016072533), and the Academy of Finland grant 292882. This work was supported by the Academy of Finland grant 295113. The Euclid Consortium acknowledges the European Space Agency and a number of agencies and institutes that have supported the development of Euclid, in particular the Academy of Finland, the Agenzia Spaziale Italiana, the Belgian Science Policy, the Canadian Euclid Consortium, the French Centre National d’Etudes Spatiales, the Deutsches Zentrum für Luft- und Raumfahrt, the Danish Space Research Institute, the Fundação para a Ciência e a Tecnologia, the Ministerio de Economia y Competitividad, the National Aeronautics and Space Administration, the National Astronomical Observatory of Japan, the Netherlandse Onderzoekschool Voor Astronomie, the Norwegian Space Agency, the Romanian Space Agency, the State Secretariat for Education, Research and Innovation (SERI) at the Swiss Space Office (SSO), and the United Kingdom Space Agency. A complete and detailed list is available on the Euclid web site (http://www.euclid-ec.org).

References

1. Akeson, R., Armus, L., Bachelet, E., et al. 2019, ArXiv e-prints [arXiv:1902.05569]
2. Alam, S., Ata, M., Bailey, S., Beutler, F., et al. 2005, MNRAS, 470, 2617
3. Alam, S., Aubert, M., Avila, S., et al. 2021, Phys. Rev. D, 103, 083533
4. Anderson, T. W. 2003, An Introduction to Multivariate Statistical Analysis, 3rd edn. (Wiley Interscience)
5. Colavincenzo, M., Sefusatti, E., Monaco, P., et al. 2019, MNRAS, 482, 4883
6. Cole, S., Percival, W., Peacock, J., et al. 2005, MNRAS, 362, 505
7. Dávila-Kurbán, F., Sánchez, A. G., Lares, M., & Ruiz, A. N. 2021, MNRAS, 506, 4667
8. DESI Collaboration (Aghamousa, A., et al.) 2016, ArXiv e-prints [arXiv:1611.00036]
9. Dodelson, S., & Schneider, D. 2013, Phys. Rev. D, 88, 063537
10. Eisenstein, D. J., Zehavi, I., Hogg, D. W., et al. 2005, ApJ, 633, 560
11. Euclid Collaboration (Scaramella, R., et al.) 2022, A&A, 662, A112
12. Friedrich, O., & Eifler, T. 2018, MNRAS, 473, 4150
13. Gaztañaga, E., & Scoccimarro, R. 2005, MNRAS, 361, 824
14. Hartlap, J., Simon, P., & Schneider, P. 2007, A&A, 464, 399
15. Ivezić, Ž., Kahn, S. M., Tyson, J. A., et al. 2019, ApJ, 873, 111
16. Jasche, J., & Lavaux, G. 2017, A&A, 606, A37
17. Joachimi, B. 2017, MNRAS, 466, L83
18. Kalus, B., Percival, W. J., Bacon, D. J., et al. 2019, MNRAS, 482, 453
19. Keihänen, E., Kurki-Suonio, H., Lindholm, V., et al. 2019, A&A, 631, A73
20. Kitaura, F.-S., Rodríguez-Torres, S., Chuang, C.-H., et al. 2016, MNRAS, 456, 4156
21. Landy, S. D., & Szalay, A. S. 1993, ApJ, 412, 64
22. Laureijs, R., Amiaux, J., Arduini, S., et al. 2011, ArXiv e-prints [arXiv:1110.3193]
23. Manera, M., Scoccimarro, R., Percival, W. J., et al. 2013, MNRAS, 428, 1036
24. Merz, G., Rezaie, M., Seo, H.-J., et al. 2021, MNRAS, 506, 2503
25. Monaco, P. 2016, Galaxies, 4, 53
26. Monaco, P., Theuns, T., & Taffoni, G. 2002, MNRAS, 331, 587
27. Monaco, P., Di Dio, E., & Sefusatti, E. 2019, JCAP, 2019, 023
28. Munari, E., Monaco, P., Sefusatti, E., et al. 2017, MNRAS, 465, 4658
29. Padmanabhan, N., White, M., Zhou, H. H., & O'Connell, R. 2016, MNRAS, 460, 1567
30. Paz, D. J., & Sánchez, A. G. 2015, MNRAS, 454, 4326
31. Percival, W. J., Ross, A. J., Sánchez, A. G., et al. 2014, MNRAS, 439, 2531
32. Percival, W. J., Friedrich, O., Sellentin, E., & Heavens, A. 2022, MNRAS, 510, 3207
33. Pope, A. C., & Szapudi, I. 2008, MNRAS, 389, 766
34. Sellentin, E., & Heavens, A. F. 2016, MNRAS, 456, L132
35. Taylor, A., & Joachimi, B. 2014, MNRAS, 442, 2728
36. Taylor, A., Joachimi, B., & Kitching, T. 2013, MNRAS, 432, 1928
