Angpow: a software for the fast computation of accurate tomographic power spectra

J.-E. Campagne; J. Neveu; S. Plaszczynski

doi:10.1051/0004-6361/201730399

Volume 602 (June 2017)

A&A, 602 (2017) A72

Issue		A&A Volume 602, June 2017


Article Number		A72
Number of page(s)		8
Section		Numerical methods and codes
DOI		https://doi.org/10.1051/0004-6361/201730399
Published online		15 June 2017

A&A 602, A72 (2017)

Angpow: a software for the fast computation of accurate tomographic power spectra^⋆

J.-E. Campagne, J. Neveu and S. Plaszczynski

LAL, Univ. Paris-Sud, CNRS/IN2P3, Université Paris-Saclay, 91400 Orsay, France
e-mail: campagne@lal.in2p3.fr

Received: 5 January 2017
Accepted: 27 April 2017

Abstract

Aims. The statistical distribution of galaxies is a powerful probe to constrain cosmological models and gravity. In particular, the matter power spectrum P(k) provides information about the cosmological distance evolution and the galaxy clustering. However, building P(k) from galaxy catalogs requires a cosmological model to convert angles on the sky and redshifts into distances, which leads to difficulties when comparing data with the P(k) predicted by other cosmological models, and for photometric surveys like the Large Synoptic Survey Telescope (LSST). The angular power spectrum C_ℓ(z₁,z₂) between two bins located at redshifts z₁ and z₂ contains the same information as the matter power spectrum and is free from any cosmological assumption, but the prediction of C_ℓ(z₁,z₂) from P(k) is a costly computation when performed precisely.

Methods. The Angpow software aims at quickly and accurately computing the auto (z₁ = z₂) and cross (z₁ ≠ z₂) angular power spectra between redshift bins. We describe the algorithm, which is based on expansions in the Chebyshev polynomial basis and on the Clenshaw-Curtis quadrature method. We validate the results against other codes and benchmark the performance.

Results. Angpow is flexible and can handle any user-defined power spectra, transfer functions, and redshift selection windows. The code is fast enough to be embedded inside programs exploring large cosmological parameter spaces through the comparison of C_ℓ(z₁,z₂) with data. We emphasize that Limber’s approximation, often used to speed up the computation, gives incorrect C_ℓ values for cross-correlations.

Key words: large-scale structure of Universe / methods: numerical

^⋆ The C++ code is available from https://gitlab.in2p3.fr/campagne/AngPow

© ESO, 2017

1. Introduction

Cosmology is entering the era of wide and deep galaxy surveys, such as the Dark Energy Spectroscopic Instrument (DESI; Levi et al. 2013), the Large Synoptic Survey Telescope (LSST; Ivezic et al. 2008), and the Euclid satellite (Laureijs et al. 2011), in order to investigate the mechanisms of cosmic acceleration (for a review see Weinberg et al. 2013). Cosmological models can be tested, that is, compared against actual measurements, by studying the statistical properties of galaxy clustering. Several methods exist for this, the most classical ones computing correlations in real space (Landy & Szalay 1993) or Fourier space (Feldman et al. 1994). However, for wider and deeper surveys, one may also try to condense the clustering information into redshift bins (“shells”) and compute the auto- and cross-correlations between redshift shells (Asorey et al. 2012). This is known as tomography, and allows for a more precise understanding of potential systematic errors in different redshift regions. Several studies compare the merits of this kind of approach with the more classical ones (Asorey et al. 2012, 2014; Di Dio et al. 2014; Nicola et al. 2014; Lanusse et al. 2015) and try to optimize the binning to keep most of the cosmological information. All previous studies have been based on the Fisher formalism, which treats observables as Gaussian; this assumption, however, is not representative of real-life data.

To go further and prepare future tomographic analyses, one needs to implement a full pipeline and test the method with, for instance, a Markov chain Monte Carlo (MCMC) exploration of the cosmological parameter space. But there is a technical bottleneck: running a typical MCMC algorithm in cosmology is already very lengthy and typically requires computing a few 10⁵ models. Each model is the result of a numerical code that solves the cosmological equations (known as a “Boltzmann solver”), such as CLASS¹ (Blas et al. 2011), which typically takes 5–10 s on eight cores. Today this amounts to several days of computation.

For a tomographic method, one further needs to transform the output of the Boltzmann solver, the matter power spectrum, into the observable space, represented as C_ℓ(z_i,z_j) angular power spectra between two redshift shells located at z_i and z_j. This transformation is numerically challenging because of overlapping integrals between highly oscillatory spherical Bessel functions.

One then often makes use of Limber’s approximation, which essentially replaces each Bessel function by a Dirac distribution. However, as we show here, this leads to a poor approximation for auto-correlations and may even be incorrect for cross-correlations, since it cannot capture any anti-correlation.

We therefore address here the issue of accurately and quickly computing the integrals required to derive the correlations between tomographic bins. Our goal, in computational terms, is that this computation be faster than one typical Boltzmann code computation, that is, essentially at or below the 1 s level (on eight cores). Another aspect of this work is to provide a stand-alone library that offers a generic interface where the user can plug in any matter power spectrum. This is a different approach from CLASSgal (Di Dio et al. 2013), which also provides some theoretical computations related to tomography, but these are deeply rooted within the CLASS software.

The integrals defining the C_ℓ(z_i,z_j) angular power spectra are introduced in Sect. 2. In Sect. 3 we detail the algorithm implemented in Angpow, while we address some numerical tests in Sect. 4. Section 5 provides insight into the code design and we conclude in Sect. 6.

2. The position of the problem

Our aim is to compute the angular overdensity power spectrum C_ℓ(z₁,z₂) as a cross-correlation between two z-shells with mean values (z₁,z₂), and also the auto-correlation C_ℓ(z₁) with z₁ = z₂, taking into account, in both cases, possible redshift selection functions. Following the notation of Di Dio et al. (2013), for a pair of redshifts (z₁,z₂) one computes C_ℓ(z₁,z₂) according to

$C_{\ell}(z_1, z_2) = \frac{2}{\pi} \int_0^\infty \mathrm{d}k\, k^2\, P(k)\, \Delta_{\ell}(z_1, k)\, \Delta_{\ell}(z_2, k),$ (1)

where P(k) is the non-normalized primordial power spectrum and Δ_ℓ(z,k) is a general function used to describe physical processes down to redshift z (Di Dio et al. 2013; Bonvin & Durrer 2011). At the lowest order, Δ_ℓ(z,k) can be expressed as the product of the bias b and a growth factor D(z,k) to account for the matter density contribution:

$\Delta^{\mathrm{mat.}}_{\ell}(z, k) = b\, D(z,k)\, j_{\ell}(k\,r(z)),$ (2)

where j_ℓ(x) is the spherical Bessel function of the first kind of order ℓ, and r(z) is the radial comoving distance of the shell located at redshift z.

For thick redshift shells, one introduces two normalized redshift selection functions W₁(z;z₁,σ₁) and W₂(z′;z₂,σ₂) around z₁ and z₂ with typical widths σ₁ and σ₂, respectively. Then, one extends Eq. (1) to

$\begin{aligned} C^{\mathrm{thick}}_{\ell}(z_1, z_2; \sigma_1, \sigma_2) = {} & \frac{2}{\pi} \iint_0^\infty \mathrm{d}z\, \mathrm{d}z^\prime\, W_1(z; z_1, \sigma_1)\, W_2(z^\prime; z_2, \sigma_2) \\ & \times \int_0^\infty \mathrm{d}k\, k^2\, P(k)\, \Delta_{\ell}(z, k)\, \Delta_{\ell}(z^\prime, k). \end{aligned}$ (3)

It is convenient to introduce the auxiliary function, justified in the following section,

$\begin{aligned} f_\ell(z,k) &\equiv \sqrt{\frac{2}{\pi}}\, k \sqrt{P(k)}\, \Delta_{\ell}(z, k) \\ &= \sqrt{\frac{2}{\pi}}\, k \sqrt{P(k)} \left\{ b\, D(z,k)\, j_{\ell}(k\,r(z)) + \dots \right\} \\ &= \sqrt{\frac{2}{\pi}}\, k \sqrt{P(k)\, D(z,k)^2} \left\{ b\, j_{\ell}(k\,r(z)) + \dots \right\} \\ &\equiv \sqrt{\frac{2}{\pi}}\, k \sqrt{P(k,z)}\, \widetilde{\Delta}_\ell(z,k), \end{aligned}$ (4)

where we have factored the growth factor D(z,k) out of the matter density contribution to introduce the notations P(k,z) and $\widetilde{\Delta}_\ell(z,k)$. The dots signify that other contributions, such as redshift-space distortions and lensing effects, may be included; we ignore them here for simplicity. Then,

$\begin{aligned} C^{\mathrm{thick}}_{\ell}(z_1, z_2; \sigma_1, \sigma_2) = {} & \iint_0^\infty \mathrm{d}z\, \mathrm{d}z^\prime\, W_1(z; z_1, \sigma_1)\, W_2(z^\prime; z_2, \sigma_2) \\ & \times \int_0^\infty \mathrm{d}k\, f_{\ell}(z, k)\, f_{\ell}(z^\prime, k). \end{aligned}$ (5)

The auto-correlation is a particular case where the redshift selection function W₂(z′;z₂,σ₂) reduces to W₁(z′;z₁,σ₁) and we can use a single W function, which leads to

$\begin{aligned} C^{\mathrm{thick}}_{\ell}(z_1; \sigma_1) = {} & \iint_0^\infty \mathrm{d}z\, \mathrm{d}z^\prime\, W(z; z_1, \sigma_1)\, W(z^\prime; z_1, \sigma_1) \\ & \times \int_0^\infty \mathrm{d}k\, f_{\ell}(z, k)\, f_{\ell}(z^\prime, k). \end{aligned}$ (6)

To account for infinite redshift precision at z = z₁, the use of a Dirac selection function for W yields

$C^{\delta}_{\ell}(z_1) = \frac{2}{\pi} \int_0^\infty \mathrm{d}k\, k^2\, P(k,z_1)\, \widetilde{\Delta}^2_{\ell}(z_1, k).$ (7)

Angpow is designed to efficiently compute these C_ℓ expressions once one provides access to the power spectra P(k,z), the extra function $\widetilde{\Delta}_{\ell}(z, k)$, and the cosmological distance r(z). To simplify the notation, we do not write the width σ of the selection functions if not explicitly needed.

3. A brief description of the computational algorithm

The redshift integrals of Eq. (5) can be computed inside the rectangle [z_1min,z_1max] × [z_2min,z_2max] given by the W selection functions, using a Cartesian product of a one-dimensional (1D) quadrature defined by a set of sample nodes z_i and weights w_i. In practice, we use the Clenshaw-Curtis quadrature. The corresponding sampling points (z_1i,z_2j) are weighted by the product w_iw_j using the 1D quadrature sample points and weights on both redshift regions, with i = 0,...,N_z₁−1 and j = 0,...,N_z₂−1. Then, one gets the following approximation:

$C^{\mathrm{thick}}_{\ell}(z_1, z_2) \approx \sum_{i=0}^{N_{z_1}-1} \sum_{j=0}^{N_{z_2}-1} w_i\, w_j\, W_1(z_i, z_1)\, W_2(z_j, z_2)\, \widehat{P}_\ell(r_i, r_j),$ (8)

with the notations z_i = z_1i, z_j = z_2j, r_i = r(z_1i), r_j = r(z_2j), and

$\widehat{P}_\ell(z_i, z_j) = \int_0^\infty \mathrm{d}k\, f_\ell(z_i, k)\, f_\ell(z_j, k),$ (9)

defined with the f_ℓ(z,k) function of Eq. (4).
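The Cartesian-product quadrature of Eq. (8) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Angpow implementation: the function names (`clenshaw_curtis`, `gaussian_window`, `thick_shell_cl`) are hypothetical, the selection functions are taken to be Gaussians cut at ± 5σ, and the kernel `p_hat` standing for $\widehat{P}_\ell$ is supplied by the caller. The weights use the standard closed-form expression for Clenshaw-Curtis rules with an odd number of points.

```python
import math


def clenshaw_curtis(n_pts, a, b):
    """Nodes and weights of the n_pts-point Clenshaw-Curtis rule on [a, b].

    n_pts must be odd and >= 3 (an even number of sub-intervals).
    """
    if n_pts < 3 or n_pts % 2 == 0:
        raise ValueError("n_pts must be odd and >= 3")
    n = n_pts - 1
    half, mid = 0.5 * (b - a), 0.5 * (a + b)
    nodes, weights = [], []
    for j in range(n_pts):
        theta = math.pi * j / n
        s = 0.0
        for k in range(1, n // 2 + 1):
            bk = 1.0 if 2 * k == n else 2.0
            s += bk / (4 * k * k - 1) * math.cos(2 * k * theta)
        cj = 1.0 if j in (0, n) else 2.0
        nodes.append(mid - half * math.cos(theta))  # ascending order
        weights.append(half * cj / n * (1.0 - s))
    return nodes, weights


def gaussian_window(z, z_mean, sigma):
    """Normalized Gaussian selection function (assumed shape)."""
    norm = sigma * math.sqrt(2.0 * math.pi)
    return math.exp(-0.5 * ((z - z_mean) / sigma) ** 2) / norm


def thick_shell_cl(p_hat, z1, z2, sigma1, sigma2, n_pts=99, n_sigma=5.0):
    """Double sum of Eq. (8) over the rectangle defined by the two cuts."""
    za, wa = clenshaw_curtis(n_pts, z1 - n_sigma * sigma1, z1 + n_sigma * sigma1)
    zb, wb = clenshaw_curtis(n_pts, z2 - n_sigma * sigma2, z2 + n_sigma * sigma2)
    total = 0.0
    for zi, wi in zip(za, wa):
        w1 = wi * gaussian_window(zi, z1, sigma1)
        for zj, wj in zip(zb, wb):
            total += w1 * wj * gaussian_window(zj, z2, sigma2) * p_hat(zi, zj)
    return total


# Sanity check: with a constant kernel the double sum reduces to the
# product of the two window normalizations inside the 5-sigma cut.
norm_check = thick_shell_cl(lambda zi, zj: 1.0, 1.0, 1.05, 0.02, 0.02)
print(norm_check)
```

With a constant kernel the result should be close to erf(5/√2)², which is a convenient check that the nodes, weights, and window normalization are consistent before plugging in a real $\widehat{P}_\ell$.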

We use a piecewise integration over a user-defined range [k_min,k_max] such that

$\widehat{P}_\ell(z_i, z_j) \approx \sum_{p=0}^{N_\mathrm{k}-1} I_\ell(k^\ell_p, k^\ell_{p+1}; z_i, z_j),$ (10)

where the $k^\ell_p$ bounds are related to the roots of j_ℓ(x), denoted q_ℓp, and to the user-defined number of roots q_ℓp per sub-interval $[k^\ell_p, k^\ell_{p+1}]$ (see Appendix A). Then, Eq. (8) may be rewritten as

$\begin{aligned} C^{\mathrm{thick}}_{\ell}(z_1, z_2) \approx {} & \sum_{i=0}^{N_{z_1}-1} \sum_{j=0}^{N_{z_2}-1} w_i\, w_j\, W_1(z_i, z_1)\, W_2(z_j, z_2) \\ & \times \sum_{p=0}^{N_\mathrm{k}-1} I_\ell(k^\ell_p, k^\ell_{p+1}; z_i, z_j). \end{aligned}$ (11)

The integral $I_\ell(k^\ell_p, k^\ell_{p+1}; r_i, r_j)$, defined as

$I_\ell(k^\ell_p, k^\ell_{p+1}; r_i, r_j) = \int_{k^\ell_p}^{k^\ell_{p+1}} \mathrm{d}k\, f_\ell(k, z_i)\, f_\ell(k, z_j),$ (12)

is computed using the 3C-algorithm of Appendix A.
Investigating the different steps of the algorithm, one notices that the sampling of the function f_ℓ(k,z_i) along the k axis for a given $[k^\ell_p, k^\ell_{p+1}]$ interval depends on z_i but is independent of z_j, and vice versa for the f_ℓ(k,z_j) function (whose sampling is independent of z_i). So, one may perform the k-sampling before the double sum over (i,j), which is particularly efficient as the CPU bottleneck is the computation of the spherical Bessel function j_ℓ. Angpow uses a spherical Bessel function implementation based on Numerical Recipes (Press et al. 1992). The CPU time of the brute-force Cartesian quadrature scales as O(N_z₁ × N_z₂), while the optimized version reduces the CPU-time scaling to O(N_z₁ + N_z₂). Therefore, as a matter of efficiency, it is more appropriate to exchange the order of the p and (i,j) summations, leading to

$\begin{aligned} C^{\mathrm{thick}}_{\ell}(z_1, z_2) \approx {} & \sum_{p=0}^{N_\mathrm{k}-1} \sum_{i=0}^{N_{z_1}-1} \sum_{j=0}^{N_{z_2}-1} w_i\, w_j\, W_1(z_i, z_1) \\ & \times W_2(z_j, z_2)\, I_\ell(k^\ell_p, k^\ell_{p+1}; z_i, z_j). \end{aligned}$ (13)

As a rough guide for the 3C-algorithm settings, we typically use Chebyshev polynomial approximations of degree 2⁸–2⁹ of the f_ℓ(k,r_i) and f_ℓ(k,r_j) functions for ℓ_max ≈ 500–1000, 100 spherical Bessel roots per sub-interval, and a 99-point Clenshaw-Curtis quadrature (nodes and weights) for the z integration in the case of σ = 0.02.
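The gain from sampling along k before the double sum over (i,j) can be illustrated with a toy model. The sketch below is not the 3C-algorithm: it assumes ℓ = 0 (so that j_0(x) = sin x / x), a hypothetical linear distance r(z) = 3000 z Mpc, a hypothetical spectrum P(k) = 1/(1 + k²), and a plain trapezoidal rule in k. The class `ToySampler` and the two driver functions are illustrative names only. The point is the count of k-samplings: 2 N_z₁ N_z₂ in the brute-force loop against N_z₁ + N_z₂ when the samplings are cached.

```python
import math


def j0(x):
    """Spherical Bessel function of order 0."""
    return 1.0 if x == 0.0 else math.sin(x) / x


class ToySampler:
    """Samples a toy f_0(z, k) on a fixed k grid and counts the samplings."""

    def __init__(self, k_nodes):
        self.k_nodes = k_nodes
        self.n_samplings = 0

    def sample(self, z):
        self.n_samplings += 1
        r = 3000.0 * z  # hypothetical comoving distance in Mpc
        pref = math.sqrt(2.0 / math.pi)
        return [pref * k * math.sqrt(1.0 / (1.0 + k * k)) * j0(k * r)
                for k in self.k_nodes]


def trapezoid_weights(k_nodes):
    """Trapezoidal weights on an arbitrary increasing grid."""
    w = [0.0] * len(k_nodes)
    for p in range(len(k_nodes) - 1):
        h = 0.5 * (k_nodes[p + 1] - k_nodes[p])
        w[p] += h
        w[p + 1] += h
    return w


def k_integral(f_i, f_j, k_weights):
    """Quadrature of f(z_i, k) * f(z_j, k) over the k grid."""
    return sum(w * a * b for w, a, b in zip(k_weights, f_i, f_j))


def cl_brute_force(sampler, k_weights, shell1, shell2):
    """Re-samples both functions for every (i, j) pair."""
    total = 0.0
    for z_i, w_i in shell1:
        for z_j, w_j in shell2:
            f_i = sampler.sample(z_i)
            f_j = sampler.sample(z_j)
            total += w_i * w_j * k_integral(f_i, f_j, k_weights)
    return total


def cl_cached(sampler, k_weights, shell1, shell2):
    """Samples each redshift node once, then performs the double sum."""
    f1 = [sampler.sample(z) for z, _ in shell1]
    f2 = [sampler.sample(z) for z, _ in shell2]
    total = 0.0
    for (_, w_i), f_i in zip(shell1, f1):
        for (_, w_j), f_j in zip(shell2, f2):
            total += w_i * w_j * k_integral(f_i, f_j, k_weights)
    return total


k_nodes = [0.001 + 0.0005 * p for p in range(200)]
k_weights = trapezoid_weights(k_nodes)
# (z node, combined quadrature and window weight) pairs; values are arbitrary.
shell1 = [(0.98 + 0.01 * i, 0.2) for i in range(5)]
shell2 = [(1.03 + 0.01 * j, 0.25) for j in range(4)]

brute, cached = ToySampler(k_nodes), ToySampler(k_nodes)
cl_a = cl_brute_force(brute, k_weights, shell1, shell2)
cl_b = cl_cached(cached, k_weights, shell1, shell2)
print(brute.n_samplings, cached.n_samplings)  # 40 9
```

Both drivers return the same number; only the number of Bessel-dominated samplings differs, which is where the O(N_z₁ + N_z₂) scaling quoted in the text comes from.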

4. Numerical tests

4.1. Comparison with other codes

We now proceed to a numerical comparison of the estimations of C_ℓ(z₁) and C_ℓ(z₁,z₂) computed by CLASSgal (Di Dio et al. 2013) and Mathematica (Wolfram Research Inc. 2016) with Dirac redshift selection functions. Concerning CLASSgal (v1.1.3), we started from the provided explanatory.ini file, in which we modified the cosmological parameters to h = 0.679, Ω_b = 0.0483, Ω_cdm = 0.2582 and Ω_k = Ω_fld = 0. We also set k_scalar_max_tau0_over_l_max to fix the upper bound of the k-integration, taking into account the maximal ℓ value and the redshift mean value. Concerning Angpow, we took advantage of the possibility to read an external file produced by CLASSgal as an input P(k) computed at z = 0, together with the growth function of Lahav et al. (1991) and Carroll et al. (1992), which is computed internally. Finally, to avoid Limber’s approximation in CLASSgal, we set the “Limber” threshold much higher than the ℓ upper limit.

Fig. 1

Comparison of the computations of the $C_\ell^{\delta}(z=1)$ given by Angpow, Mathematica, and CLASSgal for a Dirac selection function with k_max = 10 Mpc^-1.

Fig. 2

Computations of $C_\ell^{\delta}(z)$ with Dirac selections centered at z ∈ {0.85,1.00,1.15} with Angpow alone and $C_\ell^{\mathrm{thick}}(z)$ with Angpow and CLASSgal using a Gaussian selection function with mean z = 1, a width of σ = 0.03, and a redshift cut at ± 5σ (in all cases k_max = 0.44 Mpc^-1). For Angpow we use the power spectrum produced by CLASSgal and we have varied the redshift grid sampling resolution from 9 × 9 to 159 × 159 points to reach the converged result (blue curve), in good agreement with the CLASSgal result (cyan points). Comparing the $C_\ell^{\delta}(z=1)$ to the $C_\ell^{\mathrm{thick}}(z=1,\sigma=0.03)$ results, we measure the effect of self-cross-correlation inside a thick redshift shell, which washes out the matter fluctuation contrast.

Results of the auto-correlation $C_\ell^{\delta}(z)$ computations at z = 1 using Dirac selection functions are shown in Fig. 1. As the three codes use the same primordial power spectrum, all the results agree with one another within a maximal relative error of 0.06% over the whole ℓ range.

Fig. 3

Comparison between Angpow (red/orange points) and CLASSgal (black points) for several cross-correlation spectra with Gaussian (σ = 0.01) selection and k_max = 1 Mpc^-1. The orange points are used to emphasize negative C_ℓ.

The $C_\ell^{\mathrm{thick}}(z)$ auto-correlation computation within a thick redshift band, selected by a Gaussian of mean z = 1 and σ = 0.03 cut at ± 5σ, has been used as a test bench. For this test we have left out Mathematica, which is too slow, and have restricted the comparison to Angpow and CLASSgal. Figure 2 shows the results. The orange, violet, and red curves are produced by Angpow using Dirac selection functions in the range ± 5σ around the mean redshift z = 1, while the green, forest green, purple, and blue curves are the results for the above-mentioned Gaussian selection function sampled using different grid sizes: 9 × 9, 19 × 19, 39 × 39 and 159 × 159 Clenshaw-Curtis sample points. As the number of points, or equivalently the quadrature order, increases, the $C_\ell^{\mathrm{thick}}(z)$ computation converges towards the CLASSgal result (cyan points). We also address the cross-correlation computations performed by Angpow and CLASSgal ($C_\ell^{\mathrm{thick}}(z_1,z_2)$) using Gaussian selection functions (σ = 0.01 and a ± 5σ cut). The results are shown in Fig. 3. One should not be surprised by negative values since we are cross-correlating two different quantities. In both tests we have used the power spectrum computed at z = 0 by CLASSgal as input to Angpow. The agreement between the two codes is very good, with relative residuals below 0.02%.

4.2. Note on Limber’s approximation

The Angpow library can also be used to compute, if desired, the first-order Limber approximation (Loverde & Afshordi 2008). In this approximation, the spherical Bessel function is formally reduced to

$j_\ell(x) \approx \sqrt{\frac{\pi}{2\ell+1}}\, \delta^D\!\left(x - \left(\ell + \frac{1}{2}\right)\right).$ (14)

In such conditions, the product kr(z) is constrained. Writing the comoving distance in terms of the Hubble distance d_H = c/H₀ (H₀ = 100h km s^-1 Mpc^-1 and c the speed of light) and of the dimensionless Hubble parameter E(z), the constraint defines a redshift z_ℓ(k) through

$r(z_\ell(k)) = \frac{\ell + 1/2}{k} = d_\mathrm{H} \int_0^{z_\ell(k)} \frac{\mathrm{d}z}{E(z)}.$ (15)

Then, Eq. (5) is transformed into the following expression:

$\begin{aligned} C_{\ell}^{\mathrm{thick}}(z_1, z_2; \sigma_1, \sigma_2) \approx {} & \frac{2}{d_\mathrm{H}^2 (2\ell+1)} \int_0^\infty \mathrm{d}k\, W_1(z_\ell(k); z_1, \sigma_1) \\ & \times W_2(z_\ell(k); z_2, \sigma_2)\, E^2(z_\ell(k))\, P(k, z_\ell(k)). \end{aligned}$ (16)

This integral can be computed using a divide-and-conquer recursive method with the Gauss-Kronrod quadrature (Laurie 1997). The Gauss sample points are a subset of the Gauss-Kronrod sample points and can easily be used to set up an error estimate to drive the recursive algorithm.
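The step from Eqs. (14) and (15) to Eq. (16) can be made explicit. The short derivation below is a sketch that assumes only the matter term of Eq. (2) with b = 1, so that $\widetilde{\Delta}_\ell(z,k) = j_\ell(k\,r(z))$, and uses dr/dz = d_H/E(z), which follows from Eq. (15):

$\begin{aligned} \delta^D\!\left(k\,r(z) - \left(\ell + \tfrac{1}{2}\right)\right) &= \frac{\delta^D(z - z_\ell(k))}{k\, \left| \mathrm{d}r/\mathrm{d}z \right|_{z_\ell(k)}} = \frac{E(z_\ell(k))}{k\, d_\mathrm{H}}\, \delta^D(z - z_\ell(k)), \\ \int_0^\infty \mathrm{d}z\, W_1(z; z_1, \sigma_1)\, f_\ell(z,k) &\approx \sqrt{\frac{2}{\pi}}\, k \sqrt{P(k, z_\ell(k))}\, \sqrt{\frac{\pi}{2\ell+1}}\, \frac{E(z_\ell(k))}{k\, d_\mathrm{H}}\, W_1(z_\ell(k); z_1, \sigma_1) \\ &= \sqrt{\frac{2}{2\ell+1}}\, \frac{E(z_\ell(k))}{d_\mathrm{H}}\, \sqrt{P(k, z_\ell(k))}\, W_1(z_\ell(k); z_1, \sigma_1). \end{aligned}$

Multiplying by the analogous expression for W₂ and integrating over k reproduces the prefactor 2/[d_H²(2ℓ+1)] and the integrand of Eq. (16).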

Looking at Eq. (16), one realizes that all the terms are positive, indicating that such an approximation is not suitable for cross-correlations, where C_ℓ(z₁,z₂) is not guaranteed to be positive, as can be explicitly seen in Fig. 3. We have computed C_ℓ(z) in the case of a Gaussian selection function of mean z = 1 and σ = 0.03 with both Angpow and CLASSgal, with and without Limber’s approximation. The results are shown in Fig. 4. The two codes agree very well and show that Limber’s approximation can give sizeable errors compared to the exact computation, of the order of the cosmic variance in the given example. So, Limber’s approximation, even if it runs 100 times faster than the exact computation, should be used with extreme care not only for cross-correlations but also for auto-correlations.
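The positivity of Eq. (16) can be checked numerically with a minimal sketch. Several ingredients below are assumptions made for illustration and are not taken from Angpow: a flat ΛCDM E(z) without radiation built from the parameters of Sect. 4.1, a hypothetical positive `toy_pk`, no growth factor (D = 1), Gaussian windows cut at ± 5σ, and a trapezoidal rule along the curve k = (ℓ + 1/2)/r(z) instead of a Gauss-Kronrod scheme. Parametrizing the k integral by z avoids inverting r(z).

```python
import math

H_LITTLE = 0.679               # h, from Sect. 4.1
OMEGA_M = 0.0483 + 0.2582      # Omega_b + Omega_cdm, from Sect. 4.1
D_H = 2997.92458 / H_LITTLE    # Hubble distance c/H0 in Mpc


def e_of_z(z):
    """Dimensionless Hubble parameter (flat LCDM, radiation neglected)."""
    return math.sqrt(OMEGA_M * (1.0 + z) ** 3 + 1.0 - OMEGA_M)


def comoving_distance(z, n=200):
    """r(z) = d_H * int_0^z dz'/E(z'), composite Simpson rule (n even)."""
    h = z / n
    s = 1.0 / e_of_z(0.0) + 1.0 / e_of_z(z)
    for i in range(1, n):
        s += (4.0 if i % 2 else 2.0) / e_of_z(i * h)
    return D_H * s * h / 3.0


def gaussian_window(z, z_mean, sigma, n_sigma=5.0):
    """Gaussian selection function, set to zero outside the n_sigma cut."""
    if abs(z - z_mean) > n_sigma * sigma:
        return 0.0
    norm = sigma * math.sqrt(2.0 * math.pi)
    return math.exp(-0.5 * ((z - z_mean) / sigma) ** 2) / norm


def toy_pk(k):
    """Hypothetical smooth, positive P(k); not a physical spectrum."""
    return k / (1.0 + (k / 0.02) ** 2) ** 2


def limber_cl(ell, z1, s1, z2, s2, n_z=401):
    """Eq. (16), integrated along k = (ell + 1/2) / r(z)."""
    z_lo = min(z1 - 5.0 * s1, z2 - 5.0 * s2)
    z_hi = max(z1 + 5.0 * s1, z2 + 5.0 * s2)
    nu = ell + 0.5
    ks, gs = [], []
    for i in range(n_z):
        z = z_lo + (z_hi - z_lo) * i / (n_z - 1)
        k = nu / comoving_distance(z)
        ks.append(k)
        gs.append(gaussian_window(z, z1, s1) * gaussian_window(z, z2, s2)
                  * e_of_z(z) ** 2 * toy_pk(k))
    # k decreases when z increases, hence the absolute value of the step.
    integral = sum(0.5 * (gs[i] + gs[i + 1]) * abs(ks[i + 1] - ks[i])
                   for i in range(n_z - 1))
    return 2.0 / (D_H ** 2 * (2 * ell + 1)) * integral


auto = limber_cl(100, 1.0, 0.02, 1.0, 0.02)
cross_near = limber_cl(100, 0.9, 0.02, 1.0, 0.02)   # overlapping windows
cross_far = limber_cl(100, 0.8, 0.02, 1.2, 0.02)    # disjoint windows
print(auto, cross_near, cross_far)
```

In this sketch the cross-spectrum can only be positive where the windows overlap and vanishes identically when they do not, so no anti-correlation between shells can appear, in line with the remark above.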

Fig. 4

Comparison of the computations of the $C_\ell^{\mathrm{thick}}(z)$ given by Angpow and CLASSgal either with Limber’s approximation or with the exact computation.

4.3. Correlations in real space

Angpow can also quickly compute the correlation function in real space from the power spectrum,

$C(\theta; z_1, z_2) = \frac{1}{4\pi} \sum_{\ell=0}^{\ell_\mathrm{max}} (2\ell+1)\, \widetilde{C}_{\ell}(z_1, z_2)\, P_\ell(\cos\theta),$ (17)

where P_ℓ denotes the ℓth-order Legendre polynomial. Because the C_ℓ(z₁,z₂) values are generally cut at a given ℓ_max, one needs to introduce an apodization to avoid ringing due to the sharp cut-off. We use a Gaussian one (which is the smoothest in both real and harmonic spaces) as in Di Dio et al. (2013), so that the $\widetilde{C}_{\ell}(z_1, z_2)$ term in Eq. (17) is

$\widetilde{C}_{\ell}(z_1, z_2) = C_{\ell}(z_1, z_2)\, e^{-\ell(\ell+1)/\ell_\mathrm{a}^2}.$ (18)

The apodization length ℓ_a may depend on the signal, but for the standard cosmology shown here (around z = 1) we noticed that using ℓ_a ≃ 0.4ℓ_max gives good results. Correlations in real space are generally easier to comprehend, as is shown in Fig. 5, which represents the analogue of Fig. 3 but in real space. Here one may identify a peak, known as the “Baryonic Acoustic Oscillation” peak (e.g., Weinberg et al. 2013), that decreases in the cross-correlation when the distance between shells increases and is finally washed out.
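Equations (17) and (18) translate directly into a short routine. The sketch below is an illustration, not the Angpow code: the function name `correlation_function` and its arguments are hypothetical, the Legendre polynomials are generated with the standard three-term (Bonnet) recurrence, and the default apodization length is the ℓ_a = 0.4ℓ_max value quoted above.

```python
import math


def correlation_function(cl, theta, ell_a=None):
    """C(theta) from Eq. (17) with the Gaussian apodization of Eq. (18).

    cl[ell] holds C_ell(z1, z2) for ell = 0..ell_max; theta is in radians.
    """
    ell_max = len(cl) - 1
    if ell_a is None:
        ell_a = 0.4 * ell_max  # value suggested in the text
    if ell_a <= 0.0:
        raise ValueError("ell_a must be positive")
    x = math.cos(theta)
    p_prev, p_curr = 1.0, x  # P_0(x) and P_1(x)
    total = 0.0
    for ell in range(ell_max + 1):
        if ell == 0:
            p_ell = p_prev
        elif ell == 1:
            p_ell = p_curr
        else:
            # Bonnet recurrence: l P_l = (2l - 1) x P_{l-1} - (l - 1) P_{l-2}
            p_ell = ((2 * ell - 1) * x * p_curr - (ell - 1) * p_prev) / ell
            p_prev, p_curr = p_curr, p_ell
        apod = math.exp(-ell * (ell + 1) / ell_a ** 2)
        total += (2 * ell + 1) * cl[ell] * apod * p_ell
    return total / (4.0 * math.pi)


# A pure dipole spectrum gives back an apodized cos(theta).
dipole = [0.0, 4.0 * math.pi / 3.0, 0.0, 0.0]
print(correlation_function(dipole, 0.3, ell_a=10.0))
```

A spectrum with a single non-zero multipole, as in the usage line, is a simple way to check the normalization: the result must equal the corresponding Legendre polynomial times the apodization factor.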

Fig. 5

Cross-correlations in real space corresponding to the spectra shown in Fig. 3. The points show where the function was evaluated.

4.4. Speed tests

Angpow is designed and written in C++ and parallelization is achieved through OpenMP. To qualify the code we provide four input parameter files and their corresponding outputs obtained in one of our runs. In all tests, we used a ΛCDM reference cosmology, a P(k) at z = 0 provided by CLASSgal, ℓ_max = 1000, a 3C-algorithm with Chebyshev polynomials of degree 2⁹, and 100 roots per sub-k-interval:

Test 1:
auto-correlation with a Dirac selection function at z = 1 and k_max = 10 Mpc^-1;
Test 2:
cross-correlation with two Dirac selection functions at z = 1 and z = 1.05 and k_max = 10 Mpc^-1;
Test 3:
auto-correlation with a Gaussian selection function with (z_mean = 1, σ_z = 0.02, 5σ_z-cut) and k_max = 1 Mpc^-1 and a radial quadrature based on N_pts = 139 sample points;
Test 4:
cross-correlation with two Gaussian selection functions with (z_mean,1 = 0.90, z_mean,2 = 1.00) both with (σ_z = 0.02, 5σ_z-cut) and k_max = 1 Mpc^-1 and a radial quadrature based on N_pts = 139 sample points.

We have tested the code both on laptops (Linux, MacOSX) and at computing centers (CCIN2P3 in France and NERSC in the USA). We use OpenMP to distribute the computation over the threads, each C_ℓ being computed on a single thread. Table 1 gives Central Processing Unit (CPU) execution times averaged over ten processes. The code wall time decreases reasonably well with the number of threads, and a wall time at the $\mathcal{O}(1\,\mathrm{s})$ level can be reached to reconstruct these accurate spectra. This performance is much better than that obtained with CLASSgal when not using the Limber approximation. For example, on our Test 3 setup, running the latter takes about 15 s (on 16 threads), which is to be compared to about 0.5 s in Table 1.

Table 1

Wall time (in seconds) measured at CCIN2P3 (on Intel Xeon CPU E5-2640 v3 processors) for the test benches described in the text, according to the number of OpenMP threads used.

Fig. 6

Evolution of the wall time with respect to the width of the selection function (σ), illustrated in the conditions of Test 3 and using 16 threads. The upper scale of the frame indicates the radial_order used to sample along the z direction for a given σ (see Sect. 5).

Figure 6 shows the dependence of the wall time on the width of the selection function (σ) in the conditions of Test 3 using 16 threads. For a given σ, we have chosen the minimal radial_order value such that the relative accuracy on the C_ℓ is of the order of 0.01% compared to a computation with a much larger radial_order value (see Sect. 5). If one uses a looser criterion on the accuracy of the C_ℓ, or if the number of sigmas is lower than 5, then one may use a lower radial_order and reduce the wall time.

5. Code design and input parameters

Angpow is written in C++, which allows good CPU performance while keeping the code flexible. A front end interface to Python is also foreseen and the code is distributed publicly². The angpow.cc file is an example of the library usage to perform C_ℓ and C(θ) computations. We also provide the limber.cc file if one wants to test Limber’s approximation (Sect. 4.2). The different files are located in self-explanatory directories: src, inc/Angpow, lib, data. Finally, a README file provides details for the installation and build procedures.

The two main classes Pk2Cl and KIntegrator (located in the angpow_pk2cl.h and angpow_kinteg.h files) are generic codes using abstract base classes. They define an interface to the power spectrum function P(ℓ,k,z) (class PowerSpecBase), the generalisation of P(k,z) used in Eq. (5); to the comoving distance computation r(z) (class CosmoCoorBase); and to the radial/redshift selection functions W(z) (class RadSelectBase). Users can derive their own concrete classes to access a third-party library or use the ones implemented by default. For instance, we have implemented file access to a (k,P(k))-tuple saved in the CLASSgal output. In this implementation, we have coded the growth factor defined in Lahav et al. (1991) and Carroll et al. (1992) as a minimal $\widetilde{\Delta}_\ell(z,k)$ function (Eq. (2)).

To run the executable, one provides an ASCII file defining the input parameters that drive the computational conditions of the algorithm and define the I/O locations. Some ready-made input parameter files are also provided (angpow_bench<n>.ini) as well as the C_ℓ output files (angpow_bench<n>_cl.txt.REF) corresponding to the icpc outputs of Table 1.

Among the different input parameters some are more sensitive than others, such as those that deal with the radial/redshift 1D quadrature and the 3C algorithm. Here is a closer look at these parameters:

cl_kmax: this is the maximal value of k in the k-integral. We have not set up an internal algorithm to determine this upper bound. As a hint, one may consider a relation with the factor ℓ_max/r(z_min). The lower bound on k is internally fixed using the cut-off x_min(ℓ) defined as x < x_min(ℓ) ⇔ j_ℓ(x) < cut (see the input parameters jl_xmin_cut and Lmax_for_xmin set as default to 5 × 10^-10 and 2000, respectively).
radial_order: if noted n, this fixes the number of sample points along one z direction, that is, N_pts = 2n−1. The accuracy on the selection function as well as the CPU time increase with n, but we keep an O(n) complexity of the k-sampling (see Fig. 6).
chebyshev_order: if noted N, this fixes the degree d of the Chebyshev polynomial approximation of the f_ℓ(k,z_i) and f_ℓ(k,z_j) functions (Eq. (4)), that is, d = 2^N. Keeping the same degree of approximation for both functions guarantees a power of 2 for the product approximation. Even if not mandatory, this helps CPU performance for the DCT-I transform (using the FFTW library). When ℓ_max increases it may be worth increasing this parameter by one unit. For ℓ_max = 500, chebyshev_order is set to 8 by default. Increasing the angular spectrum computation up to ℓ_max = 1000 while keeping this default value leads to absolute errors of the order of 10^-6 for Tests 1 and 2 and 5 × 10^-10 for Tests 3 and 4; to get better accuracy in this case we use chebyshev_order = 9.
n_bessel_roots_per_interval: this is the number of Bessel roots q_ℓp used to define the bounds of the integral $I_\ell(k^\ell_p, k^\ell_{p+1}; r_i, r_j)$ (Eq. (12)). By default it is set to 100. There is an interplay with the chebyshev_order parameter, as a lower n_bessel_roots_per_interval value is consistent with a lower chebyshev_order.
total_weight_cut and deltaR_cut: these two threshold parameters are used to avoid unnecessary 3C algorithm processing (especially for the k-sampling of the f_ℓ(k,z_i) or f_ℓ(k,z_j) functions). So, we do not consider a pair (z_i,z_j) for which either the product w_iw_jW(z_i,z₁)W(z_j,z₂) is too low (total_weight_cut cut) or the radial distance | r(z_i)−r(z_j) | is too large to produce a sizeable contribution to the final C_ℓ. The deltaR_cut cut is in Mpc units and is used in conjunction with has_deltaR_cut set to 1. These two threshold parameters depend on the use case under consideration, and for preliminary tests we recommend setting both total_weight_cut and has_deltaR_cut to 0.
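The relations between these parameters and the quantities they control can be collected in a few helper functions. The sketch below is illustrative only: the function names are hypothetical and do not belong to Angpow, and `kmax_scale` merely evaluates the ℓ_max/r(z_min) factor mentioned for cl_kmax, which is a scale to relate to rather than a prescribed value.

```python
def n_radial_points(radial_order):
    """Number of sample points along one z direction: N_pts = 2n - 1."""
    return 2 * radial_order - 1


def radial_order_for(n_pts):
    """Inverse relation; n_pts must be odd."""
    if n_pts % 2 == 0:
        raise ValueError("N_pts = 2n - 1 is always odd")
    return (n_pts + 1) // 2


def chebyshev_degree(chebyshev_order):
    """Degree d = 2^N of the Chebyshev approximation of f_ell."""
    return 2 ** chebyshev_order


def kmax_scale(ell_max, r_zmin_mpc):
    """The factor ell_max / r(z_min), in 1/Mpc if r is given in Mpc."""
    return ell_max / r_zmin_mpc


# Settings quoted in the text: 139 radial points for Tests 3 and 4,
# chebyshev_order = 9 for ell_max = 1000.
print(radial_order_for(139), chebyshev_degree(9))  # 70 512
```

For instance, the 99-point quadrature quoted in Sect. 3 corresponds to radial_order = 50, and the default chebyshev_order = 8 to a degree of 256.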

6. Summary and outlooks

We have set up a fast and generic software to compute the tomographic C_ℓ(z₁,z₂) with redshift selection functions. Angpow is versatile enough to accept a user-defined matter power spectrum P(k), transfer functions $\tilde{\Delta}_\ell(k,z)$, and cosmology. The code provides an accurate computation of the auto- and cross-correlation power spectra, checked by comparison with other codes, and is fast enough to be included inside MCMC cosmology software. The speed of the software relies on the use of the 3C-algorithm, adapted to the computation of integrals over spherical Bessel functions, while other codes rely on Limber’s approximation. We emphasize that Limber’s approximation can lead to incorrect C_ℓ(z₁,z₂), especially in the case of cross-correlations, as Limber’s Dirac functions cannot model the interferences between two spherical Bessel functions at different redshifts.

This code is thus fast and accurate enough to be used to test cosmological parameters and to perform tomographic analyses of the galaxy distribution. The definition of the $\tilde{\Delta}_\ell(k,z)$ function is general and can accept zeroth-order functions such as $\tilde{\Delta}^{\rm mat}_\ell(k,z)$, but also relativistic corrections such as the redshift space distortions or the gravitational lensing, even though these corrections can themselves contain spherical Bessel functions (see e.g., Di Dio et al. 2013). Because Angpow provides the correct angular power spectra for cross-correlations, it can be a key tool to perform an integrated approach to cosmology, as advocated in Nicola et al. (2016). In particular, we propose this tool for deriving the auto- and cross-correlation angular power spectra for galaxy clustering, but also in combination with angular power spectra from cosmic shear and the cosmic microwave background.

Angpow is publicly available³ and can be interfaced to other codes; a Python interface is foreseen. At the moment the code only accepts two redshift bins, but it will soon be generalized to any number of bins. Feedback from Angpow users would be greatly appreciated.

¹ http://class-code.net

² https://gitlab.in2p3.fr/campagne/AngPow

³ From https://gitlab.in2p3.fr/campagne/AngPow

Acknowledgments

The authors want to thank M. Reinecke who kindly provided pieces of the code, and J. D. McEwen for fruitful discussion on the Chebyshev transform.

References

Asorey, J., Crocce, M., Gaztañaga, E., & Lewis, A. 2012, MNRAS, 427, 1891
Asorey, J., Crocce, M., & Gaztañaga, E. 2014, MNRAS, 445, 2825
Baszenski, G., & Tasche, M. 1997, Linear Algebra and its Applications, 252, 1
Blas, D., Lesgourgues, J., & Tram, T. 2011, JCAP, 2011, 034
Bonvin, C., & Durrer, R. 2011, Phys. Rev. D, 84, 063505
Carroll, S. M., Press, W. H., & Turner, E. L. 1992, ARA&A, 30, 499
Di Dio, E., Montanari, F., Lesgourgues, J., & Durrer, R. 2013, JCAP, 11, 044
Di Dio, E., Montanari, F., Durrer, R., & Lesgourgues, J. 2014, JCAP, 1, 042
Feldman, H. A., Kaiser, N., & Peacock, J. A. 1994, ApJ, 426, 23
Frigo, M., & Johnson, S. G. 2005, special issue on “Program Generation, Optimization, and Platform Adaptation”, Proc. IEEE, 93, 216
Giorgi, P. 2012, IEEE Trans. Comput., 61, 780
Glasser, M. L., & Montaldi, E. 1993, ArXiv e-prints [arXiv:math/9307213]
Ivezic, Z., Tyson, J. A., Abel, B., et al. 2008, ArXiv e-prints [arXiv:0805.2366]
Lahav, O., Lilje, P. B., Primack, J. R., & Rees, M. J. 1991, MNRAS, 251, 128
Landy, S. D., & Szalay, A. S. 1993, ApJ, 412, 64
Lanusse, F., Rassat, A., & Starck, J.-L. 2015, A&A, 578, A10
Laureijs, R., Amiaux, J., Arduini, S., et al. 2011, ArXiv e-prints [arXiv:1110.3193]
Laurie, D. P. 1997, Math. Comput., 466, 1133
Levi, M., Bebek, C., Beers, T., et al. 2013, ArXiv e-prints [arXiv:1308.0847]
Loverde, M., & Afshordi, N. 2008, Phys. Rev. D, 78, 123506
Lucas, S., & Stone, H. 1995, J. Comput. Appl. Math., 64, 217
Nicola, A., Refregier, A., Amara, A., & Paranjape, A. 2014, Phys. Rev. D, 90, 063515
Nicola, A., Refregier, A., & Amara, A. 2016, Phys. Rev. D, 94, 083517
Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. 1992, Numerical Recipes in C, 2nd edn.: The Art of Scientific Computing (New York: Cambridge University Press)
Waldvogel, J. 2006, BIT Numerical Mathematics, 46, 195
Weinberg, D. H., Mortonson, M. J., Eisenstein, D. J., et al. 2013, Phys. Rep., 530, 87
Wolfram Research Inc. 2016, Mathematica 11.0, Champaign, Illinois, USA

Appendix A: Clenshaw-Curtis-Chebyshev algorithm (3C-algorithm)

Each integral of the type of Eq. (12) involves the product of “highly” oscillatory functions. The purpose of this section is not to review all the integration methods used in the different fields of physics to tackle such a difficult task. Focusing on our case, where we have to deal with (at least) the product of spherical Bessel functions, the reader may find either specific integral solving rules, as in Glasser & Montaldi (1993), or general methods, as in Lucas & Stone (1995). However, we need a precise and also very fast method. We cannot rely on methods that solve the problem of a product of spherical Bessel functions multiplied by a regular function: both the primordial power spectrum and the extension beyond the matter density $\tilde{\Delta}^{\rm mat.}(z,k)$ may show oscillating features in the form of derivatives of spherical Bessel functions. We have therefore searched for, and found, a general method that meets our requirements of precision and speed.

Equation (12) is a special case of the following generic integral after a proper change of variable:

$I = \int_{-1}^{1} \mathrm{d}x\, f(x)\, g(x).$ (A.1)

To get an approximate value of this integral, we use in this section the Clenshaw-Curtis quadrature at order N_cc (noting h = f × g):

$I \approx \sum_{k=0}^{N_{\rm cc}} w_k f(x_k)\, g(x_k) = \sum_{k=0}^{N_{\rm cc}} w_k h(x_k),$ (A.2)

where the sampling points are defined as x_k = cos(kπ/N_cc) (k = 0,...,N_cc) and the corresponding weights w_k are addressed later in the section. However, if the functions f and g have a highly oscillatory behavior, one needs, in principle, to use large values of N_cc to reach a sufficient accuracy level. In that case, dealing with the above sum may not be computationally efficient. The idea is then to use Chebyshev series to approximate both functions f and g. Then, one performs the product of both Chebyshev series, which is also a Chebyshev series but with a higher order, and finally one uses a fast Clenshaw-Curtis weights computation to perform the last weighted sum. We briefly describe those steps; the interested reader can find the detailed demonstration in Baszenski & Tasche (1997).
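
To make Eq. (A.2) concrete, here is a minimal Python sketch of the direct Clenshaw-Curtis sum, applied to an illustrative product of two oscillating functions. It is not taken from Angpow: the function names and the test integrand are assumptions, and the weights are computed with the standard explicit closed-form expression rather than with the fast transform discussed later in this appendix.

```python
import math


def clenshaw_curtis_nodes(n_cc):
    """Sampling points x_k = cos(k*pi/N_cc), k = 0..N_cc."""
    return [math.cos(k * math.pi / n_cc) for k in range(n_cc + 1)]


def clenshaw_curtis_weights(n_cc):
    """Explicit Clenshaw-Curtis weights for N_cc + 1 nodes (N_cc >= 1)."""
    weights = []
    for k in range(n_cc + 1):
        c_k = 1.0 if k in (0, n_cc) else 2.0
        s = 0.0
        for j in range(1, n_cc // 2 + 1):
            b_j = 1.0 if 2 * j == n_cc else 2.0
            s += b_j / (4 * j * j - 1) * math.cos(2 * j * k * math.pi / n_cc)
        weights.append(c_k / n_cc * (1.0 - s))
    return weights


def integrate(h, n_cc):
    """Direct evaluation of the weighted sum of Eq. (A.2)."""
    xs = clenshaw_curtis_nodes(n_cc)
    ws = clenshaw_curtis_weights(n_cc)
    return sum(w * h(x) for w, x in zip(ws, xs))


# Illustrative oscillatory factors (assumed for the example only).
def f(x):
    return math.cos(30.0 * x)


def g(x):
    return math.cos(20.0 * x)


def h(x):
    return f(x) * g(x)


# Closed form: integral of cos(30x)cos(20x) over [-1, 1].
exact = math.sin(50.0) / 50.0 + math.sin(10.0) / 10.0
coarse = integrate(h, 16)    # too few points for these frequencies
fine = integrate(h, 128)     # enough points to resolve the oscillations
```

Comparing `coarse` and `fine` with `exact` shows why a direct sum needs a large N_cc for oscillatory integrands, and the double loop in the weight computation shows why a faster route to the weights is desirable.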

Let f_N be a polynomial approximation of f of degree N−1. We expand f_N onto the basis of Chebyshev polynomials of the first kind { T_n; n = 0,...,N−1 }, which have the property T_n(cos θ) = cos nθ:

$f_N(x) = \frac{a_0}{2}+\sum_{k=1}^{N-1} a_k T_k(x).$ (A.3)

To determine the a_k coefficients one uses the following sampling vector

$\mathbf{f}^{(N)} = \left( f( t_\mu^{(N)})\right)^T \qquad \mathrm{with} \qquad t_\mu^{(N)} \equiv \cos \frac{\mu \pi}{N};\; \mu=0,\dots,N,$ (A.4)

of length N + 1 and related to the vector a^(N) = (a₀,...,a_N−1,0)^T by the linear algebra relation

$\mathbf{a}^{(N)} = \frac{2}{N}\mathbf{C}^I_N\, \mathbf{f}^{(N)},$ (A.5)

with $\mathbf{C}^I_N$ being a discrete cosine transform of type-I (DCT-I) matrix of dimension (N + 1)². Similarly, we note g_M a polynomial approximation of g of degree M−1 from which we determine the sampling vector g^(M) using the sample points $t_\mu^{(M)}$. The coefficient vector b^(M) = (b₀,...,b_M−1,0)^T is related to g^(M) using a relation similar to Eq. (A.5), namely

$\mathbf{b}^{(M)} = \frac{2}{M}\mathbf{C}^I_M\, \mathbf{g}^{(M)}.$ (A.6)

By combining the polynomial approximations f_N and g_M, the function h is then approximated by a Chebyshev series of degree N + M−2 with coefficient vector c^(P) of length P + 1 with P ≥ N + M−1. Using a relation of the type of Eq. (A.5), the vector c^(P) is related to the sampling vector

$\mathbf{h}^{(P)} = \left(h\left(t_\mu^{(P)}\right)\right)^T;\; \mu=0,\dots,P.$ (A.7)

To get h^(P) it is not necessary to compute c^(P) and proceed to an inversion of a DCT-I matrix. The key point is that if we note ⊙ the component-wise multiplication, one has

$\mathbf{h}^{(P)} = \mathbf{f}^{(P)} \odot \mathbf{g}^{(P)}.$ (A.8)

Moreover, f^(P) (g^(P)) is obtained from a^(N) (b^(M)) of length N + 1 (M + 1) by an extension to a larger vector of length at least N + M, noted $\tilde{\mathbf{a}}^{(P)}$ ($\tilde{\mathbf{b}}^{(P)}$), by appending zeros:

$\tilde{\mathbf{a}}^{(P)} = (\mathbf{a}^{(N)}, 0, \dots, 0), \qquad \tilde{\mathbf{b}}^{(P)} = (\mathbf{b}^{(M)}, 0, \dots, 0).$ (A.9)

Then, the sampling vector used in Eq. (A.2), where one identifies N_cc = P, is determined by

$\mathbf{h}^{(N_{\rm cc})} = \left(\mathbf{C}^I_{N_{\rm cc}}\, \tilde{\mathbf{a}}^{(N_{\rm cc})}\right) \odot \left(\mathbf{C}^I_{N_{\rm cc}}\, \tilde{\mathbf{b}}^{(N_{\rm cc})}\right),$ (A.10)

using $\mathbf{C}^I_{N_{\rm cc}}$, the DCT-I matrix of dimension (N_cc + 1)², and the inversion property $(\mathbf{C}^I_P)^{-1}=(2/P)\, \mathbf{C}^I_P$. In some sense, for both the f and g approximation sampling vectors, we have performed a Chebyshev basis change to a larger parameter space compatible with the polynomial degree involved in the product f_N × g_M.
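
The chain of Eqs. (A.5), (A.6), (A.9), and (A.10) can be sketched in a few lines of Python. This is an illustrative sketch, not the Angpow implementation: it uses a naive O(n²) DCT-I in place of FFTW, it assumes the DCT-I convention in which the first and last input terms are halved (the one for which the inversion property quoted above holds), and all function names are hypothetical. As in the text, it assumes the last Chebyshev coefficient of each factor is zero, which is exactly true for the polynomial test functions used here.

```python
import math


def dct1(v):
    """Naive DCT-I: out_j = sum_k eps_k v_k cos(j k pi / n), eps_0 = eps_n = 1/2."""
    n = len(v) - 1
    out = []
    for j in range(n + 1):
        s = 0.5 * (v[0] + v[n] * math.cos(j * math.pi))
        for k in range(1, n):
            s += v[k] * math.cos(j * k * math.pi / n)
        out.append(s)
    return out


def chebyshev_points(n):
    """Sample points t_mu = cos(mu*pi/n), mu = 0..n (Eq. A.4)."""
    return [math.cos(mu * math.pi / n) for mu in range(n + 1)]


def chebyshev_coefficients(func, n):
    """Coefficient vector a = (2/n) C_n f (Eq. A.5)."""
    samples = [func(t) for t in chebyshev_points(n)]
    return [2.0 / n * c for c in dct1(samples)]


def resample(coeffs, p):
    """Zero-pad to length p + 1 (Eq. A.9), then sample on the p-grid with C_p."""
    padded = list(coeffs) + [0.0] * (p + 1 - len(coeffs))
    return dct1(padded)


def product_samples(f, n, g, m, p):
    """Sampling vector h^(P) = (C_P a~) * (C_P b~) component-wise (Eq. A.10)."""
    if p < n + m - 1:
        raise ValueError("need P >= N + M - 1")
    f_p = resample(chebyshev_coefficients(f, n), p)
    g_p = resample(chebyshev_coefficients(g, m), p)
    return [x * y for x, y in zip(f_p, g_p)]


# Test functions: f has degree 2 = N - 1 with N = 3, g has degree 1 = M - 1 with M = 2.
def f(x):
    return 1.0 + x * x


def g(x):
    return 1.0 + x


h_samples = product_samples(f, 3, g, 2, 4)
direct = [f(t) * g(t) for t in chebyshev_points(4)]
```

Here `h_samples` and `direct` agree to rounding error, which illustrates that the samples of the product on the finer grid are obtained without ever forming the coefficient vector c^(P).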

The second key point is that the Clenshaw-Curtis weights associated to h^(N_cc) in Eq. (A.2) can also be computed with a DCT-I transform from the vector (2/N_cc) (1, 0, −1/3, 0, −1/15, ..., ((−1)^k + 1)/(2(1−k²)), ...) of length N_cc + 1 (Waldvogel 2006); the normalization depends on the exact definition of the DCT-I used.
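
A minimal Python sketch of this weight computation is given below. It is an illustration under stated assumptions, not the Angpow code: the DCT-I is a naive O(n²) version with the first and last input terms halved, and with that particular convention the interior weights need an extra factor of 2, which is one instance of the normalization caveat mentioned above. The function names are hypothetical, and the odd-index entries of the vector (including k = 1) are taken to be zero.

```python
import math


def dct1(v):
    """Naive DCT-I: out_j = sum_k eps_k v_k cos(j k pi / n), eps_0 = eps_n = 1/2."""
    n = len(v) - 1
    out = []
    for j in range(n + 1):
        s = 0.5 * (v[0] + v[n] * math.cos(j * math.pi))
        for k in range(1, n):
            s += v[k] * math.cos(j * k * math.pi / n)
        out.append(s)
    return out


def cc_weights_from_dct(n_cc):
    """Clenshaw-Curtis weights from one DCT-I of (2/N_cc)(1, 0, -1/3, 0, -1/15, ...)."""
    d = [0.0 if k % 2 else (2.0 / n_cc) / (1 - k * k) for k in range(n_cc + 1)]
    t = dct1(d)
    # Normalization specific to the DCT-I convention assumed here:
    # end-point weights are read off directly, interior ones are doubled.
    return [t[j] if j in (0, n_cc) else 2.0 * t[j] for j in range(n_cc + 1)]


weights = cc_weights_from_dct(4)
nodes = [math.cos(k * math.pi / 4) for k in range(5)]
# The 5-point rule integrates x^4 exactly over [-1, 1]; the exact value is 2/5.
integral_x4 = sum(w * x ** 4 for w, x in zip(weights, nodes))
```

For N_cc = 2 the same function returns the Simpson weights (1/3, 4/3, 1/3), which is a convenient check of the normalization when switching to another DCT-I definition.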

So, to perform the integral given by Eq. (A.2), one needs 4 + 1 DCT-I transforms: 1 for the Clenshaw-Curtis weights and 4 to transform the Chebyshev coefficient vectors. The DCT-I transform may be implemented using an efficient O(n log n) algorithm, for example, the one of the FFTW library (Frigo & Johnson 2005) used by Angpow. Angpow uses a power of 2 for N, M, and also P (keeping P ≥ N + M−1) for a fast implementation. The 3C-algorithm is a special case of a more general class of algorithms dealing with the product of polynomials. We note that, according to Giorgi (2012), a moderately faster algorithm might be implemented in a future version of Angpow if necessary. Finally, we note that this general method can be applied to use cases beyond the power spectrum computation in other fields of interest.
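
The size constraint on P can be illustrated with a short Python sketch. The helper name `product_order` is hypothetical, and reading chebyshev_order = 8 as N = M = 2^8 is an assumption made for the example; the only facts taken from the text are the condition P ≥ N + M−1 and the choice of powers of 2.

```python
def product_order(n, m):
    """Smallest power of two P such that P >= n + m - 1."""
    if n < 1 or m < 1:
        raise ValueError("n and m must be positive")
    p = 1
    while p < n + m - 1:
        p *= 2
    return p


# Equal orders for both factors: N = M = 2^8 gives N + M - 1 = 511, hence P = 2^9.
p_equal = product_order(2 ** 8, 2 ** 8)
# Unequal orders still round up to the next power of two.
p_mixed = product_order(2 ** 8, 2 ** 7)
```

When both factors share the same power-of-2 order, P is simply twice that order, so all the DCT-I transforms involved operate on power-of-2 sizes.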

All Tables

Table 1 Wall time (in seconds) measured at CCIN2P3 (on Intel Xeon CPU E5-2640 v3 processors) for the test benches described in the text, according to the number of OpenMP threads used.

All Figures

Fig. 1 Comparison of the computations of the $C_\ell^{\delta}(z=1)$ given by Angpow, Mathematica, and CLASSgal for a Dirac selection function with k_max = 10 Mpc^-1.

Fig. 2 Computations of $C_l^{\delta}(z)$ with Dirac selections centered at z ∈ {0.85,1.00,1.15} with Angpow alone and $C_l^{\mathrm{thick}}(z)$ with Angpow and CLASSgal using a Gaussian selection function with mean z = 1, a width of σ = 0.3, and a redshift cut at ± 5σ (in all cases k_max = 0.44 Mpc^-1). For Angpow we use the power spectrum produced by CLASSgal and we have varied the redshift grid sampling resolution from 9 × 9 to 159 × 159 points to reach the converged result (blue curve) in good agreement with the CLASSgal result (cyan points). Comparing the $C_l^{\delta}(z=1)$ to the $C_l^{\mathrm{thick}}(z=1,\sigma=0.3)$ results, we measure the effect of self-cross-correlation inside a thick redshift shell, which washes out the matter fluctuation contrast.

Fig. 3 Comparison between Angpow (red/orange points) and CLASSgal (black points) for several cross-correlation spectra with Gaussian (σ = 0.01) selection and k_max = 1 Mpc^-1. The orange points are used to emphasize negative C_ℓ.

Fig. 4 Comparison of the computations of the $C_\ell^{\mathrm{thick}}(z)$ given by Angpow and CLASSgal either with Limber’s approximation or the exact computation.

Fig. 5 Cross-correlations in real space corresponding to the spectra shown in Fig. 3. The points show where the function was evaluated.

Fig. 6 Evolution of the wall time with respect to the width of the selection function (σ), illustrated in the conditions of Test 3 and using 16 threads. The upper scale of the frame indicates the radial_order used to sample along the z direction for a given σ (see Sect. 5).


