A&A, Volume 678, October 2023, Article Number A169
Section: Cosmology (including clusters of galaxies)
DOI: https://doi.org/10.1051/0004-6361/202347358
Published online: 18 October 2023

© The Authors 2023. Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. Introduction

High-precision cosmic microwave background (CMB) measurements currently provide the strongest constraints on a wide range of key cosmological parameters (e.g., Planck Collaboration VI 2020). Traditionally, these constraints are derived by first compressing the information contained in the high-dimensional raw time-ordered data into pixelized sky maps and angular power spectra (e.g., Bennett et al. 2013; Planck Collaboration II 2020; Planck Collaboration III 2020). This compressed dataset is typically summarized in terms of a low-dimensional power spectrum-based likelihood from which cosmological parameters are derived through Markov chain Monte Carlo (MCMC) sampling (Lewis & Bridle 2002; Planck Collaboration V 2020).

A key element in this pipeline is the so-called likelihood function, $\mathcal{L}(C_\ell)$, where $C_\ell$ denotes the angular CMB power spectrum (e.g., Planck Collaboration V 2020). This function summarizes both the best-fit values of all power spectrum coefficients and their covariances and correlations. The accuracy of the resulting cosmological parameters, both in terms of best-fit values and their uncertainties, depends directly on the accuracy of this likelihood function. As such, this function must account for both low-level instrumental effects (such as correlated noise and calibration uncertainties) and high-level data selection and component separation effects. In general, this function is neither factorizable nor Gaussian, and, except for a few well-known special cases, it has no exact closed analytic form. Unsurprisingly, significant efforts have therefore been devoted to establishing computationally efficient and accurate approximations. Perhaps the most well-known example is a standard multivariate Gaussian distribution in terms of the angular power spectrum of the sky signal, $C_\ell$, with a covariance matrix tuned by end-to-end simulations (e.g., Planck Collaboration V 2020). A second widely used case is that of a low-resolution multivariate Gaussian defined in pixel space with a dense pixel-pixel noise covariance matrix (Hinshaw et al. 2013; Planck Collaboration V 2020). Yet other examples include log-normal or Gaussian-plus-lognormal distributions (e.g., Verde et al. 2003), as well as various hybrid combinations (e.g., Gjerløw et al. 2013).

However, as the signal-to-noise ratios of modern CMB data sets continue to increase, the relative importance of systematic uncertainties grows, and these are often highly nontrivial to describe through low-dimensional and simplified likelihood approximations, both in terms of computational expense and accuracy. These difficulties were anticipated more than two decades ago, and an alternative end-to-end Bayesian approach was proposed independently by Jewell et al. (2004) and Wandelt et al. (2004). The key aspect of this framework is that the analysis is global, meaning all aspects of the data are modeled and fitted simultaneously, as opposed to compartmentalized, which is the more traditional CMB pipeline approach. Technically speaking, this is implemented in the form of an MCMC sampler in a process that is statistically analogous to the very last cosmological parameter estimation step in a traditional pipeline, such as CosmoMC (Lewis & Bridle 2002) or Cobaya (Torrado & Lewis 2021), with the one fundamental difference that all parameters in the entire analysis pipeline are sampled over jointly within the MCMC sampler. Thus, while a standard cosmological parameter code typically handles a few dozen, or perhaps a hundred, free parameters, the global approach handles billions of free parameters. One practical method of dealing with such a large number of parameters of different types is Gibbs sampling, which draws samples from a joint distribution by iterating over all relevant conditional distributions.

In principle, a global approach is preferable, since better results are in general obtained by fitting correlated parameters jointly rather than separately. Equally clearly, however, this approach is organizationally and technically more complicated to implement in practice, simply because all aspects of the analysis have to be accounted for simultaneously within one framework. One example of this type of approach is Commander3 (Galloway et al. 2023), an end-to-end CMB Gibbs sampler developed by the BeyondPlanck Collaboration (2023) to reanalyze the Planck Low Frequency Instrument (LFI) observations. To date, Gibbs sampling within this framework is the only way to sample the entire joint distribution of low-level instrumental parameters, component maps, and cosmological parameters. This work was subsequently superseded by the COSMOGLOBE project, a community-wide open science collaboration that ultimately aims to perform the same type of analysis for all available state-of-the-art large-scale radio, microwave, and submillimeter experiments, and to use the resulting analyses to derive one global model of the astrophysical sky. The first COSMOGLOBE data release (DR1) was made public in March 2023 (Watts et al. 2023); it included the first joint end-to-end analysis of both Planck LFI and the Wilkinson Microwave Anisotropy Probe (WMAP; Bennett et al. 2013), resulting in lower systematic residuals for both experiments.

There are many other Bayesian inference methods that can be used to estimate cosmological parameters from CMB maps. For example, Anderes et al. (2015) showed how to efficiently sample the joint posterior of the CMB lensing potential and sky map using Hamiltonian Monte Carlo techniques, and Carron (2019) created a framework to measure the tensor-to-scalar ratio using an approximate, analytic marginalization over lensed CMB maps. The marginal unbiased score expansion method has also been shown to be a powerful algorithm for high-dimensional CMB analyses (Millea & Seljak 2022).

The numerical challenges involved in implementing cosmological parameter estimation without likelihood approximations in Gibbs sampling were originally explored, and partially resolved, by Jewell et al. (2009) and Racine et al. (2016). A similar approach was taken by Millea et al. (2020), who derived a re-parameterization of the unlensed sky map and the lensing potential to improve the runtime of Gibbs sampling, similar to the method of Racine et al. (2016). This framework was later enhanced by Millea et al. (2021) to include several systematic parameters, and was applied to SPTpol data.

In principle, each of the aforementioned methods could be used as the CMB map-to-parameter sampling step within a larger time-ordered-data-to-parameter end-to-end Gibbs sampler. The goal of this work is to implement the algorithm of Racine et al. (2016) in Commander3, apply it to simulations with realistic noise levels and sky coverage, validate the implementation, and evaluate its performance. Until now, the highest-level output from Commander3 has been the angular CMB power spectrum; with the code development that led to this paper, it is now able to output cosmological parameters directly.

The rest of the paper is organized as follows. In Sect. 2 we briefly review both the COSMOGLOBE Gibbs sampling framework in general and the specific cosmological parameter sampler described by Racine et al. (2016). In Sect. 3 we validate our implementation against two special cases, namely (i) Cobaya coupled to a uniform-noise, full-sky case, and (ii) a special-purpose Python sampler for a case with uniform noise but a constant latitude Galactic mask. We also characterize the computational costs of the new sampling step for various data configurations. Finally, we summarize and conclude in Sect. 4.

2. Algorithms

We start by briefly reviewing the Bayesian framework for global end-to-end analysis as described by the BeyondPlanck Collaboration (2023) and Watts et al. (2023) and the cosmological parameter estimation algorithm proposed by Racine et al. (2016), and discuss how these can be combined in Commander3 (Galloway et al. 2023).

2.1. Bayesian end-to-end CMB analysis

The main algorithmic goal of the COSMOGLOBE framework is to derive a numerical representation of the full posterior distribution P(ω ∣ d), where ω is the set of all free parameters and d represents all available data. In this notation, ω simultaneously accounts for instrumental, astrophysical, and cosmological parameters, which are typically explicitly defined by writing down a signal model, for instance taking the form

$$ \begin{aligned} \boldsymbol{d} = \boldsymbol{s}(\omega) + \boldsymbol{n}, \end{aligned} $$(1)

where d is the data vector; s(ω) represents some general data model with free parameters ω; and n is instrumental noise. In practice, one typically assumes that the noise is Gaussian distributed with a covariance matrix N, and the likelihood can therefore be written as

$$ \begin{aligned} \mathcal{L}(\omega) = P(\boldsymbol{d} \mid \omega) \propto e^{-\frac{1}{2}\left(\boldsymbol{d}-\boldsymbol{s}(\omega)\right)^T \mathsf{N}^{-1}\left(\boldsymbol{d}-\boldsymbol{s}(\omega)\right)}. \end{aligned} $$(2)

The full posterior distribution reads P(ω ∣ d)∝ℒ(ω)P(ω), where P(ω) represents some set of user-defined priors.

As a concrete example, COSMOGLOBE Data Release 1 (Watts et al. 2023) adopted the following data model for the time-ordered data to describe Planck LFI and WMAP,

$$ \begin{aligned} \boldsymbol{d} = \mathsf{G}\,\mathsf{P}\,\mathsf{B}\,\mathsf{M}\,\boldsymbol{a} + \boldsymbol{s}^{\rm orb} + \boldsymbol{s}^{\rm fsl} + \boldsymbol{s}^{\rm inst} + \boldsymbol{n}^{\rm corr} + \boldsymbol{n}^{\rm w}, \end{aligned} $$(3)

where G is a time-dependent gain factor; P is a pointing matrix; B denotes instrumental beam convolution; M is a mixing matrix that describes the amplitude of a given sky component at a given frequency; a describes the amplitude of each component at each point in the sky; sorb is the orbital CMB dipole; sfsl is a far sidelobe contribution; sinst is an instrument-specific correction term; ncorr denotes correlated noise; and nw denotes white noise. This expression directly connects important instrumental effects (e.g., gains, beams, and correlated noise) with the astrophysical sky (e.g., M and a), and provides a well-defined model for the data at the lowest level; this approach is what enables global modeling.

For our purposes, the most important sky signal component is the CMB anisotropy field, $\boldsymbol{a}_{\rm CMB}$. The covariance of this component reads $\mathsf{S} = \langle \boldsymbol{a}_{\rm CMB}\,\boldsymbol{a}_{\rm CMB}^T\rangle$. Under the assumption of a statistically isotropic universe, this matrix is diagonal in harmonic space, $\langle a_{\ell m} a^*_{\ell' m'}\rangle = C_\ell(\theta)\,\delta_{\ell\ell'}\delta_{mm'}$, where $a_{\rm CMB}(\hat{n}) = \sum_{\ell,m} a_{\ell m} Y_{\ell m}(\hat{n})$ is the usual spherical harmonic expansion and $C_\ell$ is called the angular power spectrum. This power spectrum depends directly on a small set of cosmological parameters, θ, and for most cosmological models it can be computed efficiently by Boltzmann solvers such as CAMB (Lewis et al. 2000) or CLASS (Lesgourgues 2011).
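To make the last point concrete, the following minimal sketch shows how one might evaluate $C_\ell(\theta)$ with CAMB's Python interface; the parameter values are purely illustrative, and this is our own example rather than code from the pipeline.

```python
import camb

# A minimal sketch: compute the lensed CMB power spectra for one set of
# LambdaCDM parameters theta (values here are illustrative only).
pars = camb.set_params(H0=67.4, ombh2=0.0224, omch2=0.120,
                       tau=0.054, As=2.1e-9, ns=0.965, lmax=2500)
results = camb.get_results(pars)
# 'total' holds D_ell = ell(ell+1) C_ell / 2pi for TT, EE, BB, TE in muK^2
powers = results.get_cmb_power_spectra(pars, CMB_unit='muK')
dl_tt = powers['total'][:, 0]
```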

With this notation, the goal is now to compute P(ω ∣ d), where ω = {G, ncorr, M, a, θ, …}. Unfortunately, directly evaluating or sampling from this distribution is infeasible. In practice, we therefore resort to MCMC sampling in general, and Gibbs sampling in particular, which is a well-established method for sampling from a complicated multivariate distribution by iterating over all conditional distributions. For the data model described in Eq. (3), this process can be described schematically in the form of a Gibbs chain,

$$ \begin{aligned} \mathsf{G} \leftarrow P(\mathsf{G} \mid \boldsymbol{d},\; \boldsymbol{n}_{\rm corr}, \mathsf{M}, \boldsymbol{a}, \theta) \end{aligned} $$(4)

$$ \begin{aligned} \boldsymbol{n}_{\rm corr} \leftarrow P(\boldsymbol{n}_{\rm corr} \mid \boldsymbol{d},\; \mathsf{G}, \mathsf{M}, \boldsymbol{a}, \theta) \end{aligned} $$(5)

$$ \begin{aligned} \mathsf{M} \leftarrow P(\mathsf{M} \mid \boldsymbol{d},\; \mathsf{G}, \boldsymbol{n}_{\rm corr}, \boldsymbol{a}, \theta) \end{aligned} $$(6)

$$ \begin{aligned} \boldsymbol{a} \leftarrow P(\boldsymbol{a} \mid \boldsymbol{d},\; \mathsf{G}, \boldsymbol{n}_{\rm corr}, \mathsf{M}, \theta) \end{aligned} $$(7)

$$ \begin{aligned} \theta \leftarrow P(\theta \mid \boldsymbol{d},\; \mathsf{G}, \boldsymbol{n}_{\rm corr}, \mathsf{M}, \boldsymbol{a}), \end{aligned} $$(8)

where ← indicates replacing the parameter on the left-hand side with a random sample from the distribution on the right-hand side.
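To build intuition for this scheme, here is a minimal, self-contained toy Gibbs sampler for two correlated Gaussian variables; the two conditional draws play the same role as Eqs. (4)-(8), just in two dimensions instead of billions. This sketch is ours and is not part of Commander3.

```python
import numpy as np

# Toy Gibbs sampler for a bivariate Gaussian with correlation rho.
# Each step draws one variable from its conditional given the other,
# exactly as Eqs. (4)-(8) iterate over conditionals of the full model.
rng = np.random.default_rng(0)
rho, n_samples = 0.9, 10_000
x, y = 0.0, 0.0
chain = np.empty((n_samples, 2))
for i in range(n_samples):
    # For a joint Gaussian, P(x | y) and P(y | x) are 1D Gaussians
    x = rng.normal(rho * y, np.sqrt(1 - rho**2))
    y = rng.normal(rho * x, np.sqrt(1 - rho**2))
    chain[i] = x, y
print(chain.mean(axis=0), np.corrcoef(chain.T)[0, 1])  # ~[0, 0], ~0.9
```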

As described by the BeyondPlanck Collaboration (2023) and Watts et al. (2023), this process has been implemented in Commander3 by BEYONDPLANCK and COSMOGLOBE. However, in the code described there, the last step only supports power spectrum estimation (i.e., θ = $C_\ell$). In this work, we do not sample any instrumental or foreground parameters, and we assume a foreground-cleaned CMB map as our starting point. Rather, the goal of this paper is to replace the last step, Eq. (8), with actual cosmological parameter estimation from a CMB map, in which θ takes on the usual form of the dark matter density Ωc, the Hubble constant H0, the spectral index of scalar perturbations ns, and so on.

2.2. Joint sampling of CMB sky signal and cosmological parameters

Cosmological parameter estimation through end-to-end CMB Gibbs sampling has been discussed in the literature for almost two decades, starting with Jewell et al. (2004) and Wandelt et al. (2004). However, as pointed out by Eriksen et al. (2004), a major difficulty regarding this method is a very long correlation length in the low signal-to-noise regime, i.e., at high multipoles. Intuitively, the origin of this problem lies in the fundamental Gibbs sampling algorithm itself, namely that it only allows for parameter variations parallel or orthogonal to the coordinate axes of each parameter in question, and not diagonal moves. For highly degenerate distributions, this makes it very expensive to move from one tail of the distribution to the other. A solution to this problem was proposed by Jewell et al. (2009), who introduced a joint CMB sky signal and C move. This idea was further refined by Racine et al. (2016), who noted that faster convergence could be obtained by only rescaling the low signal-to-noise multipoles. In the following, we give a brief summary of these ideas, both standard Gibbs sampling in Sect. 2.2.1 and joint sampling in Sect. 2.2.2.

2.2.1. Standard CMB Gibbs sampling

Starting with the standard Gibbs sampling algorithm (Jewell et al. 2004; Wandelt et al. 2004), we first note that as far as cosmological parameter estimation with CMB data is concerned, only the CMB map, aCMB (which we for brevity denote a in the rest of this section), is directly relevant. All other parameters in Eq. (3) only serve to produce the cleanest possible estimate of a and to propagate the corresponding uncertainties. From now on, we therefore consider a greatly simplified data model of the form,

$$ \begin{aligned} \boldsymbol{r} = \mathsf{B}\boldsymbol{a} + \boldsymbol{n}, \end{aligned} $$(9)

where r now represents a foreground-cleaned CMB map (or residual) obtained by subtracting all other nuisance terms from the raw data, and our goal is to estimate cosmological parameters from this map. Under the assumption that the CMB is a Gaussian random field, this could in principle be done simply by mapping out the exact marginal posterior distribution by brute force,

$$ \begin{aligned} P(\theta \mid \boldsymbol{r}) \propto \mathcal{L}(\theta)\,P(\theta) \propto \frac{e^{-\frac{1}{2}\boldsymbol{r}^T\left(\mathsf{B}^T\mathsf{S}(\theta)\mathsf{B}+\mathsf{N}\right)^{-1}\boldsymbol{r}}}{\sqrt{\left|\mathsf{B}^T\mathsf{S}(\theta)\mathsf{B}+\mathsf{N}\right|}}, \end{aligned} $$(10)

where $\mathsf{B}^T\mathsf{S}(\theta)\mathsf{B}$ is the beam-convolved CMB signal covariance matrix and N is the corresponding noise covariance matrix, and we have assumed uniform priors, P(θ) = 1. This, however, becomes intractable for modern CMB experiments, as the computational expense scales as $\mathcal{O}(N_{\rm p}^3)$, where $N_{\rm p}$ is the number of pixels, and typical CMB maps nowadays include millions of pixels.

In the special (and unrealistic) case of uniform noise and no Galactic mask, this likelihood simplifies in harmonic space, where it does become computationally tractable. Let us define the observed angular CMB power spectrum (including beam convolution and noise) to be $\hat{C}^{\rm o}_\ell = \frac{1}{2\ell+1}\sum_m r_{\ell m} r^*_{\ell m}$, and the corresponding white noise spectrum by $N_{\ell m \ell' m'} = N_\ell\,\delta_{\ell\ell'}\delta_{mm'}$. In that case, the likelihood in Eq. (10) for temperature-only observations simplifies to

$$ \begin{aligned} \ln P(\theta \mid \boldsymbol{r}) = \sum_\ell -\frac{2\ell+1}{2}\left[\frac{\hat{C}^{\rm o}_\ell}{b_\ell^2 C_\ell(\theta) + N_\ell} - \ln\left(\frac{\hat{C}^{\rm o}_\ell}{b_\ell^2 C_\ell(\theta) + N_\ell}\right)\right], \end{aligned} $$(11)

where $b_\ell$ is the beam response function and we have removed constant terms; a similar expression for polarization is available in, for example, Larson et al. (2007) and Hamimeche & Lewis (2008). This expression is used later to validate the production code.
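As a concrete illustration, Eq. (11) is straightforward to code. The following sketch is our own (not the production implementation) and assumes precomputed arrays for $\hat{C}^{\rm o}_\ell$, $C_\ell(\theta)$, $b_\ell$, and $N_\ell$:

```python
import numpy as np

def lnpost_fullsky(cl_obs, cl_model, bl, nl, lmin=2):
    """Full-sky temperature log-posterior of Eq. (11), up to a constant.

    cl_obs   : observed spectrum \\hat{C}_ell^o (includes beam and noise)
    cl_model : model spectrum C_ell(theta) from a Boltzmann solver
    bl       : beam transfer function b_ell
    nl       : noise power spectrum N_ell
    """
    ells = np.arange(len(cl_obs))[lmin:]
    ratio = cl_obs[lmin:] / (bl[lmin:]**2 * cl_model[lmin:] + nl[lmin:])
    return np.sum(-(2 * ells + 1) / 2 * (ratio - np.log(ratio)))
```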

While it is difficult to evaluate Eq. (10) directly, it is in fact possible to sample from it using Gibbs sampling (Jewell et al. 2004; Wandelt et al. 2004). First, we note that the joint posterior distribution P(θ, a ∣ r) can be written as

$$ \begin{aligned} P(\theta, \boldsymbol{a} \mid \boldsymbol{r}) &= \frac{P(\theta, \boldsymbol{a}, \boldsymbol{r})}{P(\boldsymbol{r})} = P(\boldsymbol{r} \mid \boldsymbol{a})\,P(\boldsymbol{a} \mid \theta)\,\frac{P(\theta)}{P(\boldsymbol{r})} \\ &\propto \frac{e^{-\frac{1}{2}\left(\boldsymbol{r}-\mathsf{B}\boldsymbol{a}\right)^T \mathsf{N}^{-1}\left(\boldsymbol{r}-\mathsf{B}\boldsymbol{a}\right)}}{\sqrt{\left|\mathsf{N}\right|}}\, \frac{e^{-\frac{1}{2}\boldsymbol{a}^T \mathsf{S}^{-1}\boldsymbol{a}}}{\sqrt{\left|\mathsf{S}\right|}}\, \frac{P(\theta)}{P(\boldsymbol{r})}, \end{aligned} $$(12)

and drawing samples from this distribution can be achieved through Gibbs sampling,

$$ \begin{aligned} \boldsymbol{a}^{i+1} \leftarrow P(\boldsymbol{a} \mid \theta^{i}, \boldsymbol{r}), \end{aligned} $$(13)

$$ \begin{aligned} \theta^{i+1} \leftarrow P(\theta \mid \boldsymbol{a}^{i+1}, \boldsymbol{r}). \end{aligned} $$(14)

With these joint samples in hand, a numerical representation of P(θ ∣ r) can be obtained simply by marginalizing over a, which is equivalent to making histograms of the θ samples.

For this algorithm to work, we need to be able to sample from each of the two conditional distributions in Eqs. (13) and (14). Starting with the cosmological parameter distribution in Eq. (14), we note that P(θ ∣ ai + 1, r) = P(θ ∣ ai + 1), which is equivalent to the distribution in Eq. (11) with no beam and noise. This can be coupled to a standard Boltzmann solver and Metropolis-Hastings sampler, similar to CAMB and CosmoMC. We return to this step in the next section.

Next, for the amplitude distribution in Eq. (13), one can show by “completing the square” of a in Eq. (12) that

$$ \begin{aligned} P(\boldsymbol{a} \mid \theta, \boldsymbol{r}) \propto e^{-\frac{1}{2}\left(\boldsymbol{a}-\hat{\boldsymbol{a}}\right)^T \left(\mathsf{S}^{-1} + \mathsf{B}^T\mathsf{N}^{-1}\mathsf{B}\right)\left(\boldsymbol{a}-\hat{\boldsymbol{a}}\right)}, \end{aligned} $$(15)

where we have defined the so-called mean field map,

$$ \begin{aligned} \hat{\boldsymbol{a}} \equiv \left[\mathsf{S}^{-1} + \mathsf{B}^T\mathsf{N}^{-1}\mathsf{B}\right]^{-1}\mathsf{B}^T\mathsf{N}^{-1}\boldsymbol{r}. \end{aligned} $$(16)

This is a multivariate Gaussian distribution with mean $\hat{\boldsymbol{a}}$ and covariance matrix $\left[\mathsf{S}^{-1}+\mathsf{B}^T\mathsf{N}^{-1}\mathsf{B}\right]^{-1}$, which can be sampled by solving the following equation with conjugate gradients (Shewchuk 1994; Seljebotn et al. 2019),

$$ \begin{aligned} \left[\mathsf{S}^{-1} + \mathsf{B}^T\mathsf{N}^{-1}\mathsf{B}\right]\boldsymbol{a} = \mathsf{B}^T\mathsf{N}^{-1}\boldsymbol{r} + \mathsf{S}^{-\frac{1}{2}}\boldsymbol{w}_0 + \mathsf{B}^T\mathsf{N}^{-\frac{1}{2}}\boldsymbol{w}_1, \end{aligned} $$(17)

where w0 and w1 are randomly drawn Gaussian maps with unit variance and zero mean. The resulting sample is typically referred to as a “constrained realization”.
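To illustrate the structure of this solve, the sketch below draws one constrained realization from Eq. (17) with an off-the-shelf conjugate gradient routine, under the simplifying assumption that S, B, and N are all diagonal in harmonic space (uniform full-sky noise, coefficients treated as real). In that regime the solve is of course trivial; the point is only to show the operator structure, and all names are our own.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def constrained_realization(r, cl, bl, nl, rng):
    """Draw one sample of Eq. (17), assuming S, B, N are diagonal in
    harmonic space. For realistic masks the operator is dense and a
    good preconditioner is essential; this only shows the structure."""
    rhs = (bl * r / nl                                   # B^T N^-1 r
           + rng.standard_normal(r.size) / np.sqrt(cl)   # S^-1/2 w0
           + bl * rng.standard_normal(r.size) / np.sqrt(nl))  # B^T N^-1/2 w1
    matvec = lambda x: x / cl + bl**2 * x / nl           # S^-1 + B^T N^-1 B
    A = LinearOperator((r.size, r.size), matvec=matvec, dtype=np.float64)
    a, info = cg(A, rhs)
    assert info == 0, "conjugate gradients did not converge"
    return a
```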

For the purposes of the next section, it is useful to decompose the full constrained realization into two separate components, namely the mean field map defined by Eq. (16) and the fluctuation map,

$$ \begin{aligned} \hat{\boldsymbol{f}} \equiv \left[\mathsf{S}^{-1} + \mathsf{B}^T\mathsf{N}^{-1}\mathsf{B}\right]^{-1}\left(\mathsf{S}^{-\frac{1}{2}}\boldsymbol{w}_0 + \mathsf{B}^T\mathsf{N}^{-\frac{1}{2}}\boldsymbol{w}_1\right), \end{aligned} $$(18)

such that $\boldsymbol{a} = \hat{\boldsymbol{a}} + \hat{\boldsymbol{f}}$. For a data configuration with uniform noise and full-sky temperature data, these equations can be solved exactly in spherical harmonic space, fully analogously to Eq. (11),

$$ \begin{aligned} \hat{a}_{\ell m} = r_{\ell m}\frac{b_\ell C_\ell}{N_\ell + b_\ell^2 C_\ell}, \end{aligned} $$(19)

$$ \begin{aligned} \hat{f}_{\ell m} = w_{0\ell m}\frac{N_\ell\sqrt{C_\ell}}{N_\ell + b_\ell^2 C_\ell} + w_{1\ell m}\frac{\sqrt{N_\ell}\,b_\ell C_\ell}{N_\ell + b_\ell^2 C_\ell}. \end{aligned} $$(20)
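For validation purposes, the closed-form solutions of Eqs. (19) and (20) are trivial to evaluate; the following is a minimal sketch of our own, under the same diagonal assumptions as above:

```python
import numpy as np

def mean_field_and_fluctuation(rlm, cl, bl, nl, rng):
    """Closed-form mean field and fluctuation maps, Eqs. (19) and (20);
    valid only for uniform noise and full-sky coverage."""
    w0, w1 = rng.standard_normal((2, rlm.size))
    denom = nl + bl**2 * cl
    a_hat = rlm * bl * cl / denom                                  # Eq. (19)
    f_hat = (w0 * nl * np.sqrt(cl)
             + w1 * np.sqrt(nl) * bl * cl) / denom                 # Eq. (20)
    return a_hat, f_hat
```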

As summarized in Appendix A, we can find a closed-form expression for $N^{-1}_{\ell m \ell' m'}$ for a constant latitude mask with isotropic noise, which greatly simplifies solving Eq. (17), and we have made a Python script that performs these calculations in order to validate the main Commander3 code discussed below.

To build intuition, these maps are illustrated for one random sample with semi-realistic instrument characteristics in Fig. 1. The top panel shows the mean field map (or “Wiener filter map”); this map summarizes all significant information in r, and each mode is weighted according to its own specific signal-to-noise ratio taking into account both instrumental noise and masking. As a result, all small scales in the Galactic plane are suppressed, and only the largest scales are retained, which are constrained by the high-latitude observations in conjunction with the assumptions of both Gaussianity and statistical isotropy. The second panel shows the fluctuation map, which essentially serves to replace the lost power in a ̂ $ \hat{{\boldsymbol{a}}} $ with a completely random fluctuation. The sum of the two terms, shown in the bottom panel, represents one possible full-sky realization that is consistent with both the data and the assumed cosmological parameters, θ. If we were to produce a second such constrained realization, it would look almost identical at high Galactic latitudes, where the data are highly constraining, while the Galactic plane would exhibit different structures on small and intermediate scales.

Fig. 1.

Example of a constrained CMB sky map realization produced during traditional Gibbs sampling. From top to bottom, the three panels show the mean field map ( a ̂ $ \hat{{\boldsymbol{a}}} $), the fluctuation map ( f ̂ $ \hat{{\boldsymbol{f}}} $), and the full constrained realization ( a = a ̂ + f ̂ $ {\boldsymbol{a}} = \hat{{\boldsymbol{a}}}+\hat{{\boldsymbol{f}}} $). This example is generated with the data configuration discussed in Sect. 3, which corresponds to Planck LFI 70 GHz noise properties and a realistic Galactic mask.

A corresponding comparison of angular power spectra is shown in Fig. 2. The relative signal-to-noise ratio of each multipole moment can be read off as the ratio between the intermediate and dark-colored lines. In temperature, this value is higher than unity up to ℓ ≲ 700, while for EE polarization it is less than unity everywhere except at the very lowest multipoles. A closely related way to interpret this figure is that the intermediate curve shows the total information content of the data. Thus, for the BB spectrum there is no visually relevant information present at all, and the full constrained realization is entirely defined by the fluctuation map, which is given by the assumed cosmological model, θ. This explains why the standard Gibbs sampler works well in the high signal-to-noise regime but poorly in the low signal-to-noise limit: when the instrumental noise is high, $N_\ell \rightarrow \infty$, the data $r_{\ell m}$ carry no information about $a_{\ell m}$, and the sky sample simply becomes a realization of the assumed variance, $C_\ell$ (i.e., $a_{\ell m} \rightarrow \hat{f}_{\ell m} = \sqrt{C_\ell}\, w_{0\ell m}$). However, since the mean field map then vanishes, $\hat{a}_{\ell m} \rightarrow 0$, we also have no information about $C_\ell$. Hence, we get a strong degeneracy between $C_\ell$ and $a_{\ell m}$. Gibbs sampling is well known to perform poorly for strongly correlated parameters, as one cannot move diagonally through the parameter space from one end of the joint posterior distribution to the other.

Fig. 2.

Angular power spectra for each of the constrained realization maps shown in Fig. 1. The ΛCDM cosmological model corresponding to θ is shown as a dashed black line, while the colored curves show (from dark to light) spectra for the fluctuation map, the mean field map, and the full constrained realization. Note that in the fully noise-dominated cases, such as the entire B-mode power spectrum and all high multipoles, the fluctuation term $\hat{f}_{\ell m}$ and the CMB estimate $a_{\ell m}$ have nearly identical power spectra by design.

2.2.2. Joint sky signal and cosmological parameter sampling

As the Gibbs sampler struggles with long correlation lengths, a joint sampler was proposed by Jewell et al. (2009) and Racine et al. (2016) that was designed to establish an efficient sampling algorithm for the degeneracy between a and θ in the low signal-to-noise regime. Mathematically, this algorithm is a standard Metropolis-Hastings sampler with a Gaussian transition rule for θ,

$$ \begin{aligned} w(\theta \mid \theta^i) = e^{-\frac{1}{2}\left(\theta-\theta^i\right)^T \mathsf{C}_\theta^{-1}\left(\theta-\theta^i\right)}, \end{aligned} $$(21)

where $\mathsf{C}_\theta$ is a user-specified (and pre-tuned) proposal covariance matrix, estimated from a previous chain run with a suboptimal, diagonal proposal matrix. The proposal is followed by a deterministic rescaling of the fluctuation term of the signal amplitude map,

$$ \begin{aligned} a_{\ell m}^{i+1} = \hat{a}_{\ell m}^{i+1} + \left(\mathsf{C}_\ell^{i+1}\right)^{1/2}\left(\mathsf{C}_\ell^{i}\right)^{-1/2}\hat{f}_{\ell m}^{i}, \end{aligned} $$(22)

where we have defined the covariance matrix

$$ \begin{aligned} \mathsf{C}_\ell^i = \begin{bmatrix} C^{TT}_\ell(\theta^i) & C^{TE}_\ell(\theta^i) & 0 \\ C^{TE}_\ell(\theta^i) & C^{EE}_\ell(\theta^i) & 0 \\ 0 & 0 & C^{BB}_\ell(\theta^i) \end{bmatrix}. \end{aligned} $$(23)

Here, we have set the parity-odd power spectra to zero, as predicted by Λ cold dark matter (CDM): $C^{EB}_\ell = C^{TB}_\ell = 0$. We have also defined the rescaled fluctuation map,

$$ \begin{aligned} \hat{f}_{\ell m}^{\mathrm{scaled},\,i+1} \equiv \left(\mathsf{C}_\ell^{i+1}\right)^{1/2}\left(\mathsf{C}_\ell^{i}\right)^{-1/2}\hat{f}_{\ell m}^{i}. \end{aligned} $$(24)
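The rescaling in Eqs. (22)-(24) requires the square root of the 3×3 matrix in Eq. (23) at every multipole. The following is our own minimal sketch of one way to compute it, assuming each $\mathsf{C}_\ell$ is symmetric positive definite:

```python
import numpy as np

def sqrtm_spd(C):
    """Matrix square root of a symmetric positive-definite matrix via
    eigendecomposition; used for the 3x3 blocks of Eq. (23)."""
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(np.sqrt(vals)) @ vecs.T

def rescale_fluctuation(f_lm, C_new, C_old):
    """Apply (C^{i+1})^{1/2} (C^i)^{-1/2} to the (T, E, B) fluctuation
    vector at one multipole, as in Eq. (24)."""
    return sqrtm_spd(C_new) @ np.linalg.inv(sqrtm_spd(C_old)) @ f_lm
```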

As shown by Racine et al. (2016), the acceptance rate for this joint move is

$$ \begin{aligned} A = \min\left[1, \frac{\pi(\theta^{i+1})}{\pi(\theta^i)}\frac{P(\theta^{i+1})}{P(\theta^i)}\right], \end{aligned} $$(25)

where P(θ) is the prior on θ and

$$ \begin{aligned} \pi(\theta^i) = \exp\bigg[ &-\frac{1}{2}\left(\boldsymbol{r}-\mathsf{B}\hat{\boldsymbol{a}}^i\right)^T\mathsf{N}^{-1}\left(\boldsymbol{r}-\mathsf{B}\hat{\boldsymbol{a}}^i\right) \\ &-\frac{1}{2}\hat{\boldsymbol{a}}^{i,T}\mathsf{S}^{i,-1}\hat{\boldsymbol{a}}^i - \frac{1}{2}\hat{\boldsymbol{f}}^{i,T}\mathsf{B}^T\mathsf{N}^{-1}\mathsf{B}\,\hat{\boldsymbol{f}}^i \bigg], \end{aligned} $$(26)

where for brevity we have omitted the $\theta^i$ dependence of $\hat{\boldsymbol{a}}^i$, $\hat{\boldsymbol{f}}^i$, and $\mathsf{S}^i$.

We note that the acceptance rate uses the scaled fluctuation term, $\hat{f}_{\ell m}^{\mathrm{scaled},\,i+1}$, for sample i + 1, instead of a fluctuation term calculated from Eq. (17). Hence, we only need to calculate $\hat{f}_{\ell m}^{i+1}$ from that equation if the sample is accepted, which saves roughly half the computational time for rejected samples compared to accepted ones.

The full algorithm can now be summarized in terms of the following steps:

  1. We start with an initial guess of the cosmological parameters, θ⁰. From that, we calculate S⁰ with CAMB; the mean field map, $\hat{\boldsymbol{a}}^0$, from Eq. (16); and the fluctuation map, $\hat{\boldsymbol{f}}^0$, from Eq. (18).

  2. We then draw a new cosmological parameter sample from w(θ ∣ θⁱ), and reevaluate $\mathsf{S}^{i+1}$, $\hat{\boldsymbol{a}}^{i+1}$, and the scaled fluctuation term $\hat{\boldsymbol{f}}^{\mathrm{scaled},\,i+1}$.

  3. We calculate the acceptance probability of Eq. (25) and accept or reject according to the regular Metropolis-Hastings rule.

  4. If the sample is accepted, then we calculate $\hat{\boldsymbol{f}}^{i+1}$. In the next iteration this term becomes $\hat{\boldsymbol{f}}^{i}$, which appears in the acceptance probability and in the equation for $\hat{\boldsymbol{f}}^{\mathrm{scaled},\,i+1}$.

  5. We iterate steps 2–4; a minimal toy sketch of this loop is given below.
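The following self-contained toy sketch implements steps 1-5 for a deliberately simplified model: a single amplitude parameter θ scaling a fixed fiducial spectrum, temperature only, uniform full-sky noise, and no beam. All names and values are our own illustrative choices, not the Commander3 implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
ells = np.arange(2, 201)
nmodes = 2 * ells + 1                        # (2l+1) real modes per ell
cl_fid = 1e3 / ells**2                       # fiducial spectrum (arbitrary)
nl_m = np.repeat(np.full(ells.size, 5e-2), nmodes)  # noise per mode
cl_of = lambda th: np.repeat(th * cl_fid, nmodes)   # C_ell(theta) per mode
r = rng.normal(0.0, np.sqrt(cl_of(1.0) + nl_m))     # simulated data, theta=1

def mean_field(cl):                          # Eq. (19) with b_ell = 1
    return r * cl / (nl_m + cl)

def fluctuation(cl):                         # Eq. (20) with b_ell = 1
    w0, w1 = rng.standard_normal((2, r.size))
    return (w0 * nl_m * np.sqrt(cl) + w1 * np.sqrt(nl_m) * cl) / (nl_m + cl)

def ln_pi(cl, a_hat, f):                     # Eq. (26) for diagonal S, N
    return -0.5 * (np.sum((r - a_hat)**2 / nl_m)
                   + np.sum(a_hat**2 / cl) + np.sum(f**2 / nl_m))

theta = 0.8                                  # step 1: initial guess
a_hat, f = mean_field(cl_of(theta)), fluctuation(cl_of(theta))
chain = []
for _ in range(5000):                        # step 5: iterate steps 2-4
    theta_p = theta + 0.02 * rng.standard_normal()   # step 2: propose
    if theta_p > 0:
        cl_p, cl_c = cl_of(theta_p), cl_of(theta)
        a_hat_p = mean_field(cl_p)
        f_p = np.sqrt(cl_p / cl_c) * f       # rescaled fluctuation, Eq. (24)
        # step 3: Metropolis-Hastings acceptance, Eq. (25), uniform prior
        if np.log(rng.uniform()) < ln_pi(cl_p, a_hat_p, f_p) - ln_pi(cl_c, a_hat, f):
            # step 4: accepted; draw a fresh fluctuation for next iteration
            theta, a_hat, f = theta_p, a_hat_p, fluctuation(cl_p)
    chain.append(theta)
print(np.mean(chain), np.std(chain))         # posterior scatters around 1.0
```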

The intuition behind this joint move is illustrated in Fig. 3. The top panel shows the standard Gibbs sampling algorithm, in which only parameter moves parallel to the axes are allowed. This works well in the high signal-to-noise regime (left column), where the joint distribution is close to symmetric. In the low signal-to-noise regime, however, the distribution is strongly tilted, because of the tight coupling between f ̂ $ \hat{{\boldsymbol{f}}} $ and θ discussed at the end of Sect. 2.2.1, and many orthogonal moves are required in order to move from one tail of the distribution to the other. On the other hand, with the joint algorithm, in the bottom panel, steps are non-orthogonal, with a slope defined exactly by the local signal-to-noise ratio of the mode in question. The net result is a much faster exploration of the full distribution.

Fig. 3.

Schematic comparison of the standard Gibbs sampler (top panels) and the joint sampler of Racine et al. (2016), bottom panels. The left and right columns show high and low signal-to-noise regimes, respectively. The Gibbs sampler performs poorly in the low signal-to-noise regime as it requires a large number of samples to explore the posterior distribution. The joint sampler in the lower panels performs well in both regimes as it allows the next sample to move diagonally in the {a, θ} parameter space.

3. Results

The main goal of this paper is to implement the joint sampling algorithm into Commander3 and characterize its performance, both in terms of accuracy and computational cost, on simulated data. All Commander code used in the following is available as open-source software in a GitHub repository. For the current paper, we focused on a standard six-parameter ΛCDM model and chose θ = (Ωbh², ΩCDMh², H0, τ, As, ns) as our base parameters, where Ωb and ΩCDM are the current baryon and CDM densities; H0 is the Hubble constant (and $h = H_0 / (100\ \mathrm{km\,s^{-1}\,Mpc^{-1}})$ is its normalized value); τ is the optical depth at reionization; and As and ns are the amplitude and tilt of the scalar primordial power spectrum. We used CAMB (Lewis et al. 2000) to evaluate all ΛCDM power spectra, $C_\ell(\theta)$.

In the following we consider three different simulations, corresponding to increasing levels of complexity and realism. A natural early target for this algorithm is a reanalysis of COSMOGLOBE DR1; hence, all three cases are similar to the Planck LFI channels. We use the Planck data release 4 70 GHz beam transfer functions (Planck Collaboration Int. LVII 2020). For the first two cases, the noise level is isotropic, with a value matching twice the mean of the full-sky 70 GHz rms map, while for the third case it is given by one realization of the actual anisotropic BEYONDPLANCK 70 GHz noise rms distribution.

The second difference between the three cases is their sky coverage. The first case considers full-sky data, for which the analytical posterior distribution solution is readily available through Eq. (11). The second case implements a constant-latitude Galactic mask for which an analytical expression is also available, albeit more complicated than for the full sky (see Appendix A for details). The third case uses a realistic Galactic and point-source mask, for which no analytical expression is available.

To validate the main Commander3 results, we developed a special-purpose Python program (publicly available at https://github.com/LilleJohs/COLOCOLA) that implements the same joint sampling algorithm described above for the first two cases; this serves essentially as a basic bug check with respect to the Fortran implementation. In addition, we also implemented a uniform-noise, full-sky likelihood module for Cobaya (Torrado & Lewis 2021), an industry-standard cosmological parameter Metropolis sampler; this serves as an end-to-end check of the entire code, including its integration with CAMB.

3.1. Accuracy validation

The first main goal of this paper is to demonstrate that the new Commander3 implementation of the Racine et al. (2016) algorithm performs as expected with respect to accuracy. Thus, we start our discussion by considering the uniform-noise, full-sky case described above. For each of the three available codes – Commander3, Python, and Cobaya – we produced 50 000 samples; the full posterior distributions are shown in Fig. 4. The agreement between all three codes is excellent, with overall variations within ∼0.1σ.

Fig. 4.

Comparison of cosmological parameter posterior distributions derived with Commander3 (blue), Python (red), and Cobaya (gray) for a simulation with uniform noise and full-sky coverage. They are made from a single chain per code, and the true input cosmological parameter values are indicated by vertical and horizontal dashed lines.

Second, Fig. 5 shows a similar comparison between the Commander3 and Python implementations. The sky mask is in this case defined by |b| < 5.7°, where b is the Galactic latitude, removing 10% of the sky. We also find good agreement between the two implementations in this case, and the true input values (marked by vertical dashed lines) all fall well within the full posterior distributions. However, we do note that the introduction of a sky mask significantly increases the overall runtime of the algorithm, since one now has to solve for the mean field map repeatedly at each step with conjugate gradients, rather than with brute-force matrix inversions. In total, we produced only ∼11 000 and ∼17 000 samples with the two codes for this exercise, at a total cost of $\mathcal{O}(10^5)$ CPU-hours.

Fig. 5.

Comparison of marginal cosmological parameter posterior distributions derived with Commander3 and Python for a simulation with uniform noise and a 10% constant latitude mask. The total number of independent samples produced by the two codes is 𝒪(102), which accounts for the Markov chain correlation length of the algorithm; the Monte Carlo uncertainty due to a finite number of samples is therefore significant.

Finally, in Fig. 6 we show trace plots for each cosmological parameter in the realistic configuration, as produced with Commander3; we note that none of the other codes is technically able to produce similar estimates. In this case, the computational cost is even higher, and we produced a total of only 1600 samples. Still, by comparing these traces with the true input values (marked as dashed horizontal lines), we see that the resulting samples agree well with the expected mean values. The scatter is also as expected: it is somewhat smaller than, but of the same order as, that of the noisier isotropic-noise case with a 10% constant latitude mask in Fig. 5.

Fig. 6.

Trace plot of the cosmological parameters for a semi-realistic case based on the Planck LFI 70 GHz noise level and a realistic Galactic mask. The dashed horizontal lines show the true input values.

3.2. Markov chain correlation length and computational costs

The second main goal of this paper is to quantify the computational costs involved in this algorithm and identify potential bottlenecks that could be optimized in future work. The effective cost per independent Markov chain sample can be written as the product of the raw sampling cost per sample and the overall Markov chain correlation length. The latter can be quantified in terms of the autocorrelation function,

$$ \begin{aligned} \zeta_\theta(\Delta) \equiv \left\langle \frac{(\theta_i - \mu)}{\sigma}\,\frac{(\theta_{i+\Delta} - \mu)}{\sigma} \right\rangle, \end{aligned} $$(27)

where μ and σ are the posterior mean and standard deviation computed from the Markov chain for θ. We define the overall correlation length to be the first value of Δ for which ζ < 0.1. This function is plotted for both Cobaya and Commander3 in Fig. 7; for the latter, dashed lines show full-sky results and dot-dashed lines show results for a 10% mask. Overall, we see that the Commander3 correlation length is 50–60 for the full-sky case, increasing to 50–100 for the masked case. For comparison, Cobaya typically has a correlation length of about 10.
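For reference, both Eq. (27) and the ζ < 0.1 criterion take only a few lines of NumPy; the following helper is our own sketch:

```python
import numpy as np

def correlation_length(chain, threshold=0.1):
    """Autocorrelation function of Eq. (27) and the first lag Delta at
    which it drops below `threshold` (the definition used in the text)."""
    x = (np.asarray(chain) - np.mean(chain)) / np.std(chain)
    n = len(x)
    acf = np.array([np.mean(x[: n - d] * x[d:]) for d in range(1, n // 2)])
    return int(np.argmax(acf < threshold)) + 1, acf
```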

Fig. 7.

Markov chain autocorrelation functions for Cobaya (solid lines) and Commander3 for a set of typical cosmological parameters. Dashed and dot-dashed lines show Commander3 results for the full-sky case and the 10 % mask case, respectively. The horizontal dashed gray line indicates an autocorrelation of 0.1, which we use to define the overall correlation length.

The basic computational cost per sample for Commander3 is plotted as black points in Fig. 8. Here we see that the overall cost increases rapidly with decreasing sky coverage, from 0.2 CPU-hours per sample for the full-sky case to about 4 CPU-hours for a 20% constant latitude mask with uniform noise. Since the isotropic-noise cases assume twice the mean of the 70 GHz rms map, the realistic case has a higher signal-to-noise ratio, which further increases the runtime: for the realistic LFI 70 GHz scanning pattern and a 15% mask, the cost is 20 CPU-hours per sample. Taking into account that the overall correlation length is ∼100, the effective cost is therefore about 2000 CPU-hours per independent sample; the gray dots in Fig. 8 indicate this total cost for each configuration.

Fig. 8.

CPU-hours per accepted sample. The black dots show the cost per sample, while the gray dots show the cost per effective sample, i.e., normalized by the correlation length of the chain. For comparison, the red dot shows the 812 CPU-hours required to generate one single end-to-end COSMOGLOBE DR1 sample of the Planck LFI and WMAP channels.

To interpret the implications of these costs, it is useful to compare with the overall cost of the full end-to-end processing pipeline within which this algorithm is designed to operate. The most relevant comparison point is therefore the cost of the COSMOGLOBE DR1 processing, which is 812 CPU-hours for the combination of Planck LFI and WMAP, marked as a red dot in Fig. 8. In its current state, we find that the new cosmological parameter step alone is thus about 2.5 times more expensive than the full end-to-end processing.

4. Summary and discussion

This work had two main goals. The first goal was simply to implement the cosmological parameter sampling algorithm proposed by Racine et al. (2016) into the state-of-the-art Commander3 CMB Gibbs sampler and demonstrate that this new step works in a production environment. A head-to-head comparison with both Cobaya and a stand-alone Python implementation demonstrates the fidelity of the method. We therefore conclude that this goal has been met, and as such this work finally adds true “end-to-end” processing abilities to Commander3.

The second goal was to measure the computational performance of the method, assess whether it is already suitable for large-scale production, and identify potential bottlenecks that should be resolved in future work. In this case, we find that the overall cost per independent Gibbs sample for a data set similar to COSMOGLOBE DR1 is about 2000 CPU-hours, which is more than a factor of two higher than the total cost of all other steps in the end-to-end algorithm, including low-level calibration, map-making, and component separation. Therefore, although the code is technically operational at the current time, we also conclude that it is still too expensive to be useful in a full-scale production environment.

At the same time, this analysis has also revealed the origin of this high cost, which can be divided into two parts. First, the cost per sample is dominated by solving for the mean field (Eq. (16)) and fluctuation (Eq. (18)) maps using conjugate gradients. In this respect, we note that the current implementation uses a standard diagonal preconditioner to solve these equations, as described by Eriksen et al. (2004). However, after more than two decades of development, far more efficient preconditioners have been explored and described in the literature, including multi-resolution and multi-grid methods (e.g., Seljebotn et al. 2014, 2019); implementing one of these could potentially reduce the basic cost per sample by one or two orders of magnitude. The second issue is the long correlation length. The current algorithm is based on a very basic random-walk Metropolis sampler, which generally performs poorly for highly correlated parameters; in the cases considered here, the degeneracy between the amplitude of scalar perturbations, As, and the optical depth of reionization, τ, is a typical example. This suggests that the correlation length can be greatly reduced in several ways, for instance by adding new data sets (such as weak gravitational lensing constraints) or by implementing a more sophisticated sampling algorithm that uses derivative information, such as a Hamiltonian sampler. Overall, we consider it very likely that future work will be able to reduce the total cost by the one or two orders of magnitude needed for production, and we hope that the current results can provide inspiration and motivation for experts in the field to join the effort.

Finally, before concluding this paper, we note that a key application of the Bayesian end-to-end analysis framework pioneered by BEYONDPLANCK and COSMOGLOBE is the analysis of next-generation CMB B-mode experiments, for instance LiteBIRD. In this case, it is worth noting that the polarization-based signal-to-noise ratio on the tensor-to-scalar ratio, r, is much lower than the temperature-based signal-to-noise ratio of Planck and WMAP on the ΛCDM parameters, and both the conjugate gradient cost and the Markov chain correlation length are therefore likely to be much smaller than for the cases considered in this paper. As such, it is conceivable that the current method already performs adequately for LiteBIRD, and this will be explored in future work.



Acknowledgments

The current work has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement numbers 819478 (ERC; COSMOGLOBE) and 772253 (ERC; BITS2COSMOLOGY). In addition, the collaboration acknowledges support from RCN (Norway; grant no. 274990). Moreover, Simone Paradiso acknowledges support from the Government of Canada’s New Frontiers in Research Fund, through grant NFRFE-2021-00595. We acknowledge the use of the Legacy Archive for Microwave Background Data Analysis (LAMBDA), part of the High Energy Astrophysics Science Archive Center (HEASARC). HEASARC/LAMBDA is a service of the Astrophysics Science Division at the NASA Goddard Space Flight Center. Some of the results in this paper have been derived using the healpy and HEALPix packages (Górski et al. 2005; Zonca et al. 2019).

References

  1. Anderes, E., Wandelt, B., & Lavaux, G. 2015, ApJ, 808, 152
  2. Bennett, C. L., Larson, D., Weiland, J. L., et al. 2013, ApJS, 208, 20
  3. BeyondPlanck Collaboration 2023, A&A, 675, A1
  4. Carron, J. 2019, Phys. Rev. D, 99, 043518
  5. Eriksen, H. K., O’Dwyer, I. J., Jewell, J. B., et al. 2004, ApJS, 155, 227
  6. Galloway, M., Andersen, K. J., & Aurlien, R. 2023, A&A, 675, A3
  7. Gjerløw, E., Mikkelsen, K., Eriksen, H. K., et al. 2013, ApJ, 777, 150
  8. Górski, K. M., Hivon, E., Banday, A. J., et al. 2005, ApJ, 622, 759
  9. Hamimeche, S., & Lewis, A. 2008, Phys. Rev. D, 77, 103013
  10. Hinshaw, G., Larson, D., Komatsu, E., et al. 2013, ApJS, 208, 19
  11. Jewell, J., Levin, S., & Anderson, C. H. 2004, ApJ, 609, 1
  12. Jewell, J. B., Eriksen, H. K., Wandelt, B. D., et al. 2009, ApJ, 697, 258
  13. Larson, D. L., Eriksen, H. K., Wandelt, B. D., et al. 2007, ApJ, 656, 653
  14. Lesgourgues, J. 2011, arXiv e-prints [arXiv:1104.2932]
  15. Lewis, A., & Bridle, S. 2002, Phys. Rev. D, 66, 103511
  16. Lewis, A., Challinor, A., & Lasenby, A. 2000, ApJ, 538, 473
  17. Millea, M., & Seljak, U. 2022, Phys. Rev. D, 105, 103531
  18. Millea, M., Anderes, E., & Wandelt, B. D. 2020, Phys. Rev. D, 102, 123542
  19. Millea, M., Daley, C. M., Chou, T.-L., et al. 2021, ApJ, 922, 259
  20. Planck Collaboration II. 2020, A&A, 641, A2
  21. Planck Collaboration III. 2020, A&A, 641, A3
  22. Planck Collaboration V. 2020, A&A, 641, A5
  23. Planck Collaboration VI. 2020, A&A, 641, A6
  24. Planck Collaboration Int. LVII. 2020, A&A, 643, A42
  25. Racine, B., Jewell, J. B., Eriksen, H. K., & Wehus, I. K. 2016, ApJ, 820, 31
  26. Seljebotn, D. S., Mardal, K.-A., Jewell, J. B., Eriksen, H. K., & Bull, P. 2014, ApJS, 210, 24
  27. Seljebotn, D. S., Bærland, T., Eriksen, H. K., Mardal, K. A., & Wehus, I. K. 2019, A&A, 627, A98
  28. Shewchuk, J. R. 1994, An Introduction to the Conjugate Gradient Method Without the Agonizing Pain, Edition 1 1/4, http://www.cs.cmu.edu/~quake-papers/painless-conjugate-gradient.pdf
  29. Torrado, J., & Lewis, A. 2021, JCAP, 05, 057
  30. Verde, L., Peiris, H. V., Spergel, D. N., et al. 2003, ApJS, 148, 195
  31. Wandelt, B. D., Larson, D. L., & Lakshminarayanan, A. 2004, Phys. Rev. D, 70, 083511
  32. Watts, D. J., Basyrov, A., Eskilt, J. R., et al. 2023, A&A, in press, https://doi.org/10.1051/0004-6361/202346414
  33. Wieczorek, M. A., & Meschede, M. 2018, Geochem. Geophys. Geosyst., 19, 2574
  34. Zonca, A., Singer, L., Lenz, D., et al. 2019, J. Open Source Softw., 4, 1298

Appendix A: Analytic expression for a constant latitude mask

The goal of this appendix is to show how to calculate $N^{-1}_{\ell m \ell' m'}$ analytically for a constant latitude mask and uniform noise. This makes the map-making equation, Eq. (17), computationally faster to solve, since the resulting expression satisfies $N^{-1}_{\ell m \ell' m'} \propto \delta_{mm'}$. These results are generally known, but we re-derive them here for completeness.

For uniform instrumental noise with no mask, the pixel-space noise covariance matrix is $N_{pp'} = \sigma^2\delta_{pp'}$, where σ is the white noise uncertainty per pixel. Including the constant latitude mask, the inverse noise covariance matrix can be written in pixel space as

$$ \begin{aligned} \left(N^{-1}\right)_{pp'} = \frac{1}{\sigma^2}\,\delta_{pp'}\, H(|\theta(p) - \pi/2| - b). \end{aligned} $$

Here, H is the Heaviside step function, meaning that we mask every pixel p where |θ(p) − π/2| < b, for some latitude b in radians. For a masked pixel we have $(N^{-1})_{pp} = 0$, meaning that the noise in that pixel is infinite, $N_{pp} = \infty$.
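In pixel space this is a one-liner with healpy; the following sketch (with illustrative values for Nside, σ, and b, not the configuration used in the paper) builds the diagonal of $(N^{-1})_{pp'}$:

```python
import numpy as np
import healpy as hp

# Diagonal of the inverse pixel noise covariance for a constant latitude
# mask: 1/sigma^2 outside the mask, 0 inside.
nside, sigma, b = 64, 30.0, np.radians(5.7)   # illustrative values
theta, _ = hp.pix2ang(nside, np.arange(hp.nside2npix(nside)))
n_inv = np.where(np.abs(theta - np.pi / 2) > b, 1.0 / sigma**2, 0.0)
```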

We define the spherical harmonic functions $Y_{\ell m}(p) = Y_{\ell m}(\hat{n}(p))$, where $\hat{n}(p) = (\theta(p), \phi(p))$ are the spherical coordinates of pixel p. We can then transform the inverse noise covariance into spherical harmonic space,

$$ \begin{aligned} \left(N^{-1}\right)_{\ell m \ell' m'} &= \sum_{pp'} \left(N^{-1}\right)_{pp'} Y^*_{\ell m}(p)\, Y_{\ell' m'}(p') \\ &= \frac{1}{\sigma^2}\sum_{pp'} Y^*_{\ell m}(p)\, Y_{\ell' m'}(p')\, \delta_{pp'}\, H(|\theta - \pi/2| - b) \\ &= \frac{1}{\sigma^2}\sum_{p} Y^*_{\ell m}(p)\, Y_{\ell' m'}(p)\, H(|\theta - \pi/2| - b) \\ &= \frac{1}{\sigma^2}\sum_p \tilde{P}_{\ell m}\,\tilde{P}_{\ell' m'}\, e^{-i(m-m')\phi}\, H(|\theta - \pi/2| - b). \end{aligned} $$(A.1)

Here, we have defined $\tilde{P}_{\ell m} = \tilde{P}_{\ell m}(\theta) = \Delta_{\ell m} P_{\ell m}(\cos(\theta))$, where $\Delta_{\ell m} = (-1)^m\sqrt{\frac{2\ell+1}{4\pi}\frac{(\ell-m)!}{(\ell+m)!}}$ and $P_{\ell m}(\cos(\theta))$ are the associated Legendre polynomials. Thus, we write the spherical harmonic functions as $Y_{\ell m}(p) = \tilde{P}_{\ell m}\, e^{im\phi}$.

We now switch from discrete pixels to continuous space, meaning that we change the sum to an integration where we account for the number of pixels, Npix, per area element,

$$ \begin{aligned} \sum_p \rightarrow \frac{N_{\rm pix}}{4\pi}\int d\Omega = \frac{N_{\rm pix}}{4\pi}\int_0^{2\pi} d\phi \int_0^{\pi} d\theta\,\sin(\theta). \end{aligned} $$(A.2)

This gives us

$$ \begin{aligned} \left(N^{-1}\right)_{\ell m \ell' m'} &= \frac{N_{\rm pix}}{4\pi\sigma^2}\int_0^{2\pi} d\phi \int_0^{\pi} d\theta\,\sin(\theta)\,\tilde{P}_{\ell m}\,\tilde{P}_{\ell' m'}\, e^{-i(m-m')\phi}\, H(|\theta - \pi/2| - b) \\ &= \frac{N_{\rm pix}}{2\sigma^2}\,\delta_{mm'}\int_0^{\pi} d\theta\,\sin(\theta)\,\tilde{P}_{\ell m}\,\tilde{P}_{\ell' m'}\, H(|\theta - \pi/2| - b) \\ &= \frac{N_{\rm pix}}{2\sigma^2}\,\delta_{mm'}\left(\int_0^{\pi/2-b} d\theta\,\sin(\theta)\,\tilde{P}_{\ell m}\,\tilde{P}_{\ell' m'} + \int_{\pi/2+b}^{\pi} d\theta\,\sin(\theta)\,\tilde{P}_{\ell m}\,\tilde{P}_{\ell' m'}\right). \end{aligned} $$(A.3)

Writing x = cos(θ), the associated Legendre polynomials $P_{\ell m}(x)$ are either symmetric or antisymmetric under x → −x: symmetric when ℓ + m is even, and antisymmetric when ℓ + m is odd. Since m = m′, the two integrals in the last line of Eq. (A.3) cancel each other when ℓ + ℓ′ is odd. We therefore only get nonzero elements when ℓ + ℓ′ is even, in which case the two integrals are equal. Hence, for ℓ + ℓ′ even, we get

$$ \begin{aligned} \left(N^{-1}\right)_{\ell m \ell' m'} &= \frac{N_{\rm pix}}{\sigma^2}\,\delta_{mm'}\int_0^{\pi/2-b} d\theta\,\sin(\theta)\,\tilde{P}_{\ell m}(\theta)\,\tilde{P}_{\ell' m'}(\theta) \\ &= \frac{N_{\rm pix}}{\sigma^2}\,\delta_{mm'}\int_{\sin(b)}^{1} dx\, \tilde{P}_{\ell m}(\arccos(x))\,\tilde{P}_{\ell' m}(\arccos(x)). \end{aligned} $$(A.4)

In the full-sky case, b = 0, we find $\left(N^{-1}\right)_{\ell m \ell' m'} = \delta_{\ell\ell'}\delta_{mm'}\,\frac{N_{\rm pix}}{4\pi\sigma^2}$, which is equivalent to a diagonal noise power spectrum $N_\ell = \frac{4\pi\sigma^2}{N_{\rm pix}}$.

The integral in Eq. (A.4) can be solved numerically by gridding x, and $\tilde{P}_{\ell m}$ can be calculated in Python using the pyshtools library (Wieczorek & Meschede 2018). The azimuthally symmetric Galactic mask used in this work has a sky coverage of $f_{\rm sky} = 0.90$, giving sin(b) = 0.1. We therefore require far fewer grid points for x if we use the following identity, valid for ℓ + ℓ′ even,

$$ \begin{aligned} \delta_{mm'} \int_{\sin(b)}^{1} dx\, \tilde{P}_{\ell m}(\arccos(x))\,\tilde{P}_{\ell' m}(\arccos(x)) = \frac{1}{4\pi}\,\delta_{\ell\ell'}\,\delta_{mm'} - \delta_{mm'} \int_{0}^{\sin(b)} dx\, \tilde{P}_{\ell m}(\arccos(x))\,\tilde{P}_{\ell' m}(\arccos(x)), \end{aligned} $$(A.5)

which comes from the orthonormality condition for spherical harmonics. We now only need to grid x in the interval 0 ≤ x ≤ sin(b) = 0.1, as opposed to the interval 0.1 ≤ x ≤ 1.
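The sketch below performs this computation using SciPy's associated Legendre functions instead of pyshtools; since only products of two $\tilde{P}_{\ell m}$ at the same m enter, the Condon-Shortley phase convention cancels and is omitted. All function names are our own.

```python
import numpy as np
from scipy.special import lpmv, gammaln

def ptilde(ell, m, x):
    """Orthonormalized associated Legendre function,
    P~_lm(arccos x) = Delta_lm * P_lm(x); any (-1)^m phase convention
    cancels in the products P~_lm P~_l'm used below."""
    lognorm = 0.5 * (np.log((2 * ell + 1) / (4 * np.pi))
                     + gammaln(ell - m + 1) - gammaln(ell + m + 1))
    return np.exp(lognorm) * lpmv(m, ell, x)

def ninv_block(m, lmax, sin_b, sigma2, npix, ngrid=2000):
    """(N^-1)^{(m)}_{ll'} for one m, from Eq. (A.4) with the identity
    (A.5); entries with l + l' odd vanish by symmetry."""
    x = np.linspace(0.0, sin_b, ngrid)                 # grid only [0, sin b]
    ells = np.arange(m, lmax + 1)
    P = np.array([ptilde(ell, m, x) for ell in ells])  # (n_ell, ngrid)
    inner = np.trapz(P[:, None, :] * P[None, :, :], x, axis=-1)
    block = np.eye(ells.size) / (4 * np.pi) - inner
    block[(ells[:, None] + ells[None, :]) % 2 == 1] = 0.0
    return npix / sigma2 * block
```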

Since $\mathsf{N}^{-1}_{\ell m \ell' m'} \propto \delta_{mm'}$, the matrix expressions simplify. Consider multiplying the matrix $\mathsf{N}^{-1} = N^{-1}_{\ell m \ell' m'}$ by a vector $\boldsymbol{b} = b_{\ell m}$:

$$ \begin{aligned} \mathsf{N}^{-1} \cdot \boldsymbol{b} = \sum_{\ell' m'} \left(\mathsf{N}^{-1}\right)_{\ell m \ell' m'} b_{\ell' m'} = \sum_{\ell'} \left(\mathsf{N}^{-1}\right)_{\ell m \ell' m} b_{\ell' m} = \sum_{\ell'} \left(\mathsf{N}^{-1}\right)^{(m)}_{\ell\,\ell'} b^{(m)}_{\ell'}. \end{aligned} $$(A.7)

To solve the map-making equation, Eq. (17), we thus obtain one matrix equation for each $0 \le m \le \ell_{\rm max}$. This gives $\ell_{\rm max}+1$ matrix equations, in which the matrices have dimensions of at most $(\ell_{\rm max}+1)\times(\ell_{\rm max}+1)$; these are numerically much faster to solve than inverting the full matrix $[\mathsf{S}^{-1}+\mathsf{B}^T\mathsf{N}^{-1}\mathsf{B}]$ once. A small sketch of one such per-m solve is given below.
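Schematically, each per-m solve looks as follows, where `ninv_m` is one block as returned by `ninv_block()` above, and `s_inv_m` and `bl_m` are the diagonal $S^{-1}$ and beam entries for the same ℓ range (again our own sketch):

```python
import numpy as np

def solve_per_m(rhs_m, ninv_m, s_inv_m, bl_m):
    """Solve [S^-1 + B^T N^-1 B] a = rhs for the coefficients of one m,
    where ninv_m is an (l, l') block, and s_inv_m and bl_m hold the
    diagonal S^-1 and beam entries for the same range of l."""
    lhs = np.diag(s_inv_m) + bl_m[:, None] * ninv_m * bl_m[None, :]
    return np.linalg.solve(lhs, rhs_m)
```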

