Free Access
Issue
A&A
Volume 541, May 2012
Article Number A74
Number of page(s) 10
Section Cosmology (including clusters of galaxies)
DOI https://doi.org/10.1051/0004-6361/201118207
Published online 01 May 2012

© ESO, 2012

1. Introduction

Measurements of the cosmic microwave background (CMB) anisotropies are powerful cosmological probes. In the currently favored cosmological model, with the nearly Gaussian-distributed curvature perturbations, almost the entire statistical information is contained in the CMB angular power spectrum. The observed quantity on the sky is generally the CMB temperature anisotropy Θ(􏿻p)\hbox{$\Theta(\vec{p})$} in direction 􏿻p\hbox{$\vec{p}$}, which is described as T(􏿻p)=TCMB[1+Θ(􏿻p)]\hbox{$T(\vec{p})=T_{\rm CMB}[1+\Theta(\vec{p})]$}. This field is expanded on the spherical harmonic functions as Θ(􏿻p)==0+m=a[ℓ,m]Yℓm(􏿻p),wherea[ℓ,m]=S2Θ(􏿻p)Yℓm(􏿻p)d􏿻p,\begin{eqnarray} &&\Theta(\vec{p})=\sum_{\ell=0}^{+\infty}\sum_{m=-\ell}^{\ell} a[\ell,m]Y_{\ell m}(\vec{p}), \\ &&\text{ where } a[\ell,m] = \int_{\mathbb{S}^2} \Theta(\vec{p}) Y^*_{\ell m}(\vec{p}) {\rm d}\vec{p}, \end{eqnarray}where S2 ⊂ R3 is the unit sphere, is the multipole moment, which is related to the angular size on the sky as  ~ 180°/θ and m is the phase ranging from  −  to . The a[ℓ,m]  are the spherical harmonic coefficients of the (noise-free) observed sky. For a Gaussian random field, the mean and covariance are sufficient statistics, meaning that they carry the whole statistical information of the field. If the random field has zero mean, E(a00) = 0 and the expansion can be started at  = 2, neglecting the dipole terms, i.e.,  = 11. For \hbox{$\ell \geqslant 2$}, the triangular array (a[ℓ,m] )ℓ,m represents zero-mean, complex-valued random coefficients, with variance E(|a[ℓ,m]|2)=C[]>0,\begin{equation} \label{eq:almvar} \mathbb{E}(|a[\ell,m]|^2 )= C[\ell] > 0, \end{equation}(3)where C[]  is the CMB angular power spectrum, which only depends on because of the isotropy assumption. Therefore, from Eq. (3), an unbiased estimator of C[]  is given by the empirical power spectrum 􏽢C[]=12+1m|a[ℓ,m]|2.\begin{equation} \label{eq:CMB_PS} \widehat{C}[\ell] = \frac{1}{2\ell+1}\sum_{m}\left|a[\ell,m]\right|^{2}\!. \end{equation}(4)Furthermore, becasue the random field is stationary, the spherical harmonic coefficients are uncorrelated, E(a[ℓ,m]a[,m])=δδmmC[].\begin{equation} \mathbb{E}(a[\ell,m]a^*[\ell^\prime,m^\prime]) = \delta_{\ell\ell^{\prime}}\delta_{mm^{\prime}}C[\ell] . \end{equation}(5)Since they are Gaussian, they are also independent. The angular power spectrum depends on the cosmological parameters through an angular transfer function T(k) as C[]=4πdkkT2(k)P(k),\begin{equation} C[\ell]=4\pi\int\frac{{\rm d}k}{k}\; T_\ell^{2}(k)P(k), \label{eq:CMB} \end{equation}(6)where k defines the scale and P(k) is the primordial matter power spectrum.

Making accurate measurements of this power spectrum has been one of the main goals of cosmology in the past two decades. We have seen a range of ground- and balloon-based experiments, such as Acbar (Reichardt et al. 2009) and CBI (Readhead et al. 2004), as well as satellite experiments, such as WMAP (Bennett et al. 2003) and the recently launched satellite Planck2. All these experiments produce a temperature map of the sky from which the CMB power spectrum is obtained. The estimation of the power spectrum from CMB experiments is important because this spectrum is a way to estimate the cosmological parameters that describe the Universe. General methods for extracting this spectrum from a Npix-map, with nonuniform coverage and correlated noise are quite expensive and time-consuming especially for Planck, where we will be dealing with Npix = 5 × 107. All these experiments estimate the CMB angular power spectrum from a sky map, which is a realization of the underlying true power spectrum; no matter how much the experiments improve, we are still limited to an accuracy within the cosmic variance. This means that even if we had a perfect experiment (i.e. with zero instrumental noise), we would not be able to recover a perfect power spectrum, because of the cosmic variance limit.

In this paper we investigate the possibility of estimating the true underlying power spectrum from a realized spectrum; an estimation of the true power spectrum without a need to know the cosmological parameters. For this we exploit the sparsity properties of the CMB power spectrum, and capitalize on it to propose an estimator of the theoretical power spectrum.

The idea of sparsity in different dictionaries has been used previously. For example, Mukherjee & Wang (2004) used wavelets to estimate the level of non-Gaussianity in the first year of the WMAP data. In Mukherjee & Wang (2003) wavelets were used to estimate the primordial power spectrum. Faÿ et al. (2008) have used wavelets to estimate the power spectrum from a CMB map, as an alternative to the MASTER method of Hivon et al. (2002). There have also been previous attempts to smooth the CMB power spectrum, for instance spline-fitting, where the smoothed power spectrum was used for visual aids (Oh et al. 1999). Also, Aghamousa et al. (2012) have smoothed WMAP 1-3-5-7 power spectra to investigate their evolution.

We used the sparsity of the CMB power spectrum as a key ingredient to estimate the theoretical power spectrum without having to know the cosmological parameters; this estimate will not belong to a set of possible theoretical power spectra (i.e., all C[]  that can be obtained by CAMB3 by varying the cosmological parameters). Instead, this estimation should be useful for other applications, such as:

  • Monte Carlo: we may need to conduct Monte Carlo simulations insome applications without assuming the cosmologicalparameters.

  • Wiener filtering: Wiener filtering is often used to filter the CMB map and requires the theoretical power spectrum as an input. We may not wish to assume any cosmology at this stage of the processing.

  • Some estimators (weak lensing, integrated SachsÐWolfe (ISW), etc.) require the theoretical power spectrum to be known. Using a data-based estimate of the theoretical C[]  could be an interesting alternative, or at least a good first guess in an iterative scheme where the theoretical C[]  is required for determining the cosmological parameters.

Paper content

Section 2 introduces the concepts of sparse representation and its applications to the CMB power spectrum. In Sect. 3 we explain how sparsity is used to propose an estimator of the theoretical power spectrum. Experimental results are described and discussed in Sect. 4, and Sect. 5 is devoted to a discussion and comparison of the moving average estimator. In Sect. 6, conclusions are drawn and potential perspectives are stated.

2. Sparsity of the CMB power spectrum

2.1. A brief tour of sparsity

A signal X = (X[1] ,...,X[N] ) considered as a vector in RN, is said to be sparse if most of its entries are equal to zero. If k number of the N samples are not equal to zero, where k ≪ N, then the signal is said to be k-sparse. If only a few of the entries have high values and the rest are zero or close to zero, the signal is said to be weakly sparse (or compressible). With a slight abuse of terminology we call compressible signals sparse. Generally, signals are not sparse in direct space, but can be sparsified by transforming them to another domain. For example, sin(x) is 1-sparse in the Fourier domain, while it is clearly not sparse in the original one. In the so-called sparsity synthesis model, a signal can be represented as the linear expansion X=Φα=i=1Tφiα[i],\begin{equation} X =\Phi\alpha=\sum_{i=1}^{T}\phi_{i}\alpha[i] , \end{equation}(7)where α[i]  are the synthesis coefficients of X, Φ = (φ1,   ...,   φT) is the dictionary, and φi are called the atoms (elementary waveforms) of the dictionary Φ. In the language of linear algebra, the dictionary Φ is a N × T matrix whose columns are the normalized atoms, supposed here to be normalized to a unit 2-norm, i.e. i[1,T],φi22=n=1N|φi[n]|2=1\hbox{$\forall i\in[1,T],\left\Vert \phi_{i}\right\Vert _2^2=\sum_{n=1}^{N}\left|\phi_{i}[n]\right|^{2}=1$}4. A function can be decomposed in many dictionaries, but the best dictionary is that with the sparsest (most economical) representation of the signal. In practice, it is convenient to use dictionaries with fast implicit transform (such as Fourier transform, wavelet transform, etc.) which allow us to directly obtain the coefficients and reconstruct the signal from these coefficients using fast algorithms that run in linear or almost linear time (unlike matrix-vector multiplications). The Fourier, wavelet, and discrete cosine transforms are certainly the most well known dictionaries. A comprehensive account of sparsity and its applications can be found in the monograph Starck et al. (2010).

2.2. Which dictionary for the theoretical CMB power spectrum?

We investigated the sparsity of the CMB power spectrum in two different dictionaries, both having a fast implicit transform: the wavelet transform (WT) and the discrete cosine transform (DCT).

thumbnail Fig. 1

Theoretical CMB power spectrum along with the reconstructed power spectra, using the DCT and WT dictionaries. The panels show the reconstructions for different fractions of the coefficients. The inner plots show the differences between the actual and the reconstructed power spectra. Both dictionaries suffer from boundary effects, but this is more severe for DCT because the corresponding atoms are not compactly supported. The power spectrum that is decomposed onto the two dictionaries is in the form ( + 1)C[] /2π.

thumbnail Fig. 2

Non-linear approximation (NLA) error curves for the two dictionaries. Below 1% the DCT curve is dropping faster, which means it is performing better. However, past  ~ 2% the DCT curve flattens off, while WT decreases to  ~ 0 very quickly.

Figure 1 shows an angular power spectrum (calculated with CAMB, Lewis et al. 2000; with WMAP7, Larson et al. 2011, parameters) along with the DCT- and WT-reconstructed power spectra with a varying fraction of the largest transform coefficients retained in the reconstruction. The inner plots show the difference between the actual power spectrum and the reconstructed ones. Clearly, with only few coefficients the shape of the power spectrum is correctly reconstructed in both dictionaries. The height and the position of the peaks and troughs are very important here because the estimation of the cosmological parameters heavily relies on these characteristics of the power spectrum. The best domain would be the one with the sparsest representation and yet the most accurate representation of the power spectrum. Let C[(M) be its best M-term approximation, i.e., obtained by reconstructing from the M-largest (in magnitude) coefficients of C[]  in a given domain. To compare the WT and DCT dictionaries, we plot the resulting non-linear approximation (NLA) error curve in Fig. 2, which shows the reconstruction error EM as a function of M, the number of retained coefficients EM=C[]C[](M)2C[]2×100.\begin{equation} E_{M}= {\frac{\left\Vert {C}[\ell]-{{C}[\ell]}^{(M)}\right\Vert _2}{\left\Vert C[\ell]\right\Vert _2}}\times100. \label{NLA} \end{equation}(8)As M increases we approach the complete reconstruction, and the error reaches zero when all coefficients have been used. Usually the domain with the steepest EM curve is the sparsest domain. In this case, though, both dictionaries show a very similar behavior. There is only a small window in the coefficients for which DCT performs better than WT. However, DCT flattens after using  ~ 1% of the coefficients and does not improve the reconstruction until many coefficients have been used.

Both dictionaries seem to suffer from boundary problems at low and high s. This can be solved for high s because one can always perform the reconstruction beyond the desired . For low s it can be solved by different means, for instance by extrapolating the spectrum. Note that the boundary problems are more severe in the DCT domain than in the WT; this is because DCT atoms are not compactly supported.

thumbnail Fig. 3

A simulated CMB power spectrum along with the reconstructed spectra, using the DCT and WT dictionaries. The black solid line is the true underlying power spectrum from which the simulations were made. The blue and red dots show the simulated and the reconstructed power spectra respectively. With only 1% of the coefficients, DCT can recover the input power spectrum (i.e. the black solid line) very well, recovering the peaks and troughs accurately. Unlike DCT, WT seems to have difficulties in recovering the peaks and troughs. The inner plots shows the NMSE curves.

Next we investigated the sparsity of a set of realized spectra in the two dictionaries. We simulated 100 maps from the theoretical power spectrum used above and estimated their power spectra using Eq. (4). As before, we decomposed each realization in the DCT and WT dictionaries and reconstructed keeping increasing fractions of the highest coefficients. At this stage, it is important to note that because we are dealing with the empirical power spectrum, we are no longer in an approximation setting but instead in an estimation one. Indeed, the empirical power spectrum can be considered a noisy version of the true one. Intuitively, reconstructing from a very small fraction of high coefficients will reject most of the noise (low estimator variance) but at the price of retaining only a small fraction of the true spectrum coefficients (strong bias). The opposite is true when a large proportion of coefficients is kept in the reconstruction. Therefore, there is a threshold value that will entail a bias-variance trade-off, hence minimizing the estimation risk. This is exactly the idea underlying thresholding estimators in sparsifying domains.

thumbnail Fig. 4

Average of the DCT-reconstructed power spectra when different fractions of the most significant coefficients used in the reconstruction. The inner plot shows the NMSE curve for this average. The minimum of the curve is at less than 10% of the coefficients, meaning that the true CMB spectrum can be recovered with less than 10% of the coefficients while ensuring a good bias-variance trade-off.

This discussion is clearly illustrated by the inner plots of Fig. 3, which shows the normalized mean-square error (NMSE) defined as NMSEM=C[]􏽢C[](M)2C[]2×100,\begin{equation} \mathrm{NMSE}_{M} = {\frac{\left\Vert {C}[\ell]-{\widehat{C}[\ell]}^{(M)}\right\Vert _2}{\left\Vert C[\ell]\right\Vert _2}}\times100, \end{equation}(9)as a function of the fraction of coefficients used in the reconstruction. The error is large when only a few coefficients are used. As more coefficients are included, one starts to recover the main (i.e., the general shape of the spectrum) features of the power spectrum. With more coefficients, more noise enters the estimation and the error increases again. The NMSE curve shows a clear minimum at which the underlying true power spectrum is best recovered.

Despite the differences in the performance of the two dictionaries, the minima of the NMSE have about the same proportions of the coefficients. This is because the NMSE reflects a global behavior. On the one hand, although the DCT can recover the features of the spectrum correctly, it is less smooth than WT. On the other hand, the WT cannot reconstruct the proper shape of the power spectrum, but provides a smoother estimate.

Figures 4 and 5 show the same results for an average over the 100 realizations. Evidently, on average DCT performs very well in recovering the features of the CMB spectrum. Indeed, the minimum of NMSE curve is at a lower proportion of coefficients for DCT than WT. This is because the small noisy features of the DCT-reconstructed spectra cancel out in the averaging, while the wrong recovery of the shape of the spectrum by WT does not. In a nutshell, DCT seems to perform better in reconstructing the true underlying CMB power spectrum from its realizations.

To summarize, from the above discussion, we conclude the following:

  • the CMB power spectrum is very sparse in both DCT and WTdictionaries, although their sparsifying capabilities aredifferent;

  • DCT recovers global features of spectrum (i.e., the peaks and troughs), while WT recovers localized features;

  • WT recovers more localized (noisy) features than the global ones for realizations, while the DCT concentrates on the global features.

In the next section, these complementary capabilities of the DCT and WT transforms will be combined to propose a versatile way for adaptively estimating the theoretical power spectrum from a single realization of it.

3. Sparse reconstruction of the theoretical power spectrum

We begin with the simple model where the observed signal Y is contaminated by a zero-mean white Gaussian noise, Y = X + ε, where X is the signal of interest and \hbox{$\varepsilon \sim \mathcal{N}(0,\sigma^2)$}. Sparse recovery with an analysis-type sparsity prior amounts to finding the solution of the following problem: minXΦTX1  s.t.  YX2δ,\begin{equation} \label{eq_minl1_den} \min_{X} \norm{ \Phi^{T} X}_1\ ~~ s.t. ~~ \norm{Y - X}_2 \le \delta, \end{equation}(10)where ΦTX represents the transform coefficients of X in the dictionary Φ, and δ controls the fidelity to the data and obviously depends on the noise standard deviation σ.

We now turn to denoising the power spectrum from one empirical realization of it. In this case, however, the noise is highly non-Gaussian and needs to be treated differently. Indeed, as we will see in the next section, the empirical power spectrum will entail a multiplicative χ2-distributed noise with a number of degrees of freedom that depends on . That is, the noise has a variance profile that depends both on the true spectrum and . We therefore need to stabilize the noise on the empirical power spectrum prior to estimation, using a variance stabilization transform (VST). Hopefully, the latter will yield stabilized samples that have (asymptotically) constant variance, say 1, irrespective of the value of the input noise level.

thumbnail Fig. 5

Same as Fig. 4, but for the WT domain. It seems that WT cannot perform well for simulated power spectra, compared to the DCT domain; the minimum of the NMSE curve is at more than 35% – while the curve is quite flat after  ~ 20%.

3.1. Variance stabilizing transform

In the statistical literature the problem of removing the noise from an empirical power spectrum is called periodogram denoising (Donoho 1993). Komm et al. (1999) approximated the noise with a correlated Gaussian noise model and derived a threshold at each wavelet scale using the MAD (median of absolute deviation) estimator. A more elegant approach was proposed in Donoho (1993) and Moulin (1994), where the so-called Wahba VST was used. This VST is defined as 𝒯(X)=(logX+γ)6π,\begin{equation} \label{eq:wahba} {\cal T}(X) = \left( \log X + \gamma \right) \frac{\sqrt6}{\pi}, \end{equation}(11)where γ = 0.57721... is the Euler-Mascheroni constant. After the VST, the stabilized samples can be treated as if the noise contaminating them were white Gaussian noise with unit variance.

We will take a similar path here, generalizing the above approach to the case of the angular power spectrum. Indeed, from Eq. (4), one can show that under mild regularity assumptions on the true power spectrum, 􏽢C[]dC[]Z[],where2,2LZ[]~χ2L2,L=2+1,\begin{equation} \label{eq:Clasymp} \widehat{C}[\ell] \overset{d}{\rightarrow} C[\ell] Z[\ell], \text{ where } \forall \ell \geq 2, \, 2L Z[\ell] \sim \chi^2_{2L}, L=2\ell+1, \end{equation}(12)where d\hbox{$\overset{d}{\rightarrow}$} means convergence in distribution. From Eq. (12), it is appealing then to take the logarithm to transform the multiplicative noise Z into an additive one. The resulting log -stabilized empirical power spectrum reads Cs[]:=𝒯(􏽢C[])=log􏽢C[]μL=logC[]+η[],\begin{equation} \label{eq:vst} C^{s}[\ell] := {\cal T}_\ell (\widehat{C}[\ell]) = \log \widehat{C}[\ell] -\mu_L = \log C[\ell] + \eta[\ell] , \end{equation}(13)where η[] : = log Z[]  − μL, L = 2 + 1. Using the asymptotic results from Bartlett & Kendall (1946) on the moments of log  − χ2 variables, it can be shown that μL=ψ0L)(logL\hbox{$\mu_L = \psi_0\parenth{L} - \log L$}, E(η[] ) = 0 and σL2=Var[η[]]=ψ1L)(\hbox{$\sigma_L^2=\var{\eta[\ell]} = \psi_1\parenth{L}$}, where ψm(t) is the standard polygamma function, ψm(t)=dm+1dtm+1logΓ(t)\hbox{$\psi_m(t)=\frac{{\rm d}^{m+1}}{{\rm d} t^{m+1}}\log \Gamma(t)$}.

We can now consider the stabilized Cs[]  as noisy versions of the log C[] , where the noise is zero-mean additive and independent. Owing to the central limit theorem, the noise tends to Gaussian with variance σL2\hbox{$\sigma_L^2$} as increases. At low , normality is only an approximation. Indeed, the noise η[]  has a probability density function of the form pη()=(2L)L2LΓ(L)exp[L(+μLe+μL)],\begin{equation} p_{\eta}(\ell) = \frac{(2L)^{L}}{2^L \Gamma(L)}\exp\crochets{L\parenth{\ell +\mu_L-{\rm e}^{\ell + \mu_L}}}, \end{equation}(14)which might be used to estimate the thresholds in the wavelet domain.

To standardize the noise, we slightly modify the VST (13)to the normalized form Cs[]:=𝒯(􏽢C[])=log􏽢C[]μLσL=Xs[]+ε[],\begin{equation} \label{eq_logvar} C^{s}[\ell] := {\cal T}_\ell ( \widehat{C}[\ell] ) = \frac{ \log \widehat{C}[\ell]-\mu_L}{\sigma_L} = X^{s}[\ell] + \varepsilon[\ell] , \end{equation}(15)where now the noise ε[]  is a zero-mean (asymptotically) Gaussian with unit variance, and Xs[] : = log C[] /σL. It can be checked that the Wahba VST (11)is a specialization of Eq. (15) to L = 0.

In the following, we will use the operator notation \hbox{${\cal T}(X)$} for the VST that applies Eq. (15) entry-wise to each X[] , and ℛ(X) its inverse operator, i.e. ℛ(X): = (ℛ(X[] )) with ℛ(X[] ) = exp(σLX[] ).

3.2. Signal detection in the wavelet domain

Without of loss of generality, we restrict our description here to the wavelet transform. The same approach applies to other sparsifying transforms, e.g., DCT.

To estimate the true CMB power spectrum from the wavelet transform, it is important to detect the wavelet coefficients that are “significant”, i.e. those wavelet coefficients whose absolute value is too high to be due to noise (cosmic variance + instrumental noise). Let wj[]  be the wavelet coefficient of a signal Y at scale j and location . We define the multiresolution support M of Y as Mj[]={ifwj[]issignificant,ifwj[]otherwise.\begin{equation} {M}_j[\ell] = \begin{cases} 1 & \mbox{ if } w_{j}[\ell] \mbox{ is significant,} \\ 0 & \mbox{ if } w_{j}[\ell] \mbox{ otherwise.} \end{cases} \end{equation}(16)For Gaussian noise, it is easy to derive an estimate of the noise standard deviation σj at scale j from the noise standard deviation, which can be evaluated with good accuracy in an automated way (Starck & Murtagh 1998). To detect the significant wavelet coefficients, it suffices to compare the wavelet coefficients in magnitude |wj[] | to a threshold level tj. This threshold is generally taken to be equal to κσj, where κ ranges from 3 to 5. This means that a small magnitude compared to the threshold implies that the coefficients are very likely to be due to noise and are hence insignificant. This decision corresponds to the hard-thresholding operator \begin{eqnarray} \begin{array}{l} \mbox{ if } \left| w_{j}[\ell] \right| \ \geq \ t_j \ \ \mbox{ then } w_{j}[\ell] \mbox{ is significant, } \\ \mbox{ if } \left| w_{j}[\ell] \right| \ < \ t_j \ \ \mbox{ then } w_{j}[\ell] \mbox{ is not significant.} \end{array} \end{eqnarray}(17)To summarize, the multiresolution support is obtained from the signal Y by computing the forward transform coefficients, applying hard thresholding, and recording the coordinates of the retained coefficients.

3.3. Power spectrum recovery algorithm

We now turn to the adaptive estimator of the true CMB power spectrum C[]  from its empirical estimate 􏽢C[]\hbox{$\widehat{C}[\ell]$}. Because we benefit from the (asymptotic) normality of the noise in the stabilized samples Cs[]  in Eq. (15), we are in position to easily construct the multiresolution support M of Cs as described in the previous section. Once the support M of significant coefficients has been determined, our goal is to reconstruct an estimate 􏽥X\hbox{$\widetilde{X}$} of the true power spectrum, known to be sparsely represented in some dictionary Φ(regularization), such that the significant transform coefficients of its stabilized version reproduce those of Cs (fidelity to data). Furthermore, because a power spectrum is a positive, a positivity constraint must be imposed. These requirements can be cast as seeking an estimate that solves the following constrained optimization problem: minXΦTX1s.t.{\begin{equation} \label{eq_min_supp} \min_{X} \| { \Phi}^{T}{X}\|_1 \quad \mathrm{s.t.} \quad \begin{cases} X \geqslant 0 \\ M \odot \big(\Phi^{T}{\cal T}(X)\big) = M \odot \big(\Phi^{T} C^{s}\big), \end{cases} \end{equation}(18)where  ⊙  stands for the Hadamard product (i.e., entry-wise multiplication) of two vectors. This problem has a global minimizer, which is bounded. However, in addition to non-smoothness of the l1-norm and the constraints, the problem is also non-convex because of the VST operator \hbox{${\cal T}$}. It is therefore far from easy to solve.

We propose the following scheme, which starts with an initial guess of the power spectrum X(0) = 0, and then iterates for n = 0 to Nmax − 1, 􏽥X=(𝒯(X(n))+ΦM(ΦT(Cs𝒯(X(n)))))X(n+1)=𝒫+(Φ STλn(ΦT􏽥X)),\begin{equation} \label{eq_iter_theo_powspec} \begin{split} \widetilde{{X}} &= {\cal R} \left( {\cal T} \left( { X}^{(n)}\right) + { \Phi} M \odot \left({ \Phi}^{T} \left( C^{s} -{\cal T} \left( { X}^{(n)}\right) \right) \right)\right) \\ {X}^{(n+1)} &= \mathcal{P}_{+}\left( { \Phi} ~ \text{ST}_{\lambda_n}({ \Phi}^{T}\widetilde{{X}}) \right), \end{split} \end{equation}(19)where \hbox{$\mathcal{P}_{+}$} denotes the projection onto the positive orthant and guarantees non-negativity of the spectrum estimator, STλn(w) = (STλn(w[i] ))i is the soft-thresholding with threshold λn that applies term-by-term the shrinkage rule STλn(w[i])={if|w[i]|λn,otherwise.\begin{equation} \text{ST}_{\lambda_n}(w[i]) = \begin{cases} \mathrm{sign}(w[i])(|w[i]| - \lambda_n) & \text{if} \ |w[i]| \geqslant \lambda_n, \\ 0 & \text{otherwise} . \end{cases} \end{equation}(20)Here, we have chosen a decreasing threshold with the iteration number n, λn = (Nmax − n)/(Nmax − 1). More details pertaining to this algorithm can be found in Starck et al. (2010).

Algorithm 1 summarizes the main steps of the sparse denoising algorithm. A similar approach was proposed in Starck et al. (2009) and Schmitt et al. (2010) for Poisson noise removal in 2D and 3D data sets.

3.4. Instrumental noise

In practice, the data are generally contaminated by an instrumental noise, and estimating the true CMB power spectrum C[]  from the empirical power spectrum 􏽢C[]\hbox{$\widehat{C}[\ell]$} requires one to remove this instrumental noise. The instrumental noise is assumed to be stationary and independent of the CMB. We also assumed that we have access to the power spectrum of the noise, or we can compute the empirical power spectrum 􏽢SN[]\hbox{$\widehat{S}_N[\ell]$} of at least one realization, either from a JackKnife data map or from realistic instrumental noise simulations. The above algorithm can be adapted to handle this case after rewriting the optimizing problem as follows: minXΦTX1s.t.{\begin{equation} \label{eq_min_supp_noise} \min_{X} \| { \Phi}^{T}{X} \|_1 \quad \mathrm{s.t.} \quad \begin{cases} X \geqslant 0 \\ M \odot \big(\Phi^{T}{\cal T}(X+\widehat{S}_N)\big) = M \odot \big(\Phi^{T} C^{s}\big). \end{cases} \end{equation}(21)

Thus, Eq. (19) becomes 􏽥X=(𝒯(X(n)+􏽢SN)+ΦM(ΦT(Cs𝒯(X(n)+􏽢SN))))􏽢SNX(n+1)=𝒫+(ΦSTλn(ΦT􏽥X)).\begin{equation} \begin{split} \widetilde{X} = &\ \ \mathcal{R} \biggl( \mathcal{T} ( X^{(n)} + \widehat{S}_N ) \ \\ &\ \ + \Phi M \odot \Bigl( \Phi^{T} \bigl( C^{s} - \mathcal{T} ( X^{(n)}+\widehat{S}_N ) \bigr) \Bigr) \biggr) - \widehat{S}_N\\ X^{(n+1)} = &\ \ \mathcal{P}_{+} \bigl( \Phi \ \textrm{ST}_{\lambda_n}( \Phi^{T}\widetilde{X} ) \bigr) . \end{split} \label{eq_iter_theo_powspec_noise} \end{equation}(22)Algorithm 1 can be modified accordingly.

thumbnail Fig. 6

Theoretical CMB power spectrum (black line), the empirical power spectrum of one realization (blue dots), and the avegared of estimated power spectra (red line) using the TOUSI algorithm. The inner plots show a zoomed-in version.

thumbnail Fig. 7

Cosmological parameters estimated from the true CMB power spectrum (black line) and the mean of the reconstructed power spectra (red line). The dashed line is the true input parameters, i.e., those used to calculate the theoretical power spectrum using CAMB.

3.5. Combining several dictionnaries

We have seen in Sect. 2.2 that the WT and DCT dictionaries had complementary benefits. Indeed, each dictionary is able to capture the features with shapes similar to its atoms well. More generally, assuming that we have D dictionaries Φ1,...D. Given a candidate signal Y, we can derive a support Md associated to each dictionary Φd, for d ∈  { 1,...,D } . The optimization problem to solve now reads minXΦTX1s.t.{\begin{equation} \begin{split} & \min_{X} \| { \Phi}^{T}{X} \|_1 \quad \mathrm{s.t.} \quad \\ & \left \{\begin{array}{l} X \geqslant 0 \\ M_d \odot \big(\Phi_d^{T}{\cal T}(X+\widehat{S}_N)\big) = M_d \odot \big(\Phi_d^{T} C^{s}\big), ~ d \in \{1,\ldots,D\}. \end{array} \right . \end{split}\label{eq_min_supp_combi_noise} \end{equation}(23)Again, this is a challenging optimization problem. We propose to solve it by applying Eq. (22) successively and alternatively on each dictionary Φd. Algorithm 2 describes in detail the different steps.

4. Application: Monte Carlo simulations

We simulated 100 maps from a theoretical CMB power spectrum that was calculated by CAMB. The power spectra of these maps are equivalent to 100 realizations of the true CMB power spectrum – this realized spectrum is what we have access to in reality. Each of these 100 simulated spectra were run through the TOUSI algorithm, with the aim of recovering the theoretical spectrum from which these 100 spectra were simulated (i.e., the one that was calculated by CAMB).

Figure 6 shows an example of this; the empirical power spectrum of one realization (blue dots) that was fed into the TOUSI algorithm, the average of the 100 estimated power spectral using TOUSI (red line) and the input theoretical spectra (black line). The black line is the input theoretical spectrum that was calculated by CAMB, which is what we are trying to recover. The reconstruction of the peaks and troughs of the power spectrum by this algorithm is very impressive. This is very important because these features define the cosmological parameters. To additionally check the accuracy of these reconstructed spectra, we estimated a set of cosmological parameters from these spectra, using CosmoMC (Lewis & Bridle 2002). First, a set of cosmological parameters were estimated from the 100 simulated spectra. The results are shown in Fig. 7 as black solid lines. Then a “mean” reconstructed spectrum was calculated by averaging the 100 reconstructed spectra. This, in principle (i.e., if the algorithm has worked), should be an estimation of the true input spectrum with the same characteristics and the same cosmological parameters. To test this, we used this “mean” reconstructed spectrum to simulate another 100 maps and then 100 spectra. These simulated spectra were run through CosmoMC to estimate the same set of cosmological parameters. These are shown as red lines in Fig. 7. Clearly, the TOUSI algorithm can reconstruct the true underlying power spectrum with great accuracy in the cosmological parameters.

thumbnail Fig. 8

Power spectrum estimate in the presence of instrumental noise. The blue dots show the empirical power spectrum of one realization with instrumental noise. Yellow dots show the estimated power spectrum of one of the simulated noise maps. Green dots show the the spectrum with the noise power spectrum removed. The black and red solid lines are the input and reconstructed power spectra, respectively. The inner plots show a zoomed-in version.

4.1. Data with instrumental noise

Here we present the performance of the TOUSI algorithm in the presence of instrumental noise. The noise maps were simulated using a theoretical (Planck level) noise power spectrum. They were added to the CMB maps simulated above and the power spectra of the combined maps were estimated using Eq. (4).

Figure 8 shows the reconstruction of the theoretical CMB spectrum in the presence of noise. The blue dots show the empirical power spectrum of one realization with instrumental noise. Yellow dots show the estimated power spectrum of one of the simulated noise maps. Green dots show the the spectrum with the noise power spectrum removed. The black and red solid lines are the input and reconstructed power spectra, respectively. The theoretical power spectrum can be reconstructed up to the point where the structure of the power spectrum has not been destroyed by the instrumental noise. In our case, having Planck level noise, this goes to up to 2500. Obviously, the TOUSI performs very well in reconstructing the input power spectrum even in the presence of instrumental noise.

4.2. Test on WMAP7 power spectrum

We tested our algorithm on real data. Figure 9 shows the application of our technique to the WMAP7 power spectrum. The method indeed works well up to of  ~800. After this point the instrumental noise becomes so dominant that the features of the spectrum are washed out and cannot be recovered.

5. Sparsity versus averaging

A very common approach to reduce the noise on the power spectrum is the moving average filter, i.e., average values in a given window, 􏽥CA[b]=1b(b+1)ωb=bωb2b+ωb2(+1)􏽢C[],\begin{equation} \widetilde{C}^{A}[b] = \frac{1}{b(b+1)\omega_b} \sum_{\ell=b-{\frac{\omega_b}{2}}}^{b+{\frac{\omega_b}{2}}} \ell (\ell+1) \widehat{C}[\ell], \end{equation}(24)and the window size ωb is increasing with . Here, we used window sizes of  { 1,2,5,10,20,50,100 } , respectively, for ranging from  { 2,11,31,151,421,1201,2501 }  to  { 10,30,150,420,1200,2500,3200 } , which have also been used in the framework of the Planck project in Leach et al. (2008).

Figure 10 shows a reconstruction of the power spectrum for TOUSI versus averaging. From the NMSE curve it is clear that our algorithm is much more efficient. Figure 11 shows the average error the 100 realizations as a function of E[]=1100i=1100C[]􏽥Ci[]2,\begin{equation} E[\ell] = { \frac{1}{100} \sum_{i=1}^{100} \parallel C[\ell] - \widetilde{C}_i[\ell] \parallel_2}, \end{equation}(25)where 􏽥Ci\hbox{$\widetilde{C}_i$} is the estimated power spectrum from the i-th realization. We display the errors for the spectra estimated by the empirical estimator (the realization, black dotted line), the averaging estimator (red dashed line) and TOUSI (solid blue line). The cosmic variance is overplotted as a solid black line. Evidently, the expected error is highly reduced when using the sparsity-based estimator.

thumbnail Fig. 9

TOUSI algorithm applied to the WMAP7 power spectrum. The technique works well up to of  ~800, i.e., before the instrumental noise becomes dominant washes the features of the spectrum out.

thumbnail Fig. 10

Mean denoised spectra with wavelets (blue solid line) and averaging (red dashed line) from 100 realizations. The inner plot shows the normalized error for both dictionaries.

thumbnail Fig. 11

Mean error for the 100 realizations, for the realizations (black dotted line), the averaging denoising (red dashed line) and the sparse wavelet filtering (blue solid line). The inner plot shows a zoom between l = 2000 and l = 3000.

6. Conclusion

Measurements of the CMB anisotropies are powerful cosmological probes. In the currently favored cosmological model, with the nearly Gaussian-distributed curvature perturbations, almost the entire statistical information is contained in the CMB angular power spectrum. We have investigated the sparsity of the CMB power spectrum in two dictionaries, the DCT and WT. In both dictionaries the CMB power spectrum can be recovered with only a few percentages of the coefficients, meaning the spectrum is very sparse. The two dictionaries have different characteristics and can accommodate reconstructing different features of the spectra. The DCT can help recover the global features of the spectrum, while WT helps recover small localized features. The sparsity of the CMB spectrum in these two domains has helped us develop an algorithm, TOUSI, which estimates the true underlying power spectrum from a given realized spectrum. This algorithm uses the sparsity of the CMB power spectrum in both WT and DCT domains and takes the best from both worlds to obtain a highly accurate estimate from a single realization of the CMB power spectrum. This could be a replacement for CAMB if knowing the cosmological parameters is not necessary. The developed IDL code will be released with the next version of ISAP (Interactive Sparse astronomical data Analysis Packages) via the web site http://jstarck.free.fr/isap.html.


1

The dipole anisotropy is dominated by the Earth’s motion in space and it is hence ignored.

3

CAMB solves the Boltzmann equations for a cosmological model set out by the given cosmological parameters.

4

The lp-norm of a vector X, p ≥ 1, is defined as ∥Xp = ( ∑ i|X[i] |p)1/p, with the usual adaptation  ∥ X ∥  = maxiX[i] .

Acknowledgments

The authors thank Marian Douspis, Olivier Doré, and Amir Hajian for useful discussions. This work is supported by the European Research Council grant SparseAstro (ERC-228261).

References

  1. Aghamousa, A., Arjunwadkar, M., & Souradeep, T. 2012, ApJ, 745, 114 [NASA ADS] [CrossRef] [Google Scholar]
  2. Bartlett, M. S., & Kendall, D. G. 1946, J. Roy. Stat. Soc., Ser. B, 8, 128 [Google Scholar]
  3. Bennett, C. L., Hill, R. S., Hinshaw, G., et al. 2003, ApJS, 148, 97 [NASA ADS] [CrossRef] [Google Scholar]
  4. Donoho, D. 1993, in Proc. Symp. Appl. Math., ed. A. M. Society, 47, 173 [Google Scholar]
  5. Faÿ, G., Guilloux, F., Betoule, M., et al. 2008, Phys. Rev. D, 78, 083013 [NASA ADS] [CrossRef] [Google Scholar]
  6. Hivon, E., Górski, K. M., Netterfield, C. B., et al. 2002, ApJ, 567, 2 [NASA ADS] [CrossRef] [Google Scholar]
  7. Komm, R. W., Gu, Y., Hill, F., Stark, P. B., & Fodor, I. K. 1999, ApJ, 519, 407 [NASA ADS] [CrossRef] [Google Scholar]
  8. Larson, D., Dunkley, J., Hinshaw, G., et al. 2011, ApJS, 192, 16 [NASA ADS] [CrossRef] [Google Scholar]
  9. Leach, S. M., Cardoso, J.-F., Baccigalupi, C., et al. 2008, A&A, 491, 597 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  10. Lewis, A., & Bridle, S. 2002, PRD, 66, 103511 [Google Scholar]
  11. Lewis, A., Challinor, A., & Lasenby, A. 2000, ApJ, 538, 473 [NASA ADS] [CrossRef] [Google Scholar]
  12. Moulin, P. 1994, IEEE Transactions on Signal Processing, 42, 3126 [NASA ADS] [CrossRef] [Google Scholar]
  13. Mukherjee, P., & Wang, Y. 2003, ApJ, 599, 1 [NASA ADS] [CrossRef] [Google Scholar]
  14. Mukherjee, P., & Wang, Y. 2004, ApJ, 613, 51 [NASA ADS] [CrossRef] [Google Scholar]
  15. Oh, S. P., Spergel, D. N., & Hinshaw, G. 1999, ApJ, 510, 551 [NASA ADS] [CrossRef] [Google Scholar]
  16. Readhead, A. C. S., Mason, B. S., Contaldi, C. R., et al. 2004, ApJ, 609, 498 [NASA ADS] [CrossRef] [Google Scholar]
  17. Reichardt, C. L., Ade, P. A. R., Bock, J. J., et al. 2009, ApJ, 694, 1200 [NASA ADS] [CrossRef] [Google Scholar]
  18. Schmitt, J., Starck, J. L., Casandjian, J. M., Fadili, J., & Grenier, I. 2010, A&A, 517, A26 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  19. Starck, J.-L., & Murtagh, F. 1998, PASP, 110, 193 [NASA ADS] [CrossRef] [Google Scholar]
  20. Starck, J.-L., Fadili, J. M., Digel, S., Zhang, B., & Chiang, J. 2009, A&A, 504, 641 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  21. Starck, J.-L., Murtagh, F., & Fadili, M. 2010, Sparse Image and Signal Processing (Cambridge University Press) [Google Scholar]

All Figures

thumbnail Fig. 1

Theoretical CMB power spectrum along with the reconstructed power spectra, using the DCT and WT dictionaries. The panels show the reconstructions for different fractions of the coefficients. The inner plots show the differences between the actual and the reconstructed power spectra. Both dictionaries suffer from boundary effects, but this is more severe for DCT because the corresponding atoms are not compactly supported. The power spectrum that is decomposed onto the two dictionaries is in the form ( + 1)C[] /2π.

In the text
thumbnail Fig. 2

Non-linear approximation (NLA) error curves for the two dictionaries. Below 1% the DCT curve is dropping faster, which means it is performing better. However, past  ~ 2% the DCT curve flattens off, while WT decreases to  ~ 0 very quickly.

In the text
thumbnail Fig. 3

A simulated CMB power spectrum along with the reconstructed spectra, using the DCT and WT dictionaries. The black solid line is the true underlying power spectrum from which the simulations were made. The blue and red dots show the simulated and the reconstructed power spectra respectively. With only 1% of the coefficients, DCT can recover the input power spectrum (i.e. the black solid line) very well, recovering the peaks and troughs accurately. Unlike DCT, WT seems to have difficulties in recovering the peaks and troughs. The inner plots shows the NMSE curves.

In the text
thumbnail Fig. 4

Average of the DCT-reconstructed power spectra when different fractions of the most significant coefficients used in the reconstruction. The inner plot shows the NMSE curve for this average. The minimum of the curve is at less than 10% of the coefficients, meaning that the true CMB spectrum can be recovered with less than 10% of the coefficients while ensuring a good bias-variance trade-off.

In the text
thumbnail Fig. 5

Same as Fig. 4, but for the WT domain. It seems that WT cannot perform well for simulated power spectra, compared to the DCT domain; the minimum of the NMSE curve is at more than 35% – while the curve is quite flat after  ~ 20%.

In the text
thumbnail Fig. 6

Theoretical CMB power spectrum (black line), the empirical power spectrum of one realization (blue dots), and the avegared of estimated power spectra (red line) using the TOUSI algorithm. The inner plots show a zoomed-in version.

In the text
thumbnail Fig. 7

Cosmological parameters estimated from the true CMB power spectrum (black line) and the mean of the reconstructed power spectra (red line). The dashed line is the true input parameters, i.e., those used to calculate the theoretical power spectrum using CAMB.

In the text
thumbnail Fig. 8

Power spectrum estimate in the presence of instrumental noise. The blue dots show the empirical power spectrum of one realization with instrumental noise. Yellow dots show the estimated power spectrum of one of the simulated noise maps. Green dots show the the spectrum with the noise power spectrum removed. The black and red solid lines are the input and reconstructed power spectra, respectively. The inner plots show a zoomed-in version.

In the text
thumbnail Fig. 9

TOUSI algorithm applied to the WMAP7 power spectrum. The technique works well up to of  ~800, i.e., before the instrumental noise becomes dominant washes the features of the spectrum out.

In the text
thumbnail Fig. 10

Mean denoised spectra with wavelets (blue solid line) and averaging (red dashed line) from 100 realizations. The inner plot shows the normalized error for both dictionaries.

In the text
thumbnail Fig. 11

Mean error for the 100 realizations, for the realizations (black dotted line), the averaging denoising (red dashed line) and the sparse wavelet filtering (blue solid line). The inner plot shows a zoom between l = 2000 and l = 3000.

In the text

Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.

Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.

Initial download of the metrics may take a while.