Sparse pointsource removal for fullsky CMB experiments: application to WMAP 9year data
Laboratoire AIM, UMR CEACNRSParis 7, Irfu, SAp/SEDI, Service d’Astrophysique, CEA Saclay, 91191 GifSurYvette Cedex, France
email: florent.sureau@cea.fr
Received: 18 September 2013
Accepted: 28 March 2014
Missions such as WMAP or Planck measure fullsky fluctuations of the cosmic microwave background and foregrounds, among which bright compact source emissions cover a significant fraction of the sky. To accurately estimate the diffuse components, the pointsource emissions need to be separated from the data, which requires a dedicated processing. We propose a new technique to estimate the flux of the brightest point sources using a morphological separation approach: point sources with known support and shape are separated from diffuse emissions that are assumed to be sparse in the spherical harmonic domain. This approach is compared on both WMAP simulations and data with the standard local χ^{2} minimization, modelling the background as a loworder polynomial. The proposed approach generally leads to 1) lower biases in flux recovery; 2) an improved root meansquare error of up to 35%; and 3) more robustness to background fluctuations at the scale of the source. The WMAP 9year pointsourcesubtracted maps are available online.
Key words: cosmic background radiation / methods: data analysis / methods: statistical
© ESO, 2014
1. Introduction
Missions such as Planck and WMAP provide highresolution fullsky maps of microwave emissions. The primary goal of these missions is to measure the fluctuations in the cosmic microwave background (CMB), the relic radiation of the big bang, and hence provide key information about the birth and evolution of our Universe. These missions also provide fullsky information on the Galactic and extragalactic emissions in a frequency range that has not been probed before. One of the main scientific challenges in the CMB data analysis consists of accurately separating the CMB, the various Galactic emissions (synchrotron, freefree, dust and Galactic compactsource emissions, to name a few) as well as the contribution from extragalactic sources. In particular, bright compactsource emissions cover a significant fraction of the sky (Planck Collaboration VII 2011; Wright et al. 2009), even at high Galactic latitudes, where the CMB is less affected by other diffuse foregrounds. These emissions therefore need to be carefully dealt with for CMB or Galactic data analysis.
From a sourceseparation perspective, these emissions are difficult to model since the compact sources display both spectral and temporal variability. As reported in Wright et al. (2009), at least onethird of the extragalactic WMAP sources display temporal variability with high confidence, with more than a 2:1 range in fluxes. On the other hand, extrapolating the fluxes from catalogues obtained at lower frequency than that probed by WMAP or Planck is erroneous for radio sources (Planck Collaboration XIV 2011): flatspectrum emissions from the core of extragalactic sources are dominating for Planck or WMAP channels, whereas steepspectrum lobe emissions dominate at lower frequencies. Finally, their high spectral variability makes it very difficult to estimate their flux with generic sourceseparation techniques that implicitly rely on the factorization of spatial and spectral information (Tegmark & de OliveiraCosta 1998; Vielva et al. 2003; LópezCaniego et al. 2006). This means that each CMB data set needs to be processed independently to accurately estimate the compactsource fluxes in this data set.
The most distinctive information for separating compact and diffuse emissions is based on morphology, in particular for point sources where the shape of the emission is the point spread function (PSF) of the instrument at the wavelength considered. Consequently, specific pipelines were developed based on this information in the CMB data analysis; the compact sources are first detected and fluxestimated and then are either masked or their contribution subtracted before component separation is performed (Planck Collaboration XII 2014).
Numerous approaches based on the morphology of the sources have been proposed to address the initial problem of source detection in the CMB data: morphological imageprocessing such as Sextractor (Bertin & Arnouts 1996), matched filters (Tegmark & de OliveiraCosta 1998; Barreiro et al. 2003; LópezCaniego et al. 2006), waveletbased techniques (Cayón et al. 2000; Vielva et al. 2001; MartínezGonzález et al. 2003; Barreiro et al. 2003; LópezCaniego et al. 2006), and Bayesian detection (Hobson & McLachlan 2003; Savage & Oliver 2007; Carvalho et al. 2009; Guglielmetti et al. 2009; Argüeso et al. 2011). Multichannel techniques, which take into account either an estimate of the power spectrum of diffuse emissions (Herranz & Sanz 2008) or the alleged spectral signatures for the point sources (Lanz et al. 2011; Vio et al. 2013) have also been devised. Other multichannel techniques detect the point sources by cancelling the CMB contribution in the data by a weighted sum of the channels (Wright et al. 2009; Ramos et al. 2011; Scodeller et al. 2012). We refer to Paykari & Starck (2012) for a brief review of the techniques. Some compactsource catalogues have been published for the WMAP (for instance, Wright et al. 2009; Chen & Wright 2009; Ramos et al. 2011; Scodeller et al. 2012; Bennett et al. 2013) or Planck (Planck Collaboration VII 2011; Planck Collaboration XXVIII 2014) missions.
When bright sources are detected, their contribution can be removed from the data. Several strategies have been proposed.

Masking: this is a standard approach where a pointsource mask isapplied to the data map before it is analysed. This solution is veryefficient for CMB power spectrum estimation, and the price to payis only a small loss of sensitivity at low multipoles (Noltaet al. 2009; Larsonet al. 2011). However, for CMBnonGaussianity analysis the mask may create spurious featuresin the nonGaussianity estimators. In addition, maskingcomplicates the Galactic diffuse emission analysis

Masking and inpainting: to avoid artefacts caused by the mask on large scales, the masked data can be interpolated using inpainting techniques. Several methods have been proposed, such as diffuse inpainting (Zacchei et al. 2011), CMBconstrained realization inpainting (Bucher & Louis 2012; Kim et al. 2012), or sparse inpainting (Abrial et al. 2007, 2008; Starck et al. 2013b). It was shown that sparse inpainting, based on the sparsity of the CMB in spherical harmonics, does not significantly impact nonGaussianity measures such as the skewness and kurtosis of the CMB data (Abrial et al. 2007) or the integrated SachsWolfe signal (Dupé et al. 2011) and the weaklensing signal (Perotto et al. 2010). These inpainting approaches are, however, expected to be inaccurate in estimating the CMB in the masked regions, which would also be detrimental to Galactic studies.

Fitting: a more progressive approach is to first model the background due to diffuse emissions, then model the shape of the compact sources by modelling the instrument PSF, and finally estimate the flux by aperture photometry or PSF fitting. This is typically performed when the flux and the position are not jointly estimated in the detection algorithm, as performed in the Bayesian detection approaches (where the background can be modelled through a covariance matrix or described by a loworder polynomial). This method was adopted by the Planck and WMAP teams to estimate the pointsource flux (Planck Collaboration VII 2011; Planck Collaboration XXVIII 2014; Wright et al. 2009; Bennett et al. 2013), where the background was modelled as a loworder polynomial (baseline or linear background). The few background parameters and the pointsource flux were then estimated locally by χ^{2} minimization.
Scodeller & Hansen (2012) showed that removing the pointsource fluxes instead of masking them provides similar results on the final CMB power spectrum estimation with no biases on higher order statistics. A cleaned CMB map is much easier to analyse than a masked CMB map, since there is no need to handle the pointsource mask anymore. Another advantage is that it provides a CMB estimate at the pointsource positions in the mask (or a pointsourcefree channel estimate if applied on a channel instead of on the reconstructed CMB map), useful in particular for Galactic studies, while inpainting techniques only try to fill the gaps without destroying the statistical properties of the map.
In this paper, we propose a new approach for pointsource removal that is based on a morphological sourceseparation method, assuming the sources have already been detected. We use a more flexible (and potentially more complex) model for the background to capture its fluctuations more accurately at the scale of the sources. In this approach we assume that the diffuse emissions are sparse in the spherical harmonic domain, while the point sources are sparse in the direct domain, and resolved compact sources are sparse in the wavelet domain. In Sect. 2, we describe this method along with the proposed algorithm, which is adapted from recent convexoptimization techniques that solve the corresponding inverse problem. Results on compactsource removal in fullsky WMAP realistic simulations are presented in Sect. 3, where our approach is compared with the standard fluxfitting or loworder polynomial backgroundfitting as performed by WMAP and Planck consortia (Planck Collaboration VII 2011; Planck Collaboration XXVIII 2014; Wright et al. 2009; Bennett et al. 2013). Sparse pointsource removal is then applied on WMAP data in Sect. 4, code information is given in Sect. 5, and conclusions and perspectives are drawn in Sect. 6.
2. Sparse pointsource removal
2.1. Modelling of the sky
We modelled microwavechannel data over the full sky y ∈ R^{Ne} (i.e. the channel map is composed of N_{e} realvalued pixels) composed of three components { x_{1},x_{2},x_{3} } ∈ R^{Ne}: the point sources, the extended compact sources, and the diffuse emissions. In this decomposition, the diffuse background is composed of the CMB itself and the synchrotron, freefree, dust diffuse emissions as well as all the sources with flux below the detection cut. The forward model can be cast as (1)where n ∈ R^{Ne} is an additional noise, assumed to be a realvalued Gaussian random field (but not necessarily stationary). In the associated inverse problem, we need to estimate three unknowns from one equation, which is not possible without additional constraints/information. We now discuss these assumptions and constraints on each of the components.

Pointsource catalogues are available for WMAP and Planckdata sets. Using these catalogues, x_{1} can be modelled as a set of Dirac functions { δ_{p} } _{p} convolved by the instrumental beam b: where N_{p} is the number of point sources that need to be removed and f_{p} is the flux of the pth point source. The source fluxes are expected and enforced to be positive.

In case of extended compact sources, the morphological information is weaker. Several extended sources were detected and flagged in the Planck data set, mostly located close to the Galactic centre, while in the WMAP data such sources were not considered (detection was performed outside a Galactic mask). In this work, we assumed that these emissions can be modelled through a sparse decomposition in the undecimated isotropic spherical wavelet dictionary (Starck et al. 2006): , where N_{s} is the number of wavelet scales, θ_{jk} corresponds to a wavelet atom at scale j and position k, and we have overall N_{w} = N_{e}N_{s} wavelet coefficients at all scales. Assuming sparsity of this Galactic component, only a few wavelet coefficients w_{jk} at selected locations k close to the Galactic centre are assumed significantly away from zero.

The background emission x_{3} is much more complex to model accurately. It is usually represented as a local loworder polynomial background (either baseline, or first order) because of its local smoothness, and in this work it is characterized by its sparsity in the spherical harmonic domain. This assumption of sparsity in CMB applications is motivated in Starck et al. (2013a). x_{3} is also assumed bandlimited because of the beam effect.
These components can only be separated if the components are sparse enough in mutually incoherent dictionaries, such as a dirac basis and a spherical harmonic basis (Donoho & Huo 2001; Donoho & Elad 2003). Said differently, compactsource emissions are expected to decrease the sparsity level of the data measured on spherical harmonics, which will be the key ingredient driving the separation process in our approach.
2.2. Associated inverse problem
Using all these hypotheses and constraints, we consider the following inverse problem: (2)where

f ∈ R^{Np} is a vector containing the flux of the N_{p} point sources, is the positive orthant of R^{Np} to enforce the positivity constraint, and B ∈ R^{Ne × Np} is a matrix operating on fluxes and implementing the local projection of the beam at the location of the point sources;

is a vector containing the N_{w} wavelet coefficients at all scales, W represents the matrix implementing the spherical wavelet transform (containing each wavelet atom as a column), M corresponds to a binary mask equal to one where compact extended sources are expected. We promote sparsity of x_{2} by making use of the ℓ_{1} norm of w as a penalization term;

a ∈ C^{Nl} contains the N_{l} complex spherical harmonic coefficients of x_{3} and is the space of bandlimited signals on the sphere (up to a chosen multipole ℓ_{max} depending on the resolution of the channel). S ∈ C^{Ne × Nl} is the matrix describing the orthogonal spherical harmonic transform: x_{3} = Sa. Sparsity of x_{3} is enforced through the use of a ℓ_{1} norm penalization term on a, which is computed as the sum of the modulus of the complex multipole as in the sparse inpainting described in Starck et al. (2013a)

Σ is the (nonstationary) noise covariance matrix and denotes the square of the ℓ_{2} norm weighted by Σ^{1}.
The reconstructed pointsourcefree map can be obtained by . The problem described in Eq. (2) is a convex problem and is related to a constrained morphological component analysis (Starck et al. 2004, 2005) or a basis pursuit denoising problem (Chen & Donoho 1998) with a deconvolution step. The principal advantage of this constrained formulation lies in having only a few hyperparameters to set, which are easily interpretable: the expected noise level ϵ; the balance between ℓ_{1} norm in the wavelet and spherical harmonic domains. More explicitely, the higher the ratio (γ/β), the sparser the solution for x_{3} is in the spherical harmonic domain, and conversely, the lower the ratio, the sparser the estimate of x_{2} in the wavelet domain. Setting them equal would lead to penalize both nonsparse solutions for x_{2} in the wavelet domain and x_{3} in the spherical harmonic domain, as performed in morphological component analysis (Starck et al. 2004, 2005). Note that once the ratio is set, changing their modulus does not change the inverse problem.
The main difficulty here lies in controlling the interplay between sparsity and data fidelity constraints: how can we efficiently estimate a sparse solution (more precisely with minimal ℓ_{1} norm), knowing that many combinations of x_{1}, x_{2} and x_{3} can satisfy the data fidelity constraint?
In the next section, we propose an algorithm to solve this problem.
2.3. Proposed algorithm
Inverse problems as described by Eq. (2) can be solved using primaldual approaches (e.g. Chambolle & Pock 2011; Briceño Arias & Combettes 2011; Becker et al. 2011; Combettes & Pesquet 2012). The proposed algorithm was derived from Chambolle & Pock (2011), it requires only one application of the costly spherical harmonic and wavelet transforms and one application of their adjoint per iteration and does not require subiterations. The algorithm is set as follows:
morphological component analysis with a primaldual approach (SPSR)

1
Choose (a^{0},f^{0},w^{0},t^{0}) ∈ C^{Nl} × R^{Np} × R^{Nw} × R^{Ne} and , , . Choose also the hyperparameters γ,β and τ,σ s.t. (assuming normalized   S   _{2} =   MW   _{2} =   B   _{2} = 1).
 2
where is the projection onto the positive orthant (i.e. sets to 0 negative fluxes) and is the projection onto the set of considered bandlimited signals (i.e. sets to 0 all multipoles greater than the chosen ℓ_{max}); is the standard softthresholding operator applied componentwise: (4)where  x_{i}  is the complex modulus of x_{i} for complex vectors. From Chambolle & Pock (2011), the sequence ((f^{n},a^{n},w^{n}),t^{n}) converges to a saddle point of the primaldual problem with a restricted dual gap decreasing as (firstorder method). We initialised the algorithm with null images.
The sparsesourceremoval (SPSR) algorithm has four parameters to set: β, γ, τ, and σ. The last two are the primal and dual steps, respectively, and drive the convergence speed of the algorithm. The ratio of the first two parameters is the hyperparameter that controls the balance in the sparsity of the decomposition of the extended compact sources and the diffuse background as discussed in the previous section. The choice of their modulus also affects the convergence speed of the algorithm: a high modulus leads to a slowly built sparse approximation, or said differently, the algorithm would provide very sparse solutions at the beginning of the algorithm that are far from satisfying the data fidelity constraint, however; a low modulus, in contrast, leads to building approximations that satisfy the data fidelity constraint early in the iterations, but are not as sparse. In practice, several values need to be experimentally tested to obtain an algorithm with reasonable convergence speed and to derive the required number of iterations.
To assess the relative performance of the proposed approach, flux estimates obtained with SPSR were compared with the loworder polynomial fitting approach (Bennett et al. 2013; Planck Collaboration VII 2011; Planck Collaboration XXVIII 2014). Fluxes were estimated in a 3.5σ region as previously recommended in Wright et al. (2009) using a LevenbergMarquardt algorithm with local χ^{2} minimization (using the C++ library ALGLIB (www.alglib.net), Sergey Bochkanov and Vladimir Bystritsky). The local background was either modelled with a baseline or a firstorder polynomial (FITC and FITL in the following) and a positivity constraint was applied on the fluxes. Clustered point sources with overlapping fitting regions were jointly fitted for. However, in practice, such situations were scarce in the WMAP simulations reported.
3. Results on synthetic WMAP simulations
3.1. Planck Sky Model simulations at WMAP frequencies
The Planck Sky Model (PSM) software (Delabrouille et al. 2013) was employed to simulate WMAPlike data. Each one of the five WMAP frequency channels included the following:

A diffuse component, comprised of a Gaussian CMB, generatedfrom a sixparameter ΛCDM model (with default values from WMAP 7year data combined with BAO and the Hubble constant measurements, and with added C_{ℓ} lensing) and synchrotron, freefree, thermaldust and spinningdust emissions, simulated with default PSM parameters.

A compact component, constituted by radio, infrared, and strong ultracompact HII region emissions, as well as a farinfrared background (model jgn2005 in the PSM). More specifically, radio sources are essentially derived from observations conducted between 0.85 GHz and 4.85 GHz. Each one of these ~2 million simulated sources possessed its own spectrum, composed of power laws with spectral indices varying across predefined bands of electromagnetic frequencies. As reported in Delabrouille et al. (2013), this catalogue is expected to faithfully reproduce the clustering properties of the radio sources and the observed/modelled source number counts at 5 and 20 GHz. In the PSM, the infrared sources were derived from the IRAS pointsource catalogue and faintsource catalogue, with assumed modified blackbody emissions to extrapolate the flux at lower frequencies. Simulated sources were also added near the Galactic plane and in IRAS gaps to obtain the same mean surface density as a function of flux (down to 80 mJy) as in the regions well covered by IRAS. Emissions from ultracompact HII regions were also derived from IRAS, with flux extrapolated according to a modified blackbody fit with added lowfrequency flux with a freefree spectral index when a radiocounterpart is found. Note that compact emissions from the Galaxy were also simulated as part of diffuse emissions. Finally, the simulated farinfrared background is composed of realistic distribution of clustered point sources with assumed fluxdependent spectral indices and reproduced Planck observations reasonably well.

Noise modelled as a nonstationary Gaussian random field, with variance in each pixel derived from the WMAP 9year hit maps; the channel bandpasses were modelled as diracs located at the centre frequency, and the beams were assumed Gaussian with a fullwidth at half maximum given by the beam size provided by the WMAP consortium.
Overall, the PSM simulations reflect the complexity in estimating the strong pointsource fluxes (which is essential to test the robustness of the proposed approach), with complex diffuse background, unresolved sources with clustering properties below the detection limit, strong compact emissions in the Galactic plane, and nonstationary noise. A more comprehensive description of the PSM is provided in Delabrouille et al. (2013). A patch extracted from these WMAP 9year simulated data is represented in Fig. 1 at each of the five WMAP frequency channels to illustrate the complexity of the source separation, because the resolution, noise, and background change in each of the channels. At lowfrequency channels, the background fluctuates at the scale of the point sources, and noise contribution to the flux estimates are lower than in the highfrequency channels, where noise becomes important at the scale of the source, as reported in Wright et al. (2009).
Fig. 1 Patch of the simulated sky centred on two detected point sources at the WMAP wavelengths. 

Open with DEXTER 
Fig. 2 Evolution with the number of iterations of the normalized ℓ_{2} norm of the residuals (left) and normalized ℓ_{1} norm of the diffuse and compact component for channel Ka and for various values for the modulus of β = γ. Reaching the upper limit of the datafidelity constraint leads to a normalized residual ℓ_{2} norm of 1. The ℓ_{1} norm was normalized to the ℓ_{1} norm of the input data in direct space (sum of the absolute values of the pixels in the frequency map). M corresponds to the highest modulus of the spherical harmonic coefficients in the input image. 

Open with DEXTER 
3.2. Flux recovery in noiselimited catalogue
3.2.1. Catalogue generation
First, a catalogue of point sources was independently generated from the simulations for each WMAP channel by retaining the sources with flux above 5σ of the estimated radiometer noise (Wright et al. 2009). The number of sources retained for each channel and the flux cut are displayed in Table 1. This scenario reflects an optimistic situation where only noise would be a limiting factor for source detection (and not the background). Consequently, it allows us to investigate how well point sources can be estimated even when their flux is similar to the background levels. Note that the cut in flux increases with frequency (because of the increasing level of the noise) and our sources are mainly radio sources (i.e. with flux decreasing with increasing frequency). Consequently, fewer sources are detected at higher frequency channels.
Characteristics of the pointsource catalogue generated from simulations.
Fig. 3 Flux recovery with the proposed approach as a function of the number of iterations. After 13 350 (1350) iterations, the flux estimates have converged for channel K (W). Note that the fluxes are displayed in logarithmic scales to visualize the convergence speed for both low and highflux sources. 

Open with DEXTER 
Statistics on the recovered flux for each channel for the three approaches and for various flux bands (in mJy).
3.2.2. SPSR working conditions
The working conditions of SPSR were chosen as follows: the noise hyperparameter ϵ corresponds in the following to the 95th percentile of a χ^{2}(N_{e}) distribution according to the whitened noise statistics. For all channels the hyperparameter ratio was set to β/γ = 1 to favour both sparsity of x_{2} in the wavelet domain and x_{3} in the spherical harmonic domain, as discussed before.
A range of values for the modulus of β = γ was first tested and turned out to be crucial for the convergence speed of the algorithm. This is illustrated in Fig. 2, where SPSR was run with a high number of iterations for channel Ka. Setting the modulus too high (or too low) leads to many iterations necessary to fulfill the datafidelity constraint (or minimizing the ℓ_{1} penalty for components that satisfy the datafidelity term). For instance, for values of β = 0.001M (or β = 0.005M) the datafidelity constraint is satisfied in the first one thousand iterations, but the sparsitypenalty term decreases very slowly and has not converged after 50 000 iterations. Conversely, β = 0.1M (or β = 0.05M) leads to very sparse approximations that do not satisfy the datafidelity constraint even after 50 000 iterations. We chose the value β = 0.01M for all channels, which led to the best convergence properties among the tested values.
Fig. 4 Statistics on error flux using the various approaches for the five channels for five different flux bands (flux <500 mJy, 500 mJy ≤ flux <1 Jy, 1 mJy ≤ flux <2 Jy, 2 Jy ≤ flux <5 Jy, flux ≥5 Jy). Quartiles and extreme values are plotted. 

Open with DEXTER 
Fig. 5 Pointsource residuals with noise for the various approaches in a projected region of channel Q, illustrating the lower bias obtained for this channel with the proposed approach. These maps were obtained by subtracting the diffuse component from the pointsourcesubtracted map. In the first two maps, three negative regions can be visually detected because of the overfitting of the pointsource flux in the FITC and FITL approaches, which is different from the SPSR residual map. This does not display biases that are significantly higher than the noise level. 

Open with DEXTER 
Finally, the number of iterations needs to be set. In addition to the modulus of β = γ, the convergence speed of the algorithm depends on the conditioning of the beam matrix B. The number of iterations therefore varies from channel to channel, requiring more iterations for lower resolution. The flux estimates at several stages of the proposed algorithm are illustrated in Fig. 3 for channels K and W, which correspond to the lowest and highest resolution and to the highest and lowest noise, respectively. In practice, 13 350 iterations were chosen for channel K and 9750 iterations for all other channels as a compromise between processing time and level of convergence. We also ensured that the statistics measured were marginally changing with respect to the relative difference between methods when reaching this number of iterations. Generally, the large number of iterations in both cases is caused by the absence of regularity in the inverse problem of Eq. (2), which leads to firstorder algorithms such as SPSR. To give a pratical example of the computational cost of SPSR, 9750 iterations for the channel Ka represent ~80 h of processing in a cluster with 20 Intel Xeon 2 GHz processors, essentially spent in the spherical harmonic and wavelet transforms.
Fig. 6 Projected regions with large discrepancy between the proposed approach and the local loworder polynomial background modelling. Images correspond to the simulated data, the diffuse component with noise, and the pointsourcesubtracted data with the three approaches. SPSR in this case allows a better subtraction of the pointsource contribution because the local variations in the background are not captured by the other two approaches. 

Open with DEXTER 
3.2.3. Flux recovery and pointsource subtraction
We compared pointsource flux recovery in different flux bands for the FITC, FITL, and SPSR approaches in the simulated channels. The bias, rootmean square errors (RMSE), and mean absolute deviation (MAD) (less sensitive to outliers) are displayed in Table 2. For all methods, the RMSE or MAD varies from channels K to W as a consequence of increasing noise contribution and decreasing background fluctuations at the scale of the point sources, as previously observed in Wright et al. (2009). When the noise contribution to flux uncertainty is predominant (in channel W), FITC and FITL have generally lower biases, RMSE, and MAD than SPSR. Note, however, that the improvement is very small (at most 6% difference in RMSE, bottom row of Table 2). Therefore, it seems that in the noisedominating case, fitting a few local parameters for the diffuse background, as in FITC and FITL, gives slightly more robust flux estimates than SPSR, where many more global parameters are estimated. The situation is the opposite when background fluctuations are the foremost contribution as in the first four channels: SPSR now leads to much lower RMSE and MAD than FITC or FITL, as well as lower biases (for the first three channels). In particular, point sources with flux lower than 1 Jy are better recovered by about 12%, 26%, 32% and 8% in terms of RMSE for channels K to V, with much lower biases for the first three channels (more than 70 mJy better for channel K, for instance).
Statistics on the distribution of the error in the different flux bands are displayed in Fig. 4. In these boxplots both quartiles (three horizontal bars making the “box”) and extreme values (upper and lower horizontal bars) are represented for the various approaches and the different channels. First, the same conclusions can be drawn as in Table 1 by focusing on the interquartile range and median for the different channels: SPSR outperforms FITC and FITL, except for channel W, where results are only slightly degraded. In particular, Fig. 5 illustrates that the bias in the lowest frequency channels is higher than the noise level in the residual maps for FITC and FITL, but not for SPSR.
These boxplots also illustrate that the proposed approach leads to lower extreme errors, while FITL or FITC can both fail with large flux errors. An example of such an incorrect fluxfitting is illustrated in Fig. 6 for channel K, where FITC and FITL lead to overestimated flux in a complex background region that cannot be accurately modelled with loworder polynomials. In contrast, SPSR successfully estimates the background and therefore gives a better flux estimate.
3.3. Flux recovery with internal multichannel catalogue derived from simulations
The noisefree catalogue is useful to assess whether the more flexible background model in SPSR indeed better captures the background fluctuation than loworder local polynomials, in particular, to recover lowflux sources. However, in practice detection is performed internally for each channel, and a catalogue is built by combining these multichannel detections. The fluxrecovery problem is slightly different in this case, since the catalogue is first subject to Eddington bias (excess detection in positive fluctuations of the background, which translates into higher biases in estimating the flux and impacts lowflux sources), and then the task is also to recover as accurately as possible the spectra of sources that can be lower than the detection level in some of the channels (e.g. in higher frequencies for radio sources).
3.3.1. Catalogue generation
As was previously done for the WMAP official pointsource catalogue, we therefore constructed a catalogue by first applying a matched filter to each channel, taking into account secondorder statistics of CMB and noise in spherical harmonic space (Bennett et al. 2013; Tegmark & de OliveiraCosta 1998). Sources were considered detected if the value in the filtered map was higher than 5σ of the filtered background (computed locally). This resulted in 689 (442, 415, 248, and 172) sources for channel K (Ka, Q, V, and W). Compared with the results obtained in Table 1, a subpopulation of the previous catalogue is essentially considered at this step: sources with sufficiently high local contrast to the diffuse background that they can be internally detected. After merging of the sources, the final simulationderived multichannel catalogue contained 724 sources, and we processed these in the same way as desribed in the previous paragraph. However, after merging, the sources considered for a channel with this approach are no longer a subpopulation of the previous catalogue: sources detected in other bands are considered as well.
Statistics on the recovered flux for each channel using the multichannel internal catalogue for the three approaches and for various flux bands (in mJy).
Fig. 7 Statistics on error flux computed from internal catalogue and using the various approaches for the five channels for five different flux bands (flux <500 mJy, 500 mJy ≤ flux <1 Jy, 1 mJy ≤ flux <2 Jy, 2 Jy ≤ flux <5 Jy, flux ≥ 5 Jy). Quartiles and extreme values are plotted. 

Open with DEXTER 
3.3.2. Flux recovery and pointsource subtraction
The same working conditions were set for SPSR as in the previous simulations. The statistics obtained as in Table 2 are presented in Table 3. The values reported for medium or highflux point sources in the first channels are similar if not identical, because this subpopulation did not change from the previous catalogue. However, significant discrepancies from Table 2 can be noted. First, the biases have increased for all methods for lowflux sources, an indication for Eddington bias. Note, however, that the bias has increased less for SPSR than for FITC or FITL in absolute values, leading to even larger differences between the methods (SPSR leads to the lowest biases for all channels, with a bias now up to more than 100 mJy lower for sources lower than 1 Jy in channel K). This is also true for the RMSE of these lowflux sources, which essentially did not change for SPSR and was slightly degraded for FITC or FITL. Note that most of the lowflux sources in the catalogue are not detected in the W band, which might explain why SPSR gave better RMSE for mediumflux sources than FITC or FITL, while this was not the case in the noiselimited catalogue. The proposed approach in this multichannel scenario therefore leads to the lowest errors, even for flux estimated in channel W.
Fig. 8 Projected region in channel V of the WMAP 9year data showing the largest discrepancy between SPSR and FITC or FITL estimated fluxes for two fitted sources. Images correspond to the WMAP data, the pointsourcesubtracted data with FITC, FITL, and SPSR approaches, and pointsourceestimated data using SPSR. The differences here between the approaches are a fraction of noise and the diffuse component levels. 

Open with DEXTER 
The distribution of errors is reported in Fig. 7. SPSR seems as robust to large flux errors in the internal catalogue as in the noiselimited case with respect to FITC or FITL. Note the distribution of errors for channel W for lowflux sources (lower than 500 mJy), which were indeed not detected in this channel (compare with Fig. 4). Large flux errors are observed for all methods, but are less pronounced for SPSR than for FITC or FITL.
As a summary for the internal catalogue, the most striking difference compared with the noiselimited catalogue occurs in lowflux sources that are now subject to Eddington bias, whereas for highflux the results are essentially similar. Better relative performance in this situation in terms of bias and RMSE or MAD may be attributed to two different phenomena: the proposed approach would be less affected by Eddington bias and/or lead to better estimates for sources not detected in the band.
4. WMAP 9year processing
The proposed approach was then applied to subtract pointsource emission from WMAP 9year data. The maps were processed as follows: the differential assemblies data deconvolved with the asymmetrical part of the beam were averaged to obtain 9year frequency band data (Bennett et al. 2013). The beam considered for each channel was obtained as the mean over all axisymetric beams provided for the differential assemblies at that frequency. A catalogue was then built by merging the two catalogues provided by the WMAP collaboration, resulting in 628 considered point sources.
The same processing (with the same working conditions) as for the simulations was then performed for both SPSR and the local loworder polynomial minimization. In the absence of ground truth, it is obviously difficult to quantitatively assess the relative performance of the algorithm as in the simulation. Only extreme cases that visually illustrate the performance of the approaches are therefore presented in Figs. 8 and 9.
Figure 8 represents a patch centred on the source with the strongest difference between SPSR and FITL for channel V. Because the difference between the approaches is only a fraction of the background fluctuations at larger scales than the sources, visually no difference can be observed for that channel. However, Fig. 9 illustrates an extreme case for channel K, suggesting some overestimation of the flux in a complex background scenario with FITC and FITL, but not with SPSR, as reported in Fig. 6 in the simulation.
Fig. 9 Projected region in WMAP 9year data showing the large discrepancy in between SPSR and FITC or FITL estimated fluxes for two fitted sources. Images correspond to the WMAP data, the pointsourcesubtracted data with FITC, FITL, and SPSR approaches. The positive ring around two the fitted regions as well as a negative peak inside these regions seem to indicate that FITC or FITL are here overfitting. The same phenomenon is absent for SPSR. 

Open with DEXTER 
5. Reproducible research
List of products made available in this paper in the spirit of reproducible research, available at http://www.cosmostat.org/Products.
In the spirit of participating in reproducible research, we make all codes and resulting products that constitute the main results of this paper public. In Table 4 we list all products that are made freely available as a result of this paper, and which are available at http://www.cosmostat.org/Products.
For the five WMAP 9year channels, SPSR was applied using the following command line in the open source package iSAP software^{1}: >map_psfree =mrs_sparse_pointsource_removal(Map, GalMask, BeamInfo, StdMap, Niter=Niter).
For Ka to W maps, Niter was fixed to 9750, and to 13 350 for the K channel. Galmask is the WMAP pointsource catalogue mask^{2}, StdMap is the map containing the standard deviation of the noise per pixel, and BeamInfo is a structure containing the WMAP 9year beam at each pointsource position. The code WMAP9_DATA_routines.pro can be used to obtain the required maps and beams and to call the mrs_sparse_ pointsource_removal routine to derive the final products, as illustrated in the script wmap9_remove_point_sources.pro.
6. Conclusions
We proposed a new approach for detected pointsource flux estimation and subtraction. Compared with the standard approach, which estimates pointsource flux according to local loworder polynomial models of the background, the proposed technique is based on a global modelling of the background, which is assumed to be sparse in spherical harmonics to better capture its fluctuations. Bright pointsource emissions decrease the sparsity of this background, which is the key phenomenon driving the separation process. An algorithm was adapted from recent convex optimization developments to solve the corresponding inverse problem.
We evaluated the proposed technique as well as techniques used in WMAP and Planck collaboration on realistic simulations of the WMAP microwave sky. In a noiselimited catalogue, except for channel W, where noise leads to faint differences between the estimates and a slightly poorer estimate for the proposed approach for lowflux sources (about 6%), our approach outperforms local polynomial fitting. In the internally derived catalogue, SPSR also consistently leads to

the lowest biases in the first three channels (up to100 mJy lower bias for sources <1 Jy in channel K)

the lowest RMSE in all channels and all flux bands (with values decreased by at least 5%, 25%, 28%,7%, and 3% and up to 35%, 28%, 32%, 9%, and 14% for channels K to W)

more robust pointsource subtraction in complex background, as illustrated in specific examples

consequently better multichannel estimates of the pointsource fluxes, which is useful to estimate more accurate spectra.
This technique was finally applied to the WMAP 9year data deconvolved with the beam asymmetries and the resulting pointsourcesubtracted maps are available online. We focused in this paper on point sources. Even if the proposed model for extended compact sources is rather simple in SPSR, it is sufficient to derive a background accurate enough to improve on pointsource flux recovery compared with FITC or FITL. However, better modelling of the extended compact sources is still required if they need be subtracted from the data. We left this task for future work.
Acknowledgments
This work was funded by the European Research Council grant SparseAstro (ERC228261), and the Swiss National Science Foundation (SNSF). We used the Healpix^{3} software (Górski et al. 2005), the iSAP^{1} software, and WMAP data^{4}.
References
 Abrial, P., Moudden, Y., Starck, J., et al. 2007, J. Fourier Anal. Appl., 13, 729 [CrossRef] [Google Scholar]
 Abrial, P., Moudden, Y., Starck, J., et al. 2008, Stat. Methodol., 5, 289 [NASA ADS] [CrossRef] [Google Scholar]
 Argüeso, F., Salerno, E., Herranz, D., et al. 2011, MNRAS, 414, 410 [NASA ADS] [CrossRef] [Google Scholar]
 Barreiro, R. B., Sanz, J. L., Herranz, D., & MartínezGonzález, E. 2003, MNRAS, 342, 119 [NASA ADS] [CrossRef] [Google Scholar]
 Becker, S. R., Candès, E. J., & Grant, M. C. 2011, Math. Programm. Comput., 3, 165 [CrossRef] [Google Scholar]
 Bennett, C., Larson, D., Weiland, J. L., et al. 2013, ApJS, 208, 20 [NASA ADS] [CrossRef] [Google Scholar]
 Bertin, E., & Arnouts, S. 1996, A&AS, 117, 393 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Briceño Arias, L., & Combettes, P. 2011, SIAM J. Optim., 21, 1230 [CrossRef] [Google Scholar]
 Bucher, M., & Louis, T. 2012, MNRAS, 424, 1694 [NASA ADS] [CrossRef] [Google Scholar]
 Carvalho, P., Rocha, G., & Hobson, M. P. 2009, MNRAS, 393, 681 [NASA ADS] [CrossRef] [Google Scholar]
 Cayón, L., Sanz, J. L., Barreiro, R. B., et al. 2000, MNRAS, 315, 757 [NASA ADS] [CrossRef] [Google Scholar]
 Chambolle, A., & Pock, T. 2011, J. Math. Imaging and Vision, 40, 120 [CrossRef] [MathSciNet] [Google Scholar]
 Chen, S., & Donoho, D. 1998, in IEEE Int. Conf. Acoustics, Speech and Signal Processing, 3, 1865 [Google Scholar]
 Chen, X., & Wright, E. L. 2009, ApJ, 694, 222 [NASA ADS] [CrossRef] [Google Scholar]
 Combettes, P. L., & Pesquet, J.C. 2012, SetValued and Variational Analysis (Springer Verlag), 20, 307 [CrossRef] [Google Scholar]
 Delabrouille, J., Betoule, M., Melin, J.B., et al. 2013, A&A, 553, A96 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Donoho, D., & Elad, M. 2003, Proc. the National Academy of Sciences, 100, 2197 [NASA ADS] [CrossRef] [Google Scholar]
 Donoho, D., & Huo, X. 2001, IEEE Trans. Inf. Theory, 47, 2845 [CrossRef] [Google Scholar]
 Dupé, F.X., Rassat, A., Starck, J.L., & Fadili, M. J. 2011, A&AS, 534, A51 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Górski, K. M., Hivon, E., Banday, A. J., et al. 2005, ApJ, 622, 759 [NASA ADS] [CrossRef] [Google Scholar]
 Guglielmetti, F., Fischer, R., & Dose, V. 2009, MNRAS, 396, 165 [NASA ADS] [CrossRef] [Google Scholar]
 Herranz, D., & Sanz, J. L. 2008, IEEE J. Select. Topics Signal Process., 2, 727 [NASA ADS] [CrossRef] [Google Scholar]
 Hobson, M. P., & McLachlan, C. 2003, MNRAS, 338, 765 [NASA ADS] [CrossRef] [Google Scholar]
 Kim, J., Naselsky, P., & Mandolesi, N. 2012, ApJ, 750, L9 [NASA ADS] [CrossRef] [Google Scholar]
 Lanz, L., Herranz, D., LopezCaniego, M., et al. 2011 [arXiv:1110.6877] [Google Scholar]
 Larson, D., Dunkley, J., Hinshaw, G., et al. 2011, ApJS, 192, 16 [NASA ADS] [CrossRef] [Google Scholar]
 LópezCaniego, M., Herranz, D., GonzálezNuevo, J., et al. 2006, MNRAS, 370, 2047 [NASA ADS] [CrossRef] [Google Scholar]
 MartínezGonzález, E., Diego, J. M., Vielva, P., & Silk, J. 2003, MNRAS, 345, 1101 [NASA ADS] [CrossRef] [Google Scholar]
 Nolta, M. R., Dunkley, J., Hill, R. S., et al. 2009, ApJS, 180, 296 [NASA ADS] [CrossRef] [Google Scholar]
 Paykari, P., & Starck, J.L. S. 2012, Cosmic Microwave Background Data Analysis, eds. M. J. Way, J. D. Scargle, K. M. Ali, & A. N. Srivastava, 55 [Google Scholar]
 Perotto, L., Bobin, J., Plaszczynski, S., Starck, J., & Lavabre, A. 2010, A&A, 519, A4 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Planck Collaboration VII, 2011, A&A, 536, A7 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Planck Collaboration XIV, 2011, A&A, 536, A14 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Planck Collaboration XII, 2014, A&A, in press, DOI: 10.1051/00046361/201321580 [Google Scholar]
 Planck Collaboration XXVIII, 2014, A&A, in press, DOI: 10.1051/00046361/201321524 [Google Scholar]
 Ramos, E. P. R. G., Vio, R., & Andreani, P. 2011, A&A, 528, A75 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Savage, R. S., & Oliver, S. 2007, ApJ, 661, 1339 [NASA ADS] [CrossRef] [Google Scholar]
 Scodeller, S., & Hansen, F. K. 2012, ApJ, 761, 119 [NASA ADS] [CrossRef] [Google Scholar]
 Scodeller, S., Hansen, F. K., & Marinucci, D. 2012, ApJ, 753, 27 [NASA ADS] [CrossRef] [Google Scholar]
 Starck, J.L., Elad, M., & Donoho, D. 2004, Advances in Imaging and Electron Physics (Elsevier), 132, 287 [CrossRef] [Google Scholar]
 Starck, J.L., Elad, M., & Donoho, D. 2005, IEEE Transactions on Image Processing, 14, 1570 [NASA ADS] [CrossRef] [MathSciNet] [PubMed] [Google Scholar]
 Starck, J.L., Moudden, Y., Abrial, P., & Nguyen, M. 2006, A&A, 446, 1191 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Starck, J.L., Donoho, D. L., Fadili, M. J., & Rassat, A. 2013a, A&A, 552, A133 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Starck, J.L., Fadili, M. J., & Rassat, A. 2013b, A&A, 550, A15 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Tegmark, M., & de OliveiraCosta, A. 1998, ApJ, 500, L83 [NASA ADS] [CrossRef] [Google Scholar]
 Vielva, P., MartínezGonzález, E., Cayón, L., et al. 2001, MNRAS, 326, 181 [NASA ADS] [CrossRef] [Google Scholar]
 Vielva, P., MartínezGonzález, E., Gallegos, J. E., Toffolatti, L., & Sanz, J. L. 2003, MNRAS, 344, 89 [NASA ADS] [CrossRef] [Google Scholar]
 Vio, R., Andreani, P., Ramos, E. P. R. G., & da Silva, A. 2013, A&A, 556, A96 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Wright, E. L., Chen, X., Odegard, N., et al. 2009, ApJS, 180, 283 [NASA ADS] [CrossRef] [Google Scholar]
 Zacchei, A., Maino, D., Baccigalupi, C., et al. 2011, A&A, 536, A5 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
All Tables
Statistics on the recovered flux for each channel for the three approaches and for various flux bands (in mJy).
Statistics on the recovered flux for each channel using the multichannel internal catalogue for the three approaches and for various flux bands (in mJy).
List of products made available in this paper in the spirit of reproducible research, available at http://www.cosmostat.org/Products.
All Figures
Fig. 1 Patch of the simulated sky centred on two detected point sources at the WMAP wavelengths. 

Open with DEXTER  
In the text 
Fig. 2 Evolution with the number of iterations of the normalized ℓ_{2} norm of the residuals (left) and normalized ℓ_{1} norm of the diffuse and compact component for channel Ka and for various values for the modulus of β = γ. Reaching the upper limit of the datafidelity constraint leads to a normalized residual ℓ_{2} norm of 1. The ℓ_{1} norm was normalized to the ℓ_{1} norm of the input data in direct space (sum of the absolute values of the pixels in the frequency map). M corresponds to the highest modulus of the spherical harmonic coefficients in the input image. 

Open with DEXTER  
In the text 
Fig. 3 Flux recovery with the proposed approach as a function of the number of iterations. After 13 350 (1350) iterations, the flux estimates have converged for channel K (W). Note that the fluxes are displayed in logarithmic scales to visualize the convergence speed for both low and highflux sources. 

Open with DEXTER  
In the text 
Fig. 4 Statistics on error flux using the various approaches for the five channels for five different flux bands (flux <500 mJy, 500 mJy ≤ flux <1 Jy, 1 mJy ≤ flux <2 Jy, 2 Jy ≤ flux <5 Jy, flux ≥5 Jy). Quartiles and extreme values are plotted. 

Open with DEXTER  
In the text 
Fig. 5 Pointsource residuals with noise for the various approaches in a projected region of channel Q, illustrating the lower bias obtained for this channel with the proposed approach. These maps were obtained by subtracting the diffuse component from the pointsourcesubtracted map. In the first two maps, three negative regions can be visually detected because of the overfitting of the pointsource flux in the FITC and FITL approaches, which is different from the SPSR residual map. This does not display biases that are significantly higher than the noise level. 

Open with DEXTER  
In the text 
Fig. 6 Projected regions with large discrepancy between the proposed approach and the local loworder polynomial background modelling. Images correspond to the simulated data, the diffuse component with noise, and the pointsourcesubtracted data with the three approaches. SPSR in this case allows a better subtraction of the pointsource contribution because the local variations in the background are not captured by the other two approaches. 

Open with DEXTER  
In the text 
Fig. 7 Statistics on error flux computed from internal catalogue and using the various approaches for the five channels for five different flux bands (flux <500 mJy, 500 mJy ≤ flux <1 Jy, 1 mJy ≤ flux <2 Jy, 2 Jy ≤ flux <5 Jy, flux ≥ 5 Jy). Quartiles and extreme values are plotted. 

Open with DEXTER  
In the text 
Fig. 8 Projected region in channel V of the WMAP 9year data showing the largest discrepancy between SPSR and FITC or FITL estimated fluxes for two fitted sources. Images correspond to the WMAP data, the pointsourcesubtracted data with FITC, FITL, and SPSR approaches, and pointsourceestimated data using SPSR. The differences here between the approaches are a fraction of noise and the diffuse component levels. 

Open with DEXTER  
In the text 
Fig. 9 Projected region in WMAP 9year data showing the large discrepancy in between SPSR and FITC or FITL estimated fluxes for two fitted sources. Images correspond to the WMAP data, the pointsourcesubtracted data with FITC, FITL, and SPSR approaches. The positive ring around two the fitted regions as well as a negative peak inside these regions seem to indicate that FITC or FITL are here overfitting. The same phenomenon is absent for SPSR. 

Open with DEXTER  
In the text 