Weak lensing mass reconstruction using sparsity and a Gaussian random field

We introduce a novel approach to reconstruct dark matter mass maps from weak gravitational lensing measurements. The cornerstone of the proposed method lies in a new modelling of the matter density field in the Universe as a mixture of two components: (1) a sparsity-based component that captures the non-Gaussian structure of the field, such as peaks or halos at different spatial scales; and (2) a Gaussian random field, which is known to represent the linear characteristics of the field well. We propose an algorithm called MCAlens that jointly estimates these two components. MCAlens is based on an alternating minimisation incorporating both sparse recovery and a proximal iterative Wiener filtering. Experimental results on simulated data show that the proposed method exhibits improved estimation accuracy compared to state-of-the-art mass-map reconstruction methods.


Introduction
In recent years, interest has increased in exploring the two most dominant components of the universe: dark matter and dark energy. To this end, large-scale imaging and spectroscopic surveys are currently under development, such as the Euclid mission (Laureijs et al. 2011), the Rubin Observatory Legacy Survey of Space and Time (Abell et al. 2009), and the Roman Space Telescope (Spergel et al. 2015), which will map the sky with unprecedented accuracy. A prominent cosmological probe for these surveys is weak gravitational lensing.
Weak gravitational lensing measures correlations in the small distortions of distant galaxies caused by the gravitational potential of massive structures along the line of sight. Its effect on distant source galaxies is twofold: the galaxy shapes are magnified by a convergence field, κ, while the galaxy ellipticities are perturbed from their underlying intrinsic value by a shear field, γ. To constrain cosmological parameters and models, two-point correlation functions of the shear have been used with considerable success (Kilbinger et al. 2013; Alsing et al. 2016a). This type of correlation is sufficient to statistically describe the Gaussian structures of the lensing field, such as those expected either in the early universe or at large scales that are less affected by gravitational collapse. To capture non-Gaussian structures, such as those expected at smaller scales and later times, higher order moments of the shear need to be employed (Munshi & Coles 2017).
On the other hand, the convergence density field is not observed directly as a result of the mass-sheet degeneracy (Bartelmann & Schneider 2001; Kilbinger 2015). Physically, the convergence field reveals the projected total matter density along the line of sight, weighted by a lensing kernel in the mid-distance between the observer and the galaxy sources. The density field is inhomogeneous, encompassing Gaussian-type large-scale structures as well as non-Gaussian features, such as peaks. To shed light on the way in which the convergence density field constrains cosmology, peak statistics (e.g. Jain & Van Waerbeke 2000; Marian et al. 2011; Lin & Kilbinger 2015; Liu & Haiman 2016; Peel et al. 2017a; Fluri et al. 2018; Li et al. 2019; Ajani et al. 2020) and higher order correlation functions and moments, such as the Minkowski functionals (e.g. Kratochvil et al. 2012; Shirasaki et al. 2012; Petri et al. 2013), have been applied directly on mass maps. It is therefore essential for mass-mapping methods to preserve both Gaussian and non-Gaussian features during the reconstruction process.
Mass-mapping methods solve an ill-posed problem because the sampling of the lensing field is irregular and the signal-to-noise ratio on small scales is low. A widely used algorithm to perform mass mapping is the Kaiser-Squires method (Kaiser & Squires 1993), which is expressed as a simple linear operator in Fourier space. However, this estimator inevitably returns poor results because it takes no particular care of the noise or missing data. A different approach, motivated also by the Bayesian framework, is that of Wiener filtering (Wiener 1949). In this approach, a Gaussian random field is assumed as a prior for the convergence map, which introduces some bias that prevents our estimate from over-fitting (Zaroubi et al. 1995). Moreover, a recently proposed customised method is the algorithm called gravitational lensing inversion and mapping using sparse estimators (GLIMPSE2D; Lanusse et al. 2016). GLIMPSE2D is a highly sophisticated algorithm that takes advantage of the sparse regularisation framework to solve the ill-posed linear inverse problem. GLIMPSE2D is based on sparse representations (i.e. wavelets) and is therefore well designed to recover piece-wise smooth features. An analytical comparison of these three estimators is provided in Jeffrey et al. (2018a).
In this paper we propose to bridge the gap between the sparse regularisation method of GLIMPSE2D and the Wiener-filtering method by modelling the matter density field in the universe using both linear and non-linear characteristics. Specifically, we assume that the density field is modelled as a mixture of two terms: (1) a non-Gaussian term that adopts a sparse representation in a selected wavelet basis (Starck et al. 2015), and (2) a Gaussian term that is modelled using a Gaussian random field (Elsner & Wandelt 2013; Horowitz et al. 2019). The non-Gaussian signal component is able to capture the non-linear characteristics of the convergence field, such as peaks, while the Gaussian component of the signal is responsible for capturing the lower-frequency characteristics of the underlying field, such as smooth variations. To our knowledge, this is the first time that this mixture modelling is proposed for mass-map reconstruction.
This paper is structured as follows. In Sect. 2 we introduce the formalism of weak gravitational lensing and describe the mass-map reconstruction problem. To this end, we provide a brief overview of the current algorithms of Kaiser-Squires, Wiener filtering, and GLIMPSE2D. We then present our proposed mass-mapping method in Sect. 3. The method is novel in the sense that it exploits both sparsity in the wavelet domain and a Gaussian random field model. Section 4 illustrates the enhanced estimation performance of the proposed method by providing experiments conducted on both simulated and real data.
Notation: We use (·)* to denote the Hermitian transpose of a matrix or the adjoint operator of a transform. With ||·||_1 and ||·||_2 we denote the ℓ1 and ℓ2 norms, respectively. The determinant of a matrix or the absolute value of a scalar is denoted by |·|, while diag(x) stands for a diagonal matrix that contains the elements of vector x on its diagonal. Finally, R^N is the N-dimensional Euclidean space, 0 denotes the zero vector, 1 the all-ones vector, and I_K is the K × K identity matrix.

Weak-lensing mass mapping
Gravitational lensing describes the phenomenon where the light emitted from distant galaxies is deflected as it passes through a foreground mass distribution. The lensing effect distorts the images of distant galaxies; the distortion is proportional to the size and shape of the projected matter distribution along the line of sight. Specifically, the mapping between the source coordinates, β, and the lensed image coordinates, θ, is given by the lens equation (e.g. Kilbinger 2015),

β = θ − ∇ψ(θ), (1)

where ψ(·) defines the lensing potential that encodes the deflection of light rays by the gravitational lens. Under the Born approximation, which assumes that the lensing potential is weak, we may linearise the coordinate transformation of Eq. (1) by using the Jacobian A = ∂β/∂θ as

A_ij = ∂β_i/∂θ_j = δ_ij − ∂_i∂_j ψ(θ), (2)

where the A_ij are the coefficients of the amplification matrix A, and we assume the Einstein summation convention.
The symmetric matrix A can also be parametrised in terms of the convergence, κ, and the shear, γ = γ_1 + iγ_2, as

A = [[1 − κ − γ_1, −γ_2], [−γ_2, 1 − κ + γ_1]]. (3)

The convergence can then be defined as a dimensionless quantity that relates to the lensing potential through the Poisson equation,

κ = (1/2)(∂_1∂_1 + ∂_2∂_2) ψ, (4)

while the shear is mathematically expressed as a complex field whose components also relate to ψ,

γ_1 = (1/2)(∂_1∂_1 − ∂_2∂_2) ψ,  γ_2 = ∂_1∂_2 ψ. (5)

Equation (3) shows that the convergence causes an isotropic change in the size of the source image because it appears on the diagonal of A. In comparison, the shear causes anisotropic changes of the image shapes. The convergence κ can also be interpreted via Eq. (4) as a weighted projection of the mass density field between the observer and the source. Factoring out the term (1 − κ) in Eq.
(3) leaves the amplification matrix dependent on the reduced shear, which is directly measured in lensing surveys and is defined as g = γ/(1 − κ). In the weak-lensing limit, where γ, κ ≪ 1, the reduced shear is approximately equal to the true shear, that is, g ≈ γ.
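As a quick numerical illustration of Eqs. (3)-(5) and of the weak-lensing limit, the following sketch (our own toy example, not code from this work; all names are hypothetical) derives κ and γ from a smooth lensing potential by finite differences and checks that g ≈ γ when κ ≪ 1:

```python
import numpy as np

# Toy check: derive kappa and gamma from a weak, smooth lensing
# potential psi via finite differences (Poisson equation and shear
# relations), then compare the reduced shear g with the true shear.
n = 64
x = np.linspace(-1.0, 1.0, n)
X, Y = np.meshgrid(x, x, indexing="ij")
psi = 1e-3 * np.exp(-(X**2 + Y**2) / 0.2)   # weak Gaussian potential

d = x[1] - x[0]                              # grid spacing
psi_1, psi_2 = np.gradient(psi, d, d)        # first derivatives
psi_11, psi_12 = np.gradient(psi_1, d, d)    # second derivatives
_, psi_22 = np.gradient(psi_2, d, d)

kappa = 0.5 * (psi_11 + psi_22)              # Poisson equation
gamma1 = 0.5 * (psi_11 - psi_22)             # shear components
gamma2 = psi_12
gamma = gamma1 + 1j * gamma2

g = gamma / (1.0 - kappa)                    # reduced shear
# In the weak-lensing limit (kappa << 1), g and gamma are
# indistinguishable up to terms of order kappa * gamma.
print(np.max(np.abs(g - gamma)))             # small by construction
```

Here the difference g − γ is of order κγ, which is tiny for this weak potential, illustrating why the reduced shear can be treated as the true shear in the weak-lensing regime.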
In this paper we are interested in recovering the convergence κ from the reduced shear data. This is an ill-posed inverse problem because of the finite sampling of the reduced shear over a limited survey area and the presence of shape noise in the measurements. In the following we review some of the customised weak-lensing mass-reconstruction algorithms, namely the Kaiser-Squires, Wiener-filtering, and GLIMPSE2D methods.
Kaiser-Squires. A theoretical framework for reconstructing convergence maps from the observable weak-lensing shear in the Fourier domain was proposed in Kaiser & Squires (1993). As we showed in Eqs. (4) and (5), the convergence and shear are both expressed as second-order derivatives of the lensing potential. Their interrelation through the lensing potential ψ is expressed by a two-dimensional convolution (Kaiser & Squires 1993),

γ(θ) = (1/π) ∫ D(θ − θ′) κ(θ′) d²θ′, (6)

where D(θ) = −1/(θ_1 − iθ_2)². This convolution is equivalently expressed in Fourier space as the element-wise multiplication

γ̃(k) = π⁻¹ D̃(k) κ̃(k), (7)

where k is the wave vector, and the Fourier transform of the kernel D(θ) is given by

D̃(k) = π (k_1² − k_2² + 2ik_1k_2) / |k|², (8)

with k_1 and k_2 being the two frequency components of k. Discretising Eq. (7) and adopting matrix notation, we may consider that the observed shear γ is generated as a linear combination of the convolution matrix A and the unknown underlying field κ,

γ = Aκ + n, (9)

where n is the statistical uncertainty vector associated with the data. Based on Eq. (9), A can be decomposed in Fourier space as

A = FPF*, (10)

where F denotes the discrete Fourier transform, F* is its adjoint, and P is the diagonal operator that defines the convergence field-shear relation in Fourier space,

γ̃(k) = [(k_1² − k_2² + 2ik_1k_2)/k²] κ̃(k), (11)

where k² = k_1² + k_2² and κ̃ = Fκ. Equation (11) corresponds to a discretised version of Eqs. (7) and (8).
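For concreteness, a minimal flat-sky Kaiser-Squires round trip can be sketched in a few lines of NumPy (a toy illustration under our own conventions, not the pipeline used in this work). It applies the Fourier-space kernel P(k) of Eq. (11) forward and its conjugate backward, and zeroes the k = 0 mode lost to the mass-sheet degeneracy:

```python
import numpy as np

def ks_operator(shape):
    """Fourier-space Kaiser-Squires kernel P(k); |P| = 1 for k != 0."""
    n1, n2 = shape
    k1 = np.fft.fftfreq(n1)[:, None]
    k2 = np.fft.fftfreq(n2)[None, :]
    ksq = k1**2 + k2**2
    ksq[0, 0] = 1.0                      # placeholder; k = 0 handled separately
    return (k1**2 - k2**2 + 2j * k1 * k2) / ksq

def ks_forward(kappa):
    """Complex shear gamma1 + i*gamma2 from a convergence map."""
    P = ks_operator(kappa.shape)
    return np.fft.ifft2(P * np.fft.fft2(kappa))

def ks_inverse(gamma):
    """Kaiser-Squires estimate: kappa_hat = conj(P) * gamma_hat."""
    P = ks_operator(gamma.shape)
    kappa_hat = np.conj(P) * np.fft.fft2(gamma)
    kappa_hat[0, 0] = 0.0                # mass-sheet degeneracy: mean unrecoverable
    return np.fft.ifft2(kappa_hat).real
```

Because |P(k)| = 1 away from k = 0, the inversion is exact on noise-free, zero-mean, periodic data; the drawbacks listed below appear as soon as noise, masks, or irregular sampling enter.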
As it stands, the Kaiser-Squires inversion of Eq. (11) has several drawbacks. First, it is not defined for k = 0, which stems from the mass-sheet degeneracy (i.e. the mean value of the convergence field cannot be retrieved). Next, it is ill-posed because typically, the shear field is a discrete under-sampling of the underlying convergence field. Neither does it take masked data into account. Nonetheless, the Kaiser-Squires estimate is still used in practice because it is easy to apply.
Wiener filtering. The Wiener filter was introduced in the 1940s (Wiener 1949), and it is the optimal linear filter in the minimum mean-square sense that provides a denoised version of the desired signal. The Wiener-filtered estimate of the convergence map can be expressed using the linear equation

κ_G = Wγ, (12)

where the matrix W is given by

W = Σ_κ A* (A Σ_κ A* + Σ_n)⁻¹, (13)

and Σ_κ and Σ_n are the pre-defined covariance matrices of the convergence and the noise, respectively. When the noise is not stationary, the Wiener-filter solution is not straightforward, and an iterative approach is required; this is discussed in the next section. From a Bayesian perspective, Wiener filtering is equivalent to the maximum a posteriori (MAP) estimator given that the convergence is a zero-mean Gaussian signal with covariance Σ_κ. When we assume that the shear measurements are distorted by uncorrelated Gaussian noise, the likelihood function shares the noise properties, that is,

p(γ | κ_G) ∝ exp( −(1/2) (γ − Aκ_G)* Σ_n⁻¹ (γ − Aκ_G) ), (14)

while the distribution of the Gaussian random field can be written as

p(κ_G) ∝ exp( −(1/2) κ_G* Σ_κ⁻¹ κ_G ). (15)

Given Eqs. (14) and (15) above, the MAP estimator is expressed as

κ_G = arg max_{κ_G} [ log p(γ | κ_G) + log p(κ_G) ], (16)

which leads to the minimisation

min_{κ_G} (γ − Aκ_G)* Σ_n⁻¹ (γ − Aκ_G) + κ_G* Σ_κ⁻¹ κ_G. (17)

It is easy to see that the minimum is attained at κ_G = Wγ, where W is the Wiener filter given in Eq. (13). Implementing the Wiener filter directly requires inverting the matrix (A Σ_κ A* + Σ_n), which includes both a signal covariance component and a noise covariance component.
The unknown κ_G is assumed to be a Gaussian random field with a covariance matrix diagonal in Fourier space, Σ_κ = F* C_κ F, where C_κ is a diagonal matrix with diagonal values equal to the theoretical power spectrum P_κ. When the noise is stationary, its covariance matrix is also diagonal, with diagonal elements equal to the noise power spectrum P_n. In this case, the filter solution is obtained in Fourier space by κ̃_G = W̃ γ̃, where the Wiener filter is W̃ = P_κ / (P_κ + P_n). In practice, the noise is generally not stationary and depends on the number of shear measurements in the area related to a given pixel of the shear field. The noise covariance matrix is therefore diagonal in pixel space, not in Fourier space, and the Wiener-filter solution becomes harder to derive, requiring either an incorrect assumption (i.e. that the noise is stationary) or the inversion of a very large matrix, which is computationally complex and prone to numerical errors. To circumvent this computationally intensive operation, a forward-backward (FB) proximal iterative Wiener filtering was proposed in Bobin et al. (2012) for the denoising of cosmic microwave background spherical maps, exploiting the property that the noise and signal covariance matrices are diagonal in pixel and Fourier space, respectively. Equation (17) comprises two separable terms,

min_{κ_G} f_1(κ_G) + f_2(κ_G),  with f_1(κ_G) = (γ − Aκ_G)* Σ_n⁻¹ (γ − Aκ_G) and f_2(κ_G) = κ_G* Σ_κ⁻¹ κ_G. (18)

Following the FB method (Starck et al. 2015), we can solve Eq. (18) by designing an iterative fixed-point algorithm,

κ_G^(n+1) = prox_{µ f_2}( κ_G^(n) + µ A* Σ_n⁻¹ (γ − A κ_G^(n)) ), (19)

which is known to converge when µ < 2 / ||A* Σ_n⁻¹ A||_2 (Combettes & Wajs 2005).
Computing the proximal operator in Eq. (19), we obtain the following iterative Wiener-filtering algorithm:
- Forward step:
  t = κ_G^(n) + µ A* Σ_n⁻¹ (γ − A κ_G^(n)), (20)
- Backward step:
  κ_G^(n+1) = F* [ P_κ / (P_κ + P_η) ] F t, (21)
where t is an auxiliary variable, µ = min(Σ_n), P_η = 2µI, and κ_G^(0) = 0. This algorithm is free from matrix inversions because both Σ_n, used in Eq. (20), and P_κ, used in Eq. (21), are diagonal matrices. A similar algorithm, exploiting transformations between pixel and harmonic space, was proposed by Elsner & Wandelt (2013) using the messenger-field framework. These methods can also be adapted to efficiently sample from the posterior distribution of the unknown mass map (Alsing et al. 2017; Jeffrey et al. 2018b). The Wiener approach recovers the Gaussian component of the convergence field well, but it is far from optimal at extracting the non-Gaussian information from the data because peak-like structures are suppressed. This has motivated the development of sparse recovery methods based on wavelets.
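The forward-backward iteration of Eqs. (20)-(21) can be sketched as follows (a simplified illustration, not the paper's implementation; it assumes the flat-sky shear operator applied via the Fourier kernel P and a diagonal pixel-space noise variance, and all names are ours):

```python
import numpy as np

def iterative_wiener(gamma, sigma_n2, P_kappa, n_iter=200):
    """FB iterative Wiener filter: sigma_n2 is the per-pixel noise
    variance (diagonal in pixel space), P_kappa the signal power
    spectrum (diagonal in Fourier space)."""
    n1, n2 = gamma.shape
    k1 = np.fft.fftfreq(n1)[:, None]
    k2 = np.fft.fftfreq(n2)[None, :]
    ksq = k1**2 + k2**2
    ksq[0, 0] = 1.0                          # avoid 0/0; P(0) = 0 anyway
    P = (k1**2 - k2**2 + 2j * k1 * k2) / ksq
    A = lambda x: np.fft.ifft2(P * np.fft.fft2(x))           # shear operator
    At = lambda x: np.fft.ifft2(np.conj(P) * np.fft.fft2(x)) # its adjoint

    mu = float(np.min(sigma_n2))             # step size mu = min(Sigma_n)
    P_eta = 2.0 * mu                         # effective noise level P_eta = 2 mu
    kappa = np.zeros(gamma.shape, dtype=complex)
    for _ in range(n_iter):
        # forward step, Eq. (20): gradient of the data-fidelity term
        t = kappa + mu * At((gamma - A(kappa)) / sigma_n2)
        # backward step, Eq. (21): Wiener shrinkage in Fourier space
        kappa = np.fft.ifft2(np.fft.fft2(t) * P_kappa / (P_kappa + P_eta))
    return kappa.real
```

No matrix is ever inverted: each iteration only needs element-wise divisions in the space where the corresponding covariance is diagonal, which is exactly the property the FB scheme exploits.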
Sparse recovery. It has been shown that sparse recovery using wavelets is a very efficient way to reconstruct convergence maps (Starck et al. 2006b; Leonard et al. 2012; Lanusse et al. 2016; Peel et al. 2017b; Price et al. 2020). The mass-mapping problem is addressed as a general ill-posed problem, which is solved through weighted ℓ1-norm regularisation in a wavelet-based analysis sparsity model. Sparsity in the discrete cosine transform (DCT) domain has also been proposed to fill the missing-data area in the convergence map (Pires et al. 2009).
The GLIMPSE algorithm avoids any binning or smoothing of the input data that could potentially cause a loss of information. The primary cost function is of the form

min_κ (1/2) ||Σ_n^(−1/2) (γ − T P F κ)||_2² + λ ||ω ∘ Φ* κ||_1 + i_R(κ),

where T is the nonuniform discrete Fourier transform (NDFT) matrix, P is defined in Eq. (11), λ is a sparsity regularisation parameter, ω is a weighting vector, Φ* is the adjoint operator of the wavelet transform, and i_R is the indicator function that drives the imaginary part of the convergence to zero. This cost function is also generalised in GLIMPSE2D in order to replace the shear by the reduced shear, or to incorporate flexion information.
It has been shown that GLIMPSE2D significantly outperforms Wiener filtering for peak recovery, while Wiener filtering does better on the Gaussian map content (Jeffrey et al. 2018a).
Deep learning. Deep-learning techniques have recently been proposed and appear to be very promising (Jeffrey et al. 2020). The input of the neural network is not the shear field directly, but rather the Wiener-filter solution. Presumably, the closer standard techniques such as Wiener filtering or sparse recovery bring us to the true solution, the more deep learning can improve on it. Open questions related to deep learning remain, such as its generalisation to cosmologies not present in the training data set, or the potential bias introduced by using a theoretical power spectrum in the Wiener-filter solution that serves as input to the neural network.

New convergence map model
We showed two different models in Sect. 2: the first modelling the convergence map as a Gaussian random field, which recovers the large scales of the convergence map well but suppresses peak structures, and the second assuming the convergence map is compressible in the wavelet domain (i.e. sparse modelling). The sparse recovery is clearly complementary to the Wiener-filter solution because it recovers peaks extremely well, but recovers the Gaussian content poorly.
To address these limitations, it appears natural to introduce a novel modelling approach in which the convergence field κ is assumed to comprise two parts, a Gaussian and a non-Gaussian:

κ = κ_G + κ_NG. (22)

The non-Gaussian part of the signal, κ_NG, is subject to a sparse decomposition in a wavelet dictionary, while the component κ_G is assumed to be inherently non-sparse and Gaussian. Morphological component analysis (MCA) has been proposed (Starck et al. 2004; Elad et al. 2005) to separate two components mixed in a single image when these components have different morphological properties. This appears to be impossible because we have two unknowns and one equation, but it was shown that it is sometimes possible to extract the two components if we can exploit their morphological differences. This requires different penalisation functions, C_G and C_NG, on each of the two components, and we need to minimise

min_{κ_G, κ_NG} ||Σ_n^(−1/2) (γ − A(κ_G + κ_NG))||_2² + C_G(κ_G) + C_NG(κ_NG). (23)

The MCA performs an alternating minimisation scheme:
- Estimate κ_G assuming κ_NG is known:
  min_{κ_G} ||Σ_n^(−1/2) (γ − A κ_NG − A κ_G)||_2² + C_G(κ_G). (24)
- Estimate κ_NG assuming κ_G is known:
  min_{κ_NG} ||Σ_n^(−1/2) (γ − A κ_G − A κ_NG)||_2² + C_NG(κ_NG). (25)

Gaussian component κ_G. We used the standard Wiener-filter modelling where κ_G is assumed to be a Gaussian random field, and the solution of Eq. (24) is obtained using the iterative Wiener filtering presented in the previous section.
Non-Gaussian component. There are different ways to use a sparse model in the MCA framework. The most obvious would be to use standard ℓ1- or ℓ0-norm regularisation in a wavelet-based sparsity model, as is done in the GLIMPSE2D algorithm. This would give

min_{κ_NG} ||Σ_n^(−1/2) (γ − A κ_G − A κ_NG)||_2² + λ ||Φ* κ_NG||_p, (26)

where p = 0 or 1, Φ is the wavelet matrix, and λ is the regularisation parameter (Lagrange multiplier). After implementing this approach, we found that large wavelet scales and low Fourier frequencies are relatively close, leading to difficulties in separating the information. We therefore investigated another approach that first estimates the set Ω of active coefficients, that is, the scales and positions where wavelet coefficients are above a given threshold, typically between three and five times the noise standard deviation relative to each wavelet coefficient. Ω can therefore be seen as a mask in the wavelet domain, where Ω_{j,x} = 1 if a wavelet coefficient is detected at scale j and position x, that is, when |(Φ* A* γ)_{j,x}| > λ σ_{j,x}, and 0 otherwise. The noise level σ_{j,x} at scale j and position x can be determined using noise realisations, as in the GLIMPSE algorithm. An even faster approach is to detect the significant wavelet coefficients on Φ* A* Σ_n^(−1/2) γ instead of Φ* A* γ. The noise is thereby whitened: the noise term Σ_n^(−1/2) n is Gaussian with a standard deviation equal to unity and a uniform power spectrum. We implemented both approaches and found similar results, although the second approach is simpler.
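The support-detection step can be illustrated with a small NumPy sketch (our own simplified stand-in, not the CosmoStat implementation): an undecimated starlet transform, with per-scale noise levels σ_j calibrated on unit-variance white-noise realisations, and a λσ_j threshold producing the binary mask Ω:

```python
import numpy as np

def starlet(img, n_scales):
    """Undecimated 'a trous' starlet transform with the B3-spline kernel.
    Returns n_scales wavelet planes plus a coarse residual."""
    h = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0
    c = img.astype(float)
    scales = []
    for j in range(n_scales):
        step = 2**j
        k = np.zeros(4 * step + 1)
        k[::step] = h                        # insert 'holes' between taps
        smooth = c
        for axis in (0, 1):                  # separable smoothing
            smooth = np.apply_along_axis(
                lambda m: np.convolve(m, k, mode="same"), axis, smooth)
        scales.append(c - smooth)            # wavelet plane at scale j
        c = smooth
    scales.append(c)                         # coarse residual
    return scales

def detect_support(img, n_scales=3, lam=5.0, n_noise=20, seed=0):
    """Binary mask Omega: coefficients of the whitened map above lam*sigma_j."""
    rng = np.random.default_rng(seed)
    sig = np.zeros(n_scales)
    for _ in range(n_noise):                 # calibrate sigma_j on white noise
        w = starlet(rng.standard_normal(img.shape), n_scales)
        sig += [np.std(w[j]) for j in range(n_scales)]
    sig /= n_noise
    w = starlet(img, n_scales)
    return [np.abs(w[j]) > lam * sig[j] for j in range(n_scales)]
```

With λ = 5, detections in pure unit-variance noise are essentially absent, so the surviving coefficients trace genuine peak-like structures.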
When this wavelet mask Ω has been estimated, we can estimate the non-Gaussian component κ_NG by

min_{κ_NG} ||Σ_n^(−1/2) (γ − A(κ_G + κ_NG))||_2² + C_NG(κ_NG),  subject to the wavelet support of κ_NG being restricted to Ω, (29)

with C_NG(κ_NG) = i_R(κ_NG).
This changes the original formalism because the data fidelity term is now different, but it presents a very interesting advantage. When Ω is fixed, the algorithm is almost linear and only the positivity constraint remains. Therefore we can easily derive a good approximation of the error map just by propagating noise and relaxing this positivity constraint. This is further discussed below. Similarly to the GLIMPSE method, a positivity constraint is applied on the non-Gaussian component κ NG . Peaks in κ can be on top of voids, and therefore have negative pixel values. As peaks are captured by the non-Gaussian component, they are positive by construction in κ NG , but the convergence map κ = κ G + κ NG can still be negative at peak positions. The larger the non-Gaussianities, the more we can expect MCALens to improve on linear methods such as the Wiener filtering.
The prior signal auto-correlation of the Gaussian component is included within the signal covariance term of the Gaussian component. We encode no explicit prior auto-correlation for the non-Gaussian signal and no explicit prior cross-correlation between the Gaussian and non-Gaussian component. Clearly, these correlations exist, but including their contribution in the prior in this framework would be extremely difficult theoretically and in practice. However, these correlations will still appear in the final reconstruction, driven by the correlation information in the data.
We solved the recovery problem of Eq. (23) using a two-step optimisation procedure: first, a gradient-descent step to minimise Eq. (29) and recover the non-Gaussian component κ_NG, followed by an iteration of the iterative Wiener filtering to minimise Eq. (17). Details of the method are given in Algorithm 1.
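The alternation can be conveyed with a deliberately simplified 1-D analogue (ours, not the MCAlens code): the data are spikes (the non-Gaussian part) on top of a smooth Gaussian field, and we alternate a thresholding step for x_ng with a Fourier-space Wiener step for x_g.

```python
import numpy as np

def mca_toy(y, P_signal, P_noise, lam_sigma, n_iter=50):
    """Toy MCA separation: hard-threshold the residual to get the
    spiky component, Wiener-filter the rest to get the smooth one."""
    n = y.size
    x_g = np.zeros(n)
    x_ng = np.zeros(n)
    wiener = P_signal / (P_signal + P_noise)   # diagonal in Fourier space
    for _ in range(n_iter):
        # sparse step (identity dictionary): keep only large residuals
        r = y - x_g
        x_ng = np.where(np.abs(r) > lam_sigma, r, 0.0)
        # Gaussian step: Wiener-filter the data with the spikes removed
        x_g = np.fft.ifft(np.fft.fft(y - x_ng) * wiener).real
    return x_g, x_ng
```

Even in this stripped-down form, the alternation cleanly separates the two morphologies: the threshold captures the spikes that the Wiener filter would suppress, and the Wiener step captures the smooth content that the threshold would miss.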
The number of scales N_s used to compute the wavelet transform of an N_x × N_y image is automatically derived as N_s = int(log(min(N_x, N_y))). The λ parameter is a detection level, which was fixed to 5 for all our experiments with real and simulated data, that is, five times the noise standard deviation. This is a conservative threshold and gave excellent results.

Errors
Much attention has recently been given to estimating errors or uncertainties for mass-map products. For linear methods such as Wiener filtering or Kaiser-Squires, it is easy to estimate the standard deviation (or root mean square, RMS) per pixel just by propagating noise realisations through the same reconstruction filters. The uncertainty per pixel does not, however, give the probability that a clump in a reconstructed image is real rather than a noise fluctuation. Another approach, closer to certain science cases involving maps, estimates the significance of clumps. Peel et al. (2017b) used Monte Carlo simulations to assess the significance of clumps. Repetti et al. (2019) proposed a hypothesis test called BUQO to perform the same task, requiring the user to manually define a mask around the clump. Similarly, Price et al. (2019) performed hypothesis tests of Abell 520 cluster structures using highest posterior density regions. With MCAlens, we can include aspects of both approaches.
RMS and signal-to-noise ratio maps. Algorithm 1 includes two steps that involve a non-linear operator: first to estimate Ω in line 4, and then in line 11 to apply the positivity constraint. To propagate noise realisations, Ω has to be fixed to the one obtained with the data, so that this non-linear step no longer occurs and only the positivity constraint remains. A fully linear algorithm can therefore be obtained simply by removing this positivity constraint during the noise propagation, that is, by replacing κ_NG^(n+1) = [S]_+ in Algorithm 1, line 11, by κ_NG^(n+1) = S. This lets slightly more noise enter the solution because the few pixels with negative values in κ_NG are no longer thresholded; the derived RMS map is therefore slightly conservative. We can then build noise realisations, run the MCAlens algorithm on each realisation, and derive the RMS map by taking, pixel per pixel, the standard deviation of the reconstructed maps. The signal-to-noise ratio (S/N) map is derived by dividing the absolute value of the reconstructed map by the RMS map.

Significance map. After whitening, the wavelet coefficients of pure noise have zero mean and unit variance. Thresholding therefore corresponds to performing a hypothesis test H0 that the wavelet coefficient is due to noise alone: if a given wavelet coefficient α_{j,x} is such that |α_{j,x}| > λ, the H0 hypothesis is rejected, and we assume that the coefficient amplitude cannot be explained by noise fluctuations and is therefore due to signal. λ is thus directly related to the significance of the wavelet coefficients, and the mask Ω indicates which coefficients are detected with a given significance level. Ω_{j,x} is binary, and we build the significance map s by s_x = Σ_j Ω_{j,x}, that is, a simple co-adding of all binary scales. Such a map could also be used to automatically derive the user mask required in the BUQO method (Repetti et al. 2019). We present examples of RMS, S/N, and significance maps in Sect. 4.
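The Monte-Carlo error propagation described above can be sketched as follows, with `reconstruct` standing in for a frozen-support, positivity-relaxed (hence linear) MCAlens run; the interface and names are our assumptions:

```python
import numpy as np

def rms_snr_maps(kappa_rec, reconstruct, noise_std, n_real=100, seed=0):
    """Propagate noise realisations through a (linearised) reconstruction
    to build per-pixel RMS and S/N maps."""
    rng = np.random.default_rng(seed)
    acc = np.zeros_like(kappa_rec)
    for _ in range(n_real):
        noise = noise_std * rng.standard_normal(kappa_rec.shape)
        acc += reconstruct(noise) ** 2       # zero-mean, so E[rec] = 0
    rms = np.sqrt(acc / n_real)              # per-pixel standard deviation
    snr = np.abs(kappa_rec) / rms            # S/N map
    return rms, snr
```

Because the operator is linear once Ω is frozen and positivity relaxed, propagating pure noise is sufficient; the resulting RMS map is, as noted above, slightly conservative with respect to the full non-linear algorithm.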

Extension to the sphere
When a wide-field map needs to be reconstructed, the flat approximation can no longer be used, and we have to build a map on the sphere. A traditional approach is to decompose the sphere into overlapping patches, assume a flat approximation on each individual patch, reconstruct each patch independently, and finally recombine all patches on the sphere. This solution is certainly good enough to recover clumps relative to clusters, that is, the non-Gaussian component, but certainly not good enough for the Gaussian component, which contains information at low frequencies. In the framework of the DES project, a 1500 deg² map has been reconstructed (Chang et al. 2018) using HEALPIX pixelisation (Górski et al. 2005) and a straightforward spherical Kaiser-Squires algorithm consisting of 1. calculating a spin transform of the HEALPIX shear map to obtain the E and B spherical harmonic coefficients, 2. smoothing the E and B modes with a Gaussian in the spherical harmonic domain, 3. applying an inverse spherical transform independently to each of these two modes to obtain the two convergence maps κ_E and κ_B.
Sparsity in a Bayesian framework (Price et al. 2021) and forward-fitting in harmonic space (Mawdsley et al. 2020) have also recently been proposed for spherical mass mapping. Using a similar HEALPIX shear and convergence pixelisation, we can easily derive an extension of the iterative Wiener filtering and the MCAlens algorithm by
- replacing the matrix A = FPF* in Eq. (10) by A = ₂Y ₀Y*, where ₛY and ₛY* represent the forward and inverse spin-s spherical harmonic transforms, respectively,
- replacing the wavelet decomposition Φ by the spherical wavelet decomposition (Starck et al. 2006a).
Then the same MCAlens algorithm as in Algorithm 1 can be used to derive a spherical convergence map. An example is given in Sect. 4.3.

B-mode
We focused earlier on the convergence map (i.e. the E-mode). The MCAlens algorithm given in Algorithm 1 remains valid when the E and B modes are both to be estimated, adopting a notation in which the two modes are stacked into a single unknown and the matrix Φ applies a sparse decomposition independently on each mode. In practice, we used the same wavelet decomposition for both modes (i.e. Φ_E = Φ_B). The delicate point is applying the Wiener filter to the B-mode at line 15 of the algorithm. In theory, P_κ_B = 0, and by construction, no Gaussian component can be recovered. Because the B-mode is mainly useful for investigating systematic errors, we are not interested in recovering the B-mode least-squares estimator (which is zero), and we find it more useful to process the B-mode similarly to the E-mode. We therefore advocate using P_κ_B = P_κ_E instead. In this way, fluctuations in the E-mode can be properly compared to those in the B-mode.

Toy-model experiment
We used a simulation derived from RAMSES N-body cosmological simulations (Teyssier 2002) with a ΛCDM model, a pixel size of 0.34 × 0.34 arcmin², and a galaxy redshift of 1. To obtain realistic mask and noise behaviour, we used the MICE pixel noise covariance derived for the DES project (see Appendix A). As the pixel resolution is different, this leads to an optimistic realisation, but it has the advantage of illustrating the effect of our MCA model well. We ran MCAlens using 100 iterations, λ = 5 (i.e. detection at 5σ), the MICE covariance matrix, and the true map power spectrum as the theoretical power spectrum. To evaluate the results, we ran 100 different noise realisations and applied both Wiener filtering and MCAlens. We calculated the reconstruction error at different resolutions with

Err_%(σ) = 100 ||M ∘ G_σ(κ − κ_t)||_2 / ||M ∘ G_σ(κ_t)||_2, (30)

where ||x||_2 = (Σ_i x_i²)^(1/2), κ is the reconstructed convergence map, κ_t is the true convergence map, G_σ(x) is the convolution of x with a Gaussian of standard deviation σ, and M is the binary mask, with M_k = 1 if the covariance matrix is not infinite at location k (i.e. we have data at this location) and 0 otherwise. Figure 1 shows the mean error Err_%(σ) for the Wiener-filter and MCAlens solutions. The black curve shows the difference between the Wiener-filter error and the MCAlens error, which makes it easier to see that the improvement is larger at fine scales. It is interesting to note that MCAlens is better than the Wiener filter even at large scales. When the MCAlens non-Gaussian component contains a significant number of features, as in this experiment, these features also contribute in a non-negligible way on larger scales, which explains why MCAlens remains better than Wiener filtering at large scales. MCAlens leads to a clear improvement in terms of quadratic error. The top panels of Fig. 2 show the simulated convergence map and the MCAlens reconstructed map. The bottom panels show the Gaussian and the non-Gaussian parts recovered from the noisy data.
The sum of these two components is equal to the MCAlens reconstruction (top right). This shows that the non-linear and linear components can be recovered well. Figure 3 shows the RMS map, the S/N map, and the significance map.
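A possible implementation of the error metric of Eq. (30), with the Gaussian smoothing done in Fourier space, can be sketched as follows (the exact normalisation and unit conventions are our assumptions):

```python
import numpy as np

def gaussian_smooth(x, sigma):
    """Convolve a 2-D map with a Gaussian of std sigma (in pixels),
    implemented as multiplication in Fourier space."""
    if sigma == 0:
        return x
    n1, n2 = x.shape
    k1 = np.fft.fftfreq(n1)[:, None]
    k2 = np.fft.fftfreq(n2)[None, :]
    g = np.exp(-2.0 * (np.pi * sigma) ** 2 * (k1**2 + k2**2))
    return np.fft.ifft2(np.fft.fft2(x) * g).real

def err_percent(kappa, kappa_true, mask, sigma):
    """Percent l2 error between reconstruction and truth at resolution sigma,
    restricted to the observed region defined by the binary mask."""
    num = np.linalg.norm(mask * gaussian_smooth(kappa - kappa_true, sigma))
    den = np.linalg.norm(mask * gaussian_smooth(kappa_true, sigma))
    return 100.0 * num / den
```

Sweeping σ over a range of values then yields an error curve as a function of resolution, which is how the multiscale comparison between methods is presented.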

Columbia lensing simulation
We used a convergence map released by the Columbia lensing group (Liu et al. 2018), corresponding to a cosmological model with parameters {M_ν, Ω_m, 10⁹A_s, Ω_k, h, w} = {0.1, 0.3, 2.1, 0, 0.7, −1} and a pixel size of 0.4 × 0.4 arcmin². We rebinned the map to 0.8 × 0.8 arcmin² and, similarly to the previous experiment, simulated noisy data using the same MICE covariance, applying a global rescaling in order to obtain realistic noise corresponding to a mean number of 30 galaxies per arcmin², as we expect in future space lensing surveys. To evaluate the results, we ran 100 different noise realisations and applied Kaiser-Squires, sparse recovery, Wiener filtering, and MCAlens. For sparse recovery and MCAlens, we used λ = 5 (i.e. detection at 5σ), and for Wiener filtering and MCAlens, we took the theoretical power spectrum to be the true convergence map power spectrum. Figure 4 shows the error computed using Eq. (30). In addition to the well-recovered peaks, MCAlens also leads to a clear improvement in terms of quadratic error compared to the other methods. In contrast to the RAMSES experiment, we see here a convergence at large scales between MCAlens and the Wiener filter because the MCAlens non-Gaussian component contains only a few peaks (see Fig. 5, middle right), which therefore contribute only negligibly to the largest scales.
The top row of Fig. 5 shows the simulated convergence map, the Wiener-filter result, and the Kaiser-Squires reconstruction with smoothing applied. The middle row shows the MCAlens map and its Gaussian and non-Gaussian parts recovered from the noisy data. The sum of these two components is equal to the MCAlens map. Similarly to the previous experiments, the non-linear and the linear components are well recovered. The bottom row of Fig. 5 shows the RMS map, the S/N map, and the significance map.

Spherical data
We created a full shear map from the full-sky MICE simulated map and its covariance matrix, which has about three galaxies per arcmin². We ran the spherical MCAlens method with 200 iterations and λ = 5 (i.e. detection at 5σ), and we used the power spectrum of the simulation as the theoretical power spectrum. The top row of Fig. 6 shows the simulated noise-free map at 1 and 3 degrees, and the bottom row shows the MCAlens non-Gaussian and Gaussian components.

COSMOS field
In this last section, we apply MCAlens to reconstruct a convergence map of the 1.64 deg² HST/ACS COSMOS survey (Scoville et al. 2007). We used the bright-galaxy shape catalogue produced in Schrabback et al. (2010).
The results of applying MCAlens to the COSMOS data are presented in Fig. 7. The top row shows the galaxy count map, the Wiener-filter map, and the Kaiser-Squires map smoothed with a Gaussian of 2.4 arcmin full width at half maximum. The bottom row shows the Glimpse, MCAlens E-mode, and MCAlens B-mode maps. White dots show the locations and redshifts of X-ray selected massive galaxy clusters with 0.3 < z < 1.0 from the XMM-Newton Wide Field Survey (Finoguenov et al. 2007).
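The Gaussian smoothing at a FWHM of 2.4 arcmin can be sketched with the standard FWHM-to-sigma conversion, sigma = FWHM / (2 sqrt(2 ln 2)); the pixel scale used here is an illustrative assumption:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def smooth_map(kappa, fwhm_arcmin=2.4, pixel_arcmin=0.4):
    """Smooth a convergence map with a Gaussian of the given FWHM (arcmin)."""
    sigma_pix = fwhm_arcmin / (2.0 * np.sqrt(2.0 * np.log(2.0))) / pixel_arcmin
    return gaussian_filter(kappa, sigma=sigma_pix)

# smoothing a centred delta function spreads it while preserving its total flux
delta = np.zeros((64, 64))
delta[32, 32] = 1.0
smoothed = smooth_map(delta)
```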

Conclusion
A novel mass-mapping algorithm has been presented that is able to recover high-resolution convergence maps from weak gravitational lensing measurements. The proposed method models the convergence as the sum of two components, a Gaussian and a non-Gaussian one, and we developed an efficient algorithm to derive the solution. We showed that the method can also handle a non-diagonal noise covariance matrix. We extended it to deal with spherical maps, which is needed for future surveys such as the Euclid space mission. Our experiments clearly show a significant improvement over currently used algorithms.
In the spirit of reproducible research, the MCAlens algorithm is publicly available in the CosmoStat GitHub package, including the material needed to reproduce the simulated experiments (folder examples/mcalens_paper and script make_fig.py).

Uncorrelated, complex shape-noise values were randomly drawn from a Gaussian distribution and added to the shear value of each selected galaxy. This per-galaxy noise has zero mean and a variance 2σ² = 0.1636 (as estimated from data; Jeffrey et al. 2018a). The final pixelised noise (n) has a variance that depends on the number of galaxies per pixel.
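The dependence of the pixelised noise variance on the galaxy count can be sketched as follows, assuming the quoted per-galaxy variance 2σ² = 0.1636 is split evenly between the two shear components (the helper name is illustrative):

```python
import numpy as np

def pixel_noise_variance(n_gal_map, two_sigma2=0.1636):
    """Per-pixel, per-component noise variance given a galaxy-count map.

    Averaging N galaxy shapes divides the single-galaxy variance
    sigma^2 = two_sigma2 / 2 by N; empty pixels get infinite variance.
    """
    n = np.asarray(n_gal_map, dtype=float)
    sigma2 = two_sigma2 / 2.0
    return np.where(n > 0, sigma2 / np.maximum(n, 1.0), np.inf)

variances = pixel_noise_variance(np.array([1, 4, 0]))
```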
Missing data are a common problem for galaxy surveys because foreground objects that obscure the background galaxies have to be masked out. In addition to the non-stationary noise, observed shear fields therefore also present missing data. In our simulated data, we mimicked these conditions by removing all galaxies in given regions; there, no shear measurements are available, and the noise variance is effectively infinite. The top row of Fig. A.1 shows a simulated convergence map with the DES SV footprint and mask, and the bottom row shows two simulated observed shear component maps, γ_1 and γ_2.

Denoting by M the mask, equal to 1 where we have shear measurements at a pixel position and 0 otherwise, missing data can be handled in Wiener filtering by forcing the noise covariance matrix to be very high at locations where M = 0. This causes the solution to be different from zero and smooth in the missing-data area, and it removes border-effect artefacts. Wiener filtering can therefore be seen as an inpainting technique because it fills the missing area in the image. Alternative inpainting techniques were proposed in the past through sparse recovery (Pires et al. 2009; Lanusse et al. 2016) or Gaussian constrained realisations (Zaroubi et al. 1995; Jeffrey et al. 2018a). Because the Wiener model assumes the solution to be a Gaussian random field, it makes sense for the inpainted area of the solution to present the same statistical properties as the non-inpainted area. This property is verified by construction with constrained realisations, and it was shown that this is also the case with sparse inpainting (Pires et al. 2009). A similar sparse inpainting can easily be included in the proximal Wiener filtering by minimising

min_κ (γ − Aκ)^t Σ_n^{−1} (γ − Aκ) + β κ^t Σ_κ^{−1} κ + λ ‖Φ^t κ‖_p,  (B.1)

where p = 0 or 1, Φ is the discrete cosine dictionary (DCT), and λ is a Lagrangian parameter.
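The mask handling described above (inflating the noise covariance where M = 0) can be sketched as follows; `big` is an arbitrarily large value standing in for an effectively infinite variance:

```python
import numpy as np

def masked_noise_covariance(noise_var_map, mask, big=1e12):
    """Diagonal noise covariance with effectively infinite variance outside the mask.

    mask = 1 where shear measurements exist and 0 otherwise; masked pixels
    get the huge variance `big`, so the Wiener solution ignores their data.
    """
    return np.where(np.asarray(mask, dtype=bool), noise_var_map, big)

mask = np.array([[1, 1], [0, 1]])
var = np.full((2, 2), 0.02)
cov = masked_noise_covariance(var, mask)
```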
We obtain the forward-backward algorithm, called InpWiener:
- Forward step: t = κ_n + μ A^t Σ_n^{−1} (γ − A κ_n),
- Backward step: κ_{n+1} = M W t + (1 − M) ∆_{Φ,λ} t,
where W is the Wiener operator applied in Fourier space and ∆_{Φ,λ} is the proximal operator defined in Pires et al. (2009), which consists of applying a DCT transform to its argument, thresholding the DCT coefficients, and reconstructing an image from the thresholded coefficients. Areas for which we possess information are processed as in the usual Wiener-filter case, while the inpainting regularisation affects areas with missing data (i.e. where M = 0). Concerning Eq. (B.1), it is interesting to note the following relations between InpWiener and other methods: -KS: if β = 0, λ = 0, and Σ_n is diagonal with constant values along the diagonal (i.e. stationary Gaussian noise), InpWiener leads to the non-iterative standard Kaiser-Squires solution.
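A minimal sketch of such a forward-backward iteration, assuming for simplicity that the lensing operator is the identity (pure denoising) and implementing ∆_{Φ,λ} with SciPy's DCT; the step size `mu` and the `wiener` callable are illustrative placeholders:

```python
import numpy as np
from scipy.fft import dctn, idctn

def prox_dct(u, lam):
    """Delta_{Phi,lam}: DCT transform, hard-threshold the coefficients, inverse DCT."""
    alpha = dctn(u, norm="ortho")
    alpha[np.abs(alpha) < lam] = 0.0
    return idctn(alpha, norm="ortho")

def inp_wiener(gamma, noise_var, mask, wiener, lam, n_iter=50, mu=1.0):
    """Forward-backward iteration: gradient step on the data fidelity, then
    Wiener filtering inside the mask and DCT inpainting outside it."""
    kappa = np.zeros_like(gamma)
    inv_var = mask / np.maximum(noise_var, 1e-12)
    for _ in range(n_iter):
        t = kappa + mu * inv_var * (gamma - kappa)                # forward step
        kappa = mask * wiener(t) + (1 - mask) * prox_dct(t, lam)  # backward step
    return kappa
```

With `wiener` set to the identity, this sketch behaves like the GIKS-style inpainting mentioned in the relations above.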
-GIKS: if β = 0, InpWiener leads to an inpainted generalised Kaiser-Squires solution, where the InpWiener forward step is unchanged and the backward step becomes κ_{n+1} = M t + (1 − M) ∆_{Φ,λ} t. (B.5) Similarly to Sects. 3.5 and 3.5, these algorithms can handle data on the sphere and jointly reconstruct E and B modes.
Inpainted Wiener-filter experiment. To test InpWiener, we used the public MICE (v2) simulated galaxy catalogue presented in Appendix A. Figure B.1 shows the Wiener-filter solution (left) and the inpainted Wiener-filter solution (right) derived from the shear components shown in Fig. A.1.

B.2. Agnostic Wiener filtering
The Wiener-filter method needs to know the theoretical power spectrum P_κ, and the solution therefore varies with the assumed cosmological model used to derive P_κ. To avoid this issue, one solution is to estimate P_κ directly from the shear measurements, for instance using a mask correction as in Upham et al. (2020). Bayesian techniques have also been used to jointly infer the map and the power spectrum (Wandelt et al. 2004; Jasche & Lavaux 2014; Alsing et al. 2016b). An alternative approach is to use the GIKS inpainting algorithm to first fill the missing area and then compute the power spectrum of the inpainted map. When GIKS is applied to the data and to a set of R noise realisations, the final estimator is

P̂_κ = powspec(κ_Data) − (1/R) Σ_{i=1}^{R} powspec(κ_Rea,i).  (B.6)

Because this estimate of P_κ will still be noisy, a final denoising step or a function fit can be applied.
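The estimator of Eq. (B.6) can be sketched as follows; the `powspec` helper here is a simple azimuthally averaged periodogram for flat square maps, standing in for whatever power spectrum estimator is actually used:

```python
import numpy as np

def powspec(kappa):
    """Azimuthally averaged power spectrum of a square map (radial integer binning)."""
    n = kappa.shape[0]
    p2d = np.abs(np.fft.fft2(kappa)) ** 2 / kappa.size
    freq = np.fft.fftfreq(n) * n
    k = np.hypot(*np.meshgrid(freq, freq, indexing="ij")).astype(int)
    counts = np.bincount(k.ravel())
    sums = np.bincount(k.ravel(), weights=p2d.ravel())
    return np.where(counts > 0, sums / np.maximum(counts, 1), 0.0)

def agnostic_pk(kappa_data, noise_reconstructions):
    """Eq. (B.6): data power spectrum minus the mean noise power spectrum."""
    pn = np.mean([powspec(k) for k in noise_reconstructions], axis=0)
    return powspec(kappa_data) - pn
```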
As an illustration, we fitted the function f(k; a, u, e, c) = exp(|a (u k)^(−e)|) + c to the estimated noisy P_κ. Figure B.2 shows an example of a power spectrum estimated with this approach.
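A fit of this form can be sketched with scipy.optimize.curve_fit; the synthetic data, starting point, and bounds below are purely illustrative:

```python
import numpy as np
from scipy.optimize import curve_fit

def f(k, a, u, e, c):
    """Fitting function exp(|a * (u*k)^(-e)|) + c for the noisy power spectrum."""
    return np.exp(np.abs(a * (u * k) ** (-e))) + c

# synthetic noisy "power spectrum" generated from assumed true parameters
k = np.arange(1.0, 200.0)
true_params = (0.5, 0.1, 0.8, 1e-3)
rng = np.random.default_rng(2)
pk_noisy = f(k, *true_params) * (1.0 + 0.05 * rng.normal(size=k.size))

# positivity bounds on a, u, e keep (u*k)^(-e) well defined during the fit
popt, _ = curve_fit(f, k, pk_noisy, p0=(1.0, 0.2, 1.0, 0.0),
                    bounds=([1e-3, 1e-3, 0.1, -1.0], [10.0, 10.0, 5.0, 1.0]))
pk_fit = f(k, *popt)
```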
Experiment: Effect of an unknown theoretical power spectrum. In this experiment, we used the same public MICE simulations and extracted 18 different shear maps with different noise realisations. For each of them, we applied the forward-backward Wiener-filter algorithm using the true theoretical power spectrum, and we applied the inpainted agnostic Wiener-filter method to the same data, re-estimating the theoretical power spectrum for each of the 18 shear data sets. Figure B.3 shows the reconstruction errors at different resolutions for Kaiser-Squires, the Wiener filter, and the inpainted agnostic Wiener filter. The inpainting clearly has no effect on the reconstruction error, which is expected because only the area where the mask is equal to one is used in the error calculation, and because the agnostic approach has very little effect on the final solution. We do not claim that an agnostic approach should be used whenever Wiener filtering is applied, but it is interesting to have this option available, for example to run Wiener filtering without any assumption about the cosmology, or to verify whether cosmological priors affect the results.

Fig. B.2. Theoretical power spectrum of one convergence map. In black, the true theoretical power spectrum; in red, the spectrum estimated using the GIKS algorithm, corrected for the noise power spectrum; and in blue, the fit.