Starlet l1-norm for weak lensing cosmology

We present a new summary statistic for weak lensing observables, higher than second order, suitable for extracting non-Gaussian cosmological information and inferring cosmological parameters. We name this statistic the 'starlet $\ell_1$-norm' as it is computed via the sum of the absolute values of the starlet (wavelet) decomposition coefficients of a weak lensing map. In comparison to the state-of-the-art higher-order statistics -- weak lensing peak counts and minimum counts, or the combination of the two -- the $\ell_1$-norm provides a fast multi-scale calculation of the full void and peak distribution, avoiding the problem of defining what a peak is and what a void is: The $\ell_1$-norm carries the information encoded in all pixels of the map, not just the ones in local maxima and minima. We show its potential by applying it to the weak lensing convergence maps provided by the MassiveNus simulations to get constraints on the sum of neutrino masses, the matter density parameter, and the amplitude of the primordial power spectrum. We find that, in an ideal setting without further systematics, the starlet $\ell_1$-norm remarkably outperforms commonly used summary statistics, such as the power spectrum or the combination of peak and void counts, in terms of constraining power, representing a promising new unified framework to simultaneously account for the information encoded in peak counts and voids. We find that the starlet $\ell_1$-norm outperforms the power spectrum by $72\%$ on M$_{\nu}$, $60\%$ on $\Omega_{\rm m}$, and $75\%$ on $A_{\rm s}$ for the Euclid-like setting considered; it also improves upon the state-of-the-art combination of peaks and voids for a single smoothing scale by $24\%$ on M$_{\nu}$, $50\%$ on $\Omega_{\rm m}$, and $24\%$ on $A_{\rm s}$.


Introduction
Weak gravitational lensing by the large-scale structure represents a powerful tool for estimating cosmological parameters. Past, present, and future cosmological surveys, such as the Canada-France-Hawaii Telescope Lensing Survey (CFHTLenS) (Heymans et al. 2012), the Kilo-Degree Survey (KiDS) (Heymans et al. 2020), the Dark Energy Survey (DES) (Abbott et al. 2019; To et al. 2020), Hyper Suprime-Cam (HSC) (Mandelbaum & Hyper Suprime-Cam (HSC) Collaboration 2017), Euclid (Laureijs et al. 2011), and the Rubin Observatory (LSST Science Collaboration et al. 2009), consider it one of the main physical probes for investigating unsolved questions in current cosmology, such as: what the properties of the dark components of the universe are, what the origin of its accelerated expansion is (Riess et al. 1998; Perlmutter et al. 1999), and what the sum of neutrino masses is (Lesgourgues & Pastor 2012). It is well known that, in the context of weak lensing, second-order statistics such as the two-point correlation function or its Fourier transform (the power spectrum) do not capture the non-Gaussian information encoded in the non-linear features of weak lensing data (Weinberg et al. 2013). This has motivated the introduction of several higher-order statistics, such as Minkowski functionals (Kratochvil et al. 2012; Petri et al. 2015; Vicinanza et al. 2019; Marques et al. 2019; Parroni et al. 2020), higher-order moments (Vicinanza et al. 2018; Peel et al. 2018; Chang et al. 2018; Gatti et al. 2020), the bispectrum (Takada & Jain 2004; Chan & Blot 2017; Coulton et al. 2019), peak counts (Kruse & Schneider 1999; Kratochvil et al. 2010; Dietrich & Hartlap 2010; Maturi et al. 2011; Pires et al. 2012; Hamana et al. 2012; Hilbert et al. 2012; Marian et al. 2012, 2013; Martinet et al. 2015; Lin & Kilbinger 2015; Giocoli et al. 2018; Martinet et al. 2018; Peel et al. 2018; Li et al. 2019), the scattering transform (Cheng et al. 2020), wavelet phase harmonic statistics (Allys et al. 2020), and machine learning-based methods (Fluri et al. 2018a; Peel et al. 2019; Gupta et al. 2018; Rasmussen & Williams 2005; Ribli et al. 2019; Shirasaki et al. 2019), to account for non-Gaussian information in cosmological analysis. Focusing on peak counts, it has been shown that this statistic is particularly powerful in breaking the degeneracy between the standard model and fifth forces in the dark sector (Peel et al. 2018), as well as in constraining cosmological parameters when employed in a multi-scale setting (Lin et al. 2016; Fluri et al. 2018b; Ajani et al. 2020; Zürcher et al. 2020). In particular, Ajani et al. (2020) have shown that multi-scale peak counts significantly outperform the weak lensing power spectrum, improving the constraints on the sum of neutrino masses $m_\nu \equiv M_\nu$ by 63% when using a starlet filter; multi-scale peak counts were also shown to be so constraining that the addition of the power spectrum does not further improve the constraints. A very interesting feature of multi-scale peaks, when they are obtained using the starlet transform (Starck et al. 2007), is the behaviour of the covariance matrix, which tends to encode all information in its diagonal elements.
Another weak lensing probe of the large-scale structure is represented by cosmic voids, namely under-dense regions of the large-scale matter field (Colberg et al. 2008; Pisani et al. 2019). Local minima of weak lensing convergence maps, namely pixels with values smaller than their eight neighbouring pixels, have been proposed as tracers of voids in the matter distribution to infer cosmological parameters, both in a mono-scale setting (Coulton et al. 2020; Martinet et al. 2020; Davies et al. 2020) and in a multi-scale setting (Zürcher et al. 2020). More specifically, Coulton et al. (2020) found that lensing minima alone are slightly less constraining than peaks alone and, in agreement with Martinet et al. (2020) and Zürcher et al. (2020), that the combination of the two statistics produces significantly tighter constraints than the power spectrum.
In this paper, we propose, for the first time, using the $\ell_1$-norm of the wavelet coefficients of weak lensing convergence maps. We show that it provides a unified framework for a joint multi-scale peak and void analysis and that it takes into account the information encoded in all pixels of the map.
This letter is structured as follows: In Sect. 2, we define the $\ell_1$-norm. In Sect. 3, we specify the details of the analysis, defining the summary statistics employed and the corresponding choice of filters and binning. We show how we compute the covariance matrix, how we build the likelihood, show the posterior distributions, and define the estimators used to quantify our results. In Sect. 4, we describe our results, and we draw our conclusions in Sect. 5. We dedicate Appendix A to background definitions in weak lensing and to the details of the simulations. In Appendix B, we discuss features of the covariance matrices.

Starlet peaks
Multi-scale peak counts can be derived using either a set of Gaussian kernels of different sizes or a wavelet decomposition, such as the starlet transform (Ajani et al. 2020). The starlet transform decomposes a convergence map $\kappa$ of size $N \times N$ into a set $W = \{w_1, ..., w_{j_{\max}}, c_J\}$ of $J = j_{\max} + 1$ bands of the same size, where $j_{\max}$ is the number of wavelet scales considered, $c_J$ is the coarse scale, namely a very smoothed version of the original image $\kappa$, and the $w_j$ are the wavelet bands at scale $2^j$ pixels. A complete description and derivation of the starlet transform algorithm can be found in Starck et al. (2007, 2016). It was also shown in Leonard et al. (2012) that the starlet transform can be interpreted as a fast multi-scale aperture mass decomposition where the aperture mass kernels have compact support and are compensated functions (namely, the kernel integral is null). Starlet peaks are then derived by considering $n$ bins with bin edges given by the minimum and maximum values of each band in $W$. An interesting advantage of such an approach is that each wavelet band covers a different frequency range, which leads to an almost diagonal peak count covariance matrix (Ajani et al. 2020). This is not the case when a standard multi-scale Gaussian analysis is applied to the convergence map.
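As a concrete illustration of this decomposition, the following is a minimal sketch of the isotropic undecimated (à trous) starlet transform with the B3-spline filter, written from the description in Starck et al. (2007); it is not the authors' own implementation, and the boundary handling is an assumption.

```python
import numpy as np
from scipy.ndimage import convolve1d

def starlet_transform(image, n_scales=4):
    """Isotropic undecimated (a trous) wavelet transform with the B3-spline
    kernel. Returns [w_1, ..., w_{n_scales-1}, c_J]: the detail bands plus
    the coarse scale, all the same size as the input image."""
    h = np.array([1., 4., 6., 4., 1.]) / 16.  # B3-spline filter taps
    c = image.astype(float)
    bands = []
    for j in range(n_scales - 1):
        # dilate the filter by inserting 2^j - 1 zeros between taps
        hj = np.zeros(len(h) + (len(h) - 1) * (2**j - 1))
        hj[::2**j] = h
        smoothed = convolve1d(convolve1d(c, hj, axis=0, mode='mirror'),
                              hj, axis=1, mode='mirror')
        bands.append(c - smoothed)   # wavelet band w_{j+1}
        c = smoothed
    bands.append(c)                  # coarse scale c_J
    return bands
```

By construction the bands sum back exactly to the original map, which is the property that makes the decomposition a simple re-partitioning of the image information across frequency ranges.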

Starlet extrema
As mentioned in the introduction, cosmic void analysis is an alternative to peak analysis for studying convergence maps, and the combination of the two improves the constraints on cosmological parameters. It is interesting to notice how a starlet decomposition can naturally include a multi-scale void analysis. Instead of extracting only the maxima (peaks) in each band, we can also extract the minima (pixels with values smaller than their eight neighbours), and a joint multi-scale peak-void analysis is therefore obtained by extracting the wavelet coefficient extrema (minima + maxima). The starlet decomposition therefore provides a very natural framework for a joint multi-scale peak and void analysis.
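The extraction of local extrema described above (pixels strictly larger or smaller than their eight neighbours) can be sketched as follows; the use of scipy filters and the border handling are implementation assumptions.

```python
import numpy as np
from scipy.ndimage import maximum_filter, minimum_filter

def find_extrema(band):
    """Return boolean masks of the local maxima and minima of a 2D array,
    defined as pixels strictly larger (smaller) than their 8 neighbours."""
    footprint = np.ones((3, 3), dtype=bool)
    footprint[1, 1] = False  # exclude the centre: compare to neighbours only
    neigh_max = maximum_filter(band, footprint=footprint, mode='nearest')
    neigh_min = minimum_filter(band, footprint=footprint, mode='nearest')
    return band > neigh_max, band < neigh_min
```

Applied to each wavelet band of the starlet decomposition, the two masks give the multi-scale peaks and voids whose combination defines the starlet extrema.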

Starlet $\ell_1$-norm
A particularity of peak and void statistics is that only a few pixels are considered, while other higher-order statistics, such as the bispectrum or Minkowski functionals, use all the pixels. In a starlet framework, we should emphasise that starlet peaks have mainly positive values and starlet voids negative values, due to the properties of the wavelet function. So instead of counting the number of peaks or voids in a given bin $i$ defined by two values, $B_i$ and $B_{i+1}$, we can take the sum of all wavelet coefficients with an amplitude between $B_i$ and $B_{i+1}$. If $B_i$ and $B_{i+1}$ are positive, this corresponds to the definition of the set of coefficients $S_{j,i}$ at scale $j$ and in bin $i$ such that

$$ S_{j,i} = \left\{ w_j[k] \, : \, B_i < w_j[k] \leq B_{i+1} \right\} . $$

This can be generalised to positive and negative bins using

$$ l^1_{j,i} = \| S_{j,i} \|_1 = \sum_{u=1}^{\#\mathrm{coef}(S_{j,i})} \left| S_{j,i}[u] \right| , \qquad (1) $$

where $\|\cdot\|_1$ is the standard $\ell_1$-norm (i.e. $\|x\|_1 = \sum_k |x_k|$) and the index $u$ runs from 1 to the number of pixels in a given bin $i$ at scale $j$ (i.e. $\#\mathrm{coef}(S_{j,i})$). The quantity $l^1_{j,i}$ defined in Equation (1) is nothing more than the $\ell_1$-norm of the binned pixel values of the starlet coefficients of the original image $\kappa$. In the following, we name $\mathcal{S}_1$, the starlet $\ell_1$-norm, the set of all $l^1_{j,i}$ values obtained from the different scales $j$ and bins $i$.
This approach enables us to extract the information encoded in the absolute values of all pixels in the map, instead of characterising it only by selecting local minima or maxima. An interesting advantage is that it avoids the open issue of how to define a void (Colberg et al. 2008). It is interesting to notice that this $\mathcal{S}_1$ statistic is also directly related to the probability density function of the starlet coefficients at different scales.
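Equation (1) reduces to a binned sum of absolute coefficient values per scale. A minimal numpy sketch, assuming bins linearly spaced between each band's minimum and maximum (the binning convention used elsewhere in the text):

```python
import numpy as np

def starlet_l1_norm(bands, n_bins=29):
    """Starlet l1-norm: for each wavelet band, bin the coefficient values
    between the band's min and max, and sum the absolute values of the
    coefficients falling in each bin (Eq. 1). Returns the concatenated
    per-scale, per-bin l1 values."""
    stats = []
    for w in bands:
        edges = np.linspace(w.min(), w.max(), n_bins + 1)
        # assign every pixel to a bin (the top edge is folded into the last bin)
        idx = np.clip(np.digitize(w.ravel(), edges) - 1, 0, n_bins - 1)
        l1 = np.bincount(idx, weights=np.abs(w.ravel()), minlength=n_bins)
        stats.append(l1)
    return np.concatenate(stats)
```

Because every pixel falls into some bin, the per-scale values sum to the total $\ell_1$-norm of that band, which makes explicit that no pixel is discarded.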

Summary statistics
We provided constraints on the sum of neutrino masses $M_\nu$, on the matter density parameter $\Omega_{\rm m}$, and on the power spectrum amplitude $A_{\rm s}$ by employing five different statistics applied to the MassiveNuS simulations (Liu et al. 2018). Three of them are statistics commonly used in weak lensing studies: the power spectrum, combined mono-scale peak and void counts, and multi-scale peak counts (using the starlet transform). The two others are the starlet extrema and the starlet $\ell_1$-norm $\mathcal{S}_1$ that we introduced in the previous sections.
We computed the summary statistics on maps of the signal-to-noise field, where we define the signal-to-noise ratio as the noisy convergence $\kappa_n$ convolved with the filter $W(\theta_{\rm ker})$, divided by the smoothed standard deviation of the noise, for each realisation per redshift:

$$ \frac{S}{N} = \frac{\left( \kappa_n \ast W \right)(\theta_{\rm ker})}{\sigma_n^{\rm filt}} , \qquad (2) $$

where $W(\theta_{\rm ker})$ can be a single Gaussian filter or a starlet filter. The definition of $\sigma_n^{\rm filt}$ depends on the employed filter. For a Gaussian kernel, it is given by the standard deviation of the smoothed noise maps, while for the starlet case we need to estimate the noise at each wavelet scale for each image per redshift. To estimate the noise level at each starlet scale, we followed Starck & Murtagh (1998) and used the fact that the standard deviation of the noise at scale $j$ is given by $\sigma_j = \sigma^e_j \sigma_I$, where $\sigma_I$ is the standard deviation of the noise of the image and the $\sigma^e_j$ are the coefficients obtained by taking the standard deviation, at each scale $j$, of the starlet transform of a Gaussian distribution with standard deviation one. For the purposes of this study, we mimicked the shape noise expected for a survey such as Euclid (https://www.euclid-ec.org; Laureijs et al. 2011; Euclid Collaboration et al. 2020), as defined in Appendix A. We performed a tomographic analysis using four source redshifts, $z_s = \{0.5, 1.0, 1.5, 2.0\}$, with corresponding values for the galaxy number density per source redshift bin $n_{\rm gal} = \{11.02, 11.90, 5.45, 1.45\}$.
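For the Gaussian-filter case, the construction of the $S/N$ map in Equation (2) can be sketched as follows; estimating the smoothed-noise standard deviation empirically from an independent noise realisation is an assumption about the exact procedure.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def snr_map(kappa_noisy, sigma_noise_pix, theta_ker_pix, seed=0):
    """Signal-to-noise map for a Gaussian kernel: the noisy convergence
    smoothed with the filter, divided by the standard deviation of the
    smoothed noise (estimated here from a white-noise realisation)."""
    smoothed = gaussian_filter(kappa_noisy, theta_ker_pix)
    noise = np.random.default_rng(seed).normal(
        0.0, sigma_noise_pix, kappa_noisy.shape)
    sigma_filt = gaussian_filter(noise, theta_ker_pix).std()
    return smoothed / sigma_filt
```

For the starlet case, the same normalisation is applied band by band, using the per-scale noise levels $\sigma_j$ described above instead of a single $\sigma_n^{\rm filt}$.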
A: We computed the power spectrum, as defined in Equation (A.8), on noisy convergence maps filtered with a Gaussian kernel with smoothing size $\theta_{\rm ker} = 1$ arcmin. We considered angular scales with 24 logarithmically spaced bins in the range $\ell = [300, 5000]$.

B: For mono-scale peaks and voids, we computed peaks (pixels with values larger than their eight neighbours) and voids (pixels with values smaller than their eight neighbours) on noisy convergence maps. The maps were filtered with a single Gaussian kernel with smoothing size $\theta_{\rm ker} = 2$ arcmin, with 29 linearly spaced bins for peaks between the minimum and maximum of the map in $S/N$ and 29 linearly spaced bins for voids between the negative maximum and positive minimum of the maps in $S/N$.

C: Starlet peak counts were computed as pixels with values larger than their eight neighbours. They were computed on noisy convergence maps filtered with a starlet kernel with four scales, corresponding to $[1.6', 3.2', 6.4', \mathrm{coarse}]$, with 29 linearly spaced bins for each scale between the minimum and maximum values of each $S/N$ map.

D: Starlet extrema were obtained by combining starlet peaks computed on maps with only the $S/N > 0$ contribution with starlet voids computed on maps with only the $S/N < 0$ contribution. We used the same starlet as in (C), with 58 linearly spaced bins (29 for peaks and 29 for voids).
E: The starlet $\ell_1$-norm $\mathcal{S}_1$ was computed following Equation (1) on noisy convergence maps filtered with a starlet kernel with four scales. We followed the same binning as for (C) and (D). Statistics (D) and (E) are our new proposals. In all statistics where we employed the starlet decomposition, the finest frequency we considered was $\theta_{\rm ker} = 1.6$ arcmin, corresponding to the maximum angular scale $\ell_{\max} = 2149$.

Covariance matrices
The MassiveNuS suite of simulations (described in more detail in Appendix A) also includes a cosmology with massless neutrinos, $\{M_\nu, \Omega_{\rm m}, 10^9 A_{\rm s}\} = \{0.0, 0.3, 2.1\}$, obtained from initial conditions different from those of the massive neutrino simulations. We used this additional simulation set to compute the covariance matrix of the observables. The covariance matrix elements are computed as

$$ C_{ij} = \frac{1}{N-1} \sum_{r=1}^{N} \left( x^r_i - \mu_i \right) \left( x^r_j - \mu_j \right) , \qquad (3) $$

where $N$ is the number of observations (in this case, the 10000 realisations), $x^r_i$ is the value of the considered observable in the $i$-th bin for a given realisation $r$, and $\mu_i$ is the mean of the observable in a given bin over all the realisations. Furthermore, we took into account the loss of information due to the finite number of bins and realisations by adopting the estimator introduced by Hartlap et al. (2007) for the inverse of the covariance matrix:

$$ \widehat{C^{-1}} = \frac{N - n_{\rm bins} - 2}{N - 1} \, {C^*}^{-1} , \qquad (4) $$

where $N$ is the number of realisations, $n_{\rm bins}$ the number of bins, and $C^*$ the covariance matrix computed for the power spectrum and peak counts, whose elements are given by Equation (3). For illustration, we scaled the covariance to a Euclid-like sky coverage by the factor $f_{\rm map}/f_{\rm survey}$, where $f_{\rm map} = 12.25$ deg$^2$ is the size of the convergence maps and $f_{\rm survey} = 15000$ deg$^2$.
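The covariance estimate and the Hartlap debiasing described above amount to a few lines of numpy; this is a straightforward sketch, with the realisations stacked as rows of a matrix.

```python
import numpy as np

def inv_covariance_hartlap(X):
    """Covariance of summary statistics from N realisations (rows of X),
    and its debiased inverse using the Hartlap et al. (2007) factor
    (N - n_bins - 2) / (N - 1)."""
    N, n_bins = X.shape
    C = np.cov(X, rowvar=False)          # (1/(N-1)) sum over realisations
    alpha = (N - n_bins - 2) / (N - 1)   # Hartlap debiasing factor
    return C, alpha * np.linalg.inv(C)
```

Note that the factor requires $N > n_{\rm bins} + 2$; with 10000 realisations and at most a few hundred bins, the correction here is small but not negligible.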

Likelihood
To perform Bayesian inference and obtain the probability distributions of the cosmological parameters, we used a Gaussian likelihood for a cosmology-independent covariance:

$$ \log \mathcal{L}(\theta) = -\frac{1}{2} \left[ d - \mu(\theta) \right]^{T} C^{-1} \left[ d - \mu(\theta) \right] + \mathrm{const} , \qquad (5) $$

where $d$ is the data array, $C$ is the covariance matrix of the observable, and $\mu(\theta)$ is the expected theoretical prediction as a function of the cosmological parameters $\theta$. In our case, the data array is the mean over the (simulated) realisations of each of the separate observables, or of their different combinations, for our fiducial model. The cosmological parameters are the ones for which simulations are available, namely $\{M_\nu, \Omega_{\rm m}, 10^9 A_{\rm s}\}$, but the same statistics can of course be applied to other parameters if simulations allow.
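A minimal sketch of this likelihood evaluation follows; the `emulator` callable standing in for the theoretical prediction $\mu(\theta)$ is a hypothetical placeholder.

```python
import numpy as np

def log_likelihood(theta, d, inv_cov, emulator):
    """Gaussian log-likelihood with a cosmology-independent covariance:
    -0.5 * (d - mu(theta))^T C^{-1} (d - mu(theta)), up to a constant.
    `emulator` maps parameters to the predicted summary statistic."""
    r = d - emulator(theta)
    return -0.5 * r @ inv_cov @ r
```

In practice `inv_cov` would be the Hartlap-debiased inverse covariance, so the debiasing factor enters the likelihood only through this precomputed matrix.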
In order to predict the theoretical values of all observables given a new set of values for the cosmological parameters {M ν , Ω m , 10 9 A s }, we employed an interpolation with Gaussian process regression (Rasmussen & Williams 2005) using the scikit-learn python package. The emulator used in this paper is the same as that employed in Li et al. (2019).
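The interpolation step can be sketched with scikit-learn as below; the kernel choice, training-set shapes, and stand-in data are assumptions for illustration, not the settings of the actual emulator used in the paper.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Stand-in training set: 40 mock cosmologies x 3 parameters (playing the
# role of M_nu, Omega_m, 1e9*A_s rescaled to [0, 1]) and one smooth mock
# statistic value per cosmology.
rng = np.random.default_rng(1)
params = rng.uniform(size=(40, 3))
stats = np.sin(params.sum(axis=1)).reshape(-1, 1)

# Gaussian process regression interpolates the statistic between the
# simulated cosmologies; predictions at new parameter values feed mu(theta).
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5),
                              normalize_y=True).fit(params, stats)
mu = gp.predict(params[:1])  # prediction at a training point
```

With a noise-free training set the GP interpolates the simulated cosmologies almost exactly, which is the behaviour one wants when the simulations themselves define the model.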

Result estimators
To quantify the constraining power of the different summary statistics, we employed the following estimators, which are based on Euclid Collaboration et al. (2020). To have an approximate quantification of the size of the parameter contours, we considered the following figure of merit (FoM):

$$ \mathrm{FoM} = \left[ \det \left( \tilde{F} \right) \right]^{1/n} , \qquad (6) $$

where $\tilde{F}$ is the marginalised Fisher sub-matrix, which we estimated as the inverse of the covariance matrix among the set of cosmological parameters under investigation, and $n$ is the parameter space dimensionality. We show the values of the FoM for our observables in Table 1, which reports Equation (6) for different parameter pairs for each observable employed in the likelihood analysis; in the last column, we provide the 3D FoM, given as the inverse of the volume in $(M_\nu, A_{\rm s}, \Omega_{\rm m})$ space. To estimate the marginalised $1\sigma$ error on a single parameter $\theta_\alpha$ (which means having included all the degeneracies with respect to the other parameters), we used the quantity

$$ \sigma_\alpha = \sqrt{C_{\alpha\alpha}} , \qquad (7) $$

where the $C_{\alpha\alpha}$ are the diagonal elements of the parameter covariance matrix. We show the values of $\sigma_\alpha$ for our observables in Table 2.
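Both estimators follow directly from the parameter covariance matrix estimated from the MCMC samples; the determinant-based FoM normalisation below follows the text's description, and the exact convention should be treated as an assumption.

```python
import numpy as np

def fom_and_errors(param_cov):
    """Figure of merit and marginalised 1-sigma errors from the parameter
    covariance matrix: FoM = det(F)^(1/n) with F = C^{-1}, and
    sigma_alpha = sqrt(C_alpha_alpha)."""
    n = param_cov.shape[0]
    F = np.linalg.inv(param_cov)          # marginalised Fisher sub-matrix
    fom = np.linalg.det(F) ** (1.0 / n)
    sigmas = np.sqrt(np.diag(param_cov))  # marginalised 1-sigma errors
    return fom, sigmas
```

Passing a 2x2 sub-matrix gives the pairwise FoM of Table 1, and the full 3x3 matrix gives the 3D FoM (the inverse of the volume in parameter space).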

MCMC simulations and posterior distributions
We explored and constrained the parameter space with the emcee package, a python implementation of the affine-invariant ensemble sampler for Markov chain Monte Carlo (MCMC) introduced by Foreman-Mackey et al. (2013). We assumed a flat prior, a Gaussian likelihood function as defined in Sect. 3.3, and a model-independent covariance matrix as discussed in Sect. 3.2. The walkers were initialised in a tiny Gaussian ball of radius $10^{-3}$ around the fiducial cosmology $[M_\nu, \Omega_{\rm m}, 10^9 A_{\rm s}] = [0.1, 0.3, 2.1]$, and we estimated the posterior using 120 walkers. Our chains are stable against the length of the chain, and we verified their convergence by employing the Gelman-Rubin diagnostic.

Results
We now illustrate forecast results on the sum of neutrino masses $M_\nu$, on the matter density parameter $\Omega_{\rm m}$, and on the power spectrum amplitude $A_{\rm s}$ for a survey with Euclid-like noise in a tomographic setting with four source redshifts, $z_s = [0.5, 1.0, 1.5, 2.0]$. We compare results for the different observables defined in Sect. 3.1 and investigate the impact of the choice of the filter on the covariance matrix. In Fig. 1, we show the comparison between the constraints obtained using the different summary statistics, as described in Sect. 3.1. As expected, we see that all higher-order statistics are more constraining than the power spectrum. The new result of this letter is represented by the starlet $\ell_1$-norm: The inclusion of all pixels enables us to retrieve tighter constraints than the combination of local minima and maxima (voids+peaks). Specifically, for all parameter space planes, the $\ell_1$-norm FoM values, shown in Table 1, are about twice as large as those for the combined local minima and maxima (and more than seven times larger than the power spectrum FoM values).
In this work, we have also introduced starlet extrema as a new summary statistic to constrain the parameters. Similarly to the $\ell_1$-norm, starlet extrema are computed between the minimum and maximum $S/N$ values of the map, but they are defined as the combination of local maxima computed on $S/N > 0$ and local minima computed on $S/N < 0$ (namely, they do not encode the information present in all pixels). We see that the starlet extrema FoM values are larger than those of starlet peaks and mono-scale peaks+voids, suggesting that starlet extrema can be a good multi-scale higher-order statistic candidate. However, the $\ell_1$-norm remains the statistic that performs best in terms of constraining power with respect to all the summary statistics we have considered. In Fig. 2, we show the marginalised constraints on each cosmological parameter corresponding to the different observables. To compare the improvement obtained by employing the different statistics, we computed the $1\sigma$ marginalised error for each parameter, as summarised in Table 2. We find that the starlet $\ell_1$-norm outperforms the power spectrum by 72% on $M_\nu$, 60% on $\Omega_{\rm m}$, and 75% on $A_{\rm s}$, and the state-of-the-art peaks+voids for a single smoothing scale by 24% on $M_\nu$, 50% on $\Omega_{\rm m}$, and 24% on $A_{\rm s}$. Starlet extrema outperform the power spectrum by 72% on $M_\nu$, 20% on $\Omega_{\rm m}$, and 70% on $A_{\rm s}$. We also quantify the improvement provided by the $\ell_1$-norm with respect to our previous study (Ajani et al. 2020), finding that the starlet $\ell_1$-norm outperforms starlet peaks by 28% on $M_\nu$, 33% on $\Omega_{\rm m}$, and 20% on $A_{\rm s}$. In Appendix B, we also compare the covariance matrices obtained when using starlet extrema and the $\ell_1$-norm, and we find that starlet extrema present a more diagonal covariance for the observable.

Conclusions
In this letter, we have proposed using the starlet $\ell_1$-norm statistic on weak lensing convergence maps to constrain cosmological parameters. The measure of multi-scale peak amplitudes can be seen as a measure of the $\ell_1$-norm of a subset of positive wavelet coefficients. Similarly, the measure of void amplitudes can be seen as a measure of the $\ell_1$-norm of a subset of negative wavelet coefficients. Wavelets therefore provide a great framework for a joint peak and void analysis, in which information from all wavelet coefficients is included. We therefore propose using a very simple $\ell_1$-norm statistic, defined as the sum of the $\ell_1$-norms of all coefficients in a given $S/N$ (pixel) bin for each wavelet scale, as defined in Equation (1). We investigated the impact of employing the starlet $\ell_1$-norm as a summary statistic computed on weak lensing convergence maps to estimate cosmological parameters, and we find that the $\ell_1$-norm outperforms the two state-of-the-art summary statistics, the power spectrum and the combination of mono-scale peaks and voids, by, respectively, 72% and 24% on $M_\nu$, 60% and 50% on $\Omega_{\rm m}$, and 75% and 24% on $A_{\rm s}$. We have further proposed starlet extrema and compared them to the $\ell_1$-norm: In this case as well, the latter performs better in terms of constraining power, within the current ideal setting, while the former present the advantage of a more diagonal covariance matrix. We are aware that statistical power alone is not sufficient to serve as a robust probe for precision cosmology; for their usage, it will be important to test how these statistics react in a non-ideal setting and how their performance is impacted by systematics in the signal. We will dedicate a future study to this aspect.
We conclude that the new statistic proposed here presents several advantages in the context of cosmological parameter inference: It provides a fast calculation of the full void and peak distribution; it does not rely on a specific definition of peaks and voids or some arbitrary threshold; it instead encodes the information of the entire pixel distribution, without excluding pixels that are not local minima or maxima; and it leads to tighter constraints, at least within an ideal setting. The starlet decomposition therefore provides a very powerful framework for a joint multi-scale peak and void analysis.

Appendix A

[...] and shifting the spatial planes. Each $\kappa$ map has $512^2$ pixels, corresponding to a 12.25 deg$^2$ total angular area, covering the range $\ell \in [100, 37000]$ with a resolution of 0.4 arcmin. We mimicked Euclid-like shape noise at each source redshift assuming Gaussian noise with mean zero and variance

$$ \sigma_n^2 = \frac{\sigma_\epsilon^2}{n_{\rm gal} A_{\rm pix}} , $$

where we set the dispersion of the ellipticity distribution to $\sigma_\epsilon = 0.3$ and the pixel area is given by $A_{\rm pix} \simeq 0.16$ arcmin$^2$.
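Using the ellipticity dispersion, pixel area, and galaxy number densities quoted above, the per-pixel noise level in each tomographic bin follows directly; the normalisation of the variance formula is taken as stated, without additional factors, which is an assumption about the convention.

```python
import numpy as np

# Euclid-like shape noise per pixel for each source redshift bin, using
# sigma_n^2 = sigma_e^2 / (n_gal * A_pix) with sigma_e = 0.3 and
# A_pix ~ 0.16 arcmin^2 (0.4 arcmin pixels).
sigma_e = 0.3
A_pix = 0.4 ** 2                               # pixel area in arcmin^2
n_gal = np.array([11.02, 11.90, 5.45, 1.45])   # gal / arcmin^2 per z_s bin
sigma_n = sigma_e / np.sqrt(n_gal * A_pix)
```

As expected, the highest-redshift bin, with the sparsest galaxy sample, carries the largest per-pixel noise.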