Separation of dust emission from the cosmic infrared background in Herschel observations with wavelet phase harmonics

The low-brightness dust emission at high Galactic latitudes is of interest with respect to studying the interplay among the physical processes involved in shaping the structure of the interstellar medium (ISM), as well as in statistical characterizations of the dust emission as a foreground to the cosmic microwave background (CMB). Progress in this avenue of research has been hampered by the difficulty related to separating the dust emission from the cosmic infrared background (CIB). We demonstrate that the dust and CIB may be effectively separated based on their different structure on the sky and we use the separation to characterize the structure of diffuse dust emission on angular scales, where the CIB is a significant component in terms of power. We used scattering transform statistics, wavelet phase harmonics (WPH) to perform a statistical component separation using Herschel SPIRE observations. This component separation is done only from observational data using non-Gaussian properties as a lever arm and is done at a single 250 µ m frequency. This method, which we validated on mock data, gives us access to non-Gaussian statistics of the interstellar dust and an output dust map that is essentially free from CIB contamination. Our statistical modeling characterizes the non-Gaussian structure of the diffuse ISM down to the smallest scales observed by Herschel . We recovered the power law shape of the dust power spectrum up to k = 2 arcmin − 1 , where the dust signal represents 2% of the total power. Going beyond the standard power spectra analysis, we show that the non-Gaussian properties of the dust emission are not scale-invariant. The output dust map reveals coherent structures at the smallest scales, which had been hidden by the CIB anisotropies. This aspect opens up new observational perspectives on the formation of structure in the diffuse ISM, which we discuss here in reference to a previous work. We have succeeded in performing a statistical separation from the observational data at a single frequency by using non-Gaussian statistics.


Introduction
Thermal emission from interstellar dust and the cosmic infrared background (CIB) are the two main emission components of the sky at far-infrared and sub-millimeter wavelengths.The emission from dust is a tracer of diffuse interstellar matter imaged for the first time across the sky by the IRAS space mission (Gautier et al. 1992;Miville-Deschênes et al. 2007).More recently, the Herschel space mission has provided data at a higher angular resolution, which are useful in characterizing the filamentary structure of molecular clouds (André et al. 2010;Miville-Deschênes et al. 2010;Robitaille et al. 2019;Yahia et al. 2021).The CIB is the diffuse background emission associated with the dust emission from galaxies integrated over their cosmic evolution (Hauser & Dwek 2001).It provides us with information on galaxy evolution and the large-scale structure of the universe (Knox et al. 2001;Béthermin et al. 2013;Planck Collaboration XXX 2014;Maniyar et al. 2018).Power spectra are the simplest means of analysing far-infrared maps of the sky and statistically distinguishing dust and CIB anisotropies.At high Galactic latitudes, the two sources of emission are entwined and the CIB power spectrum dominates that of the emission from diffuse interstellar matter on angular scales smaller than about 1 • (Miville- Deschênes et al. 2002;Lagache et al. 2007;Viero et al. 2013;Planck Collaboration XXX 2014;Mak et al. 2017).
The separation of the dust emission from CIB anisotropies is a major difficulty, hampering the analysis of the low-brightness emission from the diffuse interstellar medium (ISM).Dust emission at high Galactic latitudes is of specific interest in studying the interplay between thermal instability, interstellar turbulence and magnetic fields in shaping the structure of the ISM (Vázquez-Semadeni et al. 2000;Kritsuk & Norman 2002;Saury et al. 2014;Hennebelle & Inutsuka 2019), as well as to statistically characterize dust emission foreground to the cosmic microwave background (CMB; Jewell 2001;Miville-Deschênes et al. 2007).Several component separation methods were employed to analyze the Planck sky maps.They were based on combinations of frequency maps or modeling of the spectral energy distribution (SED; Planck Collaboration IV 2020).To separate the dust emission from the CIB, we need to follow a different approach using the structure on the sky because the two components have very similar SEDs.The generalized needlet internal linear combination (GNILC) method has led the way in this regard.Planck Collaboration Int.XLVIII (2016a) produced dust emission maps that are corrected for a large part of the CIB anisotropies, but at the expense of losing the structure of the dust emission on small angular scales, where the CIB is dominant, as well as of keeping residual CIB emission on large angular scales where the dust is dominant (Chiang & Ménard 2019).The correlation between dust and H I emission has been extensively used to produce CIB maps but this method does not yield a dust map independent of H I (Boulanger et al. 1996;Planck Collaboration XXX 2014;Lenz et al. 2019).
This paper circumvents these limitations by using scattering transforms (Mallat 2012;Bruna & Mallat 2013).These statistics are similar to convolution neural networks but are written in an explicit mathematical form.They combine convolutions of the input image with wavelets on several oriented scales, with a non-linear operator to capture interactions between scales as specific imprints of non-Gaussian textures (Cheng & Ménard 2021).Based on predefined wavelet filters, these summary statistics can be used to characterize the non-Gaussian texture of images without any learning step.We make use of two variants of scattering transforms: the wavelet scattering transform (WST) and the wavelet phase harmonics (WPH).The WST has been used to analyze synthetic maps of dust total intensity and polarization built from MHD simulations of interstellar clouds in Allys et al. (2019), Régaldo-Saint Blancard et al. (2020), and Saydjari et al. (2021).Allys et al. (2019) also presented a first application to a Herschel observation of dust emission.The WPH statistics were introduced in cosmology to analyze simulations of the largescale structure of the Universe by Allys et al. (2020) who showed that they may be used to synthesize maps reproducing the non-Gaussian texture of the input image.Régaldo-Saint Blancard et al. (2021) used this capability to statistically separate dust emission from data noise in Planck polarization maps, using their different non-Gaussian properties.This paper extends this approach to the separation of dust and CIB emission using Herschel SPIRE observations at a single 250 µm wavelength, performing a statistical component separation relying solely on observational data.Our scientific motivation is twofold.First, we want to demonstrate that dust and CIB may be effectively separated based on their different structure on the sky.Second, we want to use the separation to characterize the structure of diffuse dust emission at high Galactic latitude on angular scales where the CIB is the dominant component in terms of power.
The paper is organized as follows.We present the observations we use in Sect. 2. Our component separation method is introduced in Sect. 3 and validated on mock data in Sect. 4. In Sect.5, we present the results of our dust and CIB component separation based on Herschel SPIRE observations.The output dust map is used to characterize the non-Gaussian structure of diffuse dust emission at high Galactic latitude in Sect.6.The paper results and perspectives are summarized in Sect.7. The paper has five appendices that present a summary of the mathematical notations used in this paper (Appendix A), the specific set of WPH statistics used in the paper (Appendix B), the H I data used to build the mock data (Appendix C), the WPH statistics of dust emission for the mock and Herschel data (Appendix D), and a brief presentation of the reduced wavelet scattering transform (RWST, Appendix E).

Observations
This study makes use of observations obtained with the Spectral and Photometric Imaging REceiver (SPIRE) on the Herschel Space Observatory (Griffin et al. 2010).The SPIRE photometer was used to image the sky emission in three broad spectral bands centred at 250, 350, and 500 µm.We only use the 250 µm (corresponding to a frequency of 1200 GHz) images, which provide an angular resolution of 18 ′′ (full width at half maximum, FWHM, of the beam).
We aim to separate the dust and CIB emissions in an observation of diffuse interstellar matter in the so-called Spider region at a high Galactic latitude.To characterize the CIB statistics, we also made use of a SPIRE observation towards the Lockman Hole (LH), a sky area known to have a small amount of interstellar matter from H I 21 cm observations (Lockman et al. 1986).The Spider observations are presented in Sect.2.1 and the LH one is given in Sect.2.2.

Spider region
The Spider field (named for its prominent "legs" emanating from a central "body," Marchal & Martin 2023, and references within) targets diffuse interstellar matter at high Galactic latitudes along a segment of the North Celestial Pole (NCP) Loop (Heiles 1984(Heiles , 1989;;Meyer & Roth 1991), a conspicuous feature of H I and dust emission extending over ∼30 • in the northern sky.The SPIRE observations of this field were performed in the same way as described by Miville-Deschênes et al. (2010) for the Polaris field.The Spider field was observed twice in different directions.
To have two independent maps (one for each scanning direction) with distinct noise realizations, we use maps from the Level 2 stage of the data processing.These maps are combined at the Level 3 stage of the data processing.We checked that the Level 2 and 3 maps of the Spider region differ only by a signal compatible with instrumental noise.The maps were downloaded from The Herschel Archive1 .We refer to the start guide to Herschel-SPIRE2 for the data processing and map characteristics.
Here, we refer to the two Spider maps as d S1 and d S2 .These maps contain very bright galaxies that have a strong influence on our statistical analysis.We removed the ∼70 bright sources whose peak brightness is over 6 MJy/sr by replacing the corresponding pixels by a close similar area of the same map.We identified these sources directly from the maps, but we have verified that they are all extragalactic.We also checked that our results are not particularly sensitive to the way we chose to remove the sources.Then, we can define: The mean map, d S , images the total (dust+CIB) noisy infrared emission.We take the half-difference map, d N , as a statistical realization of the instrumental noise of d S .We note that this map does not include instrument systematics that would have exactly the same imprint in both d S1 and d S2 .The two images are displayed in Fig. 1.These are squares with sides of 1144 pixels corresponding to 1.91 degrees on the sky3 .The pixel size of the images is 6 ′′ .

LH region
LH is one of the fields targeted by the Herschel Multi-tiered Extragalactic Survey (HERMES, Oliver et al. 2012).Viero et al. (2013) have used the data to characterize the structure of the CIB with the power spectra.Their analysis shows that on this field the contamination of the extended emission by Galactic dust is much smaller than in the Spider field.We have used the Level 3 image.
The LH map at 250 µm was downloaded from the Herschel Database in Marseille (HeDaM4 , Shirley et al. 2021), where the field is referred to as LOCKMAN-SWIRE.The image obtained after cropping the irregular edges and removing the ∼40 galaxies whose peak brightness is over 6 MJy sr −1 is called d L and is presented in Fig. 1.The LH image is also a square with sides of 1144 pixels corresponding to 1.91 degrees on the sky.The pixel size is 6 ′′ as for the Spider maps.The noise spectrum is flat for k > 0.2 arcmin −1 .The flattening of the d S and d L spectra to different constant values at high k indicates that the noise power is one order of magnitude larger for the Spider observation.The d L spectrum shows the progressive attenuation of the sky emission by the telescope beam for k > 1 arcmin −1 .This attenuation is not apparent for the d S map because the noise amplitude is higher.We checked that the power spectrum for the LH field is consistent with the one presented by Viero et al. (2013).In Fig. 2, we also show the power spectrum of the dust emission in the LH field as estimated by Viero et al. (2013).We used their Eqs.(8) with P 0 = 4.5 × 10 5 Jy 2 sr −1 and α c = −3.66.As the dust contribution is small except at the largest scale, the power spectrum of d L is essentially that of the CIB.The power spectrum of d S is dust dominated for k < 0.2 arcmin −1 .The CIB contribution is significant for higher k.

Component separation method
In this section, we explain the procedure that allows us to derive a statistical model of the dust emission from the Spider Herschel observations.We present the mathematical formalism A1, page 3 of 17 of the problem in Sect.3.1 and our algorithm implementation in Sect.3.2.

Mathematical formalism
We want to separate the dust emission from the CIB and the Herschel instrumental noise.This problem can be written as a system of three equations involving the three observational maps: d S , d L , and d N .The sky maps d S and d L include the dust, CIB, and noise, while the difference map d N is a noise map that is free from dust and from the CIB.We thus write: where s, c, and n terms represent the dust, CIB, and noise contributions to each image, respectively.
Our scientific objective is twofold.First, we aim to derive a statistical model of the dust emission from the Spider observation expressed in terms of WPH statistics.Second, we aim to derive an estimate of the Spider dust map s S as one specific realization of the dust model.
We simplify Eq. (2), neglecting s L with respect to c L and s S , and n L with respect to n S and c L .These simplifications are supported by the comparison of the d S , d L , and s L spectra in Fig. 2. In the following, we also consider that the CIB is a statistically isotropic signal on the celestial sphere, ignoring its correlation with the large-scale structure of the universe (Planck Collaboration XVIII 2014;Serra et al. 2014).
We rewrite Eq. ( 2) as: where we have introduced the map, c ′ L , which represents a hypothetical observation of LH with the data noise statistically matching that of the Spider observation.We also introduce c ′ S = c S + n S to rewrite Eq. (3) in terms of the WPH statistics: where Φ(m) are the WPH statistics (see Appendix B) computed for a map, m.The second equation follows from the isotropy of the CIB on the celestial sphere.These equations lead us to consider two statistical processes: S associated with the dust emission and C ′ associated with the sum of the CIB and the Spider data noise.Hereafter, we refer to S and C ′ as the dust and contamination, respectively.The map s S is a realization of S , while the maps c ′ S and c ′ L are distinct realizations of C ′ .From the map, c ′ L , we can directly compute the WPH statistics that characterize C ′ .To characterize S we use a statistical component separation algorithm explained below.

Algorithm principle
The framework we used for statistical component separation has already been studied through the denoising of Planck interstellar dust polarization maps on a flat sky using WPH (Régaldo-Saint Blancard et al. 2021) and on the sphere, using an implementation of cross-WST on healpix (Delouis et al. 2022).We followed and extended the work of Régaldo-Saint Blancard et al. (2021).This method consists of an iterative minimization in pixel space of a loss function.We performed an iteration on a dust map, u, to converge to an estimator sS of s S , such that the u + c ′ S map and the d S map become "close enough" in terms of WPH statistics.
The quantitative description of the algorithm is described by the loss function.The expression of our loss function L(u) is: where where || • || stands for the Euclidean norm and ⟨•⟩ i stands for the average over i and α is a real number that will be chosen to balance the two loss terms.The set {c ′ S,i } i<N represents N sky maps generated from the WPH statistics Φ(c ′ L ) (see Appendix B.2).We use the WPH statistics defined as in Régaldo-Saint Blancard et al. (2022) and refer to Appendix B.1 for technical details.The WPH statistics used in this paper are made of 3441 coefficients.This definition includes a normalization which is done by using two reference maps, m 1 and m 2 , one for each of the loss terms.
Our loss function, L(u), is composed of two non-orthogonal terms.The L 1 term is the loss function used in Régaldo-Saint Blancard et al. (2021).It follows directly from the top line of Eq. ( 4).It constrains the u map such that if we add an independent realization of c ′ S , the sum has the same WPH statistics as d S .We averaged over several syntheses of c ′ S so that the separation does not depend on the deterministic properties of one specific realization of C ′ .The L 2 term follows from the bottom line of Eq. ( 4).It constrains the d S − u map, which we want to converge to c ′ S = d S − s S to have the same WPH statistics as c ′ S .In Régaldo-Saint Blancard et al. (2021), the contamination was a piece-wise Gaussian noise, while in our case, C ′ is a non-Gaussian process.Experimentally, we found that the L 2 (u) term is necessary to account for the non-Gaussianity of C ′ .
Recently, Régaldo-Saint Blancard et al. (2022) introduced cross-WPH statistics to capture the non-Gaussian correlations between different maps.They applied it successfully to the building of multifrequency dust generative models.Simultaneously, Delouis et al. (2022) developed cross-WST statistics on the sphere and applied it to the denoising of Planck interstellar dust polarization full-sky maps.The success of this method lies in the use of cross-statistics between half-missions maps with distinct data noises, and the TE dust correlation (Planck Collaboration Int.XXX 2016b), which makes use of the high signal-to-noise ratio (S/N) of the dust total intensity map.In our case, the same CIB sky signal is present in both half-missions maps, but we could have added a loss term based on the dust-H I correlation.We did not implement such a term because the H I data (Blagrave et al. 2017) has a lower angular resolution than the Herschel maps and a higher noise level.
To implement the algorithm, we need to choose α, the two reference maps, m 1 and m 2 , used for the loss normalization and the initial value, u 0 , of the u map.These choices will be discussed in the following sections.At the end of the optimization, we converge to a map sS .The WPH statistics of sS constitute the dust statistical model.The model may be used to generate new realizations where the optimal choice of the initial map depends on the scientific objective.
The maps generated from the statistical model have structure on all scales, including those where the power of the contamination is much larger than that of the dust.Our goal is to reproduce A1, page 4 of 17

Validation on mock data
To validate our method, we applied our component separation algorithm to mock data.The mock data are introduced in Sect.4.1.We compare input and output maps in Sect.4.2.In Sect.4.3, we show that the output maps reproduce statistics of the input mock data, which are used as diagnostics in interstellar astrophysics.

Mock data
We present how we built the mock data from observations to be as close as possible to the real data.We produce a set of mock maps defined as: where m stands for mock data and the i index indicates the different mock maps.Guided by the correlation between dust and gas in the high latitude sky (Lenz et al. 2019) S map has only 312 × 312 pixels, we cut the 1144 × 1144 contamination map, c ′ L , into nine independent patches of equal areas of 312 × 312 pixels.These patches allowed us to take into account the spatial variations of the CIB statistics over the LH region.They were combined into 72 independent pairs; for each pair, we used the first patch to produce the mock data and the second one as input to learn the statistics of the contamination.The mock d m,i S were thus constructed as the sum of c ′ m,i S and s m S .One pair of maps {c ′ m,i S , d m,i S } is plotted next to the s m S map in Fig. 3.We used the set to run our separation algorithm on these 72 different cases.The mean and the standard deviation of these results allow to verify that the algorithm does not produce a significant bias on the statistics of the dust map.Unfortunately, our validation is based on a small sample because the 72 cases are not independent.However, we are confident that this method, which allows us to take into account the non-Gaussianity of the CIB encoded in the observations, gives a satisfactory check of the significance of any potential bias.Using WPH syntheses (Appendix B.2), we could have generated more non-Gaussian realizations of the contamination to better sample the chance correlation with the dust mock map, but this possibility was constrained by the required computation time.It is important to take into account the variance of the CIB statistics to verify the algorithm ability to determine the dust statistics when there are small differences between the statistics of the patch, c ′ m, j L , used to model the contamination, with respect to those of the contamination in d m,i S .This is only indicative because the variance between the contamination patches is not necessarily commensurate with the CIB variance on the sky between the LH and Spider fields.

Implementation of the component separation and results
Here, we describe the implementation of the component separation algorithm before presenting the output images.To apply the algorithm, we made the following choices for the initial map, u 0 , the maps used for the loss normalizations, m 1 and m 2 , and the inter-loss weight, α (Sect.3.2).
As in Régaldo-Saint Blancard et al. (2021), we chose u 0 = d m,i S to obtain an output dust map reproducing the observed map on scales where the dust emission dominates.The m 1 and m 2 maps are used to normalize the WPH statistics versus scale in L 1 and L 2 .The simplest normalization would be to use the mock observation d m,i S for m 1 and a contamination map c ′ m,i L for m 2 .Régaldo-Saint Blancard et al. (2021) showed that this choice for m 1 is not optimal to reproduce the dust WPH statistics on the smallest scales, where the contamination is dominant.Ideally, we would need to know beforehand the dust power spectrum to give the appropriate weight to the dust WPH statistics.In practice, we ran the algorithm in two steps.First, with m 1 = d m,i S to converge to a first dust map, sm,i S,0 , which does not reproduce the non-Gaussian statistics well, but has the proper dust power spectrum.For this first run, we only used the first loss term, L 1 , computed on a subset of WPH statistics that only contain power spectrum-like terms.
Second, we ran the algorithm with the two loss terms with m 1 = sm,i S,0 and m 2 = c ′ m,i L .We choose α to give more weight to L 1 than to L 2 at the start of the gradient descent because we want the first loss to dominate at this point.This is because the second loss term is initially applied to an identically zero map, since u 0 = d S , and thus contains no information.We expect L 2 to help d S − u to converge to the statistics of the contamination in a second stage.We set α such that L 1 (u 0 ) ∼ 10α L 2 (u 0 ) and find that the result of the separation is only weakly dependent on this specific choice.For the validation algorithm, a run takes about 2 h on a 32 GB GPU.
The WPH statistics of d m,i S , s m S , c ′ m,i S , sm,i S , and c′ m,i recover the WPH statistics of dust within error bars at all scales.This validates our algorithm with respect to our primary goal, which is to properly recover the non-Gaussian statistics of dust emission.Figure 3 presents the output maps of our separation algorithm applied on d m,i S .Comparing by eye sm,i S with s m S , we see that we have efficiently removed the contamination without losing the non-Gaussian structure of the dust emission down to the smallest scales.The comparison of c′ m,i S and c ′ m,i S is equally satisfactory.The standard deviation of s m S , c ′ m,i S , d m,i S , sm,i S , c′ m,i S , and s m,i S − sm,i S are 2.39, 1.17, 2.61, 2.34, 1.17, and 0.64, respectively.The d m,i S mock map does not exactly resemble the Herschel d S one because the first covers a much larger sky area than the second.The sky region covered by the d S map corresponds approximately to the quarter at the bottom left of d m,i S .In Fig. 4, we present the power spectra of the six maps shown in Fig. 3, plus the cross spectrum between s m S and sm,i S .These spectra have been computed on apodized maps and binned in order to lower the statistical variance at large values of k.Given the power spectra of s m S and sm,i S are very close at all scales, the separation is able to recover the power spectrum of the dust map more than one order of magnitude under the contamination.The power spectra of c ′ m,i S and c′ m,i S are also very close, which could be expected since its directly constrained by the L 2 loss term.It shows the ability of our method to extract a realistic contamination from the components mixture.
The cross spectrum ⟨s m S × sm,i S ⟩ i shows the transition from a deterministic to a statistical separation at the scale of k = 0.7 arcmin −1 , where the dust power becomes lower than that of the contamination.This transition is similar to that reported by Régaldo-Saint Blancard et al. (2021) and Delouis et al. (2022) for the denoising of Planck dust polarization maps.On the smallest scales, s m S and sm,i S are two independent realizations of the dust statistics.This explains the factor of 2 difference between the power spectrum of s m S − sm,i S and those of s m S and sm,i S .Indeed, we see coherent features distributed at small scales in the difference A1, page 6 of 17 map at the bottom right corner of Fig. 3, which testify to the displacement of structures from s m S to sm,i S .

Non-Gaussian diagnostic for dust astrophysics
We show that the dust output maps reproduce statistics used as diagnostics to characterize the non-Gaussianity of interstellar imaging observations.We considered the probability distribution functions (PDFs) of the intensity and its spatial increments and the reduced wavelet scattering transform (RWST, Allys et al. 2019).The notation ⟨•⟩ i represents the mean of the statistics computed over our set of 72 separations (Sect.4.1), whereas the error bar is the standard deviation.
Figure 5 presents the PDFs of the dust intensity, which is commonly used as a diagnostic of the structure of molecular clouds (Burkhart 2021;Lombardi et al. 2015).The PDF of d m,i S is clearly biased towards low and high values by the contamination.After the component separation we recover very well, within statistical uncertainties, the PDF of the dust map s m S .The PDF of the contamination map c ′ m,i S is also well recovered.Figure 6 shows PDFs of the increments of the dust emission for three lags.The increment at a position x for a lag l in pixels is the set of differences δI l (x) = I(x) − I(x + l) with l < |l| < l + 1.Such PDFs have been computed on velocity maps (Hily-Blant et al. 2008) and polarization maps (Régaldo-Saint Blancard et al. 2021) to characterize the intermittence of turbulence in the ISM.The PDF of d m,i S is far more Gaussian than that of s m S , especially for small pixel lags where the impact of the contamination is the largest.For the three lags, the PDF of sm,i S matches the one of s m S within statistical uncertainties over at least three orders of magnitude.
The RWST statistics are low-dimensional non-Gaussian interpretable statistics that has found many applications in astrophysics (Allys et al. 2019;Régaldo-Saint Blancard et al. 2020;Saydjari et al. 2021).They are briefly presented in Appendix E. We use here the S iso 1 , S aniso 1 , S iso,1 2 , and S iso,2 2 coefficients.Figure 7 presents those RWST statistics for d m,i S , s m S and sm,i S .The S iso 1 and S aniso 1 characterize the amplitude as well as the level of anisotropy as a function of scales, respectively.The S iso,1 2 characterize the couplings between different scales, while the S iso,2 2 describe their isotropic angular modulation (Allys et al. 2019).The RWST statistics of d m,i S are strongly biased by the contamination.We can also notice that the RWST statistics of s m S and sm,i S are very close with respect to that of d m,i S at most of the scales, but we can notice small discrepancies at the smallest scales.
In the three examples presented in Figs.5-7 we see that these non-Gaussian statistics of s m S are very well recovered through our component separation method.This result illustrates the possibility of using the WPH model to determine non-Gaussian statistics that are not directly constrained in the component separation.It demonstrates the relevance of the WPH statistics for astrophysics.We point out the large difference between the statistics of d m,i S and those of s m S , which shows that the component separation step is essential to build a precise non-Gaussian statistical model of the dust emission.

Application to Herschel SPIRE observations
We applied our component separation method to the Herschel observation of the Spider field, d S (Sect.5.1).The power spectrum of the output map is presented and analyzed in Sect.5.2.In Sect.5.3, we present images that highlight a main non-Gaussian feature of the dust maps: the correlation between structures across scales.

Herschel separation results
Our algorithm is applied to the d S map as described in Sect.4.2.We increased the pixel size of the image from 6 ′′ to 12 ′′ because the dust emission is highly attenuated by the beam on those scales (more than five orders of magnitude at 6 ′′ ).This resampling is done using a low-pass filter in Fourier space with k max = 2.5 arcmin −1 equal to the Nyquist frequency for 12 ′′ pixels.This filtering does not induce ringing in the map because the beam attenuation on the signal power at k = 2.5 arcmin −1 is large (about a factor 65). Furthermore, for the purposes of normalization, it allows us to keep the exact same procedure from the validation on mock data to this case.
The component separation algorithm was applied according to the steps given in Sect.4.2 for the mock data.We choose u 0 = d S for the initial map.First, we run the algorithm using only the L 1 loss term, computed on a subset of WPH statistics that only contain power spectrum-like terms, with m 1 = d S to produce an intermediate dust map sS,0 .Second, we run the algorithm using the two loss terms with m 1 = sS,0 and m 2 = c ′ L .We set α such that L 1 (u 0 ) ∼ 10α L 2 (u 0 ).In this case, a run of the algorithm takes about 8 h on a 32 GB GPU.
In Fig. 8, we present the output maps of our separation algorithm applied on the Spider Herschel observation.From a visual point of view, the results are very satisfactory: it indeed seems that most of the CIB contamination has been identified as such.It is notably the case of all galaxies that are individually resolved, which is not surprising because they have an extremely clear non-Gaussian signature, both in terms of frequency and spatial location.Conversely, it does not seem that structures related to Galactic dust have leaked to the reconstructed contamination above the scales where it begins to dominate.This is impressive for results that have been obtained from three observational patches only.It should be noted, however, that the CIB reconstruction has visual defects at the position of the very bright horizontal Galactic dust structure at the top of the image.We A1, page 7 of 17 (couplings between different dyadic scales), and S iso,2 2 (angular modulation of the couplings between different dyadic scales).The variable j corresponds to the physical scale of 2 j .The S iso,1 2 and S iso,2 2 are normalized with respect to the S iso 1 and are plotted as a function of the scale ratio j 2 − j 1 for j 1 ∈ [0, J − 1] and j 2 ∈ [ j 1 + 1, J − 1].Each curve corresponds to a given j 1 .The colored bands represent the ±1σ error-bars.L and c′ S .The main scientific results of the component separation algorithm are the statistics of the dust emission and, in particular, its WPH statistics.They are obtained from the sS map, which is a realization of the estimated WPH statistical model of the dust emission, conditioned by its large scales, where the dust power is dominant.The sS map is then highly correlated at large scales with the d S map, up to a scale where there is a transition from a deterministic to a statistical behavior (Régaldo-Saint Blancard et al. 2021;Delouis et al. 2022).
From the WPH statistical model, we can generate diverse dust maps depending on the specific choice of the initial map, u 0 , and the scientific goal.For example, we can use an H I map as initial condition to ensure that it is decorrelated from the CIB.For other statistical applications, it is useful to have multiple realizations of the dust emission.In terms of this ability to generate various dust maps, our method differs from deterministic component separations.
Once we ran the component separation algorithm on the Herschel observation, we obtained a dust map that provides us with an estimate of the non-Gaussian statistics of the dust emission, which are not biased by the CIB and noise.We then estimated the error bars on these dust statistics as follows.We did not use the error bars determined on the mock data because the statistics of the mock dust map only approximate the true statistics of the dust emission, in particular, because it is based on an H I observation with a lower resolution than the one of the Herschel observations.First, we used the WPH synthesis method (described in Appendix B.2) to synthesize ten new realizations of the dust from our separated dust map, sS , and ten new realizations of the contamination from the contamination map c ′ L .Summing these new dust and contamination maps, we obtain ten mock mixture maps, to which we apply the component separation algorithm to obtain ten new separated dust maps.The standard deviations of the statistics computed over these ten dust maps are then taken as error bars.We note that in this procedure, we cannot use patches of the LH and noise maps to define different contamination samples because the Spider image is as large as them.Unfortunately, this does not allow us to take into account the spatial variations of the contamination statistics in our estimate of the error bars.

Power spectra
Figure 9 presents the beam-corrected power spectrum of d S , the beam-corrected power spectrum of sS , and its power-law fit.These spectra have been computed on apodized maps and binned in order to lower the statistical variance at large k.At k = 2 arcmin −1 , the power of sS is two orders of magnitude lower than that of d S .
Figure 9 shows that the power spectrum of sS is very close to a power-law after beam correction.To estimate the spectral index of the power-law and its uncertainty, we need to add to the component separation uncertainty the cosmic variance.We must do this because we characterize a statistical process using A solution to this problem is to impose multiple constraints during the component separations by using different local masks, as done in Delouis et al. (2022).However, we did not implement this approach due to the lack of a sufficient number of pixels.Power spectra of the input and output for the component separation applied to Herschel SPIRE maps at 250 µm.Top: beamcorrected power spectrum of d S , beam-corrected power spectrum of sS and its power-law fit.Bottom: ratio with the power-law fit.The colored bands represent ±1σ error-bars.The component separation extends the scale range where the dust power spectrum is found to have a power-law shape by a factor of an observation (one realization of the process) of finite sky area A. The standard deviation of the dust power spectrum computed at wavenumber k is: where N k is the number of modes at wavenumber k.This leads to where the wavenumber k and the bin size ∆ k are expressed in rad −1 (Scott et al. 1994;Knox 1995).We compute the total uncertainty as the quadratic sum of σ k and the component separation uncertainty and perform a power-law fit of the beam-corrected power spectrum of sS in log scale.The bottom panel of Fig. 9 presents the beam-corrected power spectra of d S and sS divided by the power-law fit 7 .We can see that the power-law fits well the beam-corrected power spectrum of sS until k = 2 arcmin −1 .This result is satisfactory because it indicates that the component separation algorithm did not filter out the small scales up to a k value, where the power of sS is about 2% of that of d S .
The maximum wavenumber up to which the power-law shape is measured is thus increased by almost a decade by the component separation.
The slope of the spectrum is −2.95 ± 0.04.We compare this fit result with values measured for the Polaris Flare, a brighter field observed by Herschel (Miville-Deschênes et al. 2010), as well as for a diffuse cloud using Planck, WISE and optical data (Miville-Deschênes et al. 2016).In both cases, the CIB is subdominant and is neglected.The power spectrum of the dust maps is a power-law in both fields, but the slope in log scale goes from −2.65 in Polaris to −2.9 ± 0.1 for the diffuse cloud.The slope we find is consistent with the latter value.

Coherent structures
The component separation allows us to identify coherent structures that are hidden by the CIB in the SPIRE maps.To illustrate this result, we computed the maps with increments for different lags, presented in this work.Computed on a map I at a lag l, the pixel at the position x of the increment map is the mean of {I(x) − I(x ′ ), |x − x ′ | = l}.In the following, the dust statistics are computed at 0.4, 0.8, and 1.6 arcmin.These scales corresponds to wavenumbers 2.5, 1.25, and 0.625 arcmin −1 .
Figure 10 shows the increment maps of d S and sS .One can identify coherent structures on both sets of increments maps, with a much improved contrast for sS .The component separation also highlights the correlation between coherent structures at different scales.This result extends earlier results obtained with wavelet convolution of IRAS sky maps (Abergel et al. 1996;Jewell 2001;Miville-Deschênes et al. 2007) and with Herschel observations on brighter clouds (Robitaille et al. 2019) to smaller angular scales.

Non-Gaussian statistics of the diffuse dust emission
Here, we quantify the non-Gaussian statistics of diffuse dust emission in the Spider field and compare our results with earlier studies.

Coherent structures across scales
The RWST statistics (see Appendix E) provide statistical insights into the multiscale filamentary structure of the cold neutral (right) coefficients of sS and sG S are compared.These coefficients correspond respectively to the couplings between dyadic scales and their angular modulation.They are normalized with respect to the S iso 1 and are plotted as a function of j 2 − j 1 for j 1 ∈ [0, J − 1] and j 2 ∈ [ j 1 + 1, J − 1].Each curve corresponds to a given j 1 .The j 2 − j 1 differences on the bottom axes correspond to ratio of angular scales θ 2 /θ 1 on the top axes.The colored bands represent ±1σ error-bars.These coefficients testify of the not-scale invariance and the filamentary structure of the output dust map sS .medium.The dust and CIB separation allows us to expand on the analysis of Herschel and H I data presented by Allys et al. (2019) and Lei & Clark (2023).
Figure 11 presents two RWST coefficients that characterize correlations between scales (Allys et al. 2019).Both are plotted versus the scale ratio ( j 2 − j 1 ).In each plot, the sS statistics are A1, page 10 of 17  S are compared at scales θ = 0.4, 0.8, and 1.6 ′ .They are displayed as a function of I * ψ θ /σ θ where ψ θ is the wavelet at scale θ and σ θ is the standard deviation of the wavelet coefficients.We employed the same oriented wavelets as those used to calculate WPH statistics.The PDFs are computed over all the pixels of the eight convolution maps obtained using the wavelets of different orientations.The colored bands represent ±1σ error-bars.compared to those of the map sG S obtained by randomizing the phase of the Fourier transform of sS .This comparison highlights deviations from Gaussianity.Unlike the Gaussian field, the S iso,1 2 of sS do not depend solely on j 2 − j 1 .This result shows that the dust emission is not statistically scale-invariant.The S iso,2 2 coefficients quantify the angular modulation of the coupling between scales.The angle dependence relates to the filamentary structure of the diffuse ISM because it measures how the coupling between two scales varies between the aligned and orthogonal orientations.The non-vanishing values of S iso,2 2 of sS at large j 2 − j 1 testify of the presence of long filaments.

Signatures of turbulence intermittency
Statistics of velocity increments derived from CO observations results have been theoretically interpreted as a signature of intermittency of interstellar turbulence (Falgarone et al. 2015).The same result is theoretically expected for the dust polarization and the gas column density (Momferratos 2015).
Figure 12 presents the PDFs of the increment images of sS shown in Fig. 10.The PDFs are roughly scale-invariant.They display non-Gaussian wings with no clear trend with the lag.This result contrasts with what has been reported for other observables.Indeed, the PDFs of increments computed for the gas velocity from spectroscopic CO observations (Hily-Blant et al. 2008) and for the dust polarization from Planck data (Régaldo-Saint Blancard et al. 2021) show non-Gaussian wings that become increasingly prominent for decreasing lag.

Structure formation in the diffuse ISM
The formation of structure in the cold interstellar medium is thought to be driven by the interplay between the turbulent gas dynamics and thermal instability (Kritsuk & Norman 2002;Saury et al. 2014).The statistics of dust maps provide observational insight.Miville-Deschênes et al. (2007) presented PDFs of wavelet convolutions of IRAS dust images.They have on all scales a pronounced skewness, which they relate to the nonlinearity of the dynamical processes driving the formation of structures.The Herschel data and the dust and CIB separation allow us to extend this analysis to smaller angular scales.
Figure 13 shows PDFs of wavelet convolutions on angular scales from 0.4 ′ to 1.6 ′ , well below the IRAS 5 ′ resolution.The wavelets are those used for the WPH data analysis.The PDFs of sS exhibits non-Gaussian wings, but smaller skewness than that of d S .

Conclusion
We have made use of the distinct textures of the CIB and the dust emission on the sky to develop and apply a component separation method on Herschel observations at a single frequency.The main results of our work are as follows.
The component separation problem on Herschel data involve three components: the dust, the CIB, and the data noise.We reduce this problem to a single inverse problem in terms of WPH statistics and present an algorithm to solve it.The results of this algorithm are a WPH generative model of the dust emission and an output map which is a realization of the model correlated with the input map.
We build a set of realistic mock data using an H I map as a template of CIB-free dust emission map.These data are used to validate the method.We show that we are able to retrieve the WPH statistics of the dust emission as well as non-Gaussian statistics used in astrophysics to characterize interstellar imaging data.Unlike methods that minimize the mean squared error in pixel space, we reproduce the power spectrum of the input map down to the smallest angular scales.
The method is applied to a Herschel SPIRE observation of diffuse interstellar matter at 250 µm.We succeed in performing a statistical separation from observational data only at a single frequency by using non-Gaussian statistics.The power spectrum of the output map is well fitted by a power-law up to k = 2 arcmin −1 , where the dust signal represents 2% of the total power.The obtained slope is −2.95 ± 0.04.
We analyzed the non-Gaussian properties of the Spider dust emission on scales where the CIB signal is dominant.The component separation step is essential for characterizing the non-Gaussianity of the dust emission.Going beyond a standard power spectra analysis, we show that the non-Gaussian properties of the dust emission are not scale-invariant.The separated dust map reveals coherent structures at the smallest scales.This work offers several perspectives for future work: -We underline that we use in this paper only one of the three wavelengths of Herschel SPIRE.Thanks to the recent development of cross-WPH statistics (Régaldo-Saint Blancard et al. 2022), our method could be extended to a multi-channel component separation of the dust and CIB on Herschel data with an aim to statistically characterize the dust SED; -We could use external templates as an H I observation for the dust; -It would also be useful to compare the gas and dust to test whether these two interstellar components are coupled on small angular scales for the cold and warm phases.This work could also be extended to other fields mapped with SPIRE; -Our statistical component separation could also be used on Planck data to separate the dust and CIB up to larger angular scales on the sphere.where S (k) is the power spectrum of ρ, which only characterize each wavenumber, k, independently.This shows that in order to characterize coupling between scales, it is necessary to introduce non-linearities.The second step to construct WPH statistics thus consists in introducing non-linearities in order to characterize interactions between scales.Indeed, if we want to get an information from the covariance between two fields, they must have common frequencies.We can take the modulus of the different wavelet convolutions, but also use a non-linear operator to capture information about phase alignement between different scales.This operator is the Phase Harmonics operator, which is defined as: ∀z ∈ C, ∀p ∈ N, [z] p = |z| e i arg(z)×p . (B.2) For p 0, this operator multiplies the complex phase of a field by a constant integer, computing the harmonics of its phase.This operator makes the phase of the two filtered images vary at the same spatial frequency in order to characterize the statistical phase alignment by means of a covariance.We can now construct different "WPH moments," by computing covariance of phase harmonics of wavelet convolutions which share common frequencies, namely, whose spectral supports overlap.The WPH moments depend on a translation vector, τ, that allows us to increase the spectral resolution.The general expression of WPH moments is: C j 1 ,l 1 ,p 1 , j 2 ,l 2 ,p 2 (τ) = Cov([ρ * ψ j 1 ,l 1 ] p 1 (x), [ρ * ψ j 2 ,l 2 ] p 2 (x + τ)). (B. 3) The number of translation vectors considered is 1 + 8∆ n .We note that due to the choice of the harmonics p 1 and p 2 , there are different ways to couple a pair of scale.The summary statistics we will use are WPH statistics, which are built from a set of WPH moments.In this work, we take a set of moments defined in Régaldo-Saint Blancard et al. (2022).
A1, page 14 of 17 Auclair, C., et al.: A&A, 681, A1 (2024) This is a small modification of Allys et al. (2020), in which the set of moments has been chosen to construct a generative model of the large-scale structure matter density field (Allys et al. 2020).With J = 7, L = 4, and ∆ n = 2, it gives a set of 3441 moments.To describe these moments, we will follow the notations of this paper and define several types of moments: if the two scales are equal, we use the letter S , and if not, we use a C; the type of moment is identified by their respective harmonics p 1 and p 2 .
The harmonics, p, multiply the frequency of a signal by a factor, p.As we want similar frequencies in order to have meaningful moments, several choices of harmonics are possible for a given pair of scales.First, we can take (p 1 , p 2 ) = (0, 0), which will lead to common frequencies even if the scales are different.We can also take (p 1 , p 2 ) = (0, 1) which will lead to common frequencies if j 2 > j 1 .Finally, if we want to keep the phases and to take different scales, we have to choose p 1 and p 2 such that j 1 p 1 ∼ j 2 p 2 , and the simplest choice is then to take (p 1 , p 2 ) = (1, j 1 j 2 ).For j 1 = j 2 , it boils down to (p 1 , p 2 ) = (1, 1) which leads to the power spectrum-like moments, which we refer to the S 11 moments (see Eq. B.1).For each type of moments, the particular moment is then labelled by the characteristic spatial frequencies probed (one for S term, two for C terms), indexed by their couple ( j, l).For example, C 01 j 1 ,l 1 , j 2 ,l 2 is the moment whose expression is Cov(|ρ * ψ j 1 ,l 1 |(x), (ρ * ψ j 2 ,l 2 )(x)).

(B.4)
All these moments depend on the amplitude of the image power spectrum, but we want to have coefficients describing the non-Gaussian features only.Using Eq.B.1, we can show that the power spectrum information is contained in the S 11 moments.Then, by normalizing all the moments by the S 11 and the S 00 ones, we get coefficients that describe non-Gaussianity independently of the power spectrum.For example, the expressions of the normalized C 01 moments are C 01 j 1 ,l 1 , j 2 ,l 2 / S 00 j 1 ,l 1 S 11 j 2 ,l 2 .
A1, page 1 of 17 Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0),which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.This article is published in open access under the Subscribe to Open model.Subscribe to A&A to support open access publication.

Fig. 1 .
Fig. 1.Herschel SPIRE maps at 250 µm.Left: map of the Spider field, d S .Center: map of the LH field, d L .Right: data noise for the Spider observations, d N .

Figure 2
Figure 2 presents the power spectra of the d S , d L , and d N maps.The figure also includes the power spectrum of a SPIRE observation of Neptune at 250 µm.As in Miville-Deschênes et al. (2010), we consider Neptune as a point-like source and use this observation to compute the power spectrum of the point spread function (PSF).The noise spectrum is flat for k > 0.2 arcmin −1 .The flattening of the d S and d L spectra to different constant values at high k indicates that the noise power is one order of magnitude larger for the Spider observation.The d L spectrum shows the progressive attenuation of the sky emission by the telescope beam for k > 1 arcmin −1 .This attenuation is not apparent for the d S map because the noise amplitude is higher.

AuclairFig. 3 .
Fig. 3. Input and output maps for the component separation applied to the mock data.The three top images show the input dust map s m S (left), one example of the input contamination c ′ m,i S (middle), and the mock data d m,i S (right).The three bottom images show output maps from the component separation: the dust map sm,i S (left), the contamination c′ m,i S (middle), and the difference s m S − sm,i S (right).
, we use an integrated intensity map, I H I of 21 cm data to build s m S as a pure dust map free from the CIB.Specifically, we made use of the DF dataset, located at (α, δ) = (10 h 30 m ,73 • 48 ′ ) (i.e., centred on the Spider field) that was part of the DHIGLS 5 H I survey (Blagrave et al. 2017) with the Synthesis Telescope (ST) at the Dominion Radio Astrophysical Observatory.A detailed description of the DF dataset is presented in Appendix C, along with the procedure used to reduce noise in the data using the Gaussian decomposition algorithm ROHSA (Marchal et al. 2019).The velocity range of integration is −9.7 < v < 26.5 km s −1 .The I H I map is a square with 312 pixels on each side corresponding to 5.2 degrees on the sky.The sky region covered by the I H I map includes the one covered by the d S map.To obtain s m S , we scaled the I H I map, so that the power spectrum of d m,i S would approximately match that of d S .The s m S map is used as a spatial template of clean dust emission at the Herschel resolution.The mock dust map is presented in the top left panel of Fig. 3.For the contamination, we use the c ′ L map that combines two Herschel maps: the LH CIB map, d L , and the noise map, d N , of the Spider observation.Since the s m

SFig. 4 .
Fig. 4. Power spectra of the input and output maps of the component separation applied to the mock data.The figure shows the power spectra of d m,iS , s m S , c ′ m,i S , sm,i S , c′ m,i S , and s m S − sm,i S as well as the cross spectrum between s m S and sm,i S .The notation ⟨•⟩ i represents the mean of the spectra computed over the 72 separation runs.The colored bands represent ±1σ error-bars, computed as the standard deviation of these spectra.The component separation allows to recover the power spectra of the input maps within statistical uncertainties.

AuclairFig. 5 .
Fig. 5. PDFs of the dust intensity for the input and output maps of the component separation applied to the mock data.The PDFs of d m,i S , s m S , sm,i S , c ′ m,i S , and c′ m,i S are compared.The color bands represent ±1σ errorbars.

AuclairFig. 6 .Fig. 7 .
Fig. 6.PDFs of increments of the dust intensity the input and output maps of the component separation applied to the mock data.The PDFs of d m,i S , s m S and sm,i S are compared for three lag values: 2 pixels (left), 8 (middle), and 32 (right).The PDFs are plotted as a function of δI/σ l , where σ l is the standard deviation of the increments of s m S at lag l.The colored bands represent ±1σ error-bars.

Fig. 8 .
Fig. 8. Input and output maps for the component separation applied to Herschel SPIRE maps at 250 µm.Left: map of the Spider field d S .Center: output dust map sS .Right: output contamination map c′ S .The d S map is separated into two components: the dust, sS , and the CIB+noise contamination, c′ S .A1, page 8 of 17 Fig.9.Power spectra of the input and output for the component separation applied to Herschel SPIRE maps at 250 µm.Top: beamcorrected power spectrum of d S , beam-corrected power spectrum of sS and its power-law fit.Bottom: ratio with the power-law fit.The colored bands represent ±1σ error-bars.The component separation extends the scale range where the dust power spectrum is found to have a power-law shape by a factor of

dFig. 10 .
Fig.10.Maps of increments for the component separation applied to Herschel Spider observations.The Neperian logarithm of the absolute value of the increments computed at θ = 0.4, 0.8, and 1.6 arcmin lags are compared for the input (d S , top row) and output ( sS , bottom row) maps.We substracted the log of the standard deviation of each map.Zooming in on these maps allows for a review of the smallest scales.

Fig. 11 .
Fig. 11.RWST statistics of the output dust map for the component separation applied to Herschel SPIRE maps at 250 µm.The S iso,1 2 (left) and

AuclairFig. 12 .
Fig. 12. Increment PDFs of the input and output dust maps for the component separation applied to Herschel SPIRE maps at 250 µm.The increment δI of d S , sS , and sG S are compared at θ = 0.4, 0.8, and 1.6 arcmin lags.The PDFs are displayed as a function of δI/σ θ , where σ θ is the standard deviation of the increments of sS at lag θ.The colored bands represent ±1σ error-bars.

Fig. 13 .
Fig. 13.Wavelet coefficents of the input and output dust maps for the component separation applied on the Herschel Spider map.The PDFs of the real part of the wavelet coefficients of d S , sS , and sGS are compared at scales θ = 0.4, 0.8, and 1.6 ′ .They are displayed as a function of I * ψ θ /σ θ where ψ θ is the wavelet at scale θ and σ θ is the standard deviation of the wavelet coefficients.We employed the same oriented wavelets as those used to calculate WPH statistics.The PDFs are computed over all the pixels of the eight convolution maps obtained using the wavelets of different orientations.The colored bands represent ±1σ error-bars.