The RWST, a comprehensive statistical description of the non-Gaussian structures in the ISM

The interstellar medium (ISM) is a complex non-linear system governed by gravity and magneto-hydrodynamics, as well as radiative, thermodynamical, and chemical processes. Our understanding of it mostly progresses through observations and numerical simulations, and a quantitative comparison between these two approaches requires a generic and comprehensive statistical description. The goal of this paper is to build such a description, with the purpose to permit an efficient comparison independent of any specific prior or model. We start from the Wavelet Scattering Transform (WST), a low-variance statistical description of non-Gaussian processes, developed in data science, that encodes long-range interactions through a hierarchical multiscale approach based on the Wavelet transform. We perform a reduction of the WST through a fit of its angular dependencies, allowing to gather most of the information it contains into a few components whose physical meanings are identified, and that describe, e.g., isotropic and anisotropic behaviours. The result of this paper is the Reduced Wavelet Scattering Transform (RWST), a statistical description with a small number of coefficients that characterizes complex structures arising from non-linear phenomena, free from any specific prior. The RWST coefficients encode moments of order up to four, have reduced variances, and quantify the couplings between scales. To show the efficiency and generality of this description, we apply it successfully to three kinds of processes: fractional Brownian motions, MHD simulations, and Herschel observations in a molecular cloud. With fewer than 100 coefficients when probing 6 scales and 8 angles on 256*256 maps, we were able with the RWST to perform quantitative comparisons, to infer relevant physical properties, and to produce realistic synthetic fields.


Introduction
The interstellar medium (ISM) serves as a good example of how complex natural physical systems can be. Its physics involve a highly nonlinear interplay of gravity and magnetohydrodynamics (MHD), as well as radiative, thermodynamical, and chemical processes (Draine 2011). This complexity precludes the advent of a comprehensive model of the ISM, whose understanding mostly progresses empirically through observations, numerical simulations, and phenomenological models. Those approaches each benefit from continually improving observational capabilities (see, e.g. Schinnerer et al. 2013;Pabst et al. 2017;Pety et al. 2017;Cormier et al. 2018) and computational power (see, e.g. Gent et al. 2013;Hennebelle 2018;Hopkins et al. 2018). A key point of ISM studies therefore lies in the quantitative comparison between observational and simulated data, which has to be done statistically. To perform such a comparison, however, requires to properly characterise non-Gaussian processes with long-range correlations that are a consequence of the complex nonlinear physics at play. This must also be done using statistical descriptions that keep a reasonably low dimensionality in order to be of sensible use (Donoho 2000).
In recent years, the complex physics of the ISM has also become closely related to observational cosmology since some cosmological signals of interest are much smaller than the emission from Galactic foregrounds. An example of this is the search for B-modes of polarisation in the cosmic microwave background (CMB). A potential detection of this signal would constrain inflation models in the very early Universe. However, it cannot be conclusive unless the submillimetre polarised thermal emission from Galactic dust is properly accounted for (BICEP2/Keck Array and Planck Collaborations 2015). This, in turn, requires a statistical model of this foreground emission in order to optimise component separation methods and reliably quantify uncertainties affecting the expected primordial signal (Planck Collaboration IV 2019). Such models exist, but only as phenomenological ones (Vansyngel et al. 2017), which are hampered by simplifying assumptions, for example that the random (turbulent) component of the Galactic magnetic field may be described as a Gaussian process. The development of a comprehensive statistical model of the ISM is therefore not only a goal for Galactic astrophysics. There are also important implications in cosmology.
It is often possible to find specific statistical estimators to test a given phenomenological model and estimate its parameters. In this class of estimators, one may think of diagnostics of the intermittent dissipation of turbulence in the probability distribution functions (PDFs) of velocity fluctuations at small scales (Frisch 1995;Falgarone et al. 2009), of the evolutionary state of molecular clouds based on column-density PDFs (Kainulainen et al. 2009), of the relative contributions of solenoidal and compressive modes of turbulence from spectro-imaging moment maps , and of the relative orientation of interstellar filaments and magnetic fields with dedicated histograms (Planck Collaboration Int. XXXV 2016) or more evolved tools (Jow et al. 2018). These tailored statistical descriptions allow us to characterise some specific non-Gaussian features, but they are of limited scope when no prior model is available.
Other statistical descriptions are not specifically designed to test a phenomenological model, but rather to characterise the morphology of the observed fields 1 . In this category, we find descriptions in terms of filaments, sheets, and voids based on Morse theory (Sousbie 2011), hierarchical structure analyses using dendrograms (Houlahan & Scalo 1992;Rosolowsky et al. 2008), or detections of linear structures, such as the Rolling Hough Transform (Clark et al. 2014). The stability of these descriptions under small deformations of the fields is, however, not easy to ensure.
An inherent difficulty to statistically modelling ISM processes in a comprehensive way lies in their long-range interaction properties. In this case, a description using probability distributions of pixel values must be based on conditional probabilities involving many points, and this is not easily tractable. A simpler way to describe such processes involves a hierarchical multiscale approach: the small scale interactions lead to the formation of local structures at intermediate scales, that in turn interact to form structures at larger scales, etc. This requires properly separating the variability of the process under study at different scales, which is precisely the purpose of the wavelet transform (Cohen & Ryan 1995;Van Den Berg 2004;Farge et al. 2010;Farge & Schneider 2015).
Second-order moments of wavelet coefficients are closely related to standard power-spectrum approaches (Flandrin 1992;Meyer et al. 1999;Farge et al. 2010). As a first step, some physical insight into ISM processes may be gained from these power spectrum analyses. They discriminate between different models of turbulence that make various assumptions about the compressibility of the fluid and the presence of a magnetic field (see, e.g. Kolmogorov 1941;Iroshnikov 1964;Kraichnan 1965a,b;Sridhar & Goldreich 1994;Goldreich & Sridhar 1995;Kowal & Lazarian 2007;Falceta-Gonçalves et al. 2014). However, secondorder moments do not fully describe the statistical properties of non-Gaussian fields. Higher-order statistical moments have also been used, such as bispectra (Burkhart et al. 2009) or structure functions (She & Leveque 1994), but these are prone to exhibit high variances due to outliers, and are therefore of limited use when only limited good quality data is available.
To beat these shortcomings, recent advances in data science have shown that it is possible to extract non-Gaussian features of fields in the multiresolution framework provided by the wavelet 1 In this paper, we use the word field to describe the two-dimensional physical quantities under study that our statistical descriptions are applied to. This unifies different words that could be used in other communities, such as "image", "texture", or "flow". transform, while keeping a reduced variance (see Sect. 2.3). The Wavelet Scattering Transform (WST, Mallat 2012), which makes use of the properties of directional wavelets, is inspired by the architecture of convolutional neural networks, and yields state-of-the-art results for image classification problems, without requiring any training stage (Bruna & Mallat 2013;Sifre & Mallat 2013). The outputs of the WST, called scattering coefficients, constitute an efficient, low-variance, low-dimensionality statistical description of non-Gaussian processes. They contain information on moments of order higher than two, are able to capture long-range correlations, and can be related to physical properties of the systems under study.
We note that studies of ISM emission maps making a direct use of the wavelet transform, and therefore related to the work presented here, have also been conducted. For instance, Khalil et al. (2006) used the wavelet transform modulus maxima method (Mallat & Zhong 1992) to analyse HI maps from the Canadian Galactic Plane Survey (Taylor et al. 2003) in terms of their multifractal spectrum and local, scale-dependent anisotropies. More recently, Robitaille et al. (2014) used complex Morlet wavelets on thermal dust emission maps from the Hi-GAL Herschel survey (Molinari et al. 2010), to separate their Gaussian and non-Gaussian components by thresholding on the probability distribution function (PDF) of the wavelet coefficients, finding in particular that the non-Gaussian part correlates well with the molecular gas emission. These approaches are in some sense akin to studying the first layer of the WST that we describe in Sect. 2.2.
The goal of this paper is to make use of this new method borrowed from data science to statistically characterise the complex structures of the ISM. With this purpose in mind, this paper introduces a statistical description of even lower dimension, the reduced wavelet scattering transform (RWST), that is obtained from the WST through the identification of the different angular modulations of the scattering coefficients, whose physical meanings are identified. This reduction does not require specific priors, but assumes that the angular dependency is smooth, as expected for physical systems. We show it to be successful in characterizing very different processes: fractional Brownian motions (Stutzki et al. 1998), column density maps generated from MHD simulations, and an observation of the Polaris Flare molecular cloud with the Herschel satellite (Miville-Deschênes et al. 2010). The RWST allows us to perform quantitative comparisons between these processes, and to produce realistically looking synthetic fields.
The paper is organised as follows: Sect. 2 offers a simplified presentation of the WST aimed at a general audience of physicists. Section 3 introduces the RWST and discusses the generality of the angular reduction that is performed. It also presents a validation of this approach through the synthesis of random fields based on the WST and RWST coefficients. Section 4 reviews the various components of the RWST coefficients, and gives examples of what physical features are encoded in these coefficients. Our conclusions and some perspectives for future works are presented in Sect. 5. Five appendices complete the paper. The basic properties of Morlet wavelets are given in Appendix A; the three different classes of physical fields used to build examples are described in Appendix B; some comments on the generalizations and limits of the RWST are given in Appendix C; the possibility to achieve a local statistical description of fields with the reduced scattering coefficients, as well as the difficulties it poses, are discussed in Appendix D; finally, additional examples of RWST for different processes are given in Appendix E.

Global wavelet scattering transform
The starting point of the statistical description introduced in this paper is the wavelet transform. Its ability to perform an efficient scale separation allows to study processes with long-range interactions by means of a hierarchical multiscale approach, and the progressive identification of structures at different scales that interact with each other (Farge et al. 2010). From this, the WST has been built specifically to quantify these couplings between such structures.

Introduction
Since its first introduction in data science (Mallat 2012), the WST has led to state-of-the-art classification results for handwritten digits and texture discrimination (Bruna & Mallat 2013), including the most difficult textures databases (Sifre & Mallat 2013). It has also been applied to quantum chemical energy regression and the prediction of molecular properties (Eickenberg et al. 2018). The goal of this section is to present and synthesise for a general audience of physicists the construction and the properties of the WST coefficients 2 . The results of this section are thus not new in themselves, but are formulated as much as possible in the language of physics rather than applied mathematics. A more complete presentation and discussion of this transform can be found in Bruna & Mallat (2013) and Bruna et al. (2015).
The initial purpose of the WST was to understand and reproduce the image classification successes obtained by deeplearning architectures, but by means of a statistical description that does not require any training stage, and is entirely controlled. The WST is built by successive convolutions of the field with Morlet wavelets followed by the application of the modulus operator. We mainly use in this paper the WST to characterise the global statistical properties of a field. The WST thus computes a set of global coefficients that can be labelled with the scales they characterise. It is however also possible to use the WST to achieve a local description of a field that is not statistically homogeneous, as presented in Appendix D.

Computation of the WST coefficients
We consider here a real-valued two-dimensional field I(x). Typically, I(x) is defined on a grid of d × d pixels and represents, for instance, an intensity level at a given wavelength in an astrophysical observation. In that case, x stands for a position in the sky. All the sizes discussed henceforth refer to certain numbers of pixels, that can in turn be related to physical lengths. We use discretised wavelets, defining a number J of scales to consider. The integer scales j are labelled from 0 to J − 1 and correspond to effective sizes of 2 j pixels (we work accordingly with base-2 logarithms in the whole paper). Therefore, 2 J must be smaller than or equal to the size d of the image.
The angles ϑ are also labelled by integers θ, such that where Θ denotes the number of angles in which we divide a π interval 3 . As we work with real fields, it is indeed sufficient to 2 The scattering coefficients are computed with a MATLAB software called scatnet that is publically available (https://www.di.ens. fr/data/software/scatnet/). We also developed a python code to perform the WST. 3 Note that the integer labels j and θ are sometimes abusively identified in this paper with the effective scale 2 j in pixels and angle ϑ in radians they correspond to. Fig. 1. Two-dimensional Morlet wavelets, with σ = 1 (see Appendix A). a: real part of ψ j,0 . b: real part of ψ j,θ with ϑ = π/5. c: location of the modulus ofψ j,0 (blue) andψ j,θ (orange). We note that in our applications, we take σ = 0.8, see Eq. (A.1).
consider the WST coefficients for angles ϑ in [0, π), i.e. with θ going from 1 to Θ. The redundancy of the other angles stems from the fact that the Fourier transform of a real-valued field I(x) verifiesĨ(−k) =Ĩ * (k) where * stands for complex conjugation 4 . From now on, we work with a labelling in terms of oriented scales ( j, θ), each of which corresponds to a certain wavevector k. This labelling will be particularly useful to distinguish scale and angular dependencies of the scattering coefficients. The computation of scattering coefficients implemented in scatnet involves convolutions with complex Morlet wavelets, which are convenient to interpret in the usual framework of spectral analysis since those wavelets are well localized in Fourier space. Their definition and basic properties are given in Appendix A, where their link with discrete windowed Fourier transforms is explained. Starting from an initial mother wavelet ψ(x) defined as the product of an oscillation of unit frequency and a Gaussian window (Eq. (A.3)) the complex Morlet wavelets are then computed as where r θ is the rotation operator of angle θ. The real parts of two examples of such wavelets and the supports of their Fourier transforms are shown in Fig. 1. The Fourier transform of the mother wavelet ψ(k) being centred on k ψ with a unit bandwidth, ψ j,θ has a support centred on 2 − j r θ k ψ with a bandwidth proportional to 2 − j . For given values of J and Θ, one can build an appropriate set of wavelets {ψ j,θ } such that their combined Fourier supports cover the whole Fourier plane, except for a localised area close to the null frequency 5 (Bruna & Mallat 2013). Using these wavelets, the WST coefficients are computed in three layers, indexed by an integer m going from 0 to 2. The first layer m = 0 characterises the average value of the field, and thus contains only one coefficient S 0 where the normalization factor µ 0 is the surface over which the integration is performed. The coefficients S 1 ( j 1 , θ 1 ) of the second layer m = 1 depend on a single oriented scale ( j 1 , θ 1 ) and are given by where stands for the convolution and the normalization factor is the impulse response with δ the Dirac delta function 6 . These S 1 coefficients probe the amplitudes of the spectral components of the field centred on the wavevector 2 − j 1 r θ 1 k ψ that is associated with the ( j 1 , θ 1 ) oriented scale. Finally, the coefficients S 2 ( j 1 , θ 1 , j 2 , θ 2 ) of the third layer m = 2 depend on two oriented scales ( j 1 , θ 1 ) and ( j 2 , θ 2 ), and are given by where µ 2 is defined similarly to µ 1 in Eq. (5). These S 2 coefficients probe the level at which the first oriented scale ( j 1 , θ 1 ) is modulated at a second oriented scale ( j 2 , θ 2 ), with j 2 > j 1 (see Sect. 2.4). They are also related to geometrical shapes and structures appearing in the field (Bruna & Mallat 2013).

Properties of the WST coefficients
The WST coefficients depend on high-order moments of I(x), mainly of order up to 2 m for the mth layer (Bruna & Mallat 2013). We therefore expect the m = 2 coefficients to allow to distinguish fields that have the same second order moments (i.e. power spectra), but different higher order moments. However, unlike high-order moments, whose estimators exhibit variances that are increasingly dominated by outliers, that is by samples which are far away from the mean (Welling 2005), the WST coefficients do not involve products of values of the field. On the contrary, the WST coefficients are built using unitary and non-expansive operators (as the modulus), and have reduced variance, which means that they can be better estimated from limited number of samples (Bruna & Mallat 2013).
We note that the construction of scattering coefficients can be pursued for deeper m 3 layers, but in practice this is not necessary, and we choose to limit the present study to the m 2 layers. The m 3 layers describe couplings between three scales or more, and characterise accordingly correlations of order higher than four. It has however been shown in practice that those additional layers do not significantly improve the classification results or the quality of syntheses performed with the WST, despite an important increase in the number of scattering coefficients (Bruna & Mallat 2013).
Furthermore, for appropriate wavelets 7 , the WST also preserves the field energy to a very good approximation (Mallat 2012) where 6 We do not write explicitly here the j i and θ i dependencies of the µ 1 and µ 2 normalization factors. 7 By this we mean that the set of wavelet supports should cover the whole spectrum of the field under study, up to its largest scale. For example, for a 256 × 256 image, it requires to have J = 8. Under this condition, Eq. (7) is valid (Mallat 2012). In our case, we consider only the energy contained in the scales j 5.
In Eq. (7), the ε term stands for the energy encoded in the m 3 layers, that contain in general less than 1% of the initial energy of the field (Bruna & Mallat 2013). Under the same requirement as for Eq. (7), the conservation of energy can also be written at the level of the power spectrum: where ||I ψ j 1 ,θ 1 || 2 is defined following Eq. (8), and essentially represents the power spectrum of the field I at the ( j 1 , θ 1 ) oriented scale (see Eq. (A.5)). In this case, ε also encodes the energy contained in the m 3 layers, that has been shown to be negligible for stationary processes (Bruna & Mallat 2013). Equation (9) shows that it is possible to recover the power spectrum of a field from its scattering coefficients. In addition, these properties link the distribution of energy into the different layers of the WST to the sparsity of the wavelet coefficients. Indeed, as I ψ j 1 ,θ 1 gets sparser, S 1 coefficients get smaller, and more energy is propagated to deeper layers. Highly non-Gaussian fields thus have larger S 2 coefficients, while the power spectrum of Gaussian fields may be recovered from the S 1 coefficients alone.
The WST finally has particular properties related to translations and small deformations of the field. First, the scattering coefficients are invariant under any global translation, since the coefficients are obtained after a spatial integration. Such a property is indeed required when working with homogenous statistics. Second, the WST linearises small deformations (Mallat 2012). This means that starting from a field I(x) and deforming it by a small amount 8 , the associated displacement in the scattering coefficients space is bounded, and thus the modification of the statistical description performed by the WST is related in amplitude to the deformation of the field. Such a property is of prime importance when studying complex physical phenomena, since one expects two fields related by a small deformation to have similar physical properties.

Number and normalization of the WST coefficients
The assumed values of J and Θ determine the number of scattering coefficients describing a given field. Let us first note that the S 2 coefficients are negligible for j 2 < j 1 . Indeed, after a convolution of I(x) by ψ j 1 ,θ 1 , all the information about scales smaller than 2 j 1 is lost, and it is sufficient to consider the modulation of |I ψ j 1 ,θ 1 | by a larger scale 2 j 2 . The whole information about the coupling of two scales j 1 and j 2 is thus contained in S 2 ( j 1 , θ 1 , j 2 , θ 2 ) for j 2 > j 1 and for all θ 1 and θ 2 . There are then N 1 = J · Θ coefficients for the m = 1 layer and N 2 = J · (J − 1)/2 · Θ 2 coefficients for the m = 2 layer. For J = 6 and Θ = 8, which are the values we consider in this paper 9 , it gives N 1 = 48 and N 2 = 960. With N 0 = 1, this gives a total of 1009 coefficients.
Since in simple cases we may expect the scattering coefficients to depend on scales through power laws of 2 j 1 and 2 j 2 , it is 8 A proper formulation of this property requires the introduction of distances between two fields as well as between two sets of scattering coefficients. It is then possible to show that when deforming a field by a small amount, the displacement in the space of WST coefficients can be bounded in terms of the displacement in the field space. See Mallat (2012) or Bruna & Mallat (2013) for more details. 9 Working mainly with fields of linear sizes 2 8 pixels, one can hardly compute meaningful statistics on scales larger than or equal to 2 6 pixels. Also, to decompose the half-circle in Θ = 8 angles is a good trade-off between a smooth sampling and the possibility to clearly distinguish between two directions for scales up to 2 5 pixels on a square lattice. , plotted in a lexicographical order. In the log 2 S 1 ( j 1 , θ 1 ) plot (left), θ 1 spans the range 1-8 for each value of j 1 . In the log 2 S 2 ( j 1 , θ 1 , j 2 , θ 2 ) plot (right), θ 2 spans the range 1-8 for each value of ( j 1 = 0, θ 1 , j 2 ). The computation of the error bars is explained in Sect. 3.4. useful to work with their logarithms, which would lead to linear behaviours as a function of j 1 and j 2 (Sifre & Mallat 2013). We finally normalise each layer of scattering coefficients by those of the previous layer. We denoteS these normalised coefficients, and whereas log 2 (S 0 ) = log 2 (S 0 ) is unchanged. This normalization separates the dependencies of the different layers (Bruna et al. 2015), since theS 1 andS 2 coefficients are invariant under the multiplication of the field by a constant factor, and theS 2 coefficients are also invariant under a modification of the spectrum of the field by the action of a linear filter 10 . Note that in practice, this normalization is done locally before performing the spatial average (see Appendix D for more details).
As an example, Fig. 2 shows the logarithms of scattering coefficients (log 2 (S 1 ) on the left and log 2 (S 2 ) on the right) computed from a 256 × 256 column density map in a simulation of an astrophysical flow (see Appendix B.3 for more details), with J = 6 and Θ = 8. All the coefficients are plotted as a function of their arguments in lexicographical order: for instance, the first eight log 2 (S 1 ) coefficients are for fixed j 1 = 0 and θ 1 varying from 1 to 8, the next eight log 2 (S 1 ) coefficients are for j 1 = 1 and θ 1 varying from 1 to 8, and so on. In Fig. 2 (right), only the j 1 = 0 subset of the log 2 (S 2 ) coefficients is plotted, in a ( j 1 , θ 1 , j 2 , θ 2 ) lexicographical order.

The reduced wavelet scattering transform
The WST was introduced in data science with the purpose of characterizing any given field without assuming constraints such as continuity or regularity. In the case of physical fields, some of these constraints may be expected to hold, suggesting possible simplifications. Indeed, the scattering coefficients shown in Fig. 2 exhibit regular patterns through the angles and scales, for 10 This is in fact verified only for linear filters roughly constant on each of the spectral domains sampled by the different wavelets (Bruna & Mallat 2018).
instance the "stair-like" shape for the m = 1 coefficients and the oscillatory structure for m = 2. It should therefore be possible to derive a new statistical description that would somehow factor out these patterns, and thus offer a significant compression of the WST coefficients.

Rationale
We propose such a description, called the Reduced Wavelet Scattering Transform (RWST), obtained by fitting the angular (θ 1 , θ 2 ) dependencies of the WST coefficients with a few terms accounting for specific angular modulations. This reduction allows to concentrate the information contained in the ∼1000 coefficients of the WST (with J = 6 and Θ = 8) into fewer than 100 reduced coefficients, almost without any loss of information (see Sect. 3.5 below). Gathering coefficients describing specific angular modulations also allows a simpler and more transparent description, and gives supplementary simplifications in some cases. For example the coefficients describing the statistical anisotropies of a field can be ignored when the latter is in fact statistically isotropic.
Our modelling of the WST coefficients separates the dependency on angles from that on scales. In this assumption, the logarithms of WST coefficients may formally be written as a sum of terms corresponding to the various possible modulations, for m = 1 and 2, while log 2 (S 0 ) =Ŝ 0 for m = 0. In Eq. (12), the f p m ({θ i }) are given functions of the angles, that may involve some reference angles related to preferred directions for anisotropic fields, and theŜ p m ({ j i }) are the reduced scattering coefficients, giving the respective amplitudes of the various angular modulations.
Before discussing the form of the functions f p m ({θ i }), it should first be stressed that rotations and scalings, in their infinitesimal versions, can be thought of as small deformations, under which the WST is continuous (see Sect. 2.3). The scattering coefficients should therefore be seen as a discrete sampling, at scales { j i } and angles {θ i }, of a continuous statistical description, rather than as independent descriptors. The same should therefore be true of the RWST description given in Eq. (12).

Reduction of the angular dependency
The choice of the functions f p m ({θ i }) should be guided by general considerations involving periodicities and angle references. For instance, the presence of a statistically preferential direction, such as in images of fluid flows with a mean direction, should manifest itself by a symmetric modulation of the WST response with a π-periodicity, since the scattering coefficients are themselves π-periodic 11 , and an angle reference either along or perpendicular to the preferential direction. Similarly, a modulation related to some signature of pixelation should be aligned with the lattice and have a π/2 periodicity (see Appendix C). All periodic functions being susceptible to a Fourier series decomposition, we may assume -to first order -the f p m to be cosine functions. This assumption, as we will see, is generally validated by the successful fits of the different physical fields it allows, as shown in Sect. 3.4.
We first assume that the fields may be statistically anisotropic, but with only one preferential direction at most. A similar approach could be developed in the case of physical phenomena exhibiting several preferential directions. The angular modulations to take into account are expected to be functions of θ 1 , θ 2 , or of the difference θ 1 − θ 2 , the latter being isotropic since it does not change under a global rotation.
For the m = 1 layer, the only angular dependency is on θ 1 . Following Eq. (12) and the discussion above, we write log 2 S 1 as the sum of an isotropic term independent of θ 1 , and an anisotropic term proportional to a π-periodic cosine function of θ 1 : where θ ref,1 ( j 1 ) is a reference angle related to the direction of anisotropy. This angle is a function of the scale j 1 , and is expected to smoothly vary across the scales 12 . Such a trigonometric function distinguishes a direction from the perpendicular one, but not a direction from its opposite 13 . If the field is statistically isotropic, then we expectŜ aniso 1 0. For the m = 2 layer, we consider the following four-term decomposition, with two isotropic and two anisotropic terms, 11 Indeed, the Morlet wavelets verify ψ j,ϑ+π = ψ * j,ϑ . Knowing that I is real-valued, the wavelet coefficients |I ψ j,θ | are then π-periodic. 12 Although the potential dependency of θ ref,1 on j 1 means that scales and angle dependencies are not completely separate as Eq. (12) suggests, we use this slightly more general form to be able to detect variations of the anisotropy directions across the scales. 13 Note that in terms of geometrical angles ϑ associated with the integer labels θ (see Eq. (1)), the cosine function reads cos[2(ϑ 1 − ϑ ref,1 )] and is therefore π-periodic. Note also that the reference angles are fitted as real values in [0, π), and not as integers, in order to describe all possible directions. They may also be defined modulo π/2 by reversing the sign ofŜ aniso 1 ( j 1 ), but this degeneracy can be lifted by enforcingŜ aniso 1 0.
All the cosine functions in this equation are π-periodic, as in the m = 1 case. We ignore a potential reference angle difference for theŜ iso,2 2 term and assume the same θ ref,2 ( j 1 , j 2 ) reference angle for the two anisotropic terms, further imposing that it should be close to θ ref,1 ( j 1 ).
Our statistical description thus consists in eight functions (Ŝ iso 1 ,Ŝ aniso 1 , θ ref,1 ,Ŝ iso,1 2 ,Ŝ iso,2 2 ,Ŝ aniso,1 2 ,Ŝ aniso,1 2 , and θ ref,2 ) that are discretely sampled at scales j 1 and j 2 . These functions form the reduced wavelet scattering transform, and each of them is described by J coefficients for m = 1, and J(J − 1)/2 coefficients for m = 2, in addition to the m = 0 coefficient, for a total of (5J + 1)J/2 + 1 coefficients. This gives for instance 94 coefficients for J = 6. More precisely, the 48 WST coefficients of the m = 1 layer are fitted with 18 degrees of freedom, while the 960 WST coefficients of the m = 2 layer are fitted with 75 degrees of freedom 14 .
We investigated the limits of this reduction of the WST coefficients. Our study shows that the modulations given in Eqs. (13) and (14) are always largely sufficient to describe the angular dependencies of the scattering coefficients. In other words, this reduction boils down to the scattering coefficients at fixed j 1 and j 2 being described by the zeroth and first harmonics in θ 1 and θ 2 , with a very good approximation. Indeed, higher harmonics of those angular modulations are not detectable when working with a single map, and are detected at a very small level when working with a set of 20 independent maps for a given process (see below). It was also possible to identify minor modulations associated to potential signatures of the lattice at the smallest scales. These terms, that allow to better evaluate the limits of the RWST, are discussed in Appendix C. Note however that in any case, their addition does not essentially modify the values of the reduced scattering coefficients obtained by fitting Eqs. (13) and (14).

Test cases
The fit of the angular dependency of the WST coefficients and the associated reduction have been tested on various fields. These are presented in detail in Appendix B, but we give here a short description of each of them for easy reference.
The first type of field used are realizations of fractional Brownian motions (fBm, Stutzki et al. 1998), Gaussian random fields with power-law power spectra characterised by a Hurst exponent H ∈ [0, 1]. We explore the range H = 0.1 to H = 0.9 in steps of 0.1. For each value of H, we use 20 different random realizations over a 256 × 256 grid. In the following, we may refer to fits using a single map or using the ensemble of 20 maps.
The second type of field used are 256 × 256 gas column density maps N H obtained from numerical simulations of magnetised turbulent astrophysical flows (Iffrig & Hennebelle 2017). There are nine classes of such maps, labelled from 1 to 9, with varying intensities of the magnetic field and of the turbulent velocity forcing (see Table B.1). For each class, we use 20 independent maps. Similarly to the fBm case, in the following, we may refer to fits using a single map or using the ensemble of 20 maps.
The third and last field used is an observation of the dust continuum thermal emission in the Polaris Flare molecular cloud (Miville-Deschênes et al. 2010) with the Herschel satellite (Pilbratt et al. 2010;Griffin et al. 2010). Unlike the first two types of field, the statistics of this field are likely not homogeneous. We roughly addressed this limitation by using a local WST Normalized residuals j 2 j 2 j 2 j 2 j 2 j 1 = 0 j 1 = 1 j 1 = 2 j 1 = 3 j 1 = 4 Fig. 3. Logarithms of the normalised m = 2 scattering coefficients of a 256 × 256 column density map from an MHD simulation (class 4, see Appendix B). The coefficients are given in lexicographical order ( j 1 , θ 1 , j 2 , θ 2 ). In the top panel, the fit using Eq. (14)  (see Appendix D) and clustering the data into four sub-regions (clusters) based on the local WST coefficients, although this process and the chosen number of clusters are somewhat arbitrary.

Goodness of the fits
The local computation of the WST coefficients (Appendix D) is also instrumental in evaluating the goodness of the fits. The statistical dispersion of these local WST coefficients over each map and over the different realizations allows to estimate the empirical variance of the global WST coefficients, and in turn their uncertainties. As we work with non-Gaussian processes for which no analytic variance estimation is available, this method is currently the only one we have at hand to estimate these uncertainties. One should however keep in mind that this method has a notable flaw. Indeed, while the empirical estimate of the variance of the WST coefficients has to converge to its expected value when using a large enough number of samples, such an empirical estimate can be of poor quality when this convergence is not achieved. From our empirical study, we assess that the sampling performed in a single 256 × 256 map only gives well determined uncertainties only for scales j 3, while it is necessary to sample on 20 such maps to correctly determine the uncertainties for scales up to j = 5. This result can be seen for instance in Fig. 2, where scattering coefficients obtained in a single map are given, as well as their estimated uncertainties. One can for example see in both panels that the uncertainties on the coefficients are underestimated for j 4. This is particularly visible for j = 5.
The fits of the angular dependencies given in Eqs. (13) and (14) were performed taking these statistical uncertainties into account, yielding a standard χ 2 red using a diagonal covariance matrix. We chose not to include complete covariance matrices because we cannot properly estimate them on a single map. This implies, in addition to the previous discussion on the statistical uncertainties of the scattering coefficients, that these χ 2 red are only indicative of the goodness of the different fits. This is something to be improved upon in future works.
Performing these fits separately for each of the fBm and MHD simulation maps at our disposal yields as many χ 2 red values as there are realizations, i.e., 2 × 9 × 20 = 360. Over all of these, we have found similar results, that boil down to an average χ 2 red of 3.5 with a dispersion of 0.6 for the m = 2 coefficients. Such a fit is shown in Fig. 3. Similar results are also obtained when fitting the combined data from all twenty realizations for each class of MHD simulation or H exponent of the fBm field, provided that the minor additional terms described in Appendix C are included. Such a fit is shown in Fig. 4.
The goodnesses of the fits are somewhat lower in the case of the Herschel observations of the Polaris Flare. For the four sub-regions of the cloud discussed in Appendix B.3, we obtain χ 2 red values with a mean of 7.2 and a standard deviation of 3.3 for the m = 2 fits. We however observe that the smallest scales are very noisy in these fields. Indeed, performing the same fits while excluding the ( j 1 = 0, j 2 = 1) scale, we obtain χ 2 red values with a mean of 4.8 and a standard deviation of 1.2. We consider these values to be satisfactory, given the heterogeneity of the statistics across the field of view and the crudeness of the clustering approach we have used to address it.
These results validate our reduction of the WST to the RWST, and show the wide range of applicability of this new statistical description. Note however that we expect this reduction to be efficient on physical fields only. We tested this hypothesis by fitting the angular dependency of the WST coefficients obtained on the image of a brick wall from the UIUC data base (Agarwal et al. 2004). We obtained large χ 2 red values, with a mean around 450 for m = 2, because the angular dependencies are in this case not amenable to smooth trigonometric functions.

Syntheses
The efficiency of the RWST as a means to capture the essential statistics of a complex field can be assessed through our ability, starting from the RWST coefficients, to build synthetic fields that are visually similar to the original data. Although this is not an absolute criterion, such a visual comparison is a widespread indicator in data science, among others such as the ability to regress physical parameters, or to achieve high rates of success in classification tests. We also note that the power spectrum does not generally pass this test for non-Gaussian fields, which is exemplified by the fact that starting from a highly non-Gaussian image and synthesizing a field that has the same power spectrum but with random Fourier phases completely destroys the structure in the image.
The synthetic fields are constructed from a set of target WST coefficients. These may be obtained directly from the field to Normalized residuals j 2 j 2 j 2 j 2 j 2 j 1 = 0 j 1 = 1 j 1 = 2 j 1 = 3 j 1 = 4 Fig. 4. Logarithms of the normalised m = 2 scattering coefficients averaged over twenty 256 × 256 column density maps from an MHD simulation (class 4, see Appendix B). The coefficients are given in lexicographical order ( j 1 , θ 1 , j 2 , θ 2 ). Top panel: fit using Eq. (14) plus additional terms discussed in Appendix C is shown (dashed orange line) on top of the data (solid blue line). The additional terms that have been taken into account in this fit are a lattice signature and a π/2 harmonic for theŜ iso,2 2 term. Bottom panel: residuals normalised to the standard deviation. Dotted lines correspond to ±2σ. mimic, or from its RWST coefficients, using Eqs. (13) and (14). Starting from a white Gaussian noise map in order to ensure high randomness, the pixel values are iteratively modified through a gradient descent method to obtain the expected WST coefficients (see Bruna & Mallat 2018 for more details). We show such syntheses in Fig. 5, starting from an original data set that consists of 20 column density maps from a simulation of interstellar MHD turbulence (class 5, see Appendix B). One of these maps is shown in the top left panel. Three different syntheses are performed based on these original data: a synthetic Gaussian field with the same power spectrum (top right), a synthetic field with the same WST coefficients (bottom left), and a synthetic field with the same RWST coefficients (bottom right). These syntheses are done with the maximum scale J = 6 we use in this paper. This implies that the structures larger than 2 5 = 32 pixels are not properly synthesised, including large scales modulations as well as the long and thin filamentary structures. Further work is needed to include the largest scales in the synthesis algorithm.
Nevertheless, both of the syntheses based on scattering coefficients provide a better agreement with the original image than does the Gaussian field. More importantly, the RWST-based synthesis is at least as good as the WST-based one, showing that the dimensionality reduction (from ∼1000 to fewer than 100 coefficients, and even fewer than 50 for isotropic processes) leads to no significant loss of statistical information. Other similar RWST syntheses are given in Fig. E.5. The syntheses of fBm processes and MHD simulations shown there also display good agreement with the original data, except at the largest scales, as already discussed. This shows the efficiency of our angular reduction, and the ability of the RWST to characterise in a very compact form a significant part of the relevant statistical information about physical fields with homogeneous statistics.
In the same Fig. E.5, we also present syntheses of fields using the RWST coefficients computed on clusters of the Polaris Flare field (see Appendix B.3). In this case, the syntheses are hampered by the heterogeneous statistics of the original data, as can be seen mainly for cluster 2. Improving this is a direction for future work. Nevertheless, we find present syntheses of subregions of the Polaris Flare to be in reasonably good agreement with the original data, and believe that they show the applicability of the RWST to observational data as well as simulations.

Interpretation of the RWST components
This section examines the different components of the RWST, and proposes physical interpretations for them, using as examples the three different types of fields introduced in Sect. 3.3 (see also Appendix B).

Introduction
For the fBm processes as well as the column density maps from MHD simulations, the RWST coefficients plotted in this section have been obtained from WST coefficients averaged over the 20 independent maps for each class. For the Herschel observation, the RWST coefficients are calculated from the WST coefficients averaged over each of the four clusters. To support our physical interpretations, we have explored the full range of parameters for each type of field, varying the Hurst exponent of fBm processes, the physical parameters of the MHD simulations, and the cluster of the Polaris Flare Herschel observation. In addition to the plots shown in this section, we also refer to several additional plots of RWST coefficients given in Appendix E that are helpful to discuss these explorations of the parameter spaces.
In the plots discussed, the RWST coefficients for m = 1 (Ŝ iso 1 , S aniso 1 , and θ ref,1 ) are of course plotted as functions of j 1 , and the m = 2 coefficients (Ŝ iso,1 2 ,Ŝ iso,2 2 ,Ŝ aniso,1 2 ,Ŝ aniso,2 2 , and θ ref,2 ) are plotted as different functions of j 2 for fixed j 1 . Since j 2 > j 1 , the number of points varies from one curve to another. All the plots include associated 1σ uncertainties that are obtained by propagating the statistical uncertainties of the WST coefficients through the fitting process 15 . The smoothness of the variations of the RWST coefficients across the scales corroborates our understanding that these are discrete samplings of underlying smooth functions. examples 16 of this term are given in Fig. 6 (top row). Other examples are shown in Fig. E.1, obtained by including the additional terms detailed in Appendix C, which, one can see by comparing these two figures, do not change theŜ iso 1 coefficients appreciably. For scale-invariant processes, these coefficients are expected to be linear functions of the scale,Ŝ iso 1 ( j 1 ) ∝Ŝ 0 1 + j 1 H with H the Hurst exponent (Bruna et al. 2015). The m = 1 anisotropic termŜ aniso 1 describes the angular modulation of the WST coefficients for anisotropic fields, with an extremum at the preferential direction θ ref,1 , cos 2π

AnisotropicŜ
As already mentioned, θ ref,1 can be uniquely defined by imposingŜ aniso 1 0. Examples ofŜ aniso 1 coefficients are given in Figs. 6 (middle row) and E.1. For isotropic fields, we expect S aniso 1 0 and the uncertainty on θ ref,1 should be large, for the same reason that the phase of a very low amplitude complex number is poorly determined. For anisotropic fields,Ŝ aniso 1 should be non-zero and θ ref,1 should be well defined. It is noticeable that in this case, the θ ref,1 angle often has almost constant values over all the scales, which strengthens the interpretation that we are indeed probing a particular direction of anisotropy. 16 Note that for the fBm, we did not normalise the m = 1 coefficients by S 0 , as described in Eq. (10), because the fBm processes we consider have zero mean.

IsotropicŜ
iso,1 2 component The first m = 2 isotropic termŜ iso,1 2 describes at which level the 2 j 1 scales are modulated at the larger 2 j 2 scale. In other words, it describes the couplings between scales. Some examples of this term are given in Fig. 7 (top row), and others in Figs. E.2-E.4 (first column). We expect this term to depend on j 2 − j 1 only for scale-invariant fields, since the modulation of a first scale by a second scale then solely depends on the ratio between the two. We also expect this term to decrease as j 2 − j 1 increases, and this decrease to be all the steeper for fields where scales are only loosely coupled, and shallower in the case of fields with a strong nonlinear behaviour. Note that this this property is related to the notion of intermittency 17 in random fields (Bruna et al. 2015).

IsotropicŜ
This term quantifies whether, after the filtering of the field at the ( j 1 , θ 1 ) oriented scale, it is more probable to have, at a 17 In Bruna et al. (2015), this is defined as the occurrence of randomly distributed bursts of transient structures at multiple scales. This may be different from the physical notion of intermittency in studies of turbulent flows. given j 2 , a modulation in the same direction (θ 2 = θ 1 ), in which caseŜ iso,2 2 > 0, or in the perpendicular direction, in which casê S iso,2 2 < 0. Some examples ofŜ iso,2 2 coefficients are given in Fig. 7 (bottom row), and others in Figs. E.2-E.4 (second column).
Our understanding of these coefficients is that they signal the presence of structures such as filaments in the field. Indeed, in this case, we expect small scale oscillations to be aligned over larger scales along the different filaments, leading toŜ iso,2 2 coefficients that do not vanish even at large j 2 − j 1 . One can for example see that all these coefficients quickly converge to zero for large j 2 − j 1 for the fBm processes, which have very few structure, while they rather converge to a constant value for the filamentary MHD simulations. It is also interesting to see that they indicate an increasing presence of structure from the most diffuse (cluster 4) to the denser (cluster 1) areas of the Polaris cloud, which is in agreement with our expectations. The two m = 2 anisotropic termsŜ aniso,1 2 andŜ aniso,2 2 both describe an angular modulation similar to the one given in Eq. (15), and therefore characterise the anisotropy of the field, but with a finer scale dependency. Examples of these coefficients and reference angle are given in Figs. 8, E.2-E.4. Similarly toŜ aniso 1 , we expect S aniso,1 2 andŜ aniso,2 2 to vanish for statistically isotropic fields, in which case the uncertainty on θ ref,2 should be large. For the anisotropic fields, it is striking to note that the levels as well as the direction of anisotropy given by the m = 1 and m = 2 reduced scattering coefficients are similar, see for instance Figs. 6 and 8. This confirms our identification of the physical meaning of these terms.

Scale invariance
The signposts of scale invariance are unsurprisingly most apparent for fractional Brownian motion fields, whose power spectra display power-law scalings. Indeed, for these fields we find that S iso 1 is a linear function of j 1 with a slope proportional to H (Fig. 6, top left), whileŜ iso,1 2 is a function of j 2 − j 1 only, since the curves for different j 1 in Fig. 7 (top left) come together when plotted as functions of j 2 − j 1 . The same is true ofŜ iso,2 2 ( Fig. 7, bottom left).
For the gas column density maps from MHD simulations, we do not observe such a scale-invariant behaviour in the range of scales that is sampled (see theŜ iso 1 isotropic term in Fig. 6, top centre). This is not unexpected, since the energy injection in the simulations is itself not scale-invariant. There is a hint of a scaleinvariant behaviour at small scales, but these are over too short a range to be meaningful (Frisch 1995). On the other hand, thê S iso,1 2 ( Fig. 7, top centre) andŜ iso,2 2 (Fig. 7, bottom centre) terms are not a function of j 2 − j 1 only, indicating that the couplings between scales are not scale-invariant.
In the Polaris Flare Herschel map, we observe a behaviour that is similar to the MHD simulation maps for the most intense region (cluster 1, Fig. 6, top right), but also a flattening at small scales in the most diffuse regions 18 (clusters 3 and 4). This may be an effect of noise or of the Cosmic Infrared Background (CIB) fluctuations (Puget et al. 1996;Lagache et al. 2005;Viero et al. 2013) beginning to stand out. We also note that the third and four clusters seem to haveŜ iso,1 2 andŜ iso,2 2 coefficients 18 Recall that the m = 1 coefficients are normalised by the m = 0 ones (Eq. (10)), which precludes a direct comparison of theŜ iso 1 values. similar to the scale-invariant fBm ones at the smallest scales, which strengthens this observation.

Couplings between scales
The lack of coupling between scales in fractional Brownian motion fields appears in the fast decrease ofŜ iso,1 2 ( Fig. 7, top left) and ofŜ iso,2 2 to zero (Fig. 7, bottom left) for j 2 − j 1 3. In addition, the fBm processes share the same structure inŜ iso,1 2 and S iso,2 2 coefficients. Indeed, one can recover these curves from one another by a simple linear dilation of the j i scale that depends on their H exponents only (see Fig. E.3). This property can be related to the fact that all Gaussian fields have the same m = 2 WST coefficients in one dimension (Bruna et al. 2015), thus allowing their identification independently of their power spectrum.
On the contrary, theŜ iso,1 2 andŜ iso,2 2 coefficients for MHD simulations cannot be directly mapped from one another (see Fig. E.3). However, these terms share similar forms indicating that the gas dynamics in these MHD simulations are computed for a common set of equations. We expect the differences between those patterns to echo the differences of physical parameters. The dynamic range ofŜ iso,1 2 ( Fig. E.3, first column) may be used as a measure of the strength of the coupling between scales. We observe that it decreases as the turbulent forcing increases, in the absence of a mean magnetic field (from class 1 to class 3, see Appendix B), but that this effect is much less marked when the mean magnetic field is strong (from class 7 to class 9). A similar conclusion may be drawn from the comparison of the dynamic ranges ofŜ iso,2 2 ( Fig. E.3, second column) for the same classes. In all cases, this decrease is much less steep than in the fBm case, especially for theŜ iso,1 2 terms, clearly indicating a stronger coupling between scales.
For the Polaris Flare map, theŜ iso,1 2 terms signal a systematic decrease of the coupling between scales, from the most dense region (cluster 1) to the most diffuse (cluster 4, see Fig. E.4, first column). The signature inŜ iso,2 2 ( Fig. E.4, second column) is not so clear-cut, but we do observe that for cluster 1,Ŝ iso,2 2 does not go to zero at large j 2 − j 1 , while it does for cluster 4. This indicates a stronger nonlinear dynamics in the denser region of the Polaris Flare, as one could expect.

Statistical isotropy and anisotropy
The statistical isotropy of fBm fields is evidenced by the fact that S aniso 1 (Fig. 6,  On the contrary, the MHD simulations mostly display signatures of a statistical anisotropy, that is the result of a competition between the mean magnetic field and the turbulent forcing. Indeed, we note that the larger the magnetic field at a given turbulent forcing (from class 1 to 7 in Fig. 6, centre), the larger thê S aniso 1 terms. Exploring these dependencies further, we note that signatures of anisotropy (Ŝ aniso 1 0) are clear for MHD simulations with a strong mean magnetic field and low turbulent forcing (class 7, in Fig. E.1, centre), but are smaller for simulations with no mean magnetic field (classes 1 and 3) or with a high turbulent forcing even when the magnetic field strength is large (class 9). Quantitatively, the value ofŜ aniso 1 yields levels of anisotropy of 30% at most for the MHD simulation maps. The same conclusion can be drawn from the study ofŜ aniso,1 2 andŜ aniso,2 2 ( Fig. E.3). It is interesting to note that small signatures of anisotropy appear even for the simulations without large-scale magnetic fields (Figs. E.1 and E.3, classes 1 and 3). They are a priori the result of the inherently anisotropic dynamics of MHD flows. It is also worth noting that those self-induced spontaneous Each curve has a fixed j 1 values, following the convention given in Fig. 7, and j 2 values going from j 1 + 1 to 5.
anisotropies have different signatures compare to the ones driven by a mean magnetic field. Indeed,Ŝ aniso 1 appears to increase with scale for classes 1 and 3, while it decreases for class 7 (Fig. E.1, centre). Similar differences of behaviour also appear forŜ aniso,1 2 andŜ aniso,2 2 .
In the Polaris Flare map, we also detect signatures of anisotropy inŜ aniso 1 ( Fig. E.1, bottom centre). This coefficient increases with scale as in the case of MHD simulations without a mean magnetic field. We note that it also globally increases from the most diffuse region (cluster 4) to the most intense (cluster 1), reaching a ∼70% level of anisotropy. It is interesting to note that the θ ref,i reference angles are similar for all clusters, except for the most diffuse one (cluster 4), once again singling it out. The m = 2 anisotropic RWST coefficientsŜ aniso,1 2 andŜ aniso,2 2 similarly increase with scale (at least for clusters 1 to 3), also in clear contrast to the MHD simulations with mean magnetic fields (see Figs. E.3 and E.4, third and fourth columns).

Conclusions and perspectives
We have presented the RWST, a low-dimensionality statistical description of complex structures arising from nonlinear phenomena, in particular interstellar MHD turbulence. This description is built from the WST, a low-variance statistical description of non-Gaussian processes, developed in data science, that encodes long-range interactions through a hierarchical multiscale approach based on the wavelet transform. The WST characterises the textures of 2D images with coefficients that depend on scales and orientations. The RWST provides a reduction of the WST through a fit of its angular modulations, gathering the information into a few functions that separate isotropic and anisotropic characteristics of the data.
We have applied the RWST to statistically describe and compare fields arising from three processes: fractional Brownian motions, column density maps from numerical simulations of interstellar MHD turbulence, and an observation of the dust thermal emission from an interstellar cloud (the Polaris Flare).
Our analysis, performed on these fields, allows us to draw a number of conclusions on the properties of the RWST.
Firstly, the RWST characterises and differentiates processes with a small number of coefficients grouped into a few functions, since each of the 256 × 256 maps we have analysed is characterised by 94 RWST coefficients grouped into eight functions of the scales. The coefficients are statistical descriptors encoding, with reduced variance, moments of order up to four. The coefficients derived from independent realizations of fractional Brownian motions and MHD simulations are remarkably consistent for any given set of input parameters. For the Polaris Flare, the coefficients vary significantly across the image, but we obtain a satisfactory description of the data by splitting the image in four regions with distinct characteristics.
Secondly, the RWST coefficients compose a comprehensive statistical model that we use to generate synthetic random fields (Sect. 3.5). The textures of the synthesised images are noticeably similar to that of the input data on scales sufficiently sampled to allow for a statistical description. This match illustrates the ability of the RWST coefficients to capture the multiscale correlations intrinsic to non-Gaussian fields.
Thirdly, the RWST coefficients quantify the properties of scale invariance, as well as the degree and direction of anisotropy across the scales, in a given field (Sect. 4). They also encode non-Gaussian characteristics quantifying the coupling between scales as signatures of nonlinear gas dynamics. Further work is needed to precisely understand how to use the RWST to characterise the filamentary structure of the interstellar medium and the intermittency of interstellar turbulence.
Finally, the RWST project data into a space of reduced dimensionality where observations of the interstellar medium may be compared with numerical simulations in a comprehensive way. Such comparisons may contribute to constrain the physical properties of interstellar MHD turbulence. The results presented in Sect. 4 and Appendix E illustrate this possibility and point out quantitatively that the numerical simulations used in this paper fail to reproduce the statistical properties observed in the Polaris Flare. Further work is needed to check whether a better match is obtained with more realistic simulations of interstellar MHD turbulence including the formation of structures through the thermal instability.
In this paper, the WST and the RWST are applied to images. It would be interesting to extend this analysis to threedimensional fields from MHD simulations of interstellar turbulence, and data cubes obtained from spectroscopic observations (e.g. Hily-Blant et al. 2008;Blagrave et al. 2017;Pety et al. 2017) and Faraday tomography (e.g. Zaroubi et al. 2015;Van Eck et al. 2019), to build stationary stochastic models of the turbulent magnetised ISM including intermittency (e.g. Falgarone et al. 2009;Momferratos et al. 2014).
One can also develop the WST and RWST to analyse all-sky surveys, such as Planck data, as a whole, using directional wavelets on the sphere (Demanet & Vandergheynst 2001;McEwen et al. 2007). This could open up a path towards generating equivalent random fields to be used for the development of advanced component separation methods. A first example of such an application would be the separation in total intensity between the emission from Galactic dust and the Cosmic Infrared Background (Planck Collaboration Int. XLVIII 2016). We also expect to be able to adapt the RWST to fields describing polarised emission (Stokes I, Q, and U) and, from there, to simulate polarised Galactic foregrounds (Vansyngel et al. 2017;Planck Collaboration XI 2019). A&A 629, A115 (2019)

Appendix A: Morlet wavelets and windowed Fourier transforms
Wavelets are waveforms that locally quantify the amplitude of a field in a given range of scales (see for instance Cohen & Ryan 1995;Van Den Berg 2004;Farge et al. 2010;Farge & Schneider 2015 for a more detailed introduction to wavelets and their application in physics and turbulence). They are constructed by dilating and rotating an initial wavelet, generally called the mother wavelet. Each wavelet samples a given region of the Fourier spectrum of the field under study. The wavelets used in this paper are complex Morlet wavelets, also called Gabor wavelets. They are complex analytic wavelets that can efficiently separate the amplitude and phase components of a signal, with a good localization in frequency (Leung & Malik 2001). They are thus well suited to finely describe the spectrum of a field. The complex Morlet wavelets are defined from a mother wavelet of parameter σ, that in the one-dimensional case reads: In this equation, α and β = exp(−σ 2 /2) are normalization factors respectively ensuring that the wavelets have a unit L 2 norm and a null average (Ashmead 2010). This mother wavelet is the product of a plane wave of unit wavenumber by a Gaussian window of characteristic size σ which localises it. The β coefficient can often be neglected when σ > 1. The wavelet ψ j is obtained by a dilation of the mother wavelet: Such a wavelet and its Fourier transform are plotted in Fig. A.1. Neglecting the β term in a first approximation, the Fourier transform of the mother wavelet is a Gaussian window of width proportional to 1/σ and centred on the unit wavenumber k ψ = 1. The Fourier transform of the ψ j wavelet is thus centred on the 2 − j wavenumber and has a bandwith proportional to (2 j σ) −1 . Thus, convolving a given field with such a wavelet corresponds to bandpass filtering in which the passband is defined by the Fourier transform of the wavelet. As this is done locally, this convolution yields the local level of the signal filtered by the wavelet.
Morlet wavelets in two dimensions can also be constructed from a mother wavelet, which is then dilated and rotated, as expressed in Eq. (2). In this case, the mother wavelet is the generalization of the one-dimensional definition 19 : where n is a unit vector defining the oscillation direction of the mother wavelet 20 . The Fourier transform of such a wavelet is still close to a Gaussian, whose central position and width are modified by rotations and scalings. Two examples of such wavelets and the supports of their Fourier transforms in the Fourier plane are shown in Fig. 1. We note that we consider a discrete set of wavelets in this paper, since they are built from an integer number of rotations and scalings, labelled with the j and θ indices introduced in Sect. 2.2. The σ parameter describes approximately the number of oscillations of the wavelet within its support, and allows a tradeoff between their spatial and frequency resolutions. Indeed, small values of σ allow to detect the modulation of a given wavelength at a scale close to the wavelength itself, but at the cost of a poorer frequency localization. When studying fields linked to astrophysical observations, as in this paper, we use small values of σ (σ = 0.8 in the present case). Indeed, when studying the structure of a filament in a direction perpendicular to it, the main modulation it contains defines the width of the filament itself, and contains only one oscillation. Conversely, when studying audio signals, it is more suitable to use wavelets with a large value of σ, since the modulation timescales are often large in comparison to the period of audible sounds.
We use in this paper the Morlet wavelet as a main tool. All the calculations are however very close to what one could obtain with the Discrete Windowed Fourier Transform (DWFT). Indeed, the one-dimensional DWFT of wavevector k of a field I(x) is with g a normalised window function. Choosing for this window a Gaussian with appropriate width (see Eq. (A.1)), the DWFT reduces (up to a global phase and in the limit where β is negligible) to the convolution with a wavelet ψ j such that k = 2 − j . It is thus possible to compute the power spectrum in the range of frequencies of a given ψ j Morlet wavelet as where L is the size of the integration domain (Mallat 2012). This result, that can be generalised in two dimensions, emphasises the difference between the usual power spectrum and the scattering coefficients, the latter being computed with the L 1 norm (see Eq. (4)).

Appendix B: Flows studied
In this appendix, we present in detail the three different types of fields that we have applied our analysis to.

B.1. Fractional Brownian motions
The two-dimensional purely synthetic random fields that we use in this work are fractional Brownian motions (Falconer 2004). These extend the class of Brownian motion processes by relaxing the condition of independent increments. In other words, values of fBm fields at nearby points are not independent, and this process is continuous but almost nowhere differentiable. In one dimension, an fBm of Hurst exponent H ∈ ]0, 1[ is defined as a random process X : R + → R such that the increments X(t + δ) − X(t) for any t 0 and δ > 0 are normally distributed with zero mean and variance δ 2H . In N dimensions, a random field X defined on R N is an fBm if [X (r 2 ) − X (r 1 )] 2 ∝ ||r 2 − r 1 || 2H , for any pair of points (r 1 , r 2 ).
The syntheses of such fields are most easily built in Fourier space, with X(k) = A(k) exp iφ X (k) , by specifying amplitudes that scale as a power-law of the wavenumber k = ||k||, i.e. A(k) = A 0 k −β X /2 , where β X = 2H + N is the spectral index. The Fourier phases φ X are drawn from a uniform random distribution in [−π, π], subject to the constraint φ X (−k) = − φ X (k) so that X is real-valued. The power spectra of fractional Brownian motions are therefore power laws, P X (k) ∝ k −β X . Three examples of such fields are given in Fig. B.1 (top row), with Hurst exponents equal to 0.1, 0.5 and 0.9.
In an astrophysical context, fBms have been used previously as toy models for the fractal structure of molecular clouds, in both density and velocity space (Stutzki et al. 1998;Brunt & Heyer 2002;Miville-Deschênes et al. 2003). They have also recently been used to model the turbulent component of the interstellar magnetic field, and to study the statistical properties of polarised thermal dust emission maps (Levrier et al. 2018).

B.2. Isothermal MHD simulations
The second class of fields used in this work are column density maps N H computed from numerical simulations of magnetohydrodynamical turbulent flows, aiming at reproducing the structures emerging in the interstellar medium. These simulations are performed by solving numerically the equations of ideal MHD, as described in Iffrig & Hennebelle (2017).
The simulations used in the present paper are simplified and do not take self-gravity into account. Stellar feedback (from supernovae and HII regions) is removed accordingly. Because the equations are solved on a finite-resolution grid, numerical diffusion mimics the effects of physical viscosity and dissipates energy in the fluid. Due to this dissipation, the simulations require constant energy input to attain a statistical steady state. In Iffrig & Hennebelle (2017), the energy was injected by the stellar feedback, but here this is done through a turbulent forcing of the velocity field similar to the one described in Schmidt et al. (2009). This forcing is quantified by the overall turbulent velocity dispersion σ turb . The thermodynamical treatment of the  1.0 9.0 gas is simplified by assuming isothermality. Initially, the simulation cube is filled by a uniform-density, uniform-temperature gas, with n H = 2 cm −3 and T = 10 K, and is permeated by a uniform magnetic field B 0 . Several simulations are run with varying intensities of the magnetic field 21 , from B 0 = 0 (hydrodynamical case) to B 0 = 1 µG, and of the turbulent forcing, with σ turb = 1 km s −1 , to σ turb = 9 km s −1 . To distinguish the different simulations, we group them into classes, as indicated in Table B.1.
Snapshots of the logarithm of total column density (log N H ) for several of these simulations are shown in Fig. B.1 (bottom row). The degree of anisotropy in these maps increases with the ratio between the mean value of the magnetic field and the turbulent forcing. We thus expect maps derived from simulations in class 3 to be isotropic, while those derived from simulations in classes 4 and 7 present higher levels of anisotropy.

B.3. Herschel Polaris observations
The third field on which we have tested our approach is an observation of the Polaris Flare molecular cloud (Miville-Deschênes et al. 2010) obtained with the SPIRE instrument (Griffin et al. 2010) onboard the Herschel satellite (Pilbratt et al. 2010), at a wavelength of 500 µm. The Polaris Flare is a diffuse molecular cloud that is not showing clear signs of star-formation activity. As such, it is generally thought to be representative of the very early stages of molecular cloud formation and evolution, and the dynamics of its gas and dust contents are therefore probably more representative of the interstellar turbulent cascade than other, star-forming clouds, in which feedback processes from young stars (jets, outflows, and radiation) tend to confuse the picture.
The far-infrared emission that was mapped by Herschel-SPIRE at a resolution of 37 is produced by the cold dust in the cloud, as it reprocesses the ambient visible and UV radiation from Galactic starlight. At these long wavelengths, this emission is optically thin, and its integrated intensity is therefore directly proportional (to a very good level of approximation) to the column density of the large, cold grains on the line of sight. It allows to probe the matter content of the cloud, assuming a uniform gasto-dust ratio. We note that the Polaris Flare has been extensively studied, not only through this thermal continuum emission of cold dust, but also through CO rotational lines that allow to probe the velocity field of the molecular gas down to very small scales (Falgarone et al. 1998;). The geometry of the magnetic field in the Polaris Flare was also studied with optical stellar polarisation data by Panopoulou et al. (2016). We use a 832 × 832 pixels subset of the full Herschel-SPIRE map discussed in Miville-Deschênes et al. (2010), covering almost 10 square degrees in the sky (Fig. B.2, left). Compared to the fBm and MHD simulations, the statistical properties of this map are unlikely to be homogeneous 22 . It is therefore necessary to work with local WST coefficients and, ideally, to identify a mesoscopic scale over which the statistical properties may be considered homogeneous and study their variations over larger scales, as discussed in Appendix D.
To circumvent this difficulty, we propose, as a first attempt to distinguish between a spatial evolution of the statistical properties across the Polaris Flare and the statistical variability that is intrinsic to an homogeneous stochastic process, to compute local WST coefficients on the map (using a Gaussian window φ J of width 2 J = 64 pixels for J = 6, see Appendix D), and gather regions in the sky that have similar WST coefficients using a clustering algorithm 23 . To do so, we divide the Polaris Flare map into N = 22 × 22 square regions and for each region we compute a set of normalised WST coefficients that we note y i . These y i can be seen as a vector in a statistical space of dimension 1009 (with J = 6 and L = 8). To identify regions that have similar WST coefficients, we use a k-means clustering algorithm 24 which performs a partition of {y i } k∈{1,...,N} in k subsets {S 1 , ..., S k } so that the sum of the variances of Euclidean distances between vectors within each cluster is minimised (Arthur & Vassilvitskii 2007). Formally, the k-means algorithm finds a partition which minimises: where µ j is the centroid of cluster S j . 22 For example, the filamentary structures just south of the centre of the map might be gravitationally bound, but the diffuse filaments towards the edge probably are not. 23 Note that the local WST coefficients are computed with some oversampling, which means that the windows on which they are computed partially overlap (Laurent et al. 2013). Due to these local windows, it is also necessary to exclude a thin band close to the edges of the map. 24 Note that this clustering approach has already been used in studies of the interstellar medium (Bron et al. 2018). Figure B.2 (right) shows the four clusters of the Polaris Flare map that were identified in this way. This number of clusters is a free parameter of the algorithm, but we have not explored its influence, settling for k = 4 as a first guess. It already shows a clear distinction between the statistical properties of the identified clusters. For each of these regions, we then assume that the statistical properties are homogeneous, so that local WST coefficients within each region can be averaged.

Appendix C: Possible additional terms to the RWST angular reduction
In Sect. 3.2, we discussed the form of the dominant terms accounting for the angular dependencies of the WST coefficients (Eqs. (13) and (14)), but it may happen that residuals exhibit oscillatory trends with different periodicities, showing that these terms are not sufficient. This is mostly apparent when averaging over several realizations, because these residual trends then start to stand out from the sampling noise. In this case, several additional terms are used to satisfactorily fit these minor features. When fitting the angular dependency of WST coefficients averaged over twenty 256 × 256 maps, we identified three such additional terms, that are either new angular modulations due to additional physical effects, or higher harmonics of an angular modulation that has already been identified. We stress, however, that these terms have small amplitudes compared to the RWST coefficients discussed in the main text, and that the values of the latter are unaffected by the inclusion of these additional terms in the fit. The discussion of these additional minor terms nevertheless offers a better understanding of the limits of the dominant terms discussed in the main text.
The first two additional terms are modulations related to pixelation. These terms are not attributable to the WST computation, but to the methods used to generate the fields, and may be different for the different types of fields. For instance, we may expect a signature at small scale of the grid that the MHD simulations were computed on. Similarly, the computation of the fBm fields on a square, regular grid in Fourier space should produce signatures at both small and large scales. Similar effects related to pixelation have been identified in the map of the Polaris Flare. AdditionalŜ lat,1 1 ( j 1 ) andŜ lat,2 1 ( j 1 ) terms related to lattice pixelation, for the m = 1 layer. For each class of fBm processes or MHD simulations, these were computed using twenty 256 × 256 maps.

(C.2)
When included, these additional terms have low levels (see Fig. C.1 for examples), and all the fields but the fBm show such non-vanishing additional components only at the smallest scales. A signature of the lattice also seems to appear at large scales for fBm fields, but this is weakly significant and difficult to precisely assess. We note that the π/2 and π/4 harmonics have similar amplitudes. This is not surprising since anisotropic terms related to lattice pixelation may be much less smooth than a physical anisotropy.
The third additional term that we have identified is a π/2 harmonic for theŜ iso,2 2 component of Eq. (14), that needs to be added to this equation · · · +Ŝ iso,3 2 ( j 1 , j 2 ) · cos 4π Θ θ 1 − θ 2 ] . (C. 3) The appearance of such a term in the angular reduction of the WST coefficients is in line with the discussion of Sect. 3.2 about the structure of the angular modulations. TheŜ iso,3 2 terms may also contain additional information on the fields. For instance, Fig. C.2 shows that the fBm fields and MHD simulations clearly have different forms for this π/2 harmonics, providing yet another quantitative lever to compare various processes. No other additional term was necessary to achieve satisfactory fits of the residual trends, but future studies may need to include further terms of a similar type (e.g. π/2 harmonics for the anisotropicŜ aniso 1 ,Ŝ aniso,1 2 andŜ aniso,2 2 terms). However, we expect all these additional terms to remain small compared to the main RWST coefficients described in Sect. 3.2.

Appendix D: Heterogeneous statistics and the local wavelet scattering transform
In this appendix, we discuss the generalisation of the WST to fields that are not statistically homogeneous and how it can be applied locally in such cases.

D.1. Mesoscopic scale
The use of any statistical measure to describe properties of fields arising from nonlinear physical processes, and hopefully, from there, to gather information about the underlying physics itself, warrants some discussion about the physical quantities that may be encoded in the statistics, and about the scales over which these statistics are computed. A useful analogy here can be found in statistical thermodynamics, which proceeds by devising statistical measures over a vast number of particles to establish physically meaningful quantities 25 . These averages are performed over mesoscopic scales, large enough to contain a huge number of particles so that statistical fluctuations may be neglected, but also small enough so that the thermodynamical variables can be seen as locally-defined fields. These may in turn vary over larger, macroscopic scales. In our case, we have observable fields, such as column density maps, whose morphologies we mean to describe statistically, with the purpose of relating these statistics to physical properties of the medium, such as the amplitude of the magnetic field averaged on a certain scale. Assuming that such a relationship exists, it is important to ask which scales and structures in the observable fields are subject to the inherent variability that is meant to be captured by the statistics, and which ones are related to a modification of the larger scale physical properties associated in turn with the statistical properties themselves. Our mesoscopic scale should be chosen where these two ranges of scale meet, so that the statistics can be considered homogeneous below this scale while allowing for a subsequent study of the variation of physical properties across larger scales 26 .
It may be difficult to properly determine this mesoscopic scale. A good criterion seems to be the apparent reproduction of similar patterns. For instance, in the Polaris Flare map (Fig. B.2), structures at the scale of 30 are too scarce to be treated statistically, but those appearing at the 0.3 scale may be. The mesoscopic scale required to describe such a map should then be somewhere in between these two scales, which is why we chose J = 6, corresponding to 15 . We see that the scale separability provided by the WST and the RWST is of great importance.

D.2. The local WST
When the field considered has homogeneous statistical properties, it is sufficient to compute a set of global scattering coefficients that can be obtained by integration on the entire spatial support of this field (Eqs. (3)-(6)). It is however also possible to compute scattering coefficients that describe local statistical properties (Bruna & Mallat 2013) on mesoscopic scales. In the homogeneous case, these local WST coefficients provide different samples and allow us to estimate the variance. In the heterogeneous case, they allow us to quantify the evolution of statistical properties on large scales.
For a real-valued field I(x), these local WST coefficients S loc 0 (x), S loc 1 [ j 1 , θ 1 ](x) and S loc 2 [ j 1 , θ 1 , j 2 , θ 2 ](x) are computed similarly to the global coefficients but with a spatial integration limited to a subset of the space, using a normalised Gaussian window φ J of fixed width 2 J , the size of the largest wavelength probed by the Morlet wavelets (Bruna & Mallat 2013). The m = 0 coefficient is the local average of the field over a characteristic scale 2 J , i.e. while the m = 1 and m = 2 coefficients are given by S loc 1 [ j 1 , θ 1 ](x) = µ −1 1,loc |I ψ j 1 ,θ 1 | φ J (x), (D.2) and S loc 2 [ j 1 , θ 1 , j 2 , θ 2 ](x) = µ −1 2,loc ||I ψ j 1 ,θ 1 | ψ j 2 ,θ 2 | φ J (x), (D.3) 26 An important difference between our case and statistical thermodynamics is that in the latter the quantities defined on a mesoscopic scale (e.g. kinetic temperature) are different in nature from those they are built upon at the microscopic scale (e.g. particle velocities), while in our case, these two quantities could well be the same, for example the amplitude of the magnetic field. With a statistical description that allows a scale separability, as it is the case for the WST and its reduced form, we can treat the variations of these quantities statistically at small scales, and relate these statistics to the value of the same physical quantities averaged on the mesoscopic scale.
where the µ i,loc normalization factors are the m = 1 and m = 2 responses to a Dirac δ function, µ 1,loc = |δ ψ j 1 ,θ 1 | φ J (x), (D.4) and similarly for µ 2,loc . The normalization described in Sect. 2.4 (Eqs. (10) and (11)) can be performed at this stage. Then, the computation of local RWST coefficients can be done following exactly the computation described in Sect. 3. Otherwise, integrating these local coefficients over the entire space recovers the global scattering coefficients.