ROHSA: Regularized Optimization for Hyper-Spectral Analysis - Application to phase separation of 21 cm data

Extracting the multiphase structure of the neutral interstellar medium (ISM) is key to understand the star formation in galaxies. The radiative condensation of the diffuse warm neutral medium producing a thermally unstable lukewarm medium and a dense cold medium is closely related to the initial step which leads the HI-to-H2 transition and the formation of molecular clouds. Up to now the mapping of these phases out of 21 cm emission hyper-spectral cubes has remained elusive mostly due to the velocity blending of individual cold structures present on a given line of sight. To address this problem, we developed a new Gaussian decomposition algorithm named ROHSA (Regularized Optimization for Hyper-Spectral Analysis) based on a multi-resolution process from coarse to fine grid. ROHSA uses a regularized non-linear least-square criterion to take into account simultaneously the spatial coherence of the emission and the multiphase nature of the gas. In order to obtain a solution with spatially smooth parameters, the optimization is performed on the whole data cube at once. The performances of ROHSA were tested on a synthetic observation computed from numerical simulations of thermally bi-stable turbulence. An application on a 21 cm observation of a high Galactic latitude region from the GHIGLS survey is presented. The evaluation of ROHSA on synthetic 21 cm observations shows that it is able to recover the multiphase nature of the HI and physically meaningful information about the underlying three-dimensional fields (density, velocity and temperature). The application on a real 21 cm observation of a high Galactic latitude field produces a picture of the multiphase HI, with isolated, filamentary and narrow structures and wider, diffuse and space filling components. The test-case field used here contains significant intermediate-velocity clouds that were well mapped out by the algorithm.


Introduction
Star formation in galaxies is strongly linked to the physical processes that govern the evolution of the interstellar medium (ISM). Stars form by gravitational collapse of dense (n > 10 4 cm −3 ) and cold (T ∼ 10 K) structures in molecular very narrow features (a few km s −1 ). In emission the 21 cm line contains these narrow features on top of much boarder spectral structures (10-20 km s −1 ). Clark (1965) was the first to suggest that this might be the signature of a cloud-intercloud medium in pressure equilibrium. Very rapidly Field (1965) and Field et al. (1969) introduced the concept of thermal instability and laid out the theoretical ground of a "two phase" HI model showing that, at the pressure of the ISM, the heating and cooling processes naturally lead to two thermally stable states: a dense cold neutral medium (CNM -T ∼ 50 K, n ∼ 50 cm −3 ) immersed in a diffuse warm neutral medium (WNM -T ∼ 8000 K, n ∼ 0.3 cm −3 ). This vision was later complemented by Wolfire et al. (1995Wolfire et al. ( , 2003 considering updated heating (dominated by the photoelectric effect on small dust grains) and cooling (dominated by CII: 158 µm, OI: 63 µm, Lα and electron recombinations onto positive charged grains) processes of the ISM.
This description of the diffuse neutral gas complemented a parallel hypothesis that emerged in the 1950s (e.g., von Weizsäcker 1951;von Hoerner 1951;Chandrasekhar & Münch 1952) and considered the ISM as a multi-scale turbulent medium. In this case, the density and velocity structures are the result of a highly dynamical and out-of-equilibrium medium. In order to reconcile the "static/two-phase" and the "turbulent" hypotheses, several studies have aimed at understanding the production of the CNM in a turbulent and thermally unstable flow using numerical simulations (e.g., Hennebelle & Pérault 1999;Koyama & Inutsuka 2002;Audit & Hennebelle 2005;Hennebelle et al. 2008;Saury et al. 2014). In general, these numerical studies show that the WNM has the properties of a trans-sonic turbulent flow, while the CNM shows a much more contrasted density structure, in accordance with the cloudintercloud picture. In addition, such studies indicate the presence of a significant fraction of the mass being in the thermally unstable regime (i.e., with a temperature mid-way between the CNM and WNM stable states). For instance, Saury et al. (2014) showed that 30% of the HI is in the thermally unstable regime. Interestingly, these latter authors also show that this lukewarm neutral medium (LNM) is spatially located around the cold structures, pointing at the transitional nature of this thermal state.
From an observational standpoint, studies combining 21 cm absorption and emission data have clearly revealed the presence of HI at intermediate/unstable temperatures, typically between 500 and 5000 K (e.g., Heiles & Troland 2003a;Kanekar et al. 2003;Roy et al. 2013a,b;Murray et al. 2015Murray et al. , 2018a. Based on a coherent modeling of emission and absorption spectra, Heiles & Troland (2003b), Murray et al. (2015Murray et al. ( , 2018a) estimated that about 30% of the HI is in the cold CNM phase, 20% in the thermally unstable regime, and 50% in the WNM. Nevertheless the fraction of the HI in each phase remains uncertain and large variations are observed: the fraction of the mass in the CNM ranges from ∼1% to more than 50% (Murray et al. 2018a).
The nature of these variations and how they relate to the dynamical conditions of the gas remains largely unexplored from the observational point of view. One main hurdle in getting access to this information is the fact that our knowledge of the multiphase nature of the HI relies on 21 cm absorption measurements that are limited to lines of sight crossing radio sources. By nature, this way of observing prevents us from mapping the HI phases. To go further, and really compare with numerical simulations that are currently under-constrained by observation, it is mandatory to map the column density structure of each phase and study the spatial variations of their centroid velocity and velocity dispersion. This calls for methods that can extract the information of each HI phase from fully sampled 21 cm emission data only. Huge efforts have been made to map the 21 cm emission of the Galactic HI (recent examples are Taylor et al. 2003;Kalberla et al. 2005;Stil et al. 2006;McClure-Griffiths et al. 2009;Winkel et al. 2016;Peek et al. 2018) and a large amount of data is now available. The information about the multiphase and multi-scale nature of the HI contained in these large hyperspectral data cubes has remained elusive due to the difficulty in separating the emission from the different phases on each line of sight. In this paper we propose a new method to map out the contribution of each phase to the 21 cm emission. The method is based on a decomposition of the 21 cm emission line with Gaussian profiles and constraints that favors spatially coherent parameters.
The paper is organized as follows. In Sect. 2, we describe the methodology used to develop our Gaussian decomposition algorithm. An evaluation of numerical simulation is presented in Sect. 3. In Sect. 4 we present an application to observations. The discussion and summary are presented Sects. 5 and 6.

Gaussian decomposition of the 21 cm emission
Very early on after its detection, the 21 cm line was observed to be well described by a sum of a small number of Gaussian components. This was found to be true for the least confused absorption spectra (Muller 1957(Muller , 1959Clark 1965) but also for emission spectra observed away from the Galactic plane (Heeschen 1955). Very few spectra at high Galactic latitudes do not comply with that rule, whatever the angular resolution of the data. Recently, Kalberla & Haud (2018) showed that more than 60% of the spectra over the whole sky can be described by the sum of seven Gaussian components or less. In the two decades after the detection of the 21 cm line, many studies used the Gaussian decomposition to infer physical parameters from the data (Matthews 1957;Davis 1957;Muller 1957Muller , 1959Dieter 1964Dieter , 1965Lindblad 1966;Takakubo & van Woerden 1966;Mebold 1972). The fact that a small number of Gaussian components is needed to describe the signal was seen as a convenient way to describe the emission profiles with a small set of parameters (Takakubo & van Woerden 1966). It is also a very strong element in favor of the Gaussian function as a significant descriptor of the underlying physics. Takakubo (1967) showed that the width of the 21 cm emission line could be grouped in three components (σ ≤ 3 km s −1 ; 3 < σ < 7 km s −1 ; σ ≥ 7 km s −1 ), a result confirmed later on (Mebold 1972;Haud & Kalberla 2007;Kalberla & Haud 2018). Takakubo (1967) also showed that the narrow 21 cm features are well correlated with Ca+ K line absorption measurements. These latter authors concluded that the narrow component, also seen in 21 cm absorption spectra, is likely to be isolated cold clouds (CNM) in the Solar neighborhood. They also showed that the spatial distribution of the centroid velocity, velocity width, and column density of the large feature is compatible with a warm (WNM) and diffuse disk that follows Galactic rotation. The second group of Gaussians, with a width (σ) of between 3 and 7 km s −1 , is generally attributed to gas in the thermally unstable range, but a fraction of them could be caused by blending of narrow features.
The exact mass fraction of gas in each phase (CNM, LNM, and WNM) is still a matter of debate. In addition, because this knowledge is based on absorption measurements, there is very little information about the structure of these phases on the sky. Being able to separate the different phases on each line of sight A101, page 2 of 19 A. Marchal et al.: ROHSA: Regularized Optimization for Hyper-Spectral Analysis would allow us to study the structure and kinematics of the cold phase and its relationship with the more diffuse gas. In theory, one could expect that Gaussian decomposition of the emission spectra could provide such mapping.

Limitation of the Gaussian model
There are many pitfalls in the description of the 21 cm emission spectra as a sum of Gaussian components: velocity blending, ambiguities of the number of components, nonGaussian profiles, noise peaks, plurality of solutions. Another important limitation is the effect of optical depth of the 21 cm line that modifies the shape of the line. More generally, the main opposition to Gaussian decomposition of emission spectra is that any spectrum can be decomposed that way provided that enough Gaussians are used. If that is the case, how can one be sure that the Gaussian representation provides some real physical information about the emitting gas? For instance, two spatially disconnected cold structures present on the same line of sight could appear at the same projected velocity. In this case their respective emission profiles would be confused. This velocity blending affects both the emission and absorption spectra; it is present in the data, and more so at lower Galactic latitudes where the length of the line of sight is larger and the number of HI structures increases.
This line of reasoning led Dickey & Lockman (1990) to advise against using Gaussian decomposition to analyze 21 cm spectra. Later on, it continued to be used (Verschuur & Schmelz 1989;Verschuur & Magnani 1994;Poppel et al. 1994;Haud 2000;Verschuur 2004;Begum et al. 2010;Martin et al. 2015;Kalberla & Haud 2018) but overall there was a loss of interest, except for the analysis of absorption spectra. Indeed by comparing nearby absorption and emission spectra one can recover the effect of optical depth on the 21 cm emission. In addition, absorption measurements are only sensitive to cold gas, limiting the velocity blending problem. For these reasons, the Gaussian decomposition continued to be used in this context (Dickey et al. 2003;Kanekar et al. 2003), and especially after the seminal work of Heiles & Troland (2003b) who developed a dedicated formalism which has been used in several other studies since (e.g. Stanimirović & Heiles 2005;Begum et al. 2010;Stanimirović et al. 2014;Lee et al. 2015;Murray et al. 2014Murray et al. , 2015Murray et al. , 2018b. Indeed, key information about the nature of the HI came from the joint Gaussian decomposition of emission and absorption spectra.

Development of a new approach
Following what has been done for the comparison of emission and absorption spectra where the Gaussian decomposition is considered valid, we would like to argue that a similar decomposition could be envisaged for emission data only, at least at high Galactic latitudes where the effect of optical depth of the 21 cm line has been shown to be negligible (Murray et al. 2018b).
The fact that physical information could be obtained using a Gaussian decomposition of absorption data reveals the fact that thermal broadening has a significant effect in shaping the line profile, or in other words, that the dynamics of each HI phase is typical of sub-or trans-sonic turbulence. When the amplitude of turbulent and thermal motions are commensurate, the line appears smooth and can be represented by a small number of Gaussian components (Miville-Deschênes et al. 2003). If the HI at high Galactic latitude is indeed represented by a two-phase medium with small, cold, and trans-sonic structures immersed in a relatively low-Mach-number and warm diffuse phase, the Gaussian representation could bear significant physical information.
The perspective of mapping the phases of the HI is so important that we ventured to explore new ways of decomposing the emission spectra that could be applicable to the high Galactic sky. The main difficulty is the effect of velocity blending for cold structures. As mentioned by Takakubo & van Woerden (1966), there will always be cases where a given spectrum can be fitted with a smaller number of components than its neighbors, if two or more components have similar central velocity and velocity dispersion. One way to avoid this confusion is to look for solutions that have a spatial continuity, or that have a slow spatial variation. Poppel et al. (1994), Haud (2000), Martin et al. (2015) and Miville-Deschênes et al. (2017b) have implemented Gaussian decomposition methods that use some information about their neighbors in order to favor spatially coherent solutions. Formally nevertheless, these algorithms do not force solutions to be spatially coherent; they simply provide initial guesses to the fit of a single spectrum based on the most likely solutions found in some surrounding area. The optimization is not bound to this initial guess and it can always converge to another solution that would break the spatial smoothness of the parameter space.
The novelty of the algorithm we present here is that it is the first one that imposes the spatial coherence in the determination of the parameters. In order to do that, all the spectra of the emission cube are fitted at the same time. To make sure that the recovered parameters are spatially smooth, specific regularization terms are added to the cost function with non-negativity constraints on the amplitude. This algorithm, called ROHSA, is described below.

ROHSA
ROHSA performs a regression analysis using a regularized nonlinear least-square criterion. We formulate in this section the Gaussian model used as well as the energy terms added to the cost function to take into account the spatial coherence of the emission and the multiphase nature of the gas simultaneously. The quasi-Newton algorithm, L-BFGS-B, used to perform the optimization is then briefly described. Finally, we formulate the algorithm performed by ROHSA based on a multi-resolution process from coarse to fine grid.

Model
The data are the measured brightness temperature T B (v z , r) at a given projected velocity v z across sky coordinates r. The pro- with θ(r) = θ 1 (r), . . . , θ n (r) and where G v z , θ n (r) = a n (r) exp is parametrized by θ n = a n , µ n , σ n with a n ≥ 0 being the amplitude, µ n the position, and σ n the standard deviation 2D maps of the nth Gaussian profile across the plan of sky. The residual is A&A 626, A101 (2019) The estimated parametersθ are defined as the minimizer of a cost function that includes the sum of the squares of the residual, where Σ is the standard deviation 2D map of the noise assumed without spatial correlation. In practice this term is estimated using a sequence of empty velocity channels of T B (v z , r). For each of the N Gaussians, we want to obtain a spatially coherent solution, meaning that for each parameter, the values have to be close for neighboring lines of sight. This can be done by penalizing the small-scale spatial fluctuations of each parameter, measured by the energy at high spatial frequencies. The considered high-pass filter is the second-order differences, that is the Laplacian filtering, defined by the 2D convolution kernel, The following regularization term, itself containing energy terms, is added to the cost function given in Eq. (4), where D is a matrix performing the 2D convolution using the kernel d and λ a , λ µ , and λ σ are hyper-parameters than tune the balance between the different terms. These terms ensure a positive spatial correlation of the model parameters for neighboring pixels. However, each term is free to have large variation across the field at larger scale. Since σ n contains information about the gas thermodynamics, we design an additional term in the cost function to group Gaussians with similar σ n . This is implemented in order to favor any solution that would produce components ascribable to each of the phases (WNM, LNM or CNM). In order to do that we add another term, λ σ σ n − m n 2 2 , which constrains σ n to be close to an unknown scalar value m n . The full regularization term is then with m = (m 1 , . . . , m N ) and a n ≥ 0, ∀ n ∈ [1, . . . , N]. The last two terms in Eq. (7), representative of a joined constraint imposed on σ n , allow us to interpret a posteriori and simultaneously the morphology and the thermodynamical state of each component extracted from the data. The full cost function is then

Optimization algorithm
Unlike Q(θ), each energy term proposed in Eq. (6) involves linear dependences on the parameter θ. The cost function in Eq. (8) is therefore a regularized nonlinear least-square criterion. The minimizer, Algorithm 1 ROHSA based on a multi-resolution process from coarse to fine grid where T B i is the averaged data at i scale. ) has no closed form expression and is not directly tractable because of the complexity of the modelT B and the size of the unknown and data. The proposed solution relies instead on an iterative optimization algorithm that uses the gradient which is tractable since it involves the residual, the Jacobian of the residual ∇L(θ), and 2D convolutions with the kernel d for D For the optimization, RHOSA relies on L-BFGS-B (for Limited-memory Broyden-Fletcher-Goldfarb-Shanno with Bounds), a quasi-Newton iterative algorithm described by Zhu et al. (1997) which allows for the positivity constraints of the amplitudes to be taken into account. In this algorithm, after an initialization θ (0) , the solution is approached iteratively by where H −1 ∇J (θ, m) is approximated with the L-BFGS formula. The iterations are repeated until one of the two following criteria is met: (1) the total number of evaluations of J(θ, m) and ∇J(θ, m) exceeds a maximum number of iterations defined by the user; (2) the projected gradient is sufficiently small (i.e., |proj ∇J(θ, m)|/(1+|J(θ, m)|) < 10 −10 ). Due to its nonlinearity, the least-square criterion J(θ, m) described in Eq. (8) is likely to include local minimizers. Therefore, the L-BFGS-B algorithm used by ROHSA is also likely to converge toward one of these local minima, making the solution highly dependent on the initialization θ (I) (0) . In order to overcome this difficulty, we based the design of ROHSA on an iterative multi-resolution process from coarse to fine grid (described in the following section) to automatically choose θ (I) (0) and to converge towards a satisfactory local minimum.

ROHSA algorithm
ROHSA is based on an iterative algorithm using a multi-resolution process from coarse to fine grid presented in Algorithm 1. The number of iterations I depends on the size of the fine grid S and is defined by the relation 2 I = S . For example, a grid of size S 2 = 256 2 requires I = 8 iterations. Each iteration is made of three steps. 1. Data are averaged at scale i as where V i defines the neighborhood at scale i, as described in Fig. 1, and K i is the number of positions in that neighborhood. For i = 1, all the spatial information is compressed into a single spectrum: 2. The parameters θ (i) and m (i) are estimated on that spatially averaged data version T B i by minimizing the cost function given in Eq. (8).
The minimization (line 2 of Algorithm 1) is made using L-BFGS, described in the previous section. We note that for scale i = 1, there is no spatial information T B 1 and the result does not depend on the regularization. 3. Parameters θ (i) are spatially interpolated at nearest neighborhood to serve as initialization for the next scale i + 1. The free hyper-parameters λ A , λ µ , λ σ , λ σ remain constant during the iterations.

Evaluation on numerical simulation
To evaluate the performance of ROHSA, we applied it to synthetic 21 cm observations computed from a numerical simulation of thermally bi-stable turbulence flow. This allowed us to directly compare the solution given by ROHSA to the properties of the gas present in the simulation. That direct comparison with numerical reality is an essential test to evaluate the performances of a source-separation algorithm like ROHSA.

Numerical simulation
To test ROHSA we used the hydrodynamical simulation of thermally bi-stable turbulence performed by Saury et al. (2014). We used their 1024N01 simulation (1024 3 pixels and a physical size of the box of 40 pc) characterized by (1) an initial density n 0 = 0.1 cm −3 , (2) a large-scale velocity v S = 12.5 km s −1 and (3) a spectral weight ζ = 0.2. The initial density corresponds to the typical density of the WNM before condensation, the largescale velocity represents the amplitude given to the field that generates large-scale turbulent motions in the box, and finally the spectral weight controls the modes of the turbulent mixing (here a majority of compressible modes). The Mach number of this simulation has been evaluated to be around M = 0.85 for T > 200 K.
In order to explore the performances of ROHSA we use only a subset of this simulation. We concentrate our analysis on a region of 256 × 256 × 1024 pixels with a moderate CNM fraction in order to limit the effect of HI self-absorption (see Sect. 3.2.3).

21 cm line synthetic observations
The synthetic 21 cm observations were computed using the formalism described by Miville-Deschênes & Martin (2007).

Distribution of velocity fluctuations
In the 3D spatial space, the neutral hydrogen can be described by three 3D fields: the temperature T (x), the density ρ(x), and the z-component of the turbulent velocity field v z (x). Here the 3D spatial positions are denoted by the vector x while the 2D vector expressing the line-of-sight is denoted by r. The z-axis is taken along the line of sight.
Information about the velocity field is inevitably lost because of the projection along z-axis. This makes this description of HI a nonexhaustive one. For each position x, we assume that the velocity dispersion of a given cell is dominated by thermal motions. This is a fair approximation as the turbulent velocity dispersion at the cell size (0.04 pc) is σ turb ∼ 0.3 km s −1 , which is smaller than the thermal broadening everywhere in the simulation: the smallest thermal broadening for the coldest gas found in the simulation (T = 20 K) is σ therm = 0.4 km −1 . The distribution function of the z-component of the velocity v z (x) of a given cell is then given by where ∆(x) = √ k B T (x)/m H , which is the thermal broadening of the 21 cm line, m H is the hydrogen atom mass, and k B the Boltzmann constant.

Brightness temperature: general case
The general case for the computation of the 21 cm brightness temperature T B (v z , r) is based on the following radiative transfer equation.
where τ(v z , r, z) is the optical depth of the 21 cm line defined as and C = 1.82243 × 10 18 cm −2 (K km s −1 ) −1 . In this representation, a gas cell at position z absorbs emission from the cell located behind it along the line of sight, i.e., at z > z .

Optically thin limit
In the optically thin limit, in cases where the self-absorption is negligible (i.e., τ(v z , r, z) 1 everywhere), the 21 cm brightness temperature is proportional to the density ρ: where H is the depth of the cloud and ⊗ the spatial convolution.
Here we consider the case that includes spatial smoothing by a telescope beam B(r). The integrated column density N thin HI (r) and the centroid velocity v z (r) of the 21 cm line can be obtained directly by integrating T thin B (v z , r) along the velocity axis: and

Synthetic observation
We computed the synthetic position-position-velocity (PPV) data cube in the general case using Eq. (14). Each spectrum has an effective velocity resolution of 0.8 km s −1 and covers −40 < v z < 40 km s −1 . We considered the beam B(r) of the instrument by convolving the synthetic PPV cube with a Gaussian kernel characterized by standard deviations of two pixels along the spatial axis. We then added a homogeneous Gaussian noise of 0.05 K to each spectrum. In order to mimic observation, integrated column density maps shown in the rest of the paper are computed using the optically thin limit presented in Eq. (17). The integrated column density map of the synthetic PPV cube is shown in Fig. 2.

Results
ROHSA was then applied to decompose the synthetic observation computed in Sect. 3.2. In this section we discuss the choice of the free parameters of ROHSA, the global properties of the Gaussian sample, and the properties of individual components identified by the decomposition. Subsequently, the mapping of a threephase coherent model is presented with direct comparisons to the phases extracted directly from the simulation.
3.3.1. Choosing the free parameters of ROHSA ROHSA has six free parameters : the number of Gaussian components N, four hyper-parameters λ i and, the maximum number of iterations of the LBFGS algorithm.
The most important parameter is N. This has to be sufficiently high to ensure a complete encoding of the signal, that is, to ensure that the residual is dominated by noise. As we discuss in Sect. 3.3.3, a given number of Gaussian components does not imply that all components are used to describe the signal along every line of sight. Components are allowed to have an amplitude of zero at any position. This is especially relevant for components encoding cold features. Since the CNM clouds occupy a small fraction of the total volume (see also Sect. 3.3.3), we expect the associated amplitude fields to have a large fraction close to zero. This is ensured by the energy term σ n − m n 2 2 which minimizes the variance of the dispersion velocity of each component. Amplitudes are brought to zero if there is no need for a Gaussian to describe the signal at some location, instead of encoding another phase like the WNM for example. Avoiding phase mixing allows for overfitting to be prevented. As with N, different values of the hyper-parameters λ a , λ µ , λ σ , λ σ can be tested to obtain a satisfactory solution. If the hyperparameters are null (i.e., no regularization) the signal could be fully encoded, but no spatial coherence will appear in the solution θ (I) . On the other hand, if the hyper-parameters are too high, the solution will tend towards a solution that could be too spatially coherent, or even flat, wiping out small-scale fluctuations and providing a bad fit to the data. A spatially coherent solution that describes the data well with the smallest value of N is a good criterion to select the values of the hyper-parameters.
The last criterion set by the user is the maximum number of iterations of the LBFGS algorithm computing Eq. (11); see Sect. 2.4.2. That parameter must be large enough to ensure the convergence of the solution. The convergence of ROHSA from a numerical perspective is presented in Fig. 3, which shows the evolution of the cost function J(θ(r), m) for 730 iterations.
The decomposition of the synthetic observations presented in this section converges to a satisfactory solution with N = 8, λ a = 10 000, λ µ = 10 000, λ σ = 10 000, and λ σ = 1000. As previously recommended, these values are empirically found to converge towards a noise-dominated residual and a signal that is encoded with a minimum number of Gaussian components. To illustrate this, Fig. 4 shows the normalized probability distribution function of the relative difference (N HI -Ñ HI )/N HI between the solutionÑ HI inferred with ROHSA and the data N HI for different values of N.
As an a posterior assessment, the skewness µ 3 of the residual is shown to quantify the quality of the encoding. As the emission is only positive and noise has a symmetric distribution centered on zero, the skewness of the residual can be used as a way to evaluate if the emission is well estimated or if the algorithm has over-fitted the data and included some noise in the solution. A positive skewness is usually an indication that some emission is left in the residual. If the model has not enough freedom to encode the emission fully; more components are needed or the regularization terms should be lowered. On the other hand, a negative skewness indicates that the decomposition is overfitting the data; positive noise fluctuations are included in the model leaving more negative noise fluctuations than positive ones in the residual. This is usually an indication that the regularization coefficients (λ i ) should be larger to increase the spatial smoothness of the solution. A skewness of the residual close to zero is an indication of a valid solution, one that is not distorted by the regularization and that does not overfit the signal. For the case of the synthetic observations presented here, N = 8 fully encodes the signal with a relatively low skewness |µ 3 | = 0.04. The computation time of ROHSA depends on the maximum number of iterations, the dimension of the PPV cube, and the number of Gaussian components. For each step of the multiresolution process from coarse to fine grid, the computation time used by ROHSA for this particular case (purple line) is presented in Appendix B with a lower number of Gaussian components for the sake of comparison. The evaluation performed here requires nearly two hours of computation time on a single CPU, which makes it difficult to explore a large range of hyper-parameters and number of Gaussian components. To overcome this difficulty, a GPU implementation of ROHSA is under development.

Global properties of the Gaussian sample
ROHSA recovered the total emission of the synthetic observation with a relative variation of 0.3%. An example of the Gaussian decomposition for a representative 4 × 4 mosaic of the simulation is shown Fig. 5. The spatial coherence of the solution can be seen over the mosaic with a smooth variation of the amplitude, the central velocity, and the velocity dispersion of each Gaussian. It is already possible to distinguish in those spectra the convergence of the decomposition toward different velocity dispersions, that is, different temperatures/phases of the gas due to the energy term λ σ σ n − m n 2 2 . To have a clear view of these different components, let us take a look at the probability distribution function σ weighted by the fraction of total emission of each Gaussian √ 2πa n σ n / r N HI (r) presented in Fig. 6. This diagram shows the amount of gas in a given range of velocity dispersion (i.e., indirectly a certain range of temperature). It is clear that ROHSA, in this case, converges toward a three-phase model with typical velocity dispersion close to the expected values in the CNM (σ < 2 km s −1 ), LNM (σ ∼ 6 km s −1 ), and WNM (σ ∼ 8 km s −1 ). We see that a similar behavior is also present in the application to an observation of high Galactic latitude presented in Sect. 4. We note also that since eight Gaussian components have been used by ROHSA, some phases are encoded by several components. Association of the different components to characterize a three-phase model is presented in Sect. 3.3.4.

Properties of individual components
Integrated column density maps of each component obtained with ROHSA for N = 8 Gaussian are presented in Fig. 7. For each component G n , the mean velocity µ n and mean velocity dispersion σ n averaged over the field are presented in Table 1. The surface filling factor appears to vary considerably between the eight components. The components with low values of σ n are sparsely present, while the component with the largest velocity dispersion is present everywhere on the field. We recall that the numerical simulations used here were designed to reproduce the WNM-CNM condensation process of the HI through the thermal instability. The factor 100 difference in density between the two phases, and the fact that the mass fraction in each one is about 50%, implies that the cold phase fills only a few percent of the volume (Saury et al. 2014). This translates directly in the column density maps recovered by ROHSA; the narrow components, corresponding to colder structures, fill only a fraction of the projected field of view, while the larger component is present everywhere.
The eight velocity fields and velocity dispersion fields are presented in Figs. 8 and 9, respectively. At some location of the fields, when there is no need for a Gaussian to describe the signal over several pixels, the amplitude goes to zero. The corresponding velocity and velocity dispersion fields then have no reason to fluctuate. It turns out that where a n goes to zero, µ n and σ n are flat. This explains the apparent variation of spatial resolution as seen for example in Fig. 8 (bottom right). We notice that the first component G 1 (top left) and the second component G 2 (top right), encoding the WNM and the LNM, respectively, are defined everywhere, meaning that µ 1 and σ 1 also have fluctuations everywhere. The implications of this are discussed in Sect. 5.

Mapping the three-phase neutral ISM
In order to compare the result of the decomposition with the reality given by the numerical simulations, we grouped the eight components into three fields corresponding to the WNM, the LNM, and the CNM. The comparison with the numerical simulations requires that we identify ranges in temperature that demarcate these three phases. As with most numerical simulations that include the classical heating and cooling processes  of the ISM (Wolfire et al. 1995), the simulation of Saury et al. (2014) we use here shows a continuum of temperature, with no clear separation and with a significant fraction of the gas present at temperatures corresponding to the thermally unstable regime (see their Figs. 14 and 15). To facilitate the comparison with previous studies, we decided to use the canonical values T k lim,CNM/LNM = 500 K and T k lim,LNM/WNM = 5000 K (Heiles & Troland 2003b) to separate the simulation in three components. The integrated column density maps associated to each phase are computed following the methodology described in Sect. 3.2. The comparison between the integrated column density maps recovered with ROHSA and those inferred directly from the simulation is presented Fig. 10. The intensity and the morphology of each phase is well recovered. It is nevertheless possible to see some leakages between the phases, in particular between WNM and LNM. This is due partly to the poorly defined temperature thresholds used to separate the phases. It is also due to small confusions during the Gaussian decomposition where the intensity, the dispersion velocity, and velocity centroid of each component in the PPV space are close to each other. In other words, for similar central velocities, the scales of fluctuations on the velocity axis characterizing each component are too close to one another (see Figs. 5 and 6).
One way to evaluate the quality of the reconstruction is to compare the statistical properties of the cloud and inter-cloud components. Three different fields are used: the integrated column density field of the cloud medium (CNM), the integrated column density field of the inter-cloud medium (LNM + WNM), and, because it is fully sampled in the plan of sky, the centroid velocity field of the inter-cloud medium. Figure 11 presents the integrated column density field and the centroid velocity field, computed using Eq. (18), of the inter-cloud medium obtained combining the LNM and the WNM inferred with ROHSA and those obtained directly from the simulation.
In order to compare the estimates from ROHSA to the ones obtained from the simulation, we compute the spatial power spectrum (SPS) of each image. The SPSs of the integrated column density of the cloud and inter-cloud medium are presented Fig. 12 and the SPS of the centroid velocity field of the intercloud medium is presented in Fig. 13. In each case, the statistics recovered by ROHSA is consistent with the numerical simulation over all scales. The shape of these power spectra is interesting in µ n (km s −1 ) 0.2 −1.7 0.5 2.3 −0.1 5.0 −3.7 −2.5 σ n (km s −1 ) 8.2 6.1 0.5 0.6 1.5 1.5 1.8 0.9 itself; the inter-cloud medium is featureless with an almost constant power law, as the cloud phase is more structured, with a break at about 20 pixels, showing a typical scale linked to the condensation process. Interestingly, ROHSA is able to capture all these features very well.

Application on high-latitude HI gas
After validating the identification of the HI phases on numerical simulations, in this section we present the application of ROHSA on a 21 cm observation of a region with high Galactic latitude.

North ecliptic pole
To avoid the complication of low-latitude observations, where the 21 cm emission is significantly affected by velocity crowding and self-absorption, we chose to apply RHOSA to one of the high Galactic latitude fields of the GHIGLS 1 survey (Martin et al. 2015). We chose the North ecliptic pole (NEP) field, a 12 • × 12 • region centered on l = 96 • .40, b = 30 • .03 observed with the Green Bank Telescope, providing a 9 .55 spatial resolution. The HI spectra have an effective velocity resolution of 0.807 km s −1 and cover −200 < v (km s −1 ) < 50. The integrated column density map computed using Eq. (17) is shown in Fig. 14 and a mosaic of representative emission spectra is shown in  Table 1. The surface filling factor varies considerably between components, depending on their σ n value.
As Fig. 15 shows, high-latitude spectra of HI are more complex that the synthetic observations computed from the numerical simulations of Saury et al. (2014) emission in NEP indeed exhibits significant emission in the intermediate velocity cloud (IVC) and high velocity cloud (HVC) ranges. In this paper we do not consider the velocity channels with HVC emission; we focus on the phase separation of the local velocity cloud (LVC) and IVC components. The LVC range between −20 and +20 km s −1 shows relatively smooth emission profiles, with a narrow peak around −3 km s −1 on top of a broader feature (see Fig. 15). The latter is rather smooth but when inspected in detail it shows faint spectral structures on all scales along the velocity axis. The sensitivity of the GHIGLS data is such that these fluctuations of the emission profiles are not due to noise. In fact, they can be followed from one spectrum to the next quite easily. These fluctuations of the emission spectra at scales of a few kilometres per second reveal the presence of CNM and LNM features on a range of velocities. This field was selected because of its representative 21 cm emission for Galactic latitudes of b ∼ 30 • . The emission features are not particularly complex, nor are they especially simple. In addition, a first Gaussian decomposition of NEP 21 cm data was performed by Martin et al. (2015) which provides an interesting point of comparison. Unlike ROHSA, Martin et al. (2015) used a method similar to the one described by Haud (2000) that considers only the term L(v z , r, θ(r)) 2 2 in the parameter optimization. We highlight some qualitative comparisons in the following sections.

Results
In order to decompose the 21 cm emission of the NEP field, we used ROHSA with N = 12 Gaussian components and each hyper-parameter has been set to 1000. As for the previous case, these values have been chosen empirically following the same methodology as described in Sect. 3.3.1, allowing us to converge  Fig. 11). The red line shows the inter-cloud medium inferred with ROHSA (bottom-left panel of Fig. 11).
toward a noise-dominated residual with a minimum number of Gaussian components. We note that the hyper-parameter values are not the same as for the first application presented in Sect. 2. The complexity of the underlying signal structure and its signal-to-noise ratio (S/N) are the main causes of these differences. However, a detailed understanding of the behavior of these hyper-parameters would require testing different values over a large number of observations. Such exploration is currently complicated by computations limitations. A GPU version of the code is under development to overcome this main limitation. The combination of 12 Gaussian components produces a solution that recovers 99% of the total emission with spatially coherent components. The total integrated column density encoded by ROHSA and the residual between our model and the data are shown in Fig. 14 (middle and right panels).

Global properties of the Gaussian sample
Like for the application on the synthetic observations, the Gaussian parameters recovered for the NEP field have a strong spatial coherence; ROHSA converges towards a solution with smooth variations of the Gaussian parameters across the field. It turns out that ROHSA converges toward a multiphase model with Gaussian components of various widths, very similar to the application to numerical simulations presented in Sect. 3.3 but more complex due to the presence of an IVC component in the data.
To have a global view of the thermal state of the gas as a function of velocity, it is useful to look at the two-dimensional dispersion-velocity diagram σ − v weighted by the fraction of total emission of each Gaussian √ 2πa n σ n / r N HI (r) shown in Fig. 16. This diagram shows isolated complexes of Gaussian components in the σ − v space. This is a direct result of the way the parameter optimization is done in ROHSA, with a regularization term that favors the minimum variance of σ. Figure 16 highlights the fact that the 21 cm emission in NEP is mainly composed of negative-velocity components. A clear trend is visible in the velocity range −60 < v < 0 km s −1 with σ decreasing from 10 to 1 km s −1 going from negative to positive velocities. This likely reflects the radiative condensation of warm intermediate velocity clouds into the local velocity component of the neutral ISM. At this point it is interesting to compare the results of ROHSA with the Gaussian decomposition of the same data performed by Martin et al. (2015). The σ − v diagram of Martin et al. (2015, see their Fig. 7) shows a continuous distribution with arches that bridge together the LVC and IVC gas, an effect that is not observed in our results. Similarly to what we have done here, Martin et al. (2015) used numerical simulations to evaluate the performances of their Gaussian decomposition algorithm. Their tests revealed that such arches in the σ − v diagram are unphysical; they are the result of LVC and IVC gas components that overlap in velocity.
It is important to point out that both solutions provide as good a representation of the same dataset. The significant difference between the two solutions highlights the challenge of extracting a physically meaningful representation of the data. We recall that Martin et al. (2015) used an algorithm similar to the ones used by Haud (2000); Miville-Deschênes et al. (2017a); Kalberla & Haud (2018) where the spatial coherence of the solution is not enforced through regularization terms in the cost function. In these previous studies spatial coherence is attempted by providing spatially coherent initial guesses. Each spectrum is then fitted independently and no spatial coherence in the solution is enforced. In practice, this method is rather effective for relatively sparse emission data like CO (Miville-Deschênes et al. 2017a), but in the case of the more confused 21 cm data it produces parameter maps that are more affected by small-scale noise due to the degeneracy of the solution.
A Gaussian decomposition algorithm that fits each spectrum individually is easily fooled by components that overlap in velocity. In this specific case, such an algorithm would find a solution with a smaller number of components but with larger values of σ. The main innovation in ROHSA is that it is able to cluster different phases even if they are close in velocity. The four energy terms added to the cost function J(θ(r)) allow ROHSA to find a spatially coherent solution while avoiding the mix of components due to the high confusion present in the emission.

A cloud/inter-cloud medium vision of the North ecliptic pole
Integrated column density maps, centroid velocity fields, and dispersion velocity fields obtained with ROHSA are presented in Figs. 17-19, respectively. Mean velocities µ n and mean velocity dispersions σ n of the 12 Gaussian components G n are presented in Table 2. In this section we focus on building a coherent cloud/inter-cloud medium vision considering the local component of the emission identified previously. Two of the four components of the local gas, G 7 and G 9 , are associated with the CNM with σ 7 = 1.6 km s −1 and σ 9 = 1.9 km s −1 . The other components are used to build the inter-cloud medium. Integrated column density fields and centroid velocity fields of the cloud medium and inter-cloud medium are presented in Fig. 20. As noted by Martin et al. (2015) in their two-phase decomposition of the local component, filamentary structures are observed in the narrow component. The associated velocity    Table 2. Mean velocity µ n and mean velocity dispersion σ n of the 12 Gaussian components G n inferred by ROHSA in NEP.
µ n (km s −1 ) −74.1 −53.9 −44.7 −35.0 −22.9 −12.6 −4.8 −1.3 0.2 10.9 40.8 75.9 σ n (km s −1 ) 9.7 6.2 3.5 5.1 5.3 4.7 1.6 6.3 1.9 7.5 12.5 9.3 dispersion fields (see Fig. 19, component G 7 and G 9 ) coherent fluctuating values over a large part of the field. The core of these filamentary structures appears narrower than the envelop with velocity dispersion reaching about 0.87 km s −1 (the spectral resolution) in their centers. The broader component has an integrated column density field with no particular structure like filaments (see Fig. 20,topright). Like for the numerical simulation, the sum of the broad components is likely to represent a phase that fills a large fraction of the volume, as would an inter-cloud medium. One interesting aspect of the ROHSA decomposition is that it then allows to extract the velocity field of this volume-filling component (Fig. 20, bottom-right), enabling the characterization of the turbulent cascade in a mixture of lukewarm phase and warm phase.

Discussion
Historically, a large number of studies used a Gaussian basis to model 21 cm data. Different algorithms have been developed; all of them are fitting each spectrum individually, with or without information from the neighboring solutions to initialize the fit. To further constrain the degeneracy of the fit, solutions with the smallest number of Gaussian components have often been favored (e.g., Lindner et al. 2015). Because of velocity blending, the solution with the smallest number of components is not necessarily the best one. In some cases, narrow features overlap in velocity, making it impossible to separate them if the environment is not considered. Usually, this confusion breaks apart a few beams away and more components can be recovered. The fundamental idea behind ROHSA is that we are trying to extract diffuse components that have column density, centroid velocity, and velocity dispersion with smooth spatial variations. The optimization scheme has been designed with that concept at its core. In order to achieve this, ROHSA fits the whole data cube at once.
The application of ROHSA on both synthetic observations from numerical simulations and observational data converges naturally toward a multiphase model of the neutral ISM. The ability of ROHSA to extract the multiphase nature of the neutral ISM opens a totally new perspective on the study of the nature of the condensation process acting in the ISM. It is clear from a numerical point of view that the formation of cold clouds is the result of the condensation of the warm and diffuse gas through the thermal instability coupled with turbulence. From an observational point of view, spatial correlations between the different phases can now be made in order to more precisely quantify how the CNM emerges from this condensation process. This separation also opens the possibility to describe the properties of the very specific multiphase turbulence of the HI. ROHSA appears to be efficient at clustering different structures in PPV space, even when there is a high level of confusion. In particular, the separation of the LVC and IVC is known to be particularly challenging as the CNM and WNM of both components overlap significantly in velocity (Martin et al. 2015). As shown in Fig. 16, ROHSA significantly limits the "Arch effect" typical of this confusion (see Sect. 4). Globally, the performance in ROHSA on regions of high . Integrated column density maps (left: G 1 , G 4 , G 7 , G 10 ; middle: G 2 , G 5 , G 8 , G 11 ; right: G 3 , G 6 , G 9 , G 12 ) obtained by ROHSA applied on NEP. Mean velocity µ n and mean velocity dispersion σ n are presented in Table 2 Galactic latitude opens a large range of possibilities regarding the study of infalling neutral clouds from the galactic halo. We would also like to point out that no a priori information about the number of phases present in the neutral ISM is provided to ROHSA. The algorithm rests only on the hypothesis of the existence of components with similar line width through the energy term λ σ σ n − m n 2 2 . In that respect, ROHSA is perfectly adapted to decomposing hyper-spectral observations of any type, not only 21 cm emission.
At this time the main limitations of ROHSA are computational. First, as the whole PPV cube is fitted at once, the use of ROHSA is limited to cubes that can fit in memory. Second, the current computation time of ROHSA is not negligible (e.g., about two hours on a single CPU for a 256 × 256 × 100 PPV cube with 8 Gaussians). This limits the possibility to make a deep exploration of the hyper-parameters λ i . In particular it would be interesting to explore various weights of the hyper-parameters for the deduced quantities (a, µ and σ). It is expected that the amplitude of the spatial variations of these quantities are not the same. For instance, in a multi-phase medium like the HI, the density field (represented by a) might vary more strongly on smaller scales than the velocity field. This might require different values of λ a compared to λ µ , λ σ and λ σ . A GPU version of the code is under development that would allow such an exploration.

Summary
Here we present a new Gaussian decomposition algorithm named ROHSA. Energy terms have been added to the classical cost function to take into account the spatial coherence of the emission and the multiphase nature of the gas simultaneously. In order to identify a solution with spatially smooth parameters, the fit is performed on the whole hyper-spectral cube at once.
The performance of ROHSA has been evaluated using a synthetic 21 cm observation computed from a numerical simulation of thermally bi-stable turbulence. It was then tested on a 21 cm observation of a field of high Galactic latitude observed with the GBT. The main conclusions are as follows.
1. ROHSA is able to naturally highlight the physics of any multiphase medium without a priori information regarding the number of phases. 2. Evaluation on numerical simulation of thermally bi-stable turbulence shows that the sum of Gaussian components is a good approximation to model the multiphase nature of the neutral ISM. 3. The multiphase model inferred with ROHSA provides a spatially coherent vision of the integrated column density map, the centroid velocity field, and the velocity dispersion field of each component. 4. The power spectra of the integrated column density and centroid velocity fields are well recovered with ROHSA. Statistical properties of turbulence in the multiphase neutral ISM now become accessible. 5. The decomposition of a high-latitude HI gas observation shows the wide range of applications enabled with ROHSA, for instance to study the radiative condensation of the WNM and the nature of the ISM at the disk-halo interface.