Issue 
A&A
Volume 577, May 2015



Article Number  A140  
Number of page(s)  13  
Section  The Sun  
DOI  https://doi.org/10.1051/00046361/201425508  
Published online  19 May 2015 
Sparse inversion of Stokes profiles
I. Twodimensional MilneEddington inversions
^{1}
Instituto de Astrofísica de Canarias,
38205
La Laguna, Tenerife,
Spain
email:
aasensio@iac.es
^{2}
Departamento de Astrofísica, Universidad de La
Laguna, 38205, La
Laguna, Tenerife,
Spain
^{3}
Institute for Solar Physics, Dept. of Astronomy, Stockholm
University, Albanova University Center, 10691
Stockholm,
Sweden
email:
jaime@astro.su.se
Received: 12 December 2014
Accepted: 25 March 2015
Context. Inversion codes are numerical tools used to infer physical properties from observations. Despite their success, the quality of current spectropolarimetric observations and those expected in the near future presents a challenge to current inversion codes.
Aims. The pixelbypixel strategy of inverting spectropolarimetric data that we currently use needs to be surpassed and improved. The inverted physical parameters have to take into account the spatial correlation that is present in the data and that contains valuable physical information.
Methods. We used the concept of sparsity or compressibility to develop a new generation of inversion codes for the Stokes parameters. The inversion code uses numerical optimization techniques based on the idea of proximal algorithms to impose sparsity. In so doing, we allow for the first time exploiting the spatial correlation on the maps of physical parameters. Sparsity also regularizes the solution by reducing the number of unknowns.
Results. We compare the results of the new inversion code with pixelbypixel inversions to demonstrate the increased robustness of the solution. We also show how the method can easily compensate for the effect of the telescope point spread function, producing solutions with an enhanced contrast.
Key words: Sun: magnetic fields / Sun: atmosphere / line: profiles / methods: data analysis
© ESO, 2015
1. Introduction
An inversion code is a computer program based on algorithms that allow the user to extract information about the parameters of a physical system by interpreting the observables^{1}. In solar physics, nonlinear inversion codes are routinely applied since they were developed in the 1970s to derive the thermodynamical and magnetic properties of different regions of the solar atmosphere from the interpretation of the Stokes parameters (Harvey et al. 1972; Auer et al. 1977; Skumanich & Lites 1987).
One of the fundamental problems in general, and specifically for spectropolarimetric inversions, is that the relation between the observables and the physical parameters of interest is very complex. Despite the strong nonlinearity and nonlocality inherent to the radiative transfer, it has been possible to develop some simple diagnostics during the past years: the line ratio technique (Stenflo 1973, 2010, 2011), the centerofgravity method (Semel 1970; Rees & Semel 1979), and calibration curves between spectral line summaries and certain magnetic field properties (e.g., Lites et al. 2008; Martínez Pillet et al. 2011, for recent applications).
The appearance of the first powerful computers allowed researchers to apply optimization techniques for nonlinear functions and use more elaborate models. A χ^{2} merit function^{2} is minimized with respect to the physical parameters defining the specific model. The first efforts (with the application of the computers available at that time) made use of the MilneEddington (ME) approximation to analytically solve the radiative transfer equation (e.g., Harvey et al. 1972; Auer et al. 1977; Landi Degl’Innocenti & Landolfi 2004). Although the simplifying assumptions that one needs to apply when using the ME approximation may not be fully fulfilled in real solar plasmas, it is still one of the most widely used models, in part because of its simplicity. This simplicity leads to very fast inversion codes that can be applied to the large number of observations that we currently obtain. The ME approximation is the basis for advanced inversion codes such as VFISV (Borrero et al. 2007, 2010), which is used to infer magnetic field vectors in data from the Helioseismic and Magnetic Imager (HMI; onboard the Solar Dynamics Observatory), MILOS (Orozco Suárez et al. 2007) and MERLIN (Skumanich & Lites 1987; Lites et al. 2007), which are currently applied to data from the Hinode spacecraft, or the codes based on lookup tables and the principal component analysis (PCA) decomposition (Rees et al. 2000; SocasNavarro et al. 2001), which are used to invert data acquired with THEMIS (Télescope héliographique pour l’étude du magnétisme et des instabilités solaires).
The availability of more powerful computers in recent years allowed us to use more complex and more realistic models. One of the essential ingredients of this revolution was the application of the idea of response functions (Landi Degl’Innocenti & Landi Degl’Innocenti 1977) to the inversion of Stokes profiles with nontrivial depth stratifications of the physical quantities. The first representative of this family of codes was SIR (Stokes Inversion based on Response functions; Ruiz Cobo & del Toro Iniesta 1992). This evolution occurred naturally at that time because observations showed strong asymmetries in the Stokes profiles in magnetized regions. The explanation of these asymmetries requires the presence of gradients along the line of sight (LOS) of the physical properties (Solanki & Pahlke 1988; GrossmannDoerth et al. 1988; Solanki & Montavon 1993; Sigwarth et al. 1999; López Ariste 2002; Khomenko et al. 2003; Martínez González et al. 2008; Viticchié & Sánchez Almeida 2011). Another representative of these inversion codes is SPINOR (Frutiger et al. 2000). Based on the same strategy, SocasNavarro et al. (2000) developed the code NICOLE (SocasNavarro et al. 2015), which is capable of processing lines in nonlocal thermodynamical equilibrium (NLTE). This model has mainly been applied to invert Ca ii infrared triplet lines, which are formed under strong NLTE conditions (e.g., SocasNavarro et al. 2000; de la Cruz Rodríguez et al. 2012).
Another step forward in the field of inversion codes was made when Asensio Ramos et al. (2007) and Asensio Ramos (2009) introduced Bayesian inference for spectropolarimetric observations. This allows the user to obtain posterior probability distributions for any model parameter and their correlation with the remaining ones. This probabilistic inference is extremely powerful, but requires a huge effort in terms of computational power.
Arguably, the latest step in the field of inversion codes has been to compensate for the effect of the telescope point spread function (PSF)^{3} that is present in the observed data. van Noort (2012) implemented a forward approach, where synthetic spectra from the inversion are convolved spatially with the telescope PSF before they are compared with the observations. Ruiz Cobo & Asensio Ramos (2013) used a regularized deconvolution to compensate for the effect of the PSF before the inversions were performed.
The first solves the full problem in which the spatial convolution and the inversion are coupled in the same operator O, resulting in a very largescale classical inversion code. The use of a LevenbergMarquardt (LM) scheme (Levenberg 1944; Marquardt 1963) with a compact PSF leads to a sparse Hessian matrix that highly facilitates the inversion by spatially coupling the convergence of adjacent pixels. In contrast, the latter decouples the inversion. First the observations are deconvolved by applying a PCA regularization in the spectral domain and by applying a standard RichardsonLucy (Richardson 1972; Lucy 1974) maximumlikelihood deconvolution for each eigenimage. Then, the resulting denoised and deconvolved data are inverted using any classical inversion code. Both approaches introduce different regularizations in the inversion problem and can lead to potentially different results.
In the first approach, moreover, there is no need to consider a lowpass filtering or a regularization of the deconvolution, but the authors reported cases of oscillatory patterns in the inferred model that need to be damped during the inversion. In practice, the implementation of the second approach is easier, cleaner, and less prone to the aforementioned oscillatory artifacts, but the PCA regularization can filter out features with low statistical weight in the dataset. These methods rely upon accurate knowledge of the PSF used in the inversion, which may not be always available or applicable.
We note that Scharmer et al. (2013) also combined data inversions with a regularized deconvolution to compensate groundbased observations for straylight, which is believed to originate from smallscale aberrations (Löfdahl & Scharmer 2012).
To take a new step forward in the field of inversion codes, we here propose a new technique to spatially couple the parameters of the model. This is inspired by the idea of sparsity or compressibility of the data. This method allows using a PSF in the same way as did van Noort (2012), but it is not strictly needed for the spatial coupling.
2. Sparsity regularization
With the use of fast slit spectropolarimeters and 2D filterpolarimeters, we routinely have 2D maps of regions in the solar atmosphere in which the four Stokes parameters are observed at several points along one or several spectral lines. This rate of new highquality 2D observations will increase in the future with the advent of bidimensional spectropolarimeters based on image slicers or optical fibers. These observations have been interpreted in the past by assuming that all pixels are completely unrelated and by applying the inversion codes on a pixelbypixel basis. After this pixelbypixel inversion, the spatial smoothness of the derived quantities is taken as an indication of successful inversions. Saltandpepper noise present in the inverted maps of physical parameters is an indication of problems: either the required information cannot be extracted from the Stokes profiles, or the inversions failed to converge to a good solution.
It has become more and more evident that 2D observations and the ensuing inversions are needed to fully understand the physical processes in the solar atmosphere. This is the case even assuming local thermodynamical equilibrium (LTE), which relates radiation to the local properties of the plasma. The first representatives of this approach are the codes of van Noort (2012), Ruiz Cobo & Asensio Ramos (2013), in which the PSF of the telescope couples the observed Stokes profiles of nearby pixels.
2.1. General idea
Here we develop the general idea of regularized inversion codes, with the regularization being based on the idea of sparsity. Sparsity, or compressibility, idealizes the concept that the data can be projected onto a parameter space where a reduced set of variables can be used to fully describe that dataset. The motivation for this regularization is based on a very simple observation. When one saves a continuum image as a raw file (for instance, a standard 512 × 512 pixel image), the size of the file is roughly 1 MB (using four bytes per pixel). The same image compressed using a lossless file format reduces the size by a factor 3−4, while a lossy format can go further and increase the compression ratio to a factor ~10. The compression is possible, fundamentally, as a result of the presence of spatial correlation (properties such as smoothness and structures in the image) on the image. If appropriately exploited, it is possible to predict the value of a certain pixel from the values of other pixels, thus making it unnecessary to store the value of all pixels.
We assume that we have observed the Stokes parameters on a 2D grid of N_{pix} = N_{x} × N_{y} pixels for a set of N_{λ} wavelength points that sample one or several spectral lines. We propose a model atmosphere for every pixel to explain the observations, each model atmosphere being defined with a set of N_{par} parameters, so that each pixel k is described with the vector of parameters . We can build the vector of parameters p that encode all the model parameters for all the pixels in the field of view, such that . It is possible to reorder this vector so that it is built by stacking the value of each physical parameter for all the pixels, so that p = { p_{1},p_{2},...p_{Npar} }, where p_{i} encodes the value of a certain parameter (i.e., magnetic field inclination, Doppler velocity, etc.) as a 2D map. Note that the vectors p and are equivalent except for the ordering. The aim of any inversion code is, then, to infer the full vector p from the observed Stokes profiles. As explained in the introduction, classical inversion codes solve this problem by inverting the observed Stokes profiles pixel by pixel.
Fig. 1 Test for the compressibility of the parameters inferred in a quietSun map. Each column indicates the number of nonzero coefficients of the original map that are retained. 
In our approach, instead of working directly on the real space of parameters, we use a linear transformation so that the ith parameter is given by (1)In this equation, W is a linear operator that transforms from the real space to the transformed space. Among many potential transformations, the most interesting in our context are the Fourier, wavelet, or discrete cosine transforms. It is clear that by working on the transformed space and using an inversion code to infer the transformed parameters, we have gained nothing. However, here we can impose a fundamental key ingredient: the assumption that the transformed image is sparse^{4}in the transformed domain. In other words, if the appropriate transformation W is used, many elements of the transformed image (that we term modes in the following for simplicity) are zero or very close to zero in absolute value. Sparsity, which is usually fulfilled in nature, has been shown to be an extremely powerful assumption. Techniques such as compressed sensing (Candès et al. 2006; Donoho 2006) or exact matrix completion from partial measurements (Candés & Recht 2009) strongly rely on the sparsity assumption.
2.2. Testing for sparsity in physical parameters
For the inversion that we describe in the following to work properly, it is crucial to test for the sparsity of the model parameter maps. To this end, we performed ME pixelbypixel inversions of two maps observed with the Solar Optical Telescope/SpectroPolarimeter (SOT/SP; Lites et al. 2001) onboard Hinode (Kosugi et al. 2007). We chose to only invert the Fe i line at 6302.5 Å. The first map is a quietSun map extracted from the observations analyzed by Lites et al. (2008), which were obtained at disk center on 2007 March 10. The second map is a sunspot of the active region NOAA 10953, which was mapped at an average heliocentric angle of θ = 12.8° on 2007 April 30. This inversion was carried out with a classical inversion code using the LM algorithm and no regularization at all, but the inversion was repeated ten times for each pixel using different initial values of the model parameters (which were randomized). The ME model depends on a set of assumptions: i) the atmosphere is planeparallel, semiinfinite, and in local thermodynamic equilibrium; ii) the line opacity, Doppler width, and all line properties are constant with depth in the atmosphere; iii) the magnetic field vector is constant with depth in the atmosphere; and iv) the Planck function is a linear function of the continuum optical depth τ_{c}, so that B_{T} = B_{0} + B_{1}τ_{c}, where dτ_{c} = − κ_{c}ds, where κ_{c} is the continuum opacity. The ME model parameters of our code are the Doppler width of the line in wavelength units (Δλ), macroscopic bulk velocity (v_{mac}), the slope B_{1} and intercept B_{0} of the source function, the ratio between the line and continuum absorption coefficients (η), the linedamping parameter (a), and the magnetic field vector parameterized by its modulus, inclination and azimuth with respect to a given reference direction (B, θ_{B}, and φ_{B}, respectively).
Figures 1 and 2 display the effect of compressing some parameters in the quiet Sun and the sunspot maps. Both maps contain 256 × 256 pixels, and the selected parameters are the magnetic field strength, Doppler velocity, and width of the line. Each panel shows the reconstruction of the original image (the rightmost image in each panel) when the image is transformed using a linear transform, thresholded leaving only a certain percentage of the largest coefficients and transformed back. The percentage of nonzero coefficients is shown as a label in each column. In this example, we used the Daubechies8 (db8) orthogonal wavelet (Jensen & la CourHarbo 2001) as a sparsityenhancing transform, which is appropriate for smooth maps, although the results are similar if other transforms are used. The figure demonstrates that a very good reproduction of the original maps is achieved when the number of nonzero coefficients is slightly larger than 15%. This compression can be obtained because the linear transformations capture part of the spatial correlation in the image. As a consequence, the value of ~85% of the pixels can be predicted from the value of just 15% of the image plus the presence of spatial correlation. To help in this comparison, Fig. 3 displays the histogram of the difference between the original magnetic field strength, Doppler velocity, and width of the line and the compressed parameters for different levels of compression. In this case, apart from the Daubechies8 orthogonal wavelet transform, we also introduced the Haar (db1) orthonormal wavelet (equivalent to the Daubechies1 wavelet; Jensen & la CourHarbo 2001), which is a discontinuous wavelet appropriate for efficiently representing maps with spatial discontinuities and the discrete cosine transform (DCT), appropriate for periodic signals. These results show that keeping 10−15% of the coefficients seems to reproduce the original parameter maps very well. When the fraction of remaining coefficients is made too small, more artifacts appear and the difference between the original maps and the compressed ones is too large.
Fig. 2 Same as Fig. 1, but for the sunspot data. We note that these images correspond to the solution of the pixelbypixel inversion and contain some artifacts. 
Fig. 3 Histograms of the difference between the original and compressed parameters at different levels (indicated in each column). Upper panels: results for the quiet Sun map, while the lower panels refer to the sunspot data. Each histogram refers to a different linear transformation. 
One of the consequences of keeping only a certain percentage of the coefficients in the transformed domain is that we loose some information (lossy compression). It is important that the user of the inversion code selects appropriate threshold limits so that no important information is lost in the final maps. Although we recognize the presence of these additional parameters to tweak, standard inversion codes are not easy to use and are also full of parameters to tweak. In the end, all these parameters can be tuned during an initial trialanderror phase and fixed afterwards.
2.3. Advantages and disadvantages
The inversion code that we describe in the following (together with the numerical techniques needed to enforce sparsity) infers the physical parameters in the transformed space. In the end, the physical parameter of interest is obtained by just a final application of the inverse transform.
Working in the transformed domain and applying a sparsity regularization leads to the following obvious advantages:

Imposing that the solution to the inversion has to be sparse introduces a strong regularization because the number of unknowns is heavily reduced, from the potential N_{par}N_{x}N_{y} to only sN_{par}N_{x}N_{y}, with s ≪ 1. We have found that typical values of the sparsity s will be around 20−30%. Given that a large portion of the coefficients can be set to zero without degrading the solution, the number of free variables that we currently use in pixelbypixel inversions is highly overestimated. An overly large number of degrees of freedom can produce nonphysical fluctuations of the physical parameters^{5}.

The inverse linear transformation gives the value of a physical parameters at a specific observed pixel as a linear combination of all modes in the map. This global character induces changes in all the pixels of the original space simultaneously when perturbing the value of a single mode. Reversing the argument, the observed Stokes profiles of a single pixel provide a little information to all modes simultaneously. This high redundancy is a very interesting advantage of our approach and introduces a strong regularization of the solution. This global character also introduces strong regularization as a result of the spatial correlation in the observables and the physical parameters.
The main problem of the method that we present here is that a suitable linear transform has to be chosen beforehand. The wavelet transform is sufficiently versatile to highly compress the 2D maps of many of the physical parameters needed to synthesize spectral lines. For this reason, we suggest to use the same transform W for all the parameters. However, the method explained in the following allows the user to choose a different linear operator W for every parameter p_{i}. This way, it is possible to design a set of linear transforms that leads to a very sparse representation of all the needed physical parameters. This additional step that we need to give as compared with standard pixelbypixel inversions is the one that introduces the spatial regularization, however.
2.4. Merit function
All standard inversion codes work by optimizing a merit function that measures the ℓ_{2}norm of the residuals with respect to the vector p of physical parameters. If the observed Stokes parameters are corrupted with Gaussian noise with diagonal covariance matrix, the merit function for the whole map is (2)where refers to the synthetic Stokes vector at wavelength position j and position k. Likewise, O(λ_{j},k) is the observed Stokes vector at this very same wavelength and for pixel k. The symbol σ_{ijk} stands for the standard deviation of the noise at wavelength position i, for the Stokes parameter j and pixel k, while w_{i} is the weight associated with each Stokes parameter (that we assume, for simplicity, to be the same for all wavelengths and pixels). This weight is introduced for technical reasons to help improve the convergence during the optimization^{6}. Classically, it is customary to solve the problem (3)by direct application of the LevenbergMarquardt algorithm, which is especially well suited to the optimization of such ℓ_{2}norms. This involves an optimization for all parameters of all pixels simultaneously. Either or p can be used without this choice making a difference because the optimization has to be done for all parameters of all pixels.
In a complete parallel formulation, one can work on the transformed domain and write the merit function as (4)The previous expression takes into account that the synthetic Stokes profiles at pixel k now depends on the full vector q through the linear transformation. Therefore, the inversion problem is solved by computing (5)The solutions to Eqs. (3) and (5) are strictly equivalent, with the difference that the former is solved in the physical parameters and the latter in the transformed domain. However, with the results of Figs. 1–3, Eq. (5) allows us to impose the sparsity constrain, so that the regularized solution is obtained by computing (6)where ∥ q ∥ _{0} is the ℓ_{0}norm of q, equivalent to counting the number of nonzero elements. The desired sparsity level is set by the upper limit s ≪ N_{par}N_{x}N_{y}. In other words, we minimize the merit function χ^{2} with respect to the modes q, but using only a very small number of nonzero elements in q. Equivalently, the sparsity constrain can be incorporated in the function through a Lagrange multiplier, so that Eq. (6) can be rewritten as (7)with λ a parameter that controls the strength of the constraint.
2.5. Proximal algorithms
As a consequence of the recent huge increase in the interest in compressed sensing, matrix completion, big data, and related techniques based on the idea of sparsity, several algorithms have been developed to optimize the addition of convex functions including nonconvex constraints (ℓ_{0} or ℓ_{1} norms, for instance). The objective is to solve the following problem:
where f(q) is a convex and differentiable function, while g(q) is another convex function, possible nonsmooth. This is exactly the type of problem that we have to solve in Eq. (7), in our case with and g(q) = λ ∥ q ∥ _{0}.
One of the most successful algorithms developed recently belongs to the class of methods termed proximal algorithms (e.g., Parikh & Boyd 2014). They can be viewed as the equivalent of the Newton method for nonsmooth, constrained, and largescale optimization problems. In this case, the solution to the general problem (8) is given by the following iteration (e.g., Starck et al. 2010): (9)where a step along the gradient of f(q) is carried out and then the socalled proximity operator is applied, defined in the following. Because is differentiable, the term inside the parenthesis is the standard gradient descent method that is typically used in many inversion codes for the Stokes parameters.
It is clear from Eq. (9) that the key ingredient of these proximal algorithms is the application of the proximity operator of the regularization function g(x), which is defined as (Parikh & Boyd 2014) (10)In general, this proximal operator has to be computed numerically, but exact solutions exist for many interesting regularization terms (Parikh & Boyd 2014). Because we use the ℓ_{0}norm as regularization, the proximity operator is simply given by (11)where H_{s}(x) is the nonlinear hard thresholding operator. This operator leaves the s elements of x with the highest absolute value untouched and sets the remaining elements to zero. Although all our results here use the ℓ_{0}norm as regularization, it is sometimes interesting to use the ℓ_{1}norm^{7}. In this case, the proximal operator is the smooth thresholding operator, which is given by (12)where (·)_{+} denotes the positive part.
When Eqs. (9) and (11) are combined, the simplest proximal algorithm that uses the gradient descent method to solve Eq. (7) is given by (e.g., Parikh & Boyd 2014) (13)This iteration is just a plain gradient descent algorithm (which only makes use of firstorder derivatives) that is augmented by using a hard thresholding projection operator in each iteration. In other words, after moving
Fig. 4 Inverted maps for some of the most relevant physical parameters of the model for a quietSun observation. The map is 512 × 512 pixels. Left column: pixelbypixel results, central panels: results of the sparsityregularized inversion, using a Daubechies8 orthogonal wavelet and a thresholding value of 50%. Third column: a sparsityregularized inversion compensating for the Hinode telescope PSF. 
the solution for the direction of the negative gradient (controlled by the stepsize h), one sets all elements to zero that do not fulfill the sparsity constraint. This simple iterative scheme displays a sublinear convergence rate O(1 /k). Recently, a trivial improvement of the gradient descent method known as the fast iterative shrinkagethresholding algorithm (FISTA) has been developed by Beck & Teboulle (2009); it shows a quadratic convergence rate O(1 /k^{2}). Algorithm 1, a combination of FISTA and the restarting scheme developed by O’Donoghue & Candès (2013), is our method of choice here. We leave an analysis of algorithms that smartly introduce some secondorder information without ever constructing the Hessian matrix for future research (e.g., Becker & Fadili 2012).
Equation (13) shows that the optimization method relies on computing the gradient of the merit function with respect to the modes q, which we have made explicity by using the subindex q on ∇_{q}χ^{2}. Given the linear character of the transformation of Eq. (1), this gradient can be trivially related to the gradient with respect to the physical parameters (the response functions), which can be analytically written as (14)The method described in this section is independent of the specific details of the transform W. The only condition is that it has to be linear and invertible. In addition, it is also desirable from the computational point of view that the linear transform has a fast algorithm available. This is the case both for the orthogonal wavelet and DCT transforms.
2.6. Possible improvements
Even though gradient descent methods are known to be fast when approaching the minimum, they become slower in the refinement phase of the solution (that is precisely the reason why the LevenbergMarquardt method is a combination of a gradient descent method and a Newton method, controlled by the Hessian). However, the nonconvex optimization of largescale problems has to rely on firstorder derivatives because the calculation and storage of the Hessian is impractical. In our experience and that of many others in the literature, the FISTA algorithm is a competitive optimization technique. To achieve a good convergence speed, the vector of stepsizes h has to be smartly chosen. Extensive experiments have convinced us that these steps can be kept fixed and used in different data sets without any particular impact on the convergence speed. However, we are currently investigating the possibility of improving the convergence speed along two lines. First, using approximations to the diagonal Hessian matrix that can be efficiently computed using quasiNewton methods. QuasiNewton methods update the Hessian matrix by analyzing successive gradient vectors using a generalization of the secant method. We anticipate that the methods developed by Becker & Fadili (2012) or Marjugi & Leong (2013) can be of help. Second, we plan on using conjugate gradient methods.
We have realized that convergence is greatly improved if the inversion is started by forcing a very sparse solution, which in practice is similar to reducing the number of parameters in the first steps (Ruiz Cobo & del Toro Iniesta 1992). Once the model cannot be improved any more, the inversion can be restarted assuming the final target sparsity, and therefore, giving more freedom to the inversion in the final steps. If the sparsity is assumed to be low, the inversion can reach a standstill when χ^{2} is still very high. The reason is the FISTA algorithm with fixed h. We expect to ameliorate this behavior with the future improvements that are described above.
2.7. Summary
The inversion scheme that we proposed here works as follows. A linear transform is chosen so that the transformed 2D maps of model parameters are as sparse as possible. We have tested that DCT and wavelet transforms seem to be appropriate. At the beginning of the iterative process, all the model parameters are initialized using any appropriate initialization that can be any standard inversion code. The initial solution is transformed to obtain the first estimation of the modes. At each iteration, the values of the merit function and of its gradient with respect to the model parameters taking into account all observed pixels are computed. To this end, the inverse transform of the current solution has to be computed. The gradients, reordered as 2D maps, are transformed using Eq. (14). Once the gradient descent step is applied, the sparsity constraint is applied by hardthresholding the resulting new modes. This somehow trivial step is when the sparsity condition is applied. Finally, algorithm 1 is repeated until convergence.
Fig. 5 Value of the χ^{2} for each Stokes parameter for the pixelbypixel inversion (left panels) and the sparsity regularized inversion (right panels). 
3. Computer code
We have developed a computer code using the ideas presented in the previous section. The code is written in C++ using the Message Passing Interface (MPI) standard for parallelization using a masterslave strategy. One node works as a master, sending calculations to the slaves and receiving the results. Although the architecture of the code is very general, the current version of the code only works with the ME model atmosphere. It can be used to carry out pixelbypixel inversions (using the LM algorithm) and also sparsity regularized 2D inversions (using the FISTA algorithm). We will provide extensive details on the implementation in the near future when the code is expanded to use more realistic cases than the ME included in this paper (de la Cruz Rodríguez et al., in prep.). Additionally, we will make the code freely available.
The code starts by using an initial estimate of the transformed coefficients for the maps of parameters. With this initialization, the computationally heavy part of the calculation of the χ^{2} and the gradient term ∇_{p}χ^{2} is fully parallelized, with each slave working on a piece of the map. When all the information is gathered by the master node, it computes the gradient term in the transformed domain by directly applying Eq. (14). These are the needed quantities to evaluate one iteration of the FISTA algorithm and produce a correction of the transformed coefficients of the maps of physical quantities. This very same procedure is iterated until convergence.
The code also allows the user to specify a PSF that is used for degrading the synthetic profiles and carry out the spatial deconvolution and the regularized inversion simultaneously, similar to the approach followed by van Noort (2012). Including the deconvolution is straightforward in our firstorder approach. The convolution operator is linear so that, apart from convolving the Stokes profiles with the PSF, we only need to convolve the response functions with the PSF, so that Eq. (14) becomes (15)where ∗ is the convolution operator.
4. Illustrative examples
As a demonstration of our approach, we present in Fig. 4 the inverted maps for some physical parameters of the ME model in the quietSun map. Since the inversion is carried out with a very simple model with only one component that is not able to capture the behavior of unresolved fields, we cannot infer much from the magnetic properties on the map. However, we offer some comments:

1.
The pixeltopixel inversion was carried out in a multistep process to improve the quality of the fit. First, we obtained a result from 50 randomizations of the initial parameters, keeping only the best solution for each pixel. Then we smoothed the map with a Gaussian kernel. Finally, we reran the inversion with another 50 randomizations, starting from the smoothed maps. We kept the solution that produced the best fit.

2.
The general appearance of the maps is much cleaner for the sparsityregularized maps, a consequence of the thresholding. This avoids arbitrary pixeltopixel variations because some spatial coherence is forced.

3.
The maps of Doppler velocities are indistinguishable for the two inversions, meaning that this quantity is very easy to find in a pixelbypixel inversion.

4.
The spatially regularized inversion returns many fields inclined by ~90°. This is a direct consequence of the fact that many pixels have a very low signaltonoise ratio (S/N). The bias produced by the noise on the transversal component of the magnetic field (Martínez González et al. 2012) causes the fields to systematically appear highly inclined.

5.
We also find that the Doppler width of the line is larger in the granules than in the intergranules on the sparsityregularized solution. The opposite is found in the pixelbypixel inversion. This behavior is compensated for in the sparsity regularized solution by a small increase of the magnetic field.

6.
The maps in the third column of Fig. 4 show the effect of compensating for the telescope PSF. Features become more conspicuous and compact, as demonstrated by van Noort (2012) and Ruiz Cobo & Asensio Ramos (2013).
The quality of the fits was assessed by computing the associated χ^{2} for each Stokes parameter as shown in Fig. 5. Curiously, the sparsityregularized inversion is comparable for Stokes Q, U and V and much better for Stokes I, because the sparse solution does not show all the structuring visible in the pixelbypixel solution. This demonstrates that our inversion code finds a much simpler model (in terms of the number of free parameters) that explains the data better.
Figure 6 shows the results for the sunspot data. Again, the inferred Doppler velocities are very similar in the two approaches, which makes this quantity reliable. The same behavior is also seen in the Doppler width of the line, with granules presenting spectral lines with a higher broadening than in the intergranular lanes. The same applies to the penumbra, where we find dark filaments to have a smaller Doppler width. Concerning the magnetic field vector, we find smoother and much less noisy results.
Concerning the convergence speed, Fig. 7 displays the value of the reduced χ^{2} in the whole map and taking into account all Stokes parameters for the two datasets. As usual in firstorder, the initial convergence is very fast, while the refinement of the solution is slower. We note that artifacts can appear in the sparse inversion when the solution is not fully converged. Unlike the pixelbypixel case, these artifacts appear as spatially coherent patches in the solution that deviate from the surroundings. When the solution is converged, these patches disappear. Therefore, a different intuition is needed to assess whether the solution is physically meaningful or not.
Finally, Fig. 8 illustrates the smearing effect of the PSF in the observed data. We did not detect the oscillatory behavior that was reported by van Noort (2012). We speculate that the regularization present in our sparse inversion naturally damps that behavior, but to be fair, an ME model is simpler than the depthdependent case considered by van Noort (2012), and this aspect should be addressed under the same conditions.
5. Conclusion and outlook
We presented the first full 2D regularized inversion code for Stokes profiles based on the idea of sparsity. Although the atmosphere model that we used is very simple, it represents the first step toward a full suite of inversion codes. The strong spatial regularization that we introduced thanks to the sparsity constraint allowed us to obtain very reliable results.
Fig. 7 Convergence of χ^{2} as a function of iteration number for the two data sets. 
Fig. 8 Illustration of the effect of the telescope PSF on the data. The FOV is divided in two halves for a monochromatic image at Δλ = 91 mÅ from line center in Stokes I (left) and V (right). The lower half shows the observations, without applying the PSF compensation. The upper half shows the synthetic image from the inversion, including the PSF compensation. This inversion has been performed assuming 50% sparsity. 
Our method is built upon some of the ideas presented by van Noort (2012), Ruiz Cobo & Asensio Ramos (2013): the model is spatially coupled and regularized. Furthermore, given that we did not use a Hessianbased inversion method, compensating the data for telescope or straylight PSF is trivial because it does not add extra complexity to the code beyond the actual computation of convolutions. However, if the telescope PSF is not known, the sparse regularization will also couple the parameters from neighboring pixels. The intrinsic nature of the wavelet transform allows reproducing smallscale and localized features as well as the large scales, and therefore reducing the number of free parameters does not necessarily preclude the code from reproducing gradients and sharp features if the correct sparsity level is chosen.
We have used the firstorder accelerated FISTA optimization algorithm to perform the 2D sparse inversion. We emphasize that we are exploring alternative methods to further improve the convergence of the inversion.
A fast and parallel inversion code implementing everything that is presented in this work and some extensions as described in the following will soon be freely available for the community. The code will be extensible if the user provides the emerging Stokes profiles from a given parameterized atmosphere and the response function of the Stokes parameters to all parameters. This response function can be computed analytically or, in general, numerically.
The concept of sparse regularization can be extended to many other problems that we plan to analyze in this series of papers, not necessarily maintaining the order. In a first step, we plan to extend the sparse regularization to atmospheres with gradients in the vertical direction. This will be done by fixing a set of heights on a common optical depth scale (for instance, the optical depth scale at 500 nm) that will play the role of nodes, in parallel to what is currently done in standard inversion codes such as SIR, Nicole, or SPINOR. The physical parameters (temperature, velocity, magnetic field, etc.) at each node will be assumed to be sparse on a wavelet basis, and the strategy used in this paper will be applied to fit the emerging Stokes profiles.
In a second step, the concept of nodes will be refined. Each physical parameter will be determined by a 3D cube that will be assumed to be sparse on a 3D basis (DCT and orthogonal wavelets are trivially extended to the 3D case). Therefore, the inversion will proceed by inferring a reduced number of wavelet modes for each physical parameter. The regularization imposed in this 3D inversion is very strong because the observation of a single pixel provides (probably little) information for constraining the depth stratification of all the pixels in the cube.
In a third step, standard pixelbypixel inversions can be carried out using a sparse regularization in height. The concept of nodes, which have been present in all inversion codes with depth stratification, looses its meaning again. The depth stratification of the physical parameters of interest will be expanded in a suitable dictionary of depthdependent functions (we use the term dictionary because the functions, in principle, do not need to be orthogonal) and the optimization will select the smallest number of functions needed to reproduce the Stokes profiles.
Finally, we note that our strategy might be extended to include regularized superresolution. We leave this analysis for the future because it needs to be studied in detail.
The term inversion describes that in a forward model defined by an operator O, one needs to apply the (nonlinear) inverse of the operator, O^{1} to the observations to recover the model parameters. Strictly speaking, it would be better to call these algorithms inference codes, although we here use the classical term inversion to avoid confusion.
Acknowledgments
Financial support by the Spanish Ministry of Economy and Competitiveness through projects AYA2010–18029 (Solar Magnetism and Astrophysical Spectropolarimetry) and ConsoliderIngenio 2010 CSD200900038 are gratefully acknowledged. A.A.R. also acknowledges financial support through the Ramón y Cajal fellowships. J.d.l.C.R. acknowledges financial support from the Chromobs project funded by the Knut and Alice Wallenberg Foundation. Part of the computations included in this paper were performed on resources provided by the Swedish National Infrastructure for Computing (SNIC) at the National Supercomputer Centre (Linköping University), with project id snic20141273. This research has made use of NASA’s Astrophysics Data System Bibliographic Services.
References
 Asensio Ramos, A. 2009, ApJ, 701, 1032 [NASA ADS] [CrossRef] [Google Scholar]
 Asensio Ramos, A., Martínez González, M. J., & Rubiño Martín, J. A. 2007, A&A, 476, 959 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Auer, L. H., House, L. L., & Heasley, J. N. 1977, Sol. Phys., 55, 47 [NASA ADS] [CrossRef] [Google Scholar]
 Beck, A., & Teboulle, M. 2009, SIAM J. Img. Sci., 2, 183 [Google Scholar]
 Becker, S., & Fadili, M. J. 2012, in Advances in Neural Information Processing Systems 25 (NIPS 2012), 2627 [Google Scholar]
 Borrero, J. M., Tomczyk, S., Norton, A., et al. 2007, Sol. Phys., 240, 177 [NASA ADS] [CrossRef] [Google Scholar]
 Borrero, J. M., Tomczyk, S., Kubo, M., et al. 2010, Sol. Phys., 35 [Google Scholar]
 Candés, E. J., & Recht, B. 2009, Found. Comput. Math., 9, 717 [Google Scholar]
 Candès, E., Romberg, J., & Tao, T. 2006, Comm. Pure Appl. Math., 59, 1207 [CrossRef] [Google Scholar]
 de la Cruz Rodríguez, J., SocasNavarro, H., Carlsson, M., & Leenaarts, J. 2012, A&A, 543, A34 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Donoho, D. 2006, IEEE Trans. Infor. Theory, 52, 1289 [CrossRef] [Google Scholar]
 Frutiger, C., Solanki, S. K., Fligge, M., & Bruls, J. H. M. J. 2000, A&A, 358, 1109 [NASA ADS] [Google Scholar]
 GrossmannDoerth, U., Schuessler, M., & Solanki, S. K. 1988, A&A, 206, L37 [NASA ADS] [Google Scholar]
 Harvey, J., Livingston, W., & Slaughter, C. 1972, in Line Formation in the Presence of Magnetic Fields, 227 [Google Scholar]
 Jensen, A., & la CourHarbo, A. 2001, Ripples in Mathematics: The Discrete Wavelet Transform (Springer Verlag) [Google Scholar]
 Khomenko, E. V., Collados, M., Solanki, S. K., Lagg, A., & Trujillo Bueno, J. 2003, A&A, 408, 1115 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Kosugi, T., Matsuzaki, K., Sakao, T., et al. 2007, Sol. Phys., 243, 3 [NASA ADS] [CrossRef] [Google Scholar]
 Landi Degl’Innocenti, E., & Landi Degl’Innocenti, M. 1977, A&A, 56, 111 [NASA ADS] [Google Scholar]
 Landi Degl’Innocenti, E., & Landolfi, M. 2004, Polarization in Spectral Lines (Kluwer Academic Publishers) [Google Scholar]
 Levenberg, K. 1944, Q. J. Math, 2, 164 [Google Scholar]
 Lites, B. W., Elmore, D. F., Streander, K. V., et al. 2001, in UV/EUV and Visible Space Instrumentation for Astronomy and Solar Physics, eds. O. H. Siegmund, S. Fineschi, & M. A. Gummin, Proc. SPIE 4498, 73 [Google Scholar]
 Lites, B., Casini, R., Garcia, J., & SocasNavarro, H. 2007, Mem. Soc. Astron. It., 78, 148 [NASA ADS] [Google Scholar]
 Lites, B. W., Kubo, M., SocasNavarro, H., et al. 2008, ApJ, 672, 1237 [NASA ADS] [CrossRef] [Google Scholar]
 Löfdahl, M. G., & Scharmer, G. B. 2012, A&A, 537, A80 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 López Ariste, A. 2002, ApJ, 564, 379 [NASA ADS] [CrossRef] [Google Scholar]
 Lucy, L. B. 1974, AJ, 79, 745 [NASA ADS] [CrossRef] [Google Scholar]
 Marjugi, S. M., & Leong, W. J. 2013, J. Appl. Math., 2013, 523476 [CrossRef] [Google Scholar]
 Marquardt, D. 1963, SIAM J. Appl. Math., 11, 431 [Google Scholar]
 Martínez González, M. J., Collados, M., Ruiz Cobo, B., & Beck, C. 2008, A&A, 477, 953 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Martínez González, M. J., Manso Sainz, R., Asensio Ramos, A., & Belluzzi, L. 2012, MNRAS, 419, 153 [NASA ADS] [CrossRef] [Google Scholar]
 Martínez Pillet, V., Del Toro Iniesta, J. C., ÁlvarezHerrero, A., et al. 2011, Sol. Phys., 268, 57 [NASA ADS] [CrossRef] [Google Scholar]
 O’Donoghue, B., & Candès, E. 2013, Found. Comput. Math., DOI: 10.1007/s1020801391503 [Google Scholar]
 Orozco Suárez, D., Bellot Rubio, L. R., del Toro Iniesta, J. C., et al. 2007, ApJ, 670, L61 [NASA ADS] [CrossRef] [Google Scholar]
 Parikh, N., & Boyd, S. 2014, Foundations and Trends in Optimization, 1, 123 [Google Scholar]
 Rees, D. E., & Semel, M. D. 1979, A&A, 74, 1 [NASA ADS] [Google Scholar]
 Rees, D. E., López Ariste, A., Thatcher, J., & Semel, M. 2000, A&A, 355, 759 [NASA ADS] [Google Scholar]
 Richardson, W. H. 1972, J. Opt. Soc. Am. (19171983), 62, 55 [Google Scholar]
 Ruiz Cobo, B., & Asensio Ramos, A. 2013, A&A, 549, L4 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Ruiz Cobo, B., & del Toro Iniesta, J. C. 1992, ApJ, 398, 375 [NASA ADS] [CrossRef] [Google Scholar]
 Scharmer, G. B., de la Cruz Rodriguez, J., Sütterlin, P., & Henriques, V. M. J. 2013, A&A, 553, A63 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Semel, M. 1970, A&A, 9, 152 [NASA ADS] [Google Scholar]
 Sigwarth, M., Balasubramaniam, K. S., Knölker, M., & Schmidt, W. 1999, A&A, 349, 941 [NASA ADS] [Google Scholar]
 Skumanich, A., & Lites, B. W. 1987, ApJ, 322, 473 [Google Scholar]
 SocasNavarro, H., Trujillo Bueno, J., & Ruiz Cobo, B. 2000, ApJ, 530, 977 [NASA ADS] [CrossRef] [Google Scholar]
 SocasNavarro, H., López Ariste, A., & Lites, B. W. 2001, ApJ, 553, 949 [NASA ADS] [CrossRef] [Google Scholar]
 SocasNavarro, H., de la Cruz Rodriguez, J., Asensio Ramos, A., Trujillo Bueno, J., & Ruiz Cobo, B. 2015, A&A, 577, A7 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Solanki, S. K., & Montavon, C. A. P. 1993, A&A, 275, 283 [NASA ADS] [Google Scholar]
 Solanki, S. K., & Pahlke, K. D. 1988, A&A, 201, 143 [NASA ADS] [Google Scholar]
 Starck, J.L., Murtagh, F., & Fadili, J. M. 2010, Sparse image and signal processing (Cambridge: Cambridge University Press) [Google Scholar]
 Stenflo, J. O. 1973, Sol. Phys., 32, 41 [NASA ADS] [CrossRef] [Google Scholar]
 Stenflo, J. O. 2010, A&A, 517, A37 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Stenflo, J. O. 2011, A&A, 529, A42 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 van Noort, M. 2012, A&A, 548, A5 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Viticchié, B., & Sánchez Almeida, J. 2011, A&A, 530, A14 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
All Figures
Fig. 1 Test for the compressibility of the parameters inferred in a quietSun map. Each column indicates the number of nonzero coefficients of the original map that are retained. 

In the text 
Fig. 2 Same as Fig. 1, but for the sunspot data. We note that these images correspond to the solution of the pixelbypixel inversion and contain some artifacts. 

In the text 
Fig. 3 Histograms of the difference between the original and compressed parameters at different levels (indicated in each column). Upper panels: results for the quiet Sun map, while the lower panels refer to the sunspot data. Each histogram refers to a different linear transformation. 

In the text 
Fig. 4 Inverted maps for some of the most relevant physical parameters of the model for a quietSun observation. The map is 512 × 512 pixels. Left column: pixelbypixel results, central panels: results of the sparsityregularized inversion, using a Daubechies8 orthogonal wavelet and a thresholding value of 50%. Third column: a sparsityregularized inversion compensating for the Hinode telescope PSF. 

In the text 
Fig. 5 Value of the χ^{2} for each Stokes parameter for the pixelbypixel inversion (left panels) and the sparsity regularized inversion (right panels). 

In the text 
Fig. 6 Same as Fig. 4, but for the sunspot data. 

In the text 
Fig. 7 Convergence of χ^{2} as a function of iteration number for the two data sets. 

In the text 
Fig. 8 Illustration of the effect of the telescope PSF on the data. The FOV is divided in two halves for a monochromatic image at Δλ = 91 mÅ from line center in Stokes I (left) and V (right). The lower half shows the observations, without applying the PSF compensation. The upper half shows the synthetic image from the inversion, including the PSF compensation. This inversion has been performed assuming 50% sparsity. 

In the text 
Current usage metrics show cumulative count of Article Views (fulltext article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.
Data correspond to usage on the plateform after 2015. The current usage metrics is available 4896 hours after online publication and is updated daily on week days.
Initial download of the metrics may take a while.