Open Access
Issue
A&A
Volume 693, January 2025
Article Number A170
Number of page(s) 14
Section The Sun and the Heliosphere
DOI https://doi.org/10.1051/0004-6361/202452172
Published online 14 January 2025

© The Authors 2025

Licence: Creative Commons. Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This article is published in open access under the Subscribe to Open model.

1. Introduction

During the past thirty years, inversion methods have proven to be one of the most robust ways of establishing a quantitative relation between the observed intensities and the underlying physical state of the atmospheric plasma. With the use of fast slit spectropolarimeters and 2D filterpolarimeters, maps of different regions in the solar atmosphere are routinely taken in which the four Stokes parameters are observed at several points along one or several spectral lines. The rate of new high-quality 2D observations is increasing with the advent of bi-dimensional spectropolarimeters and observing techniques that allow long time sequences of high-quality observations (Dominguez-Tagle et al. 2022; van Noort et al. 2022). With some exceptions (discussed below), in the overwhelming majority of studies available in the literature, such observations have been interpreted by assuming that all pixels are completely unrelated and by applying the inference techniques (commonly known as inversion codes) on a pixel-by-pixel basis. However, the spatial complexity of the observations is not as chaotic as one would expect from the high dimensionality of the data (spatial, temporal, and spectral), but it is rather coherent as a consequence of the physical processes that dominate solar dynamics.

For example, the magnetic field in the chromosphere, where the magnetic pressure is larger than the gas pressure, tends to be rather smooth and slowly varying in space. However, the polarimetric signals induced by those chromospheric magnetic fields are particularly weak, and in most cases very close to the detection limit of current instrumentation (e.g., Díaz Baso et al. 2019a; Yadav et al. 2021). Incorporating this spatial coherence into the inference would therefore significantly help to constrain the properties of the solar atmosphere.

This spectral and spatial coherence has been exploited to effectively reduce the noise in solar and stellar spectropolarimetric observations (Martínez González et al. 2008; Díaz Baso et al. 2019b) and avoid averaging in time or space, which could lead to a loss of important information. The new generation of inversion codes is also starting to make use of this coherency to improve the fidelity of their algorithms. The work by van Noort (2012) proposed to couple the solution of neighboring pixels using the telescope point spread function. This inspired the recent development of a nonlinear spatially regularized and multi-resolution inversion technique by de la Cruz Rodríguez (2019).

A different approach to take into account spatial coherency was presented by Asensio Ramos & de la Cruz Rodríguez (2015), using the concept of sparsity and compressibility, by linearly transforming the physical parameters to a different space in which their representation is compact. They used proximal algorithms (Parikh & Boyd 2014) to impose sparsity in the wavelet domain, decreasing the number of free parameters to reproduce the observables while simultaneously favoring spatial coherency. Recent studies (de la Cruz Rodríguez 2019; Morosin et al. 2020; de la Cruz Rodríguez & Leenaarts 2024) proposed adding spatiotemporal constraints by explicitly imposing a Tikhonov regularization on the physical parameters, thereby improving the fidelity of the reconstruction. At the same time, advances in deep learning have introduced the potential for data-driven regularizations. These methods encode complex priors from simulations into neural networks (Asensio Ramos & Pallé 2021; Liaudat et al. 2024). These ideas developed in the context of deep learning have also inspired other works, such as Štěpán et al. (2022, 2024), which propose to solve the 3D inversion problem by including the 3D non-local thermodynamic equilibrium consistency as a regularization, together with additional physical constraints and stochastic gradient descent techniques to mitigate issues of local minima. Lastly, automatic differentiation frameworks, such as PyTorch (Paszke et al. 2019), facilitate the implementation of these ideas by efficiently computing gradients and optimizing models to reproduce observations (De Ceuster et al. 2024).

Motivated by the development of new instruments with an increasing field of view (FoV), we believe that implicit methods that describe the physical parameters in the whole domain with a compact representation can be of great help to reduce the dimensionality of the problem and, consequently, the computational load of the inference process. For this work, we investigated a different way of parametrizing the physical quantities by using a neural network as a continuous approximation to introduce spatiotemporal constraints. Recent works have demonstrated the potential of this idea in coronal tomography (Asensio Ramos 2023), source reconstruction under strong gravitational lenses (Mishra-Sharma & Yang 2022), magnetic field extrapolations (Jarolim et al. 2024), and interstellar chemistry (Asensio Ramos et al. 2024). These neural networks, usually termed implicit neural representations, neural fields (NFs), or coordinate-based representations, map coordinates in space (or space-time) to coordinate-dependent field quantities. They have many desirable properties: they are efficient in terms of the number of free parameters, they have a controllable implicit bias, they produce differentiable quantities that can be part of more elaborate optimizations, and they generate continuous signals that are ideal for imposing spatiotemporal constraints in noisy scenarios. In this work, we study the case in which the magnetic field is inferred under the weak-field approximation (WFA; Landi Degl’Innocenti & Landi Degl’Innocenti 1973), a powerful method to estimate the magnetic field from plage (da Silva Santos et al. 2023) to flare scenarios (Vissers et al. 2021) and simple enough to let us focus on this particular new implementation. The formalism also remains identical for local thermodynamic equilibrium (LTE) or nonlocal thermodynamic equilibrium (non-LTE) inversions of chromospheric lines. We plan to extend the use of NFs to more complex radiative transfer models in the near future.

The paper is organized as follows. We start with a brief introduction to the basic principles of NFs and explain how we implemented the new approach to perform spectropolarimetric inversions. We then show the application of the NF to some examples and introduce some additional explicit regularizations. Finally, we briefly discuss the implications of this work and outline potential extensions and improvements.

2. Neural magnetic field reconstruction

2.1. Weak-field approximation

The weak-field approximation (WFA; Landi Degl’Innocenti & Landi Degl’Innocenti 1973) is an analytical solution of the radiative transfer equation. This allows us to derive the emerging Stokes Q, U, and V parameters describing the polarization of the light as a function of Stokes I and its derivatives as a function of wavelength. The fundamental assumptions are that the magnetic field vector is constant with depth and that the splitting induced by the Zeeman effect (ΔλB) is significantly smaller than the Doppler width of the line (ΔλD). This weak field regime occurs at different field strengths for different spectral lines (depending on the sensitivity to the magnetic field, the local temperature, etc.).

At first order in the magnetic field strength, the relation between Stokes V and Stokes I is given by the following expression:

$$ \begin{aligned} V(\lambda) = -\Delta\lambda_B \, \bar{g} \, \cos\Theta_B \, \frac{\mathrm{d}I}{\mathrm{d}\lambda}, \end{aligned} $$ (1)

where $\Delta\lambda_B = 4.6686 \times 10^{-13}\,\lambda_0^2 B$, $\Theta_B$ is the inclination of the magnetic field (the angle between the magnetic field vector and the observer’s line of sight), and $\bar{g}$ is the effective Landé factor (Landi Degl’Innocenti & Landolfi 2004). The central wavelength of the line, $\lambda_0$, is given in Å, while the magnetic field strength, $B$, is given in G. The same perturbation analysis yields Stokes Q and U, which only appear at second order. In particular, we use the equations that describe the dependence of Stokes Q and U in the wings of the line ($|\lambda_w - \lambda_0| \gg \Delta\lambda_D$):

$$ \begin{aligned} Q(\lambda_w) &= \frac{3}{4}\,\Delta\lambda_B^2\,\bar{G}\,\sin^2\Theta_B\,\cos 2\Phi_B\,\frac{1}{\lambda_w-\lambda_0}\biggl(\frac{\mathrm{d}I}{\mathrm{d}\lambda}\biggr) \nonumber \\ U(\lambda_w) &= \frac{3}{4}\,\Delta\lambda_B^2\,\bar{G}\,\sin^2\Theta_B\,\sin 2\Phi_B\,\frac{1}{\lambda_w-\lambda_0}\biggl(\frac{\mathrm{d}I}{\mathrm{d}\lambda}\biggr), \end{aligned} $$ (2)

where $\bar{G}$ is a parameter that gives the magnetic sensitivity of linear polarization to $B$, which depends on the quantum numbers of the transition (Landi Degl’Innocenti & Landolfi 2004). In both equations, there is a term depending on $\Phi_B$, which is the azimuth angle of the magnetic field with respect to a reference direction.
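As a minimal sketch of Eqs. (1) and (2), the following NumPy function evaluates the WFA polarization profiles for a single pixel. The function name, its arguments, and the finite-difference derivative via `np.gradient` are illustrative assumptions and are not taken from the paper; the angles are in radians and the wing expression for Q and U is undefined at line center, so that sample is returned as NaN.

```python
import numpy as np

C_ZEEMAN = 4.6686e-13  # constant of the Zeeman splitting (lambda in Angstrom, B in G)


def wfa_stokes(wav, stokes_i, lambda0, b, theta, phi, g_eff, g_lin):
    """Return (Q, U, V) predicted by Eqs. (1) and (2) for one pixel.

    wav, stokes_i : 1D arrays with the wavelength grid and the Stokes I profile
    b, theta, phi : field strength (G), inclination and azimuth (radians)
    g_eff, g_lin  : the factors written as g-bar and G-bar in the text
    """
    di_dl = np.gradient(stokes_i, wav)            # dI/dlambda by finite differences
    dlb = C_ZEEMAN * lambda0**2 * b               # Zeeman splitting in Angstrom
    v = -dlb * g_eff * np.cos(theta) * di_dl      # Eq. (1)

    dw = wav - lambda0
    with np.errstate(divide="ignore", invalid="ignore"):
        common = 0.75 * dlb**2 * g_lin * np.sin(theta) ** 2 * di_dl / dw
    common = np.where(dw == 0.0, np.nan, common)  # wing formula is undefined at line center
    q = common * np.cos(2.0 * phi)                # Eq. (2)
    u = common * np.sin(2.0 * phi)
    return q, u, v
```

For a purely longitudinal field (`theta = 0`) the sketch returns vanishing Q and U and an antisymmetric Stokes V, as expected from the equations.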

2.2. Magnetic field inference

Once the model is set, we aim to infer the magnetic field vector from the interpretation of the observations of a set of spectral lines. Assuming that the weak-field approximation can be applied to the observed spectral lines and that observations are corrupted with uncorrelated Gaussian noise, we can use a least-squares estimator (maximum likelihood) to retrieve the magnetic field vector. The merit function ℒ, which represents the well-known χ2, can be defined for a particular pixel as the mean squared difference of the observed polarization signals and the synthetic ones predicted by the model, normalized by the variance of the noise:

$$ \begin{aligned} \mathcal{L} = \mathcal{L}_I + \mathcal{L}_Q + \mathcal{L}_U + \mathcal{L}_V, \end{aligned} $$ (3)

where

$$ \begin{aligned} \mathcal{L}_I &= \sum_\lambda \frac{(I^\mathrm{syn}_\lambda - I^\mathrm{obs}_\lambda)^2}{\sigma_{I,\lambda}^2}, \nonumber \\ \mathcal{L}_Q &= \sum_\lambda \frac{(Q^\mathrm{syn}_\lambda - Q^\mathrm{obs}_\lambda)^2}{\sigma_{Q,\lambda}^2}, \nonumber \\ \mathcal{L}_U &= \sum_\lambda \frac{(U^\mathrm{syn}_\lambda - U^\mathrm{obs}_\lambda)^2}{\sigma_{U,\lambda}^2}, \nonumber \\ \mathcal{L}_V &= \sum_\lambda \frac{(V^\mathrm{syn}_\lambda - V^\mathrm{obs}_\lambda)^2}{\sigma_{V,\lambda}^2}. \end{aligned} $$ (4)

The subindex λ labels the spectral wavelength points. This merit function covers the general case in which the noise is different for each Stokes parameter. When a spectral line is modeled in full, all the terms of the merit function associated with each observed Stokes parameter are used. However, in the case of the WFA, the model uses the derivative of the observed intensity profile (see Eqs. (1) and (2)), eliminating the need for modeling Stokes I ($\mathcal{L}_I$). In other words, in the WFA we assume that the noise introduced by the intensity profile is sufficiently small to not significantly impact the results. This, together with the simplicity of these equations, makes the WFA one of the most popular approaches for magnetic field inference. The linear dependence on the model parameters facilitates finding an analytical solution for the optimization of the merit function (see, e.g., Martínez González et al. 2012). Since our approach here is more general (taking spatial correlation into account and considering more complex radiative transfer models in the near future), we do not use the analytical solution. Rather, we consider the numerical optimization of the merit function using gradient-based methods.
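For reference, the pixel-wise analytical solution mentioned above can be sketched for the longitudinal field. Because the Stokes V term of Eq. (4) is quadratic in the longitudinal field, setting its derivative to zero gives a closed-form estimate. The function names below are hypothetical, `c_v` stands for the line-dependent constant $4.6686 \times 10^{-13}\,\bar{g}\,\lambda_0^2$, and the noise is assumed to be the same at all wavelengths.

```python
import numpy as np


def chi2_v(b_par, stokes_v, di_dl, c_v, sigma_v):
    """Stokes V term of Eq. (4) for one pixel, with V_syn = -c_v * b_par * dI/dlambda."""
    v_syn = -c_v * b_par * di_dl
    return np.sum((v_syn - stokes_v) ** 2) / sigma_v**2


def b_par_closed_form(stokes_v, di_dl, c_v):
    """Minimizer of chi2_v, obtained by setting its derivative w.r.t. b_par to zero."""
    return -np.sum(stokes_v * di_dl) / (c_v * np.sum(di_dl**2))
```

The gradient-based optimization described in the text should converge to this value when applied independently to each pixel, which makes the closed form a convenient sanity check.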

From a practical point of view, using $B$, $\Theta_B$, and $\Phi_B$ as free variables can introduce challenges in the optimization process. These variables are subject to specific constraints and potential discontinuities that can complicate the optimization. For instance, $B$ must be strictly positive, and while $\Theta_B$ and $\Phi_B$ are continuous, the periodic nature of $\Phi_B$ within the interval $[0, \pi]$ can cause issues. For example, pixels with physically identical fields (in space or time) but with solutions $\Phi_B = 0$ and $\Phi_B = \pi$ would lead to artificial discontinuities in the spatiotemporal description of the magnetic field. To mitigate these issues, we opted to use the following variables, obtained as combinations of $B$, $\Theta_B$, and $\Phi_B$:

$$ \begin{aligned} B_\parallel &= B \cos\Theta_B \nonumber \\ B_Q &= (B \sin\Theta_B)^2 \cos 2\Phi_B \nonumber \\ B_U &= (B \sin\Theta_B)^2 \sin 2\Phi_B. \end{aligned} $$ (5)

These three quantities are defined in $(-\infty, +\infty)$ and are continuous functions of the magnetic field vector. They also have the additional advantage of decoupling the problem into three unrelated subproblems: Stokes V only depends on $B_\parallel$, Stokes Q only depends on $B_Q$, and Stokes U only depends on $B_U$. Therefore, one could solve each subproblem independently. This is useful, for instance, if one Stokes parameter has much stronger signals than the rest (typically the case of Stokes V), which could dominate the optimization process. In that case, one could use different weights for each term in the merit function. We note that this last point is only valid for the WFA and not for the general case.
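The change of variables in Eq. (5) can be sketched in a few lines. The function name is hypothetical and the angles are assumed to be in radians; the short check at the end illustrates that the azimuths 0 and π, which describe the same field, map to the same pair $(B_Q, B_U)$ up to rounding.

```python
import math


def to_wfa_variables(b, theta, phi):
    """Eq. (5): map (B, Theta_B, Phi_B) to (B_par, B_Q, B_U)."""
    b_par = b * math.cos(theta)
    b_perp_sq = (b * math.sin(theta)) ** 2
    return b_par, b_perp_sq * math.cos(2.0 * phi), b_perp_sq * math.sin(2.0 * phi)


# Same field described with two equivalent azimuths (illustrative values)
vars_phi_0 = to_wfa_variables(500.0, math.radians(60.0), 0.0)
vars_phi_pi = to_wfa_variables(500.0, math.radians(60.0), math.pi)
```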

2.3. Neural fields

We propose here a general and powerful technique for the inference of the magnetic field (and potentially thermodynamical parameters when using models more complex than the WFA) from observations using neural networks. The general idea is depicted in Fig. 1 and described in the following. We use a neural network, $f_\theta(x, y)$, to describe the components of the magnetic field vector as a function of the coordinates. In this case, we describe the magnetic field components in a 2D plane in Cartesian coordinates $(x, y)$. These coordinates represent a plane with the magnetic field properties where the observed polarization is predominantly generated. In the general case in which stratification and time evolution are taken into account, the input coordinates will be $(x, y, z, t)$. For convenience, we normalize all coordinates so that they are mapped to the interval $[-1, 1]$. The magnetic field components are then given by the following simple, but flexible, fully connected neural network $f_\theta$:

Fig. 1. Sketch of the NF approach. The physical quantities are described by a neural network that takes the coordinates as input and outputs the physical quantities (temperature T, magnetic field B, velocity v, etc.) at that point (x, y). The output of the network is then used to compute the observables from the model using a radiative transfer (RT) module. In this work, the output of the network is the magnetic field vector and the RT module is the weak-field approximation. The synthetic observables are compared with the observations and the error is back-propagated through the network to obtain the optimal solution.

$$ \begin{aligned} (B_\parallel, B_Q, B_U) = f_\theta(\boldsymbol{x}), \end{aligned} $$ (6)

with θ the internal parameters of the neural network and x = (x, y).

This approach of describing the magnetic field using NFs has several advantages. First, the number of tunable parameters of the neural network can be potentially fewer than the number of unknowns in all pixels. This might not be especially relevant for the WFA model since there are only three unknowns per pixel. However, this will be crucial when applied to stratified inversions where we have many more physical quantities per pixel (temperature, velocity, magnetic field, microturbulence, etc.) in a very dense grid, not only spatial but also in the optical depth domain. Secondly, a NF is a continuous global function that operates across the entire spatiotemporal domain. This means that the information introduced by the observation of a single pixel informs the whole solution, leading to a very pronounced regularization effect. This effect is similar to the global character induced by the wavelet decomposition used by Asensio Ramos & de la Cruz Rodríguez (2015). Thirdly, a NF is a continuous and differentiable function of the input coordinates. Therefore, the result of the inversion process is a continuous function that can be evaluated at any arbitrary point. Having a continuous and differentiable magnetic field map can be greatly beneficial for the computation of current sheets through spatial derivatives of the magnetic field (see the application to inversion methods of Pastor Yabar et al. 2021; Esteban Pozuelo et al. 2024), which would otherwise be greatly affected by the (potential) presence of noise and inversion artifacts. Finally, it is straightforward to include any explicit additional regularization term in the optimization that depends on the output or any derivative of the output with respect to the input coordinates.
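To make the idea of Eq. (6) concrete, the sketch below evaluates an untrained coordinate network on two grids of different resolution. It is written in NumPy to stay dependency-light, whereas the paper relies on PyTorch; the layer sizes, the ReLU activation, the initialization, and all function names are illustrative assumptions rather than the architecture used by the authors.

```python
import numpy as np


def normalized_grid(nx, ny):
    """Regular grid of (x, y) coordinates spanning [-1, 1], with shape (nx * ny, 2)."""
    xx, yy = np.meshgrid(np.linspace(-1.0, 1.0, nx), np.linspace(-1.0, 1.0, ny))
    return np.stack([xx.ravel(), yy.ravel()], axis=1)


def init_mlp(layer_sizes, rng):
    """Random weights and zero biases for a fully connected network."""
    return [
        (rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]


def neural_field(xy, params):
    """Evaluate f_theta at the coordinates xy; the three outputs play the role of (B_par, B_Q, B_U)."""
    h = xy
    for weight, bias in params[:-1]:
        h = np.maximum(h @ weight + bias, 0.0)  # ReLU hidden layers
    weight, bias = params[-1]
    return h @ weight + bias                     # linear output layer


def count_parameters(params):
    return sum(w.size + b.size for w, b in params)


rng = np.random.default_rng(0)
params = init_mlp([2, 64, 64, 3], rng)
coarse = neural_field(normalized_grid(50, 50), params)    # 2500 points
fine = neural_field(normalized_grid(200, 200), params)    # same function, denser sampling
```

The same set of parameters is queried on both grids, which illustrates that the number of unknowns is decoupled from the number of pixels and that the field can be evaluated at arbitrary points.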

2.4. Tests with a 3D rMHD simulation

2.4.1. Parametric representation

Before showing the use of neural networks for the WFA problem, we will test first the performance of the neural network to describe the 2D spatial properties of the physical quantities of interest. To that end, we parameterize the magnetic field of a realistic 3D radiative magneto-hydrodynamics (rMHD) simulation. We have used one snapshot from a publicly available enhanced network simulation (Carlsson et al. 2016) performed with the Bifrost code (Gudiksen et al. 2011). Snapshots from this simulation have been extensively used in previous studies to test different diagnostic strategies (e.g., Leenaarts et al. 2012, 2013; Sukhorukov & Leenaarts 2017; Jurčák et al. 2018). The longitudinal magnetic field at 1500 km from the mean continuum formation layer is shown in the left panel of Fig. 2. This panel shows a chromospheric landscape with elongated magnetic features connecting two opposite-polarity patches. This image can be represented with the aid of a NF by minimizing the following merit function:

Fig. 2. Comparison of the parametric representation power of each method. From left to right: the original longitudinal magnetic field at z = 1500 km in the Bifrost simulation, represented using a ReLU neural network, and using a NF with ω = 10 and ω = 60.

$$ \begin{aligned} \mathcal{L}_{B_\parallel} = \sum_{x,y} \left(f_\theta(x,y) - B_\parallel(x,y)\right)^2, \end{aligned} $$ (7)

where we utilize a fully connected neural network (NN) with ReLU (Rectified Linear Unit; Fukushima 1969) activation functions. We note that we do not train the neural network in the traditional manner for generalization across multiple datasets; instead, each network is uniquely optimized for a specific dataset to function as a tailored parametric tool, not as a generalized predictive model. The converged solution is shown in the second panel of Fig. 2. The solution approximates the general behavior of the original magnetic field distribution but is too smooth. This smoothness is a direct consequence of the implicit bias of NNs, also known as spectral bias (Rahaman et al. 2018), which prevents standard networks from learning high-frequency functions¹. This is an active area of research, and several strategies have been proposed to alleviate this problem and introduce an implicit bias toward high-frequency signals. One of the most successful strategies is to pass the input coordinates through a Fourier feature mapping $\gamma(\boldsymbol{x})$, which allows the NF to correctly generate high spatial frequencies (Tancik et al. 2020). This mapping projects the input coordinates onto a high-dimensional space with a set of trigonometric functions:

$$ \begin{aligned} \gamma(\boldsymbol{x}) = \left[\cos(2\pi \boldsymbol{G}\boldsymbol{x}), \sin(2\pi \boldsymbol{G}\boldsymbol{x})\right]^T, \end{aligned} $$ (8)

where $\boldsymbol{G} \sim \mathcal{N}(0, \omega)$, that is, each entry of $\boldsymbol{G}$ is a frequency sampled from a Gaussian distribution with zero mean and standard deviation ω. The parameter ω controls the range of spatial frequencies that the network can reproduce. This will be helpful as a regularization, as we show later. After computing the Fourier features, we pass them through the neural network to obtain the magnetic field. By using a Fourier mapping, we can efficiently reproduce both low and high spatial frequencies and thus represent the magnetic field with high fidelity. The third and fourth panels of Fig. 2 show the results of using the Fourier features with different values of ω. Using ω = 10 (third panel) allows us to capture the lower spatial frequencies, while increasing it to ω = 60 (fourth panel) lets us reliably reproduce all the high-frequency spatial details of the magnetic field map. In the bottom right of each panel, we also quantify the mean absolute error (MAE) between the original simulation and the different results, which decreases from an average error of 5 G with the ReLU NN to 0.09 G using the NF (ω = 60). We note that other strategies to alleviate the spectral bias exist. In particular, the use of suitable activation functions, such as sinusoids, has also been shown to be very effective in generating high-frequency functions (SIREN; Sitzmann et al. 2020). All these strategies can be seen as a nonlinear extension of Fourier series.
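The mapping of Eq. (8) is short enough to sketch directly. In the snippet below, the number of sampled frequencies (128) and the function names are hypothetical choices; only the functional form and the role of ω follow the text.

```python
import numpy as np


def sample_frequencies(n_freq, n_dim, omega, rng):
    """Draw the matrix G of Eq. (8): zero-mean Gaussian entries with standard deviation omega."""
    return rng.normal(0.0, omega, size=(n_freq, n_dim))


def fourier_features(xy, g):
    """Eq. (8): map coordinates of shape (n, n_dim) to features of shape (n, 2 * n_freq)."""
    proj = 2.0 * np.pi * xy @ g.T
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=1)


rng = np.random.default_rng(0)
xy = rng.uniform(-1.0, 1.0, size=(5, 2))                                     # normalized coordinates
feats_smooth = fourier_features(xy, sample_frequencies(128, 2, 10.0, rng))  # omega = 10
feats_sharp = fourier_features(xy, sample_frequencies(128, 2, 60.0, rng))   # omega = 60
```

The features replace the raw coordinates as the input of the network, so a larger ω exposes the network to faster oscillations across the normalized field of view, while a smaller ω restricts it to smoother functions.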

It is important to stress the fact that the number of learnable parameters of the NF is not dependent on the number of pixels in the observations, but rather on the properties of the spatial distribution of the physical quantities. In the example shown in Fig. 2, although the number of pixels is 500 × 500, the number of parameters of the network is ten times smaller, amounting to just 25 000. This compact representation is arguably associated with the fact that the magnetic field in the chromosphere, where the magnetic pressure is larger than the gas pressure, tends to be rather smooth and slow-varying over space. This is not to be expected for other quantities, such as the temperature or the velocity, which vary at much smaller spatial scales.

The representations shown in Fig. 2 have been trained by optimizing Eq. (7) computed for all pixels in the FoV. However, since the NF is a global function in x, a given pixel contains some information about the properties of the NF in the surroundings. For this reason, one can train the NF using mini-batches of pixels randomly chosen in the FoV, instead of summing over the whole FoV. To show this, Fig. 3 shows the mean absolute error as a function of iteration when different mini-batches are selected for each iteration. We can see that the convergence properties when using 10% of the total pixels are already as good as those obtained when using the whole FoV, but 10 times faster in terms of computing time. The sudden decrease in the merit function during the optimization is produced by the scheduler, a module that decreases the learning rate (step size) if there is no improvement after some iterations.
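The two ingredients of this training strategy, random mini-batches of pixels and a scheduler that lowers the learning rate on a plateau, can be sketched as follows. The class, the function, and the default values of `factor` and `patience` are hypothetical; the text does not specify which scheduler or settings were used.

```python
import numpy as np


def sample_minibatch(n_pixels, fraction, rng):
    """Indices of a random subset of pixels, drawn without replacement."""
    size = max(1, int(round(fraction * n_pixels)))
    return rng.choice(n_pixels, size=size, replace=False)


class PlateauScheduler:
    """Multiply the learning rate by `factor` after more than `patience` steps without improvement."""

    def __init__(self, lr, factor=0.5, patience=3):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("inf")
        self.stale = 0

    def step(self, loss):
        if loss < self.best:
            self.best = loss
            self.stale = 0
        else:
            self.stale += 1
            if self.stale > self.patience:
                self.lr *= self.factor
                self.stale = 0
        return self.lr


rng = np.random.default_rng(0)
batch = sample_minibatch(500 * 500, 0.1, rng)  # 10% of the pixels for one iteration
```

A new `batch` would be drawn at every iteration, and `PlateauScheduler.step` would be called with the current value of the merit function.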

Fig. 3. Performance of the NF when evaluating different batches of pixels randomly chosen in the FoV in every iteration.

2.4.2. Magnetic field reconstruction

The previous experiment demonstrates that a NF can correctly recover the details of the quantities of interest when they are observed directly. In spectropolarimetry, however, we only have access to the Stokes profiles and one has to pass through the radiative transfer model to infer the physical quantities. Here we demonstrate that this can be done by interpreting synthetic observations of the Ca II 8542 Å line. We have chosen this line because the WFA is reliable for it up to field strengths of ∼1200 G (Centeno 2018). For this test, we have created synthetic observables from the Bifrost simulation and we run our WFA NF to reproduce the polarization signals that emerge from the simulation. Our goal is to assess the improvement delivered by our new method when compared to the traditional pixel-by-pixel WFA. To this end, and to discard any influence of the line formation details, we follow Morosin et al. (2020) and set the magnetic field in each pixel equal to its vertical average from z = 1000 km to z = 1500 km. Heights are measured assuming that z = 0 km corresponds to the mean continuum formation layer. We compute the synthetic observables using the non-LTE radiative transfer STiC code² (de la Cruz Rodríguez et al. 2016, 2019) and add uncorrelated Gaussian noise to the Stokes Q, U, and V profiles. It is important to note that the average magnetic field of this particular snapshot is about 50−100 G, a value much lower than the typical magnetic field found in plage (∼400 G, Pietrow et al. 2020; Morosin et al. 2022). Consequently, to avoid the signal being buried in the noise, we use a standard deviation of the noise lower than the typical one found in solar observations.

The architecture chosen for the NF is a residual network (ResNet, He et al. 2015) with 2 residual blocks, 64 nodes per layer, and the ELU activation function (Exponential Linear Unit; Clevert et al. 2015). This architecture has proven its effectiveness by showing efficient training and good performance in capturing complex patterns. For the Fourier feature mapping, we have used ω = 60 for the longitudinal magnetic field and ω = 30 for the transverse components. The reason is that the latter are typically more affected by the noise. Using a smaller value of ω encourages the NF not to fit the noise, which is of very high spatial frequency. The magnetic field generated by the NF is used to compute the synthetic observables via the WFA, which are then compared with the observations. In particular, for the longitudinal magnetic field, and using Eq. (1), we have used the following merit function:

$$ \begin{aligned} \mathcal{L}_V = \sum_{x,y} \sum_\lambda \left[\left(-C_V B_\parallel(x,y|\theta) \frac{\mathrm{d}I^\mathrm{obs}_{x,y}}{\mathrm{d}\lambda}\right) - V^\mathrm{obs}_{x,y}\right]^2, \end{aligned} $$ (9)

where $C_V = 4.6686 \times 10^{-13}\,\bar{g}\,\lambda_0^2$ is a constant for the line of interest, $B_\parallel(x, y|\theta)$ is the NF used to represent the longitudinal field, $\mathrm{d}I^\mathrm{obs}_{x,y}/\mathrm{d}\lambda$ is the derivative of the observed Stokes I profile at pixel $(x, y)$, and $V^\mathrm{obs}_{x,y}$ is the observed Stokes V signal at the same pixel. The derivative of the intensity does not change during the optimization, so it can be pre-calculated and stored before the optimization process occurs.
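A minimal NumPy sketch of Eq. (9) is given below, together with its derivative with respect to the longitudinal field at each pixel. In the approach described in the text this derivative is obtained by automatic differentiation and propagated further to the network parameters; the explicit expression here only illustrates what is being propagated. The array shapes and function names are assumptions.

```python
import numpy as np


def loss_v(b_par, di_dl, stokes_v, c_v):
    """Eq. (9). b_par: (ny, nx); di_dl and stokes_v: (ny, nx, n_wav), with di_dl precomputed."""
    resid = -c_v * b_par[..., None] * di_dl - stokes_v
    return np.sum(resid**2)


def grad_loss_v(b_par, di_dl, stokes_v, c_v):
    """Derivative of Eq. (9) with respect to b_par at every pixel, shape (ny, nx)."""
    resid = -c_v * b_par[..., None] * di_dl - stokes_v
    return np.sum(-2.0 * c_v * di_dl * resid, axis=-1)
```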

For the transverse components, using Eq. (2), the merit functions are given by

$$ \begin{aligned} \mathcal{L}_Q &= \sum_{x,y} \sum_\lambda \left[\left(C_{QU} \cdot B_Q(x,y|\theta) \cdot \frac{1}{\lambda-\lambda_0} \frac{\mathrm{d}I^\mathrm{obs}_{x,y}}{\mathrm{d}\lambda}\right) - Q^\mathrm{obs}_{x,y}\right]^2 \end{aligned} $$ (10)

$$ \begin{aligned} \mathcal{L}_U &= \sum_{x,y} \sum_\lambda \left[\left(C_{QU} \cdot B_U(x,y|\theta) \cdot \frac{1}{\lambda-\lambda_0} \frac{\mathrm{d}I^\mathrm{obs}_{x,y}}{\mathrm{d}\lambda}\right) - U^\mathrm{obs}_{x,y}\right]^2, \end{aligned} $$ (11)

where $C_{QU} = 1.6347 \times 10^{-25}\,\lambda_0^4\,\bar{G}$. Thanks to the definition of the $B_Q$ and $B_U$ variables, each merit function can be optimized separately. This helps in imposing different implicit or explicit regularization methods for every variable.
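The numerical prefactor of $C_{QU}$ follows from Eq. (2), since $\frac{3}{4}\Delta\lambda_B^2 = \frac{3}{4}(4.6686 \times 10^{-13})^2 \lambda_0^4 B^2$. The short check below reproduces it; the helper function and its argument names are hypothetical, and the two Landé-type factors have to be supplied by the user for the line of interest.

```python
C_ZEEMAN = 4.6686e-13  # constant of the Zeeman splitting (lambda in Angstrom, B in G)


def wfa_constants(lambda0, g_eff, g_lin):
    """Return (C_V, C_QU) for a line at lambda0, given the factors g-bar and G-bar."""
    c_v = C_ZEEMAN * g_eff * lambda0**2
    c_qu = 0.75 * C_ZEEMAN**2 * g_lin * lambda0**4
    return c_v, c_qu


prefactor = 0.75 * C_ZEEMAN**2  # numerical factor multiplying lambda0^4 * G-bar in C_QU
```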

The optimization of every merit function with respect to θ is carried out using the Adam optimizer with a learning rate of $3 \times 10^{-4}$ for 100−500 epochs until the desired accuracy is reached. The derivatives for the backpropagation through the network and the physical model (WFA) are computed with the automatic differentiation package PyTorch. Finally, after convergence of the NFs, one can recover the full magnetic field vector from $B_\parallel$, $B_Q$, and $B_U$ by a final transformation:
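For readers unfamiliar with Adam, the update rule can be sketched in plain NumPy on a toy version of the longitudinal problem. This is not the setup of the paper: the unknowns here are the field values of four independent pixels rather than network parameters, the gradient is written by hand instead of using PyTorch, the field is expressed in kG with a unit constant, and the learning rate and number of steps are chosen only so that the toy converges.

```python
import numpy as np


def adam_minimize(grad_fn, x0, lr=5e-3, n_steps=3000, beta1=0.9, beta2=0.999, eps=1e-8):
    """Standard Adam update rule applied to a NumPy array of parameters."""
    x = np.array(x0, dtype=float)
    m = np.zeros_like(x)
    v = np.zeros_like(x)
    for t in range(1, n_steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1.0 - beta1) * g       # running mean of the gradient
        v = beta2 * v + (1.0 - beta2) * g**2    # running mean of the squared gradient
        m_hat = m / (1.0 - beta1**t)            # bias corrections
        v_hat = v / (1.0 - beta2**t)
        x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x


# Toy data: V_obs = -b_true * dI/dlambda for four pixels (constant set to 1, field in kG)
rng = np.random.default_rng(3)
di_dl = rng.normal(size=(4, 15))
b_true = np.array([0.3, -0.1, 0.05, 0.8])
v_obs = -b_true[:, None] * di_dl


def grad_fn(b_par):
    """Gradient of sum_lambda (-b_par * dI/dlambda - V_obs)^2 for each pixel."""
    resid = -b_par[:, None] * di_dl - v_obs
    return np.sum(-2.0 * di_dl * resid, axis=1)


b_fit = adam_minimize(grad_fn, np.zeros(4))
```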

$$ \begin{aligned} B_\parallel(x,y) &= B_\parallel(x,y|\theta) \nonumber \\ B_\perp(x,y) &= \left[B_Q(x,y|\theta)^2 + B_U(x,y|\theta)^2\right]^{1/4} \nonumber \\ \Phi_B(x,y) &= \frac{1}{2} \arctan\left(\frac{B_U(x,y|\theta)}{B_Q(x,y|\theta)}\right). \end{aligned} $$ (12)
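A sketch of this final transformation is given below. Because $B_Q$ and $B_U$ carry $\cos 2\Phi_B$ and $\sin 2\Phi_B$ in Eq. (5), the azimuth is half the angle of the pair $(B_Q, B_U)$. The use of the two-argument arctangent to resolve the quadrant, the wrapping of the azimuth to $[0, \pi)$, and the additional recovery of $B$ and $\Theta_B$ are choices made for this illustration.

```python
import numpy as np


def from_wfa_variables(b_par, b_q, b_u):
    """Return (B, Theta_B, Phi_B), with angles in radians, from (B_par, B_Q, B_U)."""
    b_perp = (b_q**2 + b_u**2) ** 0.25
    b = np.hypot(b_par, b_perp)
    theta = np.arctan2(b_perp, b_par)                 # inclination in [0, pi]
    phi = np.mod(0.5 * np.arctan2(b_u, b_q), np.pi)   # azimuth wrapped to [0, pi)
    return b, theta, phi
```

Applying it to the variables built from a known field (for instance 500 G, an inclination of 60°, and an azimuth of 2 rad) returns the original values, which is a quick way to validate the round trip.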

Figure 4 shows the reconstruction of the magnetic field vector in the synthetic case used before in Fig. 2. The first column shows the original magnetic field vector, given in terms of $B_\parallel$, $B_\perp$, and $\Phi_B$. The second column shows the inferred magnetic field using the traditional pixel-based WFA and the third column shows the inferred magnetic field using the NFs. The quality of the results is affected, fundamentally, by the noise level and the regularization properties of the NF. We have tested the performance of the NF for different noise levels and values of ω, finding the same consistent behavior. The results summarized in Fig. 4 are for the particular configuration with a noise level of $10^{-3}$ for Stokes V and $10^{-4}$ for Stokes Q and U, both given in units of the average continuum intensity. The longitudinal field inferred with the NF is very similar to that of the traditional WFA in general terms. The NF strongly damps the high-frequency components of the magnetic field associated with the presence of noise, but the overall structure is very similar. The key reason is that the longitudinal magnetic field is a quantity that is well constrained by the observations even in the presence of noise because the Stokes V signals are typically above the noise and the estimated value is statistically unbiased and coincides with the original value (see Martínez González et al. 2012).

Fig. 4. Comparison of the reconstruction of the magnetic field vector from the synthetic Ca II 8542 Å spectra calculated from the simulation (in rows: line-of-sight component, transverse component, and azimuth angle). From left to right: original magnetic field from the simulation, magnetic field inferred by the pixel-wise WFA, and results using the WFA NF.

The NF produces a much better result for the transverse component and the azimuth of the magnetic field, which are much closer to the real ones than those obtained with the traditional WFA. In this case, the implicit spatial regularization of the NF is able to provide a much better solution. The pixel-by-pixel WFA tends to overestimate the transverse component of the magnetic field, producing a background component over the FoV which is proportional to the noise level. This effect is well known and arises because the maximum-likelihood solution is biased (Martínez González et al. 2012). The standard approach to deal with this bias is to avoid the regions where this effect has a strong impact on any subsequent analysis. Here we show that the NF is able to provide a better estimate of the magnetic field by finding a solution that is more coherent with the surroundings, which mitigates the bias of the transverse component. At the bottom right of each panel, we have quantified both approaches, retrieving an average error 2.5–4.5 times lower with the NF WFA compared with the pixel-wise approach. Our results confirm the advantages of the spatial regularization found by Morosin et al. (2020). It is important to note that these small errors do not fully encapsulate the complexities found in actual observations, which are influenced by various systematic factors such as noise or spectral and spatial degradation. Nonetheless, within our idealized setting, the inference using the NF WFA is indicative of the potential improvements. Lastly, our calculations provide two more conclusions. The first one is that, if we decrease the value of ω, the NF tends to produce an excessively smooth solution. Although this is undesirable in general, it can be an advantage in very noisy observations. The second one is that the larger the noise level, the better the reconstruction of the magnetic field using the NF WFA is compared to the traditional WFA (see Fig. A.1 for a comparison using an increased noise level).
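
The origin of a noise-dependent background in a positive-definite quantity can be illustrated with a toy Monte Carlo experiment. The sketch below is not the WFA estimator: it assumes pure Gaussian noise in Stokes Q and U (no real signal) and uses the linear polarization amplitude sqrt(Q² + U²) as a stand-in for the transverse field estimate. The function name and the noise levels are hypothetical.

```python
import math
import random

def mean_linear_polarization(sigma, n_samples=20000, seed=0):
    """Average of sqrt(Q^2 + U^2) when Q and U contain only Gaussian noise."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        q = rng.gauss(0.0, sigma)  # true Q is zero: pure noise
        u = rng.gauss(0.0, sigma)  # true U is zero: pure noise
        total += math.hypot(q, u)
    return total / n_samples

for sigma in (1e-3, 2e-3, 4e-3):
    measured = mean_linear_polarization(sigma)
    # Mean of a Rayleigh distribution with scale sigma
    expected = sigma * math.sqrt(math.pi / 2.0)
    print(f"sigma={sigma:.0e}  mean amplitude={measured:.2e}  Rayleigh mean={expected:.2e}")
```

Even though the true signal is zero, the average amplitude is positive and grows linearly with the noise level, which is the qualitative behavior of the background component described above.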

3. Application to observations

3.1. Example with real observations

After confirming with simulations that the WFA NF is able to recover the magnetic field with a higher fidelity than the standard pixel-by-pixel WFA in noisy cases, we apply it to real observations. We have used observations (Leenaarts et al. 2018; Yadav et al. 2023) from the active region NOAA 12593 observed on 2016-09-19 between 09:31:29 UT and 09:57:03 UT with the CRISP (Scharmer et al. 2008) instrument at the Swedish 1-m Solar Telescope (SST; Scharmer et al. 2003). The selection of the regularization frequencies ω was guided by empirical testing: we experimented with different values to find an optimal balance that minimizes noise while preserving critical features of the magnetic field, particularly the transverse component. The results from the analysis of the first time frame are shown in Fig. 5. The first row shows the results with the pixel-wise WFA and the second row displays those obtained when applying the WFA NF. As expected, since the Stokes V signals are well above the noise, the inference of the longitudinal component with both methods is very similar. In contrast, the pixel-wise WFA estimates a transverse magnetic field larger than 1500 G in many locations within the FoV. The reason for those high values is the aforementioned bias effect, which has an even more pronounced impact in plage, network, and emerging regions, where the line source function is very shallow in the chromosphere and the resulting line profile shows core intensities that are as strong as the wings, presenting a quasi-flat profile (see de la Cruz Rodríguez et al. 2013 and Appendix A of Morosin et al. 2020). In other words, the pixel-wise WFA compensates for the very small Stokes I derivatives with a very large magnetic field value (see Fig. A.2 for a map of the region and the respective derivatives). This effect is more noticeable in measurements with poor spectral resolution, as shown in Díaz Baso et al. (2023). The NF, on the other hand, is able to provide a more coherent solution in the spatial domain without using unrealistic values. The maximum values are now below 800 G, and most of them are located at the edge of the lower-right polarity (penumbra of the sunspot) and in between the small polarities in the FoV.

Fig. 5.

Magnetic field reconstruction from a Ca II 8542 Å observation using the pixel-wise WFA (top row) and using the NF WFA with ω = 120 for B∥ and ω = 80 for B⊥ and ΦB (bottom row). Columns show the magnetic field in terms of the longitudinal component, the transverse component, and the azimuth angle.

3.2. Challenging case and additional spatial regularization

We have used observations of the active region NOAA AR 12723 recorded on 2018-09-30 in the Ca II 8542 Å line by Vissers et al. (2022) with the CRISP instrument at the SST, which consist of a mosaic of four overlapping pointings. This target was selected to explore the behavior of the NF when the magnetic field is particularly weak, but with a simple enough connectivity to investigate what additional information we could add to the inference problem. In this case, the NF WFA performs similarly to the pixel-wise WFA because the magnetic field is mainly concentrated at the sunspots and decreases rapidly with increasing distance from them. However, the formulation of the inversion process in terms of a NF allows us to include additional regularizations.

The results of the inversions are shown in Fig. 6. The longitudinal magnetic field, the transverse magnetic field, and the azimuth of the magnetic field are shown in the first, second, and third columns, respectively. The results show that the NF only marginally regularizes the inference. The quality of the inferred magnetic field far from the sunspots is similar for the NF WFA and the pixel-wise WFA. The estimation of the magnetic field is slightly more spatially coherent, as expected, but the overall structure is very similar. This is a clear example of a limiting case in which there is not enough information to constrain the solution.

Fig. 6.

Magnetic field inference from a Ca II 8542 Å observation using the pixel-wise WFA, the NF WFA, and two additional approaches introducing an explicit regularization term: using the information of a potential extrapolation of the magnetic field (third row) and using the orientation of the fibrils (fourth row). The first column shows the longitudinal magnetic field, the second column shows the transverse magnetic field and the third column shows the azimuth of the magnetic field.

To improve the magnetic field estimation, we have implemented an explicit regularization that guides the NF to obey an additional physical constraint while still fitting the spectropolarimetric observations. This constraint enters the merit function as a regularization term weighted by a hyperparameter λ, which penalizes the distance between the inferred magnetic field configuration and the one obtained from the external source of information. Given that the longitudinal magnetic field is well constrained, we focus here on adding a regularization to the transverse component, that is:

$$ \mathcal{L} = \mathcal{L}_Q + \mathcal{L}_U + \lambda \mathcal{L}_{\rm reg}. \tag{13} $$

As a first example, the third row of Fig. 6 shows the result when ℒreg = ℒpot, where ℒpot is the mean squared difference between the NF and a potential field extrapolation Bpot pre-calculated from the well-constrained longitudinal magnetic field component. Ideally, one would define ℒpot directly as the difference between these two magnetic fields:

$$ \mathcal{L}_{\rm pot} = \sum_{x,y} \left(B_\perp - B_{\perp,\mathrm{pot}}\right)^2 + \sum_{x,y} \left(\Phi_B - \Phi_{B,\mathrm{pot}}\right)^2. \tag{14} $$

However, as we have not disambiguated the magnetic field azimuth, we instead calculate the distance using the corresponding analogous quantities BQ,pot and BU,pot:

$$ \mathcal{L}_{\rm pot} = \sum_{x,y} \left(B_Q - B_{Q,\mathrm{pot}}\right)^2 + \sum_{x,y} \left(B_U - B_{U,\mathrm{pot}}\right)^2. \tag{15} $$
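
The practical difference between Eqs. (14) and (15) can be checked with a few lines of code. The following is a minimal per-pixel sketch in plain Python. It assumes that BQ and BU vary as cos(2ΦB) and sin(2ΦB); the B⊥² prefactor and all function names are illustrative assumptions and may differ from the definitions used earlier in the paper.

```python
import math

def to_qu(b_perp, phi):
    """Map (B_perp, azimuth) to ambiguity-free components.

    Assumption: B_Q and B_U scale with cos(2*phi) and sin(2*phi); the
    b_perp**2 prefactor is illustrative only.
    """
    return b_perp**2 * math.cos(2.0 * phi), b_perp**2 * math.sin(2.0 * phi)

def loss_azimuth(b_perp, phi, b_perp_pot, phi_pot):
    """Per-pixel version of Eq. (14): compares B_perp and the azimuth directly."""
    return (b_perp - b_perp_pot) ** 2 + (phi - phi_pot) ** 2

def loss_qu(b_perp, phi, b_perp_pot, phi_pot):
    """Per-pixel version of Eq. (15): compares B_Q and B_U."""
    bq, bu = to_qu(b_perp, phi)
    bq_pot, bu_pot = to_qu(b_perp_pot, phi_pot)
    return (bq - bq_pot) ** 2 + (bu - bu_pot) ** 2

b_perp, phi = 1.0, math.radians(30.0)
phi_flipped = phi + math.pi  # same orientation, azimuth flipped by 180 degrees

print(loss_azimuth(b_perp, phi_flipped, b_perp, phi))  # penalizes the flipped azimuth
print(loss_qu(b_perp, phi_flipped, b_perp, phi))       # zero up to rounding errors
```

Under these assumptions, a field and its 180° flipped counterpart are indistinguishable for the loss of Eq. (15), whereas Eq. (14) would wrongly penalize one of them.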

When the value of λ is increased during the optimization, the NF starts to incorporate the information of the potential field extrapolation. However, even with a small regularization value, we can already see from Fig. 6 that the azimuth of the magnetic field becomes more spatially coherent, but the new transverse magnetic field becomes much stronger than the one estimated only from reproducing the polarization signals. This experiment shows that this particular region is far from being potential and incorporating this information will worsen the fits as soon as we increase the value of λ. More complex extrapolations can be performed but the idea here is to show how to incorporate external information into the inference problem.

Finally, we also explore the idea of incorporating the orientation of the chromospheric fibrils as if they were aligned with the magnetic field. In fact, this idea has been used to improve nonlinear force-free modeling of coronal fields (Wiegelmann et al. 2008), given the limitations of the force-free assumption of the photospheric boundary (DeRosa et al. 2015). Extrapolations performed starting from a chromospheric vector boundary condition (Fleishman et al. 2017) or starting from the photosphere and adding even an incomplete set of chromospheric magnetic field data (Fleishman et al. 2019) can measurably improve the reconstruction of the coronal magnetic field, connectivity, and electric currents. This information can potentially also improve the inference of the magnetic field, especially in areas away from strong magnetic field concentrations.

To calculate the distance between the magnetic field azimuth ΦB and the direction of the fibrils Φfib, we need to convert all angles to corresponding points on the unit circle to avoid the ambiguity problem, and then we can compute the distance of these points. The fourth row of Fig. 6 shows the result when ℒreg = ℒfib, with

$$ \begin{aligned} \mathcal{L}_{\rm fib} =& \sum_{x,y} \left[\sin(2\Phi_B) - \sin(2\Phi_{\rm fib})\right]^2 \\ &+ \sum_{x,y} \left[\cos(2\Phi_B) - \cos(2\Phi_{\rm fib})\right]^2, \end{aligned} \tag{16} $$

that is, the mean squared difference between the NF azimuth and the direction of the fibrils, as calculated from the intensity image. To infer the orientation of the fibrils from the observations we have used the following procedure: (i) we use the core of the Ca II 8542 Å line to detect the fibrils in the chromosphere (see footnote 3), (ii) we apply a Sobel operator along each axis and take the arctangent to retrieve the orientation of the fibrils, which is collapsed to the range (0, 180) degrees to avoid the 180° ambiguity, and (iii) we apply a Gaussian filter to remove small artifacts at the edges of the fibrils (see Fig. A.3 for more information). As a result of the inference, Fig. 6 shows a magnetic field that is aligned in general with the fibrils while still reproducing the polarization signals. The middle panel shows that after incorporating the orientation of the fibrils, the transverse component remains almost the same. In fact, the estimation of the orientation of the fibrils fails in the umbra where fibrils are not visible (see Fig. A.3) but the strong polarization signals are enough to compensate for that. This guided inferred magnetic field can be a much better boundary condition for coronal field extrapolations.
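
Steps (ii) and Eq. (16) can be sketched in plain Python on a small synthetic image. This is an illustrative sketch rather than the procedure used for Fig. 6: the function names are hypothetical, the Gaussian smoothing of step (iii) is omitted, angles are measured in the array frame (x along columns, y along rows), and adding 90° to the gradient direction to obtain the direction along the fibril is an assumption of this sketch.

```python
import math

SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))

def sobel_orientation(image):
    """Orientation (radians, folded to [0, pi)) of structures at interior pixels.

    The intensity gradient points across a fibril, so 90 degrees is added to
    obtain the direction along it (an assumption of this sketch).
    """
    ny, nx = len(image), len(image[0])
    out = [[0.0] * (nx - 2) for _ in range(ny - 2)]
    for y in range(1, ny - 1):
        for x in range(1, nx - 1):
            gx = gy = 0.0
            for j in range(3):
                for i in range(3):
                    value = image[y + j - 1][x + i - 1]
                    gx += SOBEL_X[j][i] * value
                    gy += SOBEL_Y[j][i] * value
            out[y - 1][x - 1] = (math.atan2(gy, gx) + math.pi / 2.0) % math.pi
    return out

def loss_fib(phi_b, phi_fib):
    """Eq. (16): squared distance between doubled angles on the unit circle."""
    total = 0.0
    for row_b, row_f in zip(phi_b, phi_fib):
        for a, b in zip(row_b, row_f):
            total += (math.sin(2 * a) - math.sin(2 * b)) ** 2
            total += (math.cos(2 * a) - math.cos(2 * b)) ** 2
    return total

# Synthetic "fibrils": intensity is constant along the diagonal x = y.
image = [[float(x - y) for x in range(6)] for y in range(6)]
phi_fib = sobel_orientation(image)  # 4 x 4 map of interior pixels

aligned = [[math.radians(45.0)] * 4 for _ in range(4)]
flipped = [[math.radians(225.0)] * 4 for _ in range(4)]  # 180-degree ambiguity
across = [[math.radians(135.0)] * 4 for _ in range(4)]   # perpendicular to fibrils

print(loss_fib(aligned, phi_fib), loss_fib(flipped, phi_fib), loss_fib(across, phi_fib))
```

Because the angles are doubled before being placed on the unit circle, an azimuth and its 180° flipped counterpart give the same loss, while a field perpendicular to the fibrils yields the maximum penalty of 4 per pixel.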

Other potential regularizations that we should explore in the future are forcing the divergence-free condition or the suppression of strong electric currents. Both constraints can be computed from derivatives of the output of the neural network with respect to the input coordinates, which can be computed efficiently with techniques from automatic differentiation. This could allow us to resolve the Zeeman-180 degree azimuth ambiguity at the same time we are reproducing the spectra.

3.3. Temporal regularization

The NF can be easily extended to the case where the magnetic field is not static but evolves with time. By imposing temporal coherence in the solution, one can obtain a better estimate of the magnetic field, as shown by de la Cruz Rodríguez & Leenaarts (2024). From a technical point of view, adding the time dependence can be done by adding t as an additional input parameter of the NF:

$$ (B_\parallel, B_Q, B_U) = f_\theta(\boldsymbol{x}, t). \tag{17} $$

Apart from this change, inferring the magnetic field proceeds exactly as before, except that the optimization process requires minimizing the merit function over the spatiotemporal domain. The number of unknowns in the explicit temporal regularization of de la Cruz Rodríguez & Leenaarts (2024) scales linearly with the number of observed time steps (nt) since it infers the value of the magnetic field at all nx × ny pixels of the FoV at every time step. This requires solving a very large linear system of equations of size (nx ny nt) × (nx ny nt). In contrast, the NF produces a much more compact representation of the magnetic field because it only requires adding a few weights from the input layer to the Fourier feature mapping layer. Since the magnetic field contains higher frequencies in the spatial directions than in the temporal direction, we implement the Fourier feature mapping with ωt for the time and ωxy for the spatial coordinates, with ωt < ωxy.
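
A Fourier feature mapping with separate spatial and temporal frequency scales can be sketched as follows. This is a minimal illustration in plain Python in the spirit of Tancik et al. (2020), not the implementation used in this work: it assumes random frequencies drawn from zero-mean Gaussians with standard deviations ωxy and ωt, coordinates normalized to [0, 1], and hypothetical function names. The values ωxy = 80 and ωt = 3 are those quoted for the time series example.

```python
import math
import random

def make_frequencies(n_features, omega_xy, omega_t, seed=0):
    """Random frequency vectors for inputs (x, y, t), one tuple per feature.

    Assumption: frequencies are drawn from zero-mean Gaussians whose standard
    deviations are omega_xy (space) and omega_t (time).
    """
    rng = random.Random(seed)
    return [
        (rng.gauss(0.0, omega_xy), rng.gauss(0.0, omega_xy), rng.gauss(0.0, omega_t))
        for _ in range(n_features)
    ]

def fourier_features(x, y, t, frequencies):
    """Map normalized coordinates to [cos(2*pi*f.v), sin(2*pi*f.v)] features."""
    phases = [2.0 * math.pi * (fx * x + fy * y + ft * t) for fx, fy, ft in frequencies]
    return [math.cos(p) for p in phases] + [math.sin(p) for p in phases]

def rms(values):
    """Root mean square of a list of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))

freqs = make_frequencies(n_features=128, omega_xy=80.0, omega_t=3.0)
features = fourier_features(0.25, 0.50, 0.10, freqs)

print(len(features))                # 2 * n_features inputs for the subsequent layers
print(rms([f[0] for f in freqs]))   # spatial frequency spread, of order omega_xy
print(rms([f[2] for f in freqs]))   # temporal frequency spread, of order omega_t
```

In such a setup, adding time only adds one frequency per feature to the mapping, so the number of trainable parameters is essentially independent of the number of observed time steps.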

To showcase this approach, we have used a 26-minute time series of the active region NOAA 12593 described in Sect. 3.1. We optimize the temporal WFA NF on the Ca II 8542 Å observations of this time series. For this particular example, we have used a temporal regularization of ωt = 3 and a spatial regularization of ωxy = 80. Higher values prevented the neural network from converging correctly. The results of the inference are shown in Fig. 7 for the longitudinal magnetic field and the transverse component. The rightmost column shows the temporal evolution of the magnetic field component for the particular pixel indicated in the middle panel.

Fig. 7.

Longitudinal magnetic field (upper row) and transverse component (lower row) inferred using the pixel-wise WFA and the temporal WFA NF for a time series of the active region NOAA 12593 observed on 2016 September 19 with the SST/CRISP. The rightmost column shows the temporal evolution of the magnetic field for a particular pixel indicated as the intersection of the dashed lines in the middle panel.

In the case of the longitudinal magnetic field (first row of Fig. 7), the NF is able to capture the details of the spatial distribution of the emerging flux region. In the extracted pixel (right upper panel of that figure), the NF is also able to capture the temporal evolution of the magnetic field. The fluctuations of the pixel-wise inversion are of the order of 100 G, which is compatible with the noise amplitude of the observations. This makes us confident that the implicit bias introduced in the NF with the small value of ωt correctly captures the time variation of the longitudinal component of the magnetic field. However, since this particular region is very dynamic, one could argue that a less restrictive temporal regularization, that is, a higher ωt, may be needed to properly capture the details of the small-scale changes in this type of scenario.

In the case of the transverse magnetic field (second row of Fig. 7), there is a much larger difference between the output of the NF and that of the traditional pixel-wise WFA. The traditional WFA tends to estimate a transverse magnetic field larger than 1500 G in the center of the FoV far from the sunspot, where we have weaker polarization signals, while the NF estimates fields smaller than 500 G and provides a more coherent solution both spatially and temporally. In regions where the signals are much stronger, both approaches retrieve very similar magnetic field values (see right bottom panel of Fig. 7). Compared to the previous section, where only spatial regularization was employed, the inclusion of temporal regularization is particularly effective in reducing background noise (de la Cruz Rodríguez & Leenaarts 2024).

4. Summary and conclusions

In this study, we investigated the use of NFs to parameterize the magnetic field for magnetic field inference. We have shown the capabilities of NFs to solve static and time-dependent magnetic field inferences. This implementation (see footnote 4) comes with several important advantages.

First, by reformulating the problem as a global problem (instead of pixel-wise independent problems), a NF can produce a much more compact representation, decreasing the number of parameters to optimize. This reduction generally depends on both the spatial complexity of the inferred physical quantities and the efficiency of the chosen neural architecture to represent such complexity. We believe that, for 3D cases such as the inversion of Stokes profiles with stratified atmospheres, NFs will provide an exceptionally compact representation of the physical conditions (see, for example, Asensio Ramos 2023). We therefore anticipate that NFs will become crucial as new instruments and telescopes provide larger FoVs and more complex data (e.g., at the SST, the recently installed CRISP cameras offer a FoV diameter of 87″ and the forthcoming CRISP2 will offer a FoV of 2 arcminutes).

Second, using a NF to describe the magnetic field as a continuous differentiable function allowed us to obtain a better estimation of the magnetic field in places where the signals are buried in noise, without smearing out the details of locations where the signals are strong. Choosing adequate regularization frequencies ω is crucial to obtain results that are not overly smooth. These frequencies should be chosen so that the quality of the fit is not degraded but still maintains sharp gradients in locations where signals significantly impact the quality of the fit. In fact, NFs seem to be key in the inference of the transverse magnetic field, which the pixel-wise WFA tends to overestimate. This new implementation can help us to better understand and quantify chromospheric heating and its relation with the strength of the horizontal magnetic field in the low chromosphere (Leenaarts et al. 2018; Díaz Baso et al. 2021; da Silva Santos et al. 2022).

Third, the present work is based on a simple forward model for the Stokes profiles. A NF version of the WFA is not realistically competitive in terms of speed with other WFA implementations. For instance, the explicitly regularized WFA implementation of Morosin et al. (2020) can estimate the magnetic field of the FoV in less than a second, while the NF version requires ∼30 seconds. The temporal regularization also takes more time, ∼3 minutes on an off-the-shelf GPU, compared with ∼10 minutes on a CPU. The main reason for this long processing time is the relatively slow convergence of the NF given that it is optimized using first-order gradient-based techniques. On the other hand, the merit function in spectropolarimetric inversions is often optimized using (quasi-) second-order methods such as the Levenberg-Marquardt algorithm. Second-order methods require the construction of a very large approximate Hessian matrix that can impact memory consumption. For instance, for an observation with 1000 × 1000 pixels and 50 time steps and using a Milne-Eddington model (approximately ten free parameters), the Hessian matrix is of size 5 × 10⁸ × 5 × 10⁸. In any case, the global Hessian matrix is relatively sparse. Depending on the sparsity degree of the Hessian matrix, the solution of the coupled linear system of equations can pose a severe computational challenge, even when efficient iterative methods are used (e.g., GMRES or BICGSTAB). Imposing nearest neighbor regularization yields a very compact and sparse matrix that can be efficiently inverted. Denser cases, originating for example from the inclusion of an extended spatial point spread function or horizontal coupling by 3D radiative transfer effects, could be trivially included in this framework, whereas it is not trivial to model the inverse problem.
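
The size quoted for the Hessian follows from simple arithmetic, which also gives a feeling for the memory involved. The sketch below assumes double-precision storage. The sparse layout (each unknown coupled to the ten parameters of its own pixel and to the same parameter in its four spatial and two temporal nearest neighbors) is a hypothetical example and not a description of any specific code.

```python
n_x, n_y, n_t = 1000, 1000, 50   # pixels and time steps quoted in the text
n_params = 10                    # approximate number of Milne-Eddington parameters

n_unknowns = n_x * n_y * n_t * n_params
print(f"unknowns: {n_unknowns:.1e}")                         # 5.0e+08

bytes_per_value = 8  # assumption: double precision
dense_bytes = n_unknowns**2 * bytes_per_value
print(f"dense Hessian: {dense_bytes / 1e18:.1f} exabytes")   # 2.0 exabytes

# Hypothetical sparse layout: own-pixel parameters plus the same parameter
# in 4 spatial and 2 temporal nearest neighbours.
nonzeros_per_row = n_params + 4 + 2
sparse_bytes = n_unknowns * nonzeros_per_row * bytes_per_value
print(f"sparse Hessian values: {sparse_bytes / 1e9:.0f} GB")  # 64 GB
```

Under these assumptions, a dense Hessian is far beyond any available memory, whereas the nearest-neighbor structure brings the storage of the nonzero values down to tens of gigabytes, before counting the index arrays of the sparse format.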

Fourth, modern automatic differentiation frameworks, such as PyTorch, allowed us to seamlessly use more complex forward models, such as a Milne-Eddington model, or solve the non-LTE inversion problem in a stratified atmosphere. In Asensio Ramos & Díaz Baso (2019), we already anticipated that traditional predictive neural networks, despite their remarkable speed, are not explicitly fitting Stokes profiles and the introduction of differentiable forward synthesis would notably increase the optimization time. Considering that the forward models are the bottleneck in complex spectropolarimetric inversions, emulators of the radiative transfer model (e.g., Asensio Ramos et al. 2024) emerge as a crucial strategy to speed up the calculations. Additionally, these frameworks allowed us to seamlessly speed up the calculations using GPUs. Adding extra regularization terms to guide the solution is straightforward in these frameworks. As mentioned, by improving the estimation of the magnetic field at the chromospheric level, we can perform better coronal magnetic field extrapolations, and the chromospheric fibrils contain valuable information about the non-potentiality of the magnetic field which should be integrated into the inference (Jing et al. 2011). Apart from the ones used in this work (alignment of the magnetic field with the chromospheric fibrils or similarity to a precomputed magnetic field extrapolation), one can think of adding Tikhonov regularization to the physical quantities or their spatial derivatives, sparsity constraints using linear transformations such as wavelets (e.g., Asensio Ramos & de la Cruz Rodríguez 2015), or a divergence-free constraint. By extending this model to include additional constraints we could potentially resolve the Zeeman 180° azimuth ambiguity directly within the inversion process (Štěpán et al. 2022, 2024). This would enhance the accuracy and reliability of the inferred magnetic field vectors.

Finally, a natural extension would be a probabilistic reconstruction (i.e., providing the uncertainty of the magnetic field) by using gradient-based variational inference, as shown by Mishra-Sharma & Yang (2022), in the context of gravitational lens reconstruction. This could be accomplished, for example, by modeling the magnetic field distribution using a normalizing flow (Díaz Baso et al. 2022). Another point to explore is the accuracy in reproducing very high-frequency features in the observations because our Fourier feature layer becomes less stable when we increase the complexity of the problem. A way of mitigating this problem can be using a multi-scale representation (Dolean et al. 2024; Saragadam et al. 2022) or by adding higher frequencies progressively as the training progresses (Asensio Ramos et al. 2024). Having demonstrated the potential of the method in a proof-of-principle setting, we leave these extensions to future work.

In summary, NFs allowed us to improve the reconstruction of the magnetic field properties of the solar atmosphere. The approach is especially suitable for imaging spectropolarimeters with large FoVs and a sparse wavelength sampling. However, it is also suitable for integral-field spectropolarimeters (van Noort et al. 2022; Rouppe van der Voort et al. 2023) where, although the FoV is smaller, the temporal cadence is very high and the temporal regularization holds much better. These properties make NFs valuable for the data taken with the next generation of telescopes such as the existing Daniel K. Inouye Solar Telescope (DKIST; Rimmele et al. 2020) and the upcoming European Solar Telescope (EST; Quintero Noda et al. 2022).


1

Surprisingly, this implicit bias is one of the key reasons for the success of deep learning, in which NNs focus on the general properties of the data manifolds, instead of focusing on the details.

3

We note that some preprocessing is usually performed to enhance the skeleton of the fibrils before the Sobel filters. However, we did not find a significant difference, so we left out this extra step.

4

Our implementation is publicly available in the following repository: https://github.com/cdiazbas/neural_wfa

Acknowledgments

We would like to thank the anonymous referee for their comments and suggestions. CJDB acknowledges M. L. DeRosa for valuable discussions on future application of NF inversion methods. This research is supported by the Research Council of Norway, project number 325491, and through its Centres of Excellence scheme, project number 262622. AAR acknowledges financial support from the Agencia Estatal de Investigación del Ministerio de Ciencia, Innovación y Universidades (MCIU/AEI) and the European Regional Development Fund (ERDF) through project PID2022-136563NB-I00. This project has been funded by the European Union through the European Research Council (ERC) under the Horizon Europe program (MAGHEAT, grant agreement 101088184). The NSO is operated by the Association of Universities for Research in Astronomy, Inc., under cooperative agreement with the National Science Foundation. The Swedish 1-m Solar Telescope is operated on the island of La Palma by the Institute for Solar Physics of Stockholm University in the Spanish Observatorio del Roque de los Muchachos of the Instituto de Astrofísica de Canarias. The Swedish 1-m Solar Telescope, SST, is co-funded by the Swedish Research Council as a national research infrastructure (registration number 4.3-2021-00169). We acknowledge the community effort devoted to the development of the following open-source packages that were used in this work: numpy (https://numpy.org/), matplotlib (https://matplotlib.org/), scipy (https://scipy.org/), astropy (https://www.astropy.org/) and sunpy (https://sunpy.org/). This research has made use of NASA’s Astrophysics Data System Bibliographic Services.

References

  1. Asensio Ramos, A. 2023, Sol. Phys., 298, 135
  2. Asensio Ramos, A., & de la Cruz Rodríguez, J. 2015, A&A, 577, A140
  3. Asensio Ramos, A., & Díaz Baso, C. J. 2019, A&A, 626, A102
  4. Asensio Ramos, A., & Pallé, E. 2021, A&A, 646, A4
  5. Asensio Ramos, A., Westendorp Plaza, C., Navarro-Almaida, D., et al. 2024, MNRAS, 531, 4930
  6. Carlsson, M., Hansteen, V. H., Gudiksen, B. V., Leenaarts, J., & De Pontieu, B. 2016, A&A, 585, A4
  7. Centeno, R. 2018, ApJ, 866, 89
  8. Clevert, D. A., Unterthiner, T., & Hochreiter, S. 2015, arXiv e-prints [arXiv:1511.07289]
  9. da Silva Santos, J. M., Danilovic, S., Leenaarts, J., et al. 2022, A&A, 661, A59
  10. da Silva Santos, J. M., Reardon, K., Cauzzi, G., et al. 2023, ApJ, 954, L35
  11. De Ceuster, F., Ceulemans, T., Decin, L., Danilovich, T., & Yates, J. 2024, ApJS, 275, 44
  12. de la Cruz Rodríguez, J. 2019, A&A, 631, A153
  13. de la Cruz Rodríguez, J., & Leenaarts, J. 2024, A&A, 685, A85
  14. de la Cruz Rodríguez, J., De Pontieu, B., Carlsson, M., & Rouppe van der Voort, L. H. M. 2013, ApJ, 764, L11
  15. de la Cruz Rodríguez, J., Leenaarts, J., & Asensio Ramos, A. 2016, ApJ, 830, L30
  16. de la Cruz Rodríguez, J., Leenaarts, J., Danilovic, S., & Uitenbroek, H. 2019, A&A, 623, A74
  17. DeRosa, M. L., Wheatland, M. S., Leka, K. D., et al. 2015, ApJ, 811, 107
  18. Díaz Baso, C. J., Martínez González, M. J., & Asensio Ramos, A. 2019a, A&A, 625, A128
  19. Díaz Baso, C. J., de la Cruz Rodríguez, J., & Danilovic, S. 2019b, A&A, 629, A99
  20. Díaz Baso, C. J., de la Cruz Rodríguez, J., & Leenaarts, J. 2021, A&A, 647, A188
  21. Díaz Baso, C. J., Asensio Ramos, A., & de la Cruz Rodríguez, J. 2022, A&A, 659, A165
  22. Díaz Baso, C. J., Rouppe van der Voort, L., de la Cruz Rodríguez, J., & Leenaarts, J. 2023, A&A, 673, A35
  23. Dolean, V., Heinlein, A., Mishra, S., & Moseley, B. 2024, Computer Methods in Applied Mechanics and Engineering, 429, 117116
  24. Dominguez-Tagle, C., Collados, M., Lopez, R., et al. 2022, J. Astron. Instrum., 11, 2250014
  25. Esteban Pozuelo, S., Asensio Ramos, A., Díaz Baso, C. J., & Ruiz Cobo, B. 2024, A&A, 689, A255
  26. Fleishman, G. D., Anfinogentov, S., Loukitcheva, M., Mysh’yakov, I., & Stupishin, A. 2017, ApJ, 839, 30
  27. Fleishman, G., Mysh’yakov, I., Stupishin, A., Loukitcheva, M., & Anfinogentov, S. 2019, ApJ, 870, 101
  28. Fukushima, K. 1969, IEEE Trans. Syst. Sci. Cybern., 5, 322
  29. Gudiksen, B. V., Carlsson, M., Hansteen, V. H., et al. 2011, A&A, 531, A154
  30. He, K., Zhang, X., Ren, S., & Sun, J. 2015, arXiv e-prints [arXiv:1512.03385]
  31. Jarolim, R., Tremblay, B., Rempel, M., et al. 2024, ApJ, 963, L21
  32. Jing, J., Yuan, Y., Reardon, K., et al. 2011, ApJ, 739, 67
  33. Jurčák, J., Štěpán, J., Trujillo Bueno, J., & Bianda, M. 2018, A&A, 619, A60
  34. Landi Degl’Innocenti, E., & Landi Degl’Innocenti, M. 1973, Sol. Phys., 31, 299
  35. Landi Degl’Innocenti, E., & Landolfi, M. 2004, Polarization in Spectral Lines (Kluwer Academic Publishers)
  36. Leenaarts, J., Carlsson, M., & Rouppe van der Voort, L. 2012, ApJ, 749, 136
  37. Leenaarts, J., Pereira, T. M. D., Carlsson, M., Uitenbroek, H., & De Pontieu, B. 2013, ApJ, 772, 90
  38. Leenaarts, J., de la Cruz Rodríguez, J., Danilovic, S., Scharmer, G., & Carlsson, M. 2018, A&A, 612, A28
  39. Liaudat, T. I., Mars, M., Price, M. A., et al. 2024, RAS Tech. Instrum., 3, 505
  40. Martínez González, M. J., Asensio Ramos, A., Carroll, T. A., et al. 2008, A&A, 486, 637
  41. Martínez González, M. J., Manso Sainz, R., Asensio Ramos, A., & Belluzzi, L. 2012, MNRAS, 419, 153
  42. Mishra-Sharma, S., & Yang, G. 2022, Machine Learning for Astrophysics, 34
  43. Morosin, R., de la Cruz Rodríguez, J., Vissers, G. J. M., & Yadav, R. 2020, A&A, 642, A210
  44. Morosin, R., de la Cruz Rodríguez, J., Díaz Baso, C. J., & Leenaarts, J. 2022, A&A, 664, A8
  45. Parikh, N., & Boyd, S. 2014, Foundations and Trends in Optimization, 1
  46. Pastor Yabar, A., Borrero, J. M., Quintero Noda, C., & Ruiz Cobo, B. 2021, A&A, 656, L20
  47. Paszke, A., Gross, S., Massa, F., et al. 2019, arXiv e-prints [arXiv:1912.01703]
  48. Pietrow, A. G. M., Kiselman, D., de la Cruz Rodríguez, J., et al. 2020, A&A, 644, A43
  49. Quintero Noda, C., Schlichenmaier, R., Bellot Rubio, L. R., et al. 2022, A&A, 666, A21
  50. Rahaman, N., Baratin, A., Arpit, D., et al. 2018, arXiv e-prints [arXiv:1806.08734]
  51. Rimmele, T. R., Warner, M., Keil, S. L., et al. 2020, Sol. Phys., 295, 172
  52. Rouppe van der Voort, L. H. M., van Noort, M., & de la Cruz Rodríguez, J. 2023, A&A, 673, A11
  53. Saragadam, V., Tan, J., Balakrishnan, G., Baraniuk, R. G., & Veeraraghavan, A. 2022, arXiv e-prints [arXiv:2202.03532]
  54. Scharmer, G. B., Bjelksjo, K., Korhonen, T. K., Lindberg, B., & Petterson, B. 2003, Proc. SPIE, 4853, 341
  55. Scharmer, G. B., Narayan, G., Hillberg, T., et al. 2008, ApJ, 689, L69
  56. Sitzmann, V., Martel, J. N. P., Bergman, A. W., Lindell, D. B., & Wetzstein, G. 2020, arXiv e-prints [arXiv:2006.09661]
  57. Sukhorukov, A. V., & Leenaarts, J. 2017, A&A, 597, A46
  58. Tancik, M., Srinivasan, P. P., Mildenhall, B., et al. 2020, arXiv e-prints [arXiv:2006.10739]
  59. van Noort, M. 2012, A&A, 548, A5
  60. van Noort, M., Bischoff, J., Kramer, A., Solanki, S. K., & Kiselman, D. 2022, A&A, 668, A149
  61. Vissers, G. J. M., Danilovic, S., de la Cruz Rodríguez, J., et al. 2021, A&A, 645, A1
  62. Vissers, G. J. M., Danilovic, S., Zhu, X., et al. 2022, A&A, 662, A88
  63. Štěpán, J., del Pino Alemán, T., & Trujillo Bueno, J. 2022, A&A, 659, A137
  64. Štěpán, J., del Pino Alemán, T., & Trujillo Bueno, J. 2024, A&A, 689, A341
  65. Wiegelmann, T., Thalmann, J. K., Schrijver, C. J., De Rosa, M. L., & Metcalf, T. R. 2008, Sol. Phys., 247, 249
  66. Yadav, R., Díaz Baso, C. J., de la Cruz Rodríguez, J., Calvo, F., & Morosin, R. 2021, A&A, 649, A106
  67. Yadav, R., Kazachenko, M. D., Afanasyev, A. N., de la Cruz Rodríguez, J., & Leenaarts, J. 2023, ApJ, 958, 54

Appendix A: Additional figures

Fig. A.1.

Same as Fig. 4 but for an increased noise level. From left to right: original magnetic field from the simulation, magnetic field inferred by the pixel-wise WFA and results using the WFA NF.

Fig. A.2.

Intensity at λ0 − 0.425 Å (left panel) and total scaled Stokes I derivative (right panel) of the active region NOAA 12593. The locations with very low values of the Stokes I derivative are those where the pixel-wise WFA infers very high transverse magnetic fields.
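The connection between a weak Stokes I derivative and spuriously large pixel-wise fields can be illustrated with the longitudinal WFA least-squares estimator, whose denominator is the summed squared derivative. The following is a minimal Python sketch under stated assumptions, not the code used in this work: the Gaussian line profile, the names `line_profile` and `wfa_blos`, and the chosen line depths are hypothetical. The constant 4.6686 × 10⁻¹³ is the standard WFA factor for wavelengths in Å and fields in G, and g_eff = 1.10 is adopted for Ca II 8542 Å. The longitudinal component is used for simplicity; the transverse estimator also carries derivative terms in its denominator, so a similar amplification is expected there.

```python
import numpy as np

C_WFA = 4.6686e-13  # WFA constant for wavelengths in Angstrom and fields in G


def line_profile(wav, lambda0, depth, width=0.3):
    """Hypothetical Gaussian absorption line, normalized to the continuum."""
    return 1.0 - depth * np.exp(-((wav - lambda0) / width) ** 2)


def wfa_blos(stokes_i, stokes_v, wav, lambda0, g_eff):
    """Least-squares longitudinal field (G) for one pixel under the WFA.

    Also returns the factor that converts the Stokes V noise level into
    the uncertainty of the inferred field: sigma_B = sigma_V * noise_gain.
    """
    d_i = np.gradient(stokes_i, wav)            # dI/dlambda
    scale = C_WFA * lambda0**2 * g_eff
    denom = np.sum(d_i**2)                      # summed squared derivative
    b_los = -np.sum(stokes_v * d_i) / (scale * denom)
    noise_gain = 1.0 / (scale * np.sqrt(denom))
    return b_los, noise_gain


lambda0, g_eff, b_true = 8542.1, 1.10, 500.0    # illustrative values
wav = lambda0 + np.linspace(-1.0, 1.0, 41)

for depth in (0.5, 0.05):                       # deep line vs. nearly flat line
    stokes_i = line_profile(wav, lambda0, depth)
    # Noise-free Stokes V synthesized with the same WFA relation
    stokes_v = -C_WFA * lambda0**2 * g_eff * b_true * np.gradient(stokes_i, wav)
    b_los, noise_gain = wfa_blos(stokes_i, stokes_v, wav, lambda0, g_eff)
    print(f"depth={depth}: B_los={b_los:.1f} G, noise gain={noise_gain:.3e} G")
```

In this toy setup the noise-free field is recovered for both profiles, but the noise gain grows in inverse proportion to the line depth, so the same polarimetric noise translates into a much larger field error where the intensity derivative is small.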

Fig. A.3.

Steps to retrieve the orientation of the fibrils using the intensity of the Ca II line: the monochromatic image at the line core (top panel) is processed with Sobel filters to obtain the orientation (middle panel), which is subsequently filtered (bottom panel).
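The steps in this caption can be sketched as follows. This is an illustrative Python example under stated assumptions rather than the procedure actually used: the function names, the box-filter smoothing, and its radius are hypothetical choices. Because an orientation is only defined modulo 180°, the sketch filters the double-angle quantities (the products of the Sobel gradients) instead of the angle itself, and takes the fibril direction to be perpendicular to the smoothed intensity gradient. Angles are measured in array coordinates (x along columns, y along rows).

```python
import numpy as np


def sobel_gradients(img):
    """3x3 Sobel derivatives along x (columns) and y (rows)."""
    p = np.pad(img, 1, mode="edge")
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) - (
        p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2]
    )
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) - (
        p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:]
    )
    return gx, gy


def box_smooth(a, radius):
    """Mean filter over a (2*radius+1)^2 window with edge padding."""
    p = np.pad(a, radius, mode="edge")
    n = 2 * radius + 1
    out = np.zeros(a.shape, dtype=float)
    for dy in range(n):
        for dx in range(n):
            out += p[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out / n**2


def fibril_orientation(img, radius=3):
    """Orientation map in radians, in the range [0, pi)."""
    gx, gy = sobel_gradients(np.asarray(img, dtype=float))
    # Double-angle representation: |g|^2 cos(2*theta) and |g|^2 sin(2*theta)
    c2 = box_smooth(gx**2 - gy**2, radius)
    s2 = box_smooth(2.0 * gx * gy, radius)
    grad_angle = 0.5 * np.arctan2(s2, c2)
    # Elongated structures run perpendicular to the intensity gradient
    return np.mod(grad_angle + np.pi / 2.0, np.pi)


# Synthetic test image: stripes elongated at 30 degrees from the x axis
alpha = np.deg2rad(30.0)
yy, xx = np.mgrid[0:64, 0:64]
k = 2.0 * np.pi / 16.0
stripes = np.sin(k * (-xx * np.sin(alpha) + yy * np.cos(alpha)))
theta = fibril_orientation(stripes)
print(np.rad2deg(theta[32, 32]))  # close to 30 for this synthetic pattern
```

Smoothing the gradient products rather than the raw angle avoids the cancellation that would occur when averaging, for example, orientations of 1° and 179°, which describe nearly the same direction.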

All Figures

Fig. 1.

Sketch of the NF approach. The physical quantities are described by a neural network that takes the coordinates as input and outputs the physical quantities (temperature T, magnetic field B, velocity v, etc.) at that point (x, y). The output of the network is then used to compute the observables from the model using a radiative transfer (RT) module. In this work, the output of the network is the magnetic field vector and the RT module is the weak-field approximation. The synthetic observables are compared with the observations and the error is back-propagated through the network to obtain the optimal solution.

Fig. 2.

Comparison of the parametric representation power of each method. From left to right: the original longitudinal magnetic field at z = 1500 km in the Bifrost simulation, the same field represented using a ReLU neural network, and the field represented using an NF with ω = 10 and with ω = 60.

Fig. 3.

Performance of the NF when evaluating different batches of pixels randomly chosen in the FoV in every iteration.

Fig. 4.

Comparison of the reconstruction of the magnetic field vector from the synthetic Ca II 8542 Å spectra calculated from the simulation (in rows: line-of-sight component, transverse component, and azimuth angle). From left to right: original magnetic field from the simulation, magnetic field inferred by the pixel-wise WFA, and results using the WFA NF.

Fig. 5.

Magnetic field reconstruction from a Ca II 8542 Å observation using the pixel-wise WFA (top row) and using the NF WFA with ω = 120 for B∥ and ω = 80 for B⊥ and ΦB (bottom row). Columns show the magnetic field in terms of the longitudinal component, the transverse component, and the azimuth angle.

Fig. 6.

Magnetic field inference from a Ca II 8542 Å observation using the pixel-wise WFA, the NF WFA, and two additional approaches introducing an explicit regularization term: using the information of a potential extrapolation of the magnetic field (third row) and using the orientation of the fibrils (fourth row). The first column shows the longitudinal magnetic field, the second column shows the transverse magnetic field and the third column shows the azimuth of the magnetic field.

Fig. 7.

Longitudinal magnetic field (upper row) and transverse component (lower row) inferred using the pixel-wise WFA and the temporal WFA NF for a time series of the active region NOAA 12593 observed on 2016 September 19 with the SST/CRISP. The rightmost column shows the temporal evolution of the magnetic field for a particular pixel indicated as the intersection of the dashed lines in the middle panel.

