A new nonlocal thermodynamical equilibrium radiative transfer method for cool stars
Method and numerical implementation
^{1} Lund Observatory, Box 43, 221 00 Lund, Sweden
email: julien.lambert@astro.lu.se
^{2} Laboratoire Univers et Particules de Montpellier (LUPM), UMR 5299, CNRS, Université Montpellier 2 − CC72 Place Eugène Bataillon, 34095 Montpellier Cedex 5, France
^{3} Institut de Planétologie et d’Astrophysique de Grenoble (IPAG), UMR 5109, Université Joseph Fourier, CNRS, OSUG, 38041 Grenoble Cedex 9, France
Received: 15 October 2013
Accepted: 2 April 2015
Context. The solution of the nonlocal thermodynamical equilibrium (nonLTE) radiative transfer equation usually relies on stationary iterative methods, which may falsely converge in some cases. Furthermore, these methods are often unable to handle largescale systems, such as molecular spectra emerging from, for example, cool stellar atmospheres.
Aims. Our objective is to develop a new method, which aims to circumvent these problems, using nonstationary numerical techniques and taking advantage of parallel computers.
Methods. The technique we develop may be seen as a generalization of the coupled escape probability method. It solves the statistical equilibrium equations in all layers of a discretized model simultaneously. The numerical scheme adopted is based on the generalized minimum residual method.
Results. The code has already been applied to the special case of the water spectrum in a red supergiant stellar atmosphere. This demonstrates the fast convergence of this method, and opens the way to a wide variety of astrophysical problems.
Key words: radiative transfer / stars: atmospheres / methods: numerical
© ESO, 2015
1. Introduction
Radiative transfer plays a central role in astrophysics. The solution of the radiative transfer equation (RTE) is required to properly interpret any spectrum observed from an astrophysical object. Furthermore the radiation field often represents an important ingredient in hydrodynamics, for instance, through the energy budget (radiative heating and/or cooling) and the radiative pressure. However, as we discuss below, while the standard expression of the RTE is an ordinary differential equation of the 1st order, it may be highly nonlinear and nonlocal, especially because of scattering processes. The solution of the RTE thus requires specific procedures. Furthermore, with the development of infrared (IR) astronomy (e.g., from VLT, CRIRES in the nearIR to ALMA through Herschel/HIFI), one is now often confronted with very largescale problems resulting from molecular spectra, which most techniques developed so far cannot handle.
The techniques for solving the RTE may be divided into a few strategies. Among approximate methods, one that is widely used, relies on the escape probability technique (e.g., Castor 1970), which does not take the nonlocality of the RTE into account. Secondly, Monte Carlo methods can be used because of the probabilistic nature of scattering processes. These methods can successfully be adapted to any geometry, but may be unadapted when large optical depths are met (see, e.g., Juvela 2005, for a recent review). We do not discuss these two classes of methods hereafter.
To properly treat the nonlocality and the nonlinearity of the RTE, it is usually rewritten in the form of a diffusion integral equation, which is then solved iteratively^{1}. This integral form is usually referred to as the Λ operator. The form involves a kernel, which determines the mathematical nature of the integral equation. For example, in the case of an assumed plane parallel geometry, the kernel involves the exponential integral function, and the RTE then becomes a weakly singular Fredholm equation of the second type (this singularity is an important aspect, which will be treated below). It can then be formally solved thanks to appropriate algorithms, such as the Atkinson algorithm (Ahues et al. 2002). Since the inversion of the Λ operator is usually numerically prohibitive, one needs an iteration scheme. It is well known that the iteration with the complete Λ operator at best converges very slowly or even stabilizes far from the real solution. A major step in solving the RTE has been the introduction of the socalled approximate lambda iteration (ALI), initially by Cannon (1973), then called the operator splitting method. The ALI is in fact an application of the Jacobi method to the RTE, as shown by Olson et al. (1986), who greatly improved the mathematical understanding of the problem. In these methods, the choice of the approximate operator (denoted Λ^{⋆}) is crucial. The most common choice for Λ^{⋆} is the diagonal part of Λ, which ensures a rapid convergence if Λ is diagonally dominant matrix (Gershgorin’s theorem). ALI has been applied to a large variety of problems, including 3D polarized transfer (Štěpán & Trujillo Bueno 2013). This latter work also emphasizes the potential progress that can be made by taking advantage of massively parallel computing.
Meanwhile, new mathematical techniques have been applied to the RTE problem, especially the GaussSeidel method and the successive over relaxation (SOR) method, which significantly improve the convergence properties (Trujillo Bueno & Fabiani Bendicho 1995) that are determined by the spectral radius of their amplification matrix. Besides these, complete linearization methods have also been developed, the wellknown case being MULTI (Scharmer & Carlsson 1985). Despite apparent different formulations, the two approaches are essentially the same, as shown by SocasNavarro & Trujillo Bueno (1997).
Indeed, all these methods are based on linearization techniques. In particular, the dependence of the radiation field on the populations of the energy levels is approximated. Furthermore, all these methods are stationary (in the case of SOR, this is true if the relaxation parameter is constant over iterations), meaning that the spectral radius is constant. As the error vector e^{m} at the mth iteration is proportional to the power m of the spectral radius of the amplification matrix  λ_{max}  ^{m}, the convergence may thus be very slow in some cases. For the Lambda iteration, the spectral radius is close to 1 when the photon destruction probability ϵ ≪ 1 and/or the total optical depth is ≫1, explaining why it falsely converges. This is illustrated in Figs. 1 and 2, where the classical Eddington problem (twolevel atom with constant ϵ) is solved with Λ iteration (LI) and ALI, respectively. The false convergence of LI and the very low convergence of ALI are illustrated in cases B. Indeed, their rate of convergence is very slow for strongly nonlocal thermodynamical equilibrium (nonLTE) problems (nonLTE parameter ϵ ≪ 1, ϵ being the fraction of the source function given by the Planck function, or the photon destruction probability) and to the number of points per decade of optical depth. In the latter case (i.e., a fine sampling of the stellar atmosphere), this problem of slow convergence is well explained by Fabiani Bendicho et al. (1997), who also showed that the multigrid method can nicely circumvent this problem. This fine sampling is however rarely required, especially as classical model atmospheres are static. An excellent discussion of the general problem of false convergence is given in Chap. 13 in Hubeny & Mihalas (2014).
The iterative schemes are usually combined with a technique of acceleration of convergence (Auer 1991). These fall into two main categories. The first is based on the minimization of the residual, which is the case of the Ng acceleration (Ng 1974). The second class relies on minimization with respect to a set of conjugate vectors. This was first applied to the RTE by Klein et al. (1989; see also Dickel & Auer 1994).
The second class of techniques (the minimization with respect to a set of conjugate vectors) leads naturally to more general, nonlinear, nonstationary methods, such as the conjugate gradient and the generalized minimum residual method (GMRES). To our knowledge, only the conjugate gradient has been applied to the RTE, by Paletou & Anterrieu (2009).
The aim of our work is to develop a new nonstationary method for solving the RTE, which fully takes its nonlinearity into account. This new method thus opens the possibility to explore the mathematical consequences of this intrinsic property of the RTE. Furthermore, circumventing the possible slow or false convergence of stationary methods should allow an appropriate treatment of some specific cases, such as, e.g., shocked regions, which require a fine sampling and present a large dynamics of optical depth values. Finally, as the coding strategy of this method has been conceived as a parallel code from the beginning, it is applicable to largescale systems such as molecular spectra, and this was indeed our initial motivation (a posteriori parallelization is often less efficient, or even impossible).
The derivation of the equations, including the Jacobian matrix coefficients, used in the nonlinear method is presented in Sect. 2. The implementation of this method and its parallelization are described in Sect. 3. The application of the code to a specific problem facilitates the determination of its convergence properties, as shown in Sect. 4.
Fig. 1 Spectral radius of amplification matrix for the Λ iteration method. Convergence is shown for two spectral radii, marked with letters A (upper panel) and B (lower panel) in the righthand panels. The thin black lines correspond to successive iterations; the true solution is given by the orange line. 

Open with DEXTER 
Fig. 2 Same as Fig. 1, except for ALI with Λ^{⋆} = diagonal of Λ. 

Open with DEXTER 
2. Global strategy
Contrary to the twolevel atom problem with a constant destruction probability, which can be directly solved by direct inversion (or through the analytic Eddington solution). Multilevel atom or molecule problems require an iterative determination for two reasons. Firstly, the multilevel problem is nonlinear. Secondly, the radiative transfer equation is nonlocal, and one has to propagate the modification of the field during iterations. As we discussed in the previous section, however, this stationary iteration may be problematic for largescale systems. The basic concept of the strategy we adopt is to suppress the iterations due to the nonlocality of the RTE by considering the full system (spatially) and computing the radiation field exactly (assuming given populations) with an analytic formulation of the mean intensity field, which explicitly depends on populations.
2.1. Formulation of the problem
For clarity, we hereafter consider the special case of plane parallel geometry. In this context the 1D RTE is (1)where I_{ν} is the specific intensity, η and χ are the emission and extinction coefficients, and s is the geometrical path. We explicitly separate the continuum and the line processes (all demonstrations and symbols can be found in Appendices A and C, respectively). Assuming complete redistribution, we introduce classical expressions for line processes, where u and l refer to the upper and lower levels, respectively. Hereafter, if the distinction between upper and lower levels is not required, the indices i and j are used. The normalized line profile is and A and B are the Einstein coefficients.
Following the strategy of Gonzalez Garcia et al. (2008), we then introduce a unique optical depth scale for all frequencies, e.g,. the commonly used τ_{V} scale (optical depth at 500 nm). Equation (1)then becomes (3)where
The notation means that the quantity is expressed per hydrogen atom. Details can be found in Appendix A. As implicitly assumed in Eq. (3), we do not consider line overlap. This assumption is justified in the case of infrared molecular lines. Otherwise, a summation over transitions (at least adjacent ones) would be required.
This formulation depends explicitly on the populations of each energy level of the species considered. These populations are solutions of the statistical equilibrium equation system^{2}, which is for each level n_{i}, the detailed balance of incoming and outgoing (de)excitation processes, defined by radiative (Einstein coefficients), and collisional rates (n_{col} being the local density of the colliding partner). Thus the statistical equilibrium equations take the form (5)where the mean radiation field depends on the solution of the RTE, (6)Because this system of equations is degenerate, one of them is replaced by the normalization equation , where n^{k} is the total density of the species k.
To express each equation of the statistical equilibrium system with the same unknowns as in Eq. (3), we rewrite the statistical equilibrium with the variables x_{i} defined in Eq. (4). Moreover, for numerical reasons, we make a variable change for the mean intensity, (7)This leads to a matrix formulation of the statistical equilibrium, with an upper triangular part corresponding to incoming processes, and a lower triangular part corresponding to outgoing processes. Finally, thanks to the Einstein relations ( and B_{lu} = (g_{u}/g_{l})B_{ul}), the system of Eqs. (5)becomes (8)Because radiative transfer is a nonlocal problem, an integral formulation of the mean radiation field with an explicit dependence on populations is required.
2.2. The mean radiation field
To derive the mean radiation field, the first step consists of deriving the formal solution of the RTE. A classical way to obtain an integral form of the formal solution of a differential equation is to use the Green’s functions G_{μν}. This method consists of replacing the source term by a Dirac function. Taking the boundary conditions into account, the equation is then solved using its Laplace transform. The complete solution is then obtained after integration of this Green’s function, that is, performing a summation over each point source, according to the superposition principle (which is valid if the differential operator is linear). Thus, the integral form of I_{ν}(τ_{V},μ) can be obtained by the determination of the Green’s function of Eq. (3). This Green’s function is easily computable for spherical and other geometries using some symbolic tools such as Mathematica, making it possible to generalize the method for 2D or 3D problems. Knowing this Green’s function, we can directly obtain the formal integral form of the specific intensity, which is (9)We assume the line profile does not depend on μ. This expression is thus not applicable in the case of anisotropic scattering or in the presence of radiation fields, but is valid in the case of classical static model atmospheres. The formal expression of the mean intensity, , requires an integration over solid angles and frequencies according to Eq (6). We assume that D_{ij}x_{i}ζ and ξ_{ν}S_{ν} are angular independent (physically this corresponds to the assumption of isotropic scattering) and frequency independent over the line width (ξ_{ν}S_{ν} → ξ_{ij}S_{νij}). This is a very weak assumption because the continuum is nearly constant on the scale of a line width. Then, we can extract the source terms from the integral over solid angles and frequencies. Moreover, we separate line and continuum contributions by splitting the integral, which leads to (10)Then, after determination of the Green’s function in a given geometry, one obtains an integral form for the different contributors to the radiation field. We assume that the continuum contribution is in LTE (S_{νij} = B_{νij}(T)). Moreover, we consider, as boundary conditions, an incoming field from the inner atmosphere. Then, (11)where, where β_{−} [x_{i},x_{j}] (τ_{V}) is the escape probability from position τ_{V} in the atmosphere, (13)The kernels L_{1} [x_{i},x_{j}] (t,τ_{V}) and K_{1} [x_{i},x_{j}] (t,τ_{V}) are, respectively: where the total optical depth is given by (15)We do not include an external radiation field. In the plane parallel geometry considered here, it would require us to take additional effects such as limb darkening in realistic cases into account (e.g., illumination by a planet or a companion). We have introduced the function ε(t,τ_{V}) = 1 if t < τ_{V} and −1 if t > τ_{V} to split the integration over μ according to its sign. The physical meaning of the kernels L_{1}(t,τ_{V}) and K_{1}(t,τ_{V}) can be understood as the probability density for photons, emitted in a layer t, to contribute to the line excitation in the layer τ_{V}.
We now have an analytic expression of the mean intensity with an explicit dependence on populations. We now have to include this expression in the equations of statistical equilibrium to close the system and form a system of coupled integral equations (a Fredholmlike equation system).
2.3. The multizone statistical equilibrium equation system
The inversion of the system of statistical equilibrium equations allows us to obtain the populations also including the formal solution of RTE. But because it is nonlocal, the knowledge of requires the knowledge of the populations everywhere in the atmosphere because of the integral over the optical depth t in Eq. (12).
For this reason, one needs to solve the statistical equilibrium everywhere in the atmosphere simultaneously. The first step consists of rewriting Eq. (5)in a matrix form. A vector x(τ_{V}) containing all the populations of the considered species in a given layer, normalized to the density in that layer, is built. The size of this vector is thus equal to N, the number of energy levels of the considered species. We introduce the rate matrix (16)Where L_{1} is a lower triangular matrix of ones, A is a vector containing the spontaneous deexcitation coefficients (A_{ul}), Z is a matrix containing the radiation field (see Eq. (7)), C a matrix containing the collisional rates (n_{col}C_{ij}), and g is a vector containing the statistical weights (g_{i}).
Now the statistical equilibrium system can be written as, (17)with (18)where u is a ones vector. After discretization of the medium into P layers indexed by l, the vectorial function x(τ_{V}) becomes x^{l} and Eq. (16)becomes (19)as the radiation field, and thus the matrix Z depends implicitly on the populations in all layers, and the statistical equilibrium system can be rewritten, (20)To take care of the coupling between the layers and close the system to solve Eq. (20)for all l in a unique step, we build the multizone statistical equilibrium system: (21)We introduce the vector X = (x^{1},x^{2},...,x^{P}) and the blockdiagonal matrix Γ to rewrite Eq. (21)as a multiD nonlinear function, (22)With an initial X_{0} within the convergence radius, we can solve this nonlinear equation with a classical scheme such as the Newton method. An important question, however, is to determine whether the above system has a solution and whether the solution we expect to get is physical (unique and positive).
2.4. Existence, uniqueness and positivity of the solution
From a physical point of view, one naturally expects a unique and positive solution of the RTE. Reflecting on this problem is in fact a way to verify whether it is mathematically well formulated. Indeed, as stated above, the coupling between the populations and the radiation field is nonlinear. Without linearization, the solution can be regarded as the equilibrium state of Markov processes. Furthermore, because of radiative selection rules and collisional propensity rules, many factors are very small, and may numerically tend toward zero. The problem is thus potentially illconditioned.
The formulation adopted here is a multizone statistical equilibrium. The problem of uniqueness and positivity of the solution of the monozone statistical equilibrium has been addressed by Damgaard et al. (1992) and Rybicki (1997). As this system of equations is degenerate, one equation is usually replaced by a normalization condition, namely the conservation equation, to overcome the singularity of the associated matrix. Then, Damgaard et al. (1992) show that the solution is positive if all the rates are strictly nonzero. Yet certain transition probabilities vanish, physically or numerically. As long as all the levels are connected, just one nonzero cofactor guarantees the regularity of the matrix. More generally, Rybicki (1997) demonstrated that even for the system without a normalization condition, positive states can link (and be linked) only to positive states. Then, if the statistical equilibrium equations can be transformed into a set of uncoupled irreducible subproblems with a strictly positive normalization condition, the solution is positive and unique. In other words, the uniqueness and positivity properties of the general linear statistical equilibrium equations relate to the underlying connectivity properties of the states.
In our multizone formulation, the coupling between statistical equilibria in every atmospheric layer is strongly nonlinear. That is, the system cannot be split into uncoupled sets of equations, i.e., a set of uncoupled irreducible subproblems. This may impact the number of required normalization conditions, while only one per atmospheric layer (the conservation equation) is available. The uniqueness of the solution may then be demonstrated through the search for a nonlinear fixed point as the solution of the equilibrium of a Markov chain (P. Azerad, priv. comm.). A forthcoming paper will be devoted to this. At least, as the properties of connectivity still apply, the positivity of the solution is guaranteed.
3. Implementation
3.1. Nonlinear solving
In the method developed here, the computation of the statistical equilibrium is based on its exact formulation and explicitly includes radiative transfer effects through the mean radiation field in Eq. (22). Finding the root of this function naturally leads to the exact solution to the statistical equilibrium, and avoids any false convergence that may be obtained with stationary iterative methods.
The computation of the norm of a function, (23)N and P being the number of considered energy levels and the number of layers in the model atmosphere, respectively, is equivalent to an optimization problem, that is, the search for the minimum of the L_{2} norm of the function f, (24)The classical way to minimize this function is to move along a vector δX in a descending direction, using the Newton’s method. Indeed, for δX = X_{k}−X_{k−1} = −J^{1}·F, the gradient along the vector δX is (25)Newton’s method is very efficient in approaching the solution as it has a quadratic convergence. For each successive Newton step, we have to solve the linear problem JδX_{k} = −F, where k is the index of the Newton iteration. However, with classical algebric methods, this operation leads to an algorithm, which is prohibitive in terms of computational time for largescale systems. A robust and efficient way to accelerate this operation is to approximate the solution using a faster algorithm. We adopt GMRES, which is a subclass of Krylov subspace methods.
A Krylov subspace method begins with an initial guess . At the nth subiteration, is determined through a correction in the nth Krylov subspace, (26)The main advantage of this method is that its implementation requires only some vector matrix (or vector transposed matrix) products. It leads to a complexity in which may be reduced to if the matrix is sparse. Moreover, the scheme needs operations, but the algorithm can be restarted to minimize this contribution.
For each Newton step, we thus have to solve the linear problem JδX = −F, which will be rewritten in this section as Ax = b for simplicity. Stationary iterative methods (Jacobi, GaussSeidel, SOR) approximate the unknown vector x with a scheme x^{(k)} = Bx^{(k−1)} + c, where neither B nor c depend upon the iteration count k. We use the GMRES method, which is part of a class of (nonstationary) methods named Krylov subspace methods. The GMRES algorithm tries to evaluate the guess (the nth order Krylov subspace) by solving an optimization problem (minimizing the residue ∥ Ax_{n}−b ∥ _{L2}) like a leastsquare problem. The nth Krylov subspace is spanned over the n basis vectors, (27)Then, the approximate solution is x_{n} = K_{n}c, where the Krylov matrix is K_{n} = ^{[}b,Ab,A^{2}b...,A^{n−1}b^{]} and c is an appropriate vector such that (28)This vector is easily computed as it just requires a Matrixvector product. This is particularly well suited for sparse matrices because this operation evolves roughly as the number of nonzero elements.
A straightforward method to find the leastsquare solution of this problem would be to compute the QR factorization of the matrix AK_{n}. Indeed, the normal equation A^{T}Ax = A^{T}b ⇔ Rx = Q^{T}b can be solved by back substitution. However, this is both unstable and too expensive (the QR factorization requires 2mn^{2} flops, d = Q^{T}b, 2mn flops, and Rx = d, n^{2} flops, where m is the iterate and n is the order of the Krylov subspace). Moreover this natural Krylov basis is illconditioned and is thus not a good choice. Indeed, the basis vectors have to be linearly independent, and the successive multiplications by A lead to vectors, which point (fastly) to the dominant eigenvector of A and are thus numerically collinear. For this reason, we build another space , , where the basis vectors are orthogonal (this orthogonalization of Krylov bases is named the Arnoldi method). During the Arnoldi step to build the orthogonal basis, the matrix H of the orthogonalization elements is built. This matrix has a special shape (Hessenberg shape), which is close to triangular (one subdiagonal more). Our minimization problem can be rewritten as (29)where y designs the new unknown (Q_{n}y = x_{n}).
Furthermore, we have to combine this method with a global convergence strategy. Indeed, as one gets close to the solution, the choice of the descending direction may lead to spurious effects, especially if δX becomes very large. Then, to ensure the global convergence of the scheme, we use a combination of the line search and backtracking methods, which is an efficient method to solve nonlinear equations. Precisely, when f_{k}>f_{k−1}, δX may be too large. Then, δX is reduced by a factor λ, (30)with λ chosen to minimize f(X_{k−1} + λδX) to maintain the iteration in the descending direction.
3.2. Treatment of the singularity
The computation of goes through the integration of a singular function. Indeed, in the integrand of Eq. (12), the function E_{1}(x) has a logarithmic divergence at x = 0^{+}, see Fig 3.
To integrate this kernel, we use the periodization method developed by Helluy et al. (1998), which consists of a high order quadrature method with a variable change. The new variable is a polynomial of degree k, properties of which improve on the order of rectangle rule. The advantage of this method is that it can handle singular functions. Indeed the error  E_{N}(f)  on an interval subdivided in N subintervals goes as ~C_{k}/N^{γ}, where γ → (2k−1) if f has a logarithmic singularity at one bound of the integration domain (see Table 1).
Benchmark of the periodization method for a logarithmic singularity (from Helluy et al. 1998).
However the periodization method requires that the integration is done over the interval [0,1] and that the singularity is located at one extremum of this interval. We thus split the integration domain into two subdomains, and , and then normalize each subdomain to [0,1]. However, since the MPI strategy limits the number of model layers handled by each process, this can lead to the undersampling of one of the subdomains, when τ_{V} approaches one of the bounds of the total domain (i.e., the boundaries of the model atmosphere). We have thus tested this method under the same conditions as those met in our problem, without any resampling of the subdomains.
Fig. 3 Map of the integrand of for a radiative transition (see Eq. (12c)). The diagonal is singular, and the quadrature in the periodization method produces a mesh refinement around it. The color encoding of the integrant is in logarithmic arbitrary units. 

Open with DEXTER 
The result is displayed in Fig. 4. As expected, the relative error strongly increases when one of the subdomains includes too few model layers (typically fewer than 4) and becomes unacceptable for the extreme points. They will thus be considered ghost points in all subsequent computations. This does not affect the computation of the statistical equilibrium.
Fig. 4 Benchmarking of the periodization method applied to the integration with a logarithmically singular kernel (g(t) = a_{0} + a_{1}t, K(x,t) = −γ−ln(  G(t)−G(x) ) and ). The integration domain has been split into two subdomains (see text for details). 

Open with DEXTER 
Because we can compute the nonlinear function F(X) with good accuracy, the next step to find its root is to compute the Jacobian matrix to solve the system through a Newton’s scheme.
3.3. Computation of the Jacobian matrix
The method presented here, and more generally the solving of nonlinear systems of equations, requires the computation of the Jacobian matrix. One of the advantages of the method we develop is that an analytical form can be derived. Here, the function F(X) is a set of N × P equations, N being the number of energy levels of the species considered, and P the number of layers in the discretized model atmosphere. The differentiation of over (i and k referring to energy levels, and l and m to atmosphere layers) leads to a set of integrodifferential equations, with the following four cases: Equations (31a)and (31b)represent local couplings. Mathematically, they correspond to the diagonal and blockdiagonal parts of the Jacobian, respectively. Physically, Eq. (31a)describes perturbations of the statistical equilibrium of a given level, by modifying the population of this level, within one atmospheric layer. Equation (31b)describes the perturbations of the statistical equilibrium of a given level, by modifying the population of other levels, still within one atmospheric layer. Equations (31c)and (31d)correspond to offdiagonal blocks and describe nonlocal perturbations, i.e., the influence of the populations in another layer. Naturally, in these last differentiations the coupling is purely radiative, whereas the first and second differentiations include collisional processes.
To determine the terms , one needs to differentiate an integral expression over the variable . This derivation must be considered a functional derivative, and be performed before discretization (see Appendix B for the demonstration), (32)where δ(τ_{V}−s) is the Dirac distrbution.
In the case τ_{V} ≠ s (l ≠ m), the derivatives for each component are However is not differentiable at x(t) = x(τ_{V}) because of the singularity, therefore Eq. (33a)is undetermined for s = τ_{V}. For these local terms, we thus need an approximate expression of the radiation field. We emphasize that we only apply this approximation to the computation of the Jacobian matrix.
Fig. 5 Pattern of the Jacobian matrix for 4 model atmosphere layers and 200 energy levels. 

Open with DEXTER 
3.4. Sparsity of Jacobian matrix
In Sect. 3.1, we argue that GMRES is a welladapted method for largescale problems if the Jacobian matrix is sparse, as it only requires matrixvector products, and no matrix inversion. The sparsity (or density) coefficient is defined by the ratio of nonzero elements over the total number of elements. The Jacobian matrix is composed by P × P submatrix of N × N elements leading to NP × NP elements. The block diagonal of this matrix is full because of the irreducibility of the collisional coupling within a given layer (no selection rules for collisions). The number of nonzero elements in the blockdiagonal part is P × N^{2}. In the P(P−1) offdiagonal submatrices, the nonzero elements are only due to radiative coupling between different layers, and their number is thus linked to the selection rules for allowed transitions. These rules thus lead to a sparsity of each submatrix with a density ρ_{mol} = N_{transitions}/N^{2}. The density of the complete Jacobian matrix is then given by (34)Then, as ρ_{mol}(N) decreases rapidly with N (e.g., ρ_{H2O} ~ 10^{3}, for N = 411, see Figs. 6 and 7) and ρ is inversely proportional to P, the larger the system, the sparser the Jacobian matrix. The GMRES scheme is thus well adapted and asymptotically approaches the NP complexity.
Fig. 6 Example of the evolution of the offdiagonal, blockmatrix sparsity (ρ_{mol}) according to the number of considered levels. This example is computed for the 411 levels of the orthoH_{2}O molecule. 

Open with DEXTER 
Fig. 7 Evolution of the Jacobian matrix sparsity according to the number of atmospheric layers for different values of ρ_{mol}. 

Open with DEXTER 
3.5. Jacobian blockdiagonal approximation
Assuming that τ_{T}(τ_{V})−τ_{T}(t) ∝ (t−τ_{V}) (first order linearization) when t is close to τ_{V}, because the kernel decreases when  t−τ_{V}  increases, we assume that every function at t is equal to its value at τ_{V}.
Splitting the integral over t at τ_{V} and inverting the integrals over t and μ, Gonzalez Garcia et al. (2008) show that the mean intensity can be rewritten as where β_{−}(τ_{V}) and β_{+}(τ_{V}) are inward and outward escape probabilities (defined in Gonzalez Garcia et al. 2008). The β(τ_{V}) invoke the function E_{2}(x), which is not singular and differentiable everywhere in the domain. This enables an analytic estimate of the bloc diagonal terms. The pattern of the Jacobian is displayed in Fig. 5.
3.6. The numerical code MOrad
The above strategy has been implemented in a numerical code named MOrad written in Fortran 90. It takes as inputs:

1.
A model atmosphere (temperature, atomic, molecular, andelectronic densities etc. as a function of geometrical thickness).

2.
Spectroscopic data (level energies, statistical weights, radiative rates).

3.
Collisional rates.
Then it computes the continuous opacities and uses an initial guess to calculate the function (22)and its Jacobian matrix. Choosing the initial guess X_{0} is an important operation. In MOrad we tested the use of an LTE solution (Boltzmann distribution), and a 0 K solution (the molecules in the ground level). We find that the convergence rate is better with the “0 K” initial guess.
The singular integrations needed to compute are performed with the periodization method. The Green’s function needed to compute is stored at each point. The required space scales as the number of layers times the number of frequency points (in a parallel code; otherwise it would scale with the square of the number of layers). The evaluation of the function and the Jacobian are parallelized with MPI. At first glance, two parallelization strategies may be considered: either a parallelization over frequencies or over atmosphere layers. However, van Noort et al. (2002) show that the most efficient parallelization to reduce communications is spatial parallelization because of the strong local coupling between energy levels. The populations of all energy levels within each atmosphere layer should thus be computed by the same processor, with several adjacent layers possibly treated by the same processor. In MOrad we adopt a subdomain decomposition where each processor computes the physical values needed in one given layer and solves the local statistical equilibrium. Integrations are performed with a scalable algorithm, which minimizes global communications to preserve a good scalability.
The code has been interfaced with a nonlinear solver of the parallel and scalable library PETSc (see, e.g., Balay et al. 2013b,a, 1997). This solver takes as input the distributed function vector and the Jacobian matrix and returns the root of the function after preconditioning.
The main advantages of this library are its performance and its scalability. Moreover, PETSc provides access to many different nonlinear methods and preconditioning, which allows us a versatile choice of the best method to be adapted for a given physical problem. Other efficient libraries, such as the parallel IO library HDF5, are interfaced with MOrad to preserve the performances for largescale problems and the portability of output data. Future planned developments include an MPI/OpenMP hybridization (openMP parallelization over frequencies).
4. Discussion
4.1. Example of application
To illustrate the applicability of MOrad, we compute the nonLTE departure coefficients of water molecules in a MARCS (Gustafsson et al. 2008) red supergiant model atmosphere. The model considered here has an effective temperature of 3500 K, a log g = 0, a solar metallicity, and a microturbulence of 2 km s^{1}. These parameters correspond to typical red supergiant stars. We consider more than 800 rovibrational levels, leading to more than 330 000 transitions and 15 000 lines (see the Grotrian diagram of ortho H_{2}O in Fig. 8). The energy levels and the radiative coefficients are taken from Barber et al. (2006). The collisional rates for H_{2}O–H_{2} and H_{2}O–e^{−} are taken from Faure & Josselin (2008).
Fig. 8 Grotrian diagram of the orthoH_{2}O molecule showing the radiative transitions taken into account. The parameter J_{n} is the rotational angular momentum of H_{2}O. Colors indicate different vibrational states. 

Open with DEXTER 
The code MOrad was launched on 47 processors using a GMRES NewtonKrylov method and a linesearch global convergence strategy. We conducted preconditioning with the AMS method (Block Additive Schwarz method), where each subblock is preconditioned with an approximate iterative LU factorization method. We obtain the root X^{⋆} of the function (∥ F(X^{⋆}) ∥ _{∞} ≤ 10^{12}) after three nonlinear iterations and a total time of code execution of less than ~1 h on the Alarik LUNARC system (see Lunarc 2013). The departure coefficients are presented in Fig. 9. The convergence rate is Qquadratic, though a quadratic convergence rate would be expected. This may be because of an over reconditioning or a problem that is too simple (close to linear), but we obtain the same Qquadratic convergence for a purely radiative problem.
Fig. 9 LTE departure coefficients of orthoH_{2}O in a red supergiant model atmosphere. 

Open with DEXTER 
A detailed analysis of the results is out of the scope of this paper and is devoted to a forthcoming paper; a preliminary analysis was done in Lambert et al. (2013). In particular a detailed discussion of the use of the superlevel approximation, which seems appropriate for the vibrationally excited states, will be presented. In short, the nonLTE calculations lead to stronger lines, especially rotational lines in the fundamental vibrational state around 2 μm, compared to LTE computation.
4.2. Performance and complementarity of the method
The method presented here is naturally accurate as it is equivalent to finding the root of a function and thus avoids any false convergence (assuming the solution is unique)^{3}. Furthermore, we emphasize that because the integration is analytic, the method leads naturally to null residue. Concerning the performance of the method, the performances of the method are difficult to assess. The convergence rate depends upon the choice of both the solver and the preconditioning. These choices are not unique and may vary according to the considered problem (species and model atmosphere). The memory demands scales with the dimensionality of the considered problem.
The computation of the function and the Jacobian matrix leads to a scheme with an asymptotic complexity in . The complexity of the resolution scheme is more difficult to evaluate because the preconditioned NewtonKrylov method has a cost in N_{freq} × N^{2} × P^{2} (N_{freq} being the number of frequency points sampling the line profile, N the number of energy levels, and P the number of layers in the discretized atmosphere), but invokes another term that evolves as the square of the iteration index. This term is partially eliminated by restarting GMRES and is not really important in practice because the number of iterations is small after appropriate preconditioning.
Furthermore, the performance is better than and probably close to . Indeed, for a sparse matrix, GMRES evolves in . Because the only nonzero elements of the offdiagonal blocks of the Jacobian matrix are due to radiative coupling between layers, the sparsity of J evolves as the ratio of the number of radiative transitions over N^{2}. Taking a complexity in Nb_{freq} × N^{2} × P^{2} is then an superior limit, which underevaluates the method scalability in real cases.
As a comparison, the computation of the equivalent function of F(X) in MULTI has a complexity evolving in N × P because of the Scharmer operator, which performs no numerical integration to evaluate diffusion integrals but only one quadrature point. The system is solved with a NewtonRaphson scheme, which has a cost in N^{3} × P^{2} when using the full matrix or N^{3} × P when using the local operator (blockdiagonal matrix; M. Carlsson, priv. comm.). The theoretical asymptotic speedup for MOrad compared to MULTI are summarized in Table 2.
Theoretical asymptotic speedup for two extreme cases in terms of ρ = sparsity(J)(0 ≤ ρ ≤ 1) for MOrad, compared to MULTI with either full or local operator.
From Table 2, one can see that our method is complementary to existing methods, such as direct linearization. Indeed, in some cases, with a large number of levels and a small number of layers, the expected time of computation is shorter.
5. Conclusion
We have developed a new method to solve the RTE in nonLTE conditions. Rather than solving the RTE itself, we solve the statistical equilibrium equation system, which includes an analytical formal solution of the RTE. One of the main advantages of this choice is that the coefficients of the associated Jacobian matrix can be computed exactly. With this approach, a nonlinear solver can be used. In other words, this method avoids any linearization or stationary iterative schemes, which may converge very slowly or even falsely in some extreme cases (a very low nonLTE parameter ϵ and/or highly discretized media). We chose a method based on the GMRES nonlinear method. It has been implemented in a code that uses the PETSc library. We pay special attention to the proper treatment of the singularity that arises in the integration of the mean radiation field.
Furthermore, this method is parallelized using MPI, which makes it applicable to largescale systems such as molecules and atoms in stellar atmospheres or interstellar medium, as it has an excellent scalability with the number of energy levels. Indeed, our code has been successfully applied to the modeling of water in the atmospheres of red supergiants. The results will be presented in a forthcoming paper. Further developments of the code include hybrid parallelization, with openMP parallelization over frequencies. We then plan to make the code public.
We already emphasize here that solving either the RTE explicitly or the statistical equilibrium equation is strictly identical, and both equations are of similar mathematical nature, as shown, for instance, in the Appendix of Hauschildt et al. (1995).
The solution of the Eddington problem (the twolevel atom in the isothermal optically thick medium) is recovered at the 1st iteration with preconditioning, and after five iterations without preconditioning (relative error <0.005 for ϵ = 10^{5} and τ_{max} = 1000). This is expected because the inversion of the matrix by the Krylov subspace method is well known to work in this kind of situation.
Acknowledgments
We are grateful to Pascal Azerad for fruitful discussions and Karin Ryde for proofreading the English in the manuscript. We acknowledge financial support from the Swedish Karl Tryggers Foundation under grants CTS 12:408 and CTS 13:388 and its investment in science and astrophysics. Moreover, we thank the French National Agency for Research (ANR) through program number ANR06BLAN0105, and from “Programme National de Physique Stellaire” (PNPS) of CNRS/INSU, France. N.R. is a Royal Swedish Academy of Sciences Research Fellow supported by a grant from the Knut and Alice Wallenberg Foundation. Funds from Kungl. Fysiografiska Sällskapet i Lund and support from the Swedish Research Council, VR are gratefully acknowledged.
References
 Ahues, M., D’Almeida, F., Largillier, A., Titaud, O., & Vasconcelos, P. 2002, J. Comput. Appl. Math., 140, 13 [NASA ADS] [CrossRef] (In the text)
 Auer, L. 1991, in NATO ASIC Proc. 341: Stellar Atmospheres − Beyond Classical Models, eds. L. Crivellari, I. Hubeny, & D. G. Hummer, 9 (In the text)
 Balay, S., Gropp, W. D., McInnes, L. C., & Smith, B. F. 1997, in Modern Software Tools in Scientific Computing, eds. E. Arge, A. M. Bruaset, & H. P. Langtangen (Birkhäuser Press), 163 (In the text)
 Balay, S., Brown, J., Buschelman, K., et al. 2013a, PETSc Users Manual, Tech. Rep. ANL95/11 − Revision 3.4 (Argonne National Laboratory) (In the text)
 Balay, S., Brown, J., Buschelman, K., et al. 2013b, PETSc Web page, http://www.mcs.anl.gov/petsc (In the text)
 Barber, R. J., Tennyson, J., Harris, G. J., & Tolchenov, R. N. 2006, MNRAS, 368, 1087 [NASA ADS] [CrossRef] (In the text)
 Cannon, C. J. 1973, J. Quant. Spec. Radiat. Transf., 13, 1011 [NASA ADS] [CrossRef] (In the text)
 Castor, J. I. 1970, MNRAS, 149, 111 [NASA ADS] (In the text)
 Damgaard, P. H., Hjorth, P. G., & Thejll, P. A. 1992, A&A, 254, 422 [NASA ADS] (In the text)
 Dickel, H. R., & Auer, L. H. 1994, ApJ, 437, 222 [NASA ADS] [CrossRef] (In the text)
 Fabiani Bendicho, P., Trujillo Bueno, J., & Auer, L. 1997, A&A, 324, 161 (In the text)
 Faure, A., & Josselin, E. 2008, A&A, 492, 257 [NASA ADS] [CrossRef] [EDP Sciences] (In the text)
 Gonzalez Garcia, M., Le Bourlot, J., Le Petit, F., & Roueff, E. 2008, A&A, 485, 127 [NASA ADS] [CrossRef] [EDP Sciences] (In the text)
 Gustafsson, B., Edvardsson, B., Eriksson, K., et al. 2008, A&A, 486, 951 [NASA ADS] [CrossRef] [EDP Sciences] (In the text)
 Hauschildt, P. H., Starrfield, S., Shore, S. N., Allard, F., & Baron, E. 1995, ApJ, 447, 829 [NASA ADS] [CrossRef] (In the text)
 Helluy, P., Maire, S., & Ravel, P. 1998, Comptes Rendus de l’Académie des Sciences Paris, Série Sciences Mathématiques, 327, 843 (In the text)
 Hubeny, I., & Mihalas, D. 2014, Theory of Stellar Atmospheres (Princeton University Press) (In the text)
 Juvela, M. 2005, A&A, 440, 531 [NASA ADS] [CrossRef] [EDP Sciences] (In the text)
 Klein, R. I., Castor, J. I., Dykema, P. G., Greenbaum, A., & Taylor, D. 1989, J. Quant. Spec. Radiat. Transf., 41, 199 [NASA ADS] [CrossRef] (In the text)
 Lambert, J., Josselin, E., Ryde, N., & Faure, A. 2013, in EAS Pub. Ser. 60, eds. P. Kervella, T. Le Bertre, & G. Perrin, 111 (In the text)
 Lunarc 2013, Alarik system details, http://www.lunarc.lu.se/Systems/AlarikDetails (In the text)
 Ng, K.C. 1974, J. Chem. Phys., 61, 2680 [NASA ADS] [CrossRef] (In the text)
 Olson, G. L., Auer, L. H., & Buchler, J. R. 1986, J. Quant. Spec. Radiat. Transf., 35, 431 [NASA ADS] [CrossRef] (In the text)
 Paletou, F., & Anterrieu, E. 2009, A&A, 507, 1815 [NASA ADS] [CrossRef] [EDP Sciences] (In the text)
 Rybicki, G. B. 1997, ApJ, 479, 357 [NASA ADS] [CrossRef] (In the text)
 Scharmer, G. B., & Carlsson, M. 1985, J. Comput. Phys., 59, 56 [NASA ADS] [CrossRef] (In the text)
 SocasNavarro, H., & Trujillo Bueno, J. 1997, ApJ, 490, 383 [NASA ADS] [CrossRef] (In the text)
 Trujillo Bueno, J., & Fabiani Bendicho, P. 1995, ApJ, 455, 646 [NASA ADS] [CrossRef] (In the text)
 Štěpán, J., & Trujillo Bueno, J. 2013, A&A, 557, A143 [NASA ADS] [CrossRef] [EDP Sciences] (In the text)
 van Noort, M., Hubeny, I., & Lanz, T. 2002, ApJ, 568, 1066 [NASA ADS] [CrossRef] (In the text)
Appendix A: Formulation
One starts with the classical formulation of the radiative transfer equation in a plane parallel geometry, (A.1)The monochromatic extinction and emissivity coefficients are expressed with an explicit dependence on levels populations as follows: By inserting relative populations f_{i} = n_{i}/n^{k} ,where n^{k} is the total density of the species labeled by k, one gets One brings up the new variable x_{i} = f_{i}/g_{i}, and using the relation B_{ul}g_{u} = B_{lu}g_{l}, where g_{i} is the statistical weight of the level i, yields Including the formulation of and in the radiative transfer equation, one obtains (A.5)In order to create a transition independent scale, the previous equation is divided by an arbitrarily chosen continuum extinction coefficient χ_{V}, here χ_{V} ≡ χ_{ν = 500 nm}. Moreover we express it per hydrogen atom to bring out the ratio n^{k}/n^{H} for numerical reasons. This normalization is symbolized by a tilde (~) for other variables, (A.6)With , , , , , and dτ_{V} = −χ_{V}ds. Assuming that the continuum is formed in LTE (), one obtains (A.7)
Appendix B: Jacobian matrix coefficients
We start with the analytic formulation of the mean intensity field for the line component (similar demonstration for contunium), i.e., (B.1)Because the functional derivative of a functional product is (B.2)we functionally derive the product x_{i}(t)K [x_{i},x_{j}] over x_{i}(τ_{V} = s), which yields, (B.3)Equation (B.3) can be rewritten as a sum of two integrals over t, I_{1} and I_{2}.
The first term I_{1} simplifies (B.4)This term is an indefinite form if s = τ_{V}, because the kernel K(x,y) is singular at x = y.
We consider here the case s ≠ τ_{V} (the case s = τ_{V} is detailed in Sect. 3.4).
The second term I_{2} requires the computation of the functional derivative of the kernel, which is (B.5)With the chain rule theorem of functional derivatives, we get (B.6)We then derive the kernel K, finding (B.7)To simplify the calculus, we rewrite as (B.8)where the distribution Θ(t,τ_{V},s) = −2(θ(t−τ_{V})−0.5)(θ(s−t)−θ(s−τ_{V})), and θ(x) is the Heaviside function. This distribution can be rewritten, (B.9)We functionally derive Eq. (B.8), yielding (B.10)We insert this expression in Eq. (B.7) and we define the modified kernel , (B.11)The integral I_{2} from Eq. (B.3) can be rewritten as (B.12)which, according to the Θ properties, becomes (B.13)Finally, we get the derivative of the mean radiation field, (B.14)
Appendix C: Definition of the variables
Physical quantities.
Functions, operators, and distributions.
All Tables
Benchmark of the periodization method for a logarithmic singularity (from Helluy et al. 1998).
Theoretical asymptotic speedup for two extreme cases in terms of ρ = sparsity(J)(0 ≤ ρ ≤ 1) for MOrad, compared to MULTI with either full or local operator.
All Figures
Fig. 1 Spectral radius of amplification matrix for the Λ iteration method. Convergence is shown for two spectral radii, marked with letters A (upper panel) and B (lower panel) in the righthand panels. The thin black lines correspond to successive iterations; the true solution is given by the orange line. 

Open with DEXTER  
In the text 
Fig. 2 Same as Fig. 1, except for ALI with Λ^{⋆} = diagonal of Λ. 

Open with DEXTER  
In the text 
Fig. 3 Map of the integrand of for a radiative transition (see Eq. (12c)). The diagonal is singular, and the quadrature in the periodization method produces a mesh refinement around it. The color encoding of the integrant is in logarithmic arbitrary units. 

Open with DEXTER  
In the text 
Fig. 4 Benchmarking of the periodization method applied to the integration with a logarithmically singular kernel (g(t) = a_{0} + a_{1}t, K(x,t) = −γ−ln(  G(t)−G(x) ) and ). The integration domain has been split into two subdomains (see text for details). 

Open with DEXTER  
In the text 
Fig. 5 Pattern of the Jacobian matrix for 4 model atmosphere layers and 200 energy levels. 

Open with DEXTER  
In the text 
Fig. 6 Example of the evolution of the offdiagonal, blockmatrix sparsity (ρ_{mol}) according to the number of considered levels. This example is computed for the 411 levels of the orthoH_{2}O molecule. 

Open with DEXTER  
In the text 
Fig. 7 Evolution of the Jacobian matrix sparsity according to the number of atmospheric layers for different values of ρ_{mol}. 

Open with DEXTER  
In the text 
Fig. 8 Grotrian diagram of the orthoH_{2}O molecule showing the radiative transitions taken into account. The parameter J_{n} is the rotational angular momentum of H_{2}O. Colors indicate different vibrational states. 

Open with DEXTER  
In the text 
Fig. 9 LTE departure coefficients of orthoH_{2}O in a red supergiant model atmosphere. 

Open with DEXTER  
In the text 