EDP Sciences
Free Access
Issue
A&A
Volume 580, August 2015
Article Number A50
Number of page(s) 12
Section Numerical methods and codes
DOI https://doi.org/10.1051/0004-6361/201322852
Published online 29 July 2015

© ESO, 2015

1. Introduction

Radiative transfer plays a central role in astrophysics. The solution of the radiative transfer equation (RTE) is required to properly interpret any spectrum observed from an astrophysical object. Furthermore the radiation field often represents an important ingredient in hydrodynamics, for instance, through the energy budget (radiative heating and/or cooling) and the radiative pressure. However, as we discuss below, while the standard expression of the RTE is an ordinary differential equation of the 1st order, it may be highly nonlinear and nonlocal, especially because of scattering processes. The solution of the RTE thus requires specific procedures. Furthermore, with the development of infrared (IR) astronomy (e.g., from VLT, CRIRES in the near-IR to ALMA through Herschel/HIFI), one is now often confronted with very large-scale problems resulting from molecular spectra, which most techniques developed so far cannot handle.

The techniques for solving the RTE may be divided into a few strategies. Among approximate methods, one that is widely used, relies on the escape probability technique (e.g., Castor 1970), which does not take the nonlocality of the RTE into account. Secondly, Monte Carlo methods can be used because of the probabilistic nature of scattering processes. These methods can successfully be adapted to any geometry, but may be unadapted when large optical depths are met (see, e.g., Juvela 2005, for a recent review). We do not discuss these two classes of methods hereafter.

To properly treat the nonlocality and the nonlinearity of the RTE, it is usually rewritten in the form of a diffusion integral equation, which is then solved iteratively1. This integral form is usually referred to as the Λ operator. The form involves a kernel, which determines the mathematical nature of the integral equation. For example, in the case of an assumed plane parallel geometry, the kernel involves the exponential integral function, and the RTE then becomes a weakly singular Fredholm equation of the second type (this singularity is an important aspect, which will be treated below). It can then be formally solved thanks to appropriate algorithms, such as the Atkinson algorithm (Ahues et al. 2002). Since the inversion of the Λ operator is usually numerically prohibitive, one needs an iteration scheme. It is well known that the iteration with the complete Λ operator at best converges very slowly or even stabilizes far from the real solution. A major step in solving the RTE has been the introduction of the so-called approximate lambda iteration (ALI), initially by Cannon (1973), then called the operator splitting method. The ALI is in fact an application of the Jacobi method to the RTE, as shown by Olson et al. (1986), who greatly improved the mathematical understanding of the problem. In these methods, the choice of the approximate operator (denoted Λ) is crucial. The most common choice for Λ is the diagonal part of Λ, which ensures a rapid convergence if Λ is diagonally dominant matrix (Gershgorin’s theorem). ALI has been applied to a large variety of problems, including 3D polarized transfer (Štěpán & Trujillo Bueno 2013). This latter work also emphasizes the potential progress that can be made by taking advantage of massively parallel computing.

Meanwhile, new mathematical techniques have been applied to the RTE problem, especially the Gauss-Seidel method and the successive over relaxation (SOR) method, which significantly improve the convergence properties (Trujillo Bueno & Fabiani Bendicho 1995) that are determined by the spectral radius of their amplification matrix. Besides these, complete linearization methods have also been developed, the well-known case being MULTI (Scharmer & Carlsson 1985). Despite apparent different formulations, the two approaches are essentially the same, as shown by Socas-Navarro & Trujillo Bueno (1997).

Indeed, all these methods are based on linearization techniques. In particular, the dependence of the radiation field on the populations of the energy levels is approximated. Furthermore, all these methods are stationary (in the case of SOR, this is true if the relaxation parameter is constant over iterations), meaning that the spectral radius is constant. As the error vector em at the mth iteration is proportional to the power m of the spectral radius of the amplification matrix | λmax | m, the convergence may thus be very slow in some cases. For the Lambda iteration, the spectral radius is close to 1 when the photon destruction probability ϵ ≪ 1 and/or the total optical depth is 1, explaining why it falsely converges. This is illustrated in Figs. 1 and 2, where the classical Eddington problem (two-level atom with constant ϵ) is solved with Λ iteration (LI) and ALI, respectively. The false convergence of LI and the very low convergence of ALI are illustrated in cases B. Indeed, their rate of convergence is very slow for strongly nonlocal thermodynamical equilibrium (non-LTE) problems (non-LTE parameter ϵ ≪ 1, ϵ being the fraction of the source function given by the Planck function, or the photon destruction probability) and to the number of points per decade of optical depth. In the latter case (i.e., a fine sampling of the stellar atmosphere), this problem of slow convergence is well explained by Fabiani Bendicho et al. (1997), who also showed that the multigrid method can nicely circumvent this problem. This fine sampling is however rarely required, especially as classical model atmospheres are static. An excellent discussion of the general problem of false convergence is given in Chap. 13 in Hubeny & Mihalas (2014).

The iterative schemes are usually combined with a technique of acceleration of convergence (Auer 1991). These fall into two main categories. The first is based on the minimization of the residual, which is the case of the Ng acceleration (Ng 1974). The second class relies on minimization with respect to a set of conjugate vectors. This was first applied to the RTE by Klein et al. (1989; see also Dickel & Auer 1994).

The second class of techniques (the minimization with respect to a set of conjugate vectors) leads naturally to more general, nonlinear, nonstationary methods, such as the conjugate gradient and the generalized minimum residual method (GMRES). To our knowledge, only the conjugate gradient has been applied to the RTE, by Paletou & Anterrieu (2009).

The aim of our work is to develop a new nonstationary method for solving the RTE, which fully takes its nonlinearity into account. This new method thus opens the possibility to explore the mathematical consequences of this intrinsic property of the RTE. Furthermore, circumventing the possible slow or false convergence of stationary methods should allow an appropriate treatment of some specific cases, such as, e.g., shocked regions, which require a fine sampling and present a large dynamics of optical depth values. Finally, as the coding strategy of this method has been conceived as a parallel code from the beginning, it is applicable to large-scale systems such as molecular spectra, and this was indeed our initial motivation (a posteriori parallelization is often less efficient, or even impossible).

The derivation of the equations, including the Jacobian matrix coefficients, used in the nonlinear method is presented in Sect. 2. The implementation of this method and its parallelization are described in Sect. 3. The application of the code to a specific problem facilitates the determination of its convergence properties, as shown in Sect. 4.

thumbnail Fig. 1

Spectral radius of amplification matrix for the Λ iteration method. Convergence is shown for two spectral radii, marked with letters A (upper panel) and B (lower panel) in the right-hand panels. The thin black lines correspond to successive iterations; the true solution is given by the orange line.

Open with DEXTER

thumbnail Fig. 2

Same as Fig. 1, except for ALI with Λ = diagonal of Λ.

Open with DEXTER

2. Global strategy

Contrary to the two-level atom problem with a constant destruction probability, which can be directly solved by direct inversion (or through the analytic Eddington solution). Multi-level atom or molecule problems require an iterative determination for two reasons. Firstly, the multi-level problem is nonlinear. Secondly, the radiative transfer equation is nonlocal, and one has to propagate the modification of the field during iterations. As we discussed in the previous section, however, this stationary iteration may be problematic for large-scale systems. The basic concept of the strategy we adopt is to suppress the iterations due to the nonlocality of the RTE by considering the full system (spatially) and computing the radiation field exactly (assuming given populations) with an analytic formulation of the mean intensity field, which explicitly depends on populations.

2.1. Formulation of the problem

For clarity, we hereafter consider the special case of plane parallel geometry. In this context the 1D RTE is (1)where Iν is the specific intensity, η and χ are the emission and extinction coefficients, and s is the geometrical path. We explicitly separate the continuum and the line processes (all demonstrations and symbols can be found in Appendices A and C, respectively). Assuming complete redistribution, we introduce classical expressions for line processes, where u and l refer to the upper and lower levels, respectively. Hereafter, if the distinction between upper and lower levels is not required, the indices i and j are used. The normalized line profile is and A and B are the Einstein coefficients.

Following the strategy of Gonzalez Garcia et al. (2008), we then introduce a unique optical depth scale for all frequencies, e.g,. the commonly used τV scale (optical depth at 500 nm). Equation (1)then becomes (3)where

The notation means that the quantity is expressed per hydrogen atom. Details can be found in Appendix A. As implicitly assumed in Eq. (3), we do not consider line overlap. This assumption is justified in the case of infrared molecular lines. Otherwise, a summation over transitions (at least adjacent ones) would be required.

This formulation depends explicitly on the populations of each energy level of the species considered. These populations are solutions of the statistical equilibrium equation system2, which is for each level ni, the detailed balance of incoming and outgoing (de-)excitation processes, defined by radiative (Einstein coefficients), and collisional rates (ncol being the local density of the colliding partner). Thus the statistical equilibrium equations take the form (5)where the mean radiation field depends on the solution of the RTE, (6)Because this system of equations is degenerate, one of them is replaced by the normalization equation , where nk is the total density of the species k.

To express each equation of the statistical equilibrium system with the same unknowns as in Eq. (3), we rewrite the statistical equilibrium with the variables xi defined in Eq. (4). Moreover, for numerical reasons, we make a variable change for the mean intensity, (7)This leads to a matrix formulation of the statistical equilibrium, with an upper triangular part corresponding to incoming processes, and a lower triangular part corresponding to outgoing processes. Finally, thanks to the Einstein relations ( and Blu = (gu/gl)Bul), the system of Eqs. (5)becomes (8)Because radiative transfer is a nonlocal problem, an integral formulation of the mean radiation field with an explicit dependence on populations is required.

2.2. The mean radiation field

To derive the mean radiation field, the first step consists of deriving the formal solution of the RTE. A classical way to obtain an integral form of the formal solution of a differential equation is to use the Green’s functions Gμν. This method consists of replacing the source term by a Dirac function. Taking the boundary conditions into account, the equation is then solved using its Laplace transform. The complete solution is then obtained after integration of this Green’s function, that is, performing a summation over each point source, according to the superposition principle (which is valid if the differential operator is linear). Thus, the integral form of Iν(τV) can be obtained by the determination of the Green’s function of Eq. (3). This Green’s function is easily computable for spherical and other geometries using some symbolic tools such as Mathematica, making it possible to generalize the method for 2D or 3D problems. Knowing this Green’s function, we can directly obtain the formal integral form of the specific intensity, which is (9)We assume the line profile does not depend on μ. This expression is thus not applicable in the case of anisotropic scattering or in the presence of radiation fields, but is valid in the case of classical static model atmospheres. The formal expression of the mean intensity, , requires an integration over solid angles and frequencies according to Eq (6). We assume that Dijxiζ and ξνSν are angular independent (physically this corresponds to the assumption of isotropic scattering) and frequency independent over the line width (ξνSνξijSνij). This is a very weak assumption because the continuum is nearly constant on the scale of a line width. Then, we can extract the source terms from the integral over solid angles and frequencies. Moreover, we separate line and continuum contributions by splitting the integral, which leads to (10)Then, after determination of the Green’s function in a given geometry, one obtains an integral form for the different contributors to the radiation field. We assume that the continuum contribution is in LTE (Sνij = Bνij(T)). Moreover, we consider, as boundary conditions, an incoming field from the inner atmosphere. Then, (11)where, where β [xi,xj] (τV) is the escape probability from position τV in the atmosphere, (13)The kernels L1 [xi,xj] (t,τV) and K1 [xi,xj] (t,τV) are, respectively: where the total optical depth is given by (15)We do not include an external radiation field. In the plane parallel geometry considered here, it would require us to take additional effects such as limb darkening in realistic cases into account (e.g., illumination by a planet or a companion). We have introduced the function ε(t,τV) = 1 if t < τV and −1 if t > τV to split the integration over μ according to its sign. The physical meaning of the kernels L1(t,τV) and K1(t,τV) can be understood as the probability density for photons, emitted in a layer t, to contribute to the line excitation in the layer τV.

We now have an analytic expression of the mean intensity with an explicit dependence on populations. We now have to include this expression in the equations of statistical equilibrium to close the system and form a system of coupled integral equations (a Fredholm-like equation system).

2.3. The multi-zone statistical equilibrium equation system

The inversion of the system of statistical equilibrium equations allows us to obtain the populations also including the formal solution of RTE. But because it is nonlocal, the knowledge of requires the knowledge of the populations everywhere in the atmosphere because of the integral over the optical depth t in Eq. (12).

For this reason, one needs to solve the statistical equilibrium everywhere in the atmosphere simultaneously. The first step consists of rewriting Eq. (5)in a matrix form. A vector x(τV) containing all the populations of the considered species in a given layer, normalized to the density in that layer, is built. The size of this vector is thus equal to N, the number of energy levels of the considered species. We introduce the rate matrix (16)Where L1 is a lower triangular matrix of ones, A is a vector containing the spontaneous de-excitation coefficients (Aul), Z is a matrix containing the radiation field (see Eq. (7)), C a matrix containing the collisional rates (ncolCij), and g is a vector containing the statistical weights (gi).

Now the statistical equilibrium system can be written as, (17)with (18)where u is a ones vector. After discretization of the medium into P layers indexed by l, the vectorial function x(τV) becomes xl and Eq. (16)becomes (19)as the radiation field, and thus the matrix Z depends implicitly on the populations in all layers, and the statistical equilibrium system can be rewritten, (20)To take care of the coupling between the layers and close the system to solve Eq. (20)for all l in a unique step, we build the multi-zone statistical equilibrium system: (21)We introduce the vector X = (x1,x2,...,xP) and the block-diagonal matrix Γ to rewrite Eq. (21)as a multi-D nonlinear function, (22)With an initial X0 within the convergence radius, we can solve this nonlinear equation with a classical scheme such as the Newton method. An important question, however, is to determine whether the above system has a solution and whether the solution we expect to get is physical (unique and positive).

2.4. Existence, uniqueness and positivity of the solution

From a physical point of view, one naturally expects a unique and positive solution of the RTE. Reflecting on this problem is in fact a way to verify whether it is mathematically well formulated. Indeed, as stated above, the coupling between the populations and the radiation field is nonlinear. Without linearization, the solution can be regarded as the equilibrium state of Markov processes. Furthermore, because of radiative selection rules and collisional propensity rules, many factors are very small, and may numerically tend toward zero. The problem is thus potentially ill-conditioned.

The formulation adopted here is a multi-zone statistical equilibrium. The problem of uniqueness and positivity of the solution of the mono-zone statistical equilibrium has been addressed by Damgaard et al. (1992) and Rybicki (1997). As this system of equations is degenerate, one equation is usually replaced by a normalization condition, namely the conservation equation, to overcome the singularity of the associated matrix. Then, Damgaard et al. (1992) show that the solution is positive if all the rates are strictly non-zero. Yet certain transition probabilities vanish, physically or numerically. As long as all the levels are connected, just one nonzero cofactor guarantees the regularity of the matrix. More generally, Rybicki (1997) demonstrated that even for the system without a normalization condition, positive states can link (and be linked) only to positive states. Then, if the statistical equilibrium equations can be transformed into a set of uncoupled irreducible subproblems with a strictly positive normalization condition, the solution is positive and unique. In other words, the uniqueness and positivity properties of the general linear statistical equilibrium equations relate to the underlying connectivity properties of the states.

In our multi-zone formulation, the coupling between statistical equilibria in every atmospheric layer is strongly nonlinear. That is, the system cannot be split into uncoupled sets of equations, i.e., a set of uncoupled irreducible subproblems. This may impact the number of required normalization conditions, while only one per atmospheric layer (the conservation equation) is available. The uniqueness of the solution may then be demonstrated through the search for a nonlinear fixed point as the solution of the equilibrium of a Markov chain (P. Azerad, priv. comm.). A forthcoming paper will be devoted to this. At least, as the properties of connectivity still apply, the positivity of the solution is guaranteed.

3. Implementation

3.1. Nonlinear solving

In the method developed here, the computation of the statistical equilibrium is based on its exact formulation and explicitly includes radiative transfer effects through the mean radiation field in Eq. (22). Finding the root of this function naturally leads to the exact solution to the statistical equilibrium, and avoids any false convergence that may be obtained with stationary iterative methods.

The computation of the norm of a function, (23)N and P being the number of considered energy levels and the number of layers in the model atmosphere, respectively, is equivalent to an optimization problem, that is, the search for the minimum of the L2 norm of the function f, (24)The classical way to minimize this function is to move along a vector δX in a descending direction, using the Newton’s method. Indeed, for δX = XkXk−1 = −J-1·F, the gradient along the vector δX is (25)Newton’s method is very efficient in approaching the solution as it has a quadratic convergence. For each successive Newton step, we have to solve the linear problem JδXk = −F, where k is the index of the Newton iteration. However, with classical algebric methods, this operation leads to an algorithm, which is prohibitive in terms of computational time for large-scale systems. A robust and efficient way to accelerate this operation is to approximate the solution using a faster algorithm. We adopt GMRES, which is a subclass of Krylov subspace methods.

A Krylov subspace method begins with an initial guess . At the nth sub-iteration, is determined through a correction in the nth Krylov subspace, (26)The main advantage of this method is that its implementation requires only some vector matrix (or vector transposed matrix) products. It leads to a complexity in which may be reduced to if the matrix is sparse. Moreover, the scheme needs operations, but the algorithm can be restarted to minimize this contribution.

For each Newton step, we thus have to solve the linear problem JδX = −F, which will be rewritten in this section as Ax = b for simplicity. Stationary iterative methods (Jacobi, Gauss-Seidel, SOR) approximate the unknown vector x with a scheme x(k) = Bx(k−1) + c, where neither B nor c depend upon the iteration count k. We use the GMRES method, which is part of a class of (nonstationary) methods named Krylov subspace methods. The GMRES algorithm tries to evaluate the guess (the nth order Krylov subspace) by solving an optimization problem (minimizing the residue AxnbL2) like a least-square problem. The nth Krylov subspace is spanned over the n basis vectors, (27)Then, the approximate solution is xn = Knc, where the Krylov matrix is Kn = [b,Ab,A2b...,An−1b] and c is an appropriate vector such that (28)This vector is easily computed as it just requires a Matrix-vector product. This is particularly well suited for sparse matrices because this operation evolves roughly as the number of nonzero elements.

A straightforward method to find the least-square solution of this problem would be to compute the QR factorization of the matrix AKn. Indeed, the normal equation ATAx = ATbRx = QTb can be solved by back substitution. However, this is both unstable and too expensive (the QR factorization requires 2mn2 flops, d = QTb, 2mn flops, and Rx = d, n2 flops, where m is the iterate and n is the order of the Krylov subspace). Moreover this natural Krylov basis is ill-conditioned and is thus not a good choice. Indeed, the basis vectors have to be linearly independent, and the successive multiplications by A lead to vectors, which point (fastly) to the dominant eigenvector of A and are thus numerically collinear. For this reason, we build another space , , where the basis vectors are orthogonal (this orthogonalization of Krylov bases is named the Arnoldi method). During the Arnoldi step to build the orthogonal basis, the matrix H of the orthogonalization elements is built. This matrix has a special shape (Hessenberg shape), which is close to triangular (one subdiagonal more). Our minimization problem can be rewritten as (29)where y designs the new unknown (Qny = xn).

Furthermore, we have to combine this method with a global convergence strategy. Indeed, as one gets close to the solution, the choice of the descending direction may lead to spurious effects, especially if δX becomes very large. Then, to ensure the global convergence of the scheme, we use a combination of the line search and backtracking methods, which is an efficient method to solve nonlinear equations. Precisely, when fk>fk−1, δX may be too large. Then, δX is reduced by a factor λ, (30)with λ chosen to minimize f(Xk−1 + λδX) to maintain the iteration in the descending direction.

3.2. Treatment of the singularity

The computation of goes through the integration of a singular function. Indeed, in the integrand of Eq. (12), the function E1(x) has a logarithmic divergence at x = 0+, see Fig 3.

To integrate this kernel, we use the periodization method developed by Helluy et al. (1998), which consists of a high order quadrature method with a variable change. The new variable is a polynomial of degree k, properties of which improve on the order of rectangle rule. The advantage of this method is that it can handle singular functions. Indeed the error | EN(f) | on an interval subdivided in N subintervals goes as ~Ck/Nγ, where γ → (2k−1) if f has a logarithmic singularity at one bound of the integration domain (see Table 1).

Table 1

Benchmark of the periodization method for a logarithmic singularity (from Helluy et al. 1998).

However the periodization method requires that the integration is done over the interval [0,1] and that the singularity is located at one extremum of this interval. We thus split the integration domain into two subdomains, and , and then normalize each subdomain to [0,1]. However, since the MPI strategy limits the number of model layers handled by each process, this can lead to the undersampling of one of the subdomains, when τV approaches one of the bounds of the total domain (i.e., the boundaries of the model atmosphere). We have thus tested this method under the same conditions as those met in our problem, without any resampling of the subdomains.

thumbnail Fig. 3

Map of the integrand of for a radiative transition (see Eq. (12c)). The diagonal is singular, and the quadrature in the periodization method produces a mesh refinement around it. The color encoding of the integrant is in logarithmic arbitrary units.

Open with DEXTER

The result is displayed in Fig. 4. As expected, the relative error strongly increases when one of the subdomains includes too few model layers (typically fewer than 4) and becomes unacceptable for the extreme points. They will thus be considered ghost points in all subsequent computations. This does not affect the computation of the statistical equilibrium.

thumbnail Fig. 4

Benchmarking of the periodization method applied to the integration with a logarithmically singular kernel (g(t) = a0 + a1t, K(x,t) = −γ−ln( | G(t)−G(x) |) and ). The integration domain has been split into two subdomains (see text for details).

Open with DEXTER

Because we can compute the nonlinear function F(X) with good accuracy, the next step to find its root is to compute the Jacobian matrix to solve the system through a Newton’s scheme.

3.3. Computation of the Jacobian matrix

The method presented here, and more generally the solving of nonlinear systems of equations, requires the computation of the Jacobian matrix. One of the advantages of the method we develop is that an analytical form can be derived. Here, the function F(X) is a set of N × P equations, N being the number of energy levels of the species considered, and P the number of layers in the discretized model atmosphere. The differentiation of over (i and k referring to energy levels, and l and m to atmosphere layers) leads to a set of integro-differential equations, with the following four cases: Equations (31a)and (31b)represent local couplings. Mathematically, they correspond to the diagonal and block-diagonal parts of the Jacobian, respectively. Physically, Eq. (31a)describes perturbations of the statistical equilibrium of a given level, by modifying the population of this level, within one atmospheric layer. Equation (31b)describes the perturbations of the statistical equilibrium of a given level, by modifying the population of other levels, still within one atmospheric layer. Equations (31c)and (31d)correspond to off-diagonal blocks and describe nonlocal perturbations, i.e., the influence of the populations in another layer. Naturally, in these last differentiations the coupling is purely radiative, whereas the first and second differentiations include collisional processes.

To determine the terms , one needs to differentiate an integral expression over the variable . This derivation must be considered a functional derivative, and be performed before discretization (see Appendix B for the demonstration), (32)where δ(τVs) is the Dirac distrbution.

In the case τVs (lm), the derivatives for each component are However is not differentiable at x(t) = x(τV) because of the singularity, therefore Eq. (33a)is undetermined for s = τV. For these local terms, we thus need an approximate expression of the radiation field. We emphasize that we only apply this approximation to the computation of the Jacobian matrix.

thumbnail Fig. 5

Pattern of the Jacobian matrix for 4 model atmosphere layers and 200 energy levels.

Open with DEXTER

3.4. Sparsity of Jacobian matrix

In Sect. 3.1, we argue that GMRES is a well-adapted method for large-scale problems if the Jacobian matrix is sparse, as it only requires matrix-vector products, and no matrix inversion. The sparsity (or density) coefficient is defined by the ratio of nonzero elements over the total number of elements. The Jacobian matrix is composed by P × P submatrix of N × N elements leading to NP × NP elements. The block diagonal of this matrix is full because of the irreducibility of the collisional coupling within a given layer (no selection rules for collisions). The number of nonzero elements in the block-diagonal part is P × N2. In the P(P−1) off-diagonal submatrices, the nonzero elements are only due to radiative coupling between different layers, and their number is thus linked to the selection rules for allowed transitions. These rules thus lead to a sparsity of each submatrix with a density ρmol = Ntransitions/N2. The density of the complete Jacobian matrix is then given by (34)Then, as ρmol(N) decreases rapidly with N (e.g., ρH2O ~ 10-3, for N = 411, see Figs. 6 and 7) and ρ is inversely proportional to P, the larger the system, the sparser the Jacobian matrix. The GMRES scheme is thus well adapted and asymptotically approaches the NP complexity.

thumbnail Fig. 6

Example of the evolution of the off-diagonal, block-matrix sparsity (ρmol) according to the number of considered levels. This example is computed for the 411 levels of the ortho-H2O molecule.

Open with DEXTER

thumbnail Fig. 7

Evolution of the Jacobian matrix sparsity according to the number of atmospheric layers for different values of ρmol.

Open with DEXTER

3.5. Jacobian block-diagonal approximation

Assuming that τT(τV)−τT(t) ∝ (tτV) (first order linearization) when t is close to τV, because the kernel decreases when | tτV | increases, we assume that every function at t is equal to its value at τV.

Splitting the integral over t at τV and inverting the integrals over t and μ, Gonzalez Garcia et al. (2008) show that the mean intensity can be rewritten as where β(τV) and β+(τV) are inward and outward escape probabilities (defined in Gonzalez Garcia et al. 2008). The β(τV) invoke the function E2(x), which is not singular and differentiable everywhere in the domain. This enables an analytic estimate of the bloc diagonal terms. The pattern of the Jacobian is displayed in Fig. 5.

3.6. The numerical code MOrad

The above strategy has been implemented in a numerical code named MOrad written in Fortran 90. It takes as inputs:

  • 1.

    A model atmosphere (temperature, atomic, molecular, andelectronic densities etc. as a function of geometrical thickness).

  • 2.

    Spectroscopic data (level energies, statistical weights, radiative rates).

  • 3.

    Collisional rates.

Then it computes the continuous opacities and uses an initial guess to calculate the function (22)and its Jacobian matrix. Choosing the initial guess X0 is an important operation. In MOrad we tested the use of an LTE solution (Boltzmann distribution), and a 0 K solution (the molecules in the ground level). We find that the convergence rate is better with the “0 K” initial guess.

The singular integrations needed to compute are performed with the periodization method. The Green’s function needed to compute is stored at each point. The required space scales as the number of layers times the number of frequency points (in a parallel code; otherwise it would scale with the square of the number of layers). The evaluation of the function and the Jacobian are parallelized with MPI. At first glance, two parallelization strategies may be considered: either a parallelization over frequencies or over atmosphere layers. However, van Noort et al. (2002) show that the most efficient parallelization to reduce communications is spatial parallelization because of the strong local coupling between energy levels. The populations of all energy levels within each atmosphere layer should thus be computed by the same processor, with several adjacent layers possibly treated by the same processor. In MOrad we adopt a subdomain decomposition where each processor computes the physical values needed in one given layer and solves the local statistical equilibrium. Integrations are performed with a scalable algorithm, which minimizes global communications to preserve a good scalability.

The code has been interfaced with a nonlinear solver of the parallel and scalable library PETSc (see, e.g., Balay et al. 2013b,a, 1997). This solver takes as input the distributed function vector and the Jacobian matrix and returns the root of the function after preconditioning.

The main advantages of this library are its performance and its scalability. Moreover, PETSc provides access to many different nonlinear methods and preconditioning, which allows us a versatile choice of the best method to be adapted for a given physical problem. Other efficient libraries, such as the parallel IO library HDF5, are interfaced with MOrad to preserve the performances for large-scale problems and the portability of output data. Future planned developments include an MPI/OpenMP hybridization (openMP parallelization over frequencies).

4. Discussion

4.1. Example of application

To illustrate the applicability of MOrad, we compute the non-LTE departure coefficients of water molecules in a MARCS (Gustafsson et al. 2008) red supergiant model atmosphere. The model considered here has an effective temperature of 3500 K, a log g = 0, a solar metallicity, and a microturbulence of 2 km s-1. These parameters correspond to typical red supergiant stars. We consider more than 800 rovibrational levels, leading to more than 330 000 transitions and 15 000 lines (see the Grotrian diagram of ortho H2O in Fig. 8). The energy levels and the radiative coefficients are taken from Barber et al. (2006). The collisional rates for H2O–H2 and H2O–e are taken from Faure & Josselin (2008).

thumbnail Fig. 8

Grotrian diagram of the ortho-H2O molecule showing the radiative transitions taken into account. The parameter Jn is the rotational angular momentum of H2O. Colors indicate different vibrational states.

Open with DEXTER

The code MOrad was launched on 47 processors using a GMRES Newton-Krylov method and a line-search global convergence strategy. We conducted preconditioning with the AMS method (Block Additive Schwarz method), where each subblock is preconditioned with an approximate iterative LU factorization method. We obtain the root X of the function (F(X) ∥ ≤ 10-12) after three nonlinear iterations and a total time of code execution of less than ~1 h on the Alarik LUNARC system (see Lunarc 2013). The departure coefficients are presented in Fig. 9. The convergence rate is Q-quadratic, though a quadratic convergence rate would be expected. This may be because of an over reconditioning or a problem that is too simple (close to linear), but we obtain the same Q-quadratic convergence for a purely radiative problem.

thumbnail Fig. 9

LTE departure coefficients of ortho-H2O in a red supergiant model atmosphere.

Open with DEXTER

A detailed analysis of the results is out of the scope of this paper and is devoted to a forthcoming paper; a preliminary analysis was done in Lambert et al. (2013). In particular a detailed discussion of the use of the super-level approximation, which seems appropriate for the vibrationally excited states, will be presented. In short, the non-LTE calculations lead to stronger lines, especially rotational lines in the fundamental vibrational state around 2 μm, compared to LTE computation.

4.2. Performance and complementarity of the method

The method presented here is naturally accurate as it is equivalent to finding the root of a function and thus avoids any false convergence (assuming the solution is unique)3. Furthermore, we emphasize that because the integration is analytic, the method leads naturally to null residue. Concerning the performance of the method, the performances of the method are difficult to assess. The convergence rate depends upon the choice of both the solver and the preconditioning. These choices are not unique and may vary according to the considered problem (species and model atmosphere). The memory demands scales with the dimensionality of the considered problem.

The computation of the function and the Jacobian matrix leads to a scheme with an asymptotic complexity in . The complexity of the resolution scheme is more difficult to evaluate because the preconditioned Newton-Krylov method has a cost in Nfreq × N2 × P2 (Nfreq being the number of frequency points sampling the line profile, N the number of energy levels, and P the number of layers in the discretized atmosphere), but invokes another term that evolves as the square of the iteration index. This term is partially eliminated by restarting GMRES and is not really important in practice because the number of iterations is small after appropriate preconditioning.

Furthermore, the performance is better than and probably close to . Indeed, for a sparse matrix, GMRES evolves in . Because the only nonzero elements of the off-diagonal blocks of the Jacobian matrix are due to radiative coupling between layers, the sparsity of J evolves as the ratio of the number of radiative transitions over N2. Taking a complexity in Nbfreq × N2 × P2 is then an superior limit, which underevaluates the method scalability in real cases.

As a comparison, the computation of the equivalent function of F(X) in MULTI has a complexity evolving in N × P because of the Scharmer operator, which performs no numerical integration to evaluate diffusion integrals but only one quadrature point. The system is solved with a Newton-Raphson scheme, which has a cost in N3 × P2 when using the full matrix or N3 × P when using the local operator (block-diagonal matrix; M. Carlsson, priv. comm.). The theoretical asymptotic speed-up for MOrad compared to MULTI are summarized in Table 2.

Table 2

Theoretical asymptotic speed-up for two extreme cases in terms of ρ = sparsity(J)(0 ≤ ρ ≤ 1) for MOrad, compared to MULTI with either full or local operator.

From Table 2, one can see that our method is complementary to existing methods, such as direct linearization. Indeed, in some cases, with a large number of levels and a small number of layers, the expected time of computation is shorter.

5. Conclusion

We have developed a new method to solve the RTE in non-LTE conditions. Rather than solving the RTE itself, we solve the statistical equilibrium equation system, which includes an analytical formal solution of the RTE. One of the main advantages of this choice is that the coefficients of the associated Jacobian matrix can be computed exactly. With this approach, a nonlinear solver can be used. In other words, this method avoids any linearization or stationary iterative schemes, which may converge very slowly or even falsely in some extreme cases (a very low non-LTE parameter ϵ and/or highly discretized media). We chose a method based on the GMRES nonlinear method. It has been implemented in a code that uses the PETSc library. We pay special attention to the proper treatment of the singularity that arises in the integration of the mean radiation field.

Furthermore, this method is parallelized using MPI, which makes it applicable to large-scale systems such as molecules and atoms in stellar atmospheres or interstellar medium, as it has an excellent scalability with the number of energy levels. Indeed, our code has been successfully applied to the modeling of water in the atmospheres of red supergiants. The results will be presented in a forthcoming paper. Further developments of the code include hybrid parallelization, with openMP parallelization over frequencies. We then plan to make the code public.


1

We already emphasize here that solving either the RTE explicitly or the statistical equilibrium equation is strictly identical, and both equations are of similar mathematical nature, as shown, for instance, in the Appendix of Hauschildt et al. (1995).

2

At steady-state. Chemical formation and destruction terms are not included.

3

The solution of the Eddington problem (the two-level atom in the isothermal optically thick medium) is recovered at the 1st iteration with preconditioning, and after five iterations without preconditioning (relative error <0.005 for ϵ = 10-5 and τmax = 1000). This is expected because the inversion of the matrix by the Krylov subspace method is well known to work in this kind of situation.

Acknowledgments

We are grateful to Pascal Azerad for fruitful discussions and Karin Ryde for proofreading the English in the manuscript. We acknowledge financial support from the Swedish Karl Tryggers Foundation under grants CTS 12:408 and CTS 13:388 and its investment in science and astrophysics. Moreover, we thank the French National Agency for Research (ANR) through program number ANR-06-BLAN-0105, and from “Programme National de Physique Stellaire” (PNPS) of CNRS/INSU, France. N.R. is a Royal Swedish Academy of Sciences Research Fellow supported by a grant from the Knut and Alice Wallenberg Foundation. Funds from Kungl. Fysiografiska Sällskapet i Lund and support from the Swedish Research Council, VR are gratefully acknowledged.

References

Appendix A: Formulation

One starts with the classical formulation of the radiative transfer equation in a plane parallel geometry, (A.1)The monochromatic extinction and emissivity coefficients are expressed with an explicit dependence on levels populations as follows: By inserting relative populations fi = ni/nk ,where nk is the total density of the species labeled by k, one gets One brings up the new variable xi = fi/gi, and using the relation Bulgu = Blugl, where gi is the statistical weight of the level i, yields Including the formulation of and in the radiative transfer equation, one obtains (A.5)In order to create a transition independent scale, the previous equation is divided by an arbitrarily chosen continuum extinction coefficient χV, here χVχν = 500 nm. Moreover we express it per hydrogen atom to bring out the ratio nk/nH for numerical reasons. This normalization is symbolized by a tilde (~) for other variables, (A.6)With , , , , , and dτV = −χVds. Assuming that the continuum is formed in LTE (), one obtains (A.7)

Appendix B: Jacobian matrix coefficients

We start with the analytic formulation of the mean intensity field for the line component (similar demonstration for contunium), i.e., (B.1)Because the functional derivative of a functional product is (B.2)we functionally derive the product xi(t)K [xi,xj] over xi(τV = s), which yields, (B.3)Equation (B.3) can be rewritten as a sum of two integrals over t, I1 and I2.

The first term I1 simplifies (B.4)This term is an indefinite form if s = τV, because the kernel K(x,y) is singular at x = y.

We consider here the case sτV (the case s = τV is detailed in Sect. 3.4).

The second term I2 requires the computation of the functional derivative of the kernel, which is (B.5)With the chain rule theorem of functional derivatives, we get (B.6)We then derive the kernel K, finding (B.7)To simplify the calculus, we rewrite as (B.8)where the distribution Θ(t,τV,s) = −2(θ(tτV)−0.5)(θ(st)−θ(sτV)), and θ(x) is the Heaviside function. This distribution can be rewritten, (B.9)We functionally derive Eq. (B.8), yielding (B.10)We insert this expression in Eq. (B.7) and we define the modified kernel , (B.11)The integral I2 from Eq. (B.3) can be rewritten as (B.12)which, according to the Θ properties, becomes (B.13)Finally, we get the derivative of the mean radiation field, (B.14)

Appendix C: Definition of the variables

Table C.1

Physical quantities.

Table C.2

Functions, operators, and distributions.

All Tables

Table 1

Benchmark of the periodization method for a logarithmic singularity (from Helluy et al. 1998).

Table 2

Theoretical asymptotic speed-up for two extreme cases in terms of ρ = sparsity(J)(0 ≤ ρ ≤ 1) for MOrad, compared to MULTI with either full or local operator.

Table C.1

Physical quantities.

Table C.2

Functions, operators, and distributions.

All Figures

thumbnail Fig. 1

Spectral radius of amplification matrix for the Λ iteration method. Convergence is shown for two spectral radii, marked with letters A (upper panel) and B (lower panel) in the right-hand panels. The thin black lines correspond to successive iterations; the true solution is given by the orange line.

Open with DEXTER
In the text
thumbnail Fig. 2

Same as Fig. 1, except for ALI with Λ = diagonal of Λ.

Open with DEXTER
In the text
thumbnail Fig. 3

Map of the integrand of for a radiative transition (see Eq. (12c)). The diagonal is singular, and the quadrature in the periodization method produces a mesh refinement around it. The color encoding of the integrant is in logarithmic arbitrary units.

Open with DEXTER
In the text
thumbnail Fig. 4

Benchmarking of the periodization method applied to the integration with a logarithmically singular kernel (g(t) = a0 + a1t, K(x,t) = −γ−ln( | G(t)−G(x) |) and ). The integration domain has been split into two subdomains (see text for details).

Open with DEXTER
In the text
thumbnail Fig. 5

Pattern of the Jacobian matrix for 4 model atmosphere layers and 200 energy levels.

Open with DEXTER
In the text
thumbnail Fig. 6

Example of the evolution of the off-diagonal, block-matrix sparsity (ρmol) according to the number of considered levels. This example is computed for the 411 levels of the ortho-H2O molecule.

Open with DEXTER
In the text
thumbnail Fig. 7

Evolution of the Jacobian matrix sparsity according to the number of atmospheric layers for different values of ρmol.

Open with DEXTER
In the text
thumbnail Fig. 8

Grotrian diagram of the ortho-H2O molecule showing the radiative transitions taken into account. The parameter Jn is the rotational angular momentum of H2O. Colors indicate different vibrational states.

Open with DEXTER
In the text
thumbnail Fig. 9

LTE departure coefficients of ortho-H2O in a red supergiant model atmosphere.

Open with DEXTER
In the text

Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.

Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.

Initial download of the metrics may take a while.