Numerical solutions to linear transfer problems of polarized radiation

Pietro Benedusi; Gioele Janett; Luca Belluzzi; Rolf Krause

doi:10.1051/0004-6361/202141238

A&A, 655 (2021) A88

Issue		A&A Volume 655, November 2021


Article Number		A88
Number of page(s)		10
Section		Numerical methods and codes
DOI		https://doi.org/10.1051/0004-6361/202141238
Published online		25 November 2021

II. Krylov methods and matrix-free implementation

Pietro Benedusi¹, Gioele Janett²,¹, Luca Belluzzi²,³,¹ and Rolf Krause¹

¹ Euler Institute, Università della Svizzera italiana (USI), 6900 Lugano, Switzerland
e-mail: pietro.benedusi@usi.ch
² Istituto Ricerche Solari (IRSOL), Università della Svizzera italiana (USI), 6605 Locarno-Monti, Switzerland
³ Leibniz-Institut für Sonnenphysik (KIS), 79104 Freiburg im Breisgau, Germany

Received: 3 May 2021
Accepted: 27 June 2021

Abstract

Context. Numerical solutions to transfer problems of polarized radiation in solar and stellar atmospheres commonly rely on stationary iterative methods, which often perform poorly when applied to large problems. In recent times, stationary iterative methods have been replaced by state-of-the-art preconditioned Krylov iterative methods for many applications. However, a general description and a convergence analysis of Krylov methods in the polarized radiative transfer context are still lacking.

Aims. We describe the practical application of preconditioned Krylov methods to linear transfer problems of polarized radiation, possibly in a matrix-free context. The main aim is to clarify the advantages and drawbacks of various Krylov accelerators with respect to stationary iterative methods and direct solution strategies.

Methods. After a brief introduction to the concept of Krylov methods, we report the convergence rate and the run time of various Krylov-accelerated techniques combined with different formal solvers when applied to a 1D benchmark transfer problem of polarized radiation. In particular, we analyze the GMRES, BICGSTAB, and CGS Krylov methods, preconditioned with Jacobi, (S)SOR, or an incomplete LU factorization. Furthermore, specific numerical tests were performed to study the robustness of the various methods as the problem size grew.

Results. Krylov methods accelerate the convergence, reduce the run time, and improve the robustness (with respect to the problem size) of standard stationary iterative methods. Jacobi-preconditioned Krylov methods outperform SOR-preconditioned stationary iterations in all respects. In particular, the Jacobi-GMRES method offers the best overall performance for the problem setting in use.

Conclusions. Krylov methods can be more challenging to implement than stationary iterative methods. However, an algebraic formulation of the radiative transfer problem allows one to apply and study Krylov acceleration strategies with little effort. Furthermore, many available numerical libraries implement matrix-free Krylov routines, enabling an almost effortless transition to Krylov methods.

Key words: radiative transfer / methods: numerical / Sun: atmosphere / stars: atmospheres / polarization

© ESO 2021

1. Introduction

When looking for numerical solutions to transfer problems of polarized radiation, it is common to rely on stationary iterative methods. In particular, fixed point iterations with Jacobi, block-Jacobi, or Gauss-Seidel preconditioning have been successfully employed. An illustrative convergence analysis of the stationary iterative methods usually used in the numerical transfer of polarized radiation is presented in the first paper of this series (Janett et al. 2021). Unfortunately, stationary iterative methods often show unsatisfactory convergence rates when applied to large problems. By contrast, Krylov iterative methods have gained popularity in recent decades, proving to be highly effective solution strategies, especially when dealing with large and sparse linear systems.

Among the various Krylov techniques, the biconjugate gradient stabilized (BICGSTAB) and the generalized minimal residual (GMRES) methods are very common choices, especially when dealing with nonsymmetric systems (Meurant & Duintjer Tebbens 2020). As for stationary iterative methods, suitable preconditioning can significantly improve the convergence of Krylov iterations.

In the radiative transfer context, both BICGSTAB and GMRES methods have been employed for various applications, such as time-dependent fluid flow problems (Klein et al. 1989; Hubeny & Burrows 2007), problems arising from finite element discretizations (Castro & Trelles 2015; Badri et al. 2019), and nonlinear radiative transfer problems for cool stars (Lambert et al. 2015). In particular, Krylov methods have already been applied to linear transfer problems of unpolarized radiation, showing promising results. We would like to mention the first convergence studies on the BICGSTAB method by Paletou & Anterrieu (2009) and Anusha et al. (2009).

Nagendra et al. (2009) first applied BICGSTAB to the transfer of polarized radiation in 1D atmospheric models. Anusha et al. (2011) and Anusha & Nagendra (2011) employed the same method in 2D and 3D geometries, while Anusha & Nagendra (2012) additionally included angle-dependent partial frequency redistribution (PRD) effects. For the sake of simplicity, these papers are limited to the study of hypothetical lines in isothermal atmospheric models. Finally, Anusha & Nagendra (2013) used the BICGSTAB method to model the Ca II K 3933 Å resonance line using an ad hoc 3D atmospheric model, and Sampoorna et al. (2019) applied the GMRES method to model the D₂ lines of Li I and Na I, taking the hyperfine structure of these atoms into account. However, Krylov methods have not been fully exploited in the numerical solution of transfer problems of polarized radiation. Indeed, only a few applications of Krylov methods aim to model the polarization profiles in realistic atmospheric models. Moreover, numerical studies investigating multiple preconditioned Krylov methods, in terms of convergence and run time, and presenting key implementation details, are still lacking in this context.

This article is organized as follows. Section 2 recalls the idea that lies behind Krylov iterative techniques, with a special focus on the GMRES and BICGSTAB methods. Section 3 presents a benchmark analytical problem, its discretization, and its algebraic formulation, and shows how to apply Krylov methods in the radiative transfer context. In Sect. 4, we describe the problem settings and the solver options and present a quantitative comparison between Krylov and stationary iterative methods with respect to convergence, robustness, and run time. Finally, Sect. 5 provides remarks and conclusions, which are also generalized to more complex problems.

2. Krylov methods

This section contains a gentle, but not exhaustive, introduction to Krylov methods. For more interested readers, Ipsen & Meyer (1998) explain the idea behind Krylov methods, while Saad (2003) and Meurant & Duintjer Tebbens (2020) provide a comprehensive and rigorous discussion on the topic.

We consider the linear system

$A\mathbf{x} = \mathbf{b},$ (1)

where the nonsingular matrix A ∈ ℝ^{N×N} and the vector b ∈ ℝ^N are given, and the solution x ∈ ℝ^N is to be found. Given an initial guess x⁰, a Krylov method approximates the solution x of (1) with the sequence of vectors xⁿ ∈ 𝒦_n(A, b), with n = 1, …, N, where

$\mathcal{K}_n(A,\mathbf{b}) = \mathrm{span}\left\{\mathbf{b}, A\mathbf{b}, A^2\mathbf{b}, \ldots, A^{n-1}\mathbf{b}\right\}$ (2)

is the nth Krylov subspace generated by b. This is the case for the (very common) initial guess x⁰ = 0. In general, xⁿ − x⁰ ∈ 𝒦_n(A, r⁰), where r⁰ = b − Ax⁰ is the initial residual.

We first try to clarify why it is convenient to look for an approximate solution xⁿ of the linear system (1) inside the Krylov subspace 𝒦_n(A, b). The minimal polynomial of A is the monic polynomial q of minimal degree such that q(A) = 0. If A has d distinct eigenvalues λ₁, …, λ_d, its mth degree minimal polynomial reads

$q(t) = \prod_{j=1}^{d} (t - \lambda_j)^{m_j},$ (3)

where m_j is the multiplicity¹ of λ_j (that is, the jth root of q) and $m = \sum_{j=1}^{d} m_j$. Polynomial (3) can be rewritten in the form

$q(t) = \sum_{j=0}^{m} \alpha_j t^j,$

where α₁, …, α_m are unspecified scalars and $\alpha_0 = \prod_{j=1}^{d} (-\lambda_j)^{m_j}$. Evaluating the polynomial in A, one obtains

$q(A) = \alpha_0 Id + \alpha_1 A + \ldots + \alpha_m A^m = 0,$

which allows us to express the inverse A⁻¹ in terms of powers of A, namely,

$A^{-1} = -\frac{1}{\alpha_0}\sum_{j=0}^{m-1}\alpha_{j+1} A^j,$ (4)

showing that the solution vector x = A⁻¹b belongs to the Krylov subspace 𝒦_m(A, b). Hence, a smaller degree of the minimal polynomial of A (e.g., m ≪ N) generally corresponds to a faster convergence of Krylov methods, since x ∈ 𝒦_m(A, b)⊂ℝ^N. We remark that (4) is well defined if A is nonsingular: if λ_j ≠ 0 for all j then α₀ ≠ 0.
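As a small worked instance of Eq. (4), consider the following 2 × 2 diagonal matrix. It is an illustrative choice made here and is not related to the benchmark problem of Sect. 3.

```latex
A = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix}, \qquad
q(t) = (t-2)(t-3) = t^2 - 5t + 6, \qquad
\alpha_0 = 6,\; \alpha_1 = -5,\; \alpha_2 = 1,

A^{-1} = -\frac{1}{\alpha_0}\left(\alpha_1 Id + \alpha_2 A\right)
       = \frac{1}{6}\left(5\,Id - A\right)
       = \begin{pmatrix} 1/2 & 0 \\ 0 & 1/3 \end{pmatrix}.
```

In this example m = 2, so that x = A⁻¹b = (5b − Ab)/6 is a combination of b and Ab only, that is, x ∈ 𝒦₂(A, b) for any b.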

Secondly, we try to better characterize the smallest Krylov space containing x. The minimal polynomial of b with respect to A is the monic polynomial p of minimal degree such that p(A)b = 0. Denoting by g the degree of p, it is possible to show that 𝒦_g(A, b) is invariant under A and that

$\mathrm{dim}\left(\mathcal{K}_n(A,\mathbf{b})\right) = \min\{n, g\},$

meaning that 𝒦_n(A, b) = 𝒦_g(A, b) for all n ≥ g. Therefore, a Krylov method applied to (1) will typically terminate in g iterations, where g can be much smaller than N. Since g ≤ m ≤ N, a Krylov method terminates after at most N steps². In practice, approximate solutions of system (1) are sufficient and a suitable termination condition allows a Krylov method to converge in fewer than g iterations.
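The saturation of the Krylov dimension at g can be checked on a toy case. The sketch below is only an illustration under assumed inputs: the 6 × 6 diagonal matrix with three distinct eigenvalues, the vector of ones, and the helper names `matvec`, `rank`, and `krylov_dimensions` are all hypothetical choices. Exact rational arithmetic is used so that the rank is not affected by rounding.

```python
from fractions import Fraction


def matvec(A, v):
    """Dense matrix-vector product on nested lists."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def rank(columns):
    """Exact rank of the matrix with the given columns (Gaussian elimination)."""
    rows = [list(r) for r in zip(*columns)]  # transpose into a list of rows
    rk = 0
    for c in range(len(columns)):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][c] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(rk + 1, len(rows)):
            f = rows[r][c] / rows[rk][c]
            rows[r] = [x - f * y for x, y in zip(rows[r], rows[rk])]
        rk += 1
    return rk


def krylov_dimensions(A, b, n_max):
    """Return dim K_n(A, b) for n = 1, ..., n_max."""
    dims, basis, v = [], [], list(b)
    for _ in range(n_max):
        basis.append(v)          # basis holds b, Ab, ..., A^(n-1) b
        dims.append(rank(basis))
        v = matvec(A, v)
    return dims


# Assumed toy data: N = 6 with three distinct eigenvalues, hence g = 3 here.
eigs = [1, 1, 2, 2, 3, 3]
A = [[Fraction(eigs[i]) if i == j else Fraction(0) for j in range(6)]
     for i in range(6)]
b = [Fraction(1)] * 6
dims = krylov_dimensions(A, b, 6)
print(dims)
```

For this input the dimension grows by one per step and then stays at three, in agreement with dim 𝒦_n(A, b) = min{n, g}.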

Crucially, because of the structure of 𝒦_n(A, b), the computation of xⁿ mainly requires a series of matrix-vector products involving A. This paradigm is particularly beneficial when N is large and A is sparse, in which case matrix-vector products can be computed efficiently. Moreover, Krylov methods are also handy when there is no direct access to the entries of the matrix A and its action is only encoded in a routine that returns Av from an arbitrary input vector v. The various Krylov methods differ from one another in two main aspects. The first difference is in the way they construct the subspace 𝒦_n(A, b). In practice, the direct use of definition (2) is not convenient to build 𝒦_n(A, b), because the spanning vectors b, …, A^{n−1}b become closer and closer to being linearly dependent as n grows, resulting in numerical instabilities. The second difference is the way they select the “best” iterate xⁿ ∈ 𝒦_n(A, b).

The following sections introduce three common Krylov methods usually applied to nonsymmetric linear systems: the GMRES, BICGSTAB and conjugate gradient squared (CGS) methods.

2.1. GMRES

Saad & Schultz (1986) introduced one of the best known and most widely applied Krylov solvers: the GMRES method. At iteration n, the GMRES method first constructs an orthonormal basis of 𝒦_n(A, b) by the Arnoldi algorithm, which is a modified version of the Gram-Schmidt procedure adapted to Krylov spaces. Secondly, the GMRES method sets the approximate solution xⁿ ∈ 𝒦_n(A, b) minimizing the Euclidean norm of the residual rⁿ = b − Axⁿ, that is, solving the least squares problem of size n given by

$\Vert\mathbf{r}^n\Vert_2 = \Vert\mathbf{b} - A\mathbf{x}^n\Vert_2 = \min_{\mathbf{z}\in\mathcal{K}_n}\Vert\mathbf{b} - A\mathbf{z}\Vert_2.$

Assuming exact arithmetic, the GMRES method returns the exact solution in at most g iterations. Regarding computational complexity, the most expensive stages in the nth GMRES iteration are a O(N²) matrix-vector product involving A, possibly decreasing to O(N) for sparse A, and a nested loop requiring O(nN) floating point operations. In terms of memory, n basis vectors have to be stored. Therefore, the GMRES method becomes inefficient for large n. A standard remedy to this problem is to employ restarts, that is, to terminate the GMRES method after k iterations, using x^k as the initial guess for a subsequent GMRES run, until a desired termination condition is met. A suitable choice of k can significantly reduce the time to solution, but convergence is not always guaranteed. However, if A is positive definite, GMRES converges for any k ≥ 1.
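The two stages described above (Arnoldi orthogonalization followed by a small least squares problem) can be sketched as follows. This is a minimal, unrestarted, unpreconditioned illustration written with NumPy, not the MATLAB routine used in Sect. 4; the function name `gmres_sketch`, the random test matrix, and the tolerance are assumptions made for the example, and b is assumed to be nonzero.

```python
import numpy as np


def gmres_sketch(apply_A, b, x0=None, tol=1e-6, max_iter=None):
    """Unrestarted GMRES; apply_A(v) must return A @ v (matrix-free)."""
    N = b.size
    max_iter = N if max_iter is None else max_iter
    x0 = np.zeros(N) if x0 is None else x0
    r0 = b - apply_A(x0)
    beta = np.linalg.norm(r0)
    b_norm = np.linalg.norm(b)
    if beta / b_norm < tol:
        return x0, 0
    V = np.zeros((N, max_iter + 1))         # orthonormal Krylov basis
    H = np.zeros((max_iter + 1, max_iter))  # upper Hessenberg matrix
    V[:, 0] = r0 / beta
    for n in range(1, max_iter + 1):
        # Arnoldi step with modified Gram-Schmidt: the O(nN) nested loop.
        w = apply_A(V[:, n - 1])
        for i in range(n):
            H[i, n - 1] = V[:, i] @ w
            w = w - H[i, n - 1] * V[:, i]
        H[n, n - 1] = np.linalg.norm(w)
        # Least squares problem of size n: min_y || beta e_1 - H y ||_2.
        e1 = np.zeros(n + 1)
        e1[0] = beta
        y, *_ = np.linalg.lstsq(H[: n + 1, :n], e1, rcond=None)
        res = np.linalg.norm(e1 - H[: n + 1, :n] @ y)
        if res / b_norm < tol or H[n, n - 1] < 1e-14:
            break
        V[:, n] = w / H[n, n - 1]
    return x0 + V[:, :n] @ y, n


# Assumed test problem: a well-conditioned nonsymmetric matrix.
rng = np.random.default_rng(0)
N = 50
A = np.eye(N) + 0.3 * rng.standard_normal((N, N)) / np.sqrt(N)
b = rng.standard_normal(N)
x, n_iter = gmres_sketch(lambda v: A @ v, b)
print(n_iter, np.linalg.norm(b - A @ x) / np.linalg.norm(b))
```

The sketch solves the least squares problem from scratch at every iteration for readability; production implementations typically update a QR factorization of the Hessenberg matrix with Givens rotations instead.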

2.2. CGS and BICGSTAB

If A is symmetric, the Arnoldi iteration can be simplified, avoiding nested loops and obtaining the so-called Lanczos procedure, which requires just three vectors to be saved at once. For the nonsymmetric case, this procedure can be extended to the Lanczos biorthogonalization algorithm, which keeps six vectors in memory at once, yielding a significant storage reduction with respect to the Arnoldi method. On the other hand, two matrix-vector products (involving A and A^T) are needed at each iteration. Starting from two arbitrary vectors v₁ and w₁ satisfying (v₁, w₁) = 1, this algorithm builds two bi-orthogonal bases $\{\mathbf{v}_i\}_{i=1}^n$ and $\{\mathbf{w}_i\}_{i=1}^n$ for 𝒦_n(A, v₁) and 𝒦_n(A^T, w₁), respectively.

In the nth iteration, the biconjugate gradient (BCG) method imposes rⁿ ⊥ 𝒦_n(A^T, w₁) and x_n ∈ 𝒦_n(A, v₁). The BCG method efficiently solves two linear systems with coefficient matrices A and A^T, whose residual expressions are given by

$\begin{aligned}&\mathbf{r}^n = p_n(A)\mathbf{b},\\&\tilde{\mathbf{r}}^n = p_n(A^T)\mathbf{b},\end{aligned}$ (5)

p_n being an nth-degree polynomial³ and $\tilde{\mathbf{r}}^n = \mathbf{b} - A^T\mathbf{x}_n$.

Since the solution of the linear system involving A^T is generally not needed and A^T might not even be explicitly available, transpose-free variants are used. Two well-known transpose-free variants are the CGS (Sonneveld 1989) and the BICGSTAB (Van der Vorst 1992) methods. These variants avoid the use of A^T by replacing the residual expressions (5) with $\mathbf{r}_{\text{CGS}}^n = p_n^2(A)\mathbf{b}$ and $\mathbf{r}^n_{\text{BICGSTAB}} = p_n(A)\tilde{p}_n(A)\mathbf{b}$, respectively, where $\tilde{p}_n$ is a recursively defined nth-degree polynomial. Regarding complexity, both CGS and BICGSTAB require, in each iteration, two matrix-vector products involving A, plus O(N) additional floating point operations. Both CGS and BICGSTAB are less robust than GMRES, since their convergence can stagnate for pathological cases (see, e.g., Gutknecht 1993). Moreover, CGS rounding errors tend to be more pronounced with respect to BICGSTAB, possibly resulting in an unstable convergence.
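To make the cost structure of BICGSTAB concrete (two products with A per iteration, no product with A^T, and a fixed number of stored vectors), a minimal unpreconditioned sketch in NumPy is given below. It follows the textbook recurrences of the method; the function name `bicgstab_sketch`, the zero initial guess, the test matrix, and the absence of breakdown safeguards are simplifying assumptions of this example.

```python
import numpy as np


def bicgstab_sketch(apply_A, b, tol=1e-6, max_iter=500):
    """Unpreconditioned BICGSTAB with zero initial guess; b is a float array."""
    x = np.zeros_like(b)
    r = b - apply_A(x)
    r_hat = r.copy()                # fixed "shadow" residual
    rho = alpha = omega = 1.0
    v = np.zeros_like(b)
    p = np.zeros_like(b)
    b_norm = np.linalg.norm(b)
    for n in range(1, max_iter + 1):
        rho_new = r_hat @ r
        beta = (rho_new / rho) * (alpha / omega)
        p = r + beta * (p - omega * v)
        v = apply_A(p)              # first matrix-vector product
        alpha = rho_new / (r_hat @ v)
        s = r - alpha * v
        if np.linalg.norm(s) / b_norm < tol:
            return x + alpha * p, n
        t = apply_A(s)              # second matrix-vector product
        omega = (t @ s) / (t @ t)   # local residual minimization (stabilization)
        x = x + alpha * p + omega * s
        r = s - omega * t
        rho = rho_new
        if np.linalg.norm(r) / b_norm < tol:
            return x, n
    return x, max_iter


# Assumed test problem: a well-conditioned nonsymmetric matrix.
rng = np.random.default_rng(0)
N = 50
A = np.eye(N) + 0.3 * rng.standard_normal((N, N)) / np.sqrt(N)
b = rng.standard_normal(N)
x, n_iter = bicgstab_sketch(lambda v: A @ v, b)
print(n_iter, np.linalg.norm(b - A @ x) / np.linalg.norm(b))
```

The residual tested inside the loop is the recursively updated one, which can drift slightly from the true residual b − Ax in floating point arithmetic; the final print recomputes the true residual for this reason.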

2.3. Preconditioning

Preconditioning the linear system (1) with the nonsingular matrix P ∈ ℝ^{N×N}, that is, considering the equivalent system

$P^{-1}A\mathbf{x} = P^{-1}\mathbf{b},$ (6)

can significantly increase the robustness and convergence speed of stationary iterative methods. This is also particularly true for Krylov solvers. Crucially, the action of P⁻¹ on an input vector has to be computationally inexpensive, while improving the conditioning of P⁻¹A with respect to that of A. In practice, Krylov methods must perform the product w = P⁻¹Av for an arbitrary vector v. This operation is naively performed in two stages: $\tilde{\mathbf{w}} = A\mathbf{v}$ followed by $\mathbf{w} = P^{-1}\tilde{\mathbf{w}}$. Since P usually depends on A, it is sometimes possible to obtain the action of P⁻¹A in a more efficient way, reducing the number of floating point operations. Since Krylov methods do not require direct access to the entries of the preconditioner P, the action of P⁻¹ (or directly of P⁻¹A) can also be provided as a routine.

We would like to present the most common choices for P that are numerically investigated in Sect. 4. Given the decomposition A = D + U + L, with D, U, and L being respectively the diagonal and the strictly upper and lower triangular parts of A, and 0 < ω < 2, we consider

P = D: Jacobi method;
P = ω⁻¹D + L or P = ω⁻¹D + U: successive over relaxation (SOR) method;
$P=\omega(2-\omega)^{-1}\tilde{L}\tilde{U}$ with $\tilde{L}=\omega^{-1}D+L$ and $\tilde{U} = D^{-1}(\omega^{-1}D+U)$: symmetric SOR method (SSOR);
$P=\tilde{L}\tilde{U}$, where $\tilde{L}$ and $\tilde{U}$ are lower and upper triangular matrices approximating the factors of the LU factorization of A: incomplete LU factorization (ILU).

The Jacobi method is numerically appealing since the action of P⁻¹ is cheaply computed (in parallel). The SOR method usually provides faster convergence than Jacobi, at the price of P⁻¹ being more expensive to apply (only sequentially). The SSOR method typically reduces the run time of SOR, especially if there is no preferential direction in the coupling between unknowns. Moreover, a suitable tuning of ω significantly improves convergence, as shown in Janett et al. (2021) for stationary iterative methods. We remark that, for ω = 1, SOR reduces to the Gauss-Seidel (GS) method and SSOR reduces to symmetric Gauss-Seidel. ILU preconditioners are obtained through an inexact Gaussian elimination, where some elements of the LU factorization are dropped. Typically, $\tilde{L}$ and $\tilde{U}$ are constructed such that their sum has the same sparsity pattern as A (a.k.a. ILU with no fill-in or ILU(0)). As in the (S)SOR case, ILU preconditioners are sequential, but variants capable of extracting some parallelism can be applied.

Finally, the Jacobi and (S)SOR methods can be encoded in matrix-free routines. By contrast, to apply an ILU preconditioner, the matrices A, $\tilde{L}$ and $\tilde{U}$ must be assembled and stored.
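The Jacobi, SOR, and SSOR matrices defined above can be written down directly from the splitting A = D + U + L. The following NumPy sketch builds them as dense matrices for a small example and evaluates the spectral radius of Id − P⁻¹A, which controls the convergence of the preconditioned fixed point iteration x ← x + P⁻¹(b − Ax). The helper names, the tridiagonal test matrix, and the dense storage are assumptions for illustration only; in practice P⁻¹ would be applied through diagonal scaling and triangular solves.

```python
import numpy as np


def preconditioners(A, omega=1.0):
    """Dense Jacobi, SOR (lower variant), and SSOR matrices for A = D + U + L."""
    D = np.diag(np.diag(A))
    L = np.tril(A, k=-1)
    U = np.triu(A, k=1)
    L_t = D / omega + L                      # omega^{-1} D + L
    U_t = np.linalg.solve(D, D / omega + U)  # D^{-1} (omega^{-1} D + U)
    return {
        "Jacobi": D,
        "SOR": D / omega + L,
        "SSOR": omega / (2.0 - omega) * (L_t @ U_t),
    }


def spectral_radius_of_iteration(A, P):
    """Spectral radius of the iteration matrix Id - P^{-1} A."""
    M = np.eye(A.shape[0]) - np.linalg.solve(P, A)
    return max(abs(np.linalg.eigvals(M)))


# Assumed test matrix: nonsymmetric, strictly diagonally dominant, tridiagonal.
N = 20
A = 4.0 * np.eye(N) - 2.0 * np.eye(N, k=-1) - 1.0 * np.eye(N, k=1)
rho = {name: spectral_radius_of_iteration(A, P)
       for name, P in preconditioners(A, omega=1.0).items()}
for name, value in rho.items():
    print(f"{name}: {value:.3f}")
```

With ω = 1 the SOR entry is the Gauss-Seidel preconditioner and the SSOR entry is its symmetric variant, as remarked above.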

3. Benchmark problem

In this section, we present the continuous formulation of a benchmark linear transfer problem of polarized radiation. We then consider its discretization and algebraic formulation. Finally, we remark on how the matrix-free Krylov methods presented in Sects. 2.1 and 2.2 are applied in this context. The reader is referred to Janett et al. (2021, Sect. 3) for a more detailed presentation of the problem, along with its discretization and algebraic formulation.

The problem is formulated within the framework of the complete frequency redistribution (CRD) approach presented in Landi Degl’Innocenti & Landolfi (2004). We consider a two-level atom with total angular momenta J_u = 1 and J_ℓ = 0, with the subscripts ℓ and u indicating the lower and upper level, respectively. Since J_ℓ = 0, the lower level is unpolarized by definition. Stimulated emission is not considered here, since it is completely negligible in solar applications. For the sake of simplicity, continuum processes are neglected, as are magnetic and bulk velocity fields. A homogeneous and isothermal, one-dimensional (1D) plane-parallel atmosphere is considered. Under the aforementioned assumptions, the spatial dependency of the physical quantities entering the problem is fully described by the height coordinate z. Moreover, the problem is characterized by cylindrical symmetry around the vertical and, consequently, the angular dependency of the problem variables is fully described by the inclination θ with respect to the vertical, or equivalently by μ = cos(θ).

3.1. Continuous problem

The physical quantities entering the problem are in general functions of the height z ∈ [z_min, z_max], the frequency ν ∈ [ν_min, ν_max], and the propagation direction of the radiation ray under consideration, identified by μ ∈ [ − 1, 1].

Due to the cylindrical symmetry of the problem, the only nonzero components of the radiation field tensor are $\bar{J}^0_0$ and $\bar{J}^2_0$. Consequently, the only nonzero multipolar components of the source function are $\sigma^0_0$ and $\sigma^2_0$. Setting the angle γ = 0 in the expression of the polarization tensor $\mathcal{T}^2_{0,i}$, one finds that the only nonzero source functions are

$S_{\!I}(z,\mu) = \sigma_0^0(z) + \mathcal{T}^2_{0,1}(\mu)\,\sigma_0^2(z),$ (7)

$S_{\!Q}(z,\mu) = \mathcal{T}^2_{0,2}(\mu)\,\sigma_0^2(z),$ (8)

where $\mathcal{T}^2_{0,1}(\mu) = \sqrt{2}(3\mu^2-1)/4$ and $\mathcal{T}^2_{0,2}(\mu) = \sqrt{2}(3\mu^2-3)/4$. The only nonzero Stokes parameters are therefore I and Q. Their propagation is described by the decoupled differential equations

$\mu \frac{\mathrm{d}}{\mathrm{d}z} I(z,\mu,\nu) = -\eta(z,\nu)\left[I(z,\mu,\nu) - S_{\!I}(z,\mu)\right],$ (9)

$\mu \frac{\mathrm{d}}{\mathrm{d}z} Q(z,\mu,\nu) = -\eta(z,\nu)\left[Q(z,\mu,\nu) - S_{\!Q}(z,\mu)\right],$ (10)

where η is the absorption coefficient. The initial conditions are given by

$\begin{aligned}&I(z_{\min},\mu,\nu) = 1,&\mathrm{if}\;\mu > 0,\\&Q(z_{\min},\mu,\nu) = 0,&\mathrm{if}\;\mu > 0,\\&I(z_{\max},\mu,\nu) = Q(z_{\max},\mu,\nu) = 0,&\mathrm{if}\;\mu < 0.\end{aligned}$

In order to linearize the problem, we assume that the absorption coefficient η is known a priori and fixed or, equivalently, we discretize the atmosphere through a fixed grid in frequency-integrated optical depth (see Janett et al. 2021, Sect. 3.2).
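Once η and the source function are given on a grid, Eqs. (9) and (10) can be integrated ray by ray. As a hedged illustration, the pure-Python sketch below applies the implicit Euler method (one of the formal solvers considered in Sect. 4.1) directly in the height variable z. The function name, the height-based formulation, and the sample values are assumptions of this example; they are not the implementation used for the reported results, which works on an optical depth grid.

```python
def implicit_euler_ray(z, eta, S, mu, I_boundary):
    """Integrate mu dI/dz = -eta (I - S) along one ray with implicit Euler.

    z, eta, S are lists sampled at the grid nodes (z increasing) and mu != 0.
    I_boundary is prescribed at z[0] if mu > 0 and at z[-1] if mu < 0.
    """
    n = len(z)
    I = [0.0] * n
    if mu > 0:                       # outgoing ray: march upward from z_min
        I[0] = I_boundary
        for k in range(1, n):
            a = eta[k] * (z[k] - z[k - 1]) / mu
            I[k] = (I[k - 1] + a * S[k]) / (1.0 + a)
    else:                            # incoming ray: march downward from z_max
        I[n - 1] = I_boundary
        for k in range(n - 2, -1, -1):
            a = eta[k] * (z[k + 1] - z[k]) / abs(mu)
            I[k] = (I[k + 1] + a * S[k]) / (1.0 + a)
    return I


# Assumed sample data: four nodes, constant absorption coefficient.
z = [0.0, 0.5, 1.0, 2.0]
eta = [1.0] * 4
I_equilibrium = implicit_euler_ray(z, eta, [1.0] * 4, 0.5, 1.0)  # S = I at boundary
I_decay = implicit_euler_ray(z, eta, [0.0] * 4, 0.5, 1.0)        # pure absorption
I_incoming = implicit_euler_ray(z, eta, [1.0] * 4, -0.5, 0.0)    # dark upper boundary
print(I_equilibrium, I_decay, I_incoming)
```

Because the update is linear in S and in the boundary value, collecting these sweeps over all directions and frequencies yields a relation of the form I = ΛS + t, which is the structure exploited in the algebraic formulation of Sect. 3.2.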

The explicit expressions of $\bar{J}^0_0$ and $\bar{J}^2_0$ are

$\bar{J}^0_0(z) = \int \mathrm{d}\nu\, \frac{\phi(\nu)}{2} \oint \mathrm{d}\mu\, I(z,\mu,\nu),$ (11)

$\bar{J}^2_0(z) = \int \mathrm{d}\nu\, \frac{\phi(\nu)}{2} \oint \mathrm{d}\mu \left[\mathcal{T}^2_{0,1}(\mu) I(z,\mu,\nu) + \mathcal{T}^2_{0,2}(\mu) Q(z,\mu,\nu)\right],$ (12)

where the damping constant entering the absorption profile ϕ is fixed to a = 10⁻³. Finally, the explicit expressions of $\sigma^0_0$ and $\sigma^2_0$ are (see Janett et al. 2021, Eq. (18))

$\begin{aligned} \sigma_0^0(z)&= \xi \bar{J}^0_0(z) + \epsilon \\&= \xi \int \mathrm{d}\nu\, \frac{\phi(\nu)}{2} \oint \mathrm{d}\mu\, I(z,\mu,\nu) + \epsilon,\end{aligned}$ (13)

$\begin{aligned} \sigma_0^2(z)&= \xi \bar{J}^2_0(z) \\&= \xi \int \mathrm{d}\nu\, \frac{\phi(\nu)}{2} \oint \mathrm{d}\mu \left[\mathcal{T}^2_{0,1}(\mu) I(z,\mu,\nu) + \mathcal{T}^2_{0,2}(\mu) Q(z,\mu,\nu)\right],\end{aligned}$ (14)

where we set the thermalization parameter to ϵ = 10⁻⁴ and ξ = 1 − ϵ. In the equations above, we assumed a negligible depolarizing rate of the upper level due to elastic collisions $\delta_u^{(K)}=0$, and set the Planck function (in the Wien limit) to W = 1. We finally recall that for the considered atomic model $w_{J_u J_\ell}^{(0)} = w_{J_u J_\ell}^{(2)} = 1$.

3.2. Discrete problem

We discretize the continuous variables z, ν, and μ with the numerical grids $\{z_k\}_{k=1}^{N_{\rm s}}$, $\{\nu_p\}_{p=1}^{N_{\nu}}$, and $\{\mu_m\}_{m=1}^{N_\mu}$, respectively. The quantities of the problem are now approximated at the nodes only. The discrete version of Eqs. (7) and (8) can be expressed in matrix form

$\mathbf{S} = T\boldsymbol{\sigma},\;\;\mathrm{with}\;\; T\in\mathbb{R}^{2N_{\rm s}N_\mu \times 2N_{\rm s}},$ (15)

where S ∈ ℝ^{2N_sN_μ} and σ ∈ ℝ^{2N_s} collect the discretized source functions S_I(z_k, μ_m) and S_Q(z_k, μ_m), and the multipolar components of the source function $\sigma_0^0(z_k)$ and $\sigma_0^2(z_k)$, respectively. The entries of the matrix T depend on the coefficients appearing in Eqs. (7) and (8).
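The structure of T follows directly from Eqs. (7) and (8). A possible assembly is sketched below in NumPy. The ordering of the entries of S (for each height z_k and direction μ_m, first S_I and then S_Q) is an assumption made here for illustration, as are the function names; the ordering of σ follows the one given after Eq. (17).

```python
import numpy as np


def t2_01(mu):
    """Polarization tensor component entering S_I, Eq. (7)."""
    return np.sqrt(2.0) * (3.0 * mu**2 - 1.0) / 4.0


def t2_02(mu):
    """Polarization tensor component entering S_Q, Eq. (8)."""
    return np.sqrt(2.0) * (3.0 * mu**2 - 3.0) / 4.0


def build_T(mu_nodes, n_s):
    """Assemble T of size (2 n_s n_mu) x (2 n_s) under the assumed ordering."""
    n_mu = len(mu_nodes)
    T = np.zeros((2 * n_s * n_mu, 2 * n_s))
    for k in range(n_s):
        for m, mu in enumerate(mu_nodes):
            row = 2 * (k * n_mu + m)            # row of S_I(z_k, mu_m)
            T[row, 2 * k] = 1.0                 # sigma^0_0 term of S_I
            T[row, 2 * k + 1] = t2_01(mu)       # sigma^2_0 term of S_I
            T[row + 1, 2 * k + 1] = t2_02(mu)   # sigma^2_0 term of S_Q
    return T


# Assumed toy grid: three directions and two heights.
T = build_T(np.array([-1.0, 0.5, 1.0]), n_s=2)
sigma = np.array([1.0, 0.0, 1.0, 0.0])          # sigma^0_0 = 1, sigma^2_0 = 0
S = T @ sigma
print(T.shape, S)
```

With σ²₀ = 0 the source function is unpolarized and isotropic, so S_I = σ⁰₀ and S_Q = 0 at every node, which gives a quick sanity check of the assembly.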

Similarly, the transfer Eqs. (9) and (10) are expressed in matrix form

$\mathbf{I} = \Lambda\mathbf{S} + \mathbf{t},\;\;\mathrm{with}\;\; \Lambda\in\mathbb{R}^{2N_{\rm s}N_\mu N_\nu \times 2N_{\rm s}N_\mu},$ (16)

where I ∈ ℝ^{2N_sN_μN_ν} collects the discretized Stokes parameters I(z_k, μ_m, ν_p) and Q(z_k, μ_m, ν_p), while t ∈ ℝ^{2N_sN_μN_ν} represents the radiation transmitted from the boundaries. The entries of the matrix Λ depend on the numerical method (a.k.a. formal solver) used to solve the transfer Eqs. (9) and (10), on the spatial grid, and on the numerical conversion to the optical depth scale, if any.

Finally, the matrix version of Eqs. (13) and (14) reads

$\boldsymbol{\sigma} = J\mathbf{I} + \mathbf{c},\;\;\mathrm{with}\;\; J\in\mathbb{R}^{2N_{\rm s} \times 2N_{\rm s}N_\mu N_\nu},$ (17)

where $\boldsymbol{\sigma} = [\sigma_0^0(z_1),\sigma_0^2(z_1),\sigma_0^0(z_2),\ldots,\sigma_0^2(z_{N_{\rm s}})]^T$ and, accordingly, c = [ϵ, 0, ϵ, 0, …, ϵ, 0]^T. The matrix J depends on the choice of the angular and spectral quadratures used in Eqs. (13) and (14). Janett et al. (2021) provide an explicit description of the matrices T, Λ, and J for a more general radiative transfer setting.

By choosing σ as the unknown vector, Eqs. (15)–(17) can then be combined in the linear system given by

$(Id - J\Lambda T)\boldsymbol{\sigma} = J\mathbf{t} + \mathbf{c} \quad\iff\quad A\mathbf{x} = \mathbf{b},$ (18)

with Id − JΛT being a square matrix of size 2N_s. Referring to (1), we set A = Id − JΛT, x = σ, and b = Jt + c. Alternatively, it is also possible to consider I as unknown of the problem, obtaining the following linear system

$(Id - \Lambda TJ)\mathbf{I} = \Lambda T\mathbf{c} + \mathbf{t},$ (19)

with Id − ΛTJ being a block sparse square matrix of size 2N_sN_μN_ν with dense blocks.
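For completeness, both systems follow from Eqs. (15)–(17) by direct substitution; the intermediate steps, which are not spelled out in the text, read

```latex
\boldsymbol{\sigma} = J\mathbf{I} + \mathbf{c}
  = J(\Lambda\mathbf{S} + \mathbf{t}) + \mathbf{c}
  = J\Lambda T\boldsymbol{\sigma} + J\mathbf{t} + \mathbf{c}
  \;\;\Longrightarrow\;\;
  (Id - J\Lambda T)\boldsymbol{\sigma} = J\mathbf{t} + \mathbf{c},

\mathbf{I} = \Lambda\mathbf{S} + \mathbf{t}
  = \Lambda T\boldsymbol{\sigma} + \mathbf{t}
  = \Lambda T(J\mathbf{I} + \mathbf{c}) + \mathbf{t}
  \;\;\Longrightarrow\;\;
  (Id - \Lambda TJ)\mathbf{I} = \Lambda T\mathbf{c} + \mathbf{t}.
```

The first chain eliminates I and S in favor of σ, while the second eliminates S and σ in favor of I.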

The linear problem (18) is considered in the following sections, because of its smaller size in the current setting. However, the formulation (19) is likely to be more favorable to Krylov methods, since the corresponding linear system is both larger and sparse. In both cases the coefficient matrix A is nonsymmetric and positive definite (cf. Fig. 1).

Fig. 1.

Spectrum of P⁻¹A on the complex plane for different preconditioners for the DELO-linear formal solver, with N_s = 80 and N_μ = N_ν = 20. From here on the abbreviation “no prec.” indicates that no preconditioner is used, that is, P = Id.

3.3. Matrix-free Krylov approach

The explicit assembly of A = Id − JΛT is convenient if the linear system (18) must be solved multiple times, for example in the case of extensive numerical tests with different right-hand sides. However, it is in general expensive to assemble A, especially for large problems. Fortunately, Krylov methods can also be applied in a matrix-free context, where the action of A is encoded in a routine and no direct access to its entries is required. For this reason, we provide in Algorithm 1 a matrix-free routine to compute σ_out = Aσ_in for an arbitrary input vector σ_in. Algorithm 1 is modular, allowing preexisting radiative transfer routines to be reused.

We remark that it is possible to assemble A column-by-column, obtaining its jth column applying Algorithm 1 to a point-like source function σ_i = δ_ij for i, j = 1, …, 2N_s.
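The published Algorithm 1 is not reproduced in this text. A minimal sketch that is consistent with A = Id − JΛT, written in NumPy, could look as follows. The three callables `apply_T`, `formal_solve`, and `apply_J` are hypothetical placeholders for existing radiative transfer routines (source function evaluation, formal solution with zero boundary data, and angular and spectral integration without the constant term c); small random matrices stand in for them here.

```python
import numpy as np


def make_apply_A(apply_T, formal_solve, apply_J):
    """Return a routine computing sigma_out = (Id - J Lambda T) sigma_in."""
    def apply_A(sigma_in):
        S = apply_T(sigma_in)          # source functions, Eq. (15)
        I = formal_solve(S)            # Lambda S: formal solution, Eq. (16) with t = 0
        return sigma_in - apply_J(I)   # Eq. (17) without the constant term c
    return apply_A


def assemble_columnwise(apply_A, size):
    """Assemble A by applying the routine to point-like sources e_j."""
    A = np.zeros((size, size))
    for j in range(size):
        e_j = np.zeros(size)
        e_j[j] = 1.0
        A[:, j] = apply_A(e_j)
    return A


# Assumed stand-ins with the sizes of Sect. 3.2 for N_s = 3, N_mu = N_nu = 2.
rng = np.random.default_rng(1)
n_s, n_mu, n_nu = 3, 2, 2
T = rng.standard_normal((2 * n_s * n_mu, 2 * n_s))
Lam = rng.standard_normal((2 * n_s * n_mu * n_nu, 2 * n_s * n_mu))
J = 0.05 * rng.standard_normal((2 * n_s, 2 * n_s * n_mu * n_nu))

apply_A = make_apply_A(lambda s: T @ s, lambda S: Lam @ S, lambda I: J @ I)
A = assemble_columnwise(apply_A, 2 * n_s)
print(np.allclose(A, np.eye(2 * n_s) - J @ Lam @ T))
```

A routine of this kind can be passed directly to a matrix-free Krylov solver, whereas the column-by-column assembly costs 2N_s applications of the routine and is worthwhile only when A is reused many times.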

4. Numerical experiments

In Sect. 2, we discussed the convergence properties, complexity, and memory requirements of various Krylov methods. In practice, the specific choice among the different methods is usually made according to run time, which is empirically tested.

In this section, different Krylov and stationary iterative methods are applied to the benchmark linear problem (18). In particular, we compare GMRES, CGS, and BICGSTAB Krylov methods, possibly preconditioned, and the Richardson, Jacobi, SOR, and SSOR stationary iterative methods. The stationary iterative methods are described as preconditioned Richardson methods of the form

$\mathbf{x}^{n+1} = \mathbf{x}^n + P^{-1}(\mathbf{b} - A\mathbf{x}^n).$ (20)
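Iteration (20) can be sketched in a few lines of plain Python. In the example below, the function name `richardson`, the 3 × 3 test matrix, and the Jacobi choice P = D are assumptions made for illustration, and the loop stops on the unpreconditioned relative residual.

```python
def norm2(v):
    """Euclidean norm of a list of floats."""
    return sum(c * c for c in v) ** 0.5


def richardson(apply_A, apply_P_inv, b, x0, tol=1e-6, max_iter=10_000):
    """Preconditioned Richardson iteration x <- x + P^{-1} (b - A x)."""
    x = list(x0)
    b_norm = norm2(b)
    for n in range(max_iter):
        r = [bi - ai for bi, ai in zip(b, apply_A(x))]
        if norm2(r) / b_norm < tol:      # unpreconditioned relative residual
            return x, n
        dx = apply_P_inv(r)
        x = [xi + di for xi, di in zip(x, dx)]
    return x, max_iter


# Assumed toy system: nonsymmetric and strictly diagonally dominant.
A = [[4.0, -1.0, 0.0],
     [-2.0, 4.0, -1.0],
     [0.0, -2.0, 4.0]]
b = [1.0, 2.0, 3.0]


def apply_A(x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def jacobi_inverse(r):
    return [ri / A[i][i] for i, ri in enumerate(r)]   # P = D


x, n_iter = richardson(apply_A, jacobi_inverse, b, [0.0, 0.0, 0.0])
print(n_iter, x)
```

Replacing `jacobi_inverse` by a triangular solve gives the SOR variant, and P = Id recovers the unpreconditioned Richardson method.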

4.1. Discretization, quadrature, and formal solver

We replaced the spatial variable z by the frequency-integrated optical depth scale along the vertical τ. Moreover, we fixed the logarithmically spaced grid

$10^{-5} = \tau_1 < \tau_2 < \cdots < \tau_{N_{\rm s}} = 10^4,$

thus linearizing the problem. We used N_μ Gauss-Legendre nodes and weights to discretize μ ∈ [ − 1, 1] and compute the corresponding angular quadrature. The spectral line was sampled with N_ν frequency nodes equally spaced in the reduced frequency interval [ − 5, 5] and the trapezoidal rule was used as spectral quadrature.
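The grids and quadrature weights described above can be generated with a few NumPy calls. The sketch below uses N_s = 80 and N_μ = N_ν = 20 (the values quoted in the captions of Figs. 1 and 2); the variable names are assumptions of this example.

```python
import numpy as np

n_s, n_mu, n_nu = 80, 20, 20

# Logarithmically spaced optical depth grid between 1e-5 and 1e4.
tau = np.logspace(-5, 4, n_s)

# Gauss-Legendre nodes and weights on [-1, 1] for the angular quadrature.
mu, w_mu = np.polynomial.legendre.leggauss(n_mu)

# Equally spaced reduced frequencies on [-5, 5] with trapezoidal weights.
nu = np.linspace(-5.0, 5.0, n_nu)
d_nu = nu[1] - nu[0]
w_nu = np.full(n_nu, d_nu)
w_nu[[0, -1]] = 0.5 * d_nu

print(tau[0], tau[-1], w_mu.sum(), w_nu.sum())
```

Choosing an even number of Gauss-Legendre nodes avoids a node at μ = 0, for which the transfer Eqs. (9) and (10) degenerate.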

A limited number of formal solvers was considered for the numerical solution of transfer Eqs. (9) and (10): the implicit Euler, DELO-linear, DELOPAR, and DELO-parabolic methods. Further details on these formal solvers are given by Janett et al. (2017a,b, 2018).

4.2. Iterative solvers settings and implementation

As initial guess for all the solvers, we use $\sigma_0^0(z_k) = 1$ and $\sigma_0^2(z_k) = 0$ for all k, corresponding to σ_in = [1, 0, 1, 0, …, 1, 0]. Since multiple preconditioners are compared, we use the following termination condition

$\begin{aligned} \Vert {\mathbf{r}}^n\Vert _2/\Vert {\mathbf{b}}\Vert _2 < 10^{-6}, \end{aligned}$

which is based on the unpreconditioned relative residual.

Of the two possible formulations of SOR presented in Sect. 2.3, the structure of A (cf. Fig. 2) suggests the use of P = D + ωU, which indeed provides better convergence. Moreover, the near-optimal damping parameter ω = 1.5 (see Janett et al. 2021, Sect. 2.3) is used for both the SOR and SSOR preconditioners when applied to the Richardson method. By contrast, no damping (that is, ω = 1) is used when the SOR and SSOR preconditioners are applied to Krylov methods, since no benefits are observed for different choices of ω. The GMRES method is employed without restarts, while the ILU factorization is applied with a threshold of 10⁻² (a.k.a. ILUT⁴). Figure 2 shows an example of the sparsity pattern of $\tilde{L}$ and $\tilde{U}$, highlighting the most significant entries of A.

Fig. 2.

Sparsity pattern of $\tilde{L}$ and $\tilde{U}$ from the ILU factorization of A for the DELO-parabolic formal solver, with N_s = 80 and N_μ = N_ν = 20.

Krylov methods are applied to problem (18) using MATLAB and its built-in functions gmres, bicgstab, cgs, and ilu. In practice, we explicitly assemble the matrix A and store P in sparse format, using mldivide to apply P⁻¹ in (20). The SSOR and ILU preconditioners can be expressed as the product of lower and upper triangular matrices, that is, in the form $P=\tilde{L}\tilde{U}$. Since $P^{-1} = \tilde{U}^{-1}\tilde{L}^{-1}$, the action of P⁻¹ can be conveniently computed using sparse storage for $\tilde{L}$ and $\tilde{U}$. We note that Eisenstat's trick (not used here) can be conveniently applied to SSOR preconditioning, reducing the run time by a factor of up to two.
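The factored application of P⁻¹ amounts to one forward and one backward substitution. The sketch below illustrates this with dense NumPy arrays and small made-up triangular factors; it is not the MATLAB implementation used for the experiments, and a production code would rely on sparse triangular solves.

```python
import numpy as np

def forward_substitution(L, r):
    """Solve L y = r for a lower-triangular matrix L."""
    n = r.size
    y = np.zeros(n)
    for i in range(n):
        y[i] = (r[i] - L[i, :i] @ y[:i]) / L[i, i]
    return y

def backward_substitution(U, y):
    """Solve U z = y for an upper-triangular matrix U."""
    n = y.size
    z = np.zeros(n)
    for i in range(n - 1, -1, -1):
        z[i] = (y[i] - U[i, i + 1:] @ z[i + 1:]) / U[i, i]
    return z

def apply_P_inv(L, U, r):
    """Return P^{-1} r = U^{-1} (L^{-1} r) for P = L U, never forming P^{-1}."""
    return backward_substitution(U, forward_substitution(L, r))

# Made-up triangular factors (assumption), standing in for the ILU or SSOR ones.
L = np.array([[1.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [0.0, -0.25, 1.0]])
U = np.array([[4.0, -1.0, 0.0],
              [0.0, 3.0, -1.0],
              [0.0, 0.0, 2.0]])
r = np.array([1.0, 2.0, 3.0])
z = apply_P_inv(L, U, r)
```

Each substitution touches every stored entry of its factor once, so the cost of applying P⁻¹ is proportional to the number of nonzeros retained in the factors.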

Matrix-free approaches are also adopted, using Algorithm 1 and avoiding the explicit assembly of the matrix A. In this case, for performance reasons, we precompute the action of P⁻¹. Notice that the Krylov routines available in the most common numerical packages (e.g., MATLAB or PETSc) support a matrix-free format of A and P⁻¹.
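To illustrate what a matrix-free format means in practice, the following sketch implements a minimal unrestarted GMRES that only ever calls a routine for the operator. Several assumptions apply: this is a hand-written NumPy illustration, not the MATLAB or PETSc routine; `apply_A` is a hypothetical toy stand-in for Algorithm 1; left Jacobi preconditioning is used; and, unlike the termination condition adopted in this work, the loop monitors the preconditioned residual.

```python
import numpy as np

def gmres(apply_op, rhs, tol=1e-6, max_iter=None):
    """Unrestarted GMRES with zero initial guess for apply_op(x) = rhs.

    Only the callable apply_op is used; the matrix is never formed.
    Returns the approximate solution and the number of iterations.
    """
    n = rhs.size
    max_iter = n if max_iter is None else max_iter
    beta = np.linalg.norm(rhs)
    Q = np.zeros((n, max_iter + 1))          # orthonormal Krylov basis
    H = np.zeros((max_iter + 1, max_iter))   # upper Hessenberg matrix
    Q[:, 0] = rhs / beta
    for k in range(max_iter):
        w = apply_op(Q[:, k])                # one operator call per iteration
        for i in range(k + 1):               # modified Gram-Schmidt
            H[i, k] = Q[:, i] @ w
            w = w - H[i, k] * Q[:, i]
        H[k + 1, k] = np.linalg.norm(w)
        e1 = np.zeros(k + 2)
        e1[0] = beta
        Hk = H[:k + 2, :k + 1]
        y = np.linalg.lstsq(Hk, e1, rcond=None)[0]   # small least-squares problem
        if np.linalg.norm(Hk @ y - e1) / beta < tol or H[k + 1, k] < 1e-14:
            break
        Q[:, k + 1] = w / H[k + 1, k]
    return Q[:, :k + 1] @ y, k + 1

n = 50
diag = np.linspace(1.0, 10.0, n)             # toy diagonal (assumption)

def apply_A(x):
    """Hypothetical stand-in for Algorithm 1: diagonal plus weak coupling."""
    y = diag * x
    y[1:] -= 0.3 * x[:-1]
    y[:-1] -= 0.2 * x[1:]
    return y

def apply_P_inv(r):
    """Jacobi preconditioner: divide by the precomputed diagonal of A."""
    return r / diag

def apply_PA(x):
    """Left-preconditioned operator P^{-1} A."""
    return apply_P_inv(apply_A(x))

b = np.ones(n)
x, iters = gmres(apply_PA, apply_P_inv(b))
```

Replacing `apply_A` with a wrapper around an existing formal-solver routine is the only change needed to reuse the same loop on a different problem.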

4.3. Convergence

Figure 1 displays the spectrum of P⁻¹A on the complex plane for different preconditioners. The clustering of the eigenvalues of P⁻¹A corresponds to a better conditioning and, consequently, to a faster convergence of iterative methods when applied to (6). Clustered eigenvalues also yield a reduction of the dimension of the Krylov space 𝒦_n(P⁻¹A, P⁻¹b). Table 1 presents the number of iterations required by different methods to converge as a function of the number of nodes in the angular and spectral grids, N_μ and N_ν, respectively. Analogously to Janett et al. (2021), we observe that, once a minimal resolution is guaranteed, varying N_μ and N_ν has a negligible effect on the convergence behavior of the different Krylov solvers. Therefore, the values N_μ = 20 and N_ν = 20 remain fixed in the following numerical experiments on convergence.
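The effect of a preconditioner on the spectrum can be reproduced on a toy example. The sketch below is only an illustration under strong assumptions: the matrix is a made-up tridiagonal one with a widely varying diagonal (not the matrix A of this work), P is the Jacobi preconditioner, and `spread` is an ad hoc measure of clustering introduced here.

```python
import numpy as np

n = 40
d = np.logspace(0.0, 3.0, n)        # widely varying diagonal (toy assumption)
A = np.diag(d)
idx = np.arange(n - 1)
A[idx + 1, idx] = -0.2              # weak sub-diagonal coupling
A[idx, idx + 1] = -0.2              # weak super-diagonal coupling

P_inv_A = A / d[:, None]            # Jacobi: P = diag(A), i.e. row scaling

def spread(M):
    """Largest distance of an eigenvalue of M from the mean eigenvalue."""
    eig = np.linalg.eigvals(M)
    return np.max(np.abs(eig - eig.mean()))

spread_A = spread(A)                # eigenvalues scattered over [~1, ~1000]
spread_PA = spread(P_inv_A)         # eigenvalues clustered around 1
```

For this toy matrix, Gershgorin's theorem places every eigenvalue of P⁻¹A within 0.4 of 1, whereas the eigenvalues of A follow its diagonal and span about three orders of magnitude.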

Table 1.

Iterations to convergence for the GMRES, CGS, and BICGSTAB Krylov methods, with no preconditioning, for the DELO-linear formal solver, with N_s = 40, and varying N_μ = N_ν.

Figure 3 presents the convergence history of the Richardson and Krylov methods combined with different preconditioners and the formal solvers under investigation. Krylov methods generally outperform stationary methods in terms of number of iterations to reach convergence. Among the Krylov methods, preconditioned BICGSTAB and CGS methods result in the fastest convergence. As predicted by Fig. 1, SSOR and ILU preconditioners are the most effective. On the other hand, the use of a Jacobi preconditioner also seems ideal, because of its overall fast convergence and its cheap and possibly parallel application.

Fig. 3.

Relative residual, up to machine precision, vs iterations for various preconditioners and formal solvers with N_s = 80 and N_ν = N_μ = 20. Each row corresponds to a different formal solver, from above: implicit Euler, DELO-linear, DELOPAR, DELO-parabolic. We note that a different scale is used in the horizontal axes of the first column, where no preconditioner is employed.

Tables 2–6 present the number of iterations required by different methods to converge as a function of N_s for different preconditioners, using the DELO-linear formal solver. The same numerical experiments using the implicit Euler, DELOPAR, and DELO-parabolic formal solvers show similar trends and are thus not shown. With reference to Table 2, we observe that the unpreconditioned Richardson method never converges in fewer than 10⁴ iterations. Figure 4 graphically represents the content of Tables 2–6, suggesting a superior scaling of Krylov methods with the problem size in terms of convergence. Since realistic radiative transfer problems are very large, these results are particularly relevant for practical purposes.

Fig. 4.

Graphical representation of Tables 2–6.

Table 2.

Iterations to convergence for the Richardson, GMRES, CGS, and BICGSTAB methods with no preconditioner for the DELO-linear formal solver, with N_μ = N_ν = 20, and varying N_s.

Table 3.

Same as Table 2, but for the Jacobi preconditioner.

Table 4.

Same as Table 2, but for the SOR preconditioner.

Table 5.

Same as Table 2, but for the SSOR preconditioner.

Table 6.

Same as Table 2, but for the ILU preconditioner.

4.4. Run times

Since the use of different formal solvers would not change the essence of the results, this section only considers the use of the DELO-linear formal solver.

We first consider the case of A being assembled explicitly. Table 7 reports the time spent to assemble A using Algorithm 1 repeatedly, varying the discretization parameters. We remark that Algorithm 1 can be optimized for this context, where a point-like input is used. In fact, the computational complexity of the formal solution of (9) and (10) with a zero initial condition and a point-like source term can be largely reduced. For example, given j ∈ {1, …, N_s} and a point-like σ in z_j, that is, σ_i = δ_ij, the solution of (9) and (10) for μ < 0 (resp. μ > 0) is identically zero for all z < z_j (resp. z > z_j) and is given by an exponential decay for z > z_j (resp. z < z_j). Once A is explicitly assembled, problem (18) can be solved with a negligible run time using, for example, an LU direct solver (taking 32 milliseconds for N_s = 500, cf. Table 7). We remark that the solve time is almost negligible with respect to the time spent on assembly for all the iterative methods under consideration (except unpreconditioned Richardson).

Therefore, Tables 8, 9, and 10 report run times corresponding to a matrix-free approach, for which no time is spent to assemble A. With reference to Table 8, the time spent to compute P is approximately 0.05, 4, and 9 s for the Jacobi, SOR, and SSOR preconditioners, respectively. The run time of the Richardson method with no preconditioning is not reported, because it exceeds one hour. Similarly, concerning Table 9, the time spent to compute P is approximately 0.2, 55, and 111 s for the Jacobi, SOR, and SSOR preconditioners, respectively. In the matrix-free context, GMRES turns out to be the fastest Krylov method, since it requires only one application of A for each iteration (as discussed in Sect. 2). This matters because the application of A, obtained using Algorithm 1, is computationally expensive.

Table 7.

Run times (in seconds) to assemble the matrix A in Eq. (18) for the DELO-linear formal solver, varying N_μ = N_ν and N_s.

Table 8.

Run times (in seconds) and number of iterations in brackets for the matrix-free solution of Eq. (18) for the DELO-linear formal solver, with N_s = 140 and N_μ = N_ν = 20.

Table 9.

Same as Table 8, but with N_s = 500.

Table 10.

Run times (in seconds) and number of iterations in brackets for the matrix-free solution of Eq. (18) using GMRES method with the Jacobi preconditioner and varying N_μ = N_ν and N_s.

Figure 5 displays the total run times of the most relevant approaches, highlighting the time spent to assemble A, to precompute the action of P⁻¹, and to solve the linear system. Accordingly, the most convenient approaches are either assembling A and then applying an LU direct solver, or employing a Jacobi-GMRES matrix-free solver.

Fig. 5.

Total run times (in seconds), subdivided in various stages, for the solution of (18) with a DELO-linear formal solver, N_μ = N_ν = 20, N_s = 140 (left), and N_s = 500 (right), according to Tables 9 and 7. The first three bars correspond to various preconditioned Richardson iterations. The following three bars represent run times for the matrix-free GMRES method, again with various preconditioners. Finally, the last bar represents a direct approach, i.e., an LU solver. Only in this case is A explicitly assembled using Algorithm 1; the “Solve” time is negligible and can hardly be seen.

The best choice between the two approaches mostly depends on the problem size, memory limitations, and accuracy requirements. In this regard, Table 10 reports how the run time of Jacobi-GMRES scales with respect to the problem size (in terms of N_s, N_μ, and N_ν). The comparison between Tables 7 and 10 suggests that the matrix-free approach scales better as N_s grows. On the other hand, it becomes advantageous to assemble A as N_μ and N_ν grow (cf. Fig. 6). To explain this difference, we remark that, in general, the nested loop over N_μ and N_ν, containing a formal solver call over N_s, is the most expensive portion of Algorithm 1. However, when Algorithm 1 is used to assemble A, the computational complexity of the formal solution of (9) and (10) can be largely reduced. In this case, the computational bottleneck of Algorithm 1 is the calculation of $\bar J^0_0(z_k)$ and $\bar J^2_0(z_k)$, for k = 1, …, N_s.

Fig. 6.

Total run times (in seconds) for the solution of (18) with a DELO-linear formal solver, N_μ = N_ν = 60, N_s = 140 (left), and N_s = 500 (right). For both plots, the first bar corresponds to a stationary Jacobi method. In the following three bars, the Jacobi method is accelerated with various Krylov methods. Finally, the last bar represents a direct approach, i.e., an LU solver. Only in this case is A explicitly assembled using Algorithm 1.

We finally remark that both the selected methods can be applied in parallel, since each loop in Algorithm 1 can be replaced by a “parallel for” and the application of the Jacobi preconditioner is embarrassingly parallel.

5. Conclusions

We described how to apply state-of-the-art Krylov methods to linear radiative transfer problems of polarized radiation. In this work, we considered the same benchmark problem as in Janett et al. (2021). However, Krylov methods can be applied to a wider range of settings, such as the unpolarized case, two-term atomic models including continuum contributions, 3D atmospheric models with arbitrary magnetic and bulk velocity fields, and theoretical frameworks accounting for PRD effects.

In particular, we presented the convergence behavior and the run time measures of the GMRES, BICGSTAB, and CGS Krylov methods. Krylov methods significantly accelerate standard stationary iterative methods in terms of convergence rate and time-to-solution, and they scale more favorably with the problem size. Contrary to stationary iterative methods, Krylov-accelerated routines always converge (even with no preconditioning) for the considered numerical experiments.

For run time measures, we also considered the matrix-free application of the iterative methods under investigation. In this regard, the GMRES method preconditioned with Jacobi has proven to be the most advantageous choice in terms of convergence and run time (approximately 10 to 20 times faster than standard Jacobi-Richardson, cf. Figs. 5 and 6). Since GMRES requires only one matrix-vector multiplication in each iteration, it becomes more advantageous when A is less sparse or in the matrix-free case, where the evaluation of Ax is computationally expensive. We remark that parallelism can be extensively exploited both in Algorithm 1 and in the application of the Jacobi preconditioner.

We remark that the fully algebraic formulation of the linear radiative transfer problem allows us to explicitly assemble the matrix A and to apply direct methods to solve the corresponding linear system. In this case, the solving time is negligible with respect to the assembly time. This approach can be more advantageous than the matrix-free one if high accuracy is required, if Eq. (18) has to be solved for many right-hand sides, or if the number of discrete frequencies and directions is particularly large. On the other hand, the matrix-free GMRES method preconditioned with Jacobi seems to be preferable if the number of spatial nodes is particularly large (e.g., for 3D geometries).

We finally recall that many available numerical libraries implement matrix-free Krylov routines, enabling an almost painless transition from stationary iterative methods to Krylov methods. In terms of implementation, Krylov methods can simply wrap already existing radiative transfer routines based on stationary iterative methods. The application of Krylov methods to more complex linear radiative transfer problems will be the subject of study of forthcoming papers.

¹

The index m_j is equal to the size of the largest Jordan block associated with λ_j. It is possible to show that $m_j \leq m_j^{\rm A} - m_j^{\rm G} + 1$, where $m_j^{\rm A}$ and $m_j^{\rm G}$ are the algebraic and geometric multiplicities associated with λ_j. If A is diagonalizable, for all j we have $m_j^{\rm A}=m_j^{\rm G}$ and therefore m_j = 1 for all j.

²

The Cayley-Hamilton theorem implies g ≤ N.

³

From (2), by definition, we have that $\mathbf{x}^n = p_{n-1}(A)\mathbf{b}$. Then

$\mathbf{r}^n = \mathbf{b} - A\mathbf{x}^n = \mathbf{b} - A p_{n-1}(A)\mathbf{b} = p_n(A)\mathbf{b}.$

⁴

That is, the nondiagonal elements of $\tilde{L}$ and $\tilde{U}$ are replaced with zero if

$\begin{aligned}|\tilde{U}_{ij}| < 10^{-2} \Vert A_{* j}\Vert _2, \qquad |\tilde{L}_{ij}| < 10^{-2} \Vert A_{* j}\Vert _2/\tilde{U}(j,j),\end{aligned}$

for i, j = 1, …, N and i ≠ j.

Acknowledgments

Special thanks are extended to E. Alsina Ballester, N. Guerreiro, and T. Simpson for particularly enriching comments on this work. The authors gratefully acknowledge the Swiss National Science Foundation (SNSF) for financial support through grant CRSII5_180238. Rolf Krause acknowledges the funding from the European High-Performance Computing Joint Undertaking (JU) under grant agreement No 955701 (project TIME-X). The JU receives support from the European Union’s Horizon 2020 research and innovation programme and Belgium, France, Germany, Switzerland.

References

Anusha, L. S., & Nagendra, K. N. 2011, ApJ, 738, 116
Anusha, L. S., & Nagendra, K. N. 2012, ApJ, 746, 84
Anusha, L. S., & Nagendra, K. N. 2013, ApJ, 767, 108
Anusha, L., Nagendra, K., Paletou, F., & Léger, L. 2009, ApJ, 704, 661
Anusha, L. S., Nagendra, K. N., & Paletou, F. 2011, ApJ, 726, 96
Badri, M., Jolivet, P., Rousseau, B., & Favennec, Y. 2019, Comput. Math. Appl., 77, 1453
Castro, R. O., & Trelles, J. P. 2015, J. Quant. Spectr. Rad. Transf., 157, 81
Gutknecht, M. H. 1993, SIAM J. Sci. Comput., 14, 1020
Hubeny, I., & Burrows, A. 2007, ApJ, 659, 1458
Ipsen, I. C., & Meyer, C. D. 1998, Am. Math. Mon., 105, 889
Janett, G., Carlin, E. S., Steiner, O., & Belluzzi, L. 2017a, ApJ, 840, 107
Janett, G., Steiner, O., & Belluzzi, L. 2017b, ApJ, 845, 104
Janett, G., Steiner, O., & Belluzzi, L. 2018, ApJ, 865, 16
Janett, G., Benedusi, P., Belluzzi, L., & Krause, R. 2021, A&A, 655, A87
Klein, R. I., Castor, J. I., Greenbaum, A., Taylor, D., & Dykema, P. G. 1989, J. Quant. Spectr. Rad. Transf., 41, 199
Lambert, J., Josselin, E., Ryde, N., & Faure, A. 2015, A&A, 580, A50
Landi Degl’Innocenti, E., & Landolfi, M. 2004, in Polarization in Spectral Lines (Dordrecht: Kluwer Academic Publishers), Astrophys. Space Sci. Lib., 307
Meurant, G., & Duintjer Tebbens, J. 2020, Krylov Methods for Nonsymmetric Linear Systems: From Theory to Computations (Springer)
Nagendra, K. N., Anusha, L. S., & Sampoorna, M. 2009, Mem. Soc. Astron. It., 80, 678
Paletou, F., & Anterrieu, E. 2009, A&A, 507, 1815
Saad, Y. 2003, Iterative Methods for Sparse Linear Systems (SIAM)
Saad, Y., & Schultz, M. H. 1986, SIAM J. Sci. Stat. Comput., 7, 856
Sampoorna, M., Nagendra, K. N., Sowmya, K., Stenflo, J. O., & Anusha, L. S. 2019, ApJ, 883, 188
Sonneveld, P. 1989, SIAM J. Sci. Stat. Comput., 10, 36
Van der Vorst, H. A. 1992, SIAM J. Sci. Stat. Comput., 13, 631

	Fig. 1. Spectrum of P⁻¹A on the complex plane for different preconditioners for the DELO-linear formal solver, with N_s = 80 and N_μ = N_ν = 20. From here on the abbreviation “no prec.” indicates that no preconditioner is used, that is, P = Id.
