Issue 
A&A
Volume 646, February 2021



Article Number  A123  
Number of page(s)  17  
Section  Numerical methods and codes  
DOI  https://doi.org/10.1051/00046361/202038579  
Published online  22 February 2021 
A highperformance and portable asymptotic preserving radiation hydrodynamics code with the M_{1} model
^{1}
Université ParisSaclay, UVSQ, CNRS, CEA, Maison de la Simulation, 91191 GifsurYvette, France
email: helene.bloch@cea.fr
^{2}
Université de Paris, AIM, 91191 GifsurYvette, France
^{3}
AIM, CEA, CNRS, Université ParisSaclay, 91191 GifsurYvette, France
Received:
4
June
2020
Accepted:
22
November
2020
Aims. We present a new radiation hydrodynamics code called ARKRT which uses a twomoment model with the M_{1} closure relation for radiative transfer. This code was designed to be ready for highperformance computing on exascale architectures.
Methods. The twomoment model is solved using a finitevolume scheme. The scheme is designed to be asymptotic preserving in order to accurately capture both optically thick and thin regimes. We also propose a wellbalanced discretization of the radiative flux source term which allows users to capture constant flux steady states with discontinuities in opacity. We use the library Trilinos for linear algebra and the package Kokkos allows us to reach highperformance computing and portability across different architectures, such as multicore, manycore, and GPGPU.
Results. ARKRT is able to reproduce standard tests in both freestreaming and diffusive limits, including purely radiative tests and radiation hydrodynamics ones. Using a timeimplicit solver is profitable as soon as the timestep given by the hydrodynamics is between 50 and 100 times larger than the explicit timestep for radiative transfer, depending on the preconditioner and the architecture. Nevertheless, more work is needed to ensure stability in all circumstances. Using ARKRT, we study the propagation of an ionization front in convective dense cores. We show that the ionization front is strongly stable against perturbations even with destabilizing convective motions. As a result, the presence of instabilities should be interpreted with caution. Overall, ARKRT is wellsuited to studying many astrophysical problems involving convection and radiative transfer such as the dynamics of H II regions in massive prestellar dense cores and future applications could include planetary atmospheres.
Key words: radiative transfer / hydrodynamics / methods: numerical
© H. Bloch et al. 2021
Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1. Introduction
In many astrophysical situations, radiation is an important process which interacts with the surrounding gas; for example in (exo)planet atmospheres (e.g., Thomas & Stamnes 2002), massive stars (e.g., Kuiper et al. 2010; MignonRisse et al. 2020), and H II regions (e.g., Spitzer 1978) up to the cosmic reionization (e.g., Stiavelli 2009). In all these situations, the radiation can be absorbed, thus heating the surrounding gas. Photons could also be emitted by the gas or by another source. Photons could be scattered, which will change their direction of propagation and perhaps their frequency (Chandrasekhar 1960).
Two main regimes can arise, depending on the mean free path of photons compared to the characteristic length of the system (Mihalas & Mihalas 1984). On one hand, in the diffusive limit, the medium is optically thick (mean free path of photons much smaller than the characteristic length), and the radiation and the matter strongly interact with each other. On the other hand, in the freestreaming regime, the radiation does not affect the gas, and the medium is optically thin (mean free path of photons greater than the characteristic length). Numerically, it is difficult to accurately capture both limits.
Because of the high number of degrees of freedom (i.e., the time, the position, the direction of propagation, and the frequency of photons), only a few problems can be solved analytically (Chandrasekhar 1960) and direct simulations are out of reach for modern computers. Different models have been developed to reduce the computational cost. Here, we focus on the moment models (Levermore 1984), in which the specific intensity is averaged over the direction of propagation of photons. These models present several advantages: the computational cost is lower than other methods such as the MonteCarlo method, and in general it is easy to couple these models with a gridbased hydrodynamics code.
One can only consider the moment of zero order (the radiative energy), leading to the fluxlimited diffusion (FLD) approximation (Levermore & Pomraning 1981). Because the moment models consider only the moment of zero order, their computational cost is quite low, but they are very diffusive in the freestreaming regime. To tackle this issue, one can use a twomoment model (radiative energy and radiative flux), with the M_{1} closure relation (Dubroca & Feugeas 1999). However, this method can suffer from artifacts when multiple beams cross in the freestreaming regime (González 2006). One can solve this issue by using a threemoment model (radiative energy, radiative flux, and radiative pressure) with the M_{2} closure relation (Pichard et al. 2016). However, because of the increase of unknowns, the computational cost also increases. In this work, we have chosen to use the twomoment model with the M_{1} closure relation because the computational cost remains affordable, and in our applications we do not encounter the problem of beams crossing in the freestreaming regime.
Even though the M_{1} model is accurate in both freestreaming and diffusive regimes at the continuous level, numerical schemes also need to properly capture both limits. Several approaches have been developed. For example, Berthon & Turpault (2011) presented a scheme based on an HLL solver with source terms modified with a free parameter. Following this idea, we propose a new socalled asymptotic preserving scheme also based on an HLL solver. Nevertheless, we have chosen another parameter to recover the asymptotic behavior in the diffusive limit. Furthermore, our integration of source terms is different. In many physical applications (e.g., clouds), optically thick regions are found next to optically thin zones. We propose a wellbalanced modification of the source term, which allows us to accurately reach steady states in the presence of sharp transitions.
As a first application, we are interested in the development of H II regions in massive prestellar dense cores (Churchwell 2002). We focus on solving the radiation hydrodynamics equations. An explicit solver for the radiative transfer would be restricted by a CourantFriedrichsLewy (CFL) condition, limited by the speed of light. This will result in a very low timestep compared to the hydrodynamics timestep, which is limited by the speed of sound through the fluid. Several methods have been developed to get around this problem; we have chosen a timeimplicit solver (e.g., González et al. 2007). The temporality of the radiative transfer is preserved, which is not the case with the reducedspeedoflight approximation (RSLA, e.g., Gnedin & Abel 2001). However, a timeimplicit solver method is costly because it requires solving large sparse linear systems.
Fortunately, progress has been made in linear algebra for highperformance computing (HPC). Because the linear system we have to solve is large and sparse, direct methods are out of reach. Iterative solvers with preconditioners have been developed to tackle this issue (e.g., Saad 2003). In this work, we use the library Trilinos (Heroux et al. 2005) because it allows us to target different architectures, such as multicore, manycore, and GPGPU. It also provides, among others, algebraic multigrid (AMG) preconditioners.
The paper is organized as follows. In the following section, we present the moment model and the M_{1} closure relation in more detail. In Sect. 3, we outline our new numerical scheme, which is well balanced and asymptotic preserving in the diffusive limit. In Sect. 4, we provide details of some implementation features of our code ARKRT, especially the Kokkos and Trilinos libraries used for shared memory parallelism and linear algebra for highperformance computing. We also show some performance results. In Sect. 5, we present some numerical test cases to show the importance of the asymptotic preserving and wellbalanced properties. Finally, in Sect. 5.6 we present a physical application of our method where we study the stability of the ionization front in the dense cores of a H II region.
2. Model
2.1. Physical model
In this work, we only consider gray radiative transfer, that is, we have computed the average over the frequency. We also assume local thermodynamic equilibrium (LTE) and we do not consider scattering. The mathematical description of radiative transfer was formalized by Chandrasekhar (1960):
where I, the quantity of interest, is the specific intensity, c is the speed of light, σ is the opacity of the medium, and B is the blackbodyspecific intensity. Also, x, t, and Ω are the spatial, temporal, and angular variables. This model can be generalized to multigroup radiative transfer (e.g., Turpault 2005).
Because the specific intensity I depends on six variables in the threedimensional case, the numerical treatment of Eq. (1) becomes rapidly costly. To reduce the computational cost, we use a moment method, which we present in detail in the following section.
2.2. M_{1} model
Let us consider the three first moments of the specific intensity: the radiative energy E_{r}, the radiative flux F_{r}, and the radiative pressure P_{r} defined as:
The mean over solid angles of Eq. (1) and its product by Ω give the following system:
where T_{g} is the gas temperature and a_{r} is the radiation constant.
The fluid and the radiation exchange energy and momentum through emission and absorption. To ensure the conservation of the total energy when the hydrodynamics is frozen, the energy exchange term is given by
Here, ρc_{v}T_{g} is the gas internal energy, with ρ the density of the fluid and c_{v} the heat capacity, which is defined by for a perfect gas, where k_{b} is the Boltzmann constant, μ is the mean molecular weight, m_{H} is the mass of hydrogen, and γ is the adiabatic index of the gas.
We still have to specify a closure relation, that is, a way to express P_{r} as a function of E_{r} and F_{r}. The one chosen here is the M_{1} model (Levermore 1984). From Dubroca & Feugeas (1999), we write P_{r} = DE_{r} where D is the Eddington tensor, defined by with χ the Eddington factor, I the identity matrix, and is the reduced flux. Here, χ can be specified either by applying a Lorentz transform to an isotropic distribution of photons (Levermore 1984), or by minimizing the radiative entropy (Dubroca & Feugeas 1999). In both cases, we have . Let us note that f ≤ 1, which ensures that the radiative energy cannot be transported faster than the speed of light.
The M_{1} model preserves both freestreaming and diffusive limits. On one hand, if f = 1, then P_{r} = DE_{r}n ⊗ n, and only the transport regime remains. On the other hand, if f = 0, the model in the diffusive regime simplifies to the P_{1} model, with . The radiative pressure tensor becomes isotropic. We look into the diffusion limit more precisely in Sect. 2.4.
2.3. Radiation hydrodynamics
We now consider the radiation hydrodynamics equations. The fluid evolution is described by the Euler equations expressing conservation of mass, balance of momentum, and balance of energy. Because photons are relativistic particles, we have to evaluate the quantities in the laboratory frame or the comoving frame, which is moving with the fluid. On one hand, using the comoving frame introduces nonconservative terms in the lefthand side of the equations. On the other hand, the hyperbolic part of the system remains simple in the laboratory frame but some source terms have to be incorporated to describe the interactions between the matter and the radiation. We have chosen the second approach. These source terms characterize the momentum and energy exchanges between the fluid and the radiation:
where u is the material velocity, p is the pressure of the fluid, g is the external gravitational field, and is the density of total matter energy with e the specific internal energy. The terms S_{Er}(u) and S_{Fr}(u) depend on the velocity u. Using Eqs. (29) and (31) from Lowrie et al. (1999), we have
at first order in . To close the system, we also add the equation of state of an ideal gas: p = ρe(γ − 1).
2.4. Diffusive limit in a static fluid
Let us now focus on the diffusive regime with the hydrodynamics frozen, that is, the limit of large opacity, long timescale, and u = 0. We consider the P_{1} closure relation
Following Berthon & Turpault (2011), we introduce a rescaling parameter ε to write the time and opacity as t̃ = εt and , respectively. The radiative energy, radiative flux, and gas temperature are expanded with ε; for example E_{r} = E_{r, 0} + εE_{r, 1} + 𝒪(ε^{2}). System (7) becomes
By expanding Eqs. 8a and 8b at zero order, we have
Expanding Eq. (8b) at first order leads to
Finally, expanding the sum of Eqs. (8a) and (8c) at second order gives
In the diffusive limit, the total energy E_{r} + ρc_{v}T_{g} at zero order obeys the diffusion equation given by Eq. (11). Similar development of the radiation hydrodynamics case is presented in Appendix A. Section 3 presents our design of an asymptotic preserving scheme, that is, a numerical scheme that will degenerate to the discretization of Eq. (11) in the diffusive regime.
3. Numerical scheme and algorithm
3.1. Radiation transport in a static fluid
Let us first introduce some notations: we use Δx to denote the step along the xdirection. here, Δt is the time interval between the current time t^{n} and t^{n + 1}. We write x_{i} the center of the cell i and the interface between the cell i and the cell i + 1. We use the notation to represent the averaged quantity associated with the field u at time t^{n} in the cell i (finite volume). Finally, we use to represent the quantity associated with the field u at time t^{n} and at the interface between cells i and i + 1.
The development of the numerical scheme is presented only in the onedimensional case, but its extension to higher dimensions is straightforward. To ease notations, we drop the indices r for all radiative variables.
We present the development of the numerical scheme using a timeimplicit integration, but a similar development can be done with a semiimplicit solver: source terms remain implicit, but the hyperbolic part is timeexplicit.
3.1.1. Hyperbolic system
Following González et al. (2007), we discretize the hyperbolic part of Eq. (3) using a firstorder Godunovtype solver (Toro 2009). From Berthon & Turpault (2011), we also introduce an extra parameter α which is specified in Sect. 3.1.3:
where and are the numerical fluxes given by
with and , where λ_{max} and λ_{min} are the eigenvalues of Eq. (3). From Berthon et al. (2007), we have
with . See Fig. 1 of González et al. (2007) for more details about the structure of the eigenvalues. is a wellchosen discretization of the term σF_{r} in the cell i and at time t^{n + 1}, and is specified in the following section.
3.1.2. Wellbalanced modification of the source term
From Berthon et al. (2015), a wellbalanced scheme catches the correct steady regime. The steady state, if it exists, is given by
Equation (15c) is discretized by . An obvious choice for is
However, using this formulation, Eq. (15c) is discretized as
with . The radiative flux remains cellcentered and is equal to the divergence of radiative pressure, which itself is defined at the interfaces of the cells. This can create some spurious flux at the interface when looking for a steady state with a constant flux in the box (see Sect. 5.2). Inspired by wellbalanced schemes for hydrodynamics (e.g., Padioleau et al. 2019) which preserve the hydrostatic balance between the pressure forces and the gravitational force (and the similarity between this balance and that between radiative pressure and the radiative flux source term in Eq. (15c)), we choose to use an average of a face discretization of the radiative flux source term:
with
One way to interpret this equation is to remember that
Equation (16) is obtained with the rectangle rule for numerical integration of Eq. (20):
whereas Eq. (18) is given by the trapezoidal rule:
To have
in the whole domain, we also impose it as boundary condition:
where is the radiative pressure given by the boundary condition. In that way, the radiative flux is centered at the interfaces of the cells, as is the divergence of radiative pressure. A von Neumann stability analysis of the modified scheme is presented in Appendix B.
3.1.3. Asymptotic preserving scheme
We still have to specify our choice for . Here, corresponds to a classic HLL scheme. However, the solution given by an asymptotic preserving scheme has to approximate the solution of Eq. (11) as soon as the asymptotic regime is reached, that is, it must incorporate large opacities and long timescales. Unfortunately, a standard HLL scheme does not have this property (see Sect. 5.1). To tackle this issue and obtain an asymptotic preserving scheme, we choose
with . The derivation of Eq. (25) is presented in Appendix C. Other choices can be made (see e.g., Berthon & Turpault 2011). If goes to zero, goes to 1, and we recover a standard HLL scheme. Considering the diffusive limit, we prove that the scheme is asymptotic preserving in Appendix C. We show that
which is a standard discretization of Eqs. (9)–(11).
Unfortunately, we cannot prove that this scheme will preserve the admissible states f < 1 and, indeed, numerical experiments with this scheme have shown that we can get f > 1 when we are close to the freestreaming regime. In these situations, we can either enforce f < 1 (Sect. 5.6) or come back to a centered discretization of the source term (Sect. 5.4). Furthermore, the development of the asymptotic preserving scheme with the wellbalanced modification of the source term is only done in the case of a static fluid. Using the asymptotic correction, Equation (25) is only the first step to obtaining an asymptotic preserving scheme in the case of a moving fluid (see Sect. 6.2).
3.2. Coupling to hydrodynamics
Following González et al. (2007), the resolution of the whole of the system shown in Eq. (5) describing radiation hydrodynamics is split into three steps:
1. Update the hydrodynamics quantities (Eqs. (5a)–(5c) without the terms of energy and momentum exchange) using the wellbalanced and allregime solver developed in Padioleau et al. (2019);
2. Update the radiative quantities and gas temperature (Eqs. 3 and (4)) using the solver developed in Sect. 3.1. During this step, the hydrodynamics quantities are frozen;
3. Add source terms S_{Er}(u) and S_{Fr}(u). For simplicity, all source terms which depend on the velocity are treated explicitly. The term in Eqs. (5b) and (5e) is discretized using the wellbalanced scheme proposed in Sect. 3.1.2. All the other terms remain cellcentered.
This splitting allows us to reduce the number of equations solved implicitly, making the method more efficient.
3.3. Algorithm for nonlinear implicit solver
The timestep given by the CFL condition is much smaller for radiation than for hydrodynamics. Indeed, for the radiation, it is limited by the speed of light, whereas it is limited by the speed of sound in the fluid for the hydrodynamics. Because we turn to radiation hydrodynamics, we consider a long timescale for the radiative transfer. Therefore, we use a timeimplicit integration for the radiative transfer.
3.3.1. Newton–Raphson method and linear solver
Because of the Eddington tensor, the eigenvalues in the numerical fluxes, and the factor, the system is nonlinear and is solved using a NewtonRaphson method. At each iteration, we have to solve a linear system. Because the system is large ((2 + d)N unknowns, where d is the number of dimensions and N the total number of cells) and sparse, it cannot be solved using a direct method. Because of the numerical fluxes, the matrix is not symmetric, and we use the biconjugate gradient stabilized method (Van der Vorst 1992).
3.3.2. Preconditioner
Using large timesteps for the radiative transfer, the matrix is illconditioned and iterative methods might not converge. One way to deal with this issue is to use a preconditioner. Instead of solving the original linear system Ax = b, we solve the right preconditioned system AK^{−1}Kx = b via solving AK^{−1}y = b to compute y and then Kx = y. As long as the matrix K is invertible, this gives the same solution as the original system. If K is well chosen, the condition number of the matrix AK^{−1} is lower than that of A. For example, the Jacobi or diagonal preconditioner is given by
See Saad (2003) for more details.
Numerous preconditioners have been developed over the years; some of them perform well and some do not, depending on the problem considered. Among different preconditioners, we explore the algebraic multigrid (AMG) technique. We present this method in the following section, and we compare it with other preconditioners in Sect. 4.4: a standard ILU(k) factorization (Saad 2003), a slightly modified variant of the standard ILU factorization (Saad 1994), an additive Schwarz domain decomposition (Cai & Sarkis 1999), and a classical relaxation method, Jacobi with damping (Saad 2003).
3.3.3. Algebraic multigrid preconditioner
The AMG methods were first developed as linear solvers for symmetric positive definite matrices arising from the discretization of scalar elliptic PDEs. For such a matrix, classical iterative methods are able to efficiently compute the high frequencies of the solution, but lack efficiency when computing its low frequencies. However, the computation is easier on a coarser grid with fewer unknowns. The idea of the multigrid solver is to build a coarser grid, solve the problem on this coarse grid, and then finally interpolate the solution on the fine grid. We can subsequently define a restriction operator R which transfers vectors from the fine grid to the coarse grid and an interpolation operator P used to return to the finer grid. P and R are nonsquared matrices. From Saad (2003), the main steps of the method are as follows:
1. Presmoothing: a few iterations of a simple method such as Jacobi or an incomplete factorization are performed to get the value .
2. The residual is projected over the coarse grid with the restriction operator R, to get the residual equation .
3. This equation is solved, possibly with a direct solver.
4. The solution y is interpolated over the fine grid with the interpolation operator P and then .
5. Postsmoothing: a few iterations of a simple method are again performed to get the solution .
The solution is used as a preconditioner result. If the coarse grid has too many unknowns to be solved directly, this process is applied recursively: the coarse grid becomes the fine grid and a coarser grid is built. Therefore, we have a hierarchy of grids. With a geometric multigrid solver, the restriction and interpolation operators are determined by the mesh, whereas with an algebraic multigrid solver they are automatically generated using data from the matrix.
Before describing our numerical tests, we present details of our implementation and parallelization in the following section.
4. Implementation and parallelization
We used the code ARKRT^{1} for implementation, which is a fork of the code ARK developed in Padioleau et al. (2019). The hydrodynamics and gravity part of our solver is similar to ARK and is solved with a wellbalanced and allregime solver. Because the solver for the radiative transfer equations is timeimplicit, we have to solve large, sparse linear systems. This is done using the library Trilinos, described in the following section.
4.1. Linear algebra
We use the second generation of packages of the Trilinos framework (Heroux et al. 2005), namely Kokkos (Edwards et al. 2014) for shared memory computation, Tpetra (Baker & Heroux 2012) for distributed vectors and matrices, Belos (Bavier et al. 2012) for linear solvers, Ifpack2 (Prokopenko et al. 2016) for classical preconditioners, and MueLu (BergerVergiat et al. 2019a,b) for the AMG preconditioner. Let us first focus on the package Kokkos.
4.2. Shared memory computation
As new architectures have increasing numbers of cores, the distributed memory model is insufficient to take advantage of all the computational power available. Therefore, we need to use a shared memory model inside the nodes. Furthermore, computational nodes are increasingly heterogeneous; for example, multicore, manycore, or accelerators such as GPGPUs. Each architecture requires its own interface, such as OpenMP or C++11 threads for multicore and manycore processors and CUDA or OpenACC for NVIDIA GPUs. This raises the problem of portability and performance portability: many HPC codes are optimized for some specific architectures, and so running the code on a different architecture will result in poor performance.
The library Kokkos (Edwards et al. 2014) tackles this issue. The user has a unique code which can be compiled with different shared memory models such as OpenMP or CUDA. Kokkos is based on different abstractions, like execution spaces (where a function is executed), memory spaces (where the data are stored), and execution policies (how the function is executed). Also, Kokkos provides execution patterns such as parallel loops and multidimensional arrays, and the storage is optimized according to the architecture.
4.3. Implementation
For the hydrodynamics step, Kokkos is used as an independent library for shared memory computation. Communications between the nodes are handled by the Message Passing Interface (MPI) programming model through a regular domain decomposition. Following Kestener (2017), inside each node, the domains are endowed with ghost cells used to implement physical boundary conditions, but also to contain values from neighbor domains. The code is organized with computational kernels, and each kernel is a C++ functor; see Padioleau et al. (2019) for more details.
The second step is the timeimplicit solver for radiative transfer. The values of the matrix and the righthand side of the linear system have to be updated at each iteration of the Newton–Raphson method, but the graph of the matrix does not change. Using Trilinos, the rows of the matrix are distributed across the MPI processes, each of them is associated with a unique global index. Each global index has a matching local index on the owning process.
Trilinos provides several methods to update the coefficients of the matrix. One of them uses only global indices, which is recommended for Trilinos. However, this function can only be called by the host. This has two main consequences: First, because all the rows of the matrix have to be updated, we use a sequential loop over the rows of the matrix. Second, in the case where we are using GPGPUs, we have to update the matrix with data coming from the device. Therefore, we have to transfer some data from the device to the host, update the matrix with these data, and then transfer the matrix from the host to the device; this last step is done implicitly by Trilinos. This process increases the computational cost.
Another way is to use local indices. The package KokkosKernels, which is part of the Kokkos ecosystem, provides several ways to update the coefficients of the matrix through a kernel. This allows the use of a parallel loop (via Kokkos::parallel_for) and we avoid data transfers between the host and the device. For performance reasons, we use local indices.
4.4. Performances
Thanks to Trilinos, we can use many preconditioners. Unfortunately, they do not behave in the same way when the size of the system increases. All tests are performed on Poincare, our local cluster at Maison de la Simulation. Each node consists of two Sandy Bridge E52670 at 2.60 GHz (2 × 8 cores, 32 Go RAM) processors. We use a hybrid configuration MPI/OpenMP, with one MPI process per socket to avoid NUMA effects.
We first performed a weak scaling test, where we consider a twodimensional case with periodic boundary conditions and a hot source located at the center of each domain. Each MPI process provides a piece of the whole domain of 1500^{2} cells, and therefore the size of the system increases with the number of MPI tasks. The resolution is close to our target for threedimensional simulations. As shown in the left panel of Fig. 1, for all preconditioners, the number of iterations remains constant, at around 10 iterations for the AMG preconditioner, around 20 iterations for both incomplete factorizations and the additive Schwarz domain decomposition, and around 250 iterations for the relaxation. As shown in the right panel, the speedup reaches a plateau of 80% to 90% of maximum performance depending on the preconditioner.
Fig. 1.
Weak scaling test. Each MPI process treats 1500^{2} cells. We tested different preconditioners: Jacobi with damping (Relaxation), AMG, standard ILU(k) factorization (RILUK), a variant of the standard ILU factorization (ILUT), and additive Schwarz domain decomposition (Schwarz). Left panel: number of iterations to solve the linear system as a function of the number of cells. Right panel: speedup as a function of the number of MPI processes. 

Open with DEXTER 
Figure 2 shows the number of iterations and the speedup as a function of the number of MPI processes for a strong scaling test. We now consider a Marshak wave propagation in the diffusive limit. The global resolution remains constant as the number of processes increases, and is set to 2048^{2} cells. Because the global resolution is constant, one can expect the number of iterations to also remain constant when the number of MPI processes increases. However, using the AMG (orange curve) and the incomplete factorizations (RILUK, ILUT; green curve in left panel), when four MPI processes or more are used, the number of iterations is twice the number of iterations reached with one or two MPI processes. Therefore, the computational time is the same when using two or four MPI processes. Furthermore, all tested preconditioners and the linear solver require several communications per iteration, which likely becomes the main cost when the local resolution decreases.
Fig. 2.
Strong scaling test. The global resolution is 2048^{2} cells. The same preconditioners as those listed in Fig. 1 were tested. Left panel: number of iterations to solve the linear system as a function of the number of MPI processes. Right panel: speedup as a function of the number of MPI processes. 

Open with DEXTER 
Thanks to Kokkos, we can use exactly the same code on different architectures, namely Sandy Bridge processors and NVIDIA GPGPUs (e.g., K80). Unfortunately, the memory required by the AMG preconditioner with a 1500^{2} simulation is greater than the memory available on a K80 GPU. For the next tests, we use a lower resolution of 1000^{2} cells. Table 1 summarizes the computational time for a fixed problem with different schemes and different architectures. As the explicit solver is restricted by a CFL condition, it requires several thousand timesteps whereas the implicit solver only needs a few timesteps to reach the same final time. Updating the matrix in parallel allows for a 25% reduction in computational time required. On a CPU, the implicit solver is around 160 times faster than the explicit solver, whereas on a GPU it is only 11 times faster.
Computational time with both explicit and implicit solvers on CPU and GPU.
Figure 3 compares the computational time with different preconditioners, on both a CPU and a GPU. Except for the implicit solver using the AMG preconditioner, all solvers are faster on a GPU than on a CPU; up to three times faster for the relaxation preconditioner. Part of the AMG algorithm probably remains sequential. On a CPU, the AMG preconditioner is faster than the relaxation preconditioner. The other preconditioners are slower, by up to a factor eight between the relaxation and the additive Schwarz domain decomposition on a GPU.
Fig. 3.
Computational time for the implicit solver with different preconditioners (as listed for Fig. 1) on different architectures (Sandy Bridge CPU and K80 NVIDIA GPU). The resolution is 1000^{2} cells. 

Open with DEXTER 
Figure 4 compares the memory consumption with different preconditioners. Using GPGPU, most of the data are located on the device, but Trilinos still allocates some memory, namely between 0.125 GB and 0.208 GB, on the host, unlike the explicit solver. Using the relaxation as a preconditioner, the amount of memory allocated is lower than with the other preconditioners (7.3 GB for the relaxation against 11.5 GB for the AMG).
Fig. 4.
Memory consumption for the implicit solver with different preconditioners (as listed in Fig. 1) on different architectures (Sandy Bridge CPU and K80 NVIDIA GPU). The resolution is 1000^{2} cells. 

Open with DEXTER 
Choosing a wellsuited preconditioner can be challenging and problem dependent. Once the preconditioner is chosen, it depends on many parameters. For example, Trilinos allows the user to choose the damping factor ω for the relaxation method or the smoother and the coarse solver for the AMG. Performances and stability can largely depend on these choices. For example, the relaxation method seems to be well suited for this problem with low computational time, and memory consumption, but in many other test cases, the linear solver will not converge. The AMG preconditioner performs well on a CPU but is less efficient on a GPU. Both incomplete factorizations and the additive Schwarz domain decomposition are slightly less efficient than the AMG preconditioner. Overall, we find the AMG preconditioner and the relaxation method to produce a good compromise between stability and performance.
The performances we obtain thanks to Kokkos and Trilinos are encouraging for the study of astrophysical problems. The timestep given by the hydrodynamics can be written as . Using a relaxation as a preconditioner, we need CFL ≥ 50 on a CPU and CFL ≥ 100 on a GPU to save computational time, whereas when using an incomplete factorization, we need CFL ≥ 250 on a CPU and CFL ≥ 1000 on a GPU. We need a larger CFL number on a GPU because the explicit solver is more efficient on a GPU than on a CPU.
In the following section, we use several numerical tests to show that the scheme developed in Sect. 3 is well suited for the study of radiation hydrodynamics problems.
5. Numerical results
We performed a series of verification tests to validate different properties of the scheme: the asymptotic correction with a Marshak wave, the wellbalanced property to reach a steady state with a jump of opacity, the properties of the M_{1} model with a beam test and a shadow test, and the coupling to the hydrodynamics with radiative shocks. We also present a physical application, investigating the stability of the ionization front in the dense core of a H II region. To ease notations, we define the radiative temperature as .
5.1. Marshak wave
From Mihalas & Mihalas (1984), a Marshak wave is the propagation of hot radiation into a cold medium. We consider a onedimensional case in the diffusive limit in order to test the asymptotic preserving scheme developed in Sect. 3.1.3.
The length of the computational domain is 1 cm; it is discretized with 400 points. Initially, the medium is at equilibrium with the radiation: T_{0} = T_{r} = 300 K, the initial radiative flux is F_{r} = 0. We consider a perfect gas with . The hydrodynamics is frozen. The density is constant, such that ρc_{v} = 1 J K^{−1} cm^{−3}, and the opacity is also constant, with σ = 10 000 cm^{−1}, and therefore σΔx = 25. At time t = 0, a source is lit at the left boundary with T_{r} = 1000 K.
The results are shown in Fig. 5 at time t_{f} = 2 × 10^{−4} s. We compare different solutions: a reference solution, the solution given by our asymptotic preserving scheme, and the solution given by a standard scheme. The reference solution is given by a standard discretization of Eq. (11). From Audit et al. (2002), when a scheme that is not asymptotic preserving is used with σΔx ≫ 1, the grid does not sample the mean free path of the photons and the solution is dominated by numerical diffusion. The relative L^{2} error between the reference solution and the solution with given by Eq. (25) is 1.1%, whereas with the standard HLL scheme the relative L^{2} error is 84%. Using the asymptotic correction, we recover the correct behavior in the asymptotic limit.
Fig. 5.
Marshak wave simulation. This figure shows a snapshot of the gas temperature at time t_{f} = 2 × 10^{−4} s, with and without the asymptotic correction and the reference solution. Spatial resolution is n = 400 and the opacity is σ = 10 000 cm^{−1}. 

Open with DEXTER 
5.2. Steady state with a jump of opacity
In the previous case, the opacity is constant, we now consider a test with a jump of opacity, still in the onedimensional case. We use this test to highlight the need for a wellbalanced modification of the source term.
The length of the computational domain is 1 cm; it is discretized with 100 points. Initially, the medium is at equilibrium with the radiation: T_{0} = T_{r} = 300 K, the initial radiative flux is F_{r} = 0. The opacity σ is now a function of space:
At time t = 0, a source is lit at the left boundary with T_{r} = 1000 K.
Figure 6 shows the radiative flux at time t_{f} = 10^{−3} s. From Eq. (15b), when the steady state is reached, we expect the radiative flux to be constant in the box. Using a standard discretization of the source term, such as Eq. (16), a spurious peak located at the discontinuity of opacity is observed (orange curve). The value taken by the radiative flux is more than 20 times the expected value. This seems to be caused by a numerical instability. This can result in f > 1 during the iterations of our Newton–Raphson implicit scheme, which is not physically admissible. However, using the wellbalanced modification of the source term proposed by Eq. (18) (blue curve), the spurious peak no longer appears and the constant steady state is reached.
Fig. 6.
Simulation of a steady state with a jump of opacity. The opacity is piecewise constant, and a jump is located at x = 0.5 cm (gray line). This figure shows a snapshot of the radiative flux at time t_{f} = 10^{−3} s. 

Open with DEXTER 
Using the standard discretization of the source term in Eq. (16), one can show that the numerical scheme is unconditionally stable in that the error between the numerical solution and the exact solution goes to zero as Δx and Δt go to zero. The spurious peak seems to be due to a lack of precision in the integration of the source term. Using Eq. (18), the source term is defined at the interfaces of the cells and balances the divergence of radiative pressure, which is also defined at the interfaces.
5.3. Beam
We now perform the same twodimensional test as in Richling et al. (2001), González et al. (2007). The domain [−1, 1] × [−1, 1] is discretized with 128 × 128 cells. The initial temperature is T_{0} = T_{r} = 300 K, the initial radiative flux is F_{r} = 0, and the opacity is σ = 0. At time t = 0, a beam with T = T_{r} = 1000 K is introduced with an angle of 45°. The beam is located at x = −1 and y ∈ [−0.875, −0.75]. Because we are in the freestreaming regime, we cannot use large timesteps. For performance reasons, we use the semiimplicit scheme.
Because there is no opacity, the beam propagates in the vacuum, and we expect it to cross the box without dispersion. The direction of the beam is not along the mesh axis; we use this test to quantify the numerical diffusion. Figure 7 shows the radiative energy at steady state, with the eigenvalues fixed to ±c and with computed eigenvalues. Because there is no opacity, neither the asymptotic correction nor the wellbalanced source term affect the result, and we recover the same result as in González et al. (2007). Figure 8 shows the horizontal cut at the middle height. The beam introduced at the boundary is sampled over 8 cells. Ideally, without any numerical diffusion, we would expect the width of the beam to remain exactly 8 cells. With the computed eigenvalues, we can keep the numerical diffusion under control. Using fixed eigenvalues, the width of the beam at middle height is approximately 30 cells, whereas it is only 24 cells with calculated eigenvalues.
Fig. 7.
Beam simulation, showing radiative energy. The eigenvalues are fixed to ±c (upper panel) or calculated with Eq. (14) (lower panel). 

Open with DEXTER 
Fig. 8.
Beam simulation, showing a horizontal cut in Fig. 7 at the middle height. 

Open with DEXTER 
The main difference in this test between ARKRT and HERACLES (González et al. 2007) is the computation of the eigenvalues. We use the exact eigenvalues given by Eq. (14) from Berthon et al. (2007), whereas in González et al. (2007), to save computational time, the eigenvalues are computed once at the beginning of the simulation and then interpolated. However, this approximation does not impact the result.
5.4. Shadow
Let us now consider a twodimensional test with source terms. Following Hayes & Norman (2003), González et al. (2007), we consider a shadow test. The computational domain is a cylinder of length L = 1 cm and radius R = 0.12 cm, and is discretized with 280 × 80 cells. A spheroid clump is located at the center of the box, on the symmetric axis: (z_{c}, r_{c})=(0.5, 0). The extension of the clump is (z_{0}, r_{0}) = (0.1, 0.06). Initially, the medium is at equilibrium with the radiation, with T_{0} = T_{r} = 290 K. We consider a homogeneous gas, with ρ_{0} = 1 g cm^{−3}, except for the clump with density ρ_{1} = 100ρ_{0}. The boundary of the clump is smoothed: with . The opacity in the medium is with σ_{0} = 0.1 cm^{−1}. At time t = 0, a source is lighted at the left boundary with T_{r} = 1740 K and the reduced flux is set to f = 1. Because f is close to 1 in the freestreaming regime, we encounter f > 1 in the simulation. To tackle this issue, we use the nonwellbalanced scheme: the radiative flux source term is discretized using Eq. (16). Because we are in the freestreaming regime, we cannot use large timesteps. For performance reasons, we use the semiimplicit scheme. To recover the same result as in González et al. (2007), we use and , where λ_{max} and λ_{min} are given by Eq. (14).
Figure 9 shows the radiative temperature at the final time t_{f} = 10^{−10} s with different closure relations: the P_{1} model, the M_{1} model with fixed eigenvalues, and the M_{1} model with computed eigenvalues. Because of the high opacity in the clump, the light does not cross it and we expect the shadow behind it to remain stable.
Fig. 9.
Shadow simulation, showing snapshots of the radiative temperature at time t_{f} = 10^{−10} s with different closure relations: P_{1} model (upper panel), M_{1} model with fixed eigenvalues (middle panel), and M_{1} model with computed eigenvalues (lower panel). 

Open with DEXTER 
As in Hayes & Norman (2003), González et al. (2007), we plot the radial profile of the radiative temperature at the right boundary (Fig. 10). Using the P_{1} model, the radiative pressure is isotropic, and therefore the photons go around the obstacle immediately, heating the whole domain. Using the M_{1} closure relation, the shadow is better preserved, and the temperature behind the obstacle remains at its initial value, 290 K. As the opacity remains rather low outside of the clump and the light has not crossed the obstacle, the asymptotic correction has no impact on the result. Because the boundary of the clump is smoothed, the transition between the optically thick and thin medium is less sharp than in Sect. 5.2 and the wellbalanced modification of the source term is not necessary.
Fig. 10.
Shadow simulation, showing the radial profiles of the radiative temperature at time t_{f} = 10^{−10} s with different closure relations: P_{1} model, M_{1} model with fixed eigenvalues, and M_{1} model with computed eigenvalues. 

Open with DEXTER 
5.5. Radiative shocks
We finally consider radiative shocks: the gas and the radiation exchange energy and momentum. Following Ensman (1994), Hayes & Norman (2003), González et al. (2007), we consider a onedimensional homogeneous medium, with ρ = 7.78 × 10^{−10} g cm^{−3} and σ = 3.1 × 10^{−10} cm^{−1}. We consider a perfect gas with an adiabatic coefficient and a mean molecular weight μ = 1. The length of the domain is 7 × 10^{10} cm. It is discretized with 400 cells. The initial temperature at the left boundary is set to 10 K and is increased by 0.25 K per cell. Initially, the radiation is at equilibrium with the gas. The left boundary condition is reflective, and the initial velocity of the fluid is set to u_{0}. According to the value of u_{0}, the shock will be subcritical or supercritical; see González et al. (2007) for more details. To compare our results with those of Ensman (1994), Hayes & Norman (2003), González et al. (2007), we plot the temperature as a function of x_{i} = x − u_{0}t.
5.5.1. Subcritical shock
We first consider a subcritical shock, the initial velocity is set to u_{0} = −6 km s^{−1}. Figure 11 shows the gas temperature, the radiative temperature, and the reduced flux at three different times: 1.7 × 10^{4} s, 2.8 × 10^{4} s, and 3.8 × 10^{4} s. As expected, the gas and the radiation are not at equilibrium, before nor after the shock. The gas temperature reaches 1135 K, as in González et al. (2007), whereas it is only 850 K in Ensman (1994).
Fig. 11.
Subcritical shock simulation, showing snapshots of gas temperature, radiative temperature, and reduced flux at different times: 1.7 × 10^{4} s, 2.8 × 10^{4} s, and 3.8 × 10^{4} s. 

Open with DEXTER 
5.5.2. Supercritical shock
We now consider a supercritical shock, where the initial velocity is set to u_{0} = −20 km s^{−1}. Figure 12 shows the gas temperature, the radiative temperature, and the reduced flux at three different times: 4 × 10^{3} s, 7.5 × 10^{3} s, and 1.3 × 10^{4} s. As in González et al. (2007), the radiative temperature is the same as the matter temperature on both sides of the shock. The gas and the radiation are therefore at equilibrium. The radiative precursor is larger than that of the subcritical shock, as intended, and the temperature reaches 5000 K, as in Ensman (1994). We also recover the Zel’dovich spike.
Fig. 12.
Supercritical shock simulation, showing snapshots of gas temperature, radiative temperature, and reduced flux at different times: 4.0 × 10^{3} s, 7.5 × 10^{3} s, and 1.3 × 10^{4} s. 

Open with DEXTER 
5.6. Expansion of H II region
Now that we have confirmed the good behavior of the numerical scheme with both the asymptotic preserving and the wellbalanced properties, we can apply it to a physical situation: the propagation of the ionization front in a massive prestellar dense core.
5.6.1. Model
We consider the early stage of the development of an H II region in a massive prestellar dense core (Churchwell 2002). We focus on a region of the dense core at about 100 AU from the massive young stellar object (YSO). This region has been heated by the YSO during the premain sequence phase. The temperature reached at this location by infrared heating is of the order of 1000 K and the transport of energy in this region can be dominated by convection. We have inferred the convective state of this region by computing thermal and adiabatic gradients based on observations of Herpin et al. (2009) (Fig. 7). Highenergy photons emitted by the YSO when entering the main sequence start to ionize the surrounding gas. This triggers the propagation of an ionization front in a convective medium, and we are interested in the stability of such a front perturbed by the preexisting convective motions.
The interaction of the ionizing photons with the gas is described by Eq. (5). The only photons able to ionize the gas are emitted by the YSO, that is, there is no local source of ionizing photons. Following Tremblin (2012), we need to modify this model to take into account photochemistry and thermal balance. We define the fraction of ionization X = n_{H+}/n_{H} where n_{H} = n_{H+} + n_{H0}, n_{H+} is the number of ionized atoms and n_{H0} is the number of cold atoms. The evolution of the number of ionized atoms is simply the number of incoming photons that interact with the gas minus the number of ionized atoms that recombine (on the spot approximation, see Lesaffre 2002). Therefore,
where F_{γ} is the number of incoming photons per unit of surface and time, σ_{γ} is the average crosssection at the temperature of the star, and β gives the recombination rate: β = 2 × 10^{−10}T^{−0.75} with T the temperature of thermodynamic equilibrium (Black 1981).
The thermal balance is the difference between the heating rate and the cooling rate. The extra energy of the absorbed photons is converted into kinetic energy of electrons; this is the only source of heating during the ionization, and therefore the heating rate is given by (1 − X)n_{H}F_{γ}σ_{γ}e_{γ}. In this simplified model, the equilibrium temperature is obtained from the balance between the heating from the ionization and the cooling from the recombination. We do not consider any other effects, such as metal cooling. Therefore, we take e_{γ} = 1 eV (Lesaffre 2002) to recover the observed temperature around 1000 K. From Tremblin (2012), the cooling rate is given by . We also add a term of Newtonian forcing, , to trigger convection. T_{forcing} is the equilibrium temperature profile, depending on space, and τ_{forcing} is the relaxation timescale. The gas temperature will relax toward the equilibrium temperature profile T_{forcing}.
By writing cE_{r} = F_{γ}e_{γ}, ρ = n_{H}m_{H} and σ = σ_{γ}n_{H}, we finally have to solve the following system:
In this test, we use the M_{1} solver with the asymptotic correction presented in Sect. 3.1.3, but we do not use the wellbalanced discretization of the source term because of stability issues that we discuss in Sect. 6.1.
5.6.2. Setup
We consider a square domain of 5 AU on all sides. We use a setup similar to Padioleau et al. (2019) for compressible convection simulations. The temperature is set to 500 K and 1000 K at the top and bottom of the box, respectively. The initial temperature is a linear interpolation between the top and the bottom of the box. It is also the forcing temperature profile T_{forcing}. We take τ_{forcing} = 10^{7} s. These parameters are chosen to trigger the initial convective motions. We also set the pressure at the bottom: 10^{−3} dyne ⋅ cm^{−2} (Herpin et al. 2009). The density and the pressure are linked by the ideal gas law: , where μ is the mean molecular weight. The nonionized medium is made of hydrogen, with μ_{1} = 1. When the medium is fully ionized, it is made of atomic nucleus and electrons, and therefore has twice as many particles for the same mass. Because the distribution of nucleus and electrons is homogeneous, the mean molecular weight is μ_{2} = 0.5. When the medium is partially ionized, we take μ as the mean of the previous values balanced by the fraction of ionization, that is, μ = (1 − X)μ_{1} + Xμ_{2}. The density is initialized with the recursive formula , which is the discrete version of the hydrostatic balance ∇p = −ρg.
We impose Neumann boundary conditions for the temperature. The pressure and density are imposed by an extrapolation of the hydrostatic balance. Because the hydrodynamics solver is wellbalanced for the gravity, this configuration will remain static, even if the initial condition is unstable. The hydrostatic equilibrium is destabilized with a velocity mode perturbation of the form
with c_{s} the speed of sound, x_{mid} = y_{mid} = 2.5 AU and L_{x} = L_{y} = 5 AU. Without any interaction with the ionizing photons, the convective motions are stationary.
The opacity is set to with σ_{γ} = 6 × 10^{−18} cm^{2} (Lesaffre 2002). The radiative energy and flux are set to zero and the medium is not ionized (X = 0). We initialize the hydrodynamics variables with the steady state described above. At time t = 0, the bottom boundary of the region is ionized: the reduced flux is set to a value of 1 and the radiative energy is set to with F_{*} = 3 × 10^{17} cm^{−2} s^{−1} in the boundary. The boundary conditions for the hydrodynamics variables remain unchanged.
5.6.3. Results
As the initial condition is such that E_{r} is close to zero, this can easily create some spurious values such that f > 1. This is clearly a numerical artifact induced by the very low value of the radiative energy in regions where no ionizing photons are present. Even with a centered discretization of the radiative flux source term and without the asymptotic correction, we still encounter f > 1 during the simulation. Because of this problem, we impose f = 1 in the computation of the Eddington tensor in the cells where f > 1.
Figure 13 shows the evolution of the position of the ionization front as a function of time. With and without the initial convective rolls, the position of the ionization front oscillates around an equilibrium position, between 0.3 AU and 0.4 AU. The oscillations around the equilibrium are expected and have been observed with simpler models (Tremblin et al. 2012).
Fig. 13.
Evolution of the position of the ionization front as a function of time, with and without the initial velocity perturbation. 

Open with DEXTER 
Figures 14 and 15 show the ionization front at time t = 6 × 10^{8} s and at the final time t_{f} = 10^{10} s. With and without the initial convective rolls, a numerical noise appears as a consequence of the long timescales. Because of the numerical noise, some lack of symmetry can appear, such as is seen in the left panel of Fig. 14. The fraction of ionization, which is always between 0 and 1, reaches values between 10^{−12} and 10^{−6}. The effect of the preconditioner and the MPI domain decomposition is discussed in Appendix D. However, the numerical noise does not affect the position of the ionization front.
Fig. 14.
Snapshots of the fraction of ionization and the velocity field at time t = 6 × 10^{8} s without the initial velocity perturbation (left panel) and with it (right panel). 

Open with DEXTER 
Fig. 15.
Snapshots of the fraction of ionization and the velocity field at the final time t_{f} = 10^{10} s without the initial velocity perturbation (left panel) and with it (right panel). 

Open with DEXTER 
The stability of the ionization front is an issue that has been discussed for a long time in the literature (Mizuta et al. 2008). For example, 3D simulations of the expansion of a spherical ionization front in 3D Cartesian grids have shown instabilities either on the axis of the grid or in the diagonal depending on the numerical scheme (see Fig. A3 Bisbas et al. 2015). The dependence of the instability on the grid casts doubt on the possibility of a physical regime. Our test case shows that even with convective motions of large amplitude, the ionization front remains stable.
6. Discussion and conclusion
We present a new radiation hydrodynamics code. The radiative transfer is described with a twomoment model and uses the M_{1} closure relation. We first discuss some limitations of our work before presenting our conclusions.
6.1. Wellbalanced discretization of the source term
In Sect. 3.1.2 we propose a wellbalanced discretization of the source term on the radiative flux equation. This discretization allows us to properly capture the steady state with constant flux and with a discontinuity of opacity. However, this discretization can lead to spurious oscillations in the radiative flux, a problem that we have encountered in the test case for the expansion of H II regions. Although we have changed the discretization of the source term to achieve a wellbalanced property, our integration of the hyperbolic part on the source term is still split into two steps. Such a splitting strategy might be unstable if the source term is not taken into account in the hyperbolic part. A possible solution to this problem would be to incorporate the source term in a Lagrangeprojectionlike scheme such as Buet & Despres (2008). Such a strategy might be necessary to treat the radiation hydrodynamics problem of cloud interfaces.
6.2. Asymptotic limit for radiation hydrodynamics
In Sect. 3.1.3, we present an asymptotic correction which allows us to capture the asymptotic behavior, whereas the solution given by a standard scheme is dominated by numerical diffusion. The asymptotic correction uses the numerical diffusion to recover the physical one in a static fluid. Nevertheless, this scheme does not capture the asymptotic regime in a moving fluid, as presented in Appendix A. Most of the schemes proposed in the literature do not preserve this asymptotic regime (e.g., González et al. 2007; Berthon & Turpault 2011). The diffusive regime depends on the material velocity; our scheme cannot reach it. A possible solution would be to limit the numerical diffusion with a correction similar to a low Mach correction, as in Chalons et al. (2016), in conjunction with a cellcentered discretization of the source term, as proposed by the Lagrangeprojection scheme of Buet & Despres (2008).
6.3. Conclusion
The model for radiation hydrodynamics proposed in this paper has the correct behavior in both freestreaming and diffusive limits. It is discretized with an asymptotic preserving scheme. This asymptotic correction is important to capture the correct behavior in the diffusive limit and to preserve the freestreaming regime in a static fluid.
We take advantage of the libraries Kokkos and Trilinos to reach highperformance computing and to solve linear systems. This approach allows us to take advantage of different architectures and to use large timesteps for radiative transfer. Using the implicit solver is profitable as soon as the timestep given by the hydrodynamics is 50−100 times larger than the explicit timestep for radiative transfer, depending on the preconditioner and the architecture.
The solver is then coupled with the hydrodynamics code that implements an allregime and wellbalanced solver for hydrodynamics with gravity. The tests performed show that ARKRT is well suited to studying many astrophysical problems. We used this code to study the development of H II regions in massive prestellar dense cores, especially the propagation of the ionization front in the presence of convection. We show that even with the destabilizing effect of convection, the ionization front is strongly stable against perturbations. A linear stability analysis similar to the RayleighTaylor instability but including a source term could provide more insight into this behavior.
Further work will consist of the development of a numerical scheme that preserves the admissible states while keeping the asymptotic preserving and wellbalanced properties. We will then be able to take full advantage of the largest nextgeneration, massively parallel architectures to study atmospheric physics with opacity interfaces, among other things.
Acknowledgments
P. T. acknowledges supports by the European Research Council under Grant Agreement ATMO 757858. The authors thank the referee for his useful comments, especially about the test on the expansion of H II regions. H. B. thanks also Rodolphe Turpault for his welcome in Bordeaux and the helpful discussion.
References
 Anderson, J. D. 1995, Computational Fluid Dynamics : The Basics with Applications, McGrawHill Series in Mechanical Engineering (New York: McGrawHill) [Google Scholar]
 Audit, E., Charrier, P., Chièze, J. P., & Dubroca, B. 2002, ArXiv eprints [arXiv:astroph/0206281] [Google Scholar]
 Baker, C., & Heroux, M. 2012, Sci. Program., 20, 115 [Google Scholar]
 Bavier, E., Hoemmen, M., Rajamanickam, S., & Thornquist, H. 2012, Sci. Program., 20, 241 [Google Scholar]
 BergerVergiat, L., Glusa, C. A., Hu, J. J., et al. 2019a, MueLu Multigrid Framework, http://trilinos.org/packages/muelu [Google Scholar]
 BergerVergiat, L., Glusa, C. A., Hu, J. J., et al. 2019b, MueLu User’s Guide, Tech. Rep. SAND20190537 (Sandia National Laboratories) [Google Scholar]
 Berthon, C., & Turpault, R. 2011, Numer. Methods Partial Differ. Equations, 27, 1396 [CrossRef] [Google Scholar]
 Berthon, C., Charrier, P., & Dubroca, B. 2007, J. Sci. Comput., 31, 347 [CrossRef] [Google Scholar]
 Berthon, C., Crestetto, A., & Foucher, F. 2015, J. Sci. Comput., 67, 618 [CrossRef] [Google Scholar]
 Bisbas, T. G., Haworth, T. J., Williams, R. J. R., et al. 2015, MNRAS, 453, 1324 [NASA ADS] [CrossRef] [Google Scholar]
 Black, J. H. 1981, MNRAS, 197, 553 [NASA ADS] [Google Scholar]
 Buet, C., & Despres, B. 2008, A gas dynamics scheme for a two moments model of radiative transfer [Google Scholar]
 Cai, X.C., & Sarkis, M. 1999, SIAM J. Sci. Comput., 21, 792 [CrossRef] [Google Scholar]
 Chalons, C., Girardin, M., & Kokh, S. 2016, Commun. Comput. Phys., 20, 188 [CrossRef] [MathSciNet] [Google Scholar]
 Chandrasekhar, S. 1960, Radiative Transfer Dover Books on Intermediate and Advanced Mathematics (Dover Publications) [Google Scholar]
 Churchwell, E. 2002, ARA&A, 20, 27 [NASA ADS] [CrossRef] [Google Scholar]
 Dubroca, B., & Feugeas, J. L. 1999, C. R. Acad. Sci.  Ser. I  Math., 329, 915 [Google Scholar]
 Edwards, H. C., Trott, C. R., & Sunderland, D. 2014, J. Parallel Distrib. Comput., 74, 3202 [CrossRef] [PubMed] [Google Scholar]
 Ensman, L. 1994, ApJ, 424, 275 [NASA ADS] [CrossRef] [Google Scholar]
 Gnedin, N. Y., & Abel, T. 2001, New Astron., 6, 437 [NASA ADS] [CrossRef] [Google Scholar]
 González, M. 2006, Theses, Université Paris Sud  Paris XI [Google Scholar]
 González, M., Audit, E., & Huynh, P. 2007, A&A, 464, 429 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Hayes, J. C., & Norman, M. L. 2003, ApJS, 147, 197 [NASA ADS] [CrossRef] [Google Scholar]
 Heroux, M. A., Bartlett, R. A., Howle, V. E., et al. 2005, ACM Trans. Math. Softw., 31, 397 [CrossRef] [Google Scholar]
 Herpin, F., Marseille, M., Wakelam, V., Bontemps, S., & Lis, D. C. 2009, A&A, 504, 853 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Kestener, P. 2017, Implementing HighResolution Fluid Dynamics Solver in a Performance Portable Way with Kokkos, Tech. rep. (GPU Technology Conf. GTC) [Google Scholar]
 Krumholz, M. R., Klein, R. I., McKee, C. F., & Bolstad, J. 2007, ApJ, 667, 626 [NASA ADS] [CrossRef] [Google Scholar]
 Kuiper, R., Klahr, H., Beuther, H., & Henning, T. 2010, ApJ, 722, 1556 [NASA ADS] [CrossRef] [Google Scholar]
 Lesaffre, P. 2002, Theses, Université ParisDiderot  Paris VII, France [Google Scholar]
 Levermore, C. D. 1984, J. Quant. Spectr. Rad. Transf., 31, 149 [NASA ADS] [CrossRef] [Google Scholar]
 Levermore, C. D., & Pomraning, G. C. 1981, ApJ, 248, 321 [Google Scholar]
 Lowrie, R. B., Morel, J. E., & Hittinger, J. A. 1999, ApJ, 521, 432 [NASA ADS] [CrossRef] [Google Scholar]
 MignonRisse, R., González, M., Commerçon, B., & Rosdahl, J. 2020, A&A, 635, A42 [CrossRef] [EDP Sciences] [Google Scholar]
 Mihalas, D., & Mihalas, B. W. 1984, Foundations of Radiation Hydrodynamics (New York: Oxford University Press) [Google Scholar]
 Mizuta, A., Kane, J., Pound, M., et al. 2008, ApJ, 621, 803 [NASA ADS] [CrossRef] [Google Scholar]
 Padioleau, T., Tremblin, P., Audit, E., Kestener, P., & Kokh, S. 2019, ApJ, 875, 128 [CrossRef] [Google Scholar]
 Pichard, T., Alldredge, G., Brull, S., Dubroca, B., & Frank, M. 2016, J. Comput. Theor. Transp., 1 [Google Scholar]
 Prokopenko, A., Siefert, C. M., Hu, J. J., Hoemmen, M., & Klinvex, A. 2016, Ifpack2 User’s Guide 1.0, Tech. Rep. SAND20165338 (Sandia National Labs) [Google Scholar]
 Richling, S., Meinköhn, E., Kryzhevoi, N., & Kanschat, G. 2001, A&A, 380, 776 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Saad, Y. 1994, Numer. Linear Algebra Appl., 1, 387 [CrossRef] [Google Scholar]
 Saad, Y. 2003, Iterative Methods for Sparse Linear Systems, 2nd edn. (Philadelphia, PA, USA: Society for Industrial and Applied Mathematics) [CrossRef] [Google Scholar]
 Spitzer, L. 1978, Physical Processes in the Interstellar Medium (New York: Wiley) [Google Scholar]
 Stiavelli, M. 2009, From First Light to Reionization: The End of the Dark Ages [CrossRef] [Google Scholar]
 Thomas, G. E., & Stamnes, K. 2002, Radiative Transfer in the Atmosphere and Ocean (UK: Cambridge University Press) [Google Scholar]
 Toro, E. 2009, Riemann Solvers and Numerical Methods for Fluid Dynamics, 163 [CrossRef] [Google Scholar]
 Tremblin, P. 2012, PhD Thesis, Univ. Paris Diderot  Paris 7, France [Google Scholar]
 Tremblin, P., Audit, E., Minier, V., Schmidt, W., & Schneider, N. 2012, A&A, 546, A33 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Turpault, R. 2005, J. Quant. Spectrosc. Radiat. Transfer, 94, 357 [CrossRef] [Google Scholar]
 Van der Vorst, H. 1992, SIAM J. Sci. Stat. Comput., 13, 631 [CrossRef] [MathSciNet] [Google Scholar]
Appendix A: Diffusive limit for radiation hydrodynamics
As in Sect. 2.4, we consider the diffusive limit with the P_{1} closure relation. We introduce a rescaling parameter ε to write the time and opacity as t̃ = εt and , respectively. Because the velocity of the fluid is smaller than the speed of light (), we also rescale it as . Let us focus on the equations describing the evolution of the radiative variables:
By expanding Eqs. (A.1a) and (A.1b) at zero order, we have
Expanding Eq. (A.1b) at first order leads to
Finally, looking at the radiative energy and the gas internal energy at second order, source terms cancel each other, and only the divergence of the radiative flux at first order remains, and we have
One can also look at Eq. (A.1a) at second order. Expanding Eq. (A.1a) at second order gives
We recover Eq. (43) of Krumholz et al. (2007). The term is the development at second order of the term κ_{0}(4πB − cE). Because we do not neglect any terms , some coefficients are slightly different. See the discussion in Krumholz et al. (2007) for the importance of the term .
Appendix B: Von Neumann stability analysis for the wellbalanced modification of the source term
For simplicity, we split Eq. 3b into a pure hyperbolic problem ∂_{t}F_{r} + c^{2}∇ · P_{r} = 0 and a source problem ∂_{t}F_{r} = −cσF_{r}. We focus on the onedimensional source problem, with periodic boundary conditions on the domain [0, T] × [0, 1] with T the final time. The following can easily be extended to an arbitrary space interval. Because we use periodic boundary conditions, we can apply the von Neumann stability analysis (see e.g., Anderson 1995), based on the decomposition of the numerical solution into Fourier series. Let us recall that, using the wellbalanced modification of the source term, the source problem is discretized as
with . We define the function F^{n}, piecewise constant, such that
This function is then extended to ℝ by periodicity. F^{n} can now be expanded in a Fourier series:
with
We can define the 2norm of the function F^{n}:
We apply the Fourier transform to Eq. (B.1):
We define the amplification factor A(k) as
and we then have . By induction, we have . The coefficient remains bounded if and only if A(k) ≤ 1. In this case, for all k ∈ ℤ, . Therefore, F^{n + 1}_{2} ≤ F^{n}_{2} ≤ F^{0}_{2} and the scheme is unconditionally stable. We now have to prove that A(k) ≤ 1:
As , we have A(k) ≤ 1.
Appendix C: Numerical scheme in the diffusive limit
We consider the numerical scheme developed in Sect. 3 in the asymptotic regime, with . Following Sect. 2.4, we introduce the rescaling parameter ε to write the time and opacity as and , respectively. Using the P_{1} closure relation, we have and
Radiative variables are expanded, e.g., . Expanding Eqs. (C.1a) and (C.1b) at zero order leads to
At first order for Eq. (C.1b), we have
Using the boundary condition given by Eq. (24), we have
in the whole domain.
Now, we consider the sum of Eqs. (C.1a) and (C.1c) expanded at second order. If , we have
whereas the asymptotic development of a standard discretization of Eq. (11) would be
Therefore, we are looking for such that the term of first order in Eq. (C.5) becomes a term of second order with the expected coefficient of diffusion and the term of second order becomes a term of third order and therefore negligible. In other words, we want the asymptotic development of to be . One way to achieve this is to take
However, in the general case, we do not have . We can then replace Eq. (C.7) by
Unfortunately, in numerical tests with σΔx close to 1, the condition f < 1 is not preserved. Because f is close to 1 in this case, we write
We use because numerical experiments have shown good results using this form. In the diffusion regime, because , we recover Eq. (C.8).
Now that we have the form of , we can check that the proposed scheme is asymptotic preserving. We have
Therefore,
We finally have
Equations (C.2), (C.4), and (C.12) are standard discretization of Eqs. (9)–(11), and therefore this scheme is asymptotic preserving.
Appendix D: Expansion of H II region
In the test case described in Sect. 5.6, some numerical noise appears as a consequence of the long timescales. Let us recall that a timeimplicit scheme is used, with large timesteps for the radiative transfer. At each timestep, the NewtonRaphson method is used and at each iteration of this algorithm an illconditioned linear system is solved using an iterative process. This results in the appearance of some numerical noise.
We performed the same simulations as in Sect. 5.6 with different numbers of MPI processes and different preconditioners. The physical domain is either distributed over 4 × 4 MPI processes (Fig. D.1a,c) or 2 × 2 MPI processes (Fig. D.1b,d). We also tried two preconditioners which allowed us to reach the final time with reasonable computational time: a standard ILU(k) factorization (Fig. D.1a,b) and an additive Schwarz domain decomposition (Fig. D.1c,d).
Fig. D.1.
Snapshots of the fraction of ionization and the velocity field at the final time t_{f} = 10^{10} s without the initial velocity perturbation (left panel) and with it (right panel). The physical domain is distributed across different numbers of MPI processes and different preconditioners have been used. Figure D.1a is the same figure as Fig. 15. (a) 4 × 4 MPI processes, ILU(k) factorization. (b) 2 × 2 MPI processes, ILU(k) factorization. (c) 4 × 4 MPI processes, Schwarz domain decomposition. (d) 2 × 2 MPI processes, Schwarz domain decomposition. 

Open with DEXTER 
Figure D.1 shows snapshots of the fraction of ionization and the velocity field at the final time t_{f} = 10^{10} s. The shape of the small structures produced by the numerical noise varies with the number of MPI processes and the preconditioner. Furthermore, the propagation of the ionization front creates some velocity that also depends on the number of MPI processes and preconditioners. However, the position of the ionization front is not affected by these parameters.
All Tables
All Figures
Fig. 1.
Weak scaling test. Each MPI process treats 1500^{2} cells. We tested different preconditioners: Jacobi with damping (Relaxation), AMG, standard ILU(k) factorization (RILUK), a variant of the standard ILU factorization (ILUT), and additive Schwarz domain decomposition (Schwarz). Left panel: number of iterations to solve the linear system as a function of the number of cells. Right panel: speedup as a function of the number of MPI processes. 

Open with DEXTER  
In the text 
Fig. 2.
Strong scaling test. The global resolution is 2048^{2} cells. The same preconditioners as those listed in Fig. 1 were tested. Left panel: number of iterations to solve the linear system as a function of the number of MPI processes. Right panel: speedup as a function of the number of MPI processes. 

Open with DEXTER  
In the text 
Fig. 3.
Computational time for the implicit solver with different preconditioners (as listed for Fig. 1) on different architectures (Sandy Bridge CPU and K80 NVIDIA GPU). The resolution is 1000^{2} cells. 

Open with DEXTER  
In the text 
Fig. 4.
Memory consumption for the implicit solver with different preconditioners (as listed in Fig. 1) on different architectures (Sandy Bridge CPU and K80 NVIDIA GPU). The resolution is 1000^{2} cells. 

Open with DEXTER  
In the text 
Fig. 5.
Marshak wave simulation. This figure shows a snapshot of the gas temperature at time t_{f} = 2 × 10^{−4} s, with and without the asymptotic correction and the reference solution. Spatial resolution is n = 400 and the opacity is σ = 10 000 cm^{−1}. 

Open with DEXTER  
In the text 
Fig. 6.
Simulation of a steady state with a jump of opacity. The opacity is piecewise constant, and a jump is located at x = 0.5 cm (gray line). This figure shows a snapshot of the radiative flux at time t_{f} = 10^{−3} s. 

Open with DEXTER  
In the text 
Fig. 7.
Beam simulation, showing radiative energy. The eigenvalues are fixed to ±c (upper panel) or calculated with Eq. (14) (lower panel). 

Open with DEXTER  
In the text 
Fig. 8.
Beam simulation, showing a horizontal cut in Fig. 7 at the middle height. 

Open with DEXTER  
In the text 
Fig. 9.
Shadow simulation, showing snapshots of the radiative temperature at time t_{f} = 10^{−10} s with different closure relations: P_{1} model (upper panel), M_{1} model with fixed eigenvalues (middle panel), and M_{1} model with computed eigenvalues (lower panel). 

Open with DEXTER  
In the text 
Fig. 10.
Shadow simulation, showing the radial profiles of the radiative temperature at time t_{f} = 10^{−10} s with different closure relations: P_{1} model, M_{1} model with fixed eigenvalues, and M_{1} model with computed eigenvalues. 

Open with DEXTER  
In the text 
Fig. 11.
Subcritical shock simulation, showing snapshots of gas temperature, radiative temperature, and reduced flux at different times: 1.7 × 10^{4} s, 2.8 × 10^{4} s, and 3.8 × 10^{4} s. 

Open with DEXTER  
In the text 
Fig. 12.
Supercritical shock simulation, showing snapshots of gas temperature, radiative temperature, and reduced flux at different times: 4.0 × 10^{3} s, 7.5 × 10^{3} s, and 1.3 × 10^{4} s. 

Open with DEXTER  
In the text 
Fig. 13.
Evolution of the position of the ionization front as a function of time, with and without the initial velocity perturbation. 

Open with DEXTER  
In the text 
Fig. 14.
Snapshots of the fraction of ionization and the velocity field at time t = 6 × 10^{8} s without the initial velocity perturbation (left panel) and with it (right panel). 

Open with DEXTER  
In the text 
Fig. 15.
Snapshots of the fraction of ionization and the velocity field at the final time t_{f} = 10^{10} s without the initial velocity perturbation (left panel) and with it (right panel). 

Open with DEXTER  
In the text 
Fig. D.1.
Snapshots of the fraction of ionization and the velocity field at the final time t_{f} = 10^{10} s without the initial velocity perturbation (left panel) and with it (right panel). The physical domain is distributed across different numbers of MPI processes and different preconditioners have been used. Figure D.1a is the same figure as Fig. 15. (a) 4 × 4 MPI processes, ILU(k) factorization. (b) 2 × 2 MPI processes, ILU(k) factorization. (c) 4 × 4 MPI processes, Schwarz domain decomposition. (d) 2 × 2 MPI processes, Schwarz domain decomposition. 

Open with DEXTER  
In the text 
Current usage metrics show cumulative count of Article Views (fulltext article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.
Data correspond to usage on the plateform after 2015. The current usage metrics is available 4896 hours after online publication and is updated daily on week days.
Initial download of the metrics may take a while.