Issue |
A&A
Volume 646, February 2021
|
|
---|---|---|
Article Number | A123 | |
Number of page(s) | 17 | |
Section | Numerical methods and codes | |
DOI | https://doi.org/10.1051/0004-6361/202038579 | |
Published online | 22 February 2021 |
A high-performance and portable asymptotic preserving radiation hydrodynamics code with the M1 model
1
Université Paris-Saclay, UVSQ, CNRS, CEA, Maison de la Simulation, 91191 Gif-sur-Yvette, France
e-mail: helene.bloch@cea.fr
2
Université de Paris, AIM, 91191 Gif-sur-Yvette, France
3
AIM, CEA, CNRS, Université Paris-Saclay, 91191 Gif-sur-Yvette, France
Received:
4
June
2020
Accepted:
22
November
2020
Aims. We present a new radiation hydrodynamics code called ARK-RT which uses a two-moment model with the M1 closure relation for radiative transfer. This code was designed to be ready for high-performance computing on exascale architectures.
Methods. The two-moment model is solved using a finite-volume scheme. The scheme is designed to be asymptotic preserving in order to accurately capture both optically thick and thin regimes. We also propose a well-balanced discretization of the radiative flux source term which allows users to capture constant flux steady states with discontinuities in opacity. We use the library Trilinos for linear algebra and the package Kokkos allows us to reach high-performance computing and portability across different architectures, such as multi-core, many-core, and GP-GPU.
Results. ARK-RT is able to reproduce standard tests in both free-streaming and diffusive limits, including purely radiative tests and radiation hydrodynamics ones. Using a time-implicit solver is profitable as soon as the time-step given by the hydrodynamics is between 50 and 100 times larger than the explicit time-step for radiative transfer, depending on the preconditioner and the architecture. Nevertheless, more work is needed to ensure stability in all circumstances. Using ARK-RT, we study the propagation of an ionization front in convective dense cores. We show that the ionization front is strongly stable against perturbations even with destabilizing convective motions. As a result, the presence of instabilities should be interpreted with caution. Overall, ARK-RT is well-suited to studying many astrophysical problems involving convection and radiative transfer such as the dynamics of H II regions in massive pre-stellar dense cores and future applications could include planetary atmospheres.
Key words: radiative transfer / hydrodynamics / methods: numerical
© H. Bloch et al. 2021
Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1. Introduction
In many astrophysical situations, radiation is an important process which interacts with the surrounding gas; for example in (exo)planet atmospheres (e.g., Thomas & Stamnes 2002), massive stars (e.g., Kuiper et al. 2010; Mignon-Risse et al. 2020), and H II regions (e.g., Spitzer 1978) up to the cosmic reionization (e.g., Stiavelli 2009). In all these situations, the radiation can be absorbed, thus heating the surrounding gas. Photons could also be emitted by the gas or by another source. Photons could be scattered, which will change their direction of propagation and perhaps their frequency (Chandrasekhar 1960).
Two main regimes can arise, depending on the mean free path of photons compared to the characteristic length of the system (Mihalas & Mihalas 1984). On one hand, in the diffusive limit, the medium is optically thick (mean free path of photons much smaller than the characteristic length), and the radiation and the matter strongly interact with each other. On the other hand, in the free-streaming regime, the radiation does not affect the gas, and the medium is optically thin (mean free path of photons greater than the characteristic length). Numerically, it is difficult to accurately capture both limits.
Because of the high number of degrees of freedom (i.e., the time, the position, the direction of propagation, and the frequency of photons), only a few problems can be solved analytically (Chandrasekhar 1960) and direct simulations are out of reach for modern computers. Different models have been developed to reduce the computational cost. Here, we focus on the moment models (Levermore 1984), in which the specific intensity is averaged over the direction of propagation of photons. These models present several advantages: the computational cost is lower than other methods such as the Monte-Carlo method, and in general it is easy to couple these models with a grid-based hydrodynamics code.
One can only consider the moment of zero order (the radiative energy), leading to the flux-limited diffusion (FLD) approximation (Levermore & Pomraning 1981). Because the moment models consider only the moment of zero order, their computational cost is quite low, but they are very diffusive in the free-streaming regime. To tackle this issue, one can use a two-moment model (radiative energy and radiative flux), with the M1 closure relation (Dubroca & Feugeas 1999). However, this method can suffer from artifacts when multiple beams cross in the free-streaming regime (González 2006). One can solve this issue by using a three-moment model (radiative energy, radiative flux, and radiative pressure) with the M2 closure relation (Pichard et al. 2016). However, because of the increase of unknowns, the computational cost also increases. In this work, we have chosen to use the two-moment model with the M1 closure relation because the computational cost remains affordable, and in our applications we do not encounter the problem of beams crossing in the free-streaming regime.
Even though the M1 model is accurate in both free-streaming and diffusive regimes at the continuous level, numerical schemes also need to properly capture both limits. Several approaches have been developed. For example, Berthon & Turpault (2011) presented a scheme based on an HLL solver with source terms modified with a free parameter. Following this idea, we propose a new so-called asymptotic preserving scheme also based on an HLL solver. Nevertheless, we have chosen another parameter to recover the asymptotic behavior in the diffusive limit. Furthermore, our integration of source terms is different. In many physical applications (e.g., clouds), optically thick regions are found next to optically thin zones. We propose a well-balanced modification of the source term, which allows us to accurately reach steady states in the presence of sharp transitions.
As a first application, we are interested in the development of H II regions in massive pre-stellar dense cores (Churchwell 2002). We focus on solving the radiation hydrodynamics equations. An explicit solver for the radiative transfer would be restricted by a Courant-Friedrichs-Lewy (CFL) condition, limited by the speed of light. This will result in a very low time-step compared to the hydrodynamics time-step, which is limited by the speed of sound through the fluid. Several methods have been developed to get around this problem; we have chosen a time-implicit solver (e.g., González et al. 2007). The temporality of the radiative transfer is preserved, which is not the case with the reduced-speed-of-light approximation (RSLA, e.g., Gnedin & Abel 2001). However, a time-implicit solver method is costly because it requires solving large sparse linear systems.
Fortunately, progress has been made in linear algebra for high-performance computing (HPC). Because the linear system we have to solve is large and sparse, direct methods are out of reach. Iterative solvers with preconditioners have been developed to tackle this issue (e.g., Saad 2003). In this work, we use the library Trilinos (Heroux et al. 2005) because it allows us to target different architectures, such as multi-core, many-core, and GP-GPU. It also provides, among others, algebraic multigrid (AMG) preconditioners.
The paper is organized as follows. In the following section, we present the moment model and the M1 closure relation in more detail. In Sect. 3, we outline our new numerical scheme, which is well balanced and asymptotic preserving in the diffusive limit. In Sect. 4, we provide details of some implementation features of our code ARK-RT, especially the Kokkos and Trilinos libraries used for shared memory parallelism and linear algebra for high-performance computing. We also show some performance results. In Sect. 5, we present some numerical test cases to show the importance of the asymptotic preserving and well-balanced properties. Finally, in Sect. 5.6 we present a physical application of our method where we study the stability of the ionization front in the dense cores of a H II region.
2. Model
2.1. Physical model
In this work, we only consider gray radiative transfer, that is, we have computed the average over the frequency. We also assume local thermodynamic equilibrium (LTE) and we do not consider scattering. The mathematical description of radiative transfer was formalized by Chandrasekhar (1960):
where I, the quantity of interest, is the specific intensity, c is the speed of light, σ is the opacity of the medium, and B is the black-body-specific intensity. Also, x, t, and Ω are the spatial, temporal, and angular variables. This model can be generalized to multigroup radiative transfer (e.g., Turpault 2005).
Because the specific intensity I depends on six variables in the three-dimensional case, the numerical treatment of Eq. (1) becomes rapidly costly. To reduce the computational cost, we use a moment method, which we present in detail in the following section.
2.2. M1 model
Let us consider the three first moments of the specific intensity: the radiative energy Er, the radiative flux Fr, and the radiative pressure Pr defined as:
The mean over solid angles of Eq. (1) and its product by Ω give the following system:
where Tg is the gas temperature and ar is the radiation constant.
The fluid and the radiation exchange energy and momentum through emission and absorption. To ensure the conservation of the total energy when the hydrodynamics is frozen, the energy exchange term is given by
Here, ρcvTg is the gas internal energy, with ρ the density of the fluid and cv the heat capacity, which is defined by for a perfect gas, where kb is the Boltzmann constant, μ is the mean molecular weight, mH is the mass of hydrogen, and γ is the adiabatic index of the gas.
We still have to specify a closure relation, that is, a way to express Pr as a function of Er and Fr. The one chosen here is the M1 model (Levermore 1984). From Dubroca & Feugeas (1999), we write Pr = DEr where D is the Eddington tensor, defined by with χ the Eddington factor, I the identity matrix, and
is the reduced flux. Here, χ can be specified either by applying a Lorentz transform to an isotropic distribution of photons (Levermore 1984), or by minimizing the radiative entropy (Dubroca & Feugeas 1999). In both cases, we have
. Let us note that f ≤ 1, which ensures that the radiative energy cannot be transported faster than the speed of light.
The M1 model preserves both free-streaming and diffusive limits. On one hand, if f = 1, then Pr = DErn ⊗ n, and only the transport regime remains. On the other hand, if f = 0, the model in the diffusive regime simplifies to the P1 model, with . The radiative pressure tensor becomes isotropic. We look into the diffusion limit more precisely in Sect. 2.4.
2.3. Radiation hydrodynamics
We now consider the radiation hydrodynamics equations. The fluid evolution is described by the Euler equations expressing conservation of mass, balance of momentum, and balance of energy. Because photons are relativistic particles, we have to evaluate the quantities in the laboratory frame or the comoving frame, which is moving with the fluid. On one hand, using the comoving frame introduces nonconservative terms in the left-hand side of the equations. On the other hand, the hyperbolic part of the system remains simple in the laboratory frame but some source terms have to be incorporated to describe the interactions between the matter and the radiation. We have chosen the second approach. These source terms characterize the momentum and energy exchanges between the fluid and the radiation:
where u is the material velocity, p is the pressure of the fluid, g is the external gravitational field, and is the density of total matter energy with e the specific internal energy. The terms SEr(u) and SFr(u) depend on the velocity u. Using Eqs. (29) and (31) from Lowrie et al. (1999), we have
at first order in . To close the system, we also add the equation of state of an ideal gas: p = ρe(γ − 1).
2.4. Diffusive limit in a static fluid
Let us now focus on the diffusive regime with the hydrodynamics frozen, that is, the limit of large opacity, long timescale, and u = 0. We consider the P1 closure relation
Following Berthon & Turpault (2011), we introduce a rescaling parameter ε to write the time and opacity as t̃ = εt and , respectively. The radiative energy, radiative flux, and gas temperature are expanded with ε; for example Er = Er, 0 + εEr, 1 + 𝒪(ε2). System (7) becomes
By expanding Eqs. 8a and 8b at zero order, we have
Expanding Eq. (8b) at first order leads to
Finally, expanding the sum of Eqs. (8a) and (8c) at second order gives
In the diffusive limit, the total energy Er + ρcvTg at zero order obeys the diffusion equation given by Eq. (11). Similar development of the radiation hydrodynamics case is presented in Appendix A. Section 3 presents our design of an asymptotic preserving scheme, that is, a numerical scheme that will degenerate to the discretization of Eq. (11) in the diffusive regime.
3. Numerical scheme and algorithm
3.1. Radiation transport in a static fluid
Let us first introduce some notations: we use Δx to denote the step along the x-direction. here, Δt is the time interval between the current time tn and tn + 1. We write xi the center of the cell i and the interface between the cell i and the cell i + 1. We use the notation
to represent the averaged quantity associated with the field u at time tn in the cell i (finite volume). Finally, we use
to represent the quantity associated with the field u at time tn and at the interface between cells i and i + 1.
The development of the numerical scheme is presented only in the one-dimensional case, but its extension to higher dimensions is straightforward. To ease notations, we drop the indices r for all radiative variables.
We present the development of the numerical scheme using a time-implicit integration, but a similar development can be done with a semi-implicit solver: source terms remain implicit, but the hyperbolic part is time-explicit.
3.1.1. Hyperbolic system
Following González et al. (2007), we discretize the hyperbolic part of Eq. (3) using a first-order Godunov-type solver (Toro 2009). From Berthon & Turpault (2011), we also introduce an extra parameter α which is specified in Sect. 3.1.3:
where and
are the numerical fluxes given by
with and
, where λmax and λmin are the eigenvalues of Eq. (3). From Berthon et al. (2007), we have
with . See Fig. 1 of González et al. (2007) for more details about the structure of the eigenvalues.
is a well-chosen discretization of the term σFr in the cell i and at time tn + 1, and is specified in the following section.
3.1.2. Well-balanced modification of the source term
From Berthon et al. (2015), a well-balanced scheme catches the correct steady regime. The steady state, if it exists, is given by
Equation (15c) is discretized by . An obvious choice for
is
However, using this formulation, Eq. (15c) is discretized as
with . The radiative flux remains cell-centered and is equal to the divergence of radiative pressure, which itself is defined at the interfaces of the cells. This can create some spurious flux at the interface when looking for a steady state with a constant flux in the box (see Sect. 5.2). Inspired by well-balanced schemes for hydrodynamics (e.g., Padioleau et al. 2019) which preserve the hydrostatic balance between the pressure forces and the gravitational force (and the similarity between this balance and that between radiative pressure and the radiative flux source term in Eq. (15c)), we choose to use an average of a face discretization of the radiative flux source term:
with
One way to interpret this equation is to remember that
Equation (16) is obtained with the rectangle rule for numerical integration of Eq. (20):
whereas Eq. (18) is given by the trapezoidal rule:
To have
in the whole domain, we also impose it as boundary condition:
where is the radiative pressure given by the boundary condition. In that way, the radiative flux is centered at the interfaces of the cells, as is the divergence of radiative pressure. A von Neumann stability analysis of the modified scheme is presented in Appendix B.
3.1.3. Asymptotic preserving scheme
We still have to specify our choice for . Here,
corresponds to a classic HLL scheme. However, the solution given by an asymptotic preserving scheme has to approximate the solution of Eq. (11) as soon as the asymptotic regime is reached, that is, it must incorporate large opacities and long timescales. Unfortunately, a standard HLL scheme does not have this property (see Sect. 5.1). To tackle this issue and obtain an asymptotic preserving scheme, we choose
with . The derivation of Eq. (25) is presented in Appendix C. Other choices can be made (see e.g., Berthon & Turpault 2011). If
goes to zero,
goes to 1, and we recover a standard HLL scheme. Considering the diffusive limit, we prove that the scheme is asymptotic preserving in Appendix C. We show that
which is a standard discretization of Eqs. (9)–(11).
Unfortunately, we cannot prove that this scheme will preserve the admissible states f < 1 and, indeed, numerical experiments with this scheme have shown that we can get f > 1 when we are close to the free-streaming regime. In these situations, we can either enforce f < 1 (Sect. 5.6) or come back to a centered discretization of the source term (Sect. 5.4). Furthermore, the development of the asymptotic preserving scheme with the well-balanced modification of the source term is only done in the case of a static fluid. Using the asymptotic correction, Equation (25) is only the first step to obtaining an asymptotic preserving scheme in the case of a moving fluid (see Sect. 6.2).
3.2. Coupling to hydrodynamics
Following González et al. (2007), the resolution of the whole of the system shown in Eq. (5) describing radiation hydrodynamics is split into three steps:
1. Update the hydrodynamics quantities (Eqs. (5a)–(5c) without the terms of energy and momentum exchange) using the well-balanced and all-regime solver developed in Padioleau et al. (2019);
2. Update the radiative quantities and gas temperature (Eqs. 3 and (4)) using the solver developed in Sect. 3.1. During this step, the hydrodynamics quantities are frozen;
3. Add source terms SEr(u) and SFr(u). For simplicity, all source terms which depend on the velocity are treated explicitly. The term in Eqs. (5b) and (5e) is discretized using the well-balanced scheme proposed in Sect. 3.1.2. All the other terms remain cell-centered.
This splitting allows us to reduce the number of equations solved implicitly, making the method more efficient.
3.3. Algorithm for nonlinear implicit solver
The time-step given by the CFL condition is much smaller for radiation than for hydrodynamics. Indeed, for the radiation, it is limited by the speed of light, whereas it is limited by the speed of sound in the fluid for the hydrodynamics. Because we turn to radiation hydrodynamics, we consider a long timescale for the radiative transfer. Therefore, we use a time-implicit integration for the radiative transfer.
3.3.1. Newton–Raphson method and linear solver
Because of the Eddington tensor, the eigenvalues in the numerical fluxes, and the factor, the system is nonlinear and is solved using a Newton-Raphson method. At each iteration, we have to solve a linear system. Because the system is large ((2 + d)N unknowns, where d is the number of dimensions and N the total number of cells) and sparse, it cannot be solved using a direct method. Because of the numerical fluxes, the matrix is not symmetric, and we use the biconjugate gradient stabilized method (Van der Vorst 1992).
3.3.2. Preconditioner
Using large time-steps for the radiative transfer, the matrix is ill-conditioned and iterative methods might not converge. One way to deal with this issue is to use a preconditioner. Instead of solving the original linear system Ax = b, we solve the right preconditioned system AK−1Kx = b via solving AK−1y = b to compute y and then Kx = y. As long as the matrix K is invertible, this gives the same solution as the original system. If K is well chosen, the condition number of the matrix AK−1 is lower than that of A. For example, the Jacobi or diagonal preconditioner is given by
See Saad (2003) for more details.
Numerous preconditioners have been developed over the years; some of them perform well and some do not, depending on the problem considered. Among different preconditioners, we explore the algebraic multigrid (AMG) technique. We present this method in the following section, and we compare it with other preconditioners in Sect. 4.4: a standard ILU(k) factorization (Saad 2003), a slightly modified variant of the standard ILU factorization (Saad 1994), an additive Schwarz domain decomposition (Cai & Sarkis 1999), and a classical relaxation method, Jacobi with damping (Saad 2003).
3.3.3. Algebraic multigrid preconditioner
The AMG methods were first developed as linear solvers for symmetric positive definite matrices arising from the discretization of scalar elliptic PDEs. For such a matrix, classical iterative methods are able to efficiently compute the high frequencies of the solution, but lack efficiency when computing its low frequencies. However, the computation is easier on a coarser grid with fewer unknowns. The idea of the multigrid solver is to build a coarser grid, solve the problem on this coarse grid, and then finally interpolate the solution on the fine grid. We can subsequently define a restriction operator R which transfers vectors from the fine grid to the coarse grid and an interpolation operator P used to return to the finer grid. P and R are nonsquared matrices. From Saad (2003), the main steps of the method are as follows:
1. Pre-smoothing: a few iterations of a simple method such as Jacobi or an incomplete factorization are performed to get the value .
2. The residual is projected over the coarse grid with the restriction operator R, to get the residual equation
.
3. This equation is solved, possibly with a direct solver.
4. The solution y is interpolated over the fine grid with the interpolation operator P and then .
5. Post-smoothing: a few iterations of a simple method are again performed to get the solution .
The solution is used as a preconditioner result. If the coarse grid has too many unknowns to be solved directly, this process is applied recursively: the coarse grid becomes the fine grid and a coarser grid is built. Therefore, we have a hierarchy of grids. With a geometric multigrid solver, the restriction and interpolation operators are determined by the mesh, whereas with an algebraic multigrid solver they are automatically generated using data from the matrix.
Before describing our numerical tests, we present details of our implementation and parallelization in the following section.
4. Implementation and parallelization
We used the code ARK-RT1 for implementation, which is a fork of the code ARK developed in Padioleau et al. (2019). The hydrodynamics and gravity part of our solver is similar to ARK and is solved with a well-balanced and all-regime solver. Because the solver for the radiative transfer equations is time-implicit, we have to solve large, sparse linear systems. This is done using the library Trilinos, described in the following section.
4.1. Linear algebra
We use the second generation of packages of the Trilinos framework (Heroux et al. 2005), namely Kokkos (Edwards et al. 2014) for shared memory computation, Tpetra (Baker & Heroux 2012) for distributed vectors and matrices, Belos (Bavier et al. 2012) for linear solvers, Ifpack2 (Prokopenko et al. 2016) for classical preconditioners, and MueLu (Berger-Vergiat et al. 2019a,b) for the AMG preconditioner. Let us first focus on the package Kokkos.
4.2. Shared memory computation
As new architectures have increasing numbers of cores, the distributed memory model is insufficient to take advantage of all the computational power available. Therefore, we need to use a shared memory model inside the nodes. Furthermore, computational nodes are increasingly heterogeneous; for example, multi-core, many-core, or accelerators such as GP-GPUs. Each architecture requires its own interface, such as OpenMP or C++11 threads for multi-core and many-core processors and CUDA or OpenACC for NVIDIA GPUs. This raises the problem of portability and performance portability: many HPC codes are optimized for some specific architectures, and so running the code on a different architecture will result in poor performance.
The library Kokkos (Edwards et al. 2014) tackles this issue. The user has a unique code which can be compiled with different shared memory models such as OpenMP or CUDA. Kokkos is based on different abstractions, like execution spaces (where a function is executed), memory spaces (where the data are stored), and execution policies (how the function is executed). Also, Kokkos provides execution patterns such as parallel loops and multidimensional arrays, and the storage is optimized according to the architecture.
4.3. Implementation
For the hydrodynamics step, Kokkos is used as an independent library for shared memory computation. Communications between the nodes are handled by the Message Passing Interface (MPI) programming model through a regular domain decomposition. Following Kestener (2017), inside each node, the domains are endowed with ghost cells used to implement physical boundary conditions, but also to contain values from neighbor domains. The code is organized with computational kernels, and each kernel is a C++ functor; see Padioleau et al. (2019) for more details.
The second step is the time-implicit solver for radiative transfer. The values of the matrix and the right-hand side of the linear system have to be updated at each iteration of the Newton–Raphson method, but the graph of the matrix does not change. Using Trilinos, the rows of the matrix are distributed across the MPI processes, each of them is associated with a unique global index. Each global index has a matching local index on the owning process.
Trilinos provides several methods to update the coefficients of the matrix. One of them uses only global indices, which is recommended for Trilinos. However, this function can only be called by the host. This has two main consequences: First, because all the rows of the matrix have to be updated, we use a sequential loop over the rows of the matrix. Second, in the case where we are using GP-GPUs, we have to update the matrix with data coming from the device. Therefore, we have to transfer some data from the device to the host, update the matrix with these data, and then transfer the matrix from the host to the device; this last step is done implicitly by Trilinos. This process increases the computational cost.
Another way is to use local indices. The package KokkosKernels, which is part of the Kokkos ecosystem, provides several ways to update the coefficients of the matrix through a kernel. This allows the use of a parallel loop (via Kokkos::parallel_for) and we avoid data transfers between the host and the device. For performance reasons, we use local indices.
4.4. Performances
Thanks to Trilinos, we can use many preconditioners. Unfortunately, they do not behave in the same way when the size of the system increases. All tests are performed on Poincare, our local cluster at Maison de la Simulation. Each node consists of two Sandy Bridge E5-2670 at 2.60 GHz (2 × 8 cores, 32 Go RAM) processors. We use a hybrid configuration MPI/OpenMP, with one MPI process per socket to avoid NUMA effects.
We first performed a weak scaling test, where we consider a two-dimensional case with periodic boundary conditions and a hot source located at the center of each domain. Each MPI process provides a piece of the whole domain of 15002 cells, and therefore the size of the system increases with the number of MPI tasks. The resolution is close to our target for three-dimensional simulations. As shown in the left panel of Fig. 1, for all preconditioners, the number of iterations remains constant, at around 10 iterations for the AMG preconditioner, around 20 iterations for both incomplete factorizations and the additive Schwarz domain decomposition, and around 250 iterations for the relaxation. As shown in the right panel, the speed-up reaches a plateau of 80% to 90% of maximum performance depending on the preconditioner.
![]() |
Fig. 1. Weak scaling test. Each MPI process treats 15002 cells. We tested different preconditioners: Jacobi with damping (Relaxation), AMG, standard ILU(k) factorization (RILUK), a variant of the standard ILU factorization (ILUT), and additive Schwarz domain decomposition (Schwarz). Left panel: number of iterations to solve the linear system as a function of the number of cells. Right panel: speed-up as a function of the number of MPI processes. |
Figure 2 shows the number of iterations and the speed-up as a function of the number of MPI processes for a strong scaling test. We now consider a Marshak wave propagation in the diffusive limit. The global resolution remains constant as the number of processes increases, and is set to 20482 cells. Because the global resolution is constant, one can expect the number of iterations to also remain constant when the number of MPI processes increases. However, using the AMG (orange curve) and the incomplete factorizations (RILUK, ILUT; green curve in left panel), when four MPI processes or more are used, the number of iterations is twice the number of iterations reached with one or two MPI processes. Therefore, the computational time is the same when using two or four MPI processes. Furthermore, all tested preconditioners and the linear solver require several communications per iteration, which likely becomes the main cost when the local resolution decreases.
![]() |
Fig. 2. Strong scaling test. The global resolution is 20482 cells. The same preconditioners as those listed in Fig. 1 were tested. Left panel: number of iterations to solve the linear system as a function of the number of MPI processes. Right panel: speed-up as a function of the number of MPI processes. |
Thanks to Kokkos, we can use exactly the same code on different architectures, namely Sandy Bridge processors and NVIDIA GP-GPUs (e.g., K80). Unfortunately, the memory required by the AMG preconditioner with a 15002 simulation is greater than the memory available on a K80 GPU. For the next tests, we use a lower resolution of 10002 cells. Table 1 summarizes the computational time for a fixed problem with different schemes and different architectures. As the explicit solver is restricted by a CFL condition, it requires several thousand time-steps whereas the implicit solver only needs a few time-steps to reach the same final time. Updating the matrix in parallel allows for a 25% reduction in computational time required. On a CPU, the implicit solver is around 160 times faster than the explicit solver, whereas on a GPU it is only 11 times faster.
Computational time with both explicit and implicit solvers on CPU and GPU.
Figure 3 compares the computational time with different preconditioners, on both a CPU and a GPU. Except for the implicit solver using the AMG preconditioner, all solvers are faster on a GPU than on a CPU; up to three times faster for the relaxation preconditioner. Part of the AMG algorithm probably remains sequential. On a CPU, the AMG preconditioner is faster than the relaxation preconditioner. The other preconditioners are slower, by up to a factor eight between the relaxation and the additive Schwarz domain decomposition on a GPU.
![]() |
Fig. 3. Computational time for the implicit solver with different preconditioners (as listed for Fig. 1) on different architectures (Sandy Bridge CPU and K80 NVIDIA GPU). The resolution is 10002 cells. |
Figure 4 compares the memory consumption with different preconditioners. Using GP-GPU, most of the data are located on the device, but Trilinos still allocates some memory, namely between 0.125 GB and 0.208 GB, on the host, unlike the explicit solver. Using the relaxation as a preconditioner, the amount of memory allocated is lower than with the other preconditioners (7.3 GB for the relaxation against 11.5 GB for the AMG).
![]() |
Fig. 4. Memory consumption for the implicit solver with different preconditioners (as listed in Fig. 1) on different architectures (Sandy Bridge CPU and K80 NVIDIA GPU). The resolution is 10002 cells. |
Choosing a well-suited preconditioner can be challenging and problem dependent. Once the preconditioner is chosen, it depends on many parameters. For example, Trilinos allows the user to choose the damping factor ω for the relaxation method or the smoother and the coarse solver for the AMG. Performances and stability can largely depend on these choices. For example, the relaxation method seems to be well suited for this problem with low computational time, and memory consumption, but in many other test cases, the linear solver will not converge. The AMG preconditioner performs well on a CPU but is less efficient on a GPU. Both incomplete factorizations and the additive Schwarz domain decomposition are slightly less efficient than the AMG preconditioner. Overall, we find the AMG preconditioner and the relaxation method to produce a good compromise between stability and performance.
The performances we obtain thanks to Kokkos and Trilinos are encouraging for the study of astrophysical problems. The time-step given by the hydrodynamics can be written as . Using a relaxation as a preconditioner, we need CFL ≥ 50 on a CPU and CFL ≥ 100 on a GPU to save computational time, whereas when using an incomplete factorization, we need CFL ≥ 250 on a CPU and CFL ≥ 1000 on a GPU. We need a larger CFL number on a GPU because the explicit solver is more efficient on a GPU than on a CPU.
In the following section, we use several numerical tests to show that the scheme developed in Sect. 3 is well suited for the study of radiation hydrodynamics problems.
5. Numerical results
We performed a series of verification tests to validate different properties of the scheme: the asymptotic correction with a Marshak wave, the well-balanced property to reach a steady state with a jump of opacity, the properties of the M1 model with a beam test and a shadow test, and the coupling to the hydrodynamics with radiative shocks. We also present a physical application, investigating the stability of the ionization front in the dense core of a H II region. To ease notations, we define the radiative temperature as .
5.1. Marshak wave
From Mihalas & Mihalas (1984), a Marshak wave is the propagation of hot radiation into a cold medium. We consider a one-dimensional case in the diffusive limit in order to test the asymptotic preserving scheme developed in Sect. 3.1.3.
The length of the computational domain is 1 cm; it is discretized with 400 points. Initially, the medium is at equilibrium with the radiation: T0 = Tr = 300 K, the initial radiative flux is Fr = 0. We consider a perfect gas with . The hydrodynamics is frozen. The density is constant, such that ρcv = 1 J K−1 cm−3, and the opacity is also constant, with σ = 10 000 cm−1, and therefore σΔx = 25. At time t = 0, a source is lit at the left boundary with Tr = 1000 K.
The results are shown in Fig. 5 at time tf = 2 × 10−4 s. We compare different solutions: a reference solution, the solution given by our asymptotic preserving scheme, and the solution given by a standard scheme. The reference solution is given by a standard discretization of Eq. (11). From Audit et al. (2002), when a scheme that is not asymptotic preserving is used with σΔx ≫ 1, the grid does not sample the mean free path of the photons and the solution is dominated by numerical diffusion. The relative L2 error between the reference solution and the solution with given by Eq. (25) is 1.1%, whereas with the standard HLL scheme the relative L2 error is 84%. Using the asymptotic correction, we recover the correct behavior in the asymptotic limit.
![]() |
Fig. 5. Marshak wave simulation. This figure shows a snapshot of the gas temperature at time tf = 2 × 10−4 s, with and without the asymptotic correction and the reference solution. Spatial resolution is n = 400 and the opacity is σ = 10 000 cm−1. |
5.2. Steady state with a jump of opacity
In the previous case, the opacity is constant, we now consider a test with a jump of opacity, still in the one-dimensional case. We use this test to highlight the need for a well-balanced modification of the source term.
The length of the computational domain is 1 cm; it is discretized with 100 points. Initially, the medium is at equilibrium with the radiation: T0 = Tr = 300 K, the initial radiative flux is Fr = 0. The opacity σ is now a function of space:
At time t = 0, a source is lit at the left boundary with Tr = 1000 K.
Figure 6 shows the radiative flux at time tf = 10−3 s. From Eq. (15b), when the steady state is reached, we expect the radiative flux to be constant in the box. Using a standard discretization of the source term, such as Eq. (16), a spurious peak located at the discontinuity of opacity is observed (orange curve). The value taken by the radiative flux is more than 20 times the expected value. This seems to be caused by a numerical instability. This can result in f > 1 during the iterations of our Newton–Raphson implicit scheme, which is not physically admissible. However, using the well-balanced modification of the source term proposed by Eq. (18) (blue curve), the spurious peak no longer appears and the constant steady state is reached.
![]() |
Fig. 6. Simulation of a steady state with a jump of opacity. The opacity is piecewise constant, and a jump is located at x = 0.5 cm (gray line). This figure shows a snapshot of the radiative flux at time tf = 10−3 s. |
Using the standard discretization of the source term in Eq. (16), one can show that the numerical scheme is unconditionally stable in that the error between the numerical solution and the exact solution goes to zero as Δx and Δt go to zero. The spurious peak seems to be due to a lack of precision in the integration of the source term. Using Eq. (18), the source term is defined at the interfaces of the cells and balances the divergence of radiative pressure, which is also defined at the interfaces.
5.3. Beam
We now perform the same two-dimensional test as in Richling et al. (2001), González et al. (2007). The domain [−1, 1] × [−1, 1] is discretized with 128 × 128 cells. The initial temperature is T0 = Tr = 300 K, the initial radiative flux is Fr = 0, and the opacity is σ = 0. At time t = 0, a beam with T = Tr = 1000 K is introduced with an angle of 45°. The beam is located at x = −1 and y ∈ [−0.875, −0.75]. Because we are in the free-streaming regime, we cannot use large time-steps. For performance reasons, we use the semi-implicit scheme.
Because there is no opacity, the beam propagates in the vacuum, and we expect it to cross the box without dispersion. The direction of the beam is not along the mesh axis; we use this test to quantify the numerical diffusion. Figure 7 shows the radiative energy at steady state, with the eigenvalues fixed to ±c and with computed eigenvalues. Because there is no opacity, neither the asymptotic correction nor the well-balanced source term affect the result, and we recover the same result as in González et al. (2007). Figure 8 shows the horizontal cut at the middle height. The beam introduced at the boundary is sampled over 8 cells. Ideally, without any numerical diffusion, we would expect the width of the beam to remain exactly 8 cells. With the computed eigenvalues, we can keep the numerical diffusion under control. Using fixed eigenvalues, the width of the beam at middle height is approximately 30 cells, whereas it is only 24 cells with calculated eigenvalues.
![]() |
Fig. 7. Beam simulation, showing radiative energy. The eigenvalues are fixed to ±c (upper panel) or calculated with Eq. (14) (lower panel). |
The main difference in this test between ARK-RT and HERACLES (González et al. 2007) is the computation of the eigenvalues. We use the exact eigenvalues given by Eq. (14) from Berthon et al. (2007), whereas in González et al. (2007), to save computational time, the eigenvalues are computed once at the beginning of the simulation and then interpolated. However, this approximation does not impact the result.
5.4. Shadow
Let us now consider a two-dimensional test with source terms. Following Hayes & Norman (2003), González et al. (2007), we consider a shadow test. The computational domain is a cylinder of length L = 1 cm and radius R = 0.12 cm, and is discretized with 280 × 80 cells. A spheroid clump is located at the center of the box, on the symmetric axis: (zc, rc)=(0.5, 0). The extension of the clump is (z0, r0) = (0.1, 0.06). Initially, the medium is at equilibrium with the radiation, with T0 = Tr = 290 K. We consider a homogeneous gas, with ρ0 = 1 g cm−3, except for the clump with density ρ1 = 100ρ0. The boundary of the clump is smoothed: with
. The opacity in the medium is
with σ0 = 0.1 cm−1. At time t = 0, a source is lighted at the left boundary with Tr = 1740 K and the reduced flux is set to f = 1. Because f is close to 1 in the free-streaming regime, we encounter f > 1 in the simulation. To tackle this issue, we use the nonwell-balanced scheme: the radiative flux source term is discretized using Eq. (16). Because we are in the free-streaming regime, we cannot use large time-steps. For performance reasons, we use the semi-implicit scheme. To recover the same result as in González et al. (2007), we use
and
, where λmax and λmin are given by Eq. (14).
Figure 9 shows the radiative temperature at the final time tf = 10−10 s with different closure relations: the P1 model, the M1 model with fixed eigenvalues, and the M1 model with computed eigenvalues. Because of the high opacity in the clump, the light does not cross it and we expect the shadow behind it to remain stable.
![]() |
Fig. 9. Shadow simulation, showing snapshots of the radiative temperature at time tf = 10−10 s with different closure relations: P1 model (upper panel), M1 model with fixed eigenvalues (middle panel), and M1 model with computed eigenvalues (lower panel). |
As in Hayes & Norman (2003), González et al. (2007), we plot the radial profile of the radiative temperature at the right boundary (Fig. 10). Using the P1 model, the radiative pressure is isotropic, and therefore the photons go around the obstacle immediately, heating the whole domain. Using the M1 closure relation, the shadow is better preserved, and the temperature behind the obstacle remains at its initial value, 290 K. As the opacity remains rather low outside of the clump and the light has not crossed the obstacle, the asymptotic correction has no impact on the result. Because the boundary of the clump is smoothed, the transition between the optically thick and thin medium is less sharp than in Sect. 5.2 and the well-balanced modification of the source term is not necessary.
![]() |
Fig. 10. Shadow simulation, showing the radial profiles of the radiative temperature at time tf = 10−10 s with different closure relations: P1 model, M1 model with fixed eigenvalues, and M1 model with computed eigenvalues. |
5.5. Radiative shocks
We finally consider radiative shocks: the gas and the radiation exchange energy and momentum. Following Ensman (1994), Hayes & Norman (2003), González et al. (2007), we consider a one-dimensional homogeneous medium, with ρ = 7.78 × 10−10 g cm−3 and σ = 3.1 × 10−10 cm−1. We consider a perfect gas with an adiabatic coefficient and a mean molecular weight μ = 1. The length of the domain is 7 × 1010 cm. It is discretized with 400 cells. The initial temperature at the left boundary is set to 10 K and is increased by 0.25 K per cell. Initially, the radiation is at equilibrium with the gas. The left boundary condition is reflective, and the initial velocity of the fluid is set to u0. According to the value of u0, the shock will be subcritical or supercritical; see González et al. (2007) for more details. To compare our results with those of Ensman (1994), Hayes & Norman (2003), González et al. (2007), we plot the temperature as a function of xi = x − u0t.
5.5.1. Subcritical shock
We first consider a subcritical shock, the initial velocity is set to u0 = −6 km s−1. Figure 11 shows the gas temperature, the radiative temperature, and the reduced flux at three different times: 1.7 × 104 s, 2.8 × 104 s, and 3.8 × 104 s. As expected, the gas and the radiation are not at equilibrium, before nor after the shock. The gas temperature reaches 1135 K, as in González et al. (2007), whereas it is only 850 K in Ensman (1994).
![]() |
Fig. 11. Subcritical shock simulation, showing snapshots of gas temperature, radiative temperature, and reduced flux at different times: 1.7 × 104 s, 2.8 × 104 s, and 3.8 × 104 s. |
5.5.2. Supercritical shock
We now consider a supercritical shock, where the initial velocity is set to u0 = −20 km s−1. Figure 12 shows the gas temperature, the radiative temperature, and the reduced flux at three different times: 4 × 103 s, 7.5 × 103 s, and 1.3 × 104 s. As in González et al. (2007), the radiative temperature is the same as the matter temperature on both sides of the shock. The gas and the radiation are therefore at equilibrium. The radiative precursor is larger than that of the subcritical shock, as intended, and the temperature reaches 5000 K, as in Ensman (1994). We also recover the Zel’dovich spike.
![]() |
Fig. 12. Supercritical shock simulation, showing snapshots of gas temperature, radiative temperature, and reduced flux at different times: 4.0 × 103 s, 7.5 × 103 s, and 1.3 × 104 s. |
5.6. Expansion of H II region
Now that we have confirmed the good behavior of the numerical scheme with both the asymptotic preserving and the well-balanced properties, we can apply it to a physical situation: the propagation of the ionization front in a massive pre-stellar dense core.
5.6.1. Model
We consider the early stage of the development of an H II region in a massive pre-stellar dense core (Churchwell 2002). We focus on a region of the dense core at about 100 AU from the massive young stellar object (YSO). This region has been heated by the YSO during the pre-main sequence phase. The temperature reached at this location by infrared heating is of the order of 1000 K and the transport of energy in this region can be dominated by convection. We have inferred the convective state of this region by computing thermal and adiabatic gradients based on observations of Herpin et al. (2009) (Fig. 7). High-energy photons emitted by the YSO when entering the main sequence start to ionize the surrounding gas. This triggers the propagation of an ionization front in a convective medium, and we are interested in the stability of such a front perturbed by the pre-existing convective motions.
The interaction of the ionizing photons with the gas is described by Eq. (5). The only photons able to ionize the gas are emitted by the YSO, that is, there is no local source of ionizing photons. Following Tremblin (2012), we need to modify this model to take into account photochemistry and thermal balance. We define the fraction of ionization X = nH+/nH where nH = nH+ + nH0, nH+ is the number of ionized atoms and nH0 is the number of cold atoms. The evolution of the number of ionized atoms is simply the number of incoming photons that interact with the gas minus the number of ionized atoms that recombine (on the spot approximation, see Lesaffre 2002). Therefore,
where Fγ is the number of incoming photons per unit of surface and time, σγ is the average cross-section at the temperature of the star, and β gives the recombination rate: β = 2 × 10−10T−0.75 with T the temperature of thermodynamic equilibrium (Black 1981).
The thermal balance is the difference between the heating rate and the cooling rate. The extra energy of the absorbed photons is converted into kinetic energy of electrons; this is the only source of heating during the ionization, and therefore the heating rate is given by (1 − X)nHFγσγeγ. In this simplified model, the equilibrium temperature is obtained from the balance between the heating from the ionization and the cooling from the recombination. We do not consider any other effects, such as metal cooling. Therefore, we take eγ = 1 eV (Lesaffre 2002) to recover the observed temperature around 1000 K. From Tremblin (2012), the cooling rate is given by . We also add a term of Newtonian forcing,
, to trigger convection. Tforcing is the equilibrium temperature profile, depending on space, and τforcing is the relaxation timescale. The gas temperature will relax toward the equilibrium temperature profile Tforcing.
By writing cEr = Fγeγ, ρ = nHmH and σ = σγnH, we finally have to solve the following system:
In this test, we use the M1 solver with the asymptotic correction presented in Sect. 3.1.3, but we do not use the well-balanced discretization of the source term because of stability issues that we discuss in Sect. 6.1.
5.6.2. Setup
We consider a square domain of 5 AU on all sides. We use a setup similar to Padioleau et al. (2019) for compressible convection simulations. The temperature is set to 500 K and 1000 K at the top and bottom of the box, respectively. The initial temperature is a linear interpolation between the top and the bottom of the box. It is also the forcing temperature profile Tforcing. We take τforcing = 107 s. These parameters are chosen to trigger the initial convective motions. We also set the pressure at the bottom: 10−3 dyne ⋅ cm−2 (Herpin et al. 2009). The density and the pressure are linked by the ideal gas law: , where μ is the mean molecular weight. The nonionized medium is made of hydrogen, with μ1 = 1. When the medium is fully ionized, it is made of atomic nucleus and electrons, and therefore has twice as many particles for the same mass. Because the distribution of nucleus and electrons is homogeneous, the mean molecular weight is μ2 = 0.5. When the medium is partially ionized, we take μ as the mean of the previous values balanced by the fraction of ionization, that is, μ = (1 − X)μ1 + Xμ2. The density is initialized with the recursive formula
, which is the discrete version of the hydrostatic balance ∇p = −ρg.
We impose Neumann boundary conditions for the temperature. The pressure and density are imposed by an extrapolation of the hydrostatic balance. Because the hydrodynamics solver is well-balanced for the gravity, this configuration will remain static, even if the initial condition is unstable. The hydrostatic equilibrium is destabilized with a velocity mode perturbation of the form
with cs the speed of sound, xmid = ymid = 2.5 AU and Lx = Ly = 5 AU. Without any interaction with the ionizing photons, the convective motions are stationary.
The opacity is set to with σγ = 6 × 10−18 cm2 (Lesaffre 2002). The radiative energy and flux are set to zero and the medium is not ionized (X = 0). We initialize the hydrodynamics variables with the steady state described above. At time t = 0, the bottom boundary of the region is ionized: the reduced flux is set to a value of 1 and the radiative energy is set to
with F* = 3 × 1017 cm−2 s−1 in the boundary. The boundary conditions for the hydrodynamics variables remain unchanged.
5.6.3. Results
As the initial condition is such that Er is close to zero, this can easily create some spurious values such that f > 1. This is clearly a numerical artifact induced by the very low value of the radiative energy in regions where no ionizing photons are present. Even with a centered discretization of the radiative flux source term and without the asymptotic correction, we still encounter f > 1 during the simulation. Because of this problem, we impose f = 1 in the computation of the Eddington tensor in the cells where f > 1.
Figure 13 shows the evolution of the position of the ionization front as a function of time. With and without the initial convective rolls, the position of the ionization front oscillates around an equilibrium position, between 0.3 AU and 0.4 AU. The oscillations around the equilibrium are expected and have been observed with simpler models (Tremblin et al. 2012).
![]() |
Fig. 13. Evolution of the position of the ionization front as a function of time, with and without the initial velocity perturbation. |
Figures 14 and 15 show the ionization front at time t = 6 × 108 s and at the final time tf = 1010 s. With and without the initial convective rolls, a numerical noise appears as a consequence of the long timescales. Because of the numerical noise, some lack of symmetry can appear, such as is seen in the left panel of Fig. 14. The fraction of ionization, which is always between 0 and 1, reaches values between 10−12 and 10−6. The effect of the preconditioner and the MPI domain decomposition is discussed in Appendix D. However, the numerical noise does not affect the position of the ionization front.
![]() |
Fig. 14. Snapshots of the fraction of ionization and the velocity field at time t = 6 × 108 s without the initial velocity perturbation (left panel) and with it (right panel). |
![]() |
Fig. 15. Snapshots of the fraction of ionization and the velocity field at the final time tf = 1010 s without the initial velocity perturbation (left panel) and with it (right panel). |
The stability of the ionization front is an issue that has been discussed for a long time in the literature (Mizuta et al. 2008). For example, 3D simulations of the expansion of a spherical ionization front in 3D Cartesian grids have shown instabilities either on the axis of the grid or in the diagonal depending on the numerical scheme (see Fig. A3 Bisbas et al. 2015). The dependence of the instability on the grid casts doubt on the possibility of a physical regime. Our test case shows that even with convective motions of large amplitude, the ionization front remains stable.
6. Discussion and conclusion
We present a new radiation hydrodynamics code. The radiative transfer is described with a two-moment model and uses the M1 closure relation. We first discuss some limitations of our work before presenting our conclusions.
6.1. Well-balanced discretization of the source term
In Sect. 3.1.2 we propose a well-balanced discretization of the source term on the radiative flux equation. This discretization allows us to properly capture the steady state with constant flux and with a discontinuity of opacity. However, this discretization can lead to spurious oscillations in the radiative flux, a problem that we have encountered in the test case for the expansion of H II regions. Although we have changed the discretization of the source term to achieve a well-balanced property, our integration of the hyperbolic part on the source term is still split into two steps. Such a splitting strategy might be unstable if the source term is not taken into account in the hyperbolic part. A possible solution to this problem would be to incorporate the source term in a Lagrange-projection-like scheme such as Buet & Despres (2008). Such a strategy might be necessary to treat the radiation hydrodynamics problem of cloud interfaces.
6.2. Asymptotic limit for radiation hydrodynamics
In Sect. 3.1.3, we present an asymptotic correction which allows us to capture the asymptotic behavior, whereas the solution given by a standard scheme is dominated by numerical diffusion. The asymptotic correction uses the numerical diffusion to recover the physical one in a static fluid. Nevertheless, this scheme does not capture the asymptotic regime in a moving fluid, as presented in Appendix A. Most of the schemes proposed in the literature do not preserve this asymptotic regime (e.g., González et al. 2007; Berthon & Turpault 2011). The diffusive regime depends on the material velocity; our scheme cannot reach it. A possible solution would be to limit the numerical diffusion with a correction similar to a low Mach correction, as in Chalons et al. (2016), in conjunction with a cell-centered discretization of the source term, as proposed by the Lagrange-projection scheme of Buet & Despres (2008).
6.3. Conclusion
The model for radiation hydrodynamics proposed in this paper has the correct behavior in both free-streaming and diffusive limits. It is discretized with an asymptotic preserving scheme. This asymptotic correction is important to capture the correct behavior in the diffusive limit and to preserve the free-streaming regime in a static fluid.
We take advantage of the libraries Kokkos and Trilinos to reach high-performance computing and to solve linear systems. This approach allows us to take advantage of different architectures and to use large time-steps for radiative transfer. Using the implicit solver is profitable as soon as the time-step given by the hydrodynamics is 50−100 times larger than the explicit time-step for radiative transfer, depending on the preconditioner and the architecture.
The solver is then coupled with the hydrodynamics code that implements an all-regime and well-balanced solver for hydrodynamics with gravity. The tests performed show that ARK-RT is well suited to studying many astrophysical problems. We used this code to study the development of H II regions in massive pre-stellar dense cores, especially the propagation of the ionization front in the presence of convection. We show that even with the destabilizing effect of convection, the ionization front is strongly stable against perturbations. A linear stability analysis similar to the Rayleigh-Taylor instability but including a source term could provide more insight into this behavior.
Further work will consist of the development of a numerical scheme that preserves the admissible states while keeping the asymptotic preserving and well-balanced properties. We will then be able to take full advantage of the largest next-generation, massively parallel architectures to study atmospheric physics with opacity interfaces, among other things.
Acknowledgments
P. T. acknowledges supports by the European Research Council under Grant Agreement ATMO 757858. The authors thank the referee for his useful comments, especially about the test on the expansion of H II regions. H. B. thanks also Rodolphe Turpault for his welcome in Bordeaux and the helpful discussion.
References
- Anderson, J. D. 1995, Computational Fluid Dynamics : The Basics with Applications, McGraw-Hill Series in Mechanical Engineering (New York: McGraw-Hill) [Google Scholar]
- Audit, E., Charrier, P., Chièze, J. P., & Dubroca, B. 2002, ArXiv e-prints [arXiv:astro-ph/0206281] [Google Scholar]
- Baker, C., & Heroux, M. 2012, Sci. Program., 20, 115 [Google Scholar]
- Bavier, E., Hoemmen, M., Rajamanickam, S., & Thornquist, H. 2012, Sci. Program., 20, 241 [Google Scholar]
- Berger-Vergiat, L., Glusa, C. A., Hu, J. J., et al. 2019a, MueLu Multigrid Framework, http://trilinos.org/packages/muelu [Google Scholar]
- Berger-Vergiat, L., Glusa, C. A., Hu, J. J., et al. 2019b, MueLu User’s Guide, Tech. Rep. SAND2019-0537 (Sandia National Laboratories) [Google Scholar]
- Berthon, C., & Turpault, R. 2011, Numer. Methods Partial Differ. Equations, 27, 1396 [CrossRef] [Google Scholar]
- Berthon, C., Charrier, P., & Dubroca, B. 2007, J. Sci. Comput., 31, 347 [CrossRef] [Google Scholar]
- Berthon, C., Crestetto, A., & Foucher, F. 2015, J. Sci. Comput., 67, 618 [CrossRef] [Google Scholar]
- Bisbas, T. G., Haworth, T. J., Williams, R. J. R., et al. 2015, MNRAS, 453, 1324 [NASA ADS] [CrossRef] [MathSciNet] [Google Scholar]
- Black, J. H. 1981, MNRAS, 197, 553 [NASA ADS] [Google Scholar]
- Buet, C., & Despres, B. 2008, A gas dynamics scheme for a two moments model of radiative transfer [Google Scholar]
- Cai, X.-C., & Sarkis, M. 1999, SIAM J. Sci. Comput., 21, 792 [CrossRef] [Google Scholar]
- Chalons, C., Girardin, M., & Kokh, S. 2016, Commun. Comput. Phys., 20, 188 [Google Scholar]
- Chandrasekhar, S. 1960, Radiative Transfer Dover Books on Intermediate and Advanced Mathematics (Dover Publications) [Google Scholar]
- Churchwell, E. 2002, ARA&A, 20, 27 [NASA ADS] [CrossRef] [Google Scholar]
- Dubroca, B., & Feugeas, J. L. 1999, C. R. Acad. Sci. - Ser. I - Math., 329, 915 [NASA ADS] [Google Scholar]
- Edwards, H. C., Trott, C. R., & Sunderland, D. 2014, J. Parallel Distrib. Comput., 74, 3202 [CrossRef] [PubMed] [Google Scholar]
- Ensman, L. 1994, ApJ, 424, 275 [NASA ADS] [CrossRef] [Google Scholar]
- Gnedin, N. Y., & Abel, T. 2001, New Astron., 6, 437 [NASA ADS] [CrossRef] [Google Scholar]
- González, M. 2006, Theses, Université Paris Sud - Paris XI [Google Scholar]
- González, M., Audit, E., & Huynh, P. 2007, A&A, 464, 429 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
- Hayes, J. C., & Norman, M. L. 2003, ApJS, 147, 197 [NASA ADS] [CrossRef] [Google Scholar]
- Heroux, M. A., Bartlett, R. A., Howle, V. E., et al. 2005, ACM Trans. Math. Softw., 31, 397 [CrossRef] [Google Scholar]
- Herpin, F., Marseille, M., Wakelam, V., Bontemps, S., & Lis, D. C. 2009, A&A, 504, 853 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
- Kestener, P. 2017, Implementing High-Resolution Fluid Dynamics Solver in a Performance Portable Way with Kokkos, Tech. rep. (GPU Technology Conf. GTC) [Google Scholar]
- Krumholz, M. R., Klein, R. I., McKee, C. F., & Bolstad, J. 2007, ApJ, 667, 626 [NASA ADS] [CrossRef] [Google Scholar]
- Kuiper, R., Klahr, H., Beuther, H., & Henning, T. 2010, ApJ, 722, 1556 [CrossRef] [Google Scholar]
- Lesaffre, P. 2002, Theses, Université Paris-Diderot - Paris VII, France [Google Scholar]
- Levermore, C. D. 1984, J. Quant. Spectr. Rad. Transf., 31, 149 [NASA ADS] [CrossRef] [Google Scholar]
- Levermore, C. D., & Pomraning, G. C. 1981, ApJ, 248, 321 [Google Scholar]
- Lowrie, R. B., Morel, J. E., & Hittinger, J. A. 1999, ApJ, 521, 432 [CrossRef] [Google Scholar]
- Mignon-Risse, R., González, M., Commerçon, B., & Rosdahl, J. 2020, A&A, 635, A42 [CrossRef] [EDP Sciences] [Google Scholar]
- Mihalas, D., & Mihalas, B. W. 1984, Foundations of Radiation Hydrodynamics (New York: Oxford University Press) [Google Scholar]
- Mizuta, A., Kane, J., Pound, M., et al. 2008, ApJ, 621, 803 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
- Padioleau, T., Tremblin, P., Audit, E., Kestener, P., & Kokh, S. 2019, ApJ, 875, 128 [CrossRef] [Google Scholar]
- Pichard, T., Alldredge, G., Brull, S., Dubroca, B., & Frank, M. 2016, J. Comput. Theor. Transp., 1 [Google Scholar]
- Prokopenko, A., Siefert, C. M., Hu, J. J., Hoemmen, M., & Klinvex, A. 2016, Ifpack2 User’s Guide 1.0, Tech. Rep. SAND2016-5338 (Sandia National Labs) [Google Scholar]
- Richling, S., Meinköhn, E., Kryzhevoi, N., & Kanschat, G. 2001, A&A, 380, 776 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
- Saad, Y. 1994, Numer. Linear Algebra Appl., 1, 387 [CrossRef] [Google Scholar]
- Saad, Y. 2003, Iterative Methods for Sparse Linear Systems, 2nd edn. (Philadelphia, PA, USA: Society for Industrial and Applied Mathematics) [Google Scholar]
- Spitzer, L. 1978, Physical Processes in the Interstellar Medium (New York: Wiley) [Google Scholar]
- Stiavelli, M. 2009, From First Light to Reionization: The End of the Dark Ages [CrossRef] [Google Scholar]
- Thomas, G. E., & Stamnes, K. 2002, Radiative Transfer in the Atmosphere and Ocean (UK: Cambridge University Press) [Google Scholar]
- Toro, E. 2009, Riemann Solvers and Numerical Methods for Fluid Dynamics, 163 [CrossRef] [Google Scholar]
- Tremblin, P. 2012, PhD Thesis, Univ. Paris Diderot - Paris 7, France [Google Scholar]
- Tremblin, P., Audit, E., Minier, V., Schmidt, W., & Schneider, N. 2012, A&A, 546, A33 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
- Turpault, R. 2005, J. Quant. Spectrosc. Radiat. Transfer, 94, 357 [CrossRef] [Google Scholar]
- Van der Vorst, H. 1992, SIAM J. Sci. Stat. Comput., 13, 631 [CrossRef] [MathSciNet] [Google Scholar]
Appendix A: Diffusive limit for radiation hydrodynamics
As in Sect. 2.4, we consider the diffusive limit with the P1 closure relation. We introduce a rescaling parameter ε to write the time and opacity as t̃ = εt and , respectively. Because the velocity of the fluid is smaller than the speed of light (
), we also rescale it as
. Let us focus on the equations describing the evolution of the radiative variables:
By expanding Eqs. (A.1a) and (A.1b) at zero order, we have
Expanding Eq. (A.1b) at first order leads to
Finally, looking at the radiative energy and the gas internal energy at second order, source terms cancel each other, and only the divergence of the radiative flux at first order remains, and we have
One can also look at Eq. (A.1a) at second order. Expanding Eq. (A.1a) at second order gives
We recover Eq. (43) of Krumholz et al. (2007). The term is the development at second order of the term κ0(4πB − cE). Because we do not neglect any terms
, some coefficients are slightly different. See the discussion in Krumholz et al. (2007) for the importance of the term
.
Appendix B: Von Neumann stability analysis for the well-balanced modification of the source term
For simplicity, we split Eq. 3b into a pure hyperbolic problem ∂tFr + c2∇ · Pr = 0 and a source problem ∂tFr = −cσFr. We focus on the one-dimensional source problem, with periodic boundary conditions on the domain [0, T] × [0, 1] with T the final time. The following can easily be extended to an arbitrary space interval. Because we use periodic boundary conditions, we can apply the von Neumann stability analysis (see e.g., Anderson 1995), based on the decomposition of the numerical solution into Fourier series. Let us recall that, using the well-balanced modification of the source term, the source problem is discretized as
with . We define the function Fn, piecewise constant, such that
This function is then extended to ℝ by periodicity. Fn can now be expanded in a Fourier series:
with
We can define the 2-norm of the function Fn:
We apply the Fourier transform to Eq. (B.1):
We define the amplification factor A(k) as
and we then have . By induction, we have
. The coefficient
remains bounded if and only if |A(k)| ≤ 1. In this case, for all k ∈ ℤ,
. Therefore, ||Fn + 1||2 ≤ ||Fn||2 ≤ ||F0||2 and the scheme is unconditionally stable. We now have to prove that |A(k)| ≤ 1:
As , we have |A(k)| ≤ 1.
Appendix C: Numerical scheme in the diffusive limit
We consider the numerical scheme developed in Sect. 3 in the asymptotic regime, with . Following Sect. 2.4, we introduce the rescaling parameter ε to write the time and opacity as
and
, respectively. Using the P1 closure relation, we have
and
Radiative variables are expanded, e.g., . Expanding Eqs. (C.1a) and (C.1b) at zero order leads to
At first order for Eq. (C.1b), we have
Using the boundary condition given by Eq. (24), we have
in the whole domain.
Now, we consider the sum of Eqs. (C.1a) and (C.1c) expanded at second order. If , we have
whereas the asymptotic development of a standard discretization of Eq. (11) would be
Therefore, we are looking for such that the term of first order in Eq. (C.5) becomes a term of second order with the expected coefficient of diffusion
and the term of second order becomes a term of third order and therefore negligible. In other words, we want the asymptotic development of
to be
. One way to achieve this is to take
However, in the general case, we do not have . We can then replace Eq. (C.7) by
Unfortunately, in numerical tests with σΔx close to 1, the condition f < 1 is not preserved. Because f is close to 1 in this case, we write
We use because numerical experiments have shown good results using this form. In the diffusion regime, because
, we recover Eq. (C.8).
Now that we have the form of , we can check that the proposed scheme is asymptotic preserving. We have
Therefore,
We finally have
Equations (C.2), (C.4), and (C.12) are standard discretization of Eqs. (9)–(11), and therefore this scheme is asymptotic preserving.
Appendix D: Expansion of H II region
In the test case described in Sect. 5.6, some numerical noise appears as a consequence of the long timescales. Let us recall that a time-implicit scheme is used, with large time-steps for the radiative transfer. At each time-step, the Newton-Raphson method is used and at each iteration of this algorithm an ill-conditioned linear system is solved using an iterative process. This results in the appearance of some numerical noise.
We performed the same simulations as in Sect. 5.6 with different numbers of MPI processes and different preconditioners. The physical domain is either distributed over 4 × 4 MPI processes (Fig. D.1a,c) or 2 × 2 MPI processes (Fig. D.1b,d). We also tried two preconditioners which allowed us to reach the final time with reasonable computational time: a standard ILU(k) factorization (Fig. D.1a,b) and an additive Schwarz domain decomposition (Fig. D.1c,d).
![]() |
Fig. D.1. Snapshots of the fraction of ionization and the velocity field at the final time tf = 1010 s without the initial velocity perturbation (left panel) and with it (right panel). The physical domain is distributed across different numbers of MPI processes and different preconditioners have been used. Figure D.1a is the same figure as Fig. 15. (a) 4 × 4 MPI processes, ILU(k) factorization. (b) 2 × 2 MPI processes, ILU(k) factorization. (c) 4 × 4 MPI processes, Schwarz domain decomposition. (d) 2 × 2 MPI processes, Schwarz domain decomposition. |
Figure D.1 shows snapshots of the fraction of ionization and the velocity field at the final time tf = 1010 s. The shape of the small structures produced by the numerical noise varies with the number of MPI processes and the preconditioner. Furthermore, the propagation of the ionization front creates some velocity that also depends on the number of MPI processes and preconditioners. However, the position of the ionization front is not affected by these parameters.
All Tables
All Figures
![]() |
Fig. 1. Weak scaling test. Each MPI process treats 15002 cells. We tested different preconditioners: Jacobi with damping (Relaxation), AMG, standard ILU(k) factorization (RILUK), a variant of the standard ILU factorization (ILUT), and additive Schwarz domain decomposition (Schwarz). Left panel: number of iterations to solve the linear system as a function of the number of cells. Right panel: speed-up as a function of the number of MPI processes. |
In the text |
![]() |
Fig. 2. Strong scaling test. The global resolution is 20482 cells. The same preconditioners as those listed in Fig. 1 were tested. Left panel: number of iterations to solve the linear system as a function of the number of MPI processes. Right panel: speed-up as a function of the number of MPI processes. |
In the text |
![]() |
Fig. 3. Computational time for the implicit solver with different preconditioners (as listed for Fig. 1) on different architectures (Sandy Bridge CPU and K80 NVIDIA GPU). The resolution is 10002 cells. |
In the text |
![]() |
Fig. 4. Memory consumption for the implicit solver with different preconditioners (as listed in Fig. 1) on different architectures (Sandy Bridge CPU and K80 NVIDIA GPU). The resolution is 10002 cells. |
In the text |
![]() |
Fig. 5. Marshak wave simulation. This figure shows a snapshot of the gas temperature at time tf = 2 × 10−4 s, with and without the asymptotic correction and the reference solution. Spatial resolution is n = 400 and the opacity is σ = 10 000 cm−1. |
In the text |
![]() |
Fig. 6. Simulation of a steady state with a jump of opacity. The opacity is piecewise constant, and a jump is located at x = 0.5 cm (gray line). This figure shows a snapshot of the radiative flux at time tf = 10−3 s. |
In the text |
![]() |
Fig. 7. Beam simulation, showing radiative energy. The eigenvalues are fixed to ±c (upper panel) or calculated with Eq. (14) (lower panel). |
In the text |
![]() |
Fig. 8. Beam simulation, showing a horizontal cut in Fig. 7 at the middle height. |
In the text |
![]() |
Fig. 9. Shadow simulation, showing snapshots of the radiative temperature at time tf = 10−10 s with different closure relations: P1 model (upper panel), M1 model with fixed eigenvalues (middle panel), and M1 model with computed eigenvalues (lower panel). |
In the text |
![]() |
Fig. 10. Shadow simulation, showing the radial profiles of the radiative temperature at time tf = 10−10 s with different closure relations: P1 model, M1 model with fixed eigenvalues, and M1 model with computed eigenvalues. |
In the text |
![]() |
Fig. 11. Subcritical shock simulation, showing snapshots of gas temperature, radiative temperature, and reduced flux at different times: 1.7 × 104 s, 2.8 × 104 s, and 3.8 × 104 s. |
In the text |
![]() |
Fig. 12. Supercritical shock simulation, showing snapshots of gas temperature, radiative temperature, and reduced flux at different times: 4.0 × 103 s, 7.5 × 103 s, and 1.3 × 104 s. |
In the text |
![]() |
Fig. 13. Evolution of the position of the ionization front as a function of time, with and without the initial velocity perturbation. |
In the text |
![]() |
Fig. 14. Snapshots of the fraction of ionization and the velocity field at time t = 6 × 108 s without the initial velocity perturbation (left panel) and with it (right panel). |
In the text |
![]() |
Fig. 15. Snapshots of the fraction of ionization and the velocity field at the final time tf = 1010 s without the initial velocity perturbation (left panel) and with it (right panel). |
In the text |
![]() |
Fig. D.1. Snapshots of the fraction of ionization and the velocity field at the final time tf = 1010 s without the initial velocity perturbation (left panel) and with it (right panel). The physical domain is distributed across different numbers of MPI processes and different preconditioners have been used. Figure D.1a is the same figure as Fig. 15. (a) 4 × 4 MPI processes, ILU(k) factorization. (b) 2 × 2 MPI processes, ILU(k) factorization. (c) 4 × 4 MPI processes, Schwarz domain decomposition. (d) 2 × 2 MPI processes, Schwarz domain decomposition. |
In the text |
Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.
Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.
Initial download of the metrics may take a while.