EDP Sciences
Free access
Volume 525, January 2011
Article Number A70
Number of page(s) 7
Section Numerical methods and codes
DOI http://dx.doi.org/10.1051/0004-6361/201014070
Published online 02 December 2010

© ESO, 2010

1. Introduction

Monte Carlo methods have been used successfully for radiative transfer problems for many years (see e.g., Juvela 2005; Lucy 1999b). When these algorithms are employed usually the elementary processes, i.e., the flights of a (large) number of photons, are simulated or packages of photons cf. Lucy (1999a,b), of a given energy are tracked. These methods, which we refer to as classical Monte Carlo methods are popular since (i) the corresponding code is quite easy to implement; (ii) complicated geometries and various physical effects can be included in a straightforward way; and (iii) the computing time to solve a given problem with a reasonable accuracy is reasonable as long as the optical depths are not too high.

On the other hand, it is well-known that Monte Carlo methods can be used quite generally for the solution of linear ordinary and partial differential equations (see e.g. Zwillinger 1989). To our knowledge, Markov Chains have been proposed in astrophysics only once for radiative transfer problems (Esposito & House 1978) but only in the context of specific intensities. However, in the fields of computer visualisation, Baysian statistics or global optimization similar approaches (called Markov Chain Monte Carlo or abbreviated MCMC) are frequently used. In this paper we follow this line of thought and apply it to the solution of the radiative transfer equation. Pseudo particles (or for short) particles are mostly not physical entities but symbolize stochasic variables, which are more intuitive.

For a general introduction to stochastic processes, we refer to e.g. Cox & Miller (1965), Papoulis (1991), or Williams (2001). An excellent introduction to these ideas can be found in Diaconis’ article (2009).

To establish a sound mathematical basis, we present in the next two sections a short introduction to transient Markov processes including the corresponding potential equation (Sect. 2) and discuss the probabilistic background of the Monte Carlo method (Sect. 3). The potential of a Markov process is a well known subject in the theory of Markov processes (see Meyer 1966). The classical potential given by the potential equation ΔΦ = −4πρ (cf. Eq. (6)) is the potential of the Brownian motion. In Sect. 4, we show that Beer’s law and the radiative transfer equation can be derived by means of simple probabilistic arguments and the corresponding difference equation may be solved very simply by a Markov Chain Monte Carlo method. We also demonstrate how various forms of the transfer equation (differential, difference, or stochastic equation) and the corresponding solutions transform into each other. In Sect. 5, the probabilistic interpretation of the discretized Feautrier equation is derived and discussed. To highlight the wide range of application of the Markov Chain Monte Carlo algorithm, the Milne equation is shown in Sect. 6 to be the potential equation of a Markov process. We note, that Milne’s equation is defined in three dimensional space, so it is much easier to treat than the usual radiative transfer equation in five dimensional space (for fixed time and frequency variable). A general scheme for the numerical application is given. In Sect. 7, the numerical properties of MCMC are exemplified by means of several 1D and 3D configurations. We end with a discussion of the advantages and disadvantages of MCMC compared to the classical Monte Carlo approach and grid methods.

2. The transient Markov process and the corresponding potential equation

In accordance with Wang & Uhlenbeck (1945), we consider a pseudo particle that can be in one of i = 1,...,N + 1 states at discrete times t = 0,1,2,... A random process performed by this pseudo particle is described completely by the set of probability distributions wn((i1,t1),(i2,t2),...,(in,tn)) where

  • w1(i,t) indicates the probability of finding the pseudo particle at place i at time t;

  • w2((i1,t1)),(i2,t2)) indicates the probability of finding the particle at time t2 in the place i2 and at time t1 at place i1;

  • w3((i1,t1),(i2,t2),(i3,t3)) gives the probability of finding the particle at time t3 in the place i3, at time t2 in the place i2 and at time t1 at place i1;

  • and so on.

The set of distributions must fulfill the following conditions:

  • wn((i1,t1),...,(in,tn)) ≥ 0;

  • wn((i1,t1),...,(in,tn)) is symmetric in its arguments;

  • as the knowledge of the distributions at times t1,...tn implies knowledge of the distributions at times t1,...tk for k < n.

The set of distributions form therefore a kind of hierarchy. They describe successively the random process in more detail.

A Markov process is given by an initial probability π and by the matrix of transition probabilities P1. The initial probability is a vector π = (π1,π2,...,πN + 1)T with πi ≥ 0 and  ∑ πi = 1. The matrix elements of P are the transition probabilities Pi,j that indicates the probability of a transition from state j to state i where Pi,j > 0 and  ∑ Pi,j = 1. A Markov process is now defined by (1)Therefore, the probability of moving in one step from state j to state i is Pi,j and the corresponding probability of this occurring in n steps is (Pn)i,j. This Markov process with discrete states (here 1,...,N + 1 ) and discrete times ( here 0,1,2,... ) is often called a Markov chain.

We now assume that the Markov process is transient, which means in the simplest case (used in this paper) that there exists a unique state Φ, called an absorbing state or “graveyard”, in which all pseudo particles which have left the system are gathered. Since the pseudo particles do not return, Φ indicates that the process remains in the graveyard after the particle has entered it. Transient means that all particles finally end in Φ with probability 1 regardless of the point in  { 1,...,N + 1 }  where they started, i.e. that each particle is finally either absorbed or leaves the medium.

We assume that the graveyard is the state N + 1. Then P is of the form (2)where Q is an N × N-matrix representing the probabilities for jumps outside the graveyard and q is the probability of a jump into the graveyard.

Since the process ends in state (N + 1) with probability 1, we have Qn → 0 for n → ∞. Hence, (1 − Q)(1 + Q + Q2...Qn) = 1 − Qn + 1 → 1 for n → ∞ and therefore 1 − Q is invertible, where 1 denotes the N-dimensional unit matrix.

We define the potential of the transient Markov process with the matrix (3)The element Πi,j indicates the mean number of times that a pseudo particle enters state i during its “life time”, when it has started at position j.

The potential equationof the Markov process with S as a source is given by (4)The solution of Eq. (4) is (5)We note that this potential equation is the analogue of the classical potential equation (6)in which the underlying Markov process is the Brownian motion. The matrix Q corresponds to the Laplace operator and S to −4πρ, and Π is the Poisson kernel as can be seen from the discretisation of the Laplace operator. In the Dirichlet problem, i.e., for zero boundary conditions, the particle goes to the graveyard, when it crosses the boundary.

3. Probabilistic background of the Monte Carlo method

We choose an enormous, but finite number K and consider the space

of all sequences of numbers 1,...,N + 1 of length K + 1. On the huge, but finite space, we define a probability P by

We avoid treating the case K = ∞, for that would require some complicated mathematical constructions. A subset A ⊂ ΩK has the probability

A random variable is a mapping from ΩK into the set of real numbers. If Y is a random variable, we define the expectation or mean or average value by

We consider the random variable

where Nj is the number of times the particle is in state j for the times t = 0,...,K, i.e. practically during its lifetime. It is well known that for a transient Markov process

When we return to the potential equation (Eq. (4)) and assume that Si ≥ 0 , we have to assume in particular that SN + 1 = 0. TWe then define the initial probability by πi = Si/(∑ lSl), so that

These considerations suggest – as a recipe for the use of the Monte Carlo method – that we perform the following experiment: We choose the starting point of the pseudo particle according to the initial probability π. we then let the particle jump until it enters the graveyard. We subsequently, count the number of times it has been in the state j. The experiment has to be repeated many times and the average over the numbers is to be computed. If the number of repetitions goes to infinity, then the averages approach ENj, which equals Ij except for a factor.

4. The radiative transfer equation for an absorbing medium

4.1. Derivation of Beer’s law by means of stochastics

The derivation follows Reif (1976) and is analogous to the determination of the nearest neighbor distribution (cf. Hertz 1909; Traving 1960).

We assume that an absorption process can be considered as a collision of a photon with a disk of cross-section σ, the absorption cross-section of an atom. Furthermore, we assume that there are N of these disks in a unit volume and that the disks are randomly distributed (equal probability distribution) in the medium without any correlations.

On the basis of these assumptions, the probability of finding the nearest atom with cross-section σ along a beam at a distance s is given by the probability P that there is no atom between 0 and s and the probability P +  that there is one atom between s and s + ds(7)The probability that an interaction takes place in the range s...s + ds is simply given by (8)and the probability that there is no interaction in the interval s...s + ds is consequently (9)The probability of having no interaction in the range is given – because of the assumed independence of the particle positions – by the product of the probabilities of the partial ranges of which there are s/ds(10)which for ds → 0 gives (11)Therefore, (12)where  = χ is the extinction coefficient.

Note that , i.e. that the probability density is normalized. The mean free path of photons is now given by (13)Since P(s) is given by the ratio of the specific intensity at s to that at 0, I(s)/I(0), Beer’s law (14)follows from Eq. (12) and hence also the radiative transfer equation (15)

4.2. Markov Chain Monte Carlo solution

As a most simple non-trivial example, we now assume that Eq. (15) is given and we solve it by means of the Markov Chain Monte Carlo method following Zwillinger (1989).

The finite total z-range 0...z0 is divided into N subintervals of length Δz = z0/N, which are numbered by the index i. Implicit (“up-stream”) discretisation of Eq. (15) results in


where 1/(1 + χΔz) is now interpreted as the probability of a jump from position i to position i + 1.

From a theoretical point of view, we now have a state space  { 1,...,N,N + 1 } , where state N + 1 is the graveyard or absorbing state. The transition probability matrix is very simple in this case since it is only possible to jump from i either to i + 1 or the graveyard. The corresponding probabilities are α = 1/(1 + χΔz) and 1 − α, so that e.g. in the case of N = 5 the transition probability matrix is given by (16)At the right boundary, we note that the pseudo particle can jump only to the graveyard.

We now work the other way round and show that the transfer equation follows from the stochastic process, i.e. work from the probabilistic interpretation (cf. Zauderer 1983). For small Δz, we write


so that

i.e. we regain the transport equation when we assume the probabilistic interpretation to be valid.

5. Direct calculation of the mean intensity for a scattering and absorbing medium by means of the Feautrier equation

We now show that by means of a stochastic process one can calculate directly the mean intensity J from the well-known Feautrier equation (see e.g. Mihalas 1978), (17)where τ is the optical depth defined by dτ = χds and ε the ratio of absorption to extinction. We note that this equation is a special case of the telegrapher’s equation, which is usually derived from the Maxwell equations but that can also be derived from a stochastic process as shown by Zauderer (1983).

For simplicity, we restrict ourselves now to the two-stream-approximation and neglect internal sources and spatial variations in ε. Since we can discretize the second derivative by (18)we find that (19)or (20)or after the replacement i + 1 → i to have a symmetric representation (21)We interpret this equation (see Zwillinger 1989) as representing a pseudo particle that after the nth jump at position i will move at jump n + 1 either with probability (2 + εΔτ2)-1 to position i + 1 or with the same probability to position i − 1, or go to the absorbing state. The last case has a probability of 1−(2 + εΔτ2)-1.

We again have a state space of N + 1 states. The transition probability matrix has now in general two non-zero entries in most rows and is therefore slightly more complicated than the matrix above. It is given e.g. in the case N = 5 by


In contrast, for small Δτ we can approximate (22)and (23)since (24)Therefore, Eq. (21) is now given by (25)or (26)such that in the limit Δτ → 0 we regain Eq. (17). In other words, based on the assumption of a Markov process with the jumping probabilities of Eq. (21) the transport equation in Eq. (17) follows in the limit of small Δτ.

We note also that it can easily be shown that the analytical solution of the difference equation in Eq. (21) in the limit Δz → 0 reduces to that of the differential equation of Eq. (17) as it should be.

6. The Milne equation as the potential equation of a Markov process in discrete time

In 1D radiative transfer, the Schwarzschild-Milne equations express the angular moments of the specific intensity by a linear integral operator acting on the source function. Of particular interest is the relation for the zero-order moment, i.e. the mean intensity J, if the source function is proportional to J (i.e. isotropic scattering and no absorption). The resulting linear integral equation where the optical depth is defined by dτ = χdz with χ being the extinction coefficient, is called Milne’s equation, cf. e.g. Mihalas (1978).

While the one-dimensional Milne equation has been extensively discussed in the theory of radiative transfer (cf. e.g. Mihalas 1978; Mihalas & Weibel Mihalas 1984) the Milne equation for three spatial dimensions is not well documented so that we start by deriving it here – following Führer (1993) – from the radiative transfer equation.

The radiative transfer equation for a static 3D medium with isotropic coherent scattering can be written as the directional derivative (27)where x is the position vector with components (x1,x2,x3). We assume that x3 = z and I(x + sn,n) is the specific intensity (at a fixed wavelength) at spatial position x + sn in the direction n. The mean intensity is denoted by J(x + sn), and the (isotropic) Planck function by B(x + sn). The ratio ε of absorption to extinction at the wavelength considered is assumed to be constant with respect to x.

The geometry of the medium is mathematically represented by a finite convex domain G. It follows that there exists a unique real number s0 for any x ∈ G such that (x − s0n) is on the boundary. For the boundary conditions, we have to assume that for all points on the boundary of G the intensities of the rays entering G are known since otherwise no sensible solution could be found.

If, however, G is not convex, some regions of the domain are not directly connected to each point of G. In particular, the mutual illumination by the domain’s surface elements is not taken into account in this case. One way to deal with this problem is to imbed G in a larger domain in which then, however, the value of χ has to be set to 0 in the region outside G.

Integration of the differential equation in Eq. (27) yields (28)To keep the expressions simple, we have assumed here that the extinction coefficient χ is independent of x. Otherwise we have to substitute here and in the following expressions χ·s with the integral χds.

We now set s = 0 and s0 = s0(x,n) and obtain (29)The mean intensity is now obtained by integrating the above equation over the unit sphere S2(30)The source term equals (31)The first term includes the boundary conditions, and the second contains the sources inside G. Then (32)where (33)The third term arises from the second one by introducing −n instead of n and −s′ instead of s′ as new variables. We introduce the variable x′ = x + sn, where s = |x − x′| and the volume element is (34)We obtain


Finally (35)which is the3D Milne equation.

We now define a Markov process and show that J(x) is the solution of the corresponding potential equation.

The state space is continuous and given by G ∪ Φ, the union of G and the graveyard Φ. The initial probability has the density (36)The graveyard is empty at the beginning. In the continuous case, the probability matrix P (Eq. (2)) becomes the transition probability P(x,x0) of moving from x0 to x, which is given in the following way:

  • (i)

    if x0 = Φ, then x = Φ;

  • (ii)

    if x0 ∈ G, then

    • (iia)

      choose a random number If , the particle is scattered; if it is absorbed and goes to the graveyard;

    • (iib)

      choose a random direction uniformly distributed across the unit sphere (n is the direction in which the particle is scattered);

    • (iic)

      choose a random number that is exponentially distributed. The probability that is in the interval (s...s + ds) is χe − χ·sds. If , continue, if , the particle goes to the graveyard.

We calculate the transition probability function of moving for coming from x0 ∈ G to x ∈ G, by using a test function f on G. The expectation value of f is (37)where is the characteristic function of the set. Comparing with Eqs. (33) and (35), we see that the Milne’s equation is the potential equation for this Markov process. We obtain the transition probability from x0 to the graveyard Φ (38)To perform the algorithm, we divide the set G into N cells of equal volume and numerate the cells by 1,...,N. We calculate the source term S(x) given by Eq. (32) and calculate the initial probability π with the help of Eq. (36). Then following the prescription above, choose random variables , independent from each other until the particle ends up in the graveyard. We note the number of times the particle visits the cell j and calulate J, as it has been explained in Sect. 3.

We note that if we generalize to G = R3 and assume that there is no absorption, we have a random walk that ends in the graveyard at infinity. The probability density of moving from x0 to x is (39)

7. Selected astronomical applications of MCMC to the Milne equation

We discuss a few applications that demonstrate the versatility and ease of use of MCMC. We restrict ourselves to configurations that have the same values for ε, χ, and B everywhere but emphasize that this is not an inherent limitation of this method. We note that the calculation of the mean intensity for ε = 0 is equivalent to the determination of the temperature distribution for a grey configuration. This implies that it is possible to determine this temperature distributions using MCMC without difficulties.

We begin with some easy 1D problems, the radiation fields of slabs, that can also efficiently be solved by standard numerical algorithms. The first test case is the mean intensity of a slab calculated for the two-stream approximation. Here the kernel of the Milne equation is the exponential function exp(−χs), so that the corresponding random numbers can easily be generated by −1/χ·log (χ × Random) where Random refers to random numbers that are equally distributed between 0 and 1. We devised a corresponding Mathematica version 2.1 code for this purpose that is only 15 lines long. If the slab has an extinction coefficient of unity and a total optical depth of 50 and is illuminated from one side but does not have internal sources, the CPU times for 105 pseudo particles range between 12 s (ε close to 1) and 91 s (ε close to 0). The distributions for the mean intensities (Fig. 1) agree with those from the analytic solution to within the width of the line. If the illuminating source is switched off and a homogeneous distribution of internal sources is employed instead, the CPU times are similar. Figure 2 shows an example in which 106 pseudo particles are used.

We have performed calculations in which the full angle space is taken into account. These take much longer since the transformation law for the random numbers has to be evaluated numerically. On the other hand, if the Mathematica code is replaced by a corresponding FORTRAN code the execution times are reduced by about a factor of one hundred.

To convince ourselves of the versatility of the MCMC approach, we calculate some 3D radiation fieldsof mean intensity (i) in a homogeneous rotational ellipsoid having internal sources; (ii) of a homogeneous accretion disk with constant flaring angle heated by a constant viscosity; and (iii) of a simple accretion funnel heated by the accretion shock at the bottom (stellar surface).

thumbnail Fig. 1

Run of the number of visits of a pseudo particle in a cell (which is proportional to the mean intensity) with cell number (which is proportional to the optical depth starting at the left) in a slab that is illuminated from the left but does not have internal sources. The curves represent ratios of absorption to extinction ε = 0,10-3,10-2,10-1 (from top to bottom). In all cases, N = 105 pseudo particles are used.

Open with DEXTER

thumbnail Fig. 2

Run of the number of visits of a pseudo particle in a cell (which is proportional to the mean intensity) with cell number (which is proportional to the optical depth starting at the left) in a slab with uniformly distributed sources but without illumination. The ratio of absorption to extinction is ε = 10-1 and N = 106 pseudo particles are used.

Open with DEXTER

The corresponding FORTRAN code comprises just about hundred lines and is therefore one to several orders of magnitude shorter than grid codes for these problems. Owing to the larger number of cells, however, many more pseudo particles have to be used (in our case 108 to 109) to achieve high quality statistics but the CPU times still do not exceed about 10 min if the Fortran code is used on a 2 GHz laptop. Where the configuration is not fully convex as for the accretion disk or for the funnel, andwhen the illumination of parts of the medium by other parts cannot be neglected, one has to wrap the configuration with a box and assume that the intermediate domain is very optically thin. If the mutual illumination is small one may proceed as in the convex case.

8. Discussion

We propose that the most important advantages of MCMC as presented here are its sound mathematical basis and the clarification of the similarities between the treatments of radiative transfer problems using a Monte Carlo method, a difference equation method, or differential/integral equation methods. From the point of view of actual numerical calculations, when compared to grid methods (for a review, see Wehrse & Kalkofen 2006), the MCMC method presented here shares all the advantages of classical Monte Carlo methods: (i) it is easy to implement even for complex geometries; (ii) it allows the incorporation of many physical effects; (iii) it is quite fast for medium accuracy requirements; and (iv) the corresponding codes run well on classical parallel machines as well as on massively parallel devices such as modern graphics cards. As already observed by Lucy (1999a,b), Monte Carlo methods are very flexible. They allow the direct calculation of derived radiative quantities such as e.g., the mean intensity without resorting to the explicit calculation of specific intensities. They allow the computation of residual intensities occurring in the core saturation method, cf. Bastian et al. (1980). Furthermore, MCMC can be used to directly calculate a grey temperature distribution and we should be ableto calculate directly mean intensities for differentially moving media. However, some disadvantages are obvious: (i) there is no precise and well-founded error estimator as it exists for finite-element approaches, cf. Meinköhn et al. (2007); (ii) additional quantities, such as e.g., the radiative flux, cannot be obtained by differentiation because of the error-amplifying nature of this process; (iii) the number of pseudo-particles required is sometimes very high (and therefore the CPU times are sometimes rather long) since the use of convergence accelerating methods is limited; and (iv) it can hardly be used when simultaneously non-linear equations (as e.g. hydrodynamic equations) have in addition to be solved. Here we have focused mainly on the geometrical structure of the radiation. The distribution of frequency and the dependence of the temperature on the radiation that are included e.g., in Lucy’s papers have not been considered.

However, it sshould be possible to generalize the algorithm described here (i) to calculate the radiation pressure and the radiative acceleration, since the calculation of these quantities can be done by integration, which can also be performed – as is well documented, see e.g. Liu (2001) – by MCMC methods; (ii) to evaluate integrals such as that are usually needed in the construction of model atmospheres; and (iii) to model differentially moving configurations. In this case, Eq. (17) becomes a full telegrapher’s equation but one that can also be solved in a stochastic way, cf. Zauderer (1983).


Following the use in mathematics, scalars, vectors, and matrices are generally not distinguished typographically in this paper.


All Figures

thumbnail Fig. 1

Run of the number of visits of a pseudo particle in a cell (which is proportional to the mean intensity) with cell number (which is proportional to the optical depth starting at the left) in a slab that is illuminated from the left but does not have internal sources. The curves represent ratios of absorption to extinction ε = 0,10-3,10-2,10-1 (from top to bottom). In all cases, N = 105 pseudo particles are used.

Open with DEXTER
In the text
thumbnail Fig. 2

Run of the number of visits of a pseudo particle in a cell (which is proportional to the mean intensity) with cell number (which is proportional to the optical depth starting at the left) in a slab with uniformly distributed sources but without illumination. The ratio of absorption to extinction is ε = 10-1 and N = 106 pseudo particles are used.

Open with DEXTER
In the text