Free Access
Issue
A&A
Volume 527, March 2011
Article Number A56
Number of page(s) 12
Section Astronomical instrumentation
DOI https://doi.org/10.1051/0004-6361/201015451
Published online 25 January 2011

© ESO, 2011

1. Introduction

Seismology of solar-like stars is a powerful tool that can be used to increase our understanding of stellar structure and evolution. Solar-like oscillations in main-sequence stars and subgiants have been measured thanks to data collected from ground-based high-precision spectroscopy (for a review e.g., Bedding & Kjeldsen 2008) and, more recently, to photometric space-based missions such as CoRoT (e.g., Michel et al. 2008). Red giants also exhibit solar-like oscillations, although at lower frequencies, and hence require longer time series in order to resolve them (e.g., De Ridder et al. 2009, and references therein). The launch of the NASA Kepler mission (Koch et al. 2010) definitely marked a milestone in the field of asteroseismology. Kepler will particularly lead to a revolution in the seismology of solar-like oscillators, since it will increase by more than two orders of magnitude the number of stars for which high-quality observations will be available, while allowing for long-term follow-ups of a selection of those targets. The large homogeneous sample of data made available by Kepler opens the possibility of conducting a seismic survey of the solar-like part of the colour-magnitude diagram, which researchers in the field already started naming as ensemble asteroseismology. As of the time of writing of this article, first results arising from the Kepler asteroseismic programme had already been made available (Bedding et al. 2010a; Chaplin et al. 2010; Gilliland et al. 2010; Hekker et al. 2010b; Stello et al. 2010; Christensen-Dalsgaard et al. 2010; Metcalfe et al. 2010).

The rich informational content of power spectra of solar-like oscillations allows fundamental stellar properties (e.g. mass, radius, and age) to be determined, and the internal structure to be constrained to unprecedented levels provided that individual oscillation mode parameters are measured (e.g., Christensen-Dalsgaard 2004). Furthermore, the measured stellar background signal provides us with valuable information on activity and convection. In the case of the highest signal-to-noise ratio (S/N) observations, for which it is possible to measure individual oscillation mode parameters, we expect asteroseismology to produce a major breakthrough on stellar structure and evolution, on topics as diverse as energy generation and transport, rotation and stellar cycles (e.g., Karoff et al. 2009).

For the past few years significant work has been invested in making preparations for the mode parameter analysis of Kepler data. This analysis involves the estimation of individual and average oscillation mode parameters, as well as estimation of parameters that describe non-resonant signatures of convection and activity. Examples include the work conducted in the framework of the AsteroFLAG consortium (Chaplin et al. 2008a) and the work undertaken by the CoRoT data analysis team (Appourchaux et al. 2006). This consequently paved the way for the development of suites of analysis tools for application to Kepler data (Hekker et al. 2010a; Huber et al. 2009; Karoff et al. 2010; Mathur et al. 2010; Mosser & Appourchaux 2009; Campante et al. 2010).

In the present study we give continuity to this work by presenting a comprehensive guide to the implementation of a Bayesian peak-bagging1 tool that employs a MCMC. These techniques derive from the tools traditionally used in helioseismology and are in many ways an extension of the maximum likelihood estimation (MLE) methods. This peak-bagging tool is to be applied to the power spectra of solar-like oscillators and used as a means to infer both individual oscillation mode parameters and parameters describing non-resonant features. Besides making it possible to incorporate relevant prior information through Bayes’ theorem, this tool also allows one to obtain the marginal probability density function (PDF) for each of the model parameters (frequencies, mode heights, mode lifetimes, rotational splitting, inclination angle etc.). This is one of the main advantages of these MCMC techniques, as it not only performs well in low signal-to-noise conditions, but also provides reliable error bars on the parameters. Parameter space is sampled using a Metropolis-Hastings algorithm featuring a built-in statistical control system that allows to automatically set an appropriate instrumental law during the burn-in stage. Also included is parallel tempering, which increases the mixing properties of the Markov chain.

The outline of the paper is as follows: we start in Sect. 2 by providing an overview of the theory behind the power spectrum of solar-like oscillations, introducing the assumptions and the set of parameters needed to model the spectrum to the level of detail required by modern asteroseismic data. In Sect. 3 we describe the subjacent Bayesian statistical framework by highlighting the topics of parameter estimation and model selection. Section 4 is devoted to the modus operandi of advanced Markov chain Monte Carlo methods and their implementation. In Sect. 5 we present a couple of examples where this tool has been applied to recent asteroseismic data sets, evidencing some of its capabilities and illustrating its functioning. A summary and discussion are presented in Sect. 6.

2. The power spectrum of solar-like oscillations

Solar-like oscillations or p modes (pressure playing the role of the restoring force) are global standing acoustic waves. They are characterized by being intrinsically damped while simultaneously stochastically excited by near-surface convection. Therefore, all stars cool enough to harbor an outer convective envelope – whose locus in a H–R diagram approximately extends from the cool edge of the Cepheid instability strip up to the red giant branch – may be expected to exhibit solar-like oscillations.

Modes of oscillation are characterized by three wave numbers: n, and m. The radial order n characterizes the behaviour of the mode in the radial direction. The degree and the azimuthal order m determine the spherical harmonic describing the properties of the mode as a function of colatitude and longitude. In the case of stellar observations, the associated whole-disk light integration and consequent lack of spatial resolution strongly suppress the signal from all but the modes of the lowest degree (with  ≤ 3). For a spherically symmetric star mode frequencies depend only on n and .

2.1. Statistics and likelihood function of the spectrum

Stellar p modes can be modelled as stochastically excited and intrinsically damped harmonic oscillators (Kumar et al. 1988). The frequency-power spectrum arising from such a system can in turn be modelled by a mean spectrum profile, P(νj;Θ), described by the set of parameters Θ which contain the desired physical information, multiplied by a random noise with a χ2 probability distribution with 2 degrees of freedom (Woodard 1984; Duvall & Harvey 1986). This means that, at a fixed frequency bin j, the probability density, f(Pj), that the observed power spectrum takes a particular value Pj, is related to the mean spectrum, P(νj;Θ), by: (1)Very often when dealing with long time series, it is customary to divide the observational data set into several independent subsets, to compute their separate spectra and to average them. In doing so one aims at decreasing the variance in the power spectrum. The average power spectrum will then obey a χ2 probability distribution with 2s degrees of freedom, , s being the number of combined spectra (Appourchaux 2003a): (2)Equation (2) also holds when binning the power spectrum over s bins (Appourchaux 2004).

We would now like to specify the likelihood function, i.e., the joint PDF for the data sample  { Pj } . Assuming that the frequency bins are uncorrelated, the joint PDF is simply given by the product of f(Pj;Θ) over some frequency interval of interest spanned by j: (3)Notice that we have written f(Pj;Θ) to make the dependence on the parameters Θ explicit. In spite of the fact that Eq. (3) is valid for an uninterrupted data set, the same is not true when gaps are present in the time series. In that event, Stahn & Gizon (2008) have derived an expression for the joint PDF of solar-like oscillations in complex Fourier space, in agreement with the earlier work of Gabriel (1994). The latter PDF explicitly takes into account frequency correlations introduced by the convolution with the spectral window.

The basic idea when employing a Maximum Likelihood Estimator (MLE) is to determine estimates so as to maximize the likelihood function (e.g., Toutain & Appourchaux 1994). Due to improved numerical stability, however, it is more convenient, in practice, to work with logarithmic probabilities: (4)One therefore ends up maximizing the logarithm of the likelihood function instead: (5)

2.2. Modelling the power spectrum2

The power spectrum of a single mode of oscillation is distributed around a mean profile with an exponential probability distribution according to Eq. (1). As already mentioned, this mean profile contains the information on the physics of the mode. In the limit of taking the ensemble average of an infinite number of realisations of the power spectrum, it can be shown (Anderson et al. 1990) that the limit spectrum thus obtained follows in fact a standard Lorentzian profile near the resonance, i.e., for  |ν − ν0|  ≪ ν0. A Lorentzian profile is defined as: (6)where S is the mode height and Γ is the mode linewidth. Γ is related to the mode lifetime, τ, through Γ = (πτ)-1. In the case of solar-type stars and for low angular degree , we can assume that Γ is a function of frequency alone, which is supported both by observations of the Sun and by theoretical models (e.g. Aerts et al. 2010; Dupret et al. 2009).

A power spectrum of solar-like oscillations will, of course, contain a myriad of modes spanning a broad range in frequency, superimposed on a background signal of both stellar and instrumental origin. The overall limit spectrum is then given by the sum of the separate limit spectra arising from the different sources, since interference effects from beating between the modes average out in the limit. Notice that we are assuming that a mode is uncorrelated with any other modes or with the background signal. In doing so, we neglect any eventual asymmetries of the Lorentzian profiles (Duvall et al. 1993; Abrams & Kumar 1996). Nevertheless, when dealing with long time series, such asymmetries should be included in order to avoid biases in mode frequency determination. Furthermore, the presence of gaps and the finite length of the time series lead to a degradation of the observed power spectrum, which then results from the convolution of the true spectrum (i.e., the one that would be obtained were there no gaps) with the power spectrum of the window function (i.e., the spectral window). However, this problem is overcome by convolving the final limit spectrum with the spectral window.

Ignoring any departure from spherical symmetry, non-radial modes differing only on the azimuthal number m are degenerate and their profiles will be combined into a single profile, that of the (n,ℓ) multiplet. Stellar rotation removes the (2 + 1)-fold degeneracy of the frequency of oscillation of non-radial modes, thus allowing for a direct measurement of the angular velocity of the star averaged over the region probed by these modes. When the angular velocity of the star, Ω, is small and in the case of rigid-body rotation, the frequency of a (n,ℓ,m) mode is given to first order by (Ledoux 1951): (7)The kinematic splitting, mΩ/(2π), is corrected for the effect of the Coriolis force through the dimensionless quantity Cnℓ > 0. In the asymptotic regime, i.e., for high-order, low-degree p modes, rotational splitting is dominated by advection and the splitting between adjacent modes within a multiplet is νs ≃ Ω/(2π). Second-order rotational effects are related to the distortion of the equilibrium structure of the star caused by centrifugal forces. Although negligible in the Sun, these effects are significant for faster solar-type rotators where these effects can cause non-negligible biases on frequency determinations (e.g., Ballot 2010). Large-scale magnetic fields may also introduce further corrections to the oscillation frequencies.

Assuming energy equipartition between the components of a multiplet, we define the following symmetric profile for a (n,ℓ) multiplet: (8)where Eℓm(i) represents mode visibility within a multiplet and i is the inclination angle between the direction of the stellar rotation axis and the line of sight. The overall profile of a multiplet thus consists of the sum of 2 + 1 Lorentzian profiles regularly spaced in frequency and scaled in height according to the Eℓm(i) factors (see Fig. 1), which in turn are given by (Gizon & Solanki 2003): (9)where are the associated Legendre functions. Notice that  ∑ mEℓm(i) = 1, meaning that Eℓm(i) represents the relative power contained in a mode within a multiplet.

thumbnail Fig. 1

Artificial power density spectrum of a  = 0 singlet and a  = 1 multiplet. One has νs = Γ = 3   μHz. Notice how the  = 1 multiplet splits into its m components. The power spectrum (grey) is distributed around a mean spectrum (black) with an exponential probability distribution.

Open with DEXTER

Since we are primarily interested in performing a so-called global fit (e.g., Appourchaux et al. 2008) to the observed power spectrum, whereby several radial orders are fitted simultaneously within a broad frequency range, we end up modelling the mean acoustic spectrum according to the following general relation: (10)where we have also included a profile describing the background signal, N(ν). Granulation, faculae and active regions might contribute to the stellar background signal, which is commonly modelled as a sum of power laws describing these physical phenomena (Harvey 1985; Aigrain et al. 2004): (11) { Ak }  and  { Bk }  being, respectively, the corresponding amplitudes and characteristic time-scales, whereas the  { Ck }  are the slopes of each of the individual power laws. A flat component, N, is needed in order to model the photon shot noise. Equation (11) might just well incorporate any instrumental background signal. We refer to Snℓ/N(νnℓ) as the signal-to-noise ratio (in power) of the multiplet (n,ℓ).

Once again assuming energy equipartition between the different components of a multiplet, their heights can be expressed as: (12)The quantity is an estimate of the geometrical visibility of the total power in a multiplet (n,ℓ) as a function of , whereas αnℓ depends mainly on the frequency and excitation mechanism, i.e., αnℓ ≃ α(νnℓ). Christensen-Dalsgaard (2003) concisely treats this issue of spatial filtering. Equation (12), however, is only strictly valid under one assumption: when the stellar flux is integrated over the full apparent disc, one must assume that the weighting function, W, which gives the contribution of a surface element to the integral, is a function of the distance to the disc centre alone, i.e., W = W(θ′), where θ′ is defined in an inertial frame with polar axis pointing toward the observer. In this case, the apparent mode amplitude can effectively be separated into two factors: Eℓm(i) and . This assumption holds very well in the case of intensity measurements, since the weighting function is then mainly linked to the limb-darkening, whereas for velocity measurements departures might be observed due to asymmetries in the velocity field induced by rotation (see Ballot et al. 2006, 2008, and references therein). See Appendix A for how to compute Eℓm(i) and V.

The heights of non-radial modes are commonly defined based on the heights of radial modes according to Eq. (12), and taking into account the V/V0 ratios. Note that  = 0 modes constitute a sensible reference since they are not split by rotation. Table 1 displays the relative spatial response functions, V/V0, computed according to Bedding et al. (1996), for a number of present and upcoming instruments/missions used when measuring solar-like oscillations. Those performing intensity measurements are the red channel of the VIRGO SPM instrument on board the SOHO spacecraft (Fröhlich et al. 1995), as well as the CoRoT and Kepler space missions. On the other hand, velocity measurements are performed by the HARPS spectrograph (Mayor et al. 2003) and are the purpose of the forthcoming SONG network (Grundahl et al. 2007).

Table 1

Relative spatial response functions, V/V0, for a number of present and upcoming instruments/missions.

Finally, a possible set of parameters going into the model is given by: (13)We have described in detail how the modelling of a power spectrum of solar-like oscillations can be achieved. When actually fitting a model to an observed power spectrum, the set of parameters entering the model might differ from the one represented in Eq. (13). Moreover, it might be desirable to justifiably fix some of the parameters in order to reduce the dimension of parameter space.

3. Bayesian inference

Having set up the model of the power spectrum, we will now introduce the Bayesian statistical framework to be used for estimating the model parameters and for comparing competing models. Let us start by considering a set of competing hypotheses,  {Hi} , not necessarily mutually exclusive. We should then be able to assign a probability, p(Hi | D,I), to each hypothesis, taking into account the observed data, D, and available prior information, I, arising from theoretical considerations and/or previous observations. This is done through Bayes’ theorem: (14)The probability of the hypothesis Hi in the absence of D is called the prior probability, p(Hi | I), whereas the probability including D is called the posterior probability, p(Hi | D,I). The quantity p(D | Hi,I) is called the likelihood of Hi, p(D | I) being the global likelihood for the entire class of hypotheses. Bayesian inference thus encodes our current state of knowledge into a posterior probability concerning each member of the hypothesis space of interest. Moreover, the sum of the posterior probabilities over the hypothesis space of interest is unity, and thus (15)

3.1. Parameter estimation

Very often a particular hypothesis, i.e., a model of the power spectrum, is assumed to be true and the hypothesis space of interest then relates to the values taken by the model parameters Θ. These parameters are continuous, which means that the quantity of interest is a PDF. The global likelihood of model M, assumed true, is now given by the continuous counterpart of Eq. (15): (16)Let us restate Bayes’ theorem in order to account for this new formalism: (17)where we have substituted the hypothesis, Hi, with the parameters of the model that is assumed true. The terms entering this equation have the same meaning as the corresponding terms entering Eq. (14). Use of Eq. (17) allows one to obtain the full joint posterior PDF, p(Θ | D,I), this being the Bayesian solution to the problem of parameter estimation in contrast to traditional point estimation methods (e.g. MLE). The procedure of marginalisation makes it possible to derive the marginal posterior PDF for a subset of parameters ΘA, by integrating out the remaining parameters ΘB, called nuisance parameters: (18)Furthermore, assuming that the prior on ΘA is independent of the prior on the remaining parameters, then by applying the product rule we have: (19)We will be working, in practice, with logarithmic probabilities. The global likelihood of the model plays the role of a normalisation constant and we rewrite Eq. (17) as follows: (20)

3.2. Model comparison

We might also be facing a situation wherein several parametrized models are available to describe the same physical phenomenon. We then expect Bayes’ theorem to allow for a statistical comparison between these competing models. In fact, Bayesian model comparison has a built-in Occam’s razor, a principle also known as lex parsimoniae, by which a complex model is automatically penalised, unless the available data justifies its additional complexity. Notice that these might be intrinsically different models or similar models with varying number of parameters, or even the same model with different priors for its parameters.

Given two or more competing models,  {Mi} , and our prior information, I, being in the current context that one and only one of the models is true, we can assign individual probabilities similarly to what has been done in Eq. (14), after substituting Hi with Mi: (21)where p(D | Mi,I), also called the evidence of model Mi, is given by Eq. (16). The problem of model comparison is therefore analogous to the problem of parameter estimation as can be seen by comparing Eqs. (17) and (21).

Of particular interest to us will be calculating the ratio of the probabilities of two competing models, (22)where Oij is the odds ratio in favour of model Mi over model Mj, Bij is the so-called Bayes’ factor and the remaining factor is the prior odds ratio. We will always assume that we have no prior information impelling us to prefer one model over the other, and hence p(Mi | I)/p(Mj | I) = 1. One is now naturally in need of a scale by which to judge the ratio of the evidences of two competing models. The usual scale employed is the Jeffreys’ scale (Jeffreys 1961), which we display in Table 2 for convenience.

Table 2

Jeffreys’ scale.

Furthermore, the Bayesian framework makes it possible to extract parameter constraints even in the presence of model uncertainty, i.e., when the implementation of model selection has not been successful. This is done by simply combining the probability distribution of the parameters within each individual model, weighted by the model probability. This procedure, called Bayesian model averaging (see Liddle 2009, and references therein), is an analogue of the superposition of eigenstates of an observable in quantum mechanics.

3.3. Ignorance priors

The main advantage of the Bayesian framework when compared to a frequentist approach is the ability to incorporate relevant prior information through Bayes’ theorem and evaluate its effect on our conclusions. Assuming that the prior on each parameter is independent of the prior on any other parameter, then according to Eq. (19) we have: (23)where fkk) is the prior PDF associated with the kth parameter entering the model. As our state of knowledge of a particular physical phenomenon evolves through continued study and experimentation, the set of priors relevant for the analysis of a new data set will change. In the early stages of research, however, we look for a set of priors that encode our rather limited state of knowledge, i.e., a set of ignorance priors (e.g., Gregory 2005a, and references therein).

When dealing with location parameters, e.g.  {νnℓ}  in Eq. (13), our choice of prior would at first be the uniform prior: (24)If we are ignorant about the limits and , then we refer to fkk) as an improper prior, meaning that it is not normalised. An improper prior is not suitable for model comparison problems. On the other hand, when dealing with scale parameters, e.g.  { Snℓ }  in Eq. (13), our choice of prior might be that of a Jeffreys’ prior: (25)By employing a Jeffreys’ prior we are assigning equal probability per decade (scale invariance), mainly useful when the prior range spans several orders of magnitude. In case the prior lower limit includes zero, a modified Jeffreys’ prior should be used instead to avoid the divergence at zero: (26)For , Eq. (26) behaves just like a Jeffreys’ prior, whereas for it behaves like a uniform prior, thus not diverging at zero. marks the transition between the two regimes.

4. Markov chain Monte Carlo

After inspection of Eq. (18), the need for a mathematical tool that is able to efficiently evaluate the multi-dimensional integrals required in the computation of the marginal posteriors becomes clear. This constitutes the rationale behind the method known as Markov chain Monte Carlo, first introduced in the early 1950s by statistical physicists and nowadays widely used in all areas of science and economics.

4.1. Metropolis-Hastings algorithm

The aim is to draw samples from the target distribution, p(Θ | D,I), by constructing a pseudo-random walk in model parameter space such that the number of samples drawn from a particular region is proportional to its posterior density. Such a pseudo-random walk is achieved by generating a Markov chain, whereby a new sample, Xt + 1, depends on the previous sample3, Xt, in accordance with a time-independent quantity called the transition kernel, p(Xt + 1 | Xt). After a burn-in phase, p(Xt + 1 | Xt) is able to generate samples of Θ with a probability density converging on the target distribution. The Markov chain must fulfil three requirements in order to achieve this convergence: it must be irreducible, aperiodic and positive recurrent (Roberts 1996).

The algorithm that we employ in order to generate a Markov chain was initially proposed by Metropolis et al. (1953), and subsequently generalised by Hastings (1970), this latter version being commonly referred to as the Metropolis-Hastings algorithm. It works in the following way: Suppose the current sample, at some instant denoted by t, is represented by Xt. We would like to steer the Markov chain toward the next sampling state, Xt + 1, by first proposing a new sample to be drawn, Y, from a proposal distribution, q(Y | Xt), centred on Xt. Here we specifically treat q(Y | Xt) as being a multivariate normal distribution with covariance matrix Σ. We employ independent Gaussian parameter proposal distributions and thus Σ is assumed diagonal. The proposed sample is then accepted with a probability given by: (27)where α(Xt,Y) is the acceptance probability and r is called the Metropolis ratio. In the present case of a symmetric proposal distribution, we have q(Xt | Y) = q(Y | Xt). As a result, if the posterior density for the proposed sample is greater than or equal to that of the current sample, i.e., p(Y | D,I) ≥ p(Xt | D,I), then the proposal will always be accepted, otherwise it will be accepted with a probability given by the ratio of the posterior densities. If Y is not accepted, then the chain will keep the current sampling state, i.e., Xt + 1 = Xt. The procedure just described is repeated for a predefined number of iterations or, alternatively, for a number of iterations determined by a convergence test applied to the Markov chain (e.g., Gelman & Rubin 1992). The total number of iterations is denoted by nit.

Once a Markov chain has been created, the problem of marginalization becomes trivial, as the way to extract information on the individual parameters is simply to generate a histogram for each parameter and thus obtain its PDF. An appropriate number of bins in the histograms can be selected using for example Scott’s criterion (Scott 1979). Usually the information in the PDF will be condensed using some summary statistics, like for example finding the median of the distribution and the 68% credible region around it.

4.2. Parallel tempering

The Metropolis-Hastings algorithm outlined above might become stuck in a local maximum of the target distribution, thus failing to fully explore all regions in parameter space containing significant probability. A way of overcoming this is to employ parallel tempering (e.g., Earl & Deem 2005), whereby a discrete set of progressively flatter versions of the target distribution are created by introducing a temperature parameter, T. In practice, use is made of its reciprocal, β = 1/T, referred to as the tempering parameter. By modifying Eq. (17), we generate the tempered distributions as follows: (28)For β = 1, we recover the target distribution, also called the cold sampler, whereas for β < 1 the hotter distributions are effectively flatter versions of the target distribution. Drawing samples from a hotter, i.e., flatter, version of the target distribution will allow, in principle, to visit regions of parameter space containing significant probability, otherwise not accessible to the basic algorithm. The problem of parameter estimation obviously continues to rely on samples drawn from the cold sampler. In Sect. 4.4 we describe how samples drawn from the remaining tempered distributions are useful in evaluating Bayes’ factor.

Implementation of parallel tempering works in the following way: Several versions of the Metropolis-Hastings algorithm are launched in parallel (nβ in total), each being characterised by a different tempering parameter, βi. At random intervals, comprehending a mean number (nswap) of iterations, a pair of adjacent chains, labelled with βi and βi + 1, is randomly chosen and a proposal is made to swap their parameter states. The proposed swap is then accepted with a probability given by: (29)where, at instant t, chain βi is in state Xt,i and chain βi + 1 is in state Xt,i + 1. By running such a set of cooperative chains, we effectively enable the algorithm to sample the target distribution in a way that allows for both the investigation of its overall features (low-β chains) and the examination of the fine details of a local maximum (high-β chains). A schematic representation of the functioning of the parallel tempering mechanism is shown in Fig. 2. In Fig. 3, a version of the Metropolis-Hastings algorithm is shown, written in pseudocode, and with the inclusion of the parallel tempering mechanism.

thumbnail Fig. 2

Schematic representation of the functioning of the parallel tempering mechanism, whereby tempering chains are allowed to swap their parameter states (swaps are indicated by vertical arrows).

Open with DEXTER

Concerning the values taken by the tempering parameter,  {βi} , optimal values are chosen in order to achieve a swap acceptance rate between adjacent levels of  ~50%. Heuristically, we can assert that by employing a geometric progression (cf. Benomar et al. 2009a), (30)such a desideratum is reached by setting λ ~ 1.2. The number of chains, nβ, should be chosen such as to reach a desired balance between sampling efficiency and computational time. However, when using the parallel tempering mechanism in model comparison problems, as we will get back to in Sect. 4.4, a large number of tempering chains are needed (typically nβ ≳ 10). The value of nswap should be chosen inversely proportional to nβ (typically a few dozens).

4.3. Automated MCMC

thumbnail Fig. 3

Version of the Metropolis-Hastings algorithm written in pseudocode and with the inclusion of parallel tempering.

Open with DEXTER

So far we have not mentioned the need to adequately choose the set  { σ }  of diagonal elements of the Σ matrix, indicating the width of the Gaussian proposal distribution for each parameter. The set of individual σ values specifies the direction and step size in parameter space when proposing a new sample to be drawn. The optimal choice of  { σ }  is closely related to the average rate at which proposed state changes are accepted, the so-called acceptance rate. Accordingly, small σ values will lead to a large acceptance rate, with successive samples being highly correlated and ultimately requiring a large number of iterations in order to yield equilibrium distributions of model parameters. On the other hand, large σ values will lead to a low acceptance rate, meaning that proposed state changes will seldom be accepted. This is illustrated in Fig. 4, where the same simplified target distribution is sampled by three chains, each being characterised by a set of σ values differing on the respective magnitudes. Roberts et al. (1997) recommend, based on empirical studies, calibrating the acceptance rate to  ~25% when dealing with a high-dimensional model as it is the case when performing a global peak-bagging.

thumbnail Fig. 4

The same target distribution sampled by three chains, each being characterised by a different set  {σ} . The contours map the target distribution, which in turn depends only on the parameters Θ1 and Θ2. The starting point in each of the chains is (Θ12) = (−4.5, −4.5) and all contain 2000 iterations. Both parameters share a common σ-value whose optimal setting is σ = 1. It is important to note that given a sufficiently large number of iterations, all the chains would eventually map out the target distribution, however an optimal choice of the proposal distribution will result in significantly faster convergence.

Open with DEXTER

One could, of course, employ a trial-and-error approach and manually calibrate the σ values. However, since we are dealing with a large number of parameters that, in addition, correspond to several different physical quantities, this would quickly become very time-consuming and impractical. We instead employ an automated process of calibration of the proposal σ values, which is based on a statistical control system similar to the one described in Gregory (2005b). The control system makes use of an error signal to steer the selection of the σ values during the burn-in stage of a single parallel tempering MCMC run, acting independently on each of the tempered chains. The error signal is proportional to the difference between the current acceptance rate and the target acceptance rate. As soon as the error signal for each of the tempered chains is less than a measure of the Poisson fluctuation expected for a zero mean error (computed as the square root of the target acceptance rate times the number of iterations between changes in the σ values), the control system is turned off and the algorithm switches to the standard parallel tempering MCMC. In practice this effectively marks the end of the burn-in stage.

thumbnail Fig. 5

Power density spectrum of HD 49933 with the best-fitting model (using prior set S2) overlaid. The shaded areas indicates the ranges of the uniform priors on the frequencies.

Open with DEXTER

The control system as briefly described here is also used in Gruberbauer et al. (2009), whereas Benomar et al. (2009a) employ a self-learning process that appropriately adapts the covariance matrix, assumed non-diagonal.

4.4. Model comparison using parallel tempering MCMC

We are now interested in computing the odds ratio, Oij, in favour of model Mi over model Mj according to Eq. (22). When analysing solar-like oscillations, a recurrent difficulty is to correctly tag the modes of oscillation by angular degree . There are two possible ways of tagging the modes or, equivalently, two competing models. Computation of Oij is thus a means of assessing which of the two identification scenarios is statistically more likely (although not necessarily physically more meaningful, as is often misinterpreted).

Samples drawn from the tempered distributions can, in principle, be used to compute the global likelihood, p(D | Mi,I), of a given model Mi. Notice that Bayes’ factor, Bij, is defined as the ratio of the global likelihoods of two competing models: (31)It can be shown that the global likelihood of a model is given by (for a derivation see Gregory 2005b): (32)where (33)is the expectation value of the natural logarithm of the likelihood for a particular tempered chain characterised by β. The set  { Xt,β }  represents the corresponding samples drawn after the burn-in stage, while n is the number of samples in each set. A sufficient number (≳10) of parallel tempered chains is required if we are to estimate the integral in Eq. (32) by interpolating values of ⟨lnp(D | Mi,X,I)⟩β.

5. Examples

In the following we will pick a couple of examples where we have applied the described Automated Parallel Tempering MCMC formalism to recent measurements of solar-like oscillators.

5.1. HD 49933: the importance of priors

We have performed an analysis of the star HD 49933, based on 180 days of photometry from the CoRoT satellite arising from two runs: The initial 60-day run, IRa01, and 120 days from the longer second run, LRa01. The time series was split up into segments of 30 days and the power spectra of the individual segments were averaged to construct a mean power spectrum (s = 6 in Eq. (2)). The acoustic spectrum of this F5 main-sequence star has proven to be very difficult to interpret mainly due to the relatively large linewidths (see Fig. 5). We assume the ridge identification denoted as “Scenario B” in Benomar et al. (2009b).

The acoustic spectrum was fitted using the APT MCMC formalism, but using two different sets of priors (see Table 3). The first set (S1) was constructed using only ignorance priors, while the second set (S2) includes knowledge about the stellar rotation. From spectroscopic and asteroseismic studies of HD 49933, Bruntt (2009) was able to constrain the rotation of the star to vsin(i) = 10 ± 1   km   s-1 and the radius to R/R = 1.385 ± 0.031, which can be combined to impose a constraint on the projected rotational splitting, , of 1.65 ± 0.17 μHz. In set S2 this knowledge is added as a gaussian prior on the projected splitting of the star. In both cases the fits were done using the following configuration:

  • 15 orders were fitted with  = 0,1,2 modes in a fitting window spanning from 1220 to 2465   μHz (see Fig. 5);

  • one linewidth and one height per order assigned to the  = 0 mode, and then linearly interpolated by frequency and scaled to the higher degree modes;

  • rotation and inclination angle fitted with the two free parameters, and i;

  • the background was parametrized as a sum of 3 Harvey-like models plus a white noise contribution;

  • 800 000 samples were drawn from the target distribution, employing 10 parallel chains.

First of all, it is important to note that the results are consistent with the ones reported in Benomar et al. (2009b). For example the derived frequencies and linewidths are all well within the error bars. We will here focus on the results of the rotational splitting and inclination angle. The probability density functions for the fitted parameters when using ignorance priors (S1) are shown in Fig. 6a and, after applying the Gaussian prior on the projected splitting (S2), the results change to the ones shown in Fig. 6b.

Table 3

Prior input for the HD 49933 analysis.

thumbnail Fig. 6

Results on the rotation and inclination angle of HD 49933. In both cases the prior on the inclination angle is uniform in the interval 0°–90°. a) We employ a uniform prior on rotation (S1). b) We employ a Gaussian prior on projected rotation (S2). The vertical lines in the histograms indicate the median and the boundaries of the 68% credible regions of the distributions. The dashed line in the top figures indicates the frequency resolution in the spectrum.

Open with DEXTER

What these results demonstrate is, first of all, that these techniques are extremely efficient at probing and constraining parameters which traditional methods would have considerable difficulty in constraining. In Benomar et al. (2009b), where similar techniques are employed, the derived inclination angle was and the rotational splitting placed in the range 3.5–6.0 μHz, with which our results are perfectly consistent with. Another point to be drawn from this example is the importance of the inclusion of prior knowledge. By incorporating our prior knowledge about the rotation of the star through the accurate measurements from a spectral analysis, we are able to yield a much cleaner constraint on the inclination angle, which in Fig. 6a has a considerable tail of probability towards large inclinations.

It is important to note that this prior on is strong in the sense that it dominates the fit. This simply comes from the fact that the data do not provide any further information on this parameter and so our prior knowledge of the model is still providing the best constraint. In such cases one of course has to be careful that such a strong prior is not wrongfully restrictive. If other effects (in this case, for instance, differential rotation or second-order rotational effects) were present, this could introduce biases to the fitted results. This could of course be tested using the methodology described in Sects. 3.2 and 4.4, by constructing models that incorporate these effects and testing their significance.

5.2. Procyon: the problem of ridge identification

Here we address the issue of tagging the oscillation modes by angular degree in the case of the F5 star Procyon. We have thus reanalysed the data acquired during a multi-site campaign (Arentoft et al. 2008; Bedding et al. 2010b) carried to observe oscillations in this star. The data consist of high-precision velocity observations obtained over more than three weeks with eleven telescopes, representing the most extensive campaign organised so far on a solar-type oscillator.

The problem of ridge identification in F stars dates back to when CoRoT observations of HD 49933 were first analysed (Appourchaux et al. 2008), a problem that would be recently solved for this star only after a new longer time series was made available (Benomar et al. 2009b). Bedding et al. (2010b) address this same problematic in the case of Procyon by employing three distinct methodologies: (i) a collapsed power spectrum along several radial orders; (ii) a scaled échelle diagram (Bedding & Kjeldsen 2010); and (iii) Bayesian model comparison (as described in Sect. 4.4). The last-mentioned methodology statistically favours their Scenario A over their Scenario B identification, whereas the first and second methodologies suggest the contrary although without quantifying their preference for Scenario B in a statistical sense.

We performed a peak-bagging of the power spectrum of Procyon considering both identification scenarios while simultaneously testing for the presence of  = 3 modes. This gives a total of four competing models, i.e., , the notation chosen to be unambiguous. The details of the peak-bagging as implemented here slightly differ from those presented in Bedding et al. (2010b), and especially concern the limits of the fitting window and the way in which the background was parametrized. The details are as follows:

  • The peak-bagging was performed on the sidelobe-optimisedpower density spectrum whose intrinsic frequency resolution is0.77   μHz. Peaks were described by symmetric Lorentzians centred on the mode frequencies. Three frequencies were fitted per overtone, each with a different angular degree ( = 0,1,2). Type of prior imposed at first: independent and uniform, centred ( ± 8   μHz) on the initial guesses. The mode frequencies were further constrained to lie close to the ridge centroids and to have only small jumps from one order to the next. Also, a Gaussian prior (μ = 4   μHz, σ = 5   μHz) was imposed on the small frequency separation, δν02, between adjacent modes with  = 0 and  = 2. The small separation was not itself a free parameter in the fit, but instead a derived quantity. Note that the last-mentioned constraint implies that the type of prior on the  = 0,2 frequency parameters is ultimately not independent nor uniform. Optionally, modes with  = 3 could be included in the model with their frequencies fixed to (34)according to the asymptotic relation (Tassoul 1980). A total of 14 overtones were considered and the fitting window runs from 500 to 1300   μHz. By employing this construction it is assumed that no mixed modes are present in the fitting window. The inclusion of  = 3 modes does not add any more free parameters, while adding however their features to the model spectrum which are derived from the fitted  = 0,1,2 frequencies.

  • The linewidth was parametrized as a linear function of frequency, defined by two parameters Γ600 and Γ1200, which are the values at 600 and 1200μHz, respectively. Both parameters were fitted. Type of prior imposed: uniform in the range 0–10μHz.

  • The heights of radial modes in units of power density were fixed according to Chaplin et al. (2008b): (35)where A2 is the total power of the mode as determined from the power envelope for radial modes (Kjeldsen et al. 2008), and T is the effective length of the observational run. The heights of non-radial modes were then defined based on the heights of radial modes according to Eq. (12), and taking into account the appropriate V/V0 ratios given in Table 1 of Kjeldsen et al. (2008).

  • The background was parametrized as a linear function of frequency since it had previously been suppressed at low frequencies (high-pass cut at 280   μHz) to effectively remove slow variations.

  • The inclination angle between the direction of the stellar rotation axis and the line of sight was fixed at 31.1°, which is the inclination of the binary orbit and is consistent with the rotational modulation of the velocity curve. The rotational splitting was fixed at 0.7   μHz, which was chosen to match the observed value of vsin(i) = 3.16   km   s-1 (Allende Prieto et al. 2002), given the known radius of the star of 2.05   R (Kervella et al. 2004).

  • We drew  ~800 000 samples from the target distribution after a burn-in phase. We employed 12 parallel tempered chains.

  • We thus have a total of 46 free parameters, namely, 42 frequencies, 2 parameters for the linewidth and 2 parameters for the background.

Table 4 summarises the model selection calculations assuming equal prior probabilities for the models belonging to our discrete model space. Individual probabilities are assigned to models according to Eq. (21). Similarly to Bedding et al. (2010b), Bayesian model comparison again statistically favours Scenario A over Scenario B. Furthermore, the presence of residual power due to  = 3 modes is suggested. Computing Bayes’ factor in favour of model over model gives a factor of approximately 9:1 or, equivalently, a logarithmic factor of 2.2, which classifies as “significant” on Jeffreys’ scale. Figure 7 displays the power density spectrum of Procyon in échelle format with the fitted frequencies for model overlaid.

Table 4

Model probabilities.

thumbnail Fig. 7

Power density spectrum of Procyon (smoothed to 2   μHz) in échelle format. The fitted frequencies for model appear as overlaid filled symbols. Symbol shapes indicate mode degree:  = 0 (circles),  = 1 (triangles),  = 2 (squares).

Open with DEXTER

6. Summary and discussion

In this paper, we have presented the basic theory and methods behind the extraction of parameters from the power spectra of solar-like stars. In order to handle the ever rising quality and complexity of modern asteroseismic data, we have developed a tool (APT MCMC) that enables us to constrain parameters associated with the subtlest features in the spectra. The algorithm has been extensively tested and performs extremely well, not only in the traditional case of extracting oscillation frequencies, but also when pushing the limit where traditional methods have difficulties, such as constraining linewidths, rotational splittings and stellar inclination angles. In this work we have focused on data in the signal-to-noise regime of current asteroseismic measurements. In the case of very high signal-to-noise ratios, other features in the power spectrum becomes important, such as mode asymmetries and rotational splittings dependent on , arising from differential rotation with radius. In future work these effects will be incorporated into the program and tested on solar data.

One disadvantage of the method is that it can be quite computationally intensive, both to implement and run, when compared to traditional MLE fits. This is however balanced by the much added information outputted from the fits, specifically in the probability distributions of each parameter, making it easy to obtain accurate, reliable and realistic error bars on the results – a feature seriously missing from the traditional methods. The parameter estimation also benefits enormously from the possibilities the Bayesian formalism provides with inclusion of prior information. This not only allows control of the fit to, for example, not allow unphysical parameter combinations, but also include information into the fit that is better constrained by other measurements (as we saw in Sect. 5.1). Another powerful feature of the method lies in the parallel tempering, which not only keeps the fits from getting stuck in local maxima, but also provides an objective way of comparing different competing models, as it provides a way of calculating the global likelihood. This can for example be utilized in the familiar problem of ridge identification in solar-like stars (see Sect. 5.2).

A thing to keep in mind is also that the APT MCMC algorithm is completely general, in the sense that it could be applied to other problems without modification. MCMC methods are being used in various branches of astrophysics: cosmology (Liddle 2009), extra solar planets (Gregory 2005a) and stellar model fitting (Bazot et al. 2008), but in fact the methods would be applicable in any problem including parameter estimation. And as computational power continues to grow, the downsides are quickly becoming insignificant.

What could to some extent also be seen as a disadvantage of these methods is that they can never be fully automated, in the sense that they will not be able to handle a large number of stars without human interaction. The whole fundamental idea behind the Bayesian formalism is that it relies on “wise” human inputs on the priors and model setup that should not be done in an automated way. If nothing else, take this as a positive reassurance: You will, as an astrophysicist, never be obsolete to computers or monkeys with keyboards.


1

The term “peak-bagging” has become the customary name for the examination of individual oscillation peaks in the field of asteroseismology. The origin of the name is explained in Appourchaux (2003b).

2

To be precise, we will be modelling the power density spectrum and not the power spectrum. The former is independent of the window function and is obtained by multiplying the power spectrum by the effective length of the observational run, which can in turn be calculated as the reciprocal of the area integrated under the spectral window.

3

A remark on the notation: Xt may be thought of as a single vector in parameter space.

Acknowledgments

We would like to thank H. Kjeldsen, T. R. Bedding, C. Karoff, W. J. Chaplin, T. Appourchaux, R. A. García and O. Benomar for fruitful discussions and valuable comments. We also thank the anonymous referee for comments that helped improve the paper. T.L.C. is supported by grant with reference number SFRH/BD/36240/2007 from FCT/MCTES, Portugal. This work was supported by the project PTDC/CTE-AST/098754/2008 funded by FCT/MCTES, Portugal.

References

Appendix A: Computing Em(i) and V

The Eℓm(i) factors are given below for  ∈  [0,4] , having used Eq. (9): (A.1)Notice that when the rotation axis is aligned with the line of sight (i = 0°), only the multiplet component with m = 0 is visible, thus making inviable an inference of rotation.

The spatial response function for each , V, representing the ratio of the observed amplitude to the actual amplitude, is given here for the five lowest degree modes (Bedding et al. 1996): (A.2)where u2 and v2 are wavelength-dependent classical limb-darkening coefficients (Allen 1973) and c is a parameter defining the observational method. This matrix product can be used for velocity measurements by setting c = 1 and for intensity measurements by setting c = 0.

All Tables

Table 1

Relative spatial response functions, V/V0, for a number of present and upcoming instruments/missions.

Table 2

Jeffreys’ scale.

Table 3

Prior input for the HD 49933 analysis.

Table 4

Model probabilities.

All Figures

thumbnail Fig. 1

Artificial power density spectrum of a  = 0 singlet and a  = 1 multiplet. One has νs = Γ = 3   μHz. Notice how the  = 1 multiplet splits into its m components. The power spectrum (grey) is distributed around a mean spectrum (black) with an exponential probability distribution.

Open with DEXTER
In the text
thumbnail Fig. 2

Schematic representation of the functioning of the parallel tempering mechanism, whereby tempering chains are allowed to swap their parameter states (swaps are indicated by vertical arrows).

Open with DEXTER
In the text
thumbnail Fig. 3

Version of the Metropolis-Hastings algorithm written in pseudocode and with the inclusion of parallel tempering.

Open with DEXTER
In the text
thumbnail Fig. 4

The same target distribution sampled by three chains, each being characterised by a different set  {σ} . The contours map the target distribution, which in turn depends only on the parameters Θ1 and Θ2. The starting point in each of the chains is (Θ12) = (−4.5, −4.5) and all contain 2000 iterations. Both parameters share a common σ-value whose optimal setting is σ = 1. It is important to note that given a sufficiently large number of iterations, all the chains would eventually map out the target distribution, however an optimal choice of the proposal distribution will result in significantly faster convergence.

Open with DEXTER
In the text
thumbnail Fig. 5

Power density spectrum of HD 49933 with the best-fitting model (using prior set S2) overlaid. The shaded areas indicates the ranges of the uniform priors on the frequencies.

Open with DEXTER
In the text
thumbnail Fig. 6

Results on the rotation and inclination angle of HD 49933. In both cases the prior on the inclination angle is uniform in the interval 0°–90°. a) We employ a uniform prior on rotation (S1). b) We employ a Gaussian prior on projected rotation (S2). The vertical lines in the histograms indicate the median and the boundaries of the 68% credible regions of the distributions. The dashed line in the top figures indicates the frequency resolution in the spectrum.

Open with DEXTER
In the text
thumbnail Fig. 7

Power density spectrum of Procyon (smoothed to 2   μHz) in échelle format. The fitted frequencies for model appear as overlaid filled symbols. Symbol shapes indicate mode degree:  = 0 (circles),  = 1 (triangles),  = 2 (squares).

Open with DEXTER
In the text

Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.

Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.

Initial download of the metrics may take a while.