EDP Sciences
The CoRoT space mission: early results
Free Access
Issue
A&A
Volume 506, Number 1, October IV 2009
The CoRoT space mission: early results
Page(s) 15 - 32
Section Stellar structure and evolution
DOI https://doi.org/10.1051/0004-6361/200911657
Published online 03 September 2009

The CoRoT space mission: early results

The solar-like oscillations of HD 49933: a Bayesian approach[*],[*]

O. Benomar - T. Appourchaux - F. Baudin

Institut d'Astrophysique Spatiale, CNRS/Université Paris XI, Bâtiment 121, 91405 Orsay Cedex, France

Received 13 January 2009 / Accepted 27 August 2009

Abstract
Context. Asteroseismology has entered a new era with the availability of continuous observations from space-borne missions such as MOST, CoRoT and Kepler. However, the low amplitude and the complexity of the observed spectrum make the exploitation of these data sets difficult.
Aims. The use of robust methods to estimate the parameters of stellar oscillation eigenmodes is necessary to fully exploit these new data sets. These parameters include in particular the frequency, the width and the energy of the eigenmodes, all being required for a seismic interpretation of the stellar internal structure or excitation of the eigenmodes.
Methods. A Bayesian approach, coupled with a Markov chain Monte Carlo (MCMC) algorithm, is presented. Such a method allows the use of a priori knowledge to improve the parameter estimation. It also provides complete information on the probability distribution of the fitted parameters. The method is tested on simulated time series and then applied to CoRoT observations of HD 49933.
Results. The simulated time series allow the validation of the method for conditions similar to those of the observations in terms of spectral complexity and signal-to-noise ratio. However, a very important problem in the analysis of the HD 49933 mode spectrum is the l degree identification of the modes. The degree identification has little impact on the large frequency separation, rotational splitting, energy and width estimation, whereas individual frequencies and the star inclination angle evaluation are strongly affected. From a statistical point of view, we provide a quantitative ranking of the four models considered. The most probable model includes only modes of degree 0 and 1. Two other models include modes with degree up to 2 and have a non negligible level of significance. The last model includes modes of degree 0 and 1 but has an alternate degree identification and can be definitively rejected. In conclusion, the significance of the resulting probabilities is not sufficient to draw a definite conclusion.

Key words: methods: statistical - methods: observational - stars: oscillations

1 Introduction

The advent of space-borne instruments promises a great harvest of seismic data from stars, with the launch of MOST (2003), CoRoT (2006) and Kepler (2009). Our knowledge of internal stellar structure will certainly be revised and extended in the coming years.

Asteroseismology is based essentially on the analysis of the oscillatory variation of an observable (intensity or surface velocity) of the star. Resonant modes appear as peaks in the Fourier spectrum of this observable, characterized by their frequency, height and width, on which further seismic interpretation is based. For stochastically excited oscillations (also called solar-like oscillations), the mode parameters are determined by maximizing the likelihood of a model spectrum with respect to the observed power spectrum (Toutain & Fröhlich 1992; Appourchaux et al. 1998; Anderson et al. 1990). This is a complex and sensitive process, subject to spurious estimates if, for example, the signal-to-noise ratio is low, which could introduce local maxima of the likelihood leading to wrong estimates. If one aims at an automatic analysis of a large quantity of data, expected for present and future space missions, more robust methods are necessary.

A promising way to accomplish this is to use a Bayesian analysis. The Bayesian approach is well suited to extract as much information as possible from a given data set by including a priori knowledge from other data sets or theoretical assumptions. This approach determines the so-called a posteriori probability distribution of a given parameter from a combination of the likelihood of a model with respect to the observation with an a priori probability distribution for this parameter, which reflects some supposed knowledge of this parameter. Moreover, based on these posterior probabilities, the Bayesian analysis allows the comparison of different models to be fitted to the data and provides a quantitative means to decide which model is the most probable. Recently, more and more astronomers have begun to use this approach in different domains (e.g. Gregory 2005a; Parkinson et al. 2006; Brewer et al. 2007; Carrier et al. 2007).

The efficiency of this approach is enhanced if it utilizes all of the information contained in the posterior probability distribution function (pdf), and not only on the determination of its maximum. In some cases, the probability distribution of a parameter will not be as simple as a Gaussian or any other mono-modal function but may consist of multi-modal functions. When dealing with a small number of parameters (of the order of a dozen), analytical or numerical approximations are well suited and allow the construction of the probability distribution of the parameters. This becomes problematic when the number of parameters is large (several tens). In this case, it is preferable to use stochastic simulations and especially Markov chains based on Monte Carlo simulations (hereafter MCMC) which provide a solution to the difficult problem of the sampling of a space with a large number of dimensions. An MCMC can be considered as a chain of random walks in the parameter space in which each step depends only on the immediate previous position: this is a weak memory process. An MCMC is able to sample, for example, a probability density function (pdf) whatever the complexity of its analytical expression. Moreover, compared to most analytical or numerical strategies (such as gradient methods used to find the maximum of a function), they are insensitive to the initial starting point in the parameter space. An MCMC is able to explore and sample all of the regions of the parameter space where the pdf is significant. Consequently, if multiple maxima exist, an MCMC based algorithm will sample all of them and not only one or a few of them. More technical details about MCMC are given in the Appendix.

In Sect. 2, we consider the issues related to solar-like stars and the analysis of their power spectra. In Sect. 3, we present technical aspects regarding the Bayesian mathematics. Section 4 presents our results with simulated data illustrating the efficiency of the Bayesian approach coupled to the MCMC sampling technique. Section 5 presents our results on a star observed by CoRoT (HD 49933). This is a difficult case, and we discuss our results and earlier ones. Section 6 summarizes the results and perspectives.

2 Solar-like physical considerations

A star acts as a resonant cavity and different modes can be excited. These resonances appear as peaks in the Fourier spectrum of, for example, stellar photometric variations. We deal here with solar-like pulsators, or stars showing many acoustic modes (p modes) that are stochastically excited. In the solar case, the source of excitation is the convection of the upper layers of the star. To a first approximation, each mode can be characterized by a lifetime, a frequency and an energy, and described as a Lorentzian profile in the power spectrum. Hence, the stellar oscillation spectrum can be expressed as:

\begin{displaymath}S(\nu,\vec{\theta}_S)=\sum_{n=n_0}^{N_{\rm max}} \sum_{l=0}^{...
...m)}{1+4(\frac{\nu-\nu(n,l)+ m \nu_{\rm s}}{\Gamma(n,l,m)})^2}}
\end{displaymath} (1)

where $H(n,l,m), \nu(n,l), \nu_{\rm s}$ and $\Gamma(n,l,m)$ are the height, the central frequency, the rotational splitting and the width at half height, respectively, for each mode, defined by the integers (n,l,m) for the radial order, the degree and the azimuthal order. For simplicity, the set of the p modes is simply written as the vector $\vec{\theta}_S$. In the solar-like pulsator case, and in the asymptotic approximation, the central frequency of a set of modes (n,l) follows a pattern:

\begin{displaymath}
\nu(n,l) \approx (n+l/2+\epsilon)\Delta \nu-l(l+1) D_0
\end{displaymath} (2)

with $\Delta\nu= \nu(n+1,l)-\nu(n,l)$ and D0 is related to the sound speed at different depths. The large separation $\Delta \nu $ is related to the mean stellar density and D0 to more local properties. $\epsilon $ is a parameter related to surface effects, whose value is between 1 and 2 for solar-like stars.

The stellar signal also includes a background component. In the solar case, this background can be described by a sum of Lorentzian-like profiles, first introduced by Harvey (1985),

\begin{displaymath}
N(\nu,\vec{\theta}_N)=\sum_i\frac{A_i}{1+(B_i \nu)^{p_{i}}} + C
\end{displaymath} (3)

where $\nu$ is the frequency, Ai, Bi, pi are respectively the height, the characteristic time scale and the slope of the power law of the ith component. C represents a white noise added to describe, for example, the photon shot noise. The description provided by Eq. (3) also applies to any instrumental noise. All background parameters are summarized in the vector $\vec{\theta}_N$. A typical time scale of 200-500 s is attributed to the solar granulation noise, the main source of background in the p mode frequency range (Aigrain et al. 2004). Solar activity has a longer typical time scale and plays a negligible role in the frequency range where most of the p modes are observed.

Finally, writing the entire parameter set $\vec{\theta}=\left\{\vec{\theta}_S,\vec{\theta}_N\right\}$, a representative model of the solar-like stellar spectrum is:

\begin{displaymath}M(\nu,\vec{\theta})=S(\nu,\vec{\theta}_S)+N(\nu,\vec{\theta}_N).
\end{displaymath} (4)

  3 Bayesian approach

  3.1 Bayesian formalism

In the case of photometric measurements, the noise affecting the stellar intensity variations obeys Gaussian statistics (Appourchaux et al. 1998). Hence, the imaginary and real part of the Fourier spectrum are also Gaussian. Thus, the noise power spectrum follows a $\chi ^2$ distribution with 2 degrees of freedom. Knowing the statistics of data points yi is sufficient to derive the likelihood function given the model $M_j(\nu,\vec{\theta})$  (Duvall & Harvey 1986),

\begin{displaymath}
\pi(y\vert\vec{\theta},M_j,I)=\prod_{i=1}^N \frac{1}{M_j(\nu_i,\vec{\theta})}{\rm e}^{-\frac{y_i}{M_j(\nu_i,\theta)}}
\end{displaymath} (5)

where N is the number of frequency samples $\nu_i$. I is related to all other contextual information (assumptions on the signal and the physics of the problem, explicit or not). Equation (5) is applicable only if the sampled frequencies are independent. In the Maximum Likelihood Estimator (MLE) approach, the best set of parameters is derived by maximizing this function (Anderson et al. 1990).

From Bayes theorem, we can define the posterior pdf:

\begin{displaymath}\pi(\vec{\theta}\vert y,M_j,I)=\frac{\pi(\vec{\theta}\vert M_j,I) \pi(y\vert\vec{\theta},M_j,I)}{\pi(y\vert M_j,I)}
\end{displaymath} (6)

where $\pi(\vec{\theta}\vert M_j,I)$ is our explicit a priori knowledge (hereafter called ``prior'') of the parameter set of the model Mj, and $\pi(y\vert M_j,I)$ is a normalization constant. To apply a priori information about a particular parameter (or to a subset of parameters), we use the product rule in the case of independent variables: the global prior is simply the product of the individual priors. For example, we can separate the prior associated with the noise from the prior of the p modes,

\begin{displaymath}\pi(\vec{\theta}_S,\vec{\theta}_N\vert M_j,I)=\pi(\vec{\theta}_S\vert M_j,I)\pi(\vec{\theta}_N\vert M_j,I).
\end{displaymath} (7)

  3.2 Model comparison

In the Bayesian formalism, it is also possible to compare different models by using Bayes' rule to define the posterior probability P(Mj|y,I) of a model Mj being consistent with the data y,

\begin{displaymath}
P(M_j\vert y,I)=\frac{P(y\vert M_j,I)P(M_j\vert I)}{P(y\vert I)}\cdot
\end{displaymath} (8)

P(y|Mj,I) represents the global likelihood, P(Mj|I) the prior of the model and P(y|I) a normalization constant for the whole model space. In the present comparisons, we attribute the same prior probability to each model. Thus, comparing models Mj and Mk in a Bayesian approach relies on the ratio of the global likelihoods only. OMj,Mk is called the odds ratio,

\begin{displaymath}
O_{M_j,M_k}=\frac{P(M_j\vert y,I)}{P(M_k\vert y,I)}=\frac{P(y\vert M_j,I)}{P(y\vert M_k,I)}\cdot
\end{displaymath} (9)

This is different from the MLE approach where we compare maximum likelihoods. The model Mj is more favored by the data than Mk if OMj,Mk>1. In practice, when 1<OMj,Mk<3, Mj is barely supported but this support becomes substantial for OMj,Mk>3. OMj,Mk=3 corresponds to 75% probability. We can also calculate the relative probability of a model Mj over the discrete explored model space of size $N_{\rm MOD}$,

\begin{displaymath}
P_{\rm R}(M_j\vert y,I)=\frac{P(y\vert M_j,I)P(M_j\vert I)}{\sum^{i=N_{\rm MOD}}_{i=1} P(y\vert M_i,I)P(M_i\vert I)}\cdot
\end{displaymath} (10)

The calculation of the global likelihood requires a process known as marginalization. It consists of removing the parameter dependence by integration over all of the variable parameters of the conditional likelihood weighted by the prior pdf of the considered model,

\begin{displaymath}
P(y\vert M_j,I)=\int\pi(y\vert\vec{\theta},M_j,I)\pi(\vec{\theta}\vert M_j,I){\rm d\vec{\theta}}.
\end{displaymath} (11)

Marginalization is difficult to achieve analytically in most cases because of the complexity of the function to be integrated. With the expression of the likelihood of Eq. (5), it is not possible and only a numerical evaluation is feasible. If the size of the parameter space is small, we can evaluate the integral by a simple Monte-Carlo approach, but this becomes very difficult for a large (several tens) number of parameters. An alternative approach consists of sampling the parameter space by using a MCMC algorithm. MCMC simulations have the advantage of providing an integral evaluation and thus marginalization capabilities from the sampling of the function to be integrated. This makes the global likelihood computation easy, even for complex expressions such as $\pi(y\vert\vec{\theta},M_j,I)$ and $\pi(\vec{\theta}\vert M_j,I)$. In practice, computing the mean of the conditional likelihood weighted by the priors consists of an evaluation of Eq. (11). We provide more technical information on the evaluation of Eq. (11), using MCMC results in Sect. A.3.

  3.3 Bayesian priors for solar-like pulsators

We can now apply the Bayesian approach using our knowledge of solar-like stars. We will restrict ourselves to spatially unresolved photometric stellar observations: in the best case, the mode visibility is too low to distinguish modes with a degree l higher than 3 in the power spectrum. Referring to Eq. (2) we see that the frequency of the l=2 and l=3 modes can be written relative to the l=0 and l=1 modes, respectively:

  
$\displaystyle \delta \nu_{02}(n) = \nu_{n+1,2}-\nu_{n,0} \simeq 6 D_0$     (12a)
$\displaystyle \delta \nu_{13}(n) = \nu_{n+1,3}-\nu_{n,1} \simeq 10 D_0.$     (12b)

These differences are called the small separations. Each l=0 mode has an l=2 companion. Similarly, each l=1 mode has an l=3 neighbor. In the present case, as was already noted by Appourchaux et al. (2008), Benomar (2008) and Gruberbauer et al. (2009), it is clear that the signal-to-noise ratio is low, even for the l=0 and l=1 modes. These modes are supposed to have the highest signal-to-noise ratio and the l=3 mode is known to have a very low relative height ( $H_{l=3}/H_{l=0} \approx 0.03$ in the solar case). Moreover the computation time is tremendously increased by adding this mode. Therefore, we assume that the l=3 modes are not visible, but still keeping in mind that they may contribute to the noise background.

In an observed power spectrum, we can have two different l degree identifications (hereafter, models $M_{\rm A}$ and $M_{\rm B}$) for a given peak in the spectrum: one can choose to consider that this peak is made of a pair of l=0,2 modes or just made of an l=1 mode, as these two structures alternate in the spectrum. If the l=2 modes appear clearly in the spectrum, there is no doubt about the identification as it makes the peak wider. But if this is not the case (e.g. low signal-to-noise condition, say H/N < 10 for the l=0 modes, or l=0 and l=2 mode overlap), the identification becomes difficult. In the Bayesian context, the mode identification can be achieved using the odds ratio (Eq. (10)). In this case, the equation can be rewritten as

\begin{displaymath}P_R(M_{\rm A}\vert y,I)=\frac{1}{1+O_{M_{\rm A},M_{\rm B}}^{-1}}\cdot
\end{displaymath} (13)

By convention, we will denote by $M_{\rm A}$ the model with the highest probability.

To evaluate the two hypotheses, we propose to use a prior for the frequencies of the l=0 and l=1 modes of order n between  $n_{\rm min}$ and  $n_{\rm max}$:

$\displaystyle \pi(\nu_{n,0},\nu_{n,1}\vert M_j,I)$ = $\displaystyle \frac{1}{2(n_{\rm max}-n_{\rm min})\sigma \sqrt{2\pi}}$  
    $\displaystyle \times\sum_{n=n_{\rm min}}^{n_{\rm max}}{\left[ {\rm e}^{-\frac{(\nu-\nu(n,0))^2}{2\sigma^2}}+{\rm e}^{-\frac{(\nu-\nu(n,1))^2}{2\sigma^2}}\right]}$ (14)

where $\sigma $ is the standard deviation of the a priori Gaussian distribution of both l=0 and l=1 mode frequencies. We choose a $\sigma $ large enough to not strongly influence of the frequency posterior estimation $\nu(n,l=0)$ and $\nu(n,l=1)$ but not so large as to avoid, during the parameter space exploration process, the mode identification inversion (l=0 for l=1, and vice versa). The priors of the l=0 and l=1 mode frequencies are generated using the asymptotic formula (Eq. (2)). We also impose a Gaussian prior on the mean value of the large separation $\Delta \nu $, based on an a priori estimation of $\Delta \nu $ obtained by computing the autocorrelation of the power spectrum. $\epsilon $ is fixed from the observed spectrum. D0 is chosen according to an a priori stellar model (cf. Eq. (12)).

The priors for l=2 mode frequencies are described by a Gaussian prior to the small separation,

\begin{displaymath}
\pi(\delta \nu_{02}(n)\vert M_j,I)=\frac{1}{\sigma_{02} \sq...
...ngle \delta\nu_{02}\rangle_{\rm th}\right)^2}{2\sigma_{02}^2}}
\end{displaymath} (15)

where $\langle \delta\nu_{02}\rangle_{\rm th}$ is the mean value of the expected small separation between the l=0 modes and the l=2 modes, according to a stellar model. The standard deviation $\sigma_{02}$ drives the prior l=2 frequency dispersion.

Here, we define and use as a parameter the mode height of a mode relative to the height of the radial mode l=0: Vl=1=Hl=1/Hl=0 and Vl=2=Hl=2/Hl=0. In the case of long duration observations, the mode height ratios are essentially constrained by a geometrical visibility factor (Toutain & Gouttebroze 1993) and are thus independent of $\nu$. For the inputs to our simulations (Sect. 4), we used the solar relative height ratio for unresolved photometric observations: Vl=1=1.5 and Vl=2=0.5. However, for the analysis, the relative heights are not fixed, unlike previous approaches (Benomar 2008; Appourchaux et al. 2008).

The Jeffrey prior ( $P_{\rm J}(x) \propto 1/x$) can be appropriate in the case of scale parameters (e.g. parameter such as f(x)=ax, where a is a ``scale'' factor), which is the case of the height and the width of the modes. Such a prior is non-informative and uniform in the logarithmic domain. Jeffrey (1961) justified the choice of this kind of prior as they are invariant under parameter transformation. However, this prior is improper (integration over x yielding an undefined result) and it is common to use a truncated version of this prior: $P_{\rm J} (x)={C}/(x+x_{\rm min})$ where $C=\ln(1+x_{\rm max}/x_{\rm min})$ is the normalization constant. $x_{\rm min}$ and $x_{\rm max}$ are upper and lower bounds such that if $x>x_{\rm max}$ or $x<x_{\rm min}$ then $P_{\rm J} (x)=0$. Thus, the probability density is proper (integral over x is finite). Unfortunately the truncation, although it renders the probability density proper, may not follow the prescription of Jeffrey (1961) of invariance. This should be borne in mind.

Another non-informative prior is the uniform prior ( $P_{\rm J} (x)={1}/(x_{\rm max}-x_{\rm min})$ if $x_{\rm min}<x<x_{\rm max}$, else $P_{\rm J} (x)=0$). It is recommended by Jeffrey when dealing with ``location'' parameters (parameters such as f(x)=x+a), which is the case of the mode frequencies. We compared the effect of these two priors applied to the height and width on the model probability in the case of scenario 4 and 5 (see the next section for more information about the explored scenarios). In these scenarios, we compare two models $M^2_{\rm A}$ and $M^2_{\rm B}$. We note that using a Jeffrey prior for the scale parameters leads to an increase in the contrast of the relative model probability (about 0.2-0.4 dex or 10%-20% in probability). However, this variation can also be attributed to the probability estimation method itself (i.e. numerical calculation in Sect. A.3). The estimate of the uncertainty of the probability calculated by MCMC is still an open issue: for the conditions described throughout this article, two successive and independent algorithm executions on the same data set provide probability results with a log probability difference of the order of 0.2 dex.

The relative heights H(n,l,m) of the split components of nonradial modes depend on the angle between the line of sight of the observer and the rotation axis of the star (see for ex. Gizon & Solanki 2003). Knowing that, we can define the heights of these components as a function of the angle. Depending on the available information, an angle prior can be set or not; here, in the case of the analysis of the simulated spectra (Sect. 4) or the real data (Sect. 5), a uniform prior over the range 0 $^\circ{-}90^\circ$ is set.

Regarding the background, we adopted a two-step strategy:

1.
we fit the power spectrum over a large frequency range (in the case of HD 49933, 0-5000 $\mu {\rm Hz}$ after removing the spacecraft orbital peaks) with 3 Lorentzian shaped components and a white noise (Eq. (3)) plus a Gaussian profile (characterized by its height A0, its central frequency $\nu_{\rm c}$, and its width $\sigma_{\rm G}$) to globally describe the modes. The slope of the Lorentzian profiles are assumed to be equal ( p1=p2=p3=p). At this point we have 11 free parameters;
2.
we verify that the Lorentzian describing the granulation background is effectively the dominant one in the p mode frequency range. Then, we neglect the two other components and we use the posterior results to constrain the background in the mode range (1210-4000  $\mu~{\rm {Hz}}$) with Gaussian priors.
This method ensures a robust estimation of the background at low frequency: this is an important point as we are dealing with low amplitude modes.

4 Illustrative simulations

  4.1 Simulation conditions

 \begin{figure}
\par\includegraphics[angle=90,width=7cm,clip]{11657_E.ps}\par\vspace*{2mm}
\includegraphics[angle=90,width=7cm,clip]{11657_F.ps}
\end{figure} Figure 1:

a) Smoothed power spectrum of HD 49933, showing activity (dot-dashed line) and granulation (solid line) backgrounds and stellar modes.The modes are first modeled by a Gaussian (solid blue curve). b) Mode frequency range with frequency priors (l=0: red dash line; l=1: green dot line) for the mode identification A.

Open with DEXTER

In order to check the reliability (in particular for the mode identification) of the method presented here, simulated spectra were simulated with different inputs in terms of noise level and stellar parameters, yielding an ensemble of six test scenarios. All of them correspond to the mode identification A used for the analysis of HD 49933 (see Sect. 5). Most of the synthetic spectra (scenarios 1 to 5) contain the 3 first degrees (l=0 to l=2) for 8 radial orders. The relative heights Vl=1 and Vl=2 are fixed at the solar values. The background is composed of a Lorentzian profile plus white noise. The frequency resolution is set to 0.2  $\mu~{\rm {Hz}}$ which corresponds to an observation duration of about 58 days. Different scenarios were tested by changing the height, the width, the rotational splitting and the small separation $\delta \nu _{02}$. The frequencies follow an asymptotic law, with $\Delta\nu=90~\mu$Hz. The inclination angle is set to 40$^\circ$. Scenario 6 considers the case of a power spectrum with only 2 degrees (l=0 and l=1). For a summary of the input spectrum parameters, see Table 1.

We explored four kinds of scenarios:

1.
relatively high H/N ratio (>30), distant neighboring l=2 and l=0 modes and small rotational splitting ( $\nu_{\rm s}\ll \delta\nu_{02}$). These conditions reduce the mode multiplet superposition (scenario 1);
2.
identical to the previous scenario but for a relatively low H/N ratio, around 4 (scenario 2);
3.
intermediate H/N ratio (around 3 to 12) with a $\delta \nu_{02} \sim \nu_{\rm s}$, leading to a superposition of the split components of the l=2 and the l=0 modes. The widths are chosen particularly large to increase the difficulty of separating the neighboring modes (scenario 3 to 5);
4.
scenario 6 is designed to test the algorithm in the case of an overinterpretation of the spectrum: only l=0 and l=1 modes are included in the simulated spectrum but mode indentifications including l=2 modes are tested (thus, four models are tested: the two possible mode identifications A and B, with and without l=2).
The different priors are those described in Sect. 3.3 and the typical values are summarized in Table 2. For scenarios 1 and 2, we suppose that the mean value of the rotational splitting and the small separation is known and we set the standard deviation for these parameters at respectively 10% and 50% of their prior mean value. In scenarios 3 and 5, we use a small separation prior that is not centered on the real value, in order to evaluate the influence of a biased prior choice.

Table 1:   Input parameters used for the simulated spectra.

Table 2:   Priors used for the analysis of the simulated spectra for the inclination angle i, the rotational splitting $\nu _{\rm s}$, the large separation $\Delta \nu $ (Eq. (2)) and the small separation $\langle \delta\nu_{02}\rangle_{\rm th}$ (Eq. (15)).

  4.2 Simulation results

Table 3:   Top: model odds ratio $O_{M_{\rm A},M_{\rm B}}$ for scenario 1 to 5 (which include l=2 modes). Bottom: odds ratio for scenario 6, comparing 4 models (identifications A and B, with or without l=2).

For each scenario, we acquire 1 000 000 samples after a burn-in phase, during which the MCMC algorithm progresses towards the values of interest (i.e. with significant probabilities) while the acceptance rate is adjusted.

For scenarios 1 to 3, the mode identification is correct with a confidence of more than 99.99%. The odds ratio is systematically in favor of the correct identification even for low signal-to-noise conditions. We remark that the l=0-2 modes overlap and the large width $\Gamma$ strongly decrease the odds ratio (see Table 3). For very large width, the odds ratio becomes ambiguous (scenario 5) and we are not able to decide which model is the most realistic. In scenario 6, the priors on the small separation $\delta \nu _{02}$ act as expected by penalizing the more complex models. The most probable model is the correct one but with a confidence of only about $64 \%$.

 \begin{figure}
\par\mbox{\subfigure{\includegraphics[angle=90,width=3.6cm,clip]{...
...figure{\includegraphics[angle=90,width=3.6cm,clip]{11657_H.ps} } }\end{figure} Figure 2:

l=2 mode relative height from simulation (scenario 3) for the correct identification  a) and for the wrong identification  b). In the correct identification, due to the l=0 and l=2 mode superposition, the relative height H2/H0 is underestimated by 30%. In the wrong identification, the distribution of the height follows the $\chi ^2$ distribution, as noise would.

Open with DEXTER

The observation of the frequency posterior distributions shows that an error on the model choice easily leads to multimodal pdf. The relative height Vl=2 is also strongly affected as it follows a $\chi ^2$ distribution with a 2 d.o.f. statistics in the wrong identification case, similarly to the noise distribution. In scenario 3, the retrieved mean median value of the small separation $\delta \nu _{02}$ is $6.1\pm 0.7~\mu$Hz, to be compared to the input value of $5.0~\mu$Hz. The prior and/or the l=0, l=2 mode overlap seems to shift the estimation by about 1  $\mu {\rm Hz}$. However, this offset remains smaller than the 2$\sigma $ interval.

\begin{figure}
{\vbox{
\hbox{
\epsfig{figure=11657_I.ps,width=4.0cm,angle=90}\ep...
...ps,width=4.0cm,angle=90}\hfill
\hspace{3mm}
\parbox[b]{50mm}{
}
}}}
\end{figure} Figure 3:

HD 49933 typical frequency, height, width and amplitude pdf (here for model $M^1_{\rm A}$). Frequencies and amplitudes generally have a Gaussian shape, while widths and heights follow a log-normal distribution.

Open with DEXTER

 \begin{figure}
\par\mbox{\subfigure{\epsfig{figure=11657_N.ps,width=4cm,angle=90...
...}
\subfigure{\epsfig{figure=11657_S.ps,width=4cm,angle=90} }}
\par
\end{figure} Figure 4:

Comparison of the output and input values for simulated spectra, for scenario 3 ( a) b) c)) and for scenario 4 ( d) e) f)). The box represent the 68.3% confidence interval (1$\sigma $ equivalent) centred on the median. The lines indicate a 95.5% confidence range (2$\sigma $ equivalent). The frequency difference and its error ( a) d)) for the l=0 and l=2 modes (black solid and grey dotted line, respectively) is represented as a function of the quantity n+l/3. Note the bias for the l=2 frequencies due to the prior choice in the scenario 3. For heights ( b) e)) and widths ( c) f)), we compare the input model value (dot-dashed line) to the inferred values. The noise background is also represented (dashed line). The inferred values are generally within the 95.5% confidence range. Note also the anti-correlation between height and width.

Open with DEXTER

The non-Gaussian shape of the pdf obtained in some cases (especially for heights and widths), and the multi-modal shape of the pdf, raise the problem of the estimator choice and of the error bar definition. In statistical terms and in terms of parameter estimation, it is interesting to refer to the median which is considered to be a good alternative to the Maximum a posteriori (MAP) inference (Gregory 2005b). The maximum of a pdf is a poor estimator as it does not provide global information about the pdf. It is also very difficult to evaluate the maximum of a pdf as the sampling may not be sufficient. To illustrate the inadequacy of the pdf maximum as a good estimator, we can consider the example of the $\chi ^2$ noise pdf. It is a decreasing exponential and the maximum will give a zero value whatever the noise level. Moreover, the width and mode height are known to not be distributed as a Gaussian.

Generally, the mode parameter pdfs are supposed has being Gaussian and as a consequence, the error bars are symmetrical and given at $\pm$1$\sigma $ which corresponds to a 68.3% confidence interval. As we often deal with more complex pdfs, we must provide either the pdf or multiple confidence indicators to summarize the function and describe how spread it is. To fully describe the variety of the pdfs, we decided to provide the confidence intervals at 68.3% and 95.5%, (corresponding to 1 and 2$\sigma $, respectively, for a Gaussian distribution) centred on the median. The confidence interval determination involves a cumulative probability function calculation. Thus, the tables of results contain 5 values per parameter: the median and asymmetrical $1 \sigma$ and $2 \sigma$ equivalent confidence ranges.

Following these definitions, most of the inputs used to build the different synthetic spectra are within a 95.5% confidence interval of the estimated values resulting from the analysis of these spectra. All of them exceed 99.7% (3$\sigma $ equivalent) confidence. The estimates of height and width are poor even in good signal-to-noise cases (see Fig. 4), whereas frequencies are very well determined, with a typical error tending to the spectrum resolution in the most favorable cases (scenario 1). A systematic underestimation ($\sim$$30\%$) of the relative height H2/H0 occurs in the case of l=0 and l=2 mode superposition.

5 The Bayesian analysis of HD 49933

The analysis of HD 49933 (a main sequence F5V star) was carried out using a 60-day time series of CoRoT observations (IRa01). This star has been observed previously from the ground (Mosser et al. 2005) and from space by CoRoT. A first analysis of the latter data has been carried out by Appourchaux et al. (2008) using a simple likelihood maximization (MLE) approach. These data were also studied by Benomar (2008) and Gruberbauer et al. (2009), using a Bayesian approach.

The main characteristics of this star can be summarized as follows: an estimated mass of $\sim$ $1.2~M_{\odot}$ (Mosser et al. 2005), radius $R/R_{\odot}$ of $1.34 \pm 0.06$ (Thévenin et al. 2006); $v \sin i$ estimates by different sources lead to values from $9.5 \pm 0.3$ km s-1 (Mosser et al. 2005) to 10.9 km s-1 (Solano et al. 2005). As reported by Appourchaux et al. (2008), the low frequency power spectrum shows a strong peak at 3.4  $\mu {\rm Hz}$ understood as the signature of the rotation period, caused by the transit of active regions across the surface of HD 49933. Then, from radius and $v \sin i$ estimates, the angle i between the line of sight and the stellar rotation axis appears to be between 22 and 30$^\circ$ while the asteroseismic study of Appourchaux et al. (2008) leads to an angle value between 50 and 62$^\circ$ (for these authors, a possible reason for this discrepancy is the probable superposition of the l=2 and l=0 modes).

Table 4:   Priors used for HD 49933 data analysis. $\nu _{\rm s}$ is the rotational splitting, i the star inclination angle, $\delta \nu _{02}$ the small frequency separation, $\Delta \nu $ the large separation. The value of $\epsilon $ and D0 depend on the model and set the position of the first prior frequency (see Eq. (2)).

Table 5:   Model odds ratio and estimated probability for HD 49933 spectrum analysis. The value of the matrix represents the odds ratio.

This information can be used to set some priors. The splitting prior is set by supposing that the rotation signature at 3.4  $\mu {\rm Hz}$ is representative of the internal stellar rotation. As in the simulation, the standard deviation is fixed at 10% of the expected value. Because of the inconsistency previously described for the angle i, we prefer not to impose a prior on i: we use a uniform prior between 0$^\circ$ and 90$^\circ$. As in Benomar (2008), the large separation prior is set to $85.6 \pm 0.8$  $\mu {\rm Hz}$and the small separation  $\langle \delta\nu_{02}\rangle_{\rm th}$ at $6 \pm 3$  $\mu {\rm Hz}$ using stellar model information. Priors on mode frequencies are Gaussian with a width of 8 $\mu$Hz. All of the priors are described in Sect. 3.3 and summarized in Table 4.

The power spectrum of the time series is computed once a second order polynomial has been removed from the original time series. This spectrum is normalized such that the total power integrated from zero to twice the Nyquist frequency equals the lightcurve variance. 15 mode profiles were fitted to this spectrum in a frequency interval from 1210 to 4000  $\mu {\rm Hz}$. The pdf estimations are based on 2 000 000 samples (for each Markov chain).

  \begin{figure}
\par\subfigure{\epsfig{figure=11657_T.ps,width=5cm,angle=90} }\quad
\subfigure{\epsfig{figure=11657_U.ps,width=5cm,angle=90} }\\
\par\end{figure} Figure 5:

HD 49933 echelle diagram for the model $M^1_{\rm A}$ corresponding to the most probable identification a) and for model $M^2_{\rm B}$ b). The black dots represents the l=0 modes, the grey dot-dashed ones the l=1 and the grey dashed ones the l=2.

Open with DEXTER
The complex content of the power spectrum of HD 49933 and the difficult mode identification require a discussion and a comparison of our results with those of Appourchaux et al. (2008), Benomar (2008) and Gruberbauer et al. (2009).

The present work differs from that of Gruberbauer et al. (2009) as they did not take into account the rotational splitting and the inclination angle and assumed a constant mode width whereas we have considered this hypothesis too strong and have included rotation and variable mode widths in our description. As already mentioned, Appourchaux et al. (2008) used a MLE approach. Benomar (2008) used a Bayesian approach but in a less sophisticated manner: for example, the question of the presence of l=2 modes was not tested in the process of mode identification.

 \begin{figure}
\par\epsfig{figure=11657_V.ps,width=5.5cm,angle=90}\par\epsfig{fi...
...}\par\subfigure{\epsfig{figure=11657_X.ps,width=5.5cm,angle=90} }
\end{figure} Figure 6:

Estimated heights, width and amplitude for model $M^1_{\rm A}$. The boxes represent the 68.3% confidence interval centred on the median. The lines indicate a 95.5% confidence interval.

Open with DEXTER

 \begin{figure}
\par\epsfig{figure=11657_Y.ps,width=5.5cm,angle=90}\par\epsfig{fi...
...5.5cm,angle=90}\par\epsfig{figure=11657_ZA.ps,width=5.5cm,angle=90}
\end{figure} Figure 7:

Estimated heights, width and amplitude for model $M^2_{\rm B}$. The boxes represent the 68.3% confidence interval centred on the median. The lines indicate a 95.5% confidence interval.

Open with DEXTER

Here, we compared 4 models using Eqs. (9) and (10): the two possible mode identifications and the presence or absence of the l=2 modes. The probabilities obtained are summarized in Table 5. The most probable model ( $P_{\rm R}(M^1_{\rm A}\vert y,I)=62.8\%$) corresponds to the same mode identification as offered by Appourchaux et al. (2008) but it does not include the l=2 modes. While the criteria used by Appourchaux et al. (2008) are based on the comparison of the maximum likelihoods, here we estimate the global probability which we consider more reliable and more conservative. Moreover Appourchaux et al. (2008) did not explore the possibility of the absence of the l=2 modes. Here, the most probable model ( $M^1_{\rm A}$) contains only the l=0 and l=1 modes. Occam's principle, naturally implemented in the Bayesian approach, penalizes the more complex model and thus favors models without l=2 modes. However, adding the l=2 modes to the identification B (comparison between $P(M^1_{\rm B}\vert y)$ and $P(M^2_{\rm B}\vert y)$) strongly reinforces its credibility, but not sufficiently so to definitely overrule the A identification.

 \begin{figure}
{
\par\mbox{\subfigure{\epsfig{figure=11657_A.ps,width=3cm,angle...
...subfigure{\epsfig{figure=11657_D.ps,width=3cm,angle=90} }}
\par
}
\end{figure} Figure 8:

Posterior probability density function for the rotational splitting and the inclination angle of HD 49933 for each possible mode identification including l=2 for model $M^1_{\rm A}$ ( a) and  b)) and model $M^2_{\rm B}$ ( c) and  d)). The rotational splitting prior is also represented with a grey-dashed line. The splitting prior influence seems stronger for model $M^2_{\rm B}$, as expected for a low inclination angle, the split component of modes having a very small height.

Open with DEXTER

Table 6:   HD 49933 frequencies and height-to-noise posterior estimates for l=0 and l=1 modes for the $M^1_{\rm A}$ model.

A complementary explanation of the difficulty of retrieving the l=2 could be that the background model may not be representative of the real nature of the granulation background. At the current signal-to-noise ratio, a good prior knowledge is critical for mode parameter estimation. The estimated model probability might also be strongly affected by the chosen background model, as this latter can bias the observed heights of the modes. Nevertheless, it is clear that a low value of the small separation combined with large linewidth and splitting introduces crosstalk between the l=0 and l=2 modes, especially for the frequency and height estimation, and renders the l=2 extraction difficult, if not impossible.

Nevertheless, notice that the odds ratio between the two most probable models is lower than 3 (cf. Table 5). Despite having the highest level of confidence, the $M^1_{\rm A}$ model is not strongly supported by the data. We also notice that the inclination angle for the identification B is much closer to the expected value from non seismic observations. It is likely that if we had imposed a prior on this parameter using $v \sin i$ and the estimate radius, we would have obtained a stronger probability for the identification B. The only model that can be reasonnably excluded is  $M^1_{\rm B}$.

The model choice does not significantly affect the estimation of the mean large separation. Its value is compatible at the $1 \sigma$ level with Appourchaux et al. (2008) ( $\Delta \nu=85.9 \pm 0.15$) whichever the model chosen here (see Table 9).

The estimated rotational splitting for the model $M^1_{\rm A}$ is slightly lower than for the model $M^2_{\rm B}$ but both are compatible with the results of MLE (Appourchaux et al. 2008). The posterior pdf of the rotational splitting for this latter model is probably dominated by its prior rather than its likelihood (see Fig. 8). This is not surprising, as a low stellar inclination leads to a very low height of the split component of all degrees l except for the m=0 component, making the splitting weakly constraining.

We have also expressed the mode energy pdf by the product $\sqrt{\pi h\Gamma}$ which we summarize in Table 8. Energies are relatively well constrained in comparison with heights and widths. Models without l=2 have slightly higher l=0 mode energy than the models with l=2 as the total observed energy remains constant. Figures 6 and 7 show the variations versus frequency of the heights, widths and energies for the models $M^1_{\rm A}$ and $M^2_{\rm B}$. We notice through the error bars that the pdf dispersion is higher than in our simulations for a comparable height-to-noise ratio, especially for the widths. Multiple factors can be the source of these differences. First, our simulations are certainly simpler than reality. Moreover the p mode model and the stellar noise model may be inappropriate. It is possible for example that the mode widths are poorly determined because of a strong asymmetry of the split components and of their profiles. Compared to Appourchaux et al. (2008), some differences can be noticed at high frequency for the width. In particular, we find again the plateau beyond 1700  $\mu {\rm Hz}$ for the width but with larger error bars.

To summarize, in most cases when the comparison is possible, the parameter estimates are within $1 \sigma$ error bars of those of Appourchaux et al. (2008) and Gruberbauer et al. (2009). However, here, the mode indentification is highly ambiguous and prevents a precise interpretation of individual mode frequencies. Nevertheless, some parameters do not show large differences from one identification to another, for example the mode width or the mode energy.

Table 7:   Height and width posterior estimates for HD 49933 for the most probable model. n is given for the l=0 modes.

Table 8:   Energy posterior estimates for HD 49933. n is given for the l=0 modes.

Table 9:   Global parameter posterior estimates for HD 49933. i represents the inclination angle of the star. $\nu _{\rm s}$ is the rotational splitting, Vl=1 and Vl=2 the relative height of l=1 and l=2 modes, respectively. $\Delta \nu $ represent the large separation.

 \begin{figure}
\par\subfigure{\epsfig{figure=11657_ZB.ps,width=6.2cm,height=18cm,angle=90} }\\
\par
\end{figure} Figure 9:

HD 49933 power spectrum with $M^1_{\rm A}$ model (identification A without l=2 modes) superimposed.The superimposed dashed line is the best fit according to the median value of the parameters.

Open with DEXTER

6 Conclusions and perspectives

We have presented a Bayesian analysis using an MCMC technique to infer mode parameters of solar-like oscillators, and applied it to the case of HD 49933. A global fitting strategy was adopted, taking into account the mode visibility dependence upon stellar inclination and the rotational mode splitting. The approach used here allows a global perspective of the fitted parameters thanks to the sampling of the probability density functions of these parameters and to the full implementation of our a priori physical knowledge of the object as a solar-type star.

The simulations show that this approach provides reliable parameter estimations with a high robustness at low signal-to-noise ratio. Indeed, the systematic exploration of parameter space allows the method to avoid the local maxima encountered in MLE or MAP and permits us to obtain global information about the pdf of the parameters for the spectrum model tested. One of the greatest difficulties encountered is the identification of the observed peaks in the Fourier spectrum with modes of given degree l. The pdf sampling using parallel tempering gives the most probable model thanks to the global likelihood calculation. As explained in Sect. 5, the estimated probability for each possible mode identification is more ambiguous than was shown by Appourchaux et al. (2008). In a statistical view, the best model is the mode identification A with only the l=0 and l=1 modes ( $M^1_{\rm A}$, cf. Table 5). Gruberbauer et al. (2009) also favor a model with these modes, but using a more heuristic approach, without taking into account the effects of rotation on the spectrum (and thus the effect of a star's inclination angle on the mode components relative heights). However, the two models with modes of degree up to 2 ( $M^2_{\rm B}$ and $M^2_{\rm A}$) have comparable probabilities. The identification B, with only l=0 and l=1 modes ( $M^1_{\rm B}$), is highly improbable. The low probability differences between the 3 most probable models prevents any definitive conclusion on the mode identification. The model $M^{\rm A}_1$ is respectively only 2.7 and 4.5 times more probable than $M^2_{\rm B}$ and $M^2_{\rm A}$. We consider these factors not large enough to reject these models including l=2 modes. The model $M^1_{\rm B}$ being 110 times less probable than $M^1_{\rm A}$, it can be definitely rejected. We have to emphasize that even if statistically speaking the most probable model does not include l=2 modes, it is unlikely in a physical view to not have l=2 modes excited (while l=1 are) for a solar like star in the main sequence. It would have been possible to take this hypothesis into account in the priors, but here, we have prefered to consider all the models as equally probable.

 \begin{figure}
\par\subfigure{\epsfig{figure=11657_ZC.ps,width=6.2cm,height=18cm,angle=90} }
\par
\end{figure} Figure 10:

HD 49933 power spectrum and echelle diagram for $M^2_{\rm B}$ model (identification B with l=2 modes). The superimposed dashed line is the best fit according to the median value of the parameters.

Open with DEXTER

For example, these probabilities result from no informative a priori on the star inclination angle and on the relatives heights of the modes. However, one should note that identification B leads to a star inclination angle coherent with independent determinations of this angle (see Sect. 5) and also to relative heights more compatible with the expected values from solar observations. We have to stress that if we had applied an informative prior on the angle and/or on the relative heights, then it is possible that the model $M^2_{\rm B}$ would have overruled the model $M^1_{\rm A}$. Moreover we remind the reader that the estimated error on the determination of the probability is of $10\%{-}20\%$.

The present analysis could certainly be extended, for example by quantitatively evaluating the impact of the background model choice on the identification. The probabilistic Bayesian approach used here could be used to evaluate the relevance of the background description.

However, we think that, at the present stage, MCMC sampling is mainly useful in low signal-to-noise conditions to bring more information than classical approaches like MLE or MAP techniques do.

Acknowledgements
O.B. thanks Marie-Jo Goupil and Marc-Antoine Dupret for discussions about stellar theoretical modeling, and Reza Samadi for helpful discussions. Many thanks from all of us to John Leibacher for very instructive comments and Lucinda Croft for her help.

  Appendix A: Technical aspects

  A.1 Basic algorithm description

The Metropolis-Hasting (MH) algorithm  (Hastings 1970; Metropolis et al. 1953) is the most commonly used MCMC algorithm because of its ease of implementation. We have already described the main advantages of the MCMC algorithm for the sampling of the space of the fitted parameters. Here we address its inherent difficulties and the solutions to manage these difficulties.

Considering $\pi(\vec{\theta})$, the distribution we want to sample (also called ``distribution of interest''), we define a probability transition function $\alpha({\vec{\theta}_{i+1}},{\vec{\theta}_{i}})$ from an initial parameter set $\vec{\theta}_{i}$ to a new proposed set ${\vec{\theta}_{i+1}}$, in order to sample the space of parameters

\begin{displaymath}\alpha({\vec{\theta}_{i+1}},{\vec{\theta}_{i}})= {\rm min}\le...
...}_{i}})q({\vec{\theta}_{i}}\vert{\vec{\theta}_{i+1}})}\right\}
\end{displaymath} (A.1)

where $q({\vec{\theta}_{i}}\vert{\vec{\theta}_{i+1}})$ is the proposal transition pdf. Hence, it is related to the step size of the transition. A symmetrical proposal pdf implies that $\frac{q({\vec{\theta}_{i+1}}\vert{\vec{\theta}_{i}})}{q({\vec{\theta}_{i}}\vert{\vec{\theta}_{i+1}})}$ reduces to 1. Usually q is chosen to be a Gaussian. The quantity $\alpha$ is compared to a uniform random variable $u \in [0,1]$. If the ratio $\frac{\pi({\vec{\theta}_{i+1}}) q({\vec{\theta}_{i+1}}\vert{\vec{\theta}_{i}})} {\pi({\vec{\theta}_{i}}) q({\vec{\theta}_{i}}\vert{\vec{\theta}_{i+1}})}$ is greater than 1, the transition from $\vec{\theta}_{i}$ to ${\vec{\theta}_{i+1}}$ is always accepted. Otherwise, the transition is accepted or not depending on the value of the ratio which is compared to a threshold value.

  A.2 Gaussian proposal covariance matrix adjustment

In the present analysis, we used a Gaussian proposal distribution $q(\vec{\theta})$: the trial set of parameter $\theta$ is created by the scheme ${\vec{\theta}_{i+1}} \sim {\vec{\theta}_{i}}+\mathcal{N}(0,\Sigma)$. Then, an important task for a correct sampling consists of adjusting the covariance matrix $\Sigma^{-1}$ of this multidimensional Gaussian function (if all parameters were independent, only the diagonal terms would be non-zero). Indeed, the acceptance rate depends directly on $\Sigma^{-1}$. If $\alpha({\vec{\theta}_{i+1}},{\vec{\theta}_{i}})$ is too low then few samples are accepted and the typical computing time to sample the distribution of interest will be large. Conversely, a high $\alpha({\vec{\theta}_{i+1}},{\vec{\theta}_{i}})$ introduces a high correlation between two successive samples and the sampling process can be easily trapped in local maxima of the distribution of interest. Roberts et al. (1997) empirically show for high dimensional models that the best sampling is obtained for an acceptance rate of around 25%. Many authors recommend a rate between 20% and 50%.

The manual adjustment of such a matrix is easy if we deal with a few parameters but becomes difficult to manage in high dimensional cases. A way to automate the matrix determination process is to use Adaptive MCMC. Adaptive MCMC runs a second process, in parallel to the sampling process, that adapts the covariance matrix continuously. Many Adaptive MCMCs have been proposed. The one used in this paper was inspired by the work of Atchadé (2006). It uses a Robbins-Monro type algorithm, widely used to calculate numerical solutions of differential equations (Robbins & Monro 1951).

Initially, $\Sigma^{-1}$ is defined as diagonal with coefficient $a_{i,j}={\delta_{i,j}}/{\sigma^2_{i,j}}$ where $\delta_{i,j}$ the Kronecker symbol and $\sigma^2_{i,j}$ the initial variance for the ith element of the parameter vector. The initial parameter values are set at the center of the prior range and $\sigma $ at 10% of their values. During the acceptance-rate adapting phase (or learning phase), even the non-diagonal terms are adjusted, taking into account a hypothetical correlation between parameters to enhance the sampling. Despite its name, an Adaptive MCMC is not a Markovian process as the i+1 state depends not only on the previous one i, but also on all previous states i-1,i-2,.... There is no guarantee that the sampling will be done according to the distribution of interest. To avoid this problem, we simply stop the learning process when the acceptance rate is reasonable, i.e. is within 20% of the optimal acceptance rate.

  A.3 Parallel tempering

The MH algorithm can encounter difficulties for multimodal distributions, in particular if the maxima are far from one other. In order to efficiently sample the entire parameter space, a solution was proposed by Jennison (1993). It consists of the launching of multiple parallel MCMC processes. Each process has a different distribution according to a power law $\pi_k(\vec{\theta}\vert y,M_j,I) \propto \pi(y\vert\vec{\theta},M_j,I)^{1/T_k}$ where Tk is called the temperature of the chain k. The higher the temperature, the flatter is the distribution. In a flat distribution, the potential barrier between distant maxima is lowered and the transitions in the parameter space are easier.

At each iteration i, a random mix is made among neighboring chains: two chains k and k+1 are mixed with the probability

\begin{displaymath}\alpha({\vec{\theta}_{(i)}^{k}},{\vec{\theta}_{(i)}^{k+1}})={...
...i_{k+1}({\vec{\theta}_{(i)}^{k+1}}\vert y,M_j,I)}\right\}\cdot
\end{displaymath} (A.2)

The temperatures follow a geometric law: $T_k=\lambda^{k-1}$, where $\lambda$ is a scale factor. k is an integer such as $1\le k \le C$, C being the number of parallel chains. During our MCMC simulations, we decided to use 12 parallel chains and a transition rate adjusted around $50\%$.

Thanks to this numerical method, MCMC simulations can be used to evaluate the odds ratio (Eq. (9)) through the calculation of Eq. (11). As shown by Gregory (2005b),

\begin{displaymath}\ln[P(y\vert M_j,I)]=\int_0^1 \langle \ln[\pi(y\vert M_j,\vec{\theta},I)]\rangle_{\beta}{\rm d}\beta
\end{displaymath} (A.3)

where $\beta=\frac{1}{T}$. With a discrete set of temperatures $\beta_k$, we have ${\langle\ln[\pi(y\vert M_j,\vec{\theta},I)]\rangle_{\beta}=\frac{1}{N}\sum \ln[\pi(D\vert M_j,{\vec{\theta}_{\beta}},I)]}$, the mean conditional likelihood.

When $T_k \rightarrow \infty$, the likelihood distribution is flat and the target distribution $\pi_k(\vec{\theta}\vert y,M_j,I)$ is dominated by the priors. Conversely, Tk=1 corresponds to the distribution of interest.

To summarize, we proceed as follows: for each chain at temperature $T_k=\frac{1}{\beta_k}$, we calculate the mean value of the conditional likelihood and then we evaluate the log probability $\ln[P(y\vert M_j,I)]$ by generating and integrating an interpolated function of $\langle \ln[P(y\vert M_j,\vec{\theta},I)]\rangle_{\beta}$.

A.4 Possible improvements

From a technical point of view, improvements can be made. The probability calculation is a crucial aspect which has only been sketched. Indeed we use a random walk algorithm and the probability is subject to fluctuations as a function of the number of samples, the number of parallel chains and of their distribution (depending on the chosen temperature set). It is necessary to investigate these three points to make sure of the robustness of the probability calculation.

Concerning the execution speed, the Bayesian analysis coupled with a MCMC sampling for large data sets and a large number of parameters is costly in terms of computation time. In the present analysis of HD 49933, the computation time is of the order of a week per model (on a personal computer). This heavy computation limits the degree of complexity possible of the models investigated to describe the observed spectrum.

References

Online Material

Table 10:   HD 49933 frequencies and height-to-noise posterior estimates for l=0 and l=1 modes for the $M^2_{\rm A}$ model. n is given supposing $\epsilon \in [1,2]$. H/N represents the height-to-noise ratio in the power spectrum.

Table 11:   Height and width posterior estimates for HD 49933 for the model $M^2_{\rm A}$. n is given for the l=0 modes.

Table 12:   Energy posterior estimates for HD 49933, model $M^2_{\rm A}$. n is given for the l=0 modes.

Table 13:   HD 49933 frequencies and height-to-noise posterior estimates for the l=0 and l=1 modes for the $M^2_{\rm B}$ model. n is given supposing $\epsilon \in [1,2]$. H/N represents the height-to-noise ratio in the power spectrum.

Table 14:   Height and width posterior estimates for HD 49933 for the model $M^2_{\rm B}$. n is given for the l=0 modes.

Table 15:   Energy posterior estimates for HD 49933, model $M^2_{\rm B}$. n is given for the l=0 modes.

 \begin{figure}
\par\subfigure{\epsfig{figure=11657_ZD.ps,width=6.cm,height=18cm,angle=90} }
\par
\end{figure} Figure 11:

HD 49933 power spectrum and echelle diagram for $M^2_{\rm A}$ model (identification A with l=2 modes). The superimposed dashed line is the best fit according to the median value of the parameters.

Open with DEXTER

 \begin{figure}
\par\subfigure{\epsfig{figure=11657_ZE.ps,width=6.cm,height=18cm,angle=90} }
\par
\end{figure} Figure 12:

HD 49933 power spectrum and echelle diagram for $M^1_{\rm B}$ model (identification B without l=2 modes). The superimposed dashed line is the best fit according to the median value of the parameters.

Open with DEXTER


Footnotes

... approach[*]
The CoRoT space mission, launched on 2006 December 27, was developed and is operated by the CNES, with participation of the Science Programs of ESA, ESA's RSSD, Austria, Belgium, Brazil, Germany and Spain.
...[*]
Tables 10 to 15 and Figs. 11, 12 are only available in electronic form at http://www.aanda.org

All Tables

Table 1:   Input parameters used for the simulated spectra.

Table 2:   Priors used for the analysis of the simulated spectra for the inclination angle i, the rotational splitting $\nu _{\rm s}$, the large separation $\Delta \nu $ (Eq. (2)) and the small separation $\langle \delta\nu_{02}\rangle_{\rm th}$ (Eq. (15)).

Table 3:   Top: model odds ratio $O_{M_{\rm A},M_{\rm B}}$ for scenario 1 to 5 (which include l=2 modes). Bottom: odds ratio for scenario 6, comparing 4 models (identifications A and B, with or without l=2).

Table 4:   Priors used for HD 49933 data analysis. $\nu _{\rm s}$ is the rotational splitting, i the star inclination angle, $\delta \nu _{02}$ the small frequency separation, $\Delta \nu $ the large separation. The value of $\epsilon $ and D0 depend on the model and set the position of the first prior frequency (see Eq. (2)).

Table 5:   Model odds ratio and estimated probability for HD 49933 spectrum analysis. The value of the matrix represents the odds ratio.

Table 6:   HD 49933 frequencies and height-to-noise posterior estimates for l=0 and l=1 modes for the $M^1_{\rm A}$ model.

Table 7:   Height and width posterior estimates for HD 49933 for the most probable model. n is given for the l=0 modes.

Table 8:   Energy posterior estimates for HD 49933. n is given for the l=0 modes.

Table 9:   Global parameter posterior estimates for HD 49933. i represents the inclination angle of the star. $\nu _{\rm s}$ is the rotational splitting, Vl=1 and Vl=2 the relative height of l=1 and l=2 modes, respectively. $\Delta \nu $ represent the large separation.

Table 10:   HD 49933 frequencies and height-to-noise posterior estimates for l=0 and l=1 modes for the $M^2_{\rm A}$ model. n is given supposing $\epsilon \in [1,2]$. H/N represents the height-to-noise ratio in the power spectrum.

Table 11:   Height and width posterior estimates for HD 49933 for the model $M^2_{\rm A}$. n is given for the l=0 modes.

Table 12:   Energy posterior estimates for HD 49933, model $M^2_{\rm A}$. n is given for the l=0 modes.

Table 13:   HD 49933 frequencies and height-to-noise posterior estimates for the l=0 and l=1 modes for the $M^2_{\rm B}$ model. n is given supposing $\epsilon \in [1,2]$. H/N represents the height-to-noise ratio in the power spectrum.

Table 14:   Height and width posterior estimates for HD 49933 for the model $M^2_{\rm B}$. n is given for the l=0 modes.

Table 15:   Energy posterior estimates for HD 49933, model $M^2_{\rm B}$. n is given for the l=0 modes.

All Figures

  \begin{figure}
\par\includegraphics[angle=90,width=7cm,clip]{11657_E.ps}\par\vspace*{2mm}
\includegraphics[angle=90,width=7cm,clip]{11657_F.ps}
\end{figure} Figure 1:

a) Smoothed power spectrum of HD 49933, showing activity (dot-dashed line) and granulation (solid line) backgrounds and stellar modes.The modes are first modeled by a Gaussian (solid blue curve). b) Mode frequency range with frequency priors (l=0: red dash line; l=1: green dot line) for the mode identification A.

Open with DEXTER
In the text

  \begin{figure}
\par\mbox{\subfigure{\includegraphics[angle=90,width=3.6cm,clip]{...
...figure{\includegraphics[angle=90,width=3.6cm,clip]{11657_H.ps} } }\end{figure} Figure 2:

l=2 mode relative height from simulation (scenario 3) for the correct identification  a) and for the wrong identification  b). In the correct identification, due to the l=0 and l=2 mode superposition, the relative height H2/H0 is underestimated by 30%. In the wrong identification, the distribution of the height follows the $\chi ^2$ distribution, as noise would.

Open with DEXTER
In the text

 \begin{figure}
{\vbox{
\hbox{
\epsfig{figure=11657_I.ps,width=4.0cm,angle=90}\ep...
...ps,width=4.0cm,angle=90}\hfill
\hspace{3mm}
\parbox[b]{50mm}{
}
}}}
\end{figure} Figure 3:

HD 49933 typical frequency, height, width and amplitude pdf (here for model $M^1_{\rm A}$). Frequencies and amplitudes generally have a Gaussian shape, while widths and heights follow a log-normal distribution.

Open with DEXTER
In the text

  \begin{figure}
\par\mbox{\subfigure{\epsfig{figure=11657_N.ps,width=4cm,angle=90...
...}
\subfigure{\epsfig{figure=11657_S.ps,width=4cm,angle=90} }}
\par
\end{figure} Figure 4:

Comparison of the output and input values for simulated spectra, for scenario 3 ( a) b) c)) and for scenario 4 ( d) e) f)). The box represent the 68.3% confidence interval (1$\sigma $ equivalent) centred on the median. The lines indicate a 95.5% confidence range (2$\sigma $ equivalent). The frequency difference and its error ( a) d)) for the l=0 and l=2 modes (black solid and grey dotted line, respectively) is represented as a function of the quantity n+l/3. Note the bias for the l=2 frequencies due to the prior choice in the scenario 3. For heights ( b) e)) and widths ( c) f)), we compare the input model value (dot-dashed line) to the inferred values. The noise background is also represented (dashed line). The inferred values are generally within the 95.5% confidence range. Note also the anti-correlation between height and width.

Open with DEXTER
In the text

   \begin{figure}
\par\subfigure{\epsfig{figure=11657_T.ps,width=5cm,angle=90} }\quad
\subfigure{\epsfig{figure=11657_U.ps,width=5cm,angle=90} }\\
\par\end{figure} Figure 5:

HD 49933 echelle diagram for the model $M^1_{\rm A}$ corresponding to the most probable identification a) and for model $M^2_{\rm B}$ b). The black dots represents the l=0 modes, the grey dot-dashed ones the l=1 and the grey dashed ones the l=2.

Open with DEXTER
In the text

  \begin{figure}
\par\epsfig{figure=11657_V.ps,width=5.5cm,angle=90}\par\epsfig{fi...
...}\par\subfigure{\epsfig{figure=11657_X.ps,width=5.5cm,angle=90} }
\end{figure} Figure 6:

Estimated heights, width and amplitude for model $M^1_{\rm A}$. The boxes represent the 68.3% confidence interval centred on the median. The lines indicate a 95.5% confidence interval.

Open with DEXTER
In the text

  \begin{figure}
\par\epsfig{figure=11657_Y.ps,width=5.5cm,angle=90}\par\epsfig{fi...
...5.5cm,angle=90}\par\epsfig{figure=11657_ZA.ps,width=5.5cm,angle=90}
\end{figure} Figure 7:

Estimated heights, width and amplitude for model $M^2_{\rm B}$. The boxes represent the 68.3% confidence interval centred on the median. The lines indicate a 95.5% confidence interval.

Open with DEXTER
In the text

  \begin{figure}
{
\par\mbox{\subfigure{\epsfig{figure=11657_A.ps,width=3cm,angle...
...subfigure{\epsfig{figure=11657_D.ps,width=3cm,angle=90} }}
\par
}
\end{figure} Figure 8:

Posterior probability density function for the rotational splitting and the inclination angle of HD 49933 for each possible mode identification including l=2 for model $M^1_{\rm A}$ ( a) and  b)) and model $M^2_{\rm B}$ ( c) and  d)). The rotational splitting prior is also represented with a grey-dashed line. The splitting prior influence seems stronger for model $M^2_{\rm B}$, as expected for a low inclination angle, the split component of modes having a very small height.

Open with DEXTER
In the text

  \begin{figure}
\par\subfigure{\epsfig{figure=11657_ZB.ps,width=6.2cm,height=18cm,angle=90} }\\
\par
\end{figure} Figure 9:

HD 49933 power spectrum with $M^1_{\rm A}$ model (identification A without l=2 modes) superimposed.The superimposed dashed line is the best fit according to the median value of the parameters.

Open with DEXTER
In the text

  \begin{figure}
\par\subfigure{\epsfig{figure=11657_ZC.ps,width=6.2cm,height=18cm,angle=90} }
\par
\end{figure} Figure 10:

HD 49933 power spectrum and echelle diagram for $M^2_{\rm B}$ model (identification B with l=2 modes). The superimposed dashed line is the best fit according to the median value of the parameters.

Open with DEXTER
In the text

  \begin{figure}
\par\subfigure{\epsfig{figure=11657_ZD.ps,width=6.cm,height=18cm,angle=90} }
\par
\end{figure} Figure 11:

HD 49933 power spectrum and echelle diagram for $M^2_{\rm A}$ model (identification A with l=2 modes). The superimposed dashed line is the best fit according to the median value of the parameters.

Open with DEXTER
In the text

  \begin{figure}
\par\subfigure{\epsfig{figure=11657_ZE.ps,width=6.cm,height=18cm,angle=90} }
\par
\end{figure} Figure 12:

HD 49933 power spectrum and echelle diagram for $M^1_{\rm B}$ model (identification B without l=2 modes). The superimposed dashed line is the best fit according to the median value of the parameters.

Open with DEXTER
In the text


Copyright ESO 2009

Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.

Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.

Initial download of the metrics may take a while.