A&A 421, 281-294 (2004)
DOI: 10.1051/0004-6361:20040127

Estimating stellar parameters from spectra[*]

I. Goodness-of-fit parameters and lack-of-fit test

L. Decin1,[*] - Z. Shkedy2 - G. Molenberghs 2 - M. Aerts 2 - C. Aerts 1

1 - Department of Physics and Astronomy, Institute for Astronomy, K.U. Leuven, K.U. Leuven, Celestijnenlaan 200B, 3001 Leuven, Belgium
2 - Biostatistics, Center for Statistics, Limburgs Universitair Centrum, Universitaire Campus, Building D, 3590 Diepenbeek, Belgium

Received 22 April 2003/ Accepted 18 January 2004

Estimating stellar parameters from spectrophotometric data is a key tool in the study of stellar structure and stellar evolution. Although many methods have been proposed to estimate stellar parameters from ultraviolet (UV), optical and infrared (IR) data using low, medium or high-resolution observational data of the target(s), only a few address the problem of the uncertainties in the stellar parameters. This information is critical for a meaningful comparison of the derived parameters with results obtained from other data and/or methods. Here we present a frequentist method to estimate these uncertainties. We demonstrate that the combined use of both a local and a global goodness-of-fit parameter alters the uncertainty intervals as determined from the use of only one of these deviation estimating parameters. This technique using both goodness-of-fit parameters is applied to the infrared 2.38-4.08 $\mu$m ISO-SWS data (Infrared Space Observatory - Short Wavelength Spectrometer) of $\alpha $ Boo, yielding an effective temperature range from 4160 K to 4300 K, a logarithm of the gravity range from 1.35 to 1.65 dex and a metallicity from -0.30 to 0.00 dex. However, using a lack-of-fit test, it is shown that even the "best'' theoretical models are still not capable of capturing all the structure in the data, and this is due to our incomplete knowledge and modelling of the full physical stellar structure or due to problems in the data reduction process.

Key words: methods: data analysis - methods: statistical - techniques: spectroscopic - stars: fundamental parameters - stars: individual: Alpha Boo

1 Introduction

Everything we know about the structure of stellar objects being studied is the result of a comparison between theoretical predictions and stellar observations. To give realistic answers to many physical questions induced by stellar phenomena, not only are accurate theoretical models indispensable, but also realistic uncertainty estimates on the parameters being deduced are required. In astronomy, uncertainty estimates (if assessed at all) are still often based on too simple a study of the sample of observables, resulting in too "optimistic'' error bars. A realistic knowledge of the uncertainties is however crucial and has to be taken into account if one, for example, wants to test the proposed physical mechanism explaining certain phenomena (see, e.g., De Bruyne et al. 2003). The present paper is the first of a series devoted to the development and description of a statistical method to assess the uncertainties of stellar atmospheric parameters deduced from astronomical spectra.

One way of estimating stellar atmospheric parameters and drawing inferences from them consists of comparing the observed spectrum of the target being studied with a collection of synthetic spectra (e.g., Decin et al. 2000, for an application to cool stars; Bailer-Jones 2000, for an application using neural networks). Depending on the quality, the resolution and the wavelength coverage of the data, different stellar parameters can be traced. In this paper we focus on the three most important stellar parameters for the model structure: the effective temperature  ${T}_{\rm eff}$, the gravity g and the metallicity [Fe/H]. Other parameters such as the abundance pattern or the microturbulence are treated as known. Let $\Omega = ($ ${T}_{\rm eff}$, $\log~g$, [Fe/H]) present the parameters of the stellar atmosphere. A synthetic spectrum, $\theta^{(m)}$, $m=1,2,\dots,M$, is calculated for specific values of the parameters, $\Omega^{(m)}=($ ${T}_{\rm eff}$(m), $\log~g^{(m)}$, [Fe/H](m)) and compared to the observed spectrum. When the synthetic spectrum and the observed spectrum agree the parameters of the stellar atmosphere are assumed to be known. The first question which arises when applying this kind of method is how to measure the goodness-of-fit between observational and theoretical data. A second - equally important - question is then how to assess the uncertainties on the derived stellar parameters. These questions (and answers) become even more complicated when we want to take measurement errors into account.

A search of the astronomical literature reveals that statistical tests are often restricted to local deviation estimating parameters, as e.g. the ordinary least square method (OLS) (or derivatives from it) (Bailer-Jones 2000; Katz et al. 1998; Erspamer & North 2002; Valenti & Piskunov 1996). The first goal of this paper is to demonstrate that parameter ranges as determined from the use of a local goodness-of-fit parameter can be optimized by combining a local with a global goodness-of-fit parameter (Sects. 3 and 4). In both methods, the estimate for $\Omega$ is the value of  $\Omega^{(m)}$ which minimises the proposed goodness-of-fit parameter. Using the results of this first part of the study, our second goal is to discuss (Sect. 5) how the differences between observed and synthetic spectra can be used to check the appropriateness of a proposed set of stellar parameters - a step often neglected by astronomers. This qualification can be performed using lack-of-fit tests.

The methodology developed in this paper has broad applications in astronomy as it relies only on observed spectra and on theoretical predictions thereof. To test our method, we have chosen to apply it to the spectra of a stellar target of which the basic stellar parameters are already very well known from successful comparisons with models. Accurate estimations of stellar parameters for cool standard stars were done using data of the ISO-SWS (Decin et al. 2003a-c). We therefore have chosen to illustrate our general methodology on the ISO-SWS observations of one such star, the case study of the K2IIIp giant Alpha Bootis (Arcturus, HD 124897).

Before doing the analysis in Sects. 3-5, we give in Sect. 2 a description of the observational and theoretical data on which the method will be tested. The results of both Part I (Sects. 3 and 4) and Part II (Sect. 5) are discussed in Sects. 4.3 and 5.3 respectively. We end with a summary and some conclusions in the last section, Sect. 6. How to treat observational errors in this kind of study will be discussed in a forthcoming paper of this series.

2 Observational and synthetic data

This section describes the used observational ISO-SWS and theoretical data. The grid of synthetic spectra calculated for the test-case $\alpha $ Boo is specified in Sect. 2.3.

2.1 Observational data y

The observational data for this study consist of near-infrared (2.38-4.08 $\mu$m) spectra of $\alpha $ Boo observed with the SWS (Short Wavelength Spectrometer, de Graauw et al. 1996) on board ISO (Infrared Space Observatory, Kessler et al. 1996). The spectrometer was used in the SWS observing mode AOT01 (=a single up-down scan for each aperture with four possible scan speeds at degraded resolution) with scanner speed 4, resulting in a resolving power of $\sim$1500. The observation lasted for 6538 s and was performed during revolution 452[*].

Table 1: Resolution and factors used to shift the sub-bands.

The reduction was made using the SWS Interactive Analysis Package IA $^{\rm {3}}$ (de Graauw et al. 1996) using calibration files and procedures equivalent with pipeline version 10.0. Further data processing consisted of bad data removal ( $\sigma = 2.0$), aligning of the 12 detectors to their average level. Since the grid of observational pixel values does not have a fixed resolution, we first want to "summarise'' the observational pixel values, and then make a comparison between this summary (denoted as y) and a synthetic spectrum ($\theta $) with the same resolution. The standard way to resample the input data is by "rebinning''. To summarise the ISO-SWS data to a fixed resolution we have applied a flux conserving non-parametric rebinning method - i.e. for each bin the flux value is calculated using the trapezoidal rule - with an oversampling of 4. This means that the resolution bin used is 4 times the grid separation determined by the resolution for a specific wavelength range of the ISO-SWS data. To fully recover the intervening flux values it can be shown in the context of "rectangular filtering'' that taking 4 points in an interval of length $\Delta t$ is enough to optimise the signal-to-noise (S/N) ratio (Bracewell 1985). The rebinning used in the data reduction procedure introduces a correlation between the data point values. The appropriate resolving power was taken to be the most conservative resolving power as determined by Lorente in Leech et al. (2002) (see Table 1), with the exception being band 1A[*] for which this value has been changed from 1500 to 1300 (Decin et al. 2003b).

The individual sub-band spectra can show jumps in flux level at the band-edges when combining them into a single spectrum. These band-to-band discontinuities can have several causes: uncertainties in flux calibration, the low responsivity at the band edges, pointing errors, and a problematic dark current subtraction in combination with the RSRF (Relative Spectral Response Function) correction, from which the pointing errors are believed to have the largest impact for this high-flux observation. Hence, the individual sub-bands were multiplied by a factor to construct a smooth spectrum (see Table 1). These factors were determined using the SED (Spectral Energy Distribution) of $\alpha $ Boo as constructed in Decin et al. (2003b) as a reference. The estimated $1\sigma$ uncertainty on these factors is 10% (Leech et al. 2002).

2.2 Synthetic data $\theta $

The synthetic spectra used in this study have been generated using model photospheres calculated with the MARCS code, version May 1998. This version is a major update of the MARCS model-photosphere programs first developed by Gustafsson et al. (1975), and further improved by, e.g.  Plez et al. (1992), Jørgensen et al. (1992), Edvardsson et al. (1993).

The common assumption of spherical stratification in homogeneous stationary layers, hydrostatic equilibrium and Local Thermodynamic Equilibrium (LTE) were made. Energy conservation was required for radiative and convective flux, where the energy transport due to convection was treated through a local mixing-length theory. The mixing-length l was chosen as $1.5~{{H\rm _p}}$, with ${H\rm _p}$ the pressure scale height. Turbulent pressure was neglected. The reliability of these assumptions is discussed in Plez et al. (1992). The continuous absorption as well as the new models will be fully described in a series of forth-coming papers (Gustafsson et al.; Jørg ensen et al.; Plez et al., all in preparation).

Using the computed model atmospheres, the synthetic spectra were generated by solving the radiative transfer at a high wavelength resolution ( $\Delta t \sim 1~{\rm {km~s^{-1}}}$, corresponding to  $t/\Delta t \sim 330~000$). With a microturbulent velocity $\xi_t \sim 2~{\rm {km~s^{-1}}}$, this means we are sure to sample all lines in the atomic and molecular database in the generation of the synthetic spectrum. This is necessary so as not to overestimate the absorption in regions with a high line density, or to underestimate it in regions with a low line density (Ryde & Eriksson 2002). For the line opacity in the ISO-SWS range a database of infrared lines including atoms and molecules has been prepared. For the molecular lines, the same data have been used as in Decin et al. (2000). The accuracy and completeness of these line lists are discussed in Decin (2000). For the atomic transitions, the newly generated atomic linelist of J. Sauval (priv. comm.) based on the FTS-ATMOS (Atmospheric Trace Molecule Spectroscopy) spectrum of the Sun (Geller 1992; Farmer & Norton 1989) has been included. The emergent synthetic spectra are then convolved with a Gaussian instrumental profile with the same resolution as the ISO-SWS sub-bands (see Table 1).

2.3 Synthetic data for $\alpha $ Boo

As described in the introduction, we will calculate a grid of synthetic spectra over discrete values in the parameter vector $\Omega$. In the following sections, we will use this grid over the vector parameter $\Omega$ to estimate $\Omega$ with  $\Omega^{(*)}$ for which the synthetic spectrum  $\theta^{(*)}$ is the "closest'' to the observed spectrum (see Sects. 3 and 4).

Based on the results in Decin et al. (2003a), 125 spectra have been calculated for $\alpha $ Boo, with parameter ranges:
${T}_{\rm eff}$ : 4160 K, 4230 K, 4300 K, 4370 K, 4440 K
$\log~g$ : 1.20, 1.35, 1.50, 1.65, 1.80
[Fe/H] : 0.00, -0.15, -0.30, -0.50, -0.70.

The other parameters needed to compute a proper spherical symmetric spectrum - the mass, the abundances of C, N, O, Mg, and Si, the microturbulent velocity, and the ${\rm ^{12}C/^{13}C}$-ratio - were kept fixed, with values as determined in Decin et al. (2003a). For a detailed comparison between the stellar parameters as deduced in Decin et al. (2003a) and other literature values, we refer to Decin et al. (2003a). Each synthetic spectrum was thus calculated for a different combination of the atmospheric parameters. The fact that we are dealing with a 3-dimensional stellar parameter space made us decide to assign each theoretical model an (artificial) model number to plot all the results in one figure. The model numbers are specified in Table 2.

Table 2: Angular diameters in mas and model numbers (in between brackets) associated with the different model parameters of the grid of synthetic spectra.

Since the ISO-SWS data are absolutely calibrated, one also has to compute the angular diameter to compare the rebinned observed and synthetic data properly. Therefore, the angular diameter  $\theta_{\rm d}$ is deduced from the energy distribution of the synthetic spectrum between 2.38 and 4.08 $\mu$m and the absolute flux-values in this wavelength range of the ISO-SWS spectrum. We therefore have minimised the residual sum of squares

\begin{displaymath}\sum\limits_{t=1}^{n} \left(y(t) - \left(\frac{\pi}{4}~ \hbox{$\theta_{\rm d}$ }^2\right) * \theta(t)\right)^2,
\end{displaymath} (1)

with y(t) and $\theta(t)$ representing, respectively, a (rebinned) observational and synthetic data point at the t th wavelength point. The derived angular diameters are listed in Table 2. The dependence of the angular diameter on the effective temperature is discussed in Decin (2000). A typical example of the ISO-SWS data of $\alpha $ Boo and a synthetic spectrum (model 1:  ${T}_{\rm eff}$ = 4160 K, $\log~g$ = 1.20, and [Fe/H] = -0.70) is shown in Fig. 1.

3 Model selection based on the OLS criterion

We try for the first time in this paper to determine the parameter ranges of the effective temperature, the gravity and the metallicity of $\alpha $ Boo. We first describe in general the model selection based on the ordinary least square criterion (Sect. 3.1). This is followed by the application to our test-case, the ISO-SWS data of $\alpha $ Boo (Sect. 3.2).

3.1 Definition

For the high-quality (absolutely calibrated) spectroscopic data that we use, it is natural to estimate $\Omega$ with  $\Omega^{(*)}$ for which the synthetic spectrum $\theta^{(*)}$ gives the best resemblance to the observed spectrum, y. By analogy to linear regression we can estimate $\Omega$ by minimising the residual sum of squares (also called the "ordinary least square'' (OLS) method):

 \begin{displaymath}T^{2}(y,\theta^{(m)}) = \frac{1}{n}\sum_{t=1}^{n}\left(y(t)-\...
...\hbox{$\theta_{\rm d}$ }^2\right) *\theta^{(m)}(t)\right)^{2}.
\end{displaymath} (2)

Hence, the minimiser of  $T(y,\theta^{(m)})$ should be seen as the OLS estimator for $\Omega$. In practice one can minimise Eq. (2) with a search over a sensitive grid of the parameter vector $\Omega$.

3.2 Application to $\alpha $ Boo

3.2.1 Band 1A
Figure 2 shows the values of $T^{(m)}(y,\theta^{(m)})$ (on a $\log$ scale). The vertical lines separate the 125 models by  ${T}_{\rm eff}$.

Model 62 ( ${T}_{\rm eff}$ = 4300 K, $\log~g = 1.50$ dex, $\rm [Fe/H]=-0.50$ dex) has the best goodness-of-fit in band 1A. We note that within one temperature level, the models occur in groups of size 5 (according to the value of the gravity). For example, models 1-5 have the same effective temperature, 4160 K, and the same gravity ( $\log~g =1.2$) while for models 6-10 $\log~g = 1.35$. Trends in the goodness-of-fit are visualised in Fig. 3. Three patterns are observed: (a) the goodness-of-fit increases with the level of metallicity, (b) a parabolic shape in which the best goodness-of-fit is achieved for models with metallicity between -0.15 to -0.5, and (c) goodness-of-fit decreases with the level of metallicity. For a fixed temperature, the trend changes from trend (a) via trend (b) to trend (c) when the gravity increases. Sometimes, a trend occurs twice or is absent, but the order of trends never changes. The model having the best goodness-of-fit is always situated at the minimum of a parabolic shape, suggesting that we have reached a local minimum - an equilibrium - in the parameter space.

\end{figure} Figure 1: Comparison between the ISO-SWS data of $\alpha $ Boo (black) and the synthetic spectrum with model number 1, i.e. ${T}_{\rm eff}$ = 4160 K, $\log~g = 1.20$, and [Fe/H] = -0.70 (grey) in the 4 sub-bands 1A, 1B, 1D, and 1E. The main absorption features caused by the CO 1st overtone lines, the SiO 1st overtone lines, and the OH fundamental lines are indicated.
Open with DEXTER

\end{figure} Figure 2: $\log T^{(m)}(y,\theta^{(m)})$ versus the model numbers for band 1A. The vertical lines separate the models according to the temperature. The 10 models with the best goodness-off-fit are situated below the horizontal dashed line.
Open with DEXTER

\end{figure} Figure 3: Trends in the goodness-of-fit condition of  $\log T^{(m)}(y,\theta^{(m)})$ for band 1A. The model numbers are as specified in Table 2.
Open with DEXTER

Table 3 shows the 10 models with  $T^{(m)}(y,\theta^{(m)})$having the lowest values, i.e., they have the best goodness-of-fit. For these models $\log~g$ is between 1.20 and 1.65 dex, the effective temperature ranges from 4160 K to 4440 K and [Fe/H] is between -0.15 dex and -0.70 dex.

3.2.2 Overall goodness-of-fit

Table 3: Top 10 models in band 1A.

While in the previous subsection we have concentrated on band 1A of the ISO-SWS data of $\alpha $ Boo, we now will take the whole 2.38-4.08 $\mu$m wavelength range into account. Band 1A has a very characteristic footprint, determined by the first overtone CO ( $\Delta v = 2$) vibration-rotation bands in this wavelength range (Decin et al. 2000). Molecules absorbing in bands 1B, 1D, and 1E are mainly OH and SiO, while also some atomic features are visible. The absorption pattern of these last molecules is however not as pronounced as for CO ( $\Delta v = 2$) in band 1A. Although these CO features can already give us quite a good idea of the temperature and the gravity of the target, it is essential to use the whole 2.38-4.08 $\mu$m wavelength in order to minimise the uncertainties on the stellar parameters being studied. This is due to the fact that all of these molecular and atomic absorption features have their own characteristic dependence on the atmospheric parameters (see Decin et al. 2000). Since, however, each sub-band has its own instrumental characteristics, and since the observational data have their largest uncertainties at the band edges (Leech et al. 2002), we will not join the whole 2.38-4.08 $\mu$m wavelength range into 1 spectrum, but we will combine the results obtained from the separate bands.

The values of  $T^{(m)}(y,\theta^{(m)})$ were ranked at each band, and for each model we have calculated the mean of the ranks. This means that the "best" model is the one with the smallest mean rank. For example, model 38 ( ${T}_{\rm eff}$ = 4230 K, $\log~g = 1.50$ dex, $\rm [Fe/H]=-0.30$ dex) has a rank of 2 in band 1A, but this model is ranked 9, 28, 25 in band 1B, 1D and 1E respectively (hence, the mean rank is 13.50). Overall, the rank of the mean rank of model 38 is 5. The ranks of the mean ranks of the 125 models are displayed in Fig. 4. The models with the lowest rank are models 62 and 82. Model 62 is ranked 1, 7, 8, and 4 in the 4 bands, respectively, with mean rank being 5.0.

\end{figure} Figure 4: Rank of the mean ranks for the 125 synthetic spectrum.
Open with DEXTER

3.2.3 Conclusions

Table 4: Stellar parameters of the models having the lowest overall rank. The overall rank is given in between brackets in the first column. In the last column, the rank in band 1A is tabulated.

The models having the best rank of the mean rank are listed in Table 4. Based on the models which rank in the top 10, the range in effective temperature is between 4230 K and 4440 K, in gravity between 1.35 and 1.65, and in metallicity between -0.70 to -0.30 dex.

4 Model selection based on Kolmogorov-Smirnov statistics

In the previous section, $T(y,\theta )$ was used as a measure for the goodness-of-fit. In this section the analysis discussed above was repeated using the Kolmogorov-Smirnov ($\beta $) statistics as a measure for the goodness-of-fit.

4.1 Definition

The Kolmogorov-Smirnov statistical test globally checks the goodness-of-fit of the observed and synthetic spectra by computing a deviation estimating parameter $\beta $ (see Eq. (5) in Decin et al. 2000). Without specifying the distribution function of $\beta $, we may summarise that

 \begin{displaymath}\beta = \sqrt{n} \sup\limits_{1 \le k \le n-1} \left\vert
\frac{y(t)}{\theta(t)}} - \frac{k}{n}\right\vert\cdot
\end{displaymath} (3)

The lower the $\beta $-value, the better the accordance between the observed data and the synthetic spectrum. For more details about the use of Kolmogorov-Smirnov statistics to estimate stellar parameters and their uncertainties see Decin et al. (2000). Hence, the main difference between $\beta $ and T is that the Kolmogorov-Smirnov parameter $\beta $ measures a global goodness-of-fit, so that local deviations between observations and theoretical data only have a minor influence on the final result, while for  $T(y,\theta )$ local deviating points are important. Note that a shift in the absolute flux values (e.g. to simulate a change or uncertainty in the angular diameter) influences T a lot, while $\beta $ remains almost the same.

4.2 Application to $\alpha $ Boo

Since both deviation estimating parameters stress a different point in the goodness-of-fit, a combination of the results based on the two parameters separately can only improve our knowledge on the stellar parameters and their uncertainties. This is illustrated in Fig. 5.

\par {\hspace*{4cm}{\bf (a)}\hspace*{4cm}}\\
\end{figure} Figure 5: Panel  a) mean of the $\beta $-values of the 4 sub-bands versus the mean of  $T(y,\theta )$ for the 4 sub-bands. Panel  b) ranks of the mean ranks based on $\beta $ versus ranks of the mean ranks based on  $T(y,\theta )$.
Open with DEXTER

Table 5 shows the best 5 models, which besides appearing in the lower left corner of Fig. 5a also are ranked among the top 30 for both $\beta $ and  $T(y,\theta )$ (see Fig. 5b and Table 4). The combined use of both the scores of $\beta $ and  $T(y,\theta )$ themselves and the ranking of these scores does allow us to determine a set of "best'' models! While e.g. model 54 ( ${T}_{\rm eff}$ = 4300 K, $\log~g = 1.20$ dex, $\rm [Fe/H]=-0.15$ dex) has very low ranks based on both $\beta $ and  $T(y,\theta )$, the mean $\beta $-value is rather high. The advantage of using ranks is that all deviation estimating parameters can be treated in the same magnitude level. The disadvantage of using the ranks is that a sudden increase in the deviation estimating parameter is not translated into a sudden jump in the rank. Using both diagnostics together solves this problem.

Note that one can see a correlation between the ranks of the mean ranks of the T and $\beta $ parameter, but that there are a few outliers namely in the upper left corner of Fig. 5b where a few models are situated with low T and high $\beta $. Inspecting why the deviation estimating parameters do show this trend, shows us that all of these models have a very low rank in T for band 1D and/or band 1E. When zooming into these bands, one indeed sees a resemblance (see e.g. Fig. 6a for model 77: ${T}_{\rm eff}$ = 4370 K, $\log~g = 1.20$ dex, [Fe/H] = -0.50 dex), explaining the low T-value. The ratio between the observational and synthetic data does, however, show a trend and is not randomly distributed around 1, explaining the high $\beta $ value. This is illustrated by the gray line in Fig. 6b with slope -0.02. The few very strong (negative) peaks are due to the underestimation of the OH-lines. Note also that the model having the lowest rank in $\beta $ (model 40: ${T}_{\rm eff}$ = 4230 K, $\log~g = 1.50$ dex, [Fe/H] = 0.00 dex) is only ranked 46 in T. This illustrates once more that a combination of a local and global deviation estimating parameter enlarges our knowledge on the estimated stellar parameters and their uncertainties, which in this particular case (see Table 5) results in a range in  ${T}_{\rm eff}$ from 4160 K to 4300 K, in $\log~g$ from 1.35 dex to 1.65 dex, and in [Fe/H] from -0.30 to 0.00 dex. Note that while the local goodness-of-fit parameter T favours the lower range in metallicity, the combined use of T and $\beta $ clearly indicates a higher range in metallicity.

Table 5: Overall goodness-of-fit.The 5 models given in this table do belong to the group of "best'' models based on both the values of  $T(y,\theta )$ and $\beta $, and are moreover ranked among the top 30 for both  $T(y,\theta )$and $\beta $.

\par {\hspace*{3.6cm}{{\bf (a)}}\hspace*{3.6cm}}\\
\end{figure} Figure 6: ISO-SWS observations of $\alpha $ Boo in band 1E versus the synthetic data of model 77 ( ${T}_{\rm eff}$ = 4370 K, $\log~g = 1.20$ dex, [Fe/H] = -0.50 dex). Panel  a) ISO-SWS data are plotted in black, synthetic data in grey. Panel  b) the ratio between observational and synthetic data. Panel  c) illustration of how the $\beta $ parameter is calculated: the argument of the absolute value in the right-hand side of Eq. (3) is displayed as a function of the wavelength.
Open with DEXTER

4.3 Discussion on the estimated parameter ranges

It is important to compare the derived parameter ranges with literature values. We therefore will use Table D.3 as published in Decin et al. (2003a), in which a comprehensive list of derived and assumed parameter values published in the literature is given. These listed parameters have already been compared with the results as deduced from the ISO-SWS data in Sect. 3.5.2 in Decin et al. (2003a). However, the uncertainties as given in Decin et al. (2003a) were empirical values estimated from (1) the intrinsic uncertainty on the synthetic spectrum (i.e., the possibility to distinguish different synthetic spectra at a specific resolution, i.e. there should be a significant difference in $\beta $-values) which is thus dependent on both the resolving power of the observation and the specific values of the fundamental parameters; (2) the uncertainty on the ISO-SWS spectrum which is directly related to the quality of the ISO-SWS observation; (3) the value of the $\beta $-parameters in the KS-test; and (4) the still remaining discrepancies between observed and synthetic spectra. Decin et al. (2003a) obtained ${T}_{\rm eff}$ =  $4320 \pm 140$ K, $\log~ g=1.50 \pm 0.15$, and [Fe/H] =  $-0.50 \pm 0.20$. Comparing these results with the ones as given in the previous section, we see that a combination of both a global and a local deviation estimating parameter restricts the uncertainty ranges for these 3 fundamental parameters in the case of the ISO-SWS data of $\alpha $ Boo.

Combining the results given in Table 5 with the angular diameter values tabulated in Table 2, we obtain for the angular diameter $20.77<\hbox{$\theta_{\rm d}$ }<21.24$ mas, mainly resulting from the uncertainty in the effective temperature (compared to the gravity and the metallicity). However, from Eq. (2) in Decin et al. (2003b), one can see that in the case of the ISO-SWS data the main uncertainty on the angular diameter is determined from the uncertainty in the absolute flux level (being $\approx$10%, see Sect. 2.1). Taking this last uncertainty into account (as done in Eq. (2) in Decin et al. 2003b), we obtain $\theta_{\rm d}$ =  $21.01~ \pm~ 1.24$ mas.

Uncertainty intervals as listed in Table D.3 in Decin et al. (2003a) are often intrinsic uncertainties or (sometimes) have been propagated from uncertainties in other parameters. However, as in this analysis, observational measurement uncertainties are never taken into account. As commented on in Decin et al. (2003a), we do see that the derived parameters from the ISO-SWS data are in good agreement with other listed values, but it should be noted that our uncertainty on the metallicity is quite large compared to other results. Several causes can be reported for this larger uncertainty range: (1) the used grid is not sensitive enough in the metallicity, and we should diminish the spacing in metallicity; (2) the used low-resolution ISO-SWS data are not that well suited to derive the metallicity; (3) the lack of a proper uncertainty estimate significantly underestimates the derived uncertainty ranges found in other studies.

5 Lack-of-fit tests

Within the classical regression framework, estimation is usually followed by inference and model diagnostics. In the previous sections we focused on model selection and estimation of the stellar parameters. In this section we propose a tool for model diagnostics. The term model "diagnostics'' refers to any technique that offers evidence of whether a particular model is an adequate description of the data or not. Our main argument is that choosing the "best'' model (out of a grid) does not necessarily imply that the model is a "good'' representation of the observed data. Hence, when a model is chosen, one can investigate how well the model fits the data. We focus on the variable


Note that Vt was used to construct the Kolmogorov-Smirnov statistics in the previous section. Now, if a specific synthetic spectrum is a "good'' model, then we expect that  $V_{t} \approx 1$. The aim of this section is to investigate the behaviour of Vt locally rather than globally. Note that if the synthetic spectrum is a good model, we expect that a non-parametric smoother (see details in the appendix) of Vtwill be flat around 1. Contrasting a hypothesized parametric model to a non-parametric model is a key aspect in an omnibus lack-of-fit test.

Lack-of-fit tests are an attractive tool for model diagnostics since they allow us to asses the goodness-of-fit of a proposed model in a formal way. For a comprehensive discussion on lack-of-fit tests we refer to Hart (1997). In our setting, we wish to test the null hypothesis  H0:E(Vt)=1 against the alternative hypothesis that  $H_{0}:E(V_{t})=\eta(t)$ where $\eta(t)$ is a smooth function which is not necessarily constant along the wavelength. For the remainder of this section we first formulate the hypotheses to be tested in more detail, review the test procedure proposed by Bowman & Azzalini (1997) and apply these methods to our setting.

5.1 Test of hypothesis for the "no effect" model

Let y(t) be the rebinned data at wavelength t and let $\mu(t)$ be the "true" spectrum at the same wavelength. We assume that $\mu(t)$ is determined by the values of the stellar parameters $\Omega = ($ ${T}_{\rm eff}$, $\log~g$, [Fe/H]) and we consider the model

\end{displaymath} (4)

with $\varepsilon(t)$ being the observational error on the observed spectrum. We further assume that  $E(\varepsilon(t))=0$. Since the parameter of interest is $\Omega$, we wish to test the hypotheses
    $\displaystyle H_{0}:\Omega=\Omega_{0},$  
    $\displaystyle H_{1}:\Omega \ne \Omega_{0},$ (5)

with $\Omega_0$ representing the null stellar parameters of the target being studied. The hypotheses in Eq. (5) can be reformulated in terms of the synthetic spectrum,
    $\displaystyle H_{0}:\mu(t)=\theta_{0}(t),\;\;\; \mbox{for all $t$ }$  
    $\displaystyle H_{1}:\mu(t) \ne \theta_{0}(t),\;\;\; \mbox{for some $t$ },$ (6)

where $\theta_{0}(t)$ is the null synthetic spectrum corresponding to the stellar parameters  $\Omega_0$. In terms of the observed spectrum the hypotheses in Eq. (6) can be rewritten as
    $\displaystyle H_{0}:E(y(t))=\theta_{0}(t),\;\;\; \mbox{for all $t$ }$  
    $\displaystyle H_{1}:E(y(t)) \ne \theta_{0}(t),\;\;\; \mbox{for some $t$ }.$ (7)

Under the null hypothesis in Eq. (7) we expect that  E(Vt)=1. Thus, in terms of Vt, we consider two competing models,
    $\displaystyle H_{1}:E(V_{t})=\eta(t)\;\;\mbox{with}\;\;\eta(t) \ne 1 ,\;\;\;
\mbox{for some $t$ }.$ (8)

Here, $\eta(t)$ is assumed to be a smooth function. The model under H0 is called the "no effect'' model (see e.g., Hart 1997, p. 148).

To test the hypotheses in Eq. (8), we used the procedure as described by Bowman & Azzalini (1997). One therefore needs to calculate the residual sum of squares under the two hypotheses and compare them. Since the mean of Vt under H0 is constant, the residual sum of squares under the null hypothesis is

 \begin{displaymath}{\it RSS}_{0}=\Sigma_{t=1}^{n} \left \{ V_{t}-1 \right \}^{2},
\end{displaymath} (9)

and under H1

 \begin{displaymath}{\it RSS}_{1}=\Sigma_{t=1}^{n} \left \{ V_{t}-\hat{\eta}(t) \right \}^{2},
\end{displaymath} (10)

where $\hat{\eta}(t)$ is a linear smoother of Vt. Note that we do not specify any parametric structure for $\eta(t)$ under the alternative in Eq. (8). The underling assumption that we made is that if a specific synthetic model it not a "good'' model, there is a structure in the rebinned data that this specific model cannot capture. This structure can be captured by the non-parametric smooth function  $\hat{\eta}(t)$. In practice, we use the Loess method, e.g. see Cleveland (1979) and Chambers & Hastie (1992), to model the relationship between Vt and the wavelength. More details about the Loess method is given in the appendix. Intuitively, it is clear that for a "good'' synthetic spectrum  ${\it RSS}_{0}$ and  ${\it RSS}_{1}$ have close values. Therefore we will not reject the null hypothesis if  ${\it RSS}_{0}$ is sufficiently close to  ${\it RSS}_{1}$. Formally, the test statistics which quantifies the difference between the residual sum of squares is given by

 \begin{displaymath}F=\frac{{\it RSS}_{0}-{\it RSS}_{1}}{{\it RSS}_{1}}\cdot
\end{displaymath} (11)

Note that if H0 is correct we expect that F will be small. Hence, we reject the null hypothesis for a large value of F.

To proceed further we need to find the distribution of F under the null hypothesis. This can be done using a bootstrap procedure (Davison & Hinkley 1997) which we describe in more detail in the appendix. Briefly, the bootstrap procedure we applied consists of sampling B samples from the original sample while reflecting the null hypothesis. For each bootstrap sample we calculate the value of F. The empirical p-value of the test statistics is simply the proportion of the bootstrap statistics that is larger than the one observed in the original sample. For a given significance level $\alpha $, one cannot reject the null hypothesis if the p-value$~>~\alpha$.

To calculate the residual sum of squares under H1 we smooth Vt with Loess. Thus, the distribution of the test statistics in Eq. (11) depends on the choice of the smoothing parameter. As argued by Hart (1997), the smoothing parameter should be chosen in advance and should be fixed for all bootstrap samples. Therefore, our conclusion of whether to reject the null hypothesis or not depends on our choice of the smoothing parameter. A method to overcome this problem is to calculate the so-called "significance trace'' (Hart 1997 p. 160; Bowman & Azzalini 1997 p. 89). In this method, one computes the p-value for a wide range of smoothing parameters and the decision (whether to reject H0 or not) is based on the significance trace plot. This point will be illustrated in the following section (Sect. 5.2.1).

Table 6: Empirical p-values in band 1A. The first column gives the model number, and the second column the rank of the corresponding model determined from the $\beta $-value in band 1A. Empirical p-values based on a bootstrap with B=1000 and smoothing parameter $\gamma = 0.85$ are given in the third and fourth column: the third column shows the empirical p-values calculated under the null hypothesis in Eq. (8), and the fourth column shows the empirical p-values under the null hypothesis in Eq. (12). The last column gives the bias as determined from Eq. (14).

In addition to the hypotheses in Eq. (8) we test the following hypotheses

                                      $\displaystyle H_{0}:E(V_{t})= \mu\;\;\;(\mbox{any constant}),$  
    $\displaystyle H_{1}:E(V_{t})=\eta(t).$ (12)

The null hypothesis in Eq. (12) states that the mean of Vt is constant, but not necessarily equal to 1. Under H0 in Eq. (12) the residual sum of squares is

 \begin{displaymath}{\it RSS}_{0}=\Sigma_{t=1}^{n} \left \{ V_{t}-\bar{V}_{t} \right \}^{2},
\end{displaymath} (13)

with $\bar{V}_{t}$ indicating the mean of Vt. Note that if we reject the null hypothesis in Eq. (12) the null hypothesis in Eq. (8) will be rejected as well but not vice versa.

5.2 Application to the data

5.2.1 Band 1A

Table 6 presents the results for the lack-of-fit tests for the top 20 models in band 1A. For each synthetic spectrum 1000 bootstrap samples (B=1000) were drawn from the original sample as described in the appendix. Whenever the empirical p-value is greater than 0.05 the null hypothesis cannot be rejected. This means that the relationship between Vt and t is assumed to be constant for all models with p-value greater than 0.05.

(1) Testing H0:E(Vt)= 1:

The third column in Table 6 presents the empirical p-values calculated under the null hypothesis in Eq. (8). Using a smoothing parameter  $\gamma = 0.85$ for the Loess model, the empirical p-values are all 0. Thus, we reject H0 in Eq. (8) for all models.

(2) Testing $H_{0}:E(V_{t})=\mu$:

The bias in the last column in Table 6 is defined as

 \begin{displaymath}{\rm bias}=(\bar{V}_{t}- 1) \times 100,
\end{displaymath} (14)

where $\bar{V}_{t}$ was estimated under the null hypothesis in Eq. (12). Thus, a good synthetic model for the spectrum is one with empirical p-value greater than 0.05 (hence, constant relationship between Vt and t) and small bias (hence, the constant is closed to 1). When the empirical p-value was calculated under H0 in Eq. (12) (with $\gamma = 0.85$) the null hypothesis is rejected for all models. Figure 7 shows the plot of Vt with Loess smoothers (with several values for the smoothing parameter) for model 51 ( ${T}_{\rm eff}$ = 4300 K, $\log~g = 1.20$ dex, [Fe/H] = -0.70 dex). Note that, for  $\gamma = 0.85$ (the value that was used to calculate the empirical p-value in Table 6) the Loess model is quite flat, but lies above 1. This means that, in general, the values of the rebinned observational data are greater than the values of the synthetic spectrum along the wavelength. Figure 8 shows similar patterns for model 26 ( ${T}_{\rm eff}$ = 4230 K, $\log~g = 1.20$ dex, $\rm [Fe/H]=-0.70$ dex). Model 52 ( ${T}_{\rm eff}$ = 4300 K, $\log~g = 1.20$ dex, $\rm [Fe/H]=-0.50$ dex) is shown in Fig. 9. Figure 10 shows the results for model 125 ( ${T}_{\rm eff}$ = 4440 K, $\log~g = 1.80$ dex, [Fe/H] = 0.00 dex), which fit the data poorly according to the least squares criterion. Note how the Loess smoother is always below 1 and suggest an increasing trend with the wavelength.

We turn now to the discussion on the effect of smoothing parameters on the estimation procedure which depends on the choice of the smoothing parameter of the Loess. To be able to calculate the p-value one needs to construct the null distribution of the test statistics. This can be done only if the smoothing parameter is held fixed for each bootstrap replication (e.g., see Hart 1997, Sect. 6.4). King et al. (1991) proposed to compute the p-values corresponding to several different choices of the smoothing parameter. The plot in which p-values are plotted versus the smoothing parameter is called a "significance trace''. Figure 11 shows the significance trace plot for model 51 for the null hypothesis  $E(V_{t})=\mu$. For all values of $\gamma $ the null hypothesis is rejected (the p-value is below the horizontal line of 0.05). This means that the data do not support the null hypothesis. The same conclusion can be drawn for all other models. The fact that, for all models, the significance trace for the null hypothesis $E(V_{t})=\mu$ is below the 0.05 line regardless of the choice of the smoothing parameter means that the null hypothesis is rejected for all possible values of the smoothing parameter.

\end{figure} Figure 7: Band 1A: model 51 ( ${T}_{\rm eff}$ = 4300 K, $\log~g = 1.20$ dex, [Fe/H] = -0.70 dex). Vt and the Loess smoother with three values of $\gamma $.
Open with DEXTER

\end{figure} Figure 8: Band 1A: model 26 ( ${T}_{\rm eff}$ = 4230 K, $\log~g = 1.20$ dex, [Fe/H] = -0.70 dex). Vt and the Loess smoother with three values of $\gamma $.
Open with DEXTER

\end{figure} Figure 9: Band 1A: model 52 ( ${T}_{\rm eff}$ = 4300 K, $\log~g = 1.20$ dex, [Fe/H] = -0.50 dex). Vt and the Loess smoother with three values of $\gamma $.
Open with DEXTER

5.2.2 Bands 1B, 1D, and 1E

The results in bands 1B, 1D, and 1E are similar: the empirical p-values for all model were either zero or very close to zero. Hence, the null hypothesis in Eq. (8) was rejected for all models in the three bands which indicates that the synthetic spectra do not follow the same pattern as the rebinned observational data.

5.3 Discussion

\end{figure} Figure 10: Band 1A: model 125 ( ${T}_{\rm eff}$ = 4440 K, $\log~g = 1.80$ dex, [Fe/H] = 0.00 dex). Vt and the Loess smoother with three values of $\gamma $.
Open with DEXTER

\end{figure} Figure 11: Significance trace for model 51 for the null hypothesis $(H_{0}:V_{t}=\mu )$ in band 1A. The dotted horizontal line represents a significance level of 0.05. Whenever the significance trace plot is above the dotted line, the null hypothesis cannot be rejected.
Open with DEXTER

What are the lessons learned from the rejection of the null hypothesis in so many cases? It may be clear that this failure cannot be solved by relaxing the criteria, e.g. by lowering the level of significance $\alpha $. These lack-of-fit tests are an objective tool to demonstrate that there is still too much structure left in Vt. This is illustrated, e.g., in Figs. 12-15 where model 39 with a very good goodness-of-fit is depicted in bands 1A, 1B, 1D, and 1E. As can be seen from the upper panel in Fig. 12 the low-excitation 12CO lines are often predicted as being too strong, while it is clearly visible in the upper panel in Fig. 13 that the low-excitation OH-lines are often predicted as being too weak. This systematic discrepancy between observations and theory is captured in Vt and its Loess smoother, explaining why the lack-of-fit test rejects the null hypothesis. This systematic problem is not solved by one of the other models in the grid. Neither it is possible to solve this problem by reducing the carbon abundance  $\varepsilon$(C) or enhancing the oxygen abundance  $\varepsilon$(O), since then other molecular features are mispredicted. The described problem may be an outcome of three possible reasons. (1) We should enlarge our vector parameter $\Omega$ including not only the effective temperature, the gravity, and the metallicity, but also the carbon, nitrogen and oxygen abundance and the microturbulence, thus enlarging our grid to a 7-dimensional grid. However, some first tests done in the framework of the study in Decin (2000) do show that this inflation of the parameter range does not solve the problem in the case of $\alpha $ Boo. (2) Secondly, we have to consider that inaccuracies may occur in the temperature distribution in the outermost layers of the model photosphere (Decin et al. 2003b), indicating that some assumptions, on which the theoretical models are based, are questionable for cool stars. One of the assumptions in the MARCS-code is that radiative equilibrium is required, also for the outermost layers. This implies that temperature bifurcation, caused by e.g. effects of convection and convective overshoot with inhomogeneities in the upper photosphere, cannot be allowed for. Consequently the cores of e.g. the satured CO and OH lines are not predicted with full success, resulting in a systematic pattern in Vt and so to a rejection of the null hypothesis. At least for $\alpha $ Boo, recent studies done by Ryde et al. (2002) show that the outermost surface layers may be a few hundreds of Kelvin cooler than predicted by the MARCS code. (3) Inaccuracies in the (satellite) data-reduction process result in (non)-rebinned data being systematically off from the "true'' stellar spectrum. A problem with the Relative Spectral Response Function (RSRF) at the shorter wavelenghts of band 1A has already been reported by Decin et al. (2003b). Since the data are divided by the RSRF, a small problem with the RSRF at these places may introduce a pronounced error at the band edge. This kind of data-reduction problem can never be captured by the synthetic predictions, thus resulting in a systematic rejection of the null hypothesis. Using lack-of-fit tests for a sample of standard stars covering a broad parameter space, one can trace calibration problems.

In general, we may conclude that a systematic rejection of the null hypothesis in the lack-of-fit tests is an indication of a still incomplete modelling of all the physical mechanisms determining the spectral footprint in the wavelength range considered or of problems with the data reduction.

\end{figure} Figure 12: Band 1A. Upper panel: comparison between the rebinned data (solid line) and theoretical data of model 39 (dashed line). Lower panel: Vt and the Loess smoother.
Open with DEXTER

\end{figure} Figure 13: Band 1B. Upper panel: comparison between the rebinned data (solid line) and theoretical data of model 39 (dashed line). Lower panel: Vt and the Loess smoother.
Open with DEXTER

\end{figure} Figure 14: Band 1D. Upper panel: comparison between the rebinned data (solid line) and theoretical data of model 39 (dashed line). Lower panel: Vt and the Loess smoother.
Open with DEXTER

\end{figure} Figure 15: Band 1E. Upper panel: comparison between the rebinned data (solid line) and theoretical data of model 39 (dashed line). Lower panel: Vt and the Loess smoother.
Open with DEXTER

6 Summary and conclusions

In the first part of this article (Sects. 3 and 4) we have demonstrated that the use of either a local or a global goodness-of-fit parameter is an important first step to objectively determine the uncertainty range on the estimated parameters. But a very important message is that combining both a local and a global deviation estimating parameter allows us to pin down the parameters with even more certainty. In the test-case of the ISO-SWS data of $\alpha $ Boo, we estimate the stellar parameters  ${T}_{\rm eff}$, $\log~g$ and metallicity as ranging respectively from 4160 K to 4300 K, from 1.35 to 1.65 dex, and from -0.30 to 0.00 dex using synthetic spectra calculated from MARCS model atmospheres.

Having determined the "best'' models is however not the end of the story. The use of lack-of-fit tests enables us to detect systematic patterns in the difference between observed and theoretical data. For the case-study of $\alpha $ Boo, we obtained that in all the 4 sub-bands the closest synthetic spectra to the observational data are not capable of capturing all the structure from the data, i.e. the "best'' models are not "good'' enough. Both gaps in our knowledge of the physical mechanisms taking place during the life of a star, too simple assumptions in the theoretical modelling, uncertainties in additional stellar parameters - which are now kept fixed - and satellite data reduction problems may result in the rejection of the null hypothesis in the lack-of-fit tests.

As was illustrated by the example of the ISO-SWS data of $\alpha $ Boo, the statistical methods presented in this paper for comparing observational and synthetic data provide useful, practical and general tools: (1) to estimate objectively the stellar parameters and their uncertainties from observational data and a grid of synthetic spectra; (2) to refine the uncertainty intervals by combining a local and a global goodness-of-fit parameter; and (3) to trace if our knowledge of the physical mechanisms in a star, of the theoretical (numerical) modelling of the stellar photosphere or of the calibration process still need considerable refinement. The main limitation of this methodology is that measurement errors are still not included. In the following paper in this series (Shkedy et al., submitted) we will use hierarchical Bayesian models for the spectrum. In this approach, the measurement errors will be incorporated in the model as well.

L.D. acknowledges support from the Science Foundation of Flanders. Z.S. and G.M. gratefully acknowledge support from the Belgian IUAP/PAI network "Statistical Techniques and Modeling for Complex Substantive Questions with Complex Data".

Appendix A: Smooting using Loess models

\end{figure} Figure A.1: Band 1A: model 58 ( ${T}_{\rm eff}$ = 4300 K, $\log~g = 1.35$ dex, [Fe/H] = -0.30 dex). Vt and 4 Loess smoothers with $\gamma $ equals to 0.1 (solid line), 0.25 (short dashed line), 0.5 (long dashed line) and 0.75 (dotted-dashed line).

Non-parametric regression models aim to describe the relationship between a response variable y and a predictor x. The model has the general form of


where $\eta$ is considered to be a smooth function. The local linear approach for non-parametric models is based on solving the local weighted least square problem

\begin{displaymath}\mbox{min}_{\alpha,\beta}\sum_{i=1}^{n}\big\{ y_{i}-\alpha-\beta(x_{i}-x)
\big\}^2 w(x_{i}-x;\gamma).

Here, w is a weight function symmetric around zero and $\gamma $ is a smoothing parameter which controls the width of w and therefore the amount of smoothing in the model. Local quadratic models can be fitted by including the term  (xi-x)2 into the model. In our setting we applied a local linear model. The effects of the smoothing parameter on the fitted model is illustrated in Fig. A.1 which shows 4 Loess smoothers with smoothing parameters increasing from 0.1 to 0.75. Clearly, as the smoothing parameter increases, the fitted model becomes "more" smooth.

For a further discussion about Loess, which stands for "local regression'', we refer to Cleveland (1979) and the book of Bowman & Azzalini (1997). An intuitive introduction about Loess can be found in Cleveland (1993). The book of (Hart 1997) gives a comprehensive discussion about smoothing and data driven choice of the smoothing parameter.

Appendix B: The bootstrap procedure

We applied the following bootstrap procedure in order to calculate the empirical p-values.

The empirical p-value is the proportion of the bootstrap statistics, $F^*_{1},\dots,F^*_{B}$, that are greater or equal to the value of F which is calculated from the original sample. For further discussion of the bootstrap procedure we refer to the books of Davison & Hinkley (1997) and Bowman & Azzalini (1997).


Copyright ESO 2004