A&A 421, 281-294 (2004)
DOI: 10.1051/0004-6361:20040127
L. Decin^1 - Z. Shkedy^2 - G. Molenberghs^2 - M. Aerts^2 - C. Aerts^1
1 - Department of Physics and Astronomy, Institute for Astronomy, K.U. Leuven, Celestijnenlaan 200B, 3001 Leuven, Belgium
2 -
Biostatistics, Center for Statistics, Limburgs Universitair
Centrum, Universitaire Campus, Building D, 3590 Diepenbeek, Belgium
Received 22 April 2003 / Accepted 18 January 2004
Abstract
Estimating stellar parameters from spectrophotometric data
is a key tool in the study of stellar structure and stellar
evolution. Although many methods have been proposed to estimate
stellar parameters from ultraviolet (UV), optical and infrared (IR)
data using low, medium or high-resolution observational data of the
target(s), only a few address the problem of the uncertainties in the
stellar parameters. This information is critical for a meaningful
comparison of the derived parameters with results obtained from other
data and/or methods. Here we present a frequentist
method to estimate these uncertainties. We demonstrate that
the combined use of both a local and a global
goodness-of-fit parameter alters the uncertainty intervals as
determined from the use of only one of these deviation estimating parameters.
This technique using both goodness-of-fit parameters
is applied to the infrared 2.38-4.08 µm ISO-SWS data
(Infrared Space Observatory - Short Wavelength Spectrometer) of α Boo,
yielding an effective temperature range from 4160 K to 4300 K, a logarithmic gravity (log g) range from 1.35 to 1.65 dex, and a
metallicity range from -0.30 to 0.00 dex.
However, using a lack-of-fit test, it is shown that even the "best''
theoretical models are still not capable of capturing all
the structure in the data, due either to our incomplete
knowledge and modelling of the full physical stellar structure or
to problems in the data reduction process.
Key words: methods: data analysis - methods: statistical - techniques: spectroscopic - stars: fundamental parameters - stars: individual: Alpha Boo
Everything we know about the structure of stellar objects being studied is the result of a comparison between theoretical predictions and stellar observations. To give realistic answers to many physical questions induced by stellar phenomena, not only are accurate theoretical models indispensable, but also realistic uncertainty estimates on the parameters being deduced are required. In astronomy, uncertainty estimates (if assessed at all) are still often based on too simple a study of the sample of observables, resulting in too "optimistic'' error bars. A realistic knowledge of the uncertainties is however crucial and has to be taken into account if one, for example, wants to test the proposed physical mechanism explaining certain phenomena (see, e.g., De Bruyne et al. 2003). The present paper is the first of a series devoted to the development and description of a statistical method to assess the uncertainties of stellar atmospheric parameters deduced from astronomical spectra.
One way of estimating stellar atmospheric parameters and drawing inferences from them consists of comparing the observed spectrum of the target being studied with a collection of synthetic spectra (e.g., Decin et al. 2000, for an application to cool stars; Bailer-Jones 2000, for an application using neural networks). Depending on the quality, the resolution and the wavelength coverage of the data, different stellar parameters can be traced. In this paper we focus on the three most important stellar parameters for the model structure: the effective temperature T_eff, the gravity g and the metallicity [Fe/H]. Other parameters such as the abundance pattern or the microturbulence are treated as known. Let the parameter vector of the stellar atmosphere contain (T_eff, g, [Fe/H]). A synthetic spectrum is calculated for specific values of these parameters, (T_eff^(m), g^(m), [Fe/H]^(m)), and compared to the observed spectrum. When the synthetic spectrum and the observed spectrum agree, the parameters of the stellar atmosphere are assumed to be known. The first question which arises when applying this kind of method is how to measure the goodness-of-fit between observational and theoretical data. A second - equally important - question is then how to assess the uncertainties on the derived stellar parameters. These questions (and answers) become even more complicated when we want to take measurement errors into account.
A search of the astronomical literature reveals that statistical tests are often restricted to local deviation estimating parameters, such as the ordinary least squares (OLS) method or derivatives of it (Bailer-Jones 2000; Katz et al. 1998; Erspamer & North 2002; Valenti & Piskunov 1996). The first goal of this paper is to demonstrate that parameter ranges as determined from the use of a local goodness-of-fit parameter can be optimized by combining a local with a global goodness-of-fit parameter (Sects. 3 and 4). In both methods, the estimate for the parameter vector is the value which minimises the proposed goodness-of-fit parameter. Using the results of this first part of the study, our second goal is to discuss (Sect. 5) how the differences between observed and synthetic spectra can be used to check the appropriateness of a proposed set of stellar parameters - a step often neglected by astronomers. This qualification can be performed using lack-of-fit tests.
The methodology developed in this paper has broad applications in astronomy as it relies only on observed spectra and on theoretical predictions thereof. To test our method, we have chosen to apply it to the spectrum of a stellar target whose basic stellar parameters are already very well known from successful comparisons with models. Accurate estimates of stellar parameters for cool standard stars were obtained using data of the ISO-SWS (Decin et al. 2003a-c). We therefore illustrate our general methodology on the ISO-SWS observations of one such star, the K2IIIp giant Alpha Bootis (Arcturus, HD 124897).
Before doing the analysis in Sects. 3-5, we give in Sect. 2 a description of the observational and theoretical data on which the method will be tested. The results of both Part I (Sects. 3 and 4) and Part II (Sect. 5) are discussed in Sects. 4.3 and 5.3 respectively. We end with a summary and some conclusions in the last section, Sect. 6. How to treat observational errors in this kind of study will be discussed in a forthcoming paper of this series.
This section describes the observational ISO-SWS data and the theoretical data used. The grid of synthetic spectra calculated for the test case α Boo is specified in Sect. 2.3.
The observational data for this study consist of near-infrared (2.38-4.08 µm) spectra of α Boo observed with the SWS (Short Wavelength Spectrometer, de Graauw et al. 1996) on board ISO (Infrared Space Observatory, Kessler et al. 1996). The spectrometer was used in the SWS observing mode AOT01 (a single up-down scan for each aperture with four possible scan speeds at degraded resolution) with scanner speed 4, resulting in a resolving power of 1500. The observation lasted for 6538 s and was performed during revolution 452.
Table 1: Resolution and factors used to shift the sub-bands.
The reduction was made using the SWS Interactive Analysis Package IA (de Graauw et al. 1996) with calibration files and procedures equivalent to pipeline version 10.0. Further data processing consisted of bad data removal and aligning of the 12 detectors to their average level. Since the grid of observational pixel values does not have a fixed resolution, we first want to "summarise'' the observational pixel values, and then make a comparison between this summary (denoted as y) and a synthetic spectrum with the same resolution. The standard way to resample the input data is by "rebinning''. To summarise the ISO-SWS data to a fixed resolution we have applied a flux-conserving non-parametric rebinning method - i.e. for each bin the flux value is calculated using the trapezoidal rule - with an oversampling of 4. This means that the resolution bin used is 4 times the grid separation determined by the resolution for a specific wavelength range of the ISO-SWS data. In the context of "rectangular filtering'' it can be shown that taking 4 points per resolution element is enough to fully recover the intervening flux values and to optimise the signal-to-noise (S/N) ratio (Bracewell 1985). The rebinning used in the data reduction procedure introduces a correlation between the data point values. The appropriate resolving power was taken to be the most conservative resolving power as determined by Lorente in Leech et al. (2002) (see Table 1), with the exception of band 1A, for which this value has been changed from 1500 to 1300 (Decin et al. 2003b).
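The flux-conserving rebinning step can be sketched as follows. This is an illustrative sketch, not the actual IA reduction code: the function name, the fine sampling of 65 points per bin, and the use of linear interpolation between observed pixel values are all assumptions made for the example.

```python
import numpy as np

def rebin_trapezoid(wave, flux, new_edges):
    """Flux-conserving rebinning: each output bin gets the mean flux
    obtained by integrating the input spectrum over the bin with the
    trapezoidal rule and dividing by the bin width."""
    out = np.empty(len(new_edges) - 1)
    for i, (lo, hi) in enumerate(zip(new_edges[:-1], new_edges[1:])):
        # sample the spectrum on a fine grid inside the bin and integrate
        grid = np.linspace(lo, hi, 65)
        vals = np.interp(grid, wave, flux)
        out[i] = np.trapz(vals, grid) / (hi - lo)
    return out
```

By construction, a spectrum that is constant over a bin is mapped onto the same constant, so the total flux inside each bin is preserved up to the accuracy of the trapezoidal rule.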
The individual sub-band spectra can show jumps in flux level at the band edges when they are combined into a single spectrum. These band-to-band discontinuities can have several causes: uncertainties in the flux calibration, the low responsivity at the band edges, pointing errors, and a problematic dark-current subtraction in combination with the RSRF (Relative Spectral Response Function) correction; of these, the pointing errors are believed to have the largest impact for this high-flux observation. Hence, the individual sub-bands were multiplied by a factor to construct a smooth spectrum (see Table 1). These factors were determined using the SED (Spectral Energy Distribution) of α Boo as constructed in Decin et al. (2003b) as a reference. The estimated uncertainty on these factors is 10% (Leech et al. 2002).
The synthetic spectra used in this study have been generated using model photospheres calculated with the MARCS code, version May 1998. This version is a major update of the MARCS model-photosphere programs first developed by Gustafsson et al. (1975), and further improved by, e.g. Plez et al. (1992), Jørgensen et al. (1992), Edvardsson et al. (1993).
The common assumptions of spherical stratification in homogeneous stationary layers, hydrostatic equilibrium and Local Thermodynamic Equilibrium (LTE) were made. Energy conservation was required for the radiative and convective flux, where the energy transport due to convection was treated through a local mixing-length theory. The mixing length l was chosen as 1.5 H_p, with H_p the pressure scale height. Turbulent pressure was neglected. The reliability of these assumptions is discussed in Plez et al. (1992). The continuous absorption as well as the new models will be fully described in a series of forthcoming papers (Gustafsson et al.; Jørgensen et al.; Plez et al., all in preparation).
Using the computed model atmospheres, the synthetic spectra were generated by solving the radiative transfer equation at a high wavelength resolution. With the adopted microturbulent velocity, this resolution ensures that we sample all lines in the atomic and molecular database in the generation of the synthetic spectrum. This is necessary so as not to overestimate the absorption in regions with a high line density, or to underestimate it in regions with a low line density (Ryde & Eriksson 2002). For the line opacity in the ISO-SWS range a database of infrared lines including atoms and molecules has been prepared. For the molecular lines, the same data have been used as in Decin et al. (2000). The accuracy and completeness of these line lists are discussed in Decin (2000). For the atomic transitions, the newly generated atomic line list of J. Sauval (priv. comm.), based on the FTS-ATMOS (Atmospheric Trace Molecule Spectroscopy) spectrum of the Sun (Geller 1992; Farmer & Norton 1989), has been included. The emergent synthetic spectra are then convolved with a Gaussian instrumental profile with the same resolution as the ISO-SWS sub-bands (see Table 1).
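The final convolution step can be sketched as below, assuming the synthetic spectrum is sampled on a grid that is uniform in ln λ (so that a single Gaussian kernel of resolving power R applies across the band); the function name and the 4σ kernel truncation are illustrative choices, not the actual spectrum-synthesis code.

```python
import numpy as np

def convolve_gaussian(wave, flux, R):
    """Convolve a spectrum, sampled on a uniform grid in ln(lambda),
    with a Gaussian instrumental profile of resolving power R
    (FWHM = lambda / R, i.e. a constant width in ln(lambda))."""
    dlnw = np.log(wave[1] / wave[0])                       # step in ln(lambda)
    sigma = (1.0 / R) / (2.0 * np.sqrt(2.0 * np.log(2.0)))  # FWHM -> sigma
    half = int(4 * sigma / dlnw) + 1                       # truncate at ~4 sigma
    x = np.arange(-half, half + 1) * dlnw
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()                                 # unit area
    return np.convolve(flux, kernel, mode="same")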
As described in the introduction, we calculate a grid of synthetic spectra over discrete values of the parameter vector. In the following sections, we use this grid to estimate the parameter values for which the synthetic spectrum is "closest'' to the observed spectrum (see Sects. 3 and 4).
Based on the results in Decin et al. (2003a), 125 synthetic
spectra have been calculated for
α Boo, with parameter ranges:
T_eff : 4160 K, 4230 K, 4300 K, 4370 K, 4440 K
log g : 1.20, 1.35, 1.50, 1.65, 1.80
[Fe/H] : 0.00, -0.15, -0.30, -0.50, -0.70.
Table 2: Angular diameters in mas and model numbers (in brackets) associated with the different model parameters of the grid of synthetic spectra.
Since the ISO-SWS data are absolutely calibrated,
one also has to compute the
angular diameter to compare the rebinned observed and synthetic
data properly. Therefore, the angular diameter
is deduced from the
energy distribution of the synthetic spectrum between 2.38 and 4.08 µm and the absolute flux values in this wavelength range of
the ISO-SWS spectrum. We have minimised the residual sum of
squares
RSS(θ_d) = Σ_λ [ y_λ − (θ_d/2)^2 F_λ ]^2,   (1)

with y_λ the observed absolute flux and F_λ the flux of the synthetic spectrum.
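Since the residual sum of squares is quadratic in the scale factor (θ_d/2)^2 between the observed flux and the synthetic surface flux, the minimisation has a closed-form solution. The sketch below assumes the observed flux scales as obs ≈ (θ_d/2)^2 × syn, with the angular units absorbed into the fluxes; the function name is an illustrative assumption.

```python
import numpy as np

def fit_angular_diameter(obs_flux, syn_flux):
    """Least-squares scale factor between the absolutely calibrated
    observed flux and the synthetic flux: obs ~ s * syn with
    s = (theta_d / 2)^2.  Minimising sum((obs - s*syn)^2) over s gives
    s = sum(syn*obs) / sum(syn^2); theta_d follows as 2*sqrt(s), in the
    angular units implied by the flux normalisation."""
    s = np.dot(syn_flux, obs_flux) / np.dot(syn_flux, syn_flux)
    return 2.0 * np.sqrt(s)
```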
In this section we determine for the first time in this paper the parameter ranges of the effective temperature, the gravity and the metallicity of α Boo. We first describe in general the model selection based on the ordinary least squares criterion (Sect. 3.1). This is followed by the application to our test case, the ISO-SWS data of α Boo (Sect. 3.2).
For the high-quality (absolutely calibrated) spectroscopic data that
we use, it is natural to estimate
the stellar parameters with the values for
which the synthetic spectrum
gives the best resemblance
to the observed spectrum, y. By analogy to linear
regression we can estimate them
by minimising the residual sum
of squares (also called the "ordinary least squares''
(OLS) method):

T(m) = Σ_t [ y_t − ỹ_t(m) ]^2,

with y_t the rebinned observed flux at wavelength point t and ỹ_t(m) the corresponding flux of synthetic spectrum m.
Model 62 (T_eff = 4300 K, log g = 1.50 dex, [Fe/H] = -0.50 dex)
has the best goodness-of-fit in band 1A. We note that
within one temperature level, the models occur in groups of size 5 (according to the value of the gravity). For example, models 1-5 have the same effective temperature, 4160 K,
and the same gravity (log g = 1.20),
while for models 6-10 log g = 1.35.
Trends in the goodness-of-fit are visualised in
Fig. 3. Three patterns are observed: (a) the
goodness-of-fit increases with the level of metallicity, (b) a
parabolic shape in which the best goodness-of-fit is achieved for
models with metallicity between -0.15 to -0.5, and (c) goodness-of-fit decreases with the level of metallicity. For a
fixed temperature, the trend changes from trend (a) via trend (b) to trend (c) when the gravity increases.
Sometimes, a trend occurs twice or is absent, but the order
of trends never changes. The model having the best goodness-of-fit is
always situated at the minimum of a parabolic shape, suggesting
that we have reached a local minimum - an equilibrium - in the
parameter space.
Figure 1: Comparison between the ISO-SWS data of α Boo (black) and the synthetic spectrum with model number 1, i.e. T_eff = 4160 K, log g = 1.20, and [Fe/H] = -0.70 (grey), in the 4 sub-bands 1A, 1B, 1D, and 1E. The main absorption features caused by the CO first-overtone lines, the SiO first-overtone lines, and the OH fundamental lines are indicated.
Figure 2: T versus the model numbers for band 1A. The vertical lines separate the models according to the temperature. The 10 models with the best goodness-of-fit are situated below the horizontal dashed line.
Figure 3: Trends in the goodness-of-fit parameter T for band 1A. The model numbers are as specified in Table 2.
Table 3 shows the 10 models having the lowest T values, i.e., the best goodness-of-fit. For these models log g is between 1.20 and 1.65 dex, the effective temperature ranges from 4160 K to 4440 K, and [Fe/H] is between -0.70 dex and -0.15 dex.
Table 3: Top 10 models in band 1A.
While in the previous subsection we have concentrated on band 1A of the ISO-SWS data of α Boo, we now take the whole 2.38-4.08 µm wavelength range into account. Band 1A has a very characteristic footprint, determined by the first-overtone CO (Δv = 2) vibration-rotation bands in this wavelength range (Decin et al. 2000). Molecules absorbing in bands 1B, 1D, and 1E are mainly OH and SiO, while some atomic features are also visible. The absorption pattern of these last molecules is, however, not as pronounced as that of CO (Δv = 2) in band 1A. Although these CO features can already give us quite a good idea of the temperature and the gravity of the target, it is essential to use the whole 2.38-4.08 µm wavelength range in order to minimise the uncertainties on the stellar parameters being studied. This is due to the fact that all of these molecular and atomic absorption features have their own characteristic dependence on the atmospheric parameters (see Decin et al. 2000). Since, however, each sub-band has its own instrumental characteristics, and since the observational data have their largest uncertainties at the band edges (Leech et al. 2002), we will not join the whole 2.38-4.08 µm wavelength range into one spectrum, but combine the results obtained from the separate bands. The values of T were ranked in each band, and for each model we have calculated the mean of the ranks. The "best'' model is then the one with the smallest mean rank. For example, model 38 (T_eff = 4230 K, log g = 1.50 dex, [Fe/H] = -0.30 dex) has a rank of 2 in band 1A, but this model is ranked 9, 28, 25 in bands 1B, 1D and 1E respectively (hence, the mean rank is 13.50). Overall, the rank of the mean rank of model 38 is 5. The ranks of the mean ranks of the 125 models are displayed in Fig. 4. The models with the lowest rank are models 62 and 82. Model 62 is ranked 1, 7, 8, and 4 in the 4 bands, respectively, its mean rank being 5.0.
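The rank aggregation across the four sub-bands can be sketched as follows. The argsort-of-argsort trick yields ordinal ranks (ties are broken by position rather than averaged, a simplification relative to conventional mid-ranks), and the function name is an assumption of the example:

```python
import numpy as np

def rank_of_mean_ranks(scores):
    """scores: (n_models, n_bands) array of goodness-of-fit values
    (smaller = better).  Rank the models within each band, average
    the ranks across bands, then rank those mean ranks."""
    # ordinal rank within each band: 1 = best (smallest statistic)
    ranks = scores.argsort(axis=0).argsort(axis=0) + 1
    mean_ranks = ranks.mean(axis=1)
    # rank of the mean ranks: 1 = best overall
    overall = mean_ranks.argsort().argsort() + 1
    return mean_ranks, overall
```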
Figure 4: Rank of the mean ranks for the 125 synthetic spectra.
Table 4: Stellar parameters of the models having the lowest overall rank. The overall rank is given in brackets in the first column. In the last column, the rank in band 1A is tabulated.
The models having the best rank of the mean rank are listed in Table 4. Based on the models which rank in the top 10, the range in effective temperature is between 4230 K and 4440 K, in gravity between 1.35 and 1.65 dex, and in metallicity between -0.70 and -0.30 dex.
In the previous section, T was used as a measure for the goodness-of-fit. In this section the analysis discussed above is repeated using the Kolmogorov-Smirnov (KS) statistic as a measure for the goodness-of-fit.
The Kolmogorov-Smirnov statistical test
globally checks the goodness-of-fit of the observed and
synthetic spectra by computing a deviation estimating parameter β
(see Eq. (5) in Decin et al. 2000).
Without specifying the distribution function of β,
we may
summarise that the smaller the value of β, the better the global agreement between the observed and the synthetic spectrum.
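Equation (5) of Decin et al. (2000) is not reproduced here; as a generic illustration of such a global statistic, a KS-type deviation between two spectra can be computed as the maximum distance between their normalised cumulative flux distributions along wavelength. The function name and the normalisation are assumptions of this sketch:

```python
import numpy as np

def ks_deviation(obs, syn):
    """Kolmogorov-Smirnov-type deviation between two spectra:
    maximum absolute difference between their normalised cumulative
    flux distributions (insensitive to an overall scale factor)."""
    cdf_obs = np.cumsum(obs) / np.sum(obs)
    cdf_syn = np.cumsum(syn) / np.sum(syn)
    return np.max(np.abs(cdf_obs - cdf_syn))
```

Because both cumulative distributions are normalised to unity, a pure scaling of one spectrum leaves the statistic unchanged: it probes the shape of the energy distribution rather than local residuals.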
Since both deviation estimating parameters stress a different aspect of the goodness-of-fit, a combination of the results based on the two parameters separately can only improve our knowledge of the stellar parameters and their uncertainties. This is illustrated in Fig. 5.
Figure 5: Panel a) mean of the β-values of the 4 sub-bands versus the mean of T for the 4 sub-bands. Panel b) ranks of the mean ranks based on T versus ranks of the mean ranks based on β.
Table 5 shows the best 5 models, which besides appearing in the lower left corner of Fig. 5a are also ranked among the top 30 for both T and β (see Fig. 5b and Table 4). The combined use of both the scores of T and β themselves and the ranking of these scores does allow us to determine a set of "best'' models! While, e.g., model 54 (T_eff = 4300 K, log g = 1.20 dex, [Fe/H] = -0.15 dex) has very low ranks based on both T and β, its mean β-value is rather high. The advantage of using ranks is that all deviation estimating parameters can be treated on the same magnitude level. The disadvantage of using the ranks is that a sudden increase in the deviation estimating parameter is not translated into a sudden jump in the rank. Using both diagnostics together solves this problem.
Note that one can see a correlation between the ranks of the mean ranks of the T and β parameters, but that there are a few outliers, namely in the upper left corner of Fig. 5b, where a few models are situated with a low rank in T and a high rank in β. Inspecting why the deviation estimating parameters show this trend reveals that all of these models have a very low rank in T for band 1D and/or band 1E. When zooming into these bands, one indeed sees a resemblance (see e.g. Fig. 6a for model 77: T_eff = 4370 K, log g = 1.20 dex, [Fe/H] = -0.50 dex), explaining the low T-value. The ratio between the observational and synthetic data does, however, show a trend and is not randomly distributed around 1, explaining the high β value. This is illustrated by the grey line in Fig. 6b with slope -0.02. The few very strong (negative) peaks are due to the underestimation of the OH lines. Note also that the model having the lowest rank in β (model 40: T_eff = 4230 K, log g = 1.50 dex, [Fe/H] = 0.00 dex) is only ranked 46 in T. This illustrates once more that a combination of a local and a global deviation estimating parameter enlarges our knowledge of the estimated stellar parameters and their uncertainties, which in this particular case (see Table 5) results in a range in T_eff from 4160 K to 4300 K, in log g from 1.35 dex to 1.65 dex, and in [Fe/H] from -0.30 to 0.00 dex. Note that while the local goodness-of-fit parameter T favours the lower range in metallicity, the combined use of T and β clearly indicates a higher range in metallicity.
Table 5: Overall goodness-of-fit. The 5 models given in this table belong to the group of "best'' models based on both the values of T and β, and are moreover ranked among the top 30 for both T and β.
Figure 6: ISO-SWS observations of α Boo in band 1E versus the synthetic data of model 77 (T_eff = 4370 K, log g = 1.20 dex, [Fe/H] = -0.50 dex). Panel a) ISO-SWS data are plotted in black, synthetic data in grey. Panel b) the ratio between observational and synthetic data. Panel c) illustration of how the β parameter is calculated: the argument of the absolute value in the right-hand side of Eq. (3) is displayed as a function of the wavelength.
It is important to compare the derived parameter ranges with literature values. We therefore use Table D.3 as published in Decin et al. (2003a), which gives a comprehensive list of derived and assumed parameter values published in the literature. These listed parameters have already been compared with the results deduced from the ISO-SWS data in Sect. 3.5.2 of Decin et al. (2003a). However, the uncertainties given in Decin et al. (2003a) were empirical values estimated from (1) the intrinsic uncertainty on the synthetic spectrum (i.e., the possibility to distinguish different synthetic spectra at a specific resolution, requiring a significant difference in β-values), which is thus dependent on both the resolving power of the observation and the specific values of the fundamental parameters; (2) the uncertainty on the ISO-SWS spectrum, which is directly related to the quality of the ISO-SWS observation; (3) the value of the β-parameters in the KS test; and (4) the still remaining discrepancies between observed and synthetic spectra. Comparing the values of T_eff, log g, and [Fe/H] obtained by Decin et al. (2003a) with the ones given in the previous section, we see that a combination of both a global and a local deviation estimating parameter restricts the uncertainty ranges for these 3 fundamental parameters in the case of the ISO-SWS data of α Boo.
Combining the results given in Table 5 with the angular diameter values tabulated in Table 2 yields the angular diameter in mas, with an uncertainty mainly resulting from the uncertainty in the effective temperature (compared to the gravity and the metallicity). However, from Eq. (2) in Decin et al. (2003b), one can see that in the case of the ISO-SWS data the main uncertainty on the angular diameter is determined by the uncertainty in the absolute flux level (being 10%, see Sect. 2.1). Taking this last uncertainty into account (as done in Eq. (2) in Decin et al. 2003b), we obtain the final value of the angular diameter.
Uncertainty intervals as listed in Table D.3 of Decin et al. (2003a) are often intrinsic uncertainties or (sometimes) have been propagated from uncertainties in other parameters. However, as in this analysis, observational measurement uncertainties are never taken into account. As commented on in Decin et al. (2003a), the parameters derived from the ISO-SWS data are in good agreement with other listed values, but it should be noted that our uncertainty on the metallicity is quite large compared to other results. Several causes can be put forward for this larger uncertainty range: (1) the grid used is not sensitive enough in the metallicity, and we should diminish the spacing in metallicity; (2) the low-resolution ISO-SWS data used are not well suited to derive the metallicity; (3) the lack of a proper uncertainty estimate in other studies means that the uncertainty ranges reported there are significantly underestimated.
Lack-of-fit tests are an attractive tool for model diagnostics since they allow us to assess the goodness-of-fit of a proposed model in a formal way. For a comprehensive discussion of lack-of-fit tests we refer to Hart (1997). In our setting, with V_t the ratio between the observed and the synthetic spectrum at wavelength point t, we wish to test the null hypothesis H_0: E(V_t) = 1 against the alternative hypothesis H_1: E(V_t) = g(t), where g is a smooth function which is not necessarily constant along the wavelength. For the remainder of this section we first formulate the hypotheses to be tested in more detail, review the test procedure proposed by Bowman & Azzalini (1997), and apply these methods to our setting.
To test the hypotheses in Eq. (8), we used
the procedure described by Bowman & Azzalini (1997). One therefore needs
to calculate the residual sums of squares under the two hypotheses and
compare them. Since the mean of V_t under H_0 is
constant (equal to one), the residual sum of squares under the null hypothesis is

RSS_0 = Σ_t (V_t − 1)^2.
To proceed further we need to find the distribution of F under the null hypothesis. This can be done using a bootstrap procedure (Davison & Hinkley 1997), which we describe in more detail in the appendix. Briefly, the bootstrap procedure we applied consists of drawing B samples from the original sample in a way that reflects the null hypothesis. For each bootstrap sample we calculate the value of F. The empirical p-value of the test statistic is simply the proportion of the bootstrap statistics that are larger than the one observed in the original sample. For a given significance level α, one cannot reject the null hypothesis if the p-value is larger than α.
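A minimal sketch of this bootstrap procedure is given below. A centred moving average stands in for the Loess smoother, and the F-statistic is taken as (RSS_0 − RSS_1)/RSS_1; the window size, the value of B, and the residual-resampling scheme are illustrative assumptions, not the exact recipe of the appendix.

```python
import numpy as np

def smooth(v, window=25):
    """Crude stand-in for the Loess smoother: centred moving average
    with edge padding (window must be odd)."""
    pad = window // 2
    vp = np.pad(v, pad, mode="edge")
    return np.convolve(vp, np.ones(window) / window, mode="valid")

def lack_of_fit_pvalue(V, B=1000, window=25, seed=0):
    """Bootstrap lack-of-fit test of H0: E(V_t) = 1 against a smooth
    alternative.  F compares the residual sums of squares under H0
    (constant mean equal to one) and under H1 (smoothed mean)."""
    rng = np.random.default_rng(seed)

    def f_stat(v):
        rss0 = np.sum((v - 1.0) ** 2)                # RSS under H0
        rss1 = np.sum((v - smooth(v, window)) ** 2)  # RSS under H1
        return (rss0 - rss1) / rss1

    f_obs = f_stat(V)
    resid = V - smooth(V, window)                    # residuals under H1
    exceed = 0
    for _ in range(B):
        # bootstrap sample reflecting H0: resampled residuals around one
        v_star = 1.0 + rng.choice(resid, size=len(V), replace=True)
        if f_stat(v_star) >= f_obs:
            exceed += 1
    return exceed / B
```

A ratio series with a pronounced smooth trend yields a large observed F and hence a p-value near zero, mirroring the rejections reported in Table 6.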
To calculate the residual sum of squares under H_1 we smooth V_t with Loess. Thus, the distribution of the test statistic in Eq. (11) depends on the choice of the smoothing parameter. As argued by Hart (1997), the smoothing parameter should be chosen in advance and should be fixed for all bootstrap samples. Therefore, our conclusion of whether or not to reject the null hypothesis depends on our choice of the smoothing parameter. A method to overcome this problem is to calculate the so-called "significance trace'' (Hart 1997, p. 160; Bowman & Azzalini 1997, p. 89). In this method, one computes the p-value for a wide range of smoothing parameters and the decision (whether to reject H_0 or not) is based on the significance trace plot. This point will be illustrated in the following section (Sect. 5.2.1).
Table 6: Empirical p-values in band 1A. The first column gives the model number, and the second column the rank of the corresponding model determined from the T-value in band 1A. Empirical p-values based on a bootstrap with B = 1000 and a fixed smoothing parameter are given in the third and fourth columns: the third column shows the empirical p-values calculated under the null hypothesis in Eq. (8), and the fourth column shows the empirical p-values under the null hypothesis in Eq. (12). The last column gives the bias as determined from Eq. (14).
In addition to the hypotheses in Eq. (8) we test the hypotheses in Eq. (12), in which the mean of V_t under the null hypothesis is a constant that is not fixed at one.
Table 6 presents the results for the lack-of-fit tests for
the top 20 models in band 1A. For each synthetic spectrum 1000 bootstrap samples (B=1000) were drawn from the original sample as
described in the appendix. Whenever the empirical p-value is greater
than 0.05 the null hypothesis cannot be rejected. This means that the
relationship between V_{t} and t is assumed to be constant for all
models with p-value greater than 0.05.
(1) Testing
H_0: E(V_t) = 1:
the third column in Table 6 presents the empirical p-values
calculated under the null hypothesis in Eq. (8).
Using a fixed smoothing parameter
for the Loess model, the empirical
p-values
are all 0. Thus, we reject H_0 in Eq. (8) for all
models.
(2) Testing
H_0: E(V_t) = constant:
the bias in the last column of Table 6 is defined as the deviation of the mean of V_t from one,

bias = (1/n) Σ_t V_t − 1.
We turn now to the effect of the choice of the Loess smoothing parameter on the test procedure. To be able to calculate the p-value one needs to construct the null distribution of the test statistic. This can be done only if the smoothing parameter is held fixed for each bootstrap replication (e.g., see Hart 1997, Sect. 6.4). King et al. (1991) proposed to compute the p-values corresponding to several different choices of the smoothing parameter. The plot in which the p-values are plotted versus the smoothing parameter is called a "significance trace''. Figure 11 shows the significance trace plot for model 51 for the null hypothesis in band 1A. For all values of the smoothing parameter the null hypothesis is rejected (the p-value is below the horizontal line at 0.05), which means that the data do not support the null hypothesis regardless of the choice of the smoothing parameter. The same conclusion can be drawn for all other models.
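A significance trace can be sketched by repeating the bootstrap test for several smoothing parameters, keeping each one fixed across its own bootstrap replications. As before, a moving average stands in for Loess, and the window sizes, B, and the F-statistic form are illustrative assumptions:

```python
import numpy as np

def significance_trace(V, windows=(11, 25, 51), B=300, seed=0):
    """Bootstrap p-value of the lack-of-fit test of H0: E(V_t) = 1,
    recomputed for several smoothing windows.  Plotting the returned
    p-values against the window sizes gives the 'significance trace'."""
    rng = np.random.default_rng(seed)

    def smooth(v, w):              # moving-average stand-in for Loess (odd w)
        pad = w // 2
        vp = np.pad(v, pad, mode="edge")
        return np.convolve(vp, np.ones(w) / w, mode="valid")

    trace = {}
    for w in windows:              # each w held fixed within its own test
        def f_stat(v):
            rss0 = np.sum((v - 1.0) ** 2)
            rss1 = np.sum((v - smooth(v, w)) ** 2)
            return (rss0 - rss1) / rss1

        f_obs = f_stat(V)
        resid = V - smooth(V, w)
        exceed = sum(
            f_stat(1.0 + rng.choice(resid, size=len(V), replace=True)) >= f_obs
            for _ in range(B)
        )
        trace[w] = exceed / B
    return trace
```

If every entry of the returned trace lies below the chosen significance level, the rejection of H_0 is robust to the smoothing choice, which is precisely the situation described for Fig. 11.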
Figure 7: Band 1A: model 51 (T_eff = 4300 K, log g = 1.20 dex, [Fe/H] = -0.70 dex). V_t and the Loess smoother for three values of the smoothing parameter.
Figure 8: Band 1A: model 26 (T_eff = 4230 K, log g = 1.20 dex, [Fe/H] = -0.70 dex). V_t and the Loess smoother for three values of the smoothing parameter.
Figure 9: Band 1A: model 52 (T_eff = 4300 K, log g = 1.20 dex, [Fe/H] = -0.50 dex). V_t and the Loess smoother for three values of the smoothing parameter.
The results in bands 1B, 1D, and 1E are similar: the empirical p-values for all models were either zero or very close to zero. Hence, the null hypothesis in Eq. (8) was rejected for all models in the three bands, which indicates that the synthetic spectra do not follow the same pattern as the rebinned observational data.
Figure 10: Band 1A: model 125 (T_eff = 4440 K, log g = 1.80 dex, [Fe/H] = 0.00 dex). V_t and the Loess smoother for three values of the smoothing parameter.
Figure 11: Significance trace for model 51 for the null hypothesis in band 1A. The dotted horizontal line represents a significance level of 0.05. Whenever the significance trace plot is above the dotted line, the null hypothesis cannot be rejected.
What are the lessons learned from the rejection of the null hypothesis in so many cases? It may be clear that this failure cannot be solved by relaxing the criteria, e.g. by lowering the level of significance α. These lack-of-fit tests are an objective tool to demonstrate that there is still too much structure left in V_t. This is illustrated, e.g., in Figs. 12-15, where model 39, a model with a very good goodness-of-fit, is depicted in bands 1A, 1B, 1D, and 1E. As can be seen from the upper panel in Fig. 12, the low-excitation ^{12}CO lines are often predicted as being too strong, while it is clearly visible in the upper panel in Fig. 13 that the low-excitation OH lines are often predicted as being too weak. This systematic discrepancy between observations and theory is captured in V_t and its Loess smoother, explaining why the lack-of-fit test rejects the null hypothesis. This systematic problem is not solved by any of the other models in the grid. Nor is it possible to solve this problem by reducing the carbon abundance (C) or enhancing the oxygen abundance (O), since then other molecular features are mispredicted. The described problem may have three possible causes. (1) We should enlarge our parameter vector to include not only the effective temperature, the gravity, and the metallicity, but also the carbon, nitrogen and oxygen abundances and the microturbulence, thus enlarging our grid to a 7-dimensional grid. However, some first tests done in the framework of the study in Decin (2000) show that this inflation of the parameter space does not solve the problem in the case of α Boo. (2) Inaccuracies may occur in the temperature distribution in the outermost layers of the model photosphere (Decin et al. 2003b), indicating that some assumptions on which the theoretical models are based are questionable for cool stars.
One of the assumptions in the MARCS code is that radiative equilibrium holds, also in the outermost layers. This implies that temperature bifurcation, caused by e.g. the effects of convection and convective overshoot with inhomogeneities in the upper photosphere, cannot be accounted for. Consequently the cores of e.g. the saturated CO and OH lines are not predicted with full success, resulting in a systematic pattern in V_{t} and thus in a rejection of the null hypothesis. At least for α Boo, recent studies by Ryde et al. (2002) show that the outermost surface layers may be a few hundred Kelvin cooler than predicted by the MARCS code. (3) Inaccuracies in the (satellite) data-reduction process result in (non-)rebinned data being systematically off from the "true'' stellar spectrum. A problem with the Relative Spectral Response Function (RSRF) at the shorter wavelengths of band 1A has already been reported by Decin et al. (2003b). Since the data are divided by the RSRF, a small problem with the RSRF at these places may introduce a pronounced error at the band edge. This kind of data-reduction problem can never be captured by the synthetic predictions, thus resulting in a systematic rejection of the null hypothesis. Using lack-of-fit tests for a sample of standard stars covering a broad parameter space, one can trace such calibration problems.
In general, we may conclude that a systematic rejection of the null hypothesis in the lack-of-fit tests indicates either that the modelling of the physical mechanisms determining the spectral footprint in the considered wavelength range is still incomplete, or that there are problems with the data reduction.
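The diagnostic underlying this conclusion, smoothing the residual spectrum V_{t} and checking whether the smoother reveals remaining structure, can be sketched as follows. The tricube-weighted local linear smoother below is a minimal stand-in for a full Loess implementation, and the residual spectrum and all variable names are illustrative, not the paper's actual data.

```python
import numpy as np

def loess_smooth(x, y, frac=0.3):
    """Minimal Loess-style smoother: tricube-weighted local linear fits."""
    n = len(x)
    k = max(3, int(frac * n))                 # neighbourhood size
    yhat = np.empty(n)
    for i in range(n):
        d = np.abs(x - x[i])
        idx = np.argsort(d)[:k]               # k nearest neighbours of x[i]
        h = d[idx].max()                      # local bandwidth
        w = (1.0 - (d[idx] / h) ** 3) ** 3    # tricube weights
        A = np.column_stack([np.ones(k), x[idx] - x[i]])
        W = np.diag(w)
        # weighted least-squares line; its intercept is the fit at x[i]
        beta = np.linalg.solve(A.T @ W @ A, A.T @ W @ y[idx])
        yhat[i] = beta[0]
    return yhat

# illustrative residual spectrum: a smooth systematic pattern plus noise
wav = np.linspace(2.38, 2.60, 200)            # wavelength grid (micron)
rng = np.random.default_rng(1)
pattern = 0.05 * np.sin(20.0 * wav)           # leftover systematic structure
v_t = pattern + rng.normal(0.0, 0.01, wav.size)
smooth = loess_smooth(wav, v_t, frac=0.2)
# a clearly non-flat smoother signals structure left in the residuals
print(round(float(np.std(smooth)), 3))
```

If the model captured all the structure, the smoother would hover around zero; here it tracks the injected systematic pattern much more closely than the raw noisy residuals do, which is exactly the situation in which a lack-of-fit test rejects the null hypothesis.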
Figure 12: Band 1A. Upper panel: comparison between the rebinned data (solid line) and theoretical data of model 39 (dashed line). Lower panel: V_{t} and the Loess smoother.
Figure 13: Band 1B. Upper panel: comparison between the rebinned data (solid line) and theoretical data of model 39 (dashed line). Lower panel: V_{t} and the Loess smoother.
Figure 14: Band 1D. Upper panel: comparison between the rebinned data (solid line) and theoretical data of model 39 (dashed line). Lower panel: V_{t} and the Loess smoother.
Figure 15: Band 1E. Upper panel: comparison between the rebinned data (solid line) and theoretical data of model 39 (dashed line). Lower panel: V_{t} and the Loess smoother.
In the first part of this article (Sects. 3 and 4) we have demonstrated that the use of either a local or a global goodness-of-fit parameter is an important first step towards objectively determining the uncertainty range on the estimated parameters. An important message, however, is that combining a local and a global deviation-estimating parameter allows us to pin down the parameters with even more certainty. In the test case of the ISO-SWS data of α Boo, we estimate the stellar parameters T_{eff}, log g and the metallicity as ranging from 4160 K to 4300 K, from 1.35 to 1.65 dex, and from -0.30 to 0.00 dex, respectively, using synthetic spectra calculated from MARCS model atmospheres.
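The way a combined local and global criterion narrows the accepted parameter range can be illustrated with a toy grid search. The synthetic "spectra", the two deviation statistics and the acceptance thresholds below are all hypothetical stand-ins; the paper's actual goodness-of-fit parameters are defined in Sects. 3 and 4.

```python
import numpy as np

def synth_spectrum(teff, wav):
    """Toy spectrum: linear pseudo-continuum plus one Teff-sensitive line."""
    depth = (5000.0 - teff) / 1000.0                  # line strength vs. Teff
    line = depth * np.exp(-0.5 * ((wav - 3.0) / 0.01) ** 2)
    return 1.0 - 0.1 * (wav - 2.38) - line

wav = np.linspace(2.38, 4.08, 500)
grid = np.arange(4000, 4501, 50)          # candidate Teff values (K)
observed = synth_spectrum(4200, wav)      # pretend this is the rebinned data

window = (wav > 2.95) & (wav < 3.05)      # local region around the line
accepted_global, accepted_both = [], []
for teff in grid:
    diff = observed - synth_spectrum(teff, wav)
    global_stat = np.mean(np.abs(diff))           # whole-band deviation
    local_stat = np.max(np.abs(diff[window]))     # worst deviation in the line
    if global_stat < 2e-3:                        # toy global threshold
        accepted_global.append(int(teff))
        if local_stat < 0.08:                     # toy local threshold
            accepted_both.append(int(teff))

print("global only:", min(accepted_global), "-", max(accepted_global))
print("both       :", min(accepted_both), "-", max(accepted_both))
```

The broad-band statistic is dominated by the continuum and tolerates a wide Teff range, while the local statistic is sensitive to the single temperature-dependent line; requiring both to pass yields a tighter interval than either alone, mirroring the effect described above.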
Having determined the "best'' models is, however, not the end of the story. The use of lack-of-fit tests enables us to detect systematic patterns in the difference between observed and theoretical data. For the case study of α Boo, we found that in all four sub-bands the synthetic spectra closest to the observational data are not capable of capturing all the structure in the data, i.e. the "best'' models are not "good'' enough. Gaps in our knowledge of the physical mechanisms taking place during the life of a star, overly simple assumptions in the theoretical modelling, uncertainties in additional stellar parameters - which are currently kept fixed - and satellite data-reduction problems may all result in the rejection of the null hypothesis in the lack-of-fit tests.
As illustrated by the example of the ISO-SWS data of α Boo, the statistical methods presented in this paper for comparing observational and synthetic data provide useful, practical and general tools: (1) to estimate objectively the stellar parameters and their uncertainties from observational data and a grid of synthetic spectra; (2) to refine the uncertainty intervals by combining a local and a global goodness-of-fit parameter; and (3) to trace whether our knowledge of the physical mechanisms in a star, the theoretical (numerical) modelling of the stellar photosphere, or the calibration process still needs considerable refinement. The main limitation of this methodology is that measurement errors are not yet included. In the following paper in this series (Shkedy et al., submitted) we will use hierarchical Bayesian models for the spectrum, in which the measurement errors are incorporated as well.
Acknowledgements
L.D. acknowledges support from the Science Foundation of Flanders. Z.S. and G.M. gratefully acknowledge support from the Belgian IUAP/PAI network "Statistical Techniques and Modeling for Complex Substantive Questions with Complex Data".
Non-parametric regression models aim to describe the relationship between a response variable y and a predictor x. The model has the general form

y_{i} = m(x_{i}) + ε_{i},    i = 1, ..., n,

where m is a smooth but otherwise unspecified regression function and the ε_{i} are independent zero-mean errors.
For a further discussion of Loess, which stands for "local regression'', we refer to Cleveland (1979) and the book of Bowman & Azzalini (1997). An intuitive introduction to Loess can be found in Cleveland (1993). The book of Hart (1997) gives a comprehensive discussion of smoothing and data-driven choice of the smoothing parameter.
We applied the following bootstrap procedure in order to calculate the empirical p-values.