A&A 376, 735-744 (2001)
DOI: 10.1051/0004-6361:20010984
N. Bissantz1 - A. Munk2
1 - Astronomisches Institut der Universität Basel, Venusstr. 7, 4102 Binningen/Basel, Switzerland
2 - Fakultät für Mathematik und Informatik der Universität GH Paderborn,
Warburgerstr. 100, 33098 Paderborn, Germany
Received 16 June 2000 / Accepted 12 June 2001
Abstract
The assumption that a parametric class of functions fits the data structure sufficiently well is common when fitting curves and surfaces to regression data. One then derives a parameter estimate from a least squares fit, say, and in a second step applies various kinds of goodness of fit measures to assess whether the deviation between data and estimated surface is due to random noise and not to systematic departures from the model.
In this paper we show that commonly used $\chi^2$-measures are invalid in regression models, particularly when inhomogeneous noise is present.
Instead we present a bootstrap algorithm which is applicable in problems
described by noisy versions of Fredholm integral equations of the first kind.
We apply the suggested method to the problem of recovering
the luminosity density in the Milky Way from
data of the DIRBE experiment on board the
COBE satellite.
Key words: methods: data analysis - methods: statistical - Galaxy: structure
Regression problems arise in almost any branch
of physics, including
astronomy and astrophysics. In general,
the problem of estimating a regression function (or surface)
occurs when a functional relationship between
several quantities of interest has to be
found from blurred observations $(y_i, t_i)$, $i = 1, \ldots, n$.
Here $y = (y_1, \ldots, y_n)$ denotes the vector of measurements (response vector) and $t_i$ a quantity which affects the response vector in a systematic but blurred way, and which is to be investigated. This systematic component is usually denoted as the regression function $f$, i.e. $\mathrm{E}[Y_i] = f(t_i)$. Note that $Y_i$ is a random variable, of which $y_i$ is a realisation.
If $t_i \in \mathbb{R}$, this includes signal detection problems, or image restoration if $t_i \in \mathbb{R}^2$.
Many problems bear the additional difficulty that the quantity of interest is not directly accessible to the observations y and the relationship has to be expressed by a noisy version of a Fredholm integral equation of the first kind, viz.
$$y_i = \int k(t_i, s)\, f(s)\, \mathrm{d}s + \varepsilon_i, \qquad i = 1, \ldots, n. \tag{1}$$
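To fix ideas, the following sketch (ours, not from the paper) discretises such a noisy Fredholm equation into the linear model $y = A f + \varepsilon$; the Gaussian kernel, the test signal and the inhomogeneous noise levels are illustrative assumptions.

```python
import numpy as np

# Discretised noisy Fredholm equation of the first kind:
#   y_i = int k(t_i, s) f(s) ds + eps_i   ~   y = A f + eps.
# Kernel, signal and noise levels below are illustrative choices.
n, m = 100, 200                       # observation and quadrature grid sizes
t = np.linspace(0.0, 1.0, n)          # observation grid t_i
s = np.linspace(0.0, 1.0, m)          # quadrature grid s_j
ds = s[1] - s[0]

def kernel(t, s, width=0.05):
    """Gaussian blurring kernel k(t, s)."""
    return np.exp(-0.5 * ((t - s) / width) ** 2) / (width * np.sqrt(2.0 * np.pi))

A = kernel(t[:, None], s[None, :]) * ds          # quadrature weights absorbed into A
f_true = np.exp(-s) * (1.0 + np.sin(6.0 * s))    # unknown quantity of interest
sigma = 0.01 * (1.0 + 2.0 * t)                   # inhomogeneous noise levels sigma_i
rng = np.random.default_rng(1)
y = A @ f_true + rng.normal(0.0, sigma)          # blurred, noisy observations
```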
Reconstruction procedures (estimation) of $f$ in general depend on various a priori assumptions about $f$, such as smoothness properties or geometrical constraints, e.g. monotonicity. The most common assumptions are that $f$ has a particular structure and shape, depending on some unknown parameter $\vartheta \in \Theta$.
Such an assumption is
denoted as a parametric model. Typically, these structural
assumptions arise from physical reasoning and approximation procedures.
Often, however, it is not completely clear whether these assumptions are satisfied, and it is therefore an important task to investigate empirically (by means of the data at hand) whether the resulting model is valid. In this paper we therefore discuss recent methodology for investigating the adequacy of such a parametric model $U = \{ f_\vartheta : \vartheta \in \Theta \}$. This will be done for regular regression problems as well as for the inverse case, as in (1).
The paper is organized as follows.
In the next section we briefly review common practices to judge the
goodness of fit of a model U. It is shown that classical goodness of fit approaches, such as least squares statistics, are insufficient from many methodological points of view, particularly when inhomogeneous noise is present, i.e. when the variation of the error $\varepsilon_i$ is expected to vary with the grid point (covariate) $t_i$.
We show in Sect. 2 that statistically valid conclusions about the goodness of fit from the residuals (or variants of them) are impossible in general, particularly when inhomogeneous noise is present, as is the case in our data example. This is mainly due to the fact that in the inhomogeneous case the distribution of such residual-based statistics depends on the whole vector of variances $(\sigma_1^2, \ldots, \sigma_n^2)$, which is in general unknown.
Therefore, we suggest in Sect. 3 a measure of fit which is based on "smoothed residuals''
and which allows for the calculation of the corresponding probability distribution.
In Sect. 4, a bootstrap resampling algorithm
is suggested which allows the algorithmic reconstruction of the distribution
of the suggested goodness of fit quantity.
The use of bootstrap techniques is well documented in astronomy (cf. Barrow et al.
1984; Simpson & Mayer 1986; van den Bergh & Morbey
1984 for various applications). The work most similar in spirit to ours is Bi & Börner's (1994) residual-type bootstrap, used as a method for nonparametric estimation in inverse problems. As a byproduct we show, however, that this residual bootstrap is insufficient in the case of inhomogeneous noise in the data, and a so-called "wild'' bootstrap has to be used instead.
Finally, in Sect. 5 we apply our new method to the fit of the COBE/DIRBE L-band data. We use a functional form for a parametric model of the Milky Way (MW) as presented by Binney et al. (1997, hereafter BGS) and find similar structural parameters for the MW disk and bulge, except for the scale height of the disk, which we find to be smaller (0.162 instead of 0.21; cf. Table 1).
One of the most popular techniques for finding a proper fit of a given model U to a given set of data is to minimize a (penalized) weighted sum of squares
$$\chi^2_w(\vartheta) = \sum_{i=1}^{n} w_i \bigl( y_i - f_\vartheta(t_i) \bigr)^2 .$$
In general, the most important properties required of any goodness of fit (GoF) quantity such as $\chi^2_w$ are that its distribution under the model can be determined, at least approximately, and that large values reliably indicate systematic departures from the model.
In order to get a first insight into the probabilistic behaviour of statistics such as $\chi^2_w$, used as a quantitative measure of fit, it is helpful to consider its distribution in the simplest case when the model holds exactly. A simple calculation then shows that (assuming a normal distribution of the data) $\chi^2_w$ is distributed as a weighted sum of squared normally distributed variables, with expectation $\sum_{i=1}^{n} w_i \sigma_i^2$ and variance $2 \sum_{i=1}^{n} w_i^2 \sigma_i^4$.
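These two moments are easy to verify by simulation; the following check (our illustration, with arbitrary weights and noise levels) compares Monte Carlo estimates with the formulas above.

```python
import numpy as np

# Monte Carlo check of the moments of chi2_w = sum_i w_i eps_i^2 with
# heteroscedastic Gaussian errors; weights and sigmas are arbitrary.
rng = np.random.default_rng(0)
n = 50
w = rng.uniform(0.5, 1.5, n)           # weights w_i
sigma = rng.uniform(0.2, 2.0, n)       # noise levels sigma_i

eps = rng.normal(0.0, sigma, size=(100_000, n))
chi2 = (w * eps**2).sum(axis=1)

print(chi2.mean(), (w * sigma**2).sum())          # ~ sum_i w_i sigma_i^2
print(chi2.var(), 2.0 * (w**2 * sigma**4).sum())  # ~ 2 sum_i w_i^2 sigma_i^4
```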
Hence, already in this simple case it can be seen that the determination of the law of $\chi^2_w$ is practically impossible if the variances $\sigma_i^2$ are not known. It is then difficult to quantify what a "too large value of $\chi^2_w$'' means, because this will depend on the unknown quantities $\sigma_i^2$, and a rule as in (2) can in principle lead to any result in favour of or against the model U.
We mention that standardisation by the predicted values as in (3) does not avoid this problem. This is in contrast to goodness of fit problems for the assessment of distributional assumptions, i.e. when one investigates by a $\chi^2$ measure whether a population is normal, say (Cox & Hinkley 1974). Note that the case of homoscedastic regression models (i.e. the distribution of the noise is identical for all data points) is somewhat simpler, because here the expectation and the square root of the variance of $\chi^2_w$ are proportional, so that the signal to noise ratio $\mathrm{E}[\chi^2_w] / \sqrt{\mathrm{Var}[\chi^2_w]}$ does not depend on the unknown error variance.
Many attempts have been made to find simple approximations of the distribution of $\chi^2_w$. Among them, a quite attractive option is the use of a bootstrap method, an algorithmic approximation of the true law (see Efron & Tibshirani 1993 for an overview and many applications). Bootstrapping random quadratic forms (such as $\chi^2_w$) is, however, a rather delicate matter, because standard bootstrap algorithms such as Efron's (1979) n out of n bootstrap are inconsistent (Babu 1984; Shao & Tu 1995), i.e. the distribution is not approximated correctly with an increasing number of observations.
The use of a particular bootstrap algorithm
was indeed suggested by Bi & Börner (1994) in the context
of assessing the goodness of fit
in deconvolution models. We mention, however, that their bootstrap algorithm is asymptotically not correct in inhomogeneous models. Interestingly, the suggested algorithm is similar in spirit to the so-called residual bootstrap (i.e. drawing random samples with replacement from the residuals $r_i$), which is well documented in the statistical literature for the estimation of regression parameters (cf. Davison & Hinkley 1997, p. 281).
Despite the abovementioned difficulties, the main problem encountered with the naive use of $\chi^2_w$ in regression models as a measure of GoF is that asymptotically (here and in the following, asymptotically means that the sample size tends to infinity) the law of $\chi^2_w$ does in general not converge to any reasonable quantity, in contrast to goodness of fit testing for distributional assumptions. Even after rescaling in order to force the variance to stabilise, the resulting limit still depends on the unknown variances $\sigma_i^2$.
In summary, we see that without explicit knowledge of the variances $\sigma_i^2$, the use of $\chi^2_w$ as a quantitative measure of the validity of a model is not appropriate.
Due to the above-described difficulties, statisticians throughout the last two decades
have extensively studied the
problem of checking the goodness of fit in regression models.
It is beyond the scope of this paper to review
this work; many references can be found in the
recent monograph by Hart (1997). Among the variety of
procedures suggested so far, we mention methods
which are based on model selection criteria,
such as Akaike's (1974) information criterion
(Eubank & Hart 1992; Aerts et al. 1999)
and methods which compare nonparametric estimators with a parametric
estimator. To this end Azzalini & Bowman (1993),
Härdle & Mammen (1993)
and Müller (1992) used a kernel
estimator, Cox et al. (1988) smoothing splines and
Mohdeb & Mokkadem (1998) a Fourier series estimator.
However, the applicability of many of these methods is
often limited. For example, Härdle & Mammen's
test is confronted with bias problems, whereas
other procedures are only applicable for homogeneous
errors or when the error distribution is completely known
(Eubank & Hart 1992; Aerts et al. 1999).
Another serious difficulty arises with the nonparametric estimation of
the signal as the dimension k of the grid points increases. This
is sometimes denoted as the curse of dimensionality (Wand & Jones 1995;
Bowman & Azzalini 1997).
A rough rule of thumb is that the number of observations required in dimension k is $n^k$ in order to obtain the same precision of the estimate as n observations in one dimension. Hence, the precision induced by 100 observations on the real line is approximately the same as that of 10000 observations drawn from the plane. Furthermore,
measurements often cannot be taken equidistantly over a grid, which
leads to sparse data structures causing further difficulties
with increasing dimension.
One should also note the additional difficulty of transferring these methods to the case of inverse problems, a situation which up to now has not been treated.
Munk & Ruymgaart (1999)
have developed a general regression methodology which remains valid in
the heteroscedastic case (i.e. the distribution of the noise depends on
the data point) with arbitrary dimensions of the grid points. The
underlying idea dates back to H. Cramér and
can be summarized as "smoothing the residuals"
in order to obtain asymptotic stabilization of
the test criterion. In our context this reads
as follows. Let T denote an injective linear smoothing integral operator with associated integral kernel $\psi$, i.e.
$$(Th)(t) = \int \psi(t, s)\, h(s)\, \mathrm{d}s .$$
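In discretised form T is simply a matrix acting on the residual vector. The following sketch (ours; the Gaussian kernel and its width are assumptions, not necessarily the kernel used in the paper) illustrates the construction.

```python
import numpy as np

def smoothing_matrix(t, width=0.1):
    """Row-normalised discretisation of a smoothing kernel psi(t, s)."""
    psi = np.exp(-0.5 * ((t[:, None] - t[None, :]) / width) ** 2)
    return psi / psi.sum(axis=1, keepdims=True)

t = np.linspace(0.0, 1.0, 100)
T = smoothing_matrix(t)

rng = np.random.default_rng(0)
r = rng.normal(size=t.size)      # stand-in residuals
M_n = np.mean((T @ r) ** 2)      # squared norm of the smoothed residuals
```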
The reasoning behind this approach is that direct estimation of $f$ is a rather difficult task, whereas estimation of the smoothed transformation $Tf$ is much simpler. Furthermore, the distribution of the minimizer of the resulting smoothed least squares criterion becomes tractable. If we denote the minimum of this criterion as $M_n$, one can show under very mild regularity conditions (Munk & Ruymgaart 1999) that the distribution of $M_n$, suitably rescaled, converges to a nondegenerate limit law (6).
The true limit distribution (6) of $M_n$ depends on the unknown variances $\sigma_i^2$. It is therefore not possible to use this distribution directly for practical purposes. However, it is possible to approximate the distribution numerically using the following bootstrap algorithm:
Step 1: (Generate residuals). Compute the residuals $r_i = y_i - f_{\hat\vartheta}(t_i)$, $i = 1, \ldots, n$, where $\hat\vartheta$ is the parameter estimate obtained from the fit.
Step 2: (Wild resampling). Draw i.i.d. random weights $c_i$ from the binary distribution of Fig. 1 (with mean zero and unit variance) and set $\varepsilon_i^* = c_i r_i$.
Step 3: (Bootstrap data). Build the bootstrap sample $y_i^* = f_{\hat\vartheta}(t_i) + \varepsilon_i^*$, refit the model, and recompute the goodness of fit statistic.
Step 4: (Replication). Repeat Steps 2 and 3 B times to obtain B bootstrap values of the statistic.
Now we construct the empirical cumulative distribution function (ECDF) of these bootstrap values, which can be taken as an approximation of the right side in (6), because Munk & Ruymgaart (1999) have shown that the ECDF, based on the bootstrap replications, asymptotically approximates the distribution of the goodness of fit statistic. The ECDF can be obtained by ordering the bootstrap values increasingly and plotting them against the values (i)/B, where (i) denotes the position of a value in the ordered sample. The so-called estimated evidence of the model U can now be obtained by determining the position of the original statistic in the ordered bootstrap sample. This is some number $k \in \{0, \ldots, B\}$. From this number one computes the estimated evidence $1 - k/B$.
Figure 1: Binary probability distribution required in step 2 of the wild bootstrap algorithm.
A formal test at significance level $\alpha$ can be performed by deciding against U if the estimated evidence is smaller than $\alpha$.
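The following sketch (our paraphrase, not the authors' code) condenses Steps 1-4 and the evidence computation; `model` and `fit_model` are hypothetical placeholders for the parametric surface and the fitting routine, and we assume Mammen's golden-section two-point distribution for the wild weights, which has the binary form of Fig. 1.

```python
import numpy as np

def wild_weights(rng, n):
    """Two-point 'wild' distribution with mean 0 and variance 1
    (Mammen's golden-section choice, assumed here; cf. Fig. 1)."""
    a = -(np.sqrt(5.0) - 1.0) / 2.0                   # ~ -0.618
    b = (np.sqrt(5.0) + 1.0) / 2.0                    # ~ +1.618
    p = (np.sqrt(5.0) + 1.0) / (2.0 * np.sqrt(5.0))   # P(c = a) ~ 0.724
    return np.where(rng.random(n) < p, a, b)

def gof_statistic(residuals, T):
    """Smoothed-residual criterion M_n = mean((T r)^2)."""
    return np.mean((T @ residuals) ** 2)

def bootstrap_evidence(y, t, model, fit_model, T, B=1000, seed=0):
    """Wild bootstrap approximation of the law of M_n; returns the
    estimated evidence 1 - k/B for the parametric model U."""
    rng = np.random.default_rng(seed)
    theta_hat = fit_model(y, t)                      # parametric fit
    fitted = model(t, theta_hat)
    r = y - fitted                                   # Step 1: residuals
    M_n = gof_statistic(r, T)
    M_star = np.empty(B)
    for j in range(B):
        eps_star = wild_weights(rng, r.size) * r     # Step 2: wild errors
        y_star = fitted + eps_star                   # Step 3: bootstrap data
        theta_star = fit_model(y_star, t)            #         refit the model
        M_star[j] = gof_statistic(y_star - model(t, theta_star), T)
    k = np.sum(M_star <= M_n)                        # Step 4 + position of M_n
    return 1.0 - k / B
```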
We would like to close this section by making some remarks
about the applicability of bootstrap algorithms in the
context of goodness of fit, and giving some arguments why
our bootstrap algorithm is valid in the heteroscedastic case.
Stute et al. (1998) have shown that the wild bootstrap is valid in heteroscedastic models with random grid points t.
This result can be extended to
deterministic grid points, as is the case in our example,
provided the scheme is not "too''
wiggly (a precise formulation can be found in
Munk 1999), which holds true for the subsequent example.
We mention that an explanation for the validity of the wild bootstrap is its automatic adaptivity to inhomogeneous variances: it can be shown that the variance of the artificial data points induced by "wild'' resampling (step 2 in our algorithm) equals $r_i^2$, which mimics the local variance $\sigma_i^2$ at each grid point.
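This adaptivity is easy to check numerically; with the two-point weights of the sketch above (again assuming the golden-section construction), the first three moments come out as approximately 0, 1 and 1.

```python
import numpy as np

rng = np.random.default_rng(1)
c = wild_weights(rng, 1_000_000)                 # from the sketch in Sect. 4
print(c.mean(), (c**2).mean(), (c**3).mean())    # ~ 0, 1, 1
# Since E[c] = 0 and E[c^2] = 1, the bootstrap error eps_i^* = c_i r_i has
# conditional variance r_i^2, mimicking the local variance sigma_i^2.
```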
The DIRBE experiment on board the COBE satellite,
launched in 1989,
made measurements of the surface brightness in several infrared wavebands
(Weiland 1994). A difficulty with the COBE/DIRBE data is that it has to be corrected for certain effects. The most important correction is the removal of dust absorption, which has been done by Spergel et al. (1996).
We use their corrected COBE/DIRBE L-band data in our fits.
The data are given on an equally spaced two-dimensional grid of points in l and b, covering a limited range in galactic longitude and latitude.
The COBE/DIRBE data have been used to deproject the three-dimensional density of the MW in a number of projects. A main difficulty in recovering the three-dimensional luminosity distribution from the two-dimensional surface brightness distribution of the MW is that it is not a unique operation, in general. One way to avoid this problem is to fit a parametric model to the MW in order to reduce the set of possible models. Several parametric models have been suggested, see for example Kent et al. (1991), Dwek et al. (1995) or Freudenreich (1998). Another approach is to use the non-parametric Richardson-Lucy algorithm for the deprojection of the data (Binney & Gerhard 1996; Binney et al. 1997; Bissantz et al. 1997), in order to reconstruct the luminosity distribution of the MW.
In parametric models of the MW density, about ten "structural parameters'' - including normalisations, scale lengths and geometrical shape parameters of the bulge/bar - are used (see, for example, Kent et al. 1991; Dwek et al. 1995; Freudenreich 1998). In what follows, we assume that these parameters are selected such that the projection of a model onto the sky is an injective operation.
We will first derive a general
mathematical model of the problem of recovering the MW luminosity
distribution from the L-band data.
The projection of a three-dimensional light distribution
to a surface brightness (on the sky) is defined as follows.
Let $\mathcal{F}$ be the set of possible luminosity densities of the MW, i.e. of non-negative maps $f(x, y, z)$ defined in the galactocentric coordinate system of Fig. 2. The corresponding surface brightness is obtained by integrating $f$ along the line of sight, $g(l, b) = \int_0^\infty f\bigl(x(r, l, b),\, y(r, l, b),\, z(r, l, b)\bigr)\, \mathrm{d}r$.
Figure 2: A sketch of the two coordinate systems that we use in this paper. Luminosity densities of the MW are defined in the galactocentric coordinate system x, y, z. Galactic longitude l and latitude b define a position on the sky. Together with the distance from the observer r they constitute the observer-centred coordinate system.
Now, as a first step, we estimate the smoothed surface brightness of the MW by applying the smoothing operator T to the observed data.
We will now investigate
whether the functional form of the luminosity distribution
of the MW as suggested by BGS
provides a satisfactory fit to the COBE/DIRBE L-band data.
This functional form is a superposition of a double-exponential disk and a truncated power-law bulge; the explicit parametrisation and its parameters are given in BGS.
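For illustration, the following sketch (ours; the toy double-exponential disk, the solar position R0 = 8 kpc and the coordinate convention are assumptions, not the BGS parametrisation) projects a density onto the (l, b) sky by line-of-sight integration, as in Fig. 2.

```python
import numpy as np

R0 = 8.0  # assumed galactocentric distance of the Sun in kpc (illustrative)

def density(x, y, z, z0=0.21, Rd=2.5):
    """Toy double-exponential disk (placeholder for the full BGS form)."""
    R = np.hypot(x, y)
    return np.exp(-R / Rd) * np.exp(-np.abs(z) / z0)

def surface_brightness(l, b, r_max=20.0, n_r=400):
    """Line-of-sight integral g(l, b) = int_0^inf f(r, l, b) dr."""
    r = np.linspace(1e-3, r_max, n_r)
    # observer-centred (r, l, b) -> galactocentric (x, y, z); the Sun is
    # placed at (R0, 0, 0), with l, b in radians (assumed convention)
    x = R0 - r * np.cos(b) * np.cos(l)
    y = r * np.cos(b) * np.sin(l)
    z = r * np.sin(b)
    return np.trapz(density(x, y, z), r)

# e.g. a small (l, b) grid of model surface brightnesses
ls = np.radians(np.linspace(-30, 30, 61))
bs = np.radians(np.linspace(-10, 10, 21))
g = np.array([[surface_brightness(l, b) for l in ls] for b in bs])
```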
As a first step we investigate graphically whether an inhomogeneous variance pattern has to be assumed, which would be indicated by an inhomogeneous pattern of squared residuals.
Figure 3 shows the COBE/DIRBE L-band data and Fig. 4 the squared residuals, with the model parameters taken from BGS.
Provided this model holds (approximately) true, an important conclusion from Fig. 4 is a strong indication of inhomogeneous noise. Interestingly, towards the boundary
of the observed part of the sky, the variability of the observations increases.
Figure 6 shows the difference between the
model and the data including the algebraic sign of the difference. Note that
the error distribution is obviously inhomogeneous, both in the logarithmic
magnitude scale plotted in the figures and in a linear scale.
Further, note that there is a systematic dependence of the sign of the deviations on the position on the sky, whereas the model fits well in the central part of the observed region. This is an indication that the MW disk shows deviations from an exponential dependence.
Figure 3: COBE/DIRBE L-band data. Contour levels are given in magnitudes. Note that contour levels are only defined up to a common offset.
Figure 4: Square difference between the COBE/DIRBE L-band logarithmic surface brightness (magnitudes) and the parametric model with the parameters from BGS.
Figure 5: The smoothed COBE/DIRBE L-band data in the sky region used for the fit.
Table 1: Structural parameters of our best-fit model compared to the BGS parameters.

Parameters | z0 | z1 | · | · | d | · | · | · | b | ·
"Our'' | 0.162 | 0.042 | 0.27 | 2.56 | 0.41 | 0.502 | 0.59 | 1.90 | 306.1 | 0.86
"BGS'' | 0.21 | 0.042 | 0.27 | 2.5 | 0.463 | 0.5 | 0.6 | 1.9 | 234.4 | 0.80
We now determine the best-fit model parameters by minimisation of the smoothed goodness of fit criterion. We use the parameters found by BGS as starting values for our minimisation algorithm. Due to the increasing noise towards the boundary of the observed part of the sky, we restrict the surface brightness data used in the fit to a central region in l and b. This is done to downweight those parts of the sky where noninformative data are expected (see Fig. 4). Figure 5 shows these data after smoothing with the operator T. Note how much smoother the smoothed data appear compared to the original COBE/DIRBE L-band data.
Our computational strategy consists of two steps: first the minimisation of the fit criterion, then the bootstrap assessment of the resulting fit.
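A minimal sketch of this strategy (ours; `model_surface`, `theta_bgs`, `y`, `t` and `T` are placeholders for the parametric projection, the BGS starting values, the data and the discrete smoothing operator):

```python
import numpy as np
from scipy.optimize import minimize

def criterion(theta, y, t, T, model_surface):
    """Smoothed-residual criterion to be minimised over theta."""
    r = y - model_surface(t, theta)    # residuals on the (l, b) grid
    return np.mean((T @ r) ** 2)

# Step 1: minimise, starting from the published BGS parameter values:
# res = minimize(criterion, x0=theta_bgs, args=(y, t, T, model_surface),
#                method="Nelder-Mead")
# Step 2: assess the resulting fit with the bootstrap of Sect. 4, e.g.
# bootstrap_evidence(y, t, model_surface, fit_model, T).
```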
Figure 6: Difference between the COBE/DIRBE L-band logarithmic surface brightness and the parametric model from BGS, in magnitudes. Negative values of the contour levels indicate that the model is too bright compared to the data.
Applying the bootstrap algorithm presented in Sect. 4 to our model, we find a value of the estimated evidence which indicates no significant evidence against this model. For the parameters found in BGS a value is obtained which corresponds to a slightly worse fit. Note that, at first glance, this statement seems to contradict the argument given by BGS in the last paragraph of their p. 366. They pointed out that a graphical inspection of residuals suggests that, for the model considered, some local regions of the sky show systematic differences between their model and the observed data.
We conclude that the proposed method does not allow us to attribute these local deviations between model and data to systematic departures.
As pointed out by the referee, this might be due to a lack of power of the proposed method, caused by the additional smoothing step. Indeed, this is in accordance with theoretical results concerning the asymptotic efficiency of the proposed method (Munk & Ruymgaart 1999).
In fact, a more powerful method could result from choosing a data-driven smoothing operator T, similar to bandwidth selection in kernel regression. The main difficulty which then arises is a different limit law compared to the case discussed in the present paper, where T is fixed. However, this is beyond the scope of this paper and will be an interesting topic for further research.
It can be seen from Table 1 that the main difference between the
two models is that our model has a lower disk scale height z0.
The estimated evidence was found to be slightly better for our new model compared to the BGS parameters. However, recall that
we used only a part of the COBE/DIRBE L-band surface brightness
data in our fit.
We have argued that classical measures of goodness of fit adopted from checking distributional assumptions can be misleading in the context of (inverse) regression. In particular, an inhomogeneous noise field can render common $\chi^2$-type quantities unreliable. For this case, a new method for noisy Fredholm equations of the first kind was proposed by Munk & Ruymgaart (1999). As an example of the application of the suggested algorithm, we used the problem of determining the luminosity density in the MW from surface brightness data. From this we have found that the parametric model in Binney et al. (1997) can be improved slightly and gives a satisfactory fit to the COBE/DIRBE L-band data in the considered region of the sky.
Acknowledgements
The authors are indebted to the organizers F. Ruymgaart, W. Stute and Y. Vardi of the conference "Statistics for Inverse Problems'' held at the Mathematical Research Center at Oberwolfach, Germany, 1999. The present paper was essentially initiated by this meeting. We would like to thank O. Gerhard and the referee, P. Saha, for many helpful comments. N. Bissantz acknowledges support by the Swiss Science Foundation under grant number 20-56888.99.
We mention that our procedure can also be performed with any other smoothing kernel T; in general this will yield different values of the estimated evidence. In principle, any injective operator T is a valid option. A good choice of T, however, is driven by various aspects, such as efficiency or simplicity.
An extensive simulation study performed in Munk & Ruymgaart (1999) reveals the kernel adopted here as a reasonable choice, yielding a procedure capable of detecting a broad range of deviations from U. See, however, the discussion in Sect. 5. A particularly simple choice in noisy inverse models is the operator of the integral equation (1) itself.