A&A 468, 501-514 (2007)
DOI: 10.1051/0004-6361:20064927
The XMM-Newton extended survey of the Taurus molecular cloud
K. Arzner1 - M. Güdel1 - K. Briggs1 - A. Telleschi1 - M. Schmidt2 - M. Audard3 - L. Scelsi4 - E. Franciosini4
1 - Paul Scherrer Institut, 5232 Villigen, Switzerland
2 -
Institute for Data Analysis and Process Design, Zurich University of Applied Sciences, Postfach 805, 8401 Winterthur, Switzerland
3 -
Columbia Astrophysical Laboratory, 550 West 120th St, MC 5247, New York, NY 10027, USA
4 -
Dipartimento di Scienze Fisiche e Astronomiche, Piazza del Parlamento 1, 90134 Palermo, Italy
5 - INAF-Osservatorio Astronomico di Palermo, Piazza del Parlamento 1, 90134 Palermo, Italy
Received 27 January 2006 / Accepted 6 September 2006
Abstract
Traditional binned statistics such as
suffer from information loss and arbitrariness of the binning
procedure, which is especially important at low count rates as encountered in the XMM-Newton Extended
Survey of the Taurus Molecular Cloud (XEST). We point out that the
underlying statistical quantity (the log likelihood L) does not require any binning beyond
the one implied by instrumental readout channels, and we propose to use it for low-count data.
The performance of L in the model classification and point estimation problems is explored
by Monte-Carlo simulations of Chandra and XMM-Newton X-ray spectra, and is compared to the
performances of the binned Poisson statistic (C), Pearson's
and Neyman's
,
the Kolmogorov-Smirnov, and Kuiper's statistics.
It is found that the unbinned log likelihood L performs best with regard to the expected chi-square distance between
true and estimated spectra, the chance of a successful identification among discrete candidate models,
the area under the receiver-operator curve of reduced (two-model) binary classification problems,
and generally also with regard to the mean square errors of individual spectrum parameters.
The
(
)
statistics should only be used if more than 10 (15) predicted counts
per bin are available. From the practical point of view, the computational cost of evaluating L
is smaller than for any of the alternative methods if the forward model is specified in terms
of a Poisson intensity and normalization is a free parameter. The maximum-L method is applied to 14 XEST observations,
and confidence regions are discussed. The unbinned results are compared to binned XSPEC results,
and found to generally agree, with exceptions explained by instability under re-binning and by
background fine structures. In particular, HO Tau is found by the unbinned method to be rather
cool (
keV), which may be a sign of shock emission.
The maximum-L method has no lower limit on the available counts, and allows to treat weak sources
which are beyond the means of binned methods.
Key words: methods: statistical - X-rays: stars - stars: formation
Energy histograms are often used as spectrum estimators. While this procedure has the advantage of simplicity, the grouping of counts into a histogram is also associated with information loss. This becomes especially important if only few counts are available, so that the spectral fine structures are sparsely sampled by the observed counts. Such a situation arises in the XMM-Newton Extended Survey of the Taurus Molecular Cloud (XEST; Güdel et al. 2007a), thus prompting the search for alternative unbinned methods.
Another motivation for histogram formation is to use
as a simple and simply coded measure of agreement
between theory and observation. The role of
is thus to provide a plausibility ordering of
alternative models, and to assess their absolute credibility. Since the histograms follow a multinomial
distribution, which becomes gaussian only in the limit of infinite sample size, the use of
represents an approximation and corrections must be applied for small n(e.g., Kendall & Stuart 1958; Wachter 1979; Nousek 1989; Mighell 1999;
Arzner 2004). Alternatively, one may use the binned Poisson (C) statistics.
There exists, however, a simpler solution.
Namely, the relevant statistical quantity, the likelihood function, can be defined for an unlimited instrumental resolution without any reference to binning. Using this unbinned expression avoids arbitrariness of histogram formation, and thus a potential source of discrepant results. It has been successfully applied to ROSAT (Boese & Doebereiner 2001) and EGRET (Digel 2000) observations. Related approaches invoke piecewise-constant intensity models (Scargle 1998; see Stelzer et al. 2007, for an application to the XEST) or transformations to uniform null hypotheses (Kinoshita 2002), and applications in other fields than astronomy include particle physics (Baker & Cousin 1984) and medicine (Miller et al. 2002). While the C statistic is often used in astronomical applications (e.g., Dolphin 2002, for star formation statistics; Babu & Feigelson 1996, for a general overview), the unbinned (exact) Poisson likelihood is less often used. Alternative unbinned methods such as the Kolmogorov-Smirnov or Kuiper's statistics have the advantage of being (asymptotically) model-independent, which, however, also entails sub-optimal performance if the modeling was correct.
In this article we revisit the issue by Monte-Carlo simulations of XMM-Newton and Chandra CCD spectra of the Taurus Molecular Cloud (TMC). By simulating counts from a known spectrum, and applying different statistics in order to find the best-fit model, we assess the performances of the various statistics, and thereby gain a more differentiated picture. The present study represents an extension of the work of Nousek (1989) and Wachter et al. (1979) to more complex spectral models and a broader range of statistics.
The article is organized as follows. Section 2 introduces the models for sources and background spectra. Section 3 discusses the counting statistics. Section 4 describes measures af agreement between models, and between models and data, which can be used to assess the performance of various statistics. Section 5 describes the Monte-Carlo simulations and their numerical results. Real-data applications from the TMC are presented in Sect. 6, where also the issue of confidence regions is briefly touched, and the unbinned estimates are compared to the binned estimates from the XSPEC software. Section 7 contains a summary and conclusions.
We start with introducing the spectral models f(E) used in this study, which represent the expected number of counts per unit energy.
In modern observations, E can usually not assume continuous values but is
restricted to a discrete set of instrumental output channels. For example, the PN detector of XMM-Newton has by default
such channels; Chandra/ACIS has
.
In both cases,
the channel separation
is much smaller than the instrumental resolution as given by
the spectral response matrix, so that f(E) is fully resolved. In order to stress the quasi-continuous
nature of the channel coverage, we shall write
rather than
(
)
unless stated otherwise. The channels are also
much finer than the bins typically used when computing the C or
statistics. In this sense,
methods which directly use the energy channels will be referred to as "unbinned'', whereas
methods which group the channels first into larger "bins'' will be referred to as "binned''.
In what follows we exclusively work with counts and focus on the problem of finding the best spectral model for a given set of counts. Accordingly, we do not consider here the problem of spectral deconvolution, but absorb the instrumental response in the forward model f(E). The symbol E thus denotes the observed (channel) energy rather than the incoming photon energy.
Our source models assume absorbed collisional ionization equilibrium
(APEC; Smith et al. 2001, as implemented in XSPEC; Arnaud 1996), convolved
with the instrumental response. The templates are defined on the
instrumental channels, and
depend on only two physical parameters: the (electron) temperature kT in units of keV, and the hydrogen
column density
in units of 1022 cm-2. This simple model is adapted to the faint sources
addressed by the L statistic. The parameters
are specified on a double logarithmic lattice with
sufficiently dense coverage that intermediate spectra may be interpolated with error
(see Sect. 4.1). This gridding is used for simulations only.
The total number of expected counts
is an additional parameter.
The normalized templates are displayed in Fig. 1 (top panel, selection).
For better physical clarity, the x-axis refers to channel energy rather than to channel number.
In the Monte-Carlo simulations, atomic lines are included assuming a fixed
metallicity of 0.2 times the solar abundances of Anders & Grevesse (1989), and we use the Chandra
rather than the XMM-Newton instrumental response because of its smaller number of instrumental channels,
which accelerates the simulations. This choice applies to the simulations only, and we return to the XMM-Newton instrumental response and to a more refined abundance model when dealing with real XEST data. The function f(E) contains a background which is interpolated from calibration
observations (see Sect. 2.2). The astrophysical relevance of the parameters (
)
is elucidated in the
accompanying article of Güdel et al. (2007a). From an empirical point of view, the
effect of large
is to suppress (absorb) the spectrum from at low energies, and the effect of large kT
is to enhance it mostly at high energies.
Although Chandra, owing to its high spatial resolution, has little background, the present simulations
include an additive background for the sake of generality and applicability to XMM-Newton data. The background is included
in the forward model according to
![]() |
Figure 1:
Top: normalized source models used in this study. An example model (black) and its neighbours
(dark gray) are highlighted for better clarity. Inlet: the example model and its neighbours
in parameter space
|
| Open with DEXTER | |
To summarize, we use a smooth density estimator for the background which interpolates the observed background counts, preserves their total number, and has the (approximately) correct instrumental resolution. The normalized background spectrum of a typical Chandra/ACIS observation is shown in Fig. 1 (bottom panel). The peak at 1.8 keV is due to silicon in the CCD.
Photon counting experiments such as XMM-Newton and Chandra follow the Poisson statistics, which is briefly recalled here. More detailed discussions may be found e.g. in Feller (1968), Eadie et al. (1971), Santaló (1976), Wachter (1979), Reiss (1993), and Protassov et al. (2002).
We consider throughout a non-homogeneous Poisson process in an interval
of the real axis
with finite intensity f(E). Generalizations to (pointwise) infinite f(E) and higher dimensions can be found e.g.
in Reiss (1993). The Poisson process is then characterized by the following two properties: (i)
the probability of finding nj counts in an sub-interval
is
When counts from different intensity functions fk(E) are added, the
resulting process is again Poissonian with intensity
.
This fact may
be used to include an observational background, and to construct a Poisson process of
arbitrary overall expectation
The likelihood function is defined as the probability of finding a certain observation given the true model
from which the observation derives. Given the observed sample size n, the likelihood of
the observation
is
As outlined above, the normalization
factorizes out, and can be estimated from the total
number of observed counts alone. We shall do this using the maximum-likelihood estimator (i.e., that
which maximizes
). Since the background contribution
is
known (Sect. 2.2), only the source contribution
needs to be estimated;
this will be done using the maximum-likelihood estimator
The choice of optimal bins (Schott 1992) is an important issue which affects the outcome of the binned methods. Histograms can have uniform or non-uniform bins. While uniform (equal-size) bins have the advantage of being independent of the data, they are also less adapted and may contain very few counts. Non-uniform bins are usually chosen to contain the same (predicted) probability mass or observed counts.
For either (uniform or non-uniform) type of bins, one needs to decide on the number of bins or
the average number of counts per bin. A well-known
method to choose the number
of uniform bins is Sturges' rule (Sturges 1926)
In our simulations, the optimal number n* of counts per bin is either taken according to Eq. (9)
as
,
or according to Eq. (10) as
where the (unknown) true R(p') has been replaced by an average
over all models under consideration.
The choices based on Eqs. (9) and (10) will be referred to as Sturges' and AMISE methods.
Alternatively, n* is fixed at a given value, similar as in the standard XMM-Newton and Chandra data analysis software.
This will be referred to as fixed-n* method; a typical choice is 10 counts per bin. (In XSPEC,
this becomes
10 cts/bin for high-intensity flares which are not considered here.) For the problem
considered here (
cts/keV3;
ranging from a few 10 to a few 100), the AMISE method
gives less than 10 counts per bins, whereas Sturges' method gives a few to a few ten counts per bin.
Once the number of counts per bin n* is chosen, the binning itself is performed using either uniform or non-uniform bins. For the case of non-uniform bins we use by default bins with equal numbers of observed counts, with the k-th bin running from Ek n* to E(k+1)n*. This simple method is not optimal but adopted for consistency with the standard analysis software. Alternative methods have also been considered, which are based on equal predicted bin content, and found to yield similar results.
Next we define measures of agreement between different models, and between models and observations. These will allow to estimate models from the data, and to assess the discrepancy between true and estimated models in the subsequent Monte-Carlo simulations.
Different sets of parameters
result in different model spectra, whose
discrepancy can be measured by a suitable distance in the space of predicted spectra (f-space).
It is important to realize that it is the distance in f-space which is relevant for the observational
discrimination between alternative models, and not the (say, Euclidean) distance in
parameter space, because the latter may degenerate solely due to ambiguous parameterization.
Recalling that f(E) is the intensity of a Poisson process, and assuming that f1(E) is the true and f2(E) is the estimated spectrum, we use here the chi square distance (e.g., Gibbs 2002)
After having defined a measure of agreement between different models, we shall specify measures of agreement between models and observations. To this end, we consider the unbinned likelihood (Eq. (6)) together with a selection of some of the statistics which are often used in an astrophysical and astronomical context (e.g., Gosset 1987; Wachter et al. 1979; Nousek & Shue 1989; Babu & Feigelson 1996; Mighell 1999; Metchev 2002; Paltani 2004):
It should be pointed out that the selection of statistics (Eqs. (12)-(17)) is aimed at our astrophysical
application of estimating X-ray spectra, and is not free of personal bias.
The L statistic is the one which is primarily addressed by the present article.
The C and
statistics are implemented in the standard XSPEC software pakage; they are included
for benchmarking and comparison of observational results. The
statistic is an obvious variant of
,
which is obtained as the asymptotic limit of the binned Poisson log likelihood (cf. Sect. 3.2).
Among the unbinned statistics, the Kolmogorov-Smirnov D is probably the most fundamental one
used in astrophysics. The Kuipers statistic V has been included as an example of an improved version of D.
Others (such as the Anderson-Darling statistic) could have been included as well, but we decided to restrict
the list in order not to overload the diagnostics.
In order to treat all statistics (12)-(17) on the same footing, the source normalization
is always estimated from Eq. (7), and only the shape parameters (
)
are estimated differently for
each statistic. The case
is rarely met in practice.
Equation (11) can be used to measure the discrepancy between true and estimated models, and thus to assess the performance of the statistics (Eqs. (12)-(17)). Alternatively, their performance can also be characterized in terms of the classical receiver-operator characteristics (ROC; Peterson et al. 1954; Hanley & McNeill 1982; Michel 2003). This diagnostics applies to the binary classification problem, and visualizes the trade-off between the reliability and completeness of detections. In its standard formulation, the binary classification problem considers two types of objects (healthy and ill patients, for example) with a real-valued attribute. The attribute is a random variate, the probability density of which depends on the object type. The task is to classify the objects by their attributes, which is most simply done by applying a threshold. The classification is only unique if the probability densities of the attribute do not overlap. There are four possible outcomes of the classification procedure, with frequencies indicated in brackets: a type 1 object may be correctly classified as type 1 (n11) or erroneously as type 2 (n12), and a type 2 object may be correctly classified as type 2 (n22) or erroneously as type 1 (n21). From these frequencies we define the sensitivity n11/(n11+n12)and the specificity n22/(n21+n22). (The asymmetric naming stems from associating type 1 and 2 with asymmetric entities like healthy and ill patients.) Let us assume that a low attribute is typical for type 1, and a high attribute is typical for type 2. If the threshold is high, only few objects are accepted, but most of them are correctly classified as type 2. However, the bulk of type 2 objects is missed since it interferes with type 1. Thus the specificity is high but the sensitivity is small. Conversely, a low threshold detects all type 2 objects, but at the expense of type 1 contamination. In this case, the sensitivity is high but the specificity is low. As the threshold is varied, it traces out a trade-off between specificity and sensitivity that is traditionally displayed in a (1-specificity) versus sensitivity diagram, equivalent to false positive rate versus true positive rate. This graph is called the receiver-operator characteristics (ROC).
In order to apply the ROC procedure to the XMM-Newton spectrum estimation problem, we restrict the latter to only
two possible models f1 and f2, and use the log likelihood ratio L2 - L1 as an attribute, and
similarly C2-C1,
,
,
D1-D2, and V1-V2.
By convention, a low attribute thus hints to model 1 and a high attribute to model 2. We then choose at random
one of the two models, generate an eventlist, and compute the above attributes. This
procedure is repeated some 105 (=
)
times, and the sensitivity and specificity are
computed for varying thresholds. The resulting ROC curves are shown in Fig. 3. The two models f1 and f2 have the same hydrogen column density
and normalization
but their temperatures
differ by a factor two,
kT1 = 1.4 keV and
kT2 = 2.5 keV. Such a difference is within astrophysical expectations.
The chi square distance between the two models is
.
Different curves represent
different statistics. Ideally (100% sensitivity and 100% specificity), the ROC curve would go through the upper left corner (0,1) of Fig. 3, whereas a completely
non-discriminating test would yield a straight line along the diagonal from (0,0) to (1,1). The actually
realized curves are between the two extremes, with the exact likelihood coming closest
to the ideal point (0,1).
The ROC curve can be used to assess the performance of the statistics (Eqs. (12)-(17)) in the
binary classification problem. A commonly used performance measure is the area under the ROC curve (AUC).
Like the chi square distance, the AUC gives a measure of observable discrepancy between two models.
For Fig. 3 we obtain the AUC's 0.761 (L), 0.705 (C), 0.679 (
),
0.655 (
), 0.699 (D), and 0.727 (V), thus establishing the ordering
.
![]() |
Figure 2:
Schematic construction of a continuous non-homogeneous Poisson process by the rejection method:
out of |
| Open with DEXTER | |
In order to explore the performance of the unbinned likelihood and compare it to the alternative statistics
we have conducted extended Monte-Carlo simulations.
The simulations involve two general steps. In the first step, a model is chosen at random and an event list is generated
by either the inversion or rejection method (Devroye 1986). The inversion method
is described in Appendix A, and the rejection method is illustrated in Fig. 2.
The inversion method is faster and thus used in most simulations; both methods have been checked against
the analytical Poisson probabilities.
In the second step, we compare the event list with models which are close enough to the true model
in a
sense in order to be potentially successful candidates. This is done using all statistics listed
in Eqs. (12)-(17); the decision rule is to accept the model with largest (L, C) or
smallest (
). The above two-step procedure is repeated for many realizations of the
event list, and several diagnostics are applied. When the parameter
is varied in the simulations, this is done by taking it uniformly
distributed out of the indicated interval. The background
is kept fixed. Several runs
addressing different aspects have been performed.
![]() |
Figure 3: Receiver-operator characteristics for the binary classification problem with only two models (inlet). |
| Open with DEXTER | |
In a first family of simulations, we investigate the minimum chi-square distance between two models which allows a safe distinction on grounds of the observed event list.
To this end, we work with the discrete set of models of Sect. 2.1 and call an estimate a
"successful identification'' if the estimated model equals the true model. We are interested in the chance of a
successful identification, and proceed as follows. For each realization
of the eventlist drawn from a true model f1,
the candidate models
are ordered by increasing
,
and examined sequentially on grounds of the
statistics (12)-(17). The first candidate is always taken as the true model.
As the number m of candidates increases, the initial
(successful) estimate may be abandoned in favour of a wrong estimate. Let us assume that this happens,
under statistics
and for the realization
,
at the k-th candidate and define
to be one
for m = 1 ... k and zero for m > k. When the simulation is repeated for a large number of
realizations
(using different f1), the quantity
with
gives the probability of a successful identification in a search
over m
-ordered candidates using statistics
.
An example of
is shown in Fig. 4 (left), with different curves referring to different
statistics
.
The binned methods invoked Sturges' rule to determine the number of counts per (equal-size) bin.
As increasingly unlikely candidates become included, the
converge to a constant
values. This holds true for all statistics; however, there is a clear performance
ordering
.
The exact likelihood performs best, followed by the Kolmogorov-Smirnov distance.
While the quantity
directly relates to the simulation procedure, the number m by itself is not
of interest and should be replaced by
in order to obtain a more meaningful characteristic.
This is achieved by replacing the order indicator
by a distance indicator
which switches from unity to zero at the distance of the first erroneously accepted candidate,
and proceeding in the same way as for
.
As a result we obtain the probability
of a successful identification in a
-ordered search among (discrete) candidates with maximal distance
from the true model. The graphs of
are shown in the right pannel of Fig. 4,
referring to the same simulation as Fig. 4 (left).
Like
,
converges at large
to a constant value which depends only on the
statistics used. The performance ordering derived from
is the same as for
.
| |
Figure 4:
The probability of successful identifications versus the number of
|
| Open with DEXTER | |
Aside from the success rates one may ask for the error in parameter- and f-space if the identification
fails. The closer the estimated solution is to the true one, the better the method.
Figure 5 displays the residuals in f- and parameter space for the same
situation as in Fig. 4 (left); the quantities (
)
are
the standard deviations between true and estimated model parameters. They give the typical accuracy of best-fit
parameters, averaged over the whole parameter space. As can be seen from Fig. 5,
the exact likelihood yields the smallest
(top left) and also
the smallest parameter errors (
). The error in the normalization estimate
(Eq. (7)) is simply
and is not shown.
The stabilization of the curves observed in Figs. 4 (right) and 5 (top left)
can be used to define a maximum chi-square distance
between true and estimated models above which
mis-estimation becomes highly unlikely. From Fig. 5 (top left)
we find that
stabilizes around 3, whereas Fig. 4 (right)
indicates that erroneous estimates almost never occur for
.
This behaviour is
generic and holds true for a wide range of model parameters (Fig. 6), suggesting that
the chi square distance is a useful parameter-free measure of distance between two Poisson intensities.
We thus select
![]() |
Figure 5: Average f-space ( top left) and parameter ( other panels) deviations between true and estimated models. The number m of candidate models delineates the tested volume in parameter space. |
| Open with DEXTER | |
| |
Figure 6: Average chi-square distance between true and estimated models as a function of the true model parameters (marginal distributions). |
| Open with DEXTER | |
In a second family of simulations, the restriction to the discrete models of Sect. 2.1 is relaxed
by interpolating the models in
-space. Equation (18) is then used to limit the search for best-fit
candidates. The number m of candidate models thus delineates the density of models in parameter space,
and as m is increased, the best-fit candidates converge to the best-fit solutions.
Figure 7 shows the result of this density exploration.
Note that the
curves converge at large m
to the same values as in Fig. 5, confirming that
and
the maximal m are ample and do not effect the outcome of the simulation.
![]() |
Figure 7: Similar to Fig. 5 but with m proportional to the density of candidate models in parameter space. |
| Open with DEXTER | |
![]() |
Figure 8:
The effect of binning on the performances of (
|
| Open with DEXTER | |
Up to here the binned methods invoked only Sturges' rule (Eq. (9)) for better
comparability. In this Section, the effect of different binning methods and of the bin size is explored.
The case of a single bin must be excluded, since the predicted bin content
is then insensitive to the parameters kT and
.
In order to demonstrate the effect of the bin size we arbitrarily vary the number of bins
(Fig. 8). Black curves refer to equal-content bins, and gray refer to
equal-size bins. The curves represent averages over
and
.
At small
(many counts/bin), the (
)
curves collapse for both
equal-content and equal-size cases. At large
(few counts/bin), the performance of
C monotonically approaches the L limit (black solid line). For large
,
the
and
statistics are not applicable due to low count rates. As a consequence,
their performance does not approach the C and L limit with increasing
.
This is especially pronounced for the equal-sized bins, which generally perform worse than equal-content bins.
From Fig. 8 one may deduce the minimum number of counts
per bin at which the chi square statistics apply: the increase of the (
)
curves at
indicates that at least some 5-20 counts per bin should be present. By narrowing and shifting
the
-cut one may explore the
-dependence of the turnaround, and thereby find that
and
require about 10 and 15 counts per bin, respectively.
![]() |
Figure 9:
Dependence of the estimation errors on the true expected counts |
| Open with DEXTER | |
Complementary to Fig. 8, Fig. 9 shows the effect of
,
averaged
over
.
Again,
is independent of
for demonstration purposes. The bins have equal content.
The f-space errors (top left) are found to be weakly dependent on
except for the
statistics, where the neglect of empty bins has a significant effect at low count rates (
).
The errors
and
scale like
,
which is inherited from the
the normalization error
(not shown).
We dismiss now the ad-hoc choice of
used in Figs. 8 and 9, and return
to the better adapted binning methods of Sect. 3.3. The number of counts per bin is
thus determined from the Sturges' or AMISE rules, or fixed at n* = 8; the bins have either equal
size or equal (observed) content. The result of this simulation family is summarized in Table 1, showing the performance ranking based on the mean chi square distance between
true and estimated models. All simulations involve 105 realizations, and the true
source normalization is either
or
,
while the parameters
(
)
are continuous (interpolated). A background of
has been included for the sake of generality, so that both background-dominated and source-dominated
situations are addressed. From Table 1 it is seen that
the exact likelihood (L) performs best and the Kolmogorov-Smirnov distances (D) performs worst. The
relatively bad performance of D compared to Sect. 5.1 is found to be caused by the
background. Pearson's
generally performs
better than Neyman's
;
Kuiper's V performs better than Kolmogorov's D.
A detailed comparison of pairs of simulations differing only by the count rate reveals
that the chi square distance between true and estimated models is always smaller for
than for
,
which is expected since more counts allow a sharper distinction of
the models. Equal-content bins generally perform better than equal-size bins.
When the area under the operator-receiver curve (AUC; Sect. 4.3) is used as an alternative performance measure
instead of the average chi square distance between true and estimated models, we obtain the results of Table 2.
This is similar to Table 1 but with
indicating that
and
indicating
(like small
,
small
indicates close agreement of
true and estimated models).
The average is over 104 pairs of models covering the (continuous) parameter space (
), with each
pair of models being probed by
realizations of Poisson data.
Although the AUC criterion is quite different in spirit from the chi square distance criterion
(binary classifier versus estimation problems), the performance orderings obtained by the two methods are
surprisingly similar. In particular, the unbinned likelihood (L) performs always best, while the
internal performance ordering of (
)
partially changes.
It was verified that L performs best not only on the average, but also for all model pairs individually.
Table 1:
Performance ordering of the statistics (Eqs. (12)-(17)), based on the average chi square
distance between true and estimated models. The notation
indicates here that
,
and
indicates that
.
The labels "e.s.'' and "e.c.'' refer to "equal size'' and "equal count'' bins, respectively.
.
Table 2:
Similar to Table 1, but using the area under
the receiver-operator curve (Fig. 3) as a measure of performance.
.
Figures 4-9 did not include background (
).
In order to investigate the effect of the background we have repeated the simulations with variable
background ratio
.
It turns out that the presence of a background does not affect superior performance of
the unbinned likelihood L. There are, however, differences in the performance ranking of the other statistics;
in particular, the unbinned D and V statistics are found to be degraded by the background.
We shall not show here the full diagnostic applied to the zero-background case, but restrict ourselves
to an exemplary result.
Figure 10 shows the number of successful identifications versus the
number of models at choice for
,
and is to be compared to Fig. 4. The lead of the L statistic is even more pronounced in the presence of background;
the performance ordering derived from Fig. 10 is
.
| |
Figure 10:
Similar to Fig. 4, but including background (
|
| Open with DEXTER | |
After establishing the superior performance of the unbinned likelihood by Monte-Carlo simulations, we
apply it to actual XMM-Newton observations of the TMC. The data set (Table 3)
comprises 14 objects, and has been selected for low numbers of observed counts (
)
and simplicity of the
spectra so that parameterisation by (
)
is adequate. Also, we have as far as possible avoided cases
where the background is not well known, and where the maximum-likelihood parameters lie on the boundary of
the XSPEC parameter space, which would require a more careful analysis (Protassov et al. 2002).
"Bad'' time intervals with increased background were omitted.
The source models are similar as in Fig. 1 (top) but adapted for the XMM-Newton instrumental
response and background. The background spectrum is estimated according to the procedure of Sect. 2.2, setting
to characterize the approximate XMM-Newton/EPIC resolution. Only the PN detector of the XMM-Newton/EPIC
instrument is used. The abundances, relative to solar
values (Anders & Grevesse 1989), are representative for highly active young stars
with inverse FIP effect (Telleschi et al. 2005, Scelsi et al. 2005, Garcia-Alvarez et al. 2005): He:1.0 - C: 0.45 - N: 0.788
- O: 0.426 - Ne: 0.832 - Mg: 0.263 - Al: 0.50 - Si: 0.309 - S: 0.417 - Ar: 0.55 - Ca: 0.195 - Fe: 0.195 - Ni: 0.195. The total exposure time
used for the analysis is in the order of a few 10 kiloseconds,
and is given in Table 3. The best-fit fluxes are converted into luminosities LX
using XSPEC and assuming a distance of 140 pc to the TMC.
![]() |
Figure 11:
XMM-Newton/EPIC observation of Haro 6-13, comprising 128 counts. Solid
line: maximum-L solution
|
| Open with DEXTER | |
An example observation is shown in Fig. 11, using data of Haro 6-13.
The source extraction region contains 128 counts between 0.3 and 7.5 keV after elimination of
bad time intervals, whereas the corresponding (scaled) background contribution is
counts. The observed counts are marked by ticks in Fig. 11, while the background spectrum
is shown by the dashed line. Energies below 0.3 keV are discarded. The maximum-likelihood
estimator for the source normalization is
counts.
The unbinned log likelihood is computed inside the parameter range
of Fig. 11 (inlet), and attains its maximum at kT = 3.03 keV,
.
These values are marked
by crosses in the inlets of Fig. 11, and the corresponding best-fit model
is shown by solid line (main panel). For comparison, the
corresponding minimum-
parameters, using 10 equal-content bins,
are given by kT = 2.55 keV,
.
The minimum
- and maximum L-estimates thus agree within errors (see below), but are not equal. The parameters
do not follow an equally simple trend, and are more sensitive to the statistics used. The other
best-fit parameters listed in Table 3 have been obtained in a similar
way as for Haro 6-13.
Table 3:
Maximum-L parameters deduced from XMM-Newton observations of
the TMC. Superscripts a) and b) refer to different observations of the same object;
is the exposure time, and
is the observed number of counts
(corresponding to
). Errors outside brackets represent projections of the 68% likelihood
surfaces (Fig. 11 inlets); errors in brackets are obtained from Monte-Carlo simulations
(Sect. 6.3). Luminosities LX refer to
= 0 and the energy band from 0.3 to 10 keV.
A rough estimate on the errors of the best-fit parameters can be obtained from the quantity
,
which is asymptotically chi square distributed with the number of degrees of
freedom equal to the number of model parameters. Here,
is the maximum achievable log
likelihood, which is associated with the full parameter space of Wilks theorem (Wilk, 1938).
By thresholding
at the
-quantiles of a chi square distribution with 2 degrees
of freedom, accounting for the parameters
,
and 1 degree of freedom, accounting for the parameter
,
approximate confidence domains in parameter space are obtained. They are shown by solid contours in
Fig. 11, representing cuts through the likelihood surfaces at confidence levels 68% (boldface),
90%, and 99%. The first errors given in Table 3 (outside brackets) represent projections
of the 68% confidence surfaces obtained in this way. Note that the absolute credibility of the best-fit solution
(i.e. the value of
)
is not addressed by the statistic
,
which
decouples the goodness-of-fit problem from the confidence domain problem.
The parameter errors can also be predicted from Monte-Carlo simulations alone, similar as in an instrumental design
study. To this end, the true and best-fit parameters (under L statistic) of more than 106 samples
are classified according to the best-fit parameters. For each class (containing about 100 samples),
the means (b) and standard deviations (
)
of the differences between true and best-fit parameters are
evaluated, and taken as a proxy for the biases and errors of the best-fit parameters.
The simulation is repeated for each XMM-Newton observation, so that the correct background and instrumental response
are taken into account. The resulting error bounds
are given in parentheses
in Table 3, and are to be compared with the 68% confidence limits from
the likelihood thresholding method. As can be seen, the Monte-Carlo errors are of the same order as
those obtained from the likelihood thresholding, and the biases are generally in consistent direction.
In summary, we have used here two complementary characterizations of parameter errors, based on
the log likelihood ratio
and on Monte-Carlo simulations. The errors derived from the
Monte-Carlo simulations do not invoke the asymptotic assumption underlying Wilks theorem, and are therefore
better adapted to low count rates. But then, they do not involve the actual observation at all,
and rely entirely on the assumption that the true spectrum is correctly modeled by the template spectra
and the background model. Any systematic error contribution is thus neglected. Therefore, the errors from the
Monte-Carlo simulation tend to under-estimate the true errors and should be considered as lower bounds. As
a further test, we have checked whether the minimum-
estimates are within the errors of
the maximum-L solution, and found that this is so for all cases
when
is used, and holds true in half of the cases when the Monte-Carlo error bounds are invoked (both methods
referring to 68% confidence level).
We shall not pursue here a deeper discussion of confidence regions, but terminate with a hint to the literature
where more refined constructions may be found in Eadie et al. (1971), Wachter et al. (1979),
Cousins (1995), Porter (1996), Feldman & Cousins (1998), and Giunti (1999).
![]() |
Figure 12:
Comparison of the unbinned estimates with results from
binned one-temperature XSPEC fits. For kT and |
| Open with DEXTER | |
![]() |
Figure 13: Exploration of the effect of energy band and binning on the HO Tau parameter estimates. Left column: energies from 0.3 to 2 keV; right column: energies from 0.3 to 4 keV. Top row: unbinned likelihood. Middle row: Neyman's chi square with 5 bins. Bottom row: Neyman's chi square with 4 bins. The binned data are marked by dotted line. |
| Open with DEXTER | |
In order to compare the unbinned estimates with the standard XSPEC results, the XSPEC iterative
fitting package was applied to background-subtracted observations with more than 50 counts, using bins of (at least)
10 counts and default (
)
statistics. The spectral model is identical to the one used for unbinned estimation
(single temperature,
,
abundances). The results are summarized in Fig. 12,
displaying unbinned versus binned results. Numerical values for the binned results are tabulated Güdel et al. (2007a).
The crosses are centered at the best-fit solutions and
cover formal 1-
errors. For LX, which is a derived quantity, only the best-fit parameters are indicated.
As can be seen, the agreement between binned and unbinned estimators is generally
good, except for HO Tau and GN Tau.
In order to clarify the situation we have created plots of HO Tau similar to Fig. 11 for both
the L and
statistics, and have systematically varied the energy band and binning.
The binning procedure provides (approximately) equal number of observed counts per bin, running from low to high energies.
Some exemplary results are shown in Fig. 13. The left column includes energies from 0.3 to 2 keV,
and the right column includes energies from 0.3 to 4 keV. The top row refers to the unbinned likelihood,
as quoted in Table 3 and Fig. 12.
The middle and bottom rows refer to Neyman's chi square statistic with 5 and 4 bins, respectively; containing
(10,10,11,9,11), (11,11,11,11,11), (13,13,12,13), and (14,14,13,14) counts (fixed-n* method). The binned observed spectra are
indicated by the dotted crosses, with errors representing
counts. As can be seen,
the unbinned likelihood has two shallow local maxima: a "cold'' one at
keV and
cm-2,
and a "warm'' one at
keV and
.
While the (cold) maximum-L solution and the L-profiles are stable under change
of the energy range, the 68%
-profiles (middle) undergo a transition from a single
to two regions, and the minimum-
solution flips from cold to warm as the number of bins is
decreased (middle to bottom). This instability is mostly caused by the accumulation of counts around 0.8 keV,
which either fall into a single (middle right) or two (bottom) bins. The 90%
domain
is approximately stable under re-binning, and contains the XSPEC solution (kT = 0.38 keV,
).
Based in Fig. 13 and the instability of "warm'' solutions under re-binning, one may conclude that the HO Tau data favour "cold'' solutions, all the more so as the "warm'' solutions lie on the border of the parameter space (Protassov et al. 2002). However, from an astrophysical point of view it is not obvious how such a cold and absorbed spectrum would arise. A possible explanation is discussed in Sect. 7. We shall therefore argue that no definitive conclusion is possible for HO Tau yet, and that further observations are needed.
For the second discrepant case, GN Tau, the binned estimates yield smaller kT and larger
than
the unbinned ones (the binned estimates lie in the unbinned 90% domain but outside the unbinned 68% domain). This can be traced back to 12 observed counts below 1 keV; if these are excluded then the binned
and unbinned best-fit parameters agree within 20%. We propose the following interpretation.
The presence of counts below 1 keV implies low absorption (
), provided that
they cannot be explained by the background. The unbinned background model has, at E < 1 keV, fine-structures
which do not coincide with the observed counts; hence the latter are attributed by the unbinned
method to the source, entailing low
and (Fig. 1 inlet) large kT. The binned
method, in contrast, assigns more [namely,
]
counts to the background; hence,
is
larger and kT is smaller. The discrepancy of GN Tau stems thus from background fine structures at
energies below 1 keV.
We have used Monte-Carlo simulations to assess the performance of the unbinned (exact) Poisson likelihood (L),
binned Poisson likelihood (C), Pearson's
and Neyman's
,
Kolmogorov-Smirnov
(D), and Kuiper's (V) statistics in the model classification and point estimation problems of low-count
XMM-Newton and Chandra/ACIS spectra parameterized by temperature (kT), hydrogen density (
)
and normalization (
). By "unbinned'' we mean that no grouping of instrumental energy channels is involved, so that the full readout
resolution is used. The L statistic equals the C statistic on the finest possible (instrumental) binning.
In all cases, the source normalization
was taken from its maximum-likelihood estimator, while the
shape parameters (
)
were taken such as to optimize the different statistics.
The outcome of the simulations can be summarized as follows:
The L statistics has been applied to XEST data, and demonstrated to operate
under real observational conditions. The major selection criteria
were spectral simplicity (so that the 3-parameter model applies), low count rate, and comparably
well-known background. The parameter space was rigorously explored to avoid potential difficulties with
iterative optimization. The maximum-L results usually agree with the minimum-
and minimum-
values within 68% confidence, but are not equal. There are, though, two cases (out of 14)
where the unbinned and binned estimates disagree: HO Tau and GN Tau.
Such disagreement, by itself, is a helpful indicator for potential problems in the forward modeling.
In particular, HO Tau is found by the L method to be rather cool (
keV).
Such a cool X-ray-emitting plasma would be unusual for a T Tauri star, the
majority of which have average temperatures of >10 MK. In the present survey
dominant plasma with similarly low temperature is only found in the low
temperature component of the spectrum of the jet-driving sources DG Tau A, GV Tau A, and DP Tau,
which may be due to shocks in the jet (Güdel et al. 2005; Güdel et al. 2007b). However, DG Tau A also has
significant plasma at very high temperatures (
50 MK) and HO Tau has
no known jet. A dominant low-temperature corona has been found on the
nearest classical T Tauri star, TW Hya, which is thus far unique in this
respect. Apparent high densities in this cool plasma suggest that
collisionally-ionized shocks in columns of material accreting onto the
star may generate the X-ray emission (Kastner et al. 2002; Stelzer &
Schmitt 2004). TW Hya is an unusually old classical T Tauri star,
approximately 10 Myr old. Using the Siess et al. (2000) stellar evolution
models, HO Tau appears also to be a relatively old classical T Tauri star,
at 8-9 Myr. Its mass accretion rate of
yr-1 (White & Ghez 2001) is several times
higher than that of TW Hya. Therefore it is plausible that HO Tau may be
analogous to TW Hya, in which case the bulk of its X-ray emission may be
generated by accretion shocks.
On the other hand, we can check whether the derived
agrees with expectations
from measured optical and near-infrared extinctions AV and AJ. For general
interstellar matter, the conversions are
cm-2and
cm-2 (see Vuong et al. 2002 and
references therein). The optical extinction for HO Tau is
AV = 1.11-1.13 mag
(Kenyon & Hartmann 1995; White & Ghez 2001), and the near-IR
extinction is
AJ = 0.32-0.46 mag (Kenyon & Hartmann 1995; Briceño et al.
2002), thus implying
cm-2 under the assumption
of standard gas-to-dust ratios. These estimates are in better agreement with the
"hot'' solution (Fig. 15) although the error ranges are large. We also note
that significant deviations from the standard interstellar relation between extinction
and X-ray absorption are well known (e.g., in the jet-driving stars mentioned
above). Such deviations may point to an "anomalous'' gas-to-dust ratio for example
as a consequence of dust evaporation.
We also note that both the binned and unbinned methods provide low kT and high
.
The discrepancy is in LX (see Fig. 12) which is a consequence of large uncertainties in
the spectral integration because most of the soft emission from the cool plasma is
strongly absorbed. Indeed, if the temperature estimate is changed from 0.14 keV (unbinned) to 0.28 keV
(binned) then LX drops by about a factor ten. The LX of the unbinned method is indeed rather high given
a stellar bolometric luminosity of this system of
erg s-1.
Adopting the usual X-ray saturation level of
LX/L* = 10-3 (e.g., Vilhu & Rucinski 1983),
we would expect a maximum
erg s-1, more in line
with the binned solution.
The low count-rate of HO Tau unfortunately precludes a high-resolution
X-ray spectroscopic study to study density-sensitive triplet emission-lines
from He-like ions of O, Ne and Mg, which suggest high densities in the TW Hya plasma.
As for GN Tau,
AJ = 1.17 mag (Luhman 2004), hence we expect
cm-2,
very close to the result from the unbinned method.
Although the Monte Carlo simulations suggest that the maximum-L estimates are closer to
the true values when averaged over a (hypothetical) ensemble of similar observations, a
few words of caution are in order. The present simulations have assumed that
the true spectrum is contained in the set of candidate spectra. If this is not the case,
or if the Poisson process assumption is violated, then the use of the most binding L statistics also
introduces the most severe misinterpretations. A major cause of flawed models stems from the background
estimation. We have assumed here that this estimation is perfect, and have not considered background
errors (e.g., Conrad et al. 2003), nor the more complicated problem of estimating the
background spectrum jointly with the source spectrum. An imprecise background model may be
the cause for the discrepancy between binned and unbinned estimates found in GN Tau, where
the most relevant low-energy background (0.3 to 1 keV) contains 200 counts only, implying about 30% statistical
background uncertainty at resolution
keV.
Furthermore, we have focused here on the choice of likelihood
functions for the point estimation and model classification problems. Accordingly, our
treatment of parameterization issues and confidence regions was rather crude. In particular,
we did not attempt to introduce any (Bayesian) a priori information other than implicit in
the choice of the forward models.
Finally, it should be pointed out that the unbinned approach is in principle not restricted to the one-dimensional spectrum problem considered in this article, and that the energy tags might be replaced by tuples of photon energy, arrival time, and position on the detector. However, the corresponding forward models would involve many more degrees of freedom, to the point where a simple maximum-likelihood principle might no longer be sufficient to uniquely determine the solution. Additional information (such as a Bayesian prior) would then be required in order to regularize the problem. Also, the more elaborated forward models would be more vulnerable to systematic errors (i.e., from CCD boundaries). In view of the (sparse) data we shall not pursue these issues.
Acknowledgements
We would like to thank the International Space Science Institut (ISSI) in Bern, Switzerland, for logistic and financial support during several workshops on the TMC campaign. This research is based on observations obtained with XMM-Newton, an ESA science mission with instruments and contributions directly funded by ESA member states and the USA (NASA). A.T. and M.G. acknowledge support from the Swiss National Science Foundation under grant Nr. 20-66875. L.S. and E.F. acknowledge financial contribution from contract ASI-INAF I/023/05/0.
This Appendix describes the inversion method (Lewis & Shedler 1979;
Devroye 1986), one of the two methods used here to simulate event lists. The inversion method,
in the present implementation, relies on Theorem 1.4 (Chapter 6) of Devroye (1986) stating
that if
0< X1 < X2 < ... is a homogeneous Poisson process with unit rate function
and
is a non-decreasing function with
= 0 then
is a non-homogeneous Poisson process with
cumulative rate function
.
This follows from the fact that if F is a continuous
(cumulative) distribution with inverse F-1 and U is uniformly distributed in the unit interval [0,1],
then F-1(U) has (cumulative) distribution F. Our numerical
implementation makes use of the monotony of
and the fact that
Xi+1-Xi is
exponentially distributed, which allows a successive computation
of
.
It proceeds as follows. Let a rate function
be specified by a
sufficiently resolved discrete version
with j ranging from 0 to
,
and
define the discrete cumulative rate function by
,
,
with
and
.
Then, an non-decreasing bin number sequence
of a non-homogeneous Poisson process
with intensity
is obtained by the following pseudocode,
while (
)
and (
)
do
j = j + 1
end do
nk = j
![]()
end