A&A 365, 660-680 (2001)
DOI: 10.1051/0004-6361:20000474

Object classification in astronomical multi-color surveys

C. Wolf - K. Meisenheimer - H.-J. Röser

Send offprint requests to: C. Wolf

Max-Planck-Institut für Astronomie, Königstuhl 17, 69117 Heidelberg, Germany

Received 4 April 2000 / Accepted 26 July 2000

We present a photometric method for identifying stars, galaxies and quasars in multi-color surveys, which uses a library of color templates for comparison with observed objects. The method aims to extract the information content of object colors in a statistically correct way, and performs a classification as well as a redshift estimation for galaxies and quasars in a unified approach based on the same probability density functions. For the redshift estimation, we employ an advanced version of the Minimum Error Variance estimator, which determines the redshift error from the redshift-dependent probability density function itself. The method was originally developed for the Calar Alto Deep Imaging Survey (CADIS), but is now used in a wide variety of survey projects. We checked its performance by spectroscopy of CADIS objects, where the method provides high reliability (6 errors among 151 objects with R<24), especially for the quasar selection, and redshifts accurate to within $\sigma_z \approx 0.03$ for galaxies and $\sigma_z \approx 0.1$ for quasars. For the optimization of future survey efforts, we compare a few model surveys which are designed to use the same total amount of telescope time but different sets of broad-band and medium-band filters. Their performance is investigated by Monte Carlo simulations as well as by analytic evaluation in terms of classification and redshift estimation. If photon noise were the only error source, broad-band surveys and medium-band surveys should perform equally well, as long as they provide the same spectral coverage. In practice, medium-band surveys show superior performance due to their higher tolerance for calibration errors and cosmic variance. Finally, we discuss the relevance of color calibration and derive important conclusions for the issues of library design and choice of filters. The calibration accuracy poses strong constraints on an accurate classification; these constraints are most critical for surveys with few, broad and deeply exposed filters, and less severe for surveys with many, narrow and less deep filters.

Key words: methods: data analysis - methods: statistical - techniques: photometric - surveys

Author for correspondence: cwolf@mpia-hd.mpg.de

1 Introduction

Sky surveys are designed to provide statistical samples of astronomical objects, aiming for spatial overview, completeness and homogeneous datasets. Mostly they serve as a database for rather general conclusions about abundant objects, but another attractive role is enabling searches for rare and unusual objects. For both purposes, it is very useful to predict rather precisely the appearance of the different known types of objects. The object types can then be discriminated successfully, allowing the information content of the survey to be extracted. Also, unusual objects can be found through their inconsistency with all known sorts of objects, although they might as well hide among the bulk of normal objects by mimicking their appearance.

In this picture, we of course want a survey to perform as reliably and as accurately as possible in measuring object characteristics like class, redshift or physical parameters. Since surveys typically aim for large samples upon which future detailed work is based, their results are often not extremely reliable and accurate for a given single object. But for a statistical analysis of large samples, we can usually do without perfect accuracy in the measurement of features, and we can also accept occasional misclassifications.

In astronomical surveys pointing off the galactic plane, obvious classes to start out with could basically be stars, galaxies, quasars and strange objects. These can be further differentiated into subclasses, based on physical characteristics derived from their morphology or spectral energy distribution (SED). Therefore, morphology and color or prominent spectral features are the typical observational criteria applied to survey data for classifying the objects contained.

Presently, surveys concentrate mostly on either imaging or spectroscopy. While spectroscopic surveys deliver a potentially high spectral resolution, they have expensive requirements for telescope time. Imaging multi-color surveys can expose a number of filters consecutively, and deliver morphological information and crude spectral information for all objects contained in the field of view.

Since the subject of this paper is the spectral information in multi-color surveys, we mention morphological information only briefly. Morphology is of limited use for classifying objects into stars, galaxies and quasars: objects observed as clearly extended are certainly not single stars, but the smaller ones could either be galaxies, low-luminosity quasars, or chance projections of more than one object. Objects consistent with point sources can be stars, compact galaxies or quasars. Also, the morphological differentiation depends on the seeing conditions and typically does not reach the survey limits set by the photometry.

The power of spectral classification in a multi-color survey depends both on the filter set used and the depth of the imaging, where the optimum choices are determined by the goal of the survey. If a survey aims at identifying only one type of object with characteristic colors, a tailored filter set can be designed. E.g., when looking exclusively for U-band dropouts (Steidel et al. 1995), the UGR filter set is certainly a very good choice. The performance of such a dropout survey depends mostly on the depth reached in the U-band, so the photon flux detection limit in U is the key figure. Also, number count studies are limited by the completeness limit in the filter of concern. Quasar search is very often done with color excess rules (Hazard 1990), where the limit is given by the flux errors combined from two or three filters. E.g., the evolution of quasars between redshift 0 and 2.2 was established using the UV excess method (Schmidt & Green 1983; Boyle et al. 1988). At higher redshift quasars display rather star-like broad-band colors, motivating more advanced approaches like the selection of outliers in an n-dimensional color space (Warren et al. 1991).

If we now intend to focus different survey programs on a common patch of sky to maximise synergy effects from the various efforts, then we might as well combine the individual surveys into one that identifies every object, and avoid duplicated work. Then we have to ask for a filter set which enables identifying virtually every object above some magnitude limit unambiguously. In this case, the key number for the performance is the magnitude limit for a successful classification as needed for various science applications. If the classification takes all available color data into account, as template fitting procedures do, then the flux limit of a single filter is not the only relevant number, since the performance will depend to a large extent on the filter choice. This applies also to the estimation of multi-color redshifts, an idea dating back to Baum (1962), who used nine-band photoelectric data to estimate the redshifts of galaxy clusters.

Most multi-color surveys conducted to date obtained spectral information via broad-band photometry. They have been used, e.g., to search for quasars or for high-redshift galaxies. However, they always needed follow-up spectroscopy to clarify the true nature of the candidates and to measure their redshifts. The Sloan Digital Sky Survey (York et al. 2000) is now the most ambitious project to provide a broad-band color database, on which the astronomical community might perform a large number of "virtual surveys''.

So far, only very few survey projects make extensive use of medium-band and narrow-band photometry, e.g. the Calar Alto Deep Imaging Survey (Meisenheimer et al. 1998). Surveys like CADIS, with typically 10 to 20 filters, sample the visual spectrum with a resolution comparable to that of low-resolution imaging spectroscopy. CADIS fostered the development of a scheme for spectral classification that distinguishes stars, galaxies, quasars and strange objects. Simultaneously, it assigns multi-color redshifts to extragalactic objects.

Using 162 spectroscopic identifications, Wolf et al. (2001, henceforth Paper II) have shown that it is reliable for virtually all objects above the 10-$\sigma $ limits of the CADIS survey. Also, the photometric redshifts are accurate enough ( $\sigma_z \approx 0.03$ for galaxies and $\sigma_z \approx 0.1$ for quasars around the 10-$\sigma $ limit) that follow-up spectroscopy is not needed for a number of analyses, e.g. the derivation of galaxy luminosity functions (Fried et al. 2000).

Although this algorithm was developed for CADIS, it is now used for classification in additional projects. It provides multi-color redshifts in lensing studies of the cluster Abell 1689 (Dye et al. 2000), aiming at determining the cluster mass after identifying cluster members and weakly lensed background objects. It is also employed in an ongoing wide-field survey to search for high-redshift quasars, to provide multi-color redshifts for galaxy-galaxy lensing studies, to search for high-redshift galaxy clusters and to perform a census of L* galaxies at $z \approx 1$ (Wolf et al. 2000).

The purpose of this paper is to present our classification scheme and to discuss its optimal use in survey strategies. The statistical algorithm for the scheme is presented in Sect. 2, and our choice for the template libraries is detailed in Sect. 3. In Sect. 4 we report on simulations of a few competing filter sets and their expected classification performance. We include an analytic discussion of the comparison of filter sets and conclude that medium-band surveys are altogether more powerful, even when limited by available telescope time. Section 5 outlines a few real datasets using this classification and draws conclusions about the expected performance. Paper II presents real CADIS data, based on which we gained experience during the development of the scheme, and shows that the conclusions from the simulations compare well with the real dataset.

2 The classification algorithm

2.1 General remarks on classification

Generally speaking, classification is a process of pattern recognition which usually has to deal with noisy data. Mathematically, a classifier is a function that maps a feature vector of measured object characteristics onto a discriminant vector containing the object's likelihoods of belonging to the different available classes. Any classification relies on the feature space being chosen such that different classes cover different volumes and overlap as little as possible to avoid ambiguities.

If a survey is designed without class definitions in mind, it will be difficult to choose a set of measurable features for a tailored classification. Also, only unsupervised classifiers (i.e., working without knowledge input) can be used on the measured object lists. In this case, a classifier can find distinguishable classes, e.g. by cluster analysis. This process leads to a definition of new class terms which depends strongly on the visible features taken into account.

For any classification problem, it is of great advantage if class terms are defined a priori and encyclopedic knowledge is available about measurable features and their typical values. Then models of the classes representing this knowledge can be constructed to serve as an essential input to a supervised classifier (i.e., one using input knowledge as a guide). When selecting the features, two potential problems should be avoided: one is the use of well-known but hardly discriminating features, which will obviously not improve the classification but just increase the effort. The other is using features which are not well known and can therefore easily cause mistakes in the classification. Especially with high measurement accuracy, this can lead to apparent unclassifiability when an object looks different than expected.

Two different types of class models can be distinguished depending on the uniqueness of the classification answer:

In one type of model, geometric rules are used to delimit the sectors of feature space covered by the competing classes. These models assign just one class to the measurement, uniquely and definitely: the one containing the feature vector within its geometric limits. Effectively, the discriminant vector does not contain likelihood values in a statistical sense but instead a single entry "1'' for the class decided on and zeros for the other classes (while nearest-neighbor classifications can define rather complicated boundary shapes in feature space, they also belong to this type);

The other type comprises statistical class models rendered as likelihood functions which are defined across the entire feature space. Only these provide discriminant vectors with relative likelihoods of class membership for an object, thereby following a "fuzzy logic'' approach.

While classes are discrete entities, a statistical classification can also work on continuous parameters. The discriminant vector then becomes a likelihood function of the parameter value. Based on this distinction, classification problems can be considered as decision problems for discrete variables and estimation problems for continuous variables (Melsa & Cohen 1978a). In either case, a definite statistical classification contains two consecutive steps: first, the discriminant vector is determined (see Sect. 2.2), and second, it is mapped either by decision to a final class or to a parameter estimate (see Sect. 2.3).

2.2 Step 1: Determining discriminant vectors

We assume an object with m features being measured by any device, thus displaying the feature vector $\vec q = (q_1, \ldots, q_m)$. We consider n classes $c_1, \ldots, c_n$ as a possible nominal interpretation and denote the likelihood of this object belonging to the class $c_i$ as $p(c_i\vert\vec q)$. A true member of class $c_i$ has an a priori probability of displaying the features $\vec q$ given by $p(\vec q\vert c_i)$.

Initially, we assume a simple case of uniquely defined class models, where all members of a single class ci have the same intrinsic features $\vec q_{c_i}$, so that any spread in measured $\vec q$ values arises solely from measurement errors. Assuming a Gaussian error distribution for every single feature, it follows (Melsa & Cohen 1978b), that

$\displaystyle p(\vec q \vert c_i) = C \exp \left(-\frac{1}{2} (\vec q-\vec q_{c_i}) V^{-1} (\vec q-\vec q_{c_i})^t \right) ,$     (1)

where $(\vec q-\vec q_{c_i})$ is the measurement error in case the object does belong to $c_i$ and $(\vec q-\vec q_{c_i})^t$ is its transpose. Each feature $q_k$ is measured with its own error variance $\sigma^2_k$; these variances are the diagonal elements of the variance-covariance matrix V. If all the features are statistically independent, the off-diagonal elements vanish. The normalisation factor C is

$\displaystyle C = \frac{1}{\sqrt{(2 \pi)^m \vert V\vert}}\cdot$     (2)

As contained in the discriminant vector, the likelihood for an object observed with $\vec q$ to belong to class ci is then

$\displaystyle p(c_i \vert \vec q) = p(\vec q \vert c_i) / \sum^n_{l=1} p(\vec q \vert c_l) .$     (3)

However, in realistic cases the classes themselves are extended in feature space and their volumes might have rather complicated shapes. In the spirit of Parzen's kernel estimator (Parzen 1963), the extended class $c_i$ can be represented by a dense cloud of individual, uniquely defined (point-like) members $c_{ij}$. Every member accounts for some a priori probability of displaying $\vec q$, given as $p(\vec q\vert c_{ij})$, just as if it were a "class'' on its own. The complete class $c_i$ is now rendered as a superposition of its $N_i$ members and adds up to a total probability of

$\displaystyle p(\vec q \vert c_i) = \frac{1}{N_i} \sum_j p(\vec q \vert c_{ij}) .$     (4)

In an estimation problem the probability functions have the same form, except for changes in notation: $\theta$ denotes the parameter to be estimated, and ideally the class model $c_i$ would have a continuous shape covering the range of expected values. The discriminant vector would then be a function $p(\theta\vert\vec q)$. Again, the class model can be approximated by a discrete set of members sampling the $\theta$ range of interest at sufficient density.

The astronomical application discussed in this paper poses a mixture of decision and estimation problems which can be treated simultaneously with a unified approach: the decision may choose from the three classes $c_1$ = stars, $c_2$ = galaxies and $c_3$ = quasars, while an estimation process takes care of the parameters redshift and spectral energy distribution (SED). The internal structure of every class $c_i$ is then spanned by its individual parameter set $\vec \theta_i = \{\theta_{i\vec j}\}$, either following a grid design or being unsorted if no parameter structure is needed.

If one chooses to approximate the spatial extension of a class by a dense grid sampling discrete parameter values, two problems are solved at once: on the one hand, an internal structure is present for estimating parameters, and on the other hand, the class is well represented for calculating its total probability $p(\vec q\vert c_i)$. Altogether, the probability function with internal parameters $\theta_{i\vec j}$ represented by class members $c_{i\vec j}$ is then

$\displaystyle p(\vec q \vert \theta_{i \vec j}) = C \exp \left(-\frac{1}{2} (\vec q-\vec q(\theta_{i \vec j})) V^{-1} (\vec q-\vec q(\theta_{i \vec j}))^t \right)$     (5)

with the total probability for class ci being

$\displaystyle p(\vec q \vert c_i) = \frac{1}{N_i} \sum_{\vec j} p(\vec q \vert \theta_{i\vec j}) ,$     (6)

and the equation for the class likelihood function still being

$\displaystyle p(c_i \vert \vec q) = p(\vec q \vert c_i) / \sum^n_{l=1} p(\vec q \vert c_l) .$     (7)

Based on these probability functions the classification can perform a decision between object classes and estimations of redshift and other object parameters at once. Two different analyses are integrated into one paradigm and calculated efficiently by evaluating the same probability density function.
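For illustration, Eqs. (5)-(7) can be written down directly in code. This is a sketch with illustrative names and toy data, not part of the survey software; it evaluates the Gaussian likelihood of a single library member, averages over a class, and normalises over all classes:

```python
import numpy as np

def member_likelihood(q, q_model, V):
    """Eq. (5): Gaussian likelihood of measuring colors q for one
    class member with model colors q_model and covariance matrix V."""
    m = len(q)
    d = q - q_model
    norm = 1.0 / np.sqrt((2 * np.pi) ** m * np.linalg.det(V))
    return norm * np.exp(-0.5 * d @ np.linalg.inv(V) @ d)

def class_likelihood(q, members, V):
    """Eq. (6): average over the N_i members sampling class c_i."""
    return np.mean([member_likelihood(q, mem, V) for mem in members])

def discriminant_vector(q, classes, V):
    """Eq. (7): normalised class membership likelihoods p(c_i | q)."""
    p = np.array([class_likelihood(q, members, V) for members in classes])
    return p / p.sum()
```

With two well-separated toy classes in a two-color space, an object near the first class receives essentially all of the probability, and the discriminant vector sums to one by construction.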

2.3 Step 2: Decision and estimation

Decision rules are functions mapping a discriminant vector $\vec p(c_i\vert\vec q)$ to a decision value d. The value $d_i$ denotes a decision in favor of class $c_i$, i.e. the object displaying features $\vec q$ is then assumed to belong to this class. The simplest decision rule is the maximum likelihood (ML) scheme, which decides for the class with the highest likelihood p. In the case of two classes this means

if $p(c_1 \vert \vec q) > p(c_2 \vert \vec q)$, then $d_1$; if $p(c_1 \vert \vec q) < p(c_2 \vert \vec q)$, then $d_2$.     (8)

A more compact notation for the same rule is

$\displaystyle p(c_1 \vert \vec q) \;\overset{d_1}{\underset{d_2}{\gtrless}}\; p(c_2 \vert \vec q) .$     (9)

Depending on the purpose of the classification, tailored improvements can be made to this rule. The probability of error (PoE) method, e.g., attempts to minimize the rate of misclassifications by including the a priori probability for observing a member of a given class. Following Bayes' theorem, these "priors'', denoted P(c1) and P(c2), are just the relative abundances of the classes in the whole sample. The PoE decision rule is then

$\displaystyle p(c_1 \vert \vec q) P(c_1) \;\overset{d_1}{\underset{d_2}{\gtrless}}\; p(c_2 \vert \vec q) P(c_2) ,$     (10)

which causes somewhat ambiguous objects to be preferentially classified as belonging to the more common class. Rare objects are then less likely to be found at all, but the overall performance of the classifier improves. A general approach uses any type of priors for trimming the classification towards specific goals; every decision rule then compares the likelihood ratio $\Lambda$ with a threshold T and follows the form (with T=1 for the ML decision)

$\displaystyle \Lambda (\vec q) = \frac{p(c_1 \vert \vec q)}{p(c_2 \vert \vec q)} \;\overset{d_1}{\underset{d_2}{\gtrless}}\; T .$     (11)
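The family of decision rules above reduces to a one-line comparison. The following sketch (names are our own; the tie at $\Lambda = T$ is resolved arbitrarily in favor of $d_2$) shows how priors turn the ML rule into the PoE rule:

```python
def decide(p1, p2, prior1=1.0, prior2=1.0, T=1.0):
    """Generalised decision rule of Eq. (11): compare the
    (prior-weighted) likelihood ratio Lambda with a threshold T.
    Equal priors and T=1 reproduce the maximum-likelihood rule of
    Eqs. (8)/(9); class abundances as priors give the
    probability-of-error rule of Eq. (10)."""
    ratio = (p1 * prior1) / (p2 * prior2)
    return "d1" if ratio > T else "d2"
```

Note how strongly unequal priors can overturn a mild likelihood preference: an object slightly favoring the rare class is still assigned to the common one, exactly the behavior described in the text.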

Estimation rules are functions mapping a discriminant vector $\vec p(\theta\vert\vec q)$ to an estimated value $\tilde\theta$. The simplest estimation rule is again the maximum likelihood (ML) rule, which chooses the parameter value with the highest likelihood p, i.e., the ML estimator is given by

$\displaystyle p(\tilde\theta_{{\rm ML}} \vert \vec q) \geq p(\theta \vert \vec q) \qquad \forall \theta .$     (12)

The Bayesian approach can also be applied to continuous variables, where one special case is of particular interest: if the error distribution of the feature measurement is Gaussian, and if the goal is to minimize the variance of the true estimation error, then the optimum estimation rule can be derived analytically (Melsa & Cohen 1978b). This minimum error variance (MEV) estimator is given by

$\displaystyle \tilde\theta_{{\rm MEV}} = \frac{\int \theta\, p(\theta \vert \vec q)\, P(\theta) \,{\rm d}\theta}{\int p(\theta \vert \vec q)\, P(\theta) \,{\rm d}\theta} ,$     (13)

and it is equivalent to interpreting the discriminant vector as a statistical ensemble and determining the mean of the distribution. It is also dubbed mean square estimator or conditional mean estimator. Note that, if $p(\theta\vert\vec q)$ is symmetric in $\theta$ and unimodal, the MEV estimator is identical to the ML estimator.
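On a discrete parameter grid, Eq. (13) becomes a weighted mean. The sketch below (illustrative names, not survey code) also demonstrates the remark that for a symmetric, unimodal likelihood the MEV and ML estimators coincide:

```python
import numpy as np

def mev_estimate(theta, p, prior=None):
    """Eq. (13) evaluated on a discrete grid: the MEV estimate is the
    (prior-weighted) mean of the likelihood function p(theta | q)."""
    theta = np.asarray(theta, dtype=float)
    w = np.asarray(p, dtype=float)
    if prior is not None:
        w = w * np.asarray(prior, dtype=float)
    return np.sum(theta * w) / np.sum(w)
```

For a symmetric, single-peaked likelihood on the grid, the weighted mean lands exactly on the peak, i.e. on the ML value.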

2.4 Application to astronomical multi-color surveys

Deep extragalactic surveys usually contain mostly galaxies, fewer stars and a tiny fraction of quasars, with relative numbers on the order of 100:10:1. A survey at high galactic latitude with a limiting magnitude of R=23 and an area of 1 square degree, e.g., should contain roughly 30000 galaxies (Metcalfe et al. 1995), some 3000 to 6000 stars (Bahcall & Soneira 1981; Phleps et al. 2000), and about 400 quasars including Seyfert-1 galaxies (Hartwick & Schade 1990). Any classification would ideally be capable of distinguishing all three classes of objects. Only in surveys which do not care about the rare quasars could their class be dropped, so that the classification needs to separate only stars from galaxies.

In addition to the class itself, plenty of physical parameters could potentially be recovered from an object's photometric spectrum. Most importantly, we would like to determine redshift estimates for galaxies and quasars. In addition, the spectral energy distribution of galaxies contains information about their star formation rate and the age of their stellar populations. A photometric spectrum of sufficiently high spectral resolution can even allow an estimate of the intensity of emission lines. Finally, the spectra of stars mostly reveal their effective temperature, but also their metallicity and their surface gravity.

The literature provides abundant knowledge of spectral properties for all three object classes. Synthetic photometry can use published spectra together with efficiency curves of the survey filter set in order to obtain predicted colors of objects. Sometimes, model assumptions are needed to fill in data gaps present in the literature, either gaps on the spectral wavelength axis or gaps in physical parameter ranges, e.g. star-formation rate. Eventually, systematic multi-color class models can be calculated from published libraries covering various physical parameters. These can serve for later comparison with observed data. Therefore, we decided to build a statistical classification based on published spectral libraries and a limited number of model assumptions (see Sect. 3).

In a multi-color survey, the dominant information gathered is the set of object fluxes in the different filters. We decided to use the color indices as input to the classification rather than the fluxes themselves, which eliminates one dimension from the problem by removing the need for a flux normalisation, which otherwise remains as an additional fit parameter in template fitting procedures. It will be shown in Sect. 2.5 that the color-based approach is equivalent to the flux-based one under certain constraints.

Morphological information is typically also available to some extent and can be included in the classification based on the assumption that only galaxies are capable of showing spatial extent. But this should be done carefully, since luminous host galaxies can render quasars extended. Also, if the image quality varies across the observed field, the morphological analysis is of limited use for sources that are not clearly extended.

We define the color qg-h as a magnitude difference between the flux measurements in two filters Fg and Fh:

$\displaystyle q_{g-h} = m_g-m_h = -2.5\log\frac{F_g}{F_h}\cdot$     (14)

Obviously, the color system depends on the filter set chosen and also on the flux normalisation used. As long as the flux errors are relatively small, the linear approximation of the logarithm can be used to express magnitude errors as $\sigma_{m_i} \approx \sigma_{F_i} / F_i$, so that the error of the color is

$\displaystyle \sigma_{q_{g-h}} = \sqrt{\sigma^2_{m_g} + \sigma^2_{m_h}} \approx \sqrt{(\sigma_{F_g} / F_g)^2 + (\sigma_{F_h} / F_h)^2} .$     (15)
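Eqs. (14) and (15) translate directly into a small helper; this is an illustrative sketch (names are our own) using the same small-error approximation of the logarithm as the text:

```python
import numpy as np

def color_and_error(Fg, sigma_Fg, Fh, sigma_Fh):
    """Eq. (14): color index q_{g-h} from two flux measurements;
    Eq. (15): its error in the linear (small-error) approximation
    of the logarithm."""
    q = -2.5 * np.log10(Fg / Fh)
    sigma_q = np.hypot(sigma_Fg / Fg, sigma_Fh / Fh)
    return q, sigma_q
```

For two equal fluxes, each detected at 10 sigma, the color is zero with an error of $\sqrt{2} \times 0.1 \approx 0.14$, consistent with the quadrature sum in Eq. (15).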

Since the likelihoods determined for the classification depend sensitively on the colors $\vec q$ and their errors $\vec \sigma_q$, both values must be carefully calibrated. If any color offset is present between measurement and model, the classification will go wrong systematically. If errors are underestimated, the likelihood function could focus on a wrong interpretation, rather than including the full range of likely ones. Overestimated errors will obviously diffuse the likelihoods and give away focus which is originally present in the data. The approximation of errors as presented will only work well with flux detections of at least 5$\sigma $ to 10$\sigma $, but at lower levels the classification is likely to fail anyway, so we ignore this concern.

Given $\vec q$ and $\vec \sigma_q$, a measured object is represented by a Gaussian error distribution rather than a single color vector. If colors are measured very accurately and the object is rendered as a narrow distribution, it could possibly fall between two grid steps of a discrete class model and "get lost'' for the classification. In this case low likelihoods would be derived despite the spatial proximity of object and model in terms of metric distance. The likelihood function would appear not much different from that of a truly strange object residing off the class in an otherwise empty region of color space. In technical terms, the classification would violate the sampling theorem (Jähne 1991), and the probability functions would not be invertible any more.

For discrete class models the sampling theorem requires that every measurement falling inside the volume of a model should "see'' at least two model members inside of its Gaussian core. Due to practical limitations of computing time and storage space, it does not make sense to develop discrete models with virtually infinite density accounting for arbitrarily sharp measurements. Also, for measurements with low photon noise the dominant source of error will be the limited accuracy of the color calibration.

The solution to the problem is then to design the discrete model with the achievable measurement accuracy in mind, and to smooth the discrete model into a continuous entity by convolving its grid with a continuous function that is wide enough to prevent residual low-density holes between the grid points. A sensible smoothing width would just fulfill the sampling theorem, i.e. the smoothing function should roughly stretch over a couple of discrete points. As a result, even an extremely sharp measurement will be covered by the model and classified correctly.

Higher resolution would only increase the computational efforts while lower resolution would ignore information which is present in the data and therefore potentially worsen the classification. From a different point of view, one could leave the discrete model unchanged and claim the data to have larger effective errors by including the calibration errors thereby limiting the width of the Gaussian data representation to a lower threshold, which will always ensure the sampling theorem on the discrete grid anyway.

Both approaches are mathematically identical if one chooses to represent the calibration errors as well as the smoothing function by a Gaussian. Due to the symmetry of the Gaussian function, convolving the discrete grid or convolving the error distribution of the data yields the same result. The choice of the Gaussian is computationally very efficient, because the convolution of the Gaussian measurement with the Gaussian calibration error results in another Gaussian of enlarged width. As mentioned in Sect. 5.1 and discussed in Paper II, a survey in the visual bands can be calibrated with a relative accuracy on the order of 3% between the different filters. Therefore, we decided to apply a 0.03 mag Gaussian as the smoothing function.

In summary, we apply the formalism presented in Sect. 2.2 in the following way: the errors $\sigma_{q_i}$ of the colors $q_i$ are convolved with the smoothing 0.03 mag Gaussian, so that the effective errors are

$\displaystyle \sigma^2_i = \sigma^2_{q_i} + (0.03~{\rm mag})^2 .$     (16)

For simplicity, we assume the individual colors to be uncorrelated, which is actually not true for filters sharing spectral regions in their transmission. The variance-covariance matrix then becomes diagonal

$\displaystyle V = \left( \begin{array}{cccc} \sigma^2_1 & & & \\ & \sigma^2_2 & & \\ & & \ddots & \\ & & & \sigma^2_m \end{array} \right) ,$     (17)

and the probability function turns into

$\displaystyle p(\vec q \vert c_i) = \frac{C}{N_i} \sum_{j=1}^{N_i} \exp \left(-\frac{1}{2} \sum_{k=1}^m \left( \frac{q_k - q_{c_{ij},k}}{\sigma_k} \right)^2 \right) .$     (18)

Based on the three object classes discussed the likelihood function is

$\displaystyle p(c_i \vert \vec q) = \frac{p(\vec q \vert c_i)}{p(\vec q \vert c_{{\rm stars}}) + p(\vec q \vert c_{{\rm galaxies}}) + p(\vec q \vert c_{{\rm quasars}})}\cdot$     (19)

Considering three classes implies that extremely faint objects with large errors are assigned average probabilities of 33% for all classes. In general applications, we use a decision rule for an object observed as $\vec q$ which requires that one class is at least three times more probable than the other two classes put together, i.e.:

If there is one class with $p(c_i\vert\vec q) > 0.75$, then we assign this class to the object, but if all classes have likelihoods below 0.75, we call it unclassifiable.
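The full step from measured colors to a class decision, i.e. the effective errors of Eq. (16), the diagonal-covariance likelihood of Eq. (18), the normalisation of Eq. (19) and the 0.75 rule, can be sketched as follows. The library arrays are placeholders for the template grids of Sect. 3, and all names are illustrative:

```python
import numpy as np

SMOOTH = 0.03  # mag; Gaussian smoothing width of the discrete library

def class_probability(q, sigma_q, library):
    """Eq. (18) with the effective errors of Eq. (16); library is the
    (N_i x m) array of template colors for one class."""
    sigma_eff2 = np.asarray(sigma_q) ** 2 + SMOOTH ** 2        # Eq. (16)
    chi2 = np.sum((q - library) ** 2 / sigma_eff2, axis=1)
    C = 1.0 / np.sqrt((2 * np.pi) ** len(q) * np.prod(sigma_eff2))
    return C * np.mean(np.exp(-0.5 * chi2))

def classify(q, sigma_q, libraries, threshold=0.75):
    """Eq. (19) plus the 0.75 decision rule: return the index of the
    winning class, or None if the object is unclassifiable."""
    p = np.array([class_probability(q, sigma_q, lib) for lib in libraries])
    p /= p.sum()
    i = int(np.argmax(p))
    return i if p[i] > threshold else None
```

An object close to one library is assigned that class; an object exactly between two equally dense libraries receives p = 0.5 for each and is flagged unclassifiable.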

For the detection of unusual objects, we look at the color distance of an object to the nearest member of any class model to derive a statistical consistency with the class. The value of this consistency depends on the different color variances and can be calculated from $\chi^2$-statistics. Lacking an analytic expression we use $\chi^2$-tables (Abramowitz & Stegun 1972) to evaluate the statistical consistency between class and object. In practice, the resulting $\chi^2$-values need to be normalised to a plausible scale, since the raw values obtained are enlarged artificially due to the discrete sampling of the library and cosmic variance. We use the following operative criterion for the selection of unusual objects:

If an object is inconsistent at least at a confidence level of 99.73% (i.e. 3$\sigma $ in case of a normal distribution) with all members of all classes, then we call it strange.

Strange objects can formally be classifiable, if the likelihoods still prefer a certain class membership. They have either intrinsically different spectra without counterparts in the class models, or they are reduction artifacts, e.g. when neighboring objects affect their color determination, and this is not taken into proper account for the error calculation.
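The strangeness criterion can be sketched as follows. We evaluate the $\chi^2$ consistency of the nearest template of any class; for simplicity the sketch uses the closed-form $\chi^2$ survival function valid for an even number of colors (stdlib only; for general $m$ one would use a library routine such as scipy.stats.chi2.sf). The normalisation against library sampling and cosmic variance mentioned in the text is not modelled here:

```python
import math

def chi2_sf_even(x, m):
    """P(chi^2_m >= x) for an even number of degrees of freedom m."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, m // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

def is_strange(q, sigma, all_templates, confidence=0.9973):
    """Flag an object whose minimum color distance to ANY template of
    ANY class is inconsistent at >= 99.73% (3 sigma) confidence."""
    min_chi2 = min(
        sum(((qk - tk) / s) ** 2 for qk, tk, s in zip(q, t, sigma))
        for t in all_templates
    )
    return chi2_sf_even(min_chi2, len(q)) < 1.0 - confidence

templates = [[0.0, 0.0], [1.0, 1.0]]
consistent = is_strange([0.05, -0.05], [0.1, 0.1], templates)  # near a template
far_off = is_strange([5.0, -5.0], [0.1, 0.1], templates)       # far from all
```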

Apart from the rather trivial ML estimator, we use the MEV estimator to obtain redshifts and SED parameters of galaxies and quasars. Their class models are designed as regular grids (see Sect. 3) with members $c_{ij}$ residing at redshifts $z_{ij}$. The MEV estimator for the redshift is then

$\displaystyle \langle z \rangle _{{\rm MEV}} = \frac{\sum_j z_{ij} p(\vec q \vert c_{ij})}
{\sum_j p(\vec q \vert c_{ij})}\cdot$     (20)

It is applied to the class models for galaxies and quasars independently and for each class interpretation an independent redshift estimate is obtained. There is also an assessment for the likely error of the z estimate given by the variance of the distribution $p(\vec q\vert z)$:

$\displaystyle \sigma^2_z = \frac{\sum_j (z_{ij}-\langle z \rangle _{{\rm MEV}})^2 p(\vec q \vert c_{ij})}
{\sum_j p(\vec q \vert c_{ij})} \cdot$     (21)
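Eqs. (20) and (21) amount to a probability-weighted mean and variance over the class members, which can be written compactly (a minimal sketch; the toy unimodal $p(\vec q\vert z)$ below is invented for illustration):

```python
import math

def mev_redshift(redshifts, probs):
    """Eqs. (20)/(21): MEV redshift and its error from the redshift-
    dependent probability density. redshifts: z_ij of the class members;
    probs: p(q | c_ij)."""
    norm = sum(probs)
    z_mev = sum(z * p for z, p in zip(redshifts, probs)) / norm
    var = sum((z - z_mev) ** 2 * p for z, p in zip(redshifts, probs)) / norm
    return z_mev, var ** 0.5

# Toy unimodal p(q|z): members on the z = 0.00 ... 2.00 grid,
# Gaussian-weighted around z = 0.8 with width 0.05
zs = [i * 0.01 for i in range(201)]
ps = [math.exp(-0.5 * ((z - 0.8) / 0.05) ** 2) for z in zs]
z_mev, sigma_z = mev_redshift(zs, ps)
```

For this single-peaked distribution the estimator recovers the peak redshift and its width, as expected for the unimodal case discussed below.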

This estimation scheme would be sufficient, if models had a rather simple shape in color space, i.e. if color space and model parameter space could easily be mapped onto each other. In fact, the class model for galaxies and particularly the one for quasars can have very complicated folded shapes in color space, so that the distribution $p(\vec q\vert z)$ can have a correspondingly complicated structure that is not at all well described by mean and variance.

Therefore, we distinguish three cases: unimodal (single peaked), bimodal (double peaked) and broad distributions. In unimodal cases $\langle z \rangle _{{\rm MEV}}$ and $\sigma_z$ are appropriate reductions of $p(\vec q\vert z)$. In bimodal cases we split the redshift axis into two intervals delimited at $\langle z \rangle _{{\rm MEV}}$ and obtain two alternative unimodal solutions with relative probabilities given by the p sums in the two intervals. If the distribution is so broad that it starts to resemble a uniform distribution, $\langle z \rangle _{{\rm MEV}}$ approaches the mean z value of the model and $\sigma_z$ approaches $\sqrt{1/12} (z_{{\rm max}}-z_{{\rm min}})$. In order to keep our statistics clean from such mean redshift contaminants, we cut off the estimator at some uncertainty:

If an object has $\sigma_z > 1/8 (z_{{\rm max}}-z_{{\rm min}})$, then we ignore the MEV estimate and call its redshift uncertain.

In particular, it is possible that an object has a bimodal distribution with one peaked (result) and one broad (uncertain) component. In the following, we denote this extended scheme of MEV estimates accounting for possible bimodalities as our MEV+ estimate. In Sect. 4.4 we will compare the performance of all three estimators: ML, MEV and MEV+.
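The MEV+ logic, i.e. the uncertainty cut-off and the bimodal split at $\langle z \rangle_{\rm MEV}$, can be sketched as below. Deciding whether a distribution is bimodal is a peak-detection problem not shown here; the sketch takes it as a flag, and all names are our own illustration:

```python
import math

def mev(zs, ps):
    """Weighted mean redshift, its dispersion, and the probability sum."""
    norm = sum(ps)
    z = sum(zi * pi for zi, pi in zip(zs, ps)) / norm
    var = sum((zi - z) ** 2 * pi for zi, pi in zip(zs, ps)) / norm
    return z, var ** 0.5, norm

def mev_plus(zs, ps, bimodal=False):
    """Return a list of (status, z, sigma_z, relative probability)."""
    z_min, z_max = min(zs), max(zs)
    z, sz, _ = mev(zs, ps)
    if sz > (z_max - z_min) / 8.0:        # the sigma_z cut-off of the text
        if not bimodal:
            return [("uncertain", None, None, 1.0)]
        # split the redshift axis at <z>_MEV, solve each half separately
        lo = [(zi, pi) for zi, pi in zip(zs, ps) if zi <= z]
        hi = [(zi, pi) for zi, pi in zip(zs, ps) if zi > z]
        total, out = sum(ps), []
        for half in (lo, hi):
            zh, sh, nh = mev([a for a, _ in half], [b for _, b in half])
            out.append(("result", zh, sh, nh / total))
        return out
    return [("result", z, sz, 1.0)]

# Toy bimodal p(q|z): two narrow peaks at z = 0.5 and z = 1.5
zs = [i * 0.01 for i in range(201)]
ps = [math.exp(-0.5 * ((z - 0.5) / 0.05) ** 2)
      + math.exp(-0.5 * ((z - 1.5) / 0.05) ** 2) for z in zs]
solutions = mev_plus(zs, ps, bimodal=True)
```

The split recovers two unimodal solutions at z = 0.5 and z = 1.5 with equal relative probabilities, instead of the meaningless overall mean at z = 1.0.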

An effort was made to implement a classification code optimized for short computing time. The use of precalculated class models eliminates any synthetic photometry from a typical fitting procedure. Furthermore, the use of colors instead of fluxes eliminates the need for finding a flux normalisation. In terms of CPU time, the classification of one object consists mainly of calculating the probability $p(\vec q\vert c_{ij})$ for every class member, which involves first adding up all $\sigma^2_k$-scaled squared color differences and second evaluating an exponential function of the resulting sum, which is already a measure of strangeness. Summing up the $p(\vec q\vert c_{ij})$ to obtain class likelihoods and deriving mean and variance of the internal class parameters should take less time than calculating the probability density function, if more than ten color axes are taken into account. With class models containing about 50 000 members and 13 colors, the full classification of one object takes about 0.3 sec when running on a 200 MHz Ultra Sparc CPU inside a SUN workstation. Since different survey applications might require different sample selection schemes, we decided to calculate and store discriminant vectors for all objects and select subcatalogs for further analysis later.

2.5 Equivalence of flux-based and color-based classification

We now show that the color-based classification yields the same best fit as a flux-based template-fitting algorithm. Lanzetta et al. (1996), e.g., calculate a likelihood function depending on redshift z, a spectral energy distribution and a flux normalisation parameter A, following the form:

$\displaystyle L_{{\rm model}} = \exp \left(-\frac{1}{2}
\sum_k \left( \frac{F_{k,{\rm obs}} - A \tilde F_{k,{\rm model}}}{\sigma_{F_k}}\right)^2 \right) .$     (22)

Basically, the likelihood determination relies on the squared photometric distance d between observation and model, resulting from the flux differences $\Delta F_k$ in each filter:

$\displaystyle d^2 = \sum_k^n \chi_k^2 \quad \mbox{with} \quad
\chi_k = \frac{F_{k,{\rm obs}} - F_{k,{\rm model}}}{\sigma_{F_k}}
= \frac{\Delta F_k}{\sigma_{F_k}}\cdot$     (23)

In the color-based approach there are n-1 color indices contributing distance components, and we assume the single constraint that there is one particular base filter approximately free of flux errors, e.g. a deeply exposed broad-band filter. The color indices are made by comparing any filter to this base filter, ensuring optimum errors for the colors. In this scheme, any errors in the relative calibration are absorbed into the color indices. Therefore, it is very important that the base filter is not wrongly calibrated with respect to the other wavebands, since the error would spread into the entire vector of color indices.

We then look only at a range of good fits, and do not mind rather crude $\chi$-approximations for relatively bad fits which are anyway ruled out as solutions. Also, we consider only measurements with small relative flux errors, $\sigma_{F_k} / F_k \ll 1$, which allows the assumption of Gaussian color errors and a linear approximation of the logarithm. The distance components are:

$\displaystyle \chi_k = \frac{(m_k-m_{{\rm base}})_{{\rm obs}}-(m_k-m_{{\rm base}})_{{\rm model}}}{\sigma_{m_k-m_{{\rm base}}}}
= 2.5\, \frac{{\rm log}\, (F_k/F_{{\rm base}})_{{\rm obs}} - {\rm log}\, (F_k/F_{{\rm base}})_{{\rm model}}}
{\sqrt{(\sigma_{F_k}/F_k)^2+(\sigma_{F_{{\rm base}}}/F_{{\rm base}})^2}}\cdot$     (24)

Inserting the flux differences $\Delta F_k$ and assuming $\sigma_{F_{{\rm base}}}/F_{{\rm base}} \approx 0$, we obtain

$\displaystyle \chi_k \approx 2.5\, \frac{F_k}{\sigma_{F_k}}
\left\{ {\rm log} \left(1+\frac{\Delta F_k}{F_{k,{\rm model}}} \right)
- {\rm log} \left(1+\frac{\Delta F_{{\rm base}}}{F_{{\rm base,model}}} \right) \right\} .$     (25)

Expanding the logarithm for $\Delta F_k/F_k \ll 1$, we get

$\displaystyle \chi_k \approx \frac{F_k}{\sigma_{F_k}} \left\{ \frac{\Delta F_k}{F_{k,{\rm model}}} -
\frac{\Delta F_{{\rm base}}}{F_{{\rm base,model}}} \right\}
\approx \frac{\Delta F_k}{\sigma_{F_k}}
+ \frac{\Delta F_k}{\sigma_{F_k}} \frac{\Delta F_k}{F_{k,{\rm model}}}
- \frac{\Delta F_{{\rm base}}}{\sigma_{F_k}} \frac{F_k}{F_{{\rm base,model}}}\cdot$     (26)

The first term is typically on the order of 1, while the second term is on the order of $\sigma_{F_k}/F_k \ll 1$ and the third one on the order of $\sigma_{F_{{\rm base}}}/\sigma_{F_k} \ll 1$. Therefore, the last two terms can be dropped and the expression for $\chi_k$ reduces to

$\displaystyle \chi_k \approx \frac{\Delta F_k}{\sigma_{F_k}} ,$     (27)

which is identical to the expression used in the flux template fitting method shown in Eq. (23).
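This equivalence is easy to verify numerically: for small relative flux errors the color-based component of Eq. (24), here with the exact magnitude-error factor $2.5/\ln 10$ in the denominator, approaches the flux-based $\Delta F_k/\sigma_{F_k}$ of Eq. (23). All flux values below are invented for illustration:

```python
import math

F_model, F_base_model = 100.0, 500.0   # model fluxes (arbitrary units)
F_obs, sigma_F = 103.0, 2.0            # Delta F_k = 3, ~2% relative error
F_base_obs, sigma_F_base = 500.5, 0.5  # deep base filter, ~0.1% error

# color-based component: magnitude difference over propagated color error
dm = 2.5 * (math.log10(F_obs / F_base_obs) - math.log10(F_model / F_base_model))
sigma_m = (2.5 / math.log(10)) * math.sqrt(
    (sigma_F / F_obs) ** 2 + (sigma_F_base / F_base_obs) ** 2)
chi_color = dm / sigma_m

# flux-based component of Eq. (23)
chi_flux = (F_obs - F_model) / sigma_F
```

With 2% flux errors the two components agree to a few percent, as expected from the expansion above.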

2.6 System of color indices

In the previous section, we discussed the relevance of a common base filter for the various color indices, which is supposed to have relatively small flux errors in order to keep the color errors as low as possible. Our multiband survey applications usually involve a smaller number of broad bands as well as a larger number of medium-band observations. For these, we decided to form color indices from broad bands neighboring on the wavelength axis, i.e. U-B, B-V, V-R and R-I, which we assume to be the optimum solution for comparably deep bands. Each of the shallower medium bands we combine with the broad band nearest in wavelength, which then serves as a base filter for the medium-band color indices, e.g. B-486 or 605-R, where letters denote broad bands and numbers represent the central wavelength of medium-band filters measured in nanometers.
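This pairing scheme can be sketched as follows. The central wavelengths below are approximate assumptions for illustration (the U-band value in particular is not quoted in this paper), and the helper `nearest_broad` is our own:

```python
# Broad-band central wavelengths in nm (approximate, assumed values)
broad = {"U": 365, "B": 456, "V": 540, "R": 652, "I": 850}
medium = [462, 485, 518, 571, 604, 646, 696, 753, 815, 856, 914]

# broad-broad indices between wavelength neighbours: U-B, B-V, V-R, R-I
names = sorted(broad, key=broad.get)
broad_indices = [f"{a}-{b}" for a, b in zip(names, names[1:])]

def nearest_broad(wl):
    """Base filter for a medium band: the broad band nearest in wavelength."""
    return min(broad, key=lambda n: abs(broad[n] - wl))

medium_indices = []
for wl in medium:
    base = nearest_broad(wl)
    # convention from the text: the bluer filter is written first
    medium_indices.append(f"{base}-{wl}" if broad[base] < wl else f"{wl}-{base}")
```

With these assumed wavelengths the scheme reproduces index names of the form quoted in the text, e.g. the 604 nm filter pairs with R as "604-R".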

In terms of flux template fitting, this scheme of color indices means that we use a few deep broad bands to fit the global shape of the SED, and then use a few groups of medium bands around each deep broad band to fit the smaller-scale shape locally. The $\chi^2$-values of the global fit and the several local fits are then just added up to the total $\chi^2$. This scheme has a particular advantage over purely global flux fitting: the local fits can trace spectral structures well, even if the global distribution of the object differs from the template (e.g. as could be caused by extinction). Therefore, it is not too dependent on accurate global template shapes, and it can use the ability of the medium bands to discriminate narrow spectral features for a more accurate classification. Of course, this advantage vanishes immediately for a pure broad-band survey, where local structures in the spectrum are not traced, and therefore no local fits are available for the $\chi^2$-sum.

3 The classification libraries

Figure 1: This diagram shows a few selected spectra from our template libraries. The wavelength scale shown runs from 315 nm to 1000 nm for stars (left), from 125 nm to 1600 nm for galaxies (center) and from 100 nm to 550 nm for quasars (right). The flux is $\lambda f_\lambda $ in units of photons per nm, time interval and sensitive area, offset by one unit per step within a class. The flux scale is normalised to unity at 800 nm for stars, arbitrary for galaxies, and normalised to 0.2 at 250 nm for quasars. The stellar templates are taken from Pickles (1998), the galaxy templates from Kinney et al. (1996) and the quasar templates are modelled after Francis et al. (1991). The quasar diagram shows nine spectra with three different spectral indices (-2.0, -0.6, +0.8) and three different relative emission-line intensities (0.6, 2.1, 5.7)

We assembled the color libraries from intrinsic object spectra assuming no galactic reddening. Clearly these libraries can only be sufficient when observing fields with low extinction and little reddening. Usually, such fields are chosen for deep extragalactic surveys, and the CADIS fields in particular were carefully selected to show virtually no IRAS 100 $\mu$m flux (below 2 MJy/sterad), so we expect "zero'' extinction and reddening there. When applying this color classification to fields with reddening, the libraries would have to be changed accordingly.

Obviously, the libraries should contain a representative variety of objects, but still they can never be assumed to cover a complete class including all imaginable oddities. When classes are enlarged to cover as many odd members as possible, a trade-off is to be expected between classifying the odd ones correctly and introducing more overlap between the classes in color space, i.e. introducing more confusion among normal objects. The spectral libraries we employ are partly based on observations only and partly mixed with model assumptions. Our particular choice of libraries is founded on experience we gained within the CADIS survey, where we found several other published templates to be less useful.

3.1 The star library

For the stars, we picked the spectral atlas of Pickles (1998), which contains 131 stars with spectral types ranging from O5 to M8. It covers different luminosity classes but concentrates on main sequence stars, and it also contains some spectra of particularly metal-rich stars. For the surveys in consideration, very young and very luminous stars should not be expected, but we include the entire library nevertheless (see Fig.1). Stars later than M8 are missing in the library, but they do show up in deep surveys like CADIS (Wolf et al. 1998). These objects are interesting in their own right, of course, but they are so rare that a couple of misclassifications do not hurt the statistics on other objects.

In earlier stages of the CADIS survey, we reported using the Gunn & Stryker (1983) atlas of stellar spectra (see e.g. Wolf et al. 1999), which has a number of disadvantages compared to the new work by Pickles. Our impression is that the Pickles spectra have a better calibration in the far-red wavelength range and are less affected by noise there. In particular, broad absorption troughs in M stars are rendered more accurately in the Pickles templates, which can be quite relevant for medium-band surveys. Also, they cover the NIR region and, e.g., the entire CADIS filter set all the way out to the $K^\prime$ band, removing the need for home-made extrapolations. Since it contains two different metallicity regimes, it covers the range of possible stellar medium-band colors better than the Gunn & Stryker atlas, most notably among M stars for colors sensitive to their deep absorption features and, e.g., among K stars for colors probing the Mg I absorption.

The atlas is not structured as a regular grid in the stellar parameters and we consider the resulting color library an unsorted set without internal structure. If variations in dust reddening are to be expected within the field as in the case of Galactic stellar observations, this effect should be treated as an additional parameter in the library.

For multi-color surveys aiming specifically at Galactic stars, one would ideally like to have a library organized as a regular grid in effective temperature, surface gravity and metallicity, which could, e.g., be derived from model atmospheres. Such a fine classification is not needed for extragalactic surveys, where the focus is on galaxies and quasars. We gained some experience with the stellar spectra from the model grid by Allard (1996), but we decided not to use it, since the overall colors seemed to be better matched by the Pickles library.

3.2 The galaxy library

The galaxy library is based on the template spectra by Kinney et al. (1996). These are ten SEDs averaged from integrated spectra of local galaxies ranging in wavelength from 125 nm to 1000 nm. The input spectra of quiescent galaxies were sorted by morphology beforehand to result in four templates called E, S0, Sa and Sb. The starburst galaxies were sorted by color into six groups yielding six more templates called SB6 to SB1. Based on the observation, that color and morphology of galaxies correlate, this template design seems reasonable. This way the classification can indirectly measure morphology of galaxies via their SED, at least as far as the locally determined color-morphology relation holds at higher redshift.

The templates contain a very deep unidentified absorption feature around 540 nm, which we supposed to be an artifact of the data reduction and eliminated. We left the abundant structures in the UV unchanged, although some of them might be noise and we do not know how to interpret them. We modelled a near-infrared addition heuristically by a simple law consistent with the $I-K^\prime$-colors of a sample of galaxies with known spectroscopic redshifts (see Paper II). Using this addition, we extended the spectra out to 2500 nm, and actually replaced the spectrum starting from 800 nm to eliminate the noise in the templates redwards of 800 nm (see Fig.1). Quiescent galaxies were extended according to $f_\nu \sim \nu^{-1}$, while starburst galaxies seemed most consistent with an extension of $f_\nu \sim \nu^{-1/3}$.

We consider the templates to form a one-dimensional SED axis of increasingly blue galaxies and fill in more templates to obtain a dense grid of 100 SEDs. Our interpolation is done linearly in color space, and the number of filled-in SEDs is chosen such that the color space is filled rather uniformly. The new SEDs are denoted by numbers from 0 to 99, where the ten original SEDs used for the interpolation reside at the following numbers:

E - S0 - Sa - Sb - SB6 - SB5 - SB4 - SB3 - SB2 - SB1

0 - 15 - 30 - 45 - 75 - 80 - 85 - 90 - 95 - 99.
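The interpolation scheme above can be sketched as follows. This is a toy illustration with one-component "color vectors" standing in for the real multi-color templates; the function name is our own:

```python
# Anchor positions of the ten Kinney templates on the 0..99 SED axis
anchors = [0, 15, 30, 45, 75, 80, 85, 90, 95, 99]

def interpolate_seds(template_colors):
    """Linearly interpolate ten template color vectors (one per anchor)
    in color space to a dense grid of 100 SEDs."""
    grid = []
    for s in range(100):
        # locate the bracketing anchor pair and interpolate linearly
        for (a0, c0), (a1, c1) in zip(zip(anchors, template_colors),
                                      zip(anchors[1:], template_colors[1:])):
            if a0 <= s <= a1:
                w = (s - a0) / (a1 - a0)
                grid.append([x0 + w * (x1 - x0) for x0, x1 in zip(c0, c1)])
                break
    return grid

# Toy "colors": template i has the single color value i
colors = [[float(i)] for i in range(10)]
grid = interpolate_seds(colors)
```

SED 45 then reproduces the fourth template (Sb) exactly, while intermediate SED numbers blend their two neighbouring templates.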

Internal reddening is considered an important effect for the colors of galaxies and is especially common among later types. While trying to account for it, we realized that its effect is merely one of shifting the zeropoint on the SED axis and hardly one of changing the redshift estimates. If we did introduce an independent reddening parameter, it would be almost collinear with the SED axis itself. Therefore, we opted for using the templates as determined from real galaxies and provided by Kinney et al. (1996), since they probably already contain a typical distribution of reddened objects. Due to our scheme of SED interpolation, we can still classify galaxies which are reddened more or less than usual.

We also tried to change the SED interpolation scheme by relocating the templates to different SED numbers, which did not seem to improve the results. The color library was calculated for 201 redshifts ranging in steps of $\Delta z = 0.01$ from z=0 to z=2, finally containing $201 \times 100$ members. We did not intend to go beyond a redshift of 2, since our survey applications are typically not yet deep enough to see such objects in useful numbers.

The main shortcoming of this library is that the 1-dimensional SED allows no variation in emission-line ratios independent of the global galaxy color. Since medium-band filters can contain strong emission-line signals from faint galaxies, an observed emission-line ratio detected by two suitably located filters can be in disagreement with the global SED traced by all other filters. Since especially the CADIS filters are placed to deliver multiple detections of emission lines at several selected redshifts, some degradation in real performance could be expected with respect to the simulation (see Paper II).

3.3 The quasar library

The quasar library is designed as a three-component model: We add a power-law continuum with an emission-line contour based on the template spectrum by Francis et al. (1991), and then apply a throughput function accounting for absorption bluewards of the Lyman-$\alpha $ line. We modeled a throughput function T0 after visually inspecting spectra of $z\approx 4$-quasars published by Storrie-Lombardi et al. (1996), and keep its shape constant (see Fig.3) while varying its scale to follow the increasing continuum depression $D_{\rm A}$ towards high redshift. Using data from Kennefick (1996) and Storrie-Lombardi et al. (1996) as a guideline, we arrived at

$T(z) = T_0^{\,(z/4.25)^2}$ .     (28)

The intensity of the emission-line contour was varied only globally, i.e. with no intensity dispersion among the lines. As long as typically only one medium-band filter is brightened by a prominent emission line, the missing dispersion should not affect the classification (see Fig.2). For the intensity factor relative to the template, $e^\epsilon$, ten values were adopted ranging in steps of $\Delta \epsilon = 0.25$ from $\epsilon = -0.5$ to $\epsilon = 1.75$ on a logarithmic scale, which is roughly 0.6 times to 5.7 times the template intensity. Originally, we tried a range from 0.3 times to 2.7 times, but the first twenty quasars found in CADIS contained mostly strong lines, which are better represented by the current limits.

The slope of the power-law continuum $f_\nu \sim \nu^\alpha$ was varied in 15 steps of $\Delta \alpha = 0.2$ ranging from $\alpha = -2.0$ to $\alpha = 0.8$. The library was calculated for 301 redshifts ranging in steps of $\Delta z = 0.02$ from z=0 to z=6, finally containing $301 \times 15 \times 10 = 45\,150$ members. As a future improvement one could imagine the inclusion of Seyfert I galaxies with nuclei of rather low luminosity, i.e. spectra coadded as a superposition of a host galaxy spectrum with a broad-line spectrum for the nucleus.
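The quasar-library parameter grid and the throughput scaling can be sketched as below, reading Eq. (28) as the base throughput $T_0$ raised to the power $(z/4.25)^2$ (our interpretation of the scaling; variable names are illustrative):

```python
import math

# Parameter grid of Sect. 3.3: 301 redshifts x 15 slopes x 10 intensities
redshifts = [round(0.02 * i, 2) for i in range(301)]    # z = 0.00 ... 6.00
alphas = [round(-2.0 + 0.2 * i, 1) for i in range(15)]  # alpha = -2.0 ... +0.8
epsilons = [-0.5 + 0.25 * i for i in range(10)]         # log line-intensity
intensities = [math.exp(e) for e in epsilons]           # ~0.6x ... ~5.7x template

grid = [(z, a, f) for z in redshifts for a in alphas for f in intensities]

# Redshift scaling of the Lyman-alpha forest throughput, Eq. (28):
# at z = 4.25 the template throughput T0 is reproduced exactly
def throughput(T0, z):
    return T0 ** ((z / 4.25) ** 2)
```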

Figure 2: The quasar library is based on an emission-line contour taken from the quasar template spectrum by Francis et al. (1991). The wavelength scale runs from 100 nm to 550 nm and the flux is $\lambda f_\lambda $ in units of photons per nm, time interval and sensitive area (arbitrary units)

Figure 3: For the quasars we assumed a throughput function for the Lyman-$\alpha $ forest, derived from a visual inspection of quasar spectra published by Storrie-Lombardi et al. (1996). The scale of this function depends on redshift and is shown for z=3.5, 4.25 and 5.0

3.4 Calculation of color libraries

As a first step, the spectral libraries were transformed into color index libraries representing precisely the set of filters and instruments in use. The use of precalculated filter measurements rather than fully resolved flux spectra removes any computationally expensive calculations for synthetic photometry from the process of classifying the object list. The use of color indices removes the need for any flux normalisation, further speeding up the classification. A list of $\sim 10^4$ objects and $\sim 10$ colors can be classified within a couple of hours on a SUN Enterprise II workstation, even when using $\sim 10^5$ templates.

For best results it is required that the color libraries are calculated for an instrumental setup resembling precisely the observed one, i.e. the synthetic photometry calculation has to take every dispersive effect into account. We decided to use photon flux colors derived from the observable object fluxes, averaged over the total system efficiency of each filter and assuming an average atmospheric extinction.
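A minimal sketch of such a synthetic photon-flux measurement follows: the object spectrum is weighted by the total system efficiency of the filter and counted in photons (hence the extra factor $\lambda/hc$), then integrated over wavelength. Filter curve and spectrum below are made up, and the simple trapezoidal integration stands in for a full treatment of all dispersive effects:

```python
import math

H_C = 1.986e-9  # h*c in erg*nm

def photon_flux(wavelengths, f_lambda, efficiency):
    """Trapezoidal integral of f_lambda (erg/s/cm^2/nm) * T(lambda) *
    lambda/hc over the filter passband; returns photons/s/cm^2."""
    y = [f * t * wl / H_C for wl, f, t in zip(wavelengths, f_lambda, efficiency)]
    total = 0.0
    for i in range(len(wavelengths) - 1):
        total += 0.5 * (y[i] + y[i + 1]) * (wavelengths[i + 1] - wavelengths[i])
    return total

def color_index(flux_a, flux_b):
    """Photon-flux color index in magnitudes."""
    return -2.5 * math.log10(flux_a / flux_b)

# Toy box filter 500-520 nm with unit efficiency on a flat spectrum
box = photon_flux([500.0, 510.0, 520.0], [1e-16] * 3, [1.0] * 3)
```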

Figure 4: These diagrams of B-V vs. R-I color show the class models of stars (black) and galaxies (grey) on the left, and stars (black) and quasars (grey) on the right to illustrate their location in color space. The colors plotted are photon flux color indices, which are offset compared to astronomical magnitudes, such that Vega has V-R= -0.41 and R-I= -0.61

The shape of the filter transmission curves needs to be known precisely, and is in the best case measured within the imaging instrument itself under conditions identical to the real imaging application. This is easily possible with, e.g., the Calar Alto Faint Object Spectrograph (CAFOS) at the 2.2 m telescope on Calar Alto, Spain: in this instrument light from an internal continuum source is sent first through the filter wheel and second through the grism wheel before reaching the detector. Images are taken with and without the filter, so their ratio gives immediately the transmission curve. Colors measured in narrow filters depend sensitively on the transmission curve whenever strong spectral features are probed, e.g. the continuum drop at the Ca H/K absorption or the Mg I absorption in late-type stars. In these cases the curve needs to be known rather precisely, since otherwise the calibration would be off, and misclassifications could occur.

3.5 Potential improvements on the classification

The quality of the classification depends on just the three elements of the method: the quality of the measured data, the choice of the classifier and the quality of the libraries forming the knowledge database for the comparison. In principle, improvements in performance can be achieved only in these three respects.

4 Simulation of competitive filter sets

It seems natural to assume that surveys with different filter sets show quite a different performance in terms of classification and redshift estimation. If a survey aims for objects with very particular spectra, the filter set can certainly be tailored to this purpose. If the objects of interest span a whole range of spectral characteristics, it is not trivial to judge analytically which filter set performs best.

Originally, this method was developed for CADIS using real CADIS data to test it. Then, we intended to optimize it and try to draw conclusions about survey strategies. Aiming for more insight into the question of filter choice, we performed Monte-Carlo simulations on different model surveys by feeding simulated multi-color observations of stars, galaxies and quasars into our algorithm. Here, we present a comparison of three fundamentally different filter sets and show their resulting performance for classification and redshift estimation.

The three model surveys spend the same total amount of exposure time on different filter sets, but use the same instrument, telescope and observing site. We chose the Wide Field Imager (WFI) at the 2.2 m MPG/ESO telescope on La Silla as a testing ground, because it provides a unique, extensive set of filters, ranging from several broad bands to a few dozen medium bands, to choose from. Furthermore, the WFI is a designated survey instrument which is extensively used by the astronomical community.


Table 1: Filters and 10-$\sigma $ magnitude limits for the three survey setups compared with Monte-Carlo simulations. The I-band filter is a long-wavelength passband filter with a cut-on wavelength roughly at 780 nm; its far-red sensitivity limit is set by the dropping quantum efficiency of the CCDs. All filters are installed in the Wide Field Imager at the 2.2 m MPG/ESO telescope at La Silla observatory

$\lambda_{{\rm cen}}$/fwhm (nm)   name   $m_{{\rm lim,A}}$   $m_{{\rm lim,B}}$   $m_{{\rm lim,C}}$
---------------------------------------------------------------
              U       23.5      23.5      24.1
456/99        B       25.0      25.0      25.6
540/89        V       24.5      24.5      25.1
652/162       R       24.5      24.5      25.1
850/150*      I       23.0      23.0      23.6
---------------------------------------------------------------
                      23.6      23.98
462/14                23.5
485/31                23.4      23.78
518/16                23.3
571/25                23.2      23.58
604/21                23.1
646/27                23.0
696/20                22.8      23.18
753/18                22.7
815/20                22.6      22.98
856/14                22.5
914/27                22.4      22.78

4.1 Filtersets and exposure times

The three modelled surveys, here called setup "A'', "B'' and "C'', each spend 150ksec of exposure time distributed on the following filters (see also Table 1):

Setup A spends 50 ksec on the five broad-band filters of the WFI (UBVRI) and 100 ksec on twelve medium-band filters. Using ESO's exposure time calculator V2.3.1 for the WFI, we related exposure times to limiting magnitudes assuming a seeing of 1.4 arcsec, an airmass of 1.2, point source photometry and a night sky illuminated by a moon three days old. The exposure times are distributed such that a quasar with a power-law continuum $f_\nu = \nu^\alpha$ and a spectral index of $\alpha = -0.6$ is observed with a uniform signal-to-noise ratio in all medium bands. As a result, the twelve medium bands each deliver a 10-$\sigma $ detection of an R=23.0 quasar.

Figure 5: Monte-Carlo simulation for the classification of stars, galaxies and quasars with setup A and $R=22\ldots25$. The probability for a simulated object to be assigned to its original class is plotted over the color B-V for stars and over the redshift for galaxies and quasars, where B-V is an astronomical magnitude. In case of the galaxies black dots denote quiescent galaxies (SED < 60) and grey dots are starburst systems (SED $\ge $ 60). For bright objects the performance is limited by a systematic uncertainty of 3% assumed as a minimum error for the color indices

Setup B spends 50 ksec on the same broad bands but concentrates the 100 ksec for medium-band work on only six filters, then reaching a uniform 10-$\sigma $ detection of an R=23.38 quasar.

Setup C, finally, spends all 150 ksec on the broad-band filters and omits the medium bands entirely.

In Sects. 4.2 and 4.3 we present the performance results for setup A, which has actually been used for a recent multi-color survey (Wolf et al. 2000). The relative performance of the three setups is compared in Sect.4.5. In Sect.4.6 we attempt to derive some basic analytic conclusions.

The simulations are carried out by creating a list of test objects from the color libraries presented in Sect.3. We assume a certain R-band magnitude and calculate the individual filter fluxes and corresponding errors for each object. Then we scatter the flux values of the objects according to a normal distribution of the flux errors. Finally, we recalculate the resulting color indices and index errors and use this object list as an input to the classification.
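The core of this simulation step, i.e. scattering the library fluxes by Gaussian noise and recomputing color indices and their errors, can be sketched as follows (a simplified illustration with made-up fluxes; the first filter plays the role of the base filter):

```python
import math
import random

def simulate_object(fluxes, flux_errors, rng):
    """Scatter each filter flux by its Gaussian error and return
    photon-flux color indices (and errors) against the first filter."""
    scattered = [f + rng.gauss(0.0, s) for f, s in zip(fluxes, flux_errors)]
    base = scattered[0]
    colors = [-2.5 * math.log10(f / base) for f in scattered[1:]]
    color_errors = [
        (2.5 / math.log(10)) * math.sqrt(
            (s / f) ** 2 + (flux_errors[0] / base) ** 2)
        for f, s in zip(scattered[1:], flux_errors[1:])
    ]
    return colors, color_errors

rng = random.Random(42)  # fixed seed for a reproducible test run
colors, errs = simulate_object([100.0, 80.0, 60.0], [1.0, 2.0, 2.0], rng)
```

Each simulated object is then fed to the classifier exactly like a real catalog entry.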

For the stars we use as many test objects as there are members in the library, i.e. 131. For the test galaxies we take only every third member of the present library, giving us 6700 objects. From the quasar library we use every seventh object, resulting in 6450 quasars per test run.

These simulations show us how well the classification can possibly work, assuming that real objects will precisely mimic the library objects. Every real situation will contain differences between SED models and SED reality, sometimes called "cosmic variance'', which will worsen the performance of every real application. Nevertheless, the simulation highlights the principal shortcomings of the method itself and the chosen filter set in particular. Therefore, it can be used to judge the relative performance of different filter sets.

We run these tests for stars, galaxies and quasars with magnitudes of R= 22, 23, 24 and 25, respectively, in order to see how the classification performance degrades from optimum to useless with decreasing object flux. Given that R=23 corresponds roughly to the 10-$\sigma $ limit of setup A, the shallowest survey, we expect that the classification has almost reached its best performance at R=22. This is due to our assumption of a 3% uncertainty in the calibration, which causes even the brightest objects with the best photon statistics to perform not much better than an object detected only at a 30-$\sigma $ level. Finally, at R=25 objects are well detected only in the broad-band filters, while the medium bands yield only fluxes with errors higher than 40%. We expect the surveys to be almost useless at this level.

4.2 Classification performance for setup A


Table 2: Classification matrix for objects of R=23 in setups A and C as derived from Monte-Carlo simulations. An input vector containing a true number distribution of objects among the three object classes would be mapped by this matrix onto a classified distribution among four classes. Numbers below 0.005 are left blank

                     true class, setup A       true class, setup C
classified as       star   galaxy  quasar     star   galaxy  quasar
-------------------------------------------------------------------
star                0.98                      0.96           0.03
galaxy              0.01   0.95    0.01       0.01   0.92    0.01
quasar                     0.01    0.94                      0.84
unclassified        0.01   0.04    0.05       0.03   0.08    0.12

We now look at the classification performance as achieved in setup A, the model survey with the highest number of filters, but the shallowest exposures in terms of photon flux detection:

  \begin{figure}\par\includegraphics{ds9808f6.eps}\end{figure} Figure 6: Monte-Carlo simulations for the multi-color redshifts of galaxies and quasars with $R=22\ldots25$ according to the Maximum Likelihood estimate in setup A. In case of the galaxies black dots denote quiescent galaxies and grey dots are starburst systems. This diagram shows the redshift estimates for all galaxies, however they were classified, but only for those quasars passing the classification limit of 75%

At R=22, the classification works almost perfectly (see uppermost row of diagrams in Fig.5): generally, more than 99% of all test objects in any class are correctly classified.

At R=23, usually less than 5% of all objects in any class are lost to unclassifiability. Most affected, with 10% incompleteness, are quasars at z<2.5 with red spectra and weak emission lines. In this simulation, their location in color space overlaps with that of starburst galaxies at redshifts 1.6<z<2.0. So far, our galaxy templates contain no information bluewards of the Lyman-$\alpha $ line, leaving their U-band flux blank in this redshift range; as a result, the classification omits this band in the comparison with the library galaxies.

At R=24, about one third of the stars are lost. These are mostly yellow stars which are too faint in every filter to be classified unambiguously. Rather blue and rather red stars are still successfully classified, because they still show significant fluxes and sufficiently accurate color indices on either the blue or the far-red side of the filter set. About a quarter of the galaxies would be missed: either blue galaxies without strong continuum features, or red galaxies at redshifts low enough to render them faint in the far-red filters, too. Also, a quarter of the quasars are lost, either red z<2.5 quasars overlapping again with starburst galaxies at 1.6<z<2.0, or z>2.5 quasars with weak emission lines overlapping with early-type galaxies at z<0.4.

At R=25, the classification has finally become highly incomplete, but can still find very blue stars and very red extragalactic objects like quiescent galaxies and quasars at higher redshift (see bottom row in Fig.5, see also Fig.9 for precise numbers).

In all simulations, most incorrectly classified objects end up unclassified, and only a minority are scattered into another class (see the classification matrix, Table 2). In particular, the quasar class does not seem to be strongly contaminated by false candidates: at any magnitude in any setup, less than 1% of the galaxies are scattered into the quasar candidates, except for setup C at R=25. Still, this contamination of the quasar class is not negligible, since a minor fraction of a rich class can be a large number compared with a poor class. In CADIS we found about 3% of the extragalactic objects at R<23 to be quasars; a contamination of less than 1% then means that less than a quarter of the quasar candidates should be galaxies.
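The closing arithmetic can be checked with round numbers. The 3% quasar fraction and the Table 2 rates are taken from the text; the sample size is an arbitrary assumption:

```python
# If ~3% of extragalactic objects are quasars and <1% of galaxies scatter
# into the quasar class, what fraction of quasar candidates are galaxies?
n_extragal = 1000
n_quasars = 0.03 * n_extragal          # ~30 true quasars
n_galaxies = n_extragal - n_quasars    # ~970 galaxies

recovered_quasars = 0.94 * n_quasars   # quasar completeness from Table 2
false_quasars = 0.01 * n_galaxies      # galaxies scattered into quasar class

contamination = false_quasars / (recovered_quasars + false_quasars)
print(f"{contamination:.2f}")  # ~0.26, i.e. about a quarter
```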

4.3 Multi-color redshifts in setup A

Figure 6 displays the comparison of the photometric MEV+ redshift estimates in setup A with the original true redshifts of the simulated objects. At R=22 (uppermost row of Fig.6) the redshift estimates work quite satisfactorily for galaxies and quasars, with nearly all objects residing on the diagonal of identity.

Towards fainter magnitudes, the galaxy redshifts degrade first at both the lower and the higher redshift ends. The deepest working magnitudes are reached in the redshift range 0.5<z<1. This feature is due to the location of the 4000Å-break: when the break is located in the central wavelength region of the filter set, many filters are available on either side to constrain its location rather well even for noisy data. For $z=0.15 \ldots 1.15$, the 4000Å-break is at least enclosed by medium-band filters. But if the break is located close to the edge of the filter set and, e.g., detected only by a noisy signal in a single filter, the true redshift interpretation cannot be distinguished well from other options.
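The range quoted above follows from a one-line redshift-to-wavelength conversion (the resulting coverage is quoted for illustration; the exact setup A passband boundaries differ in detail):

```python
def observed_break(z, rest=4000.0):
    """Observed wavelength (in Angstroem) of the 4000 A break at redshift z."""
    return rest * (1.0 + z)

# For z = 0.15 ... 1.15 the break moves through ~4600-8600 A, i.e. the
# central region typically covered by optical medium-band filter sets.
lo, hi = observed_break(0.15), observed_break(1.15)
print(round(lo), round(hi))  # 4600 8600
```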

Quiescent galaxies still work reasonably well at the higher redshift end, because they are brighter in the far-red filters. Starburst galaxies mostly degrade at higher redshift, because they have fewer discriminating (and trustworthily known) features in the UV than in the visual wavelength range.

The quasar redshifts remain rather precise at $z=2.2 \ldots 6.0$, all the way down to R=25. This is the redshift range where the continuum step over the Lyman-$\alpha $ line can be seen by the filter set and redshift estimates are expected to reach deep. Of course, towards the upper end of this redshift range the R-band magnitude of quasars appears artificially faint, since it is strongly attenuated by the Lyman-$\alpha $ forest, but the redder filters contain flux levels sufficient to constrain the location of the continuum step. Redshift confusion arises first in the low-redshift region, working its way up to higher redshifts with decreasing brightness. At z<2.2 the continuum shows no Lyman-$\alpha $ forest in our filter sets, but only a redshift-invariant power-law shape. In this case, the multi-color redshifts rely solely on emission lines showing up in the medium bands.

Figure 7: Monte-Carlo simulations for multi-color redshifts of galaxies and quasars with setup A and R=23 according to the estimators Maximum Likelihood (ML), Minimum Error Variance (MEV) and our advanced MEV with better handling of bimodalities (MEV+). In case of the galaxies, black dots denote quiescent galaxies and grey dots starburst systems. Shown are all galaxies, but quasars only if they passed the classification limit of 75%. ML and MEV+ appear almost equivalent for quasars, while for galaxies MEV and MEV+ make no visible difference. Objects considered uncertain by the MEV estimator do not get an MEV estimate assigned, but they receive an ML estimate that can potentially be wrong.

Figure 8: Distribution of the true redshift estimation error ($\Delta z = z_{{\rm mc}}-z$) among simulated galaxies with R=23 in setup A, separated into quiescent and starburst objects. The solid line shows results for the Minimum Error Variance estimators (MEV and MEV+ are virtually the same) and the grey line those for the Maximum Likelihood estimator (ML). Starburst systems show higher errors and some large mistakes with $\Delta z>0.1$.

Some concentrated linear structures are visible off the diagonal at lower redshift, with the best contrast at R=24. Their origin is a misidentification of weak emission lines: there are two structures, mirrored at the diagonal, following the linear relations $(1+z_{{\rm phot}})/(1+z) \approx 1.74$ and $(1+z)/(1+z_{{\rm phot}}) \approx 1.74$. They are caused by a confusion of the MgII line with the H$\beta$ line. Another structure at $(1+z_{{\rm phot}})/(1+z) \approx 1.25$ arises from weak Lyman-$\alpha $ lines of very blue quasars which are interpreted as CIV lines, or weak CIV lines which are taken to be CIII lines. The extent of these structures across the diagram obviously depends on the visibility of the involved lines within the medium-band filter set. Finally, there is a large group of quasars estimated to be at nearly zero redshift, but truly stretching even beyond z=3. These are among the quasars with the lowest emission-line intensities in the library, which basically display only their redshift-invariant power-law continuum in the filters.
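The quoted confusion ratios follow directly from the rest-frame wavelengths of the involved lines (standard values in Angstroem; the $\approx$1.25 structure is bracketed by the two line pairs named in the text):

```python
# If a line at lam1 is misread as a line at lam2, then
# (1 + z_phot)/(1 + z) = lam1/lam2.
MGII, HBETA = 2798.0, 4861.0
LYA, CIV, CIII = 1216.0, 1549.0, 1909.0

ratio_mgii_hbeta = HBETA / MGII   # MgII taken for H-beta
ratio_lya_civ = CIV / LYA         # Lyman-alpha taken for CIV
ratio_civ_ciii = CIII / CIV       # CIV taken for CIII

print(round(ratio_mgii_hbeta, 2),  # ~1.74
      round(ratio_lya_civ, 2),     # ~1.27
      round(ratio_civ_ciii, 2))    # ~1.23
```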

4.4 Maximum likelihood redshift versus minimum error variance redshift

We now compare the relative performance of three different redshift estimators using the example of galaxies and quasars at a fixed magnitude of R=23. We have used the Maximum Likelihood (ML) method, the Minimum Error Variance (MEV) method and an advanced MEV method (MEV+) as we defined it in Sect.2.4.

While the Maximum Likelihood (ML) method always gives a redshift estimate, the Minimum Error Variance (MEV) method, as we use it, does not. Some objects have probability distributions which are close to flat, yielding a redshift estimate that primarily reflects the redshift interval chosen for the template library rather than a reliable physical interpretation of the object. We do not assign any estimate to these uncertain objects (as defined in Sect.2.4), which is justified because their estimates would be meaningless anyway. A caveat for a direct performance comparison is that the MEV/MEV+ methods ignore the uncertain objects, whose selection function is redshift-dependent at the faint end and could furthermore differ in a real dataset due to cosmic variance.

As shown in Fig.7, the different estimators deliver rather comparable results with quite similar redshift accuracy. In case of the quasars, the improved MEV+ method (which can detect bimodal probability distributions) performs differently from the standard MEV method, but rather similarly to the ML method. This is due to bimodalities: the MEV estimate is a weighted average of the two probability peaks, while the MEV+ estimate selects the single peak containing the higher probability integral, which is likely to roughly coincide with the ML estimate pointing at the redshift of highest individual probability. Bimodalities can again be seen as linear structures off the main diagonal, arising from confusion among emission lines. In the pure MEV method the peak associated with the wrong solution is averaged with the correct solution residing on the diagonal, so the MEV plot shows structures smeared out around the diagonal rather than the linear ones of the ML or MEV+ plots.
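A toy illustration, not the paper's implementation, of how the three estimators treat a bimodal redshift probability distribution (peak positions, widths and weights below are arbitrary assumptions):

```python
import numpy as np

z = np.linspace(0.0, 5.0, 1001)
dz = z[1] - z[0]
# Two Gaussian peaks: the "correct" solution at z = 1.0 carrying ~60% of
# the probability, and a line-confusion alias at z = 2.5 carrying ~40%.
p = 0.6 * np.exp(-0.5 * ((z - 1.0) / 0.05) ** 2) \
  + 0.4 * np.exp(-0.5 * ((z - 2.5) / 0.05) ** 2)
p /= p.sum() * dz                      # normalize to a probability density

z_ml = z[np.argmax(p)]                 # ML: redshift of the global maximum
z_mev = (z * p).sum() * dz             # MEV: mean -> averages the two peaks

# MEV+ (schematically): keep only the peak with the larger probability
# integral and take the mean within it; we split halfway between the peaks.
lo = z < 1.75
w_lo, w_hi = p[lo].sum() * dz, p[~lo].sum() * dz
side = lo if w_lo >= w_hi else ~lo
z_mevp = (z[side] * p[side]).sum() / p[side].sum()

print(round(z_ml, 3), round(z_mev, 3), round(z_mevp, 3))  # ~1.0, ~1.6, ~1.0
```

The MEV estimate lands between the peaks, where there is hardly any probability at all, while ML and MEV+ both recover the dominant solution.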

4.5 The three setups in comparison

Figure 9: Fraction Q of simulated objects which are correctly classified in the three different setups (solid line = setup A, grey line = setup B, dashed line = setup C). Except for faint stars, setups A and B are most successful.

Figure 10: Variance of the true redshift estimation error ($\Delta z = z_{{\rm mc}}-z$) among simulated objects in the three different setups (solid line = setup A, grey line = setup B, dashed line = setup C), based on the Maximum Likelihood estimate. Setups A and B provide the highest redshift resolution. Early-type galaxies work better due to their higher continuum contrast at the 4000Å-break. Nearby quasars without continuum features do not work too well, since the redshift estimate has to rely on emission lines.

All setups are designed to spend the same amount of exposure time on a survey field, but distribute it over different filter sets. The pure broad-band survey, setup C, collects far more photons than setups A and B, which mostly expose medium-band filters. But due to their higher spectral resolution, we expect setups A and B to extract more information per photon.

In fact, the classification performance of all three setups turns out to be quite similar, which implies that the lack of photons in the medium bands is largely compensated by their higher information content (see Fig.9). Among the small remaining differences, there is a tendency for the medium-band setups to be more efficient in finding quasars, presumably because their emission lines are more prominent in narrow filters.

Also, there is a slight trend indicating that the medium-band surveys sustain a high level of completeness to somewhat fainter magnitudes and then drop more sharply than setup C. In the incompleteness regime of very faint magnitudes, all setups perform equally poorly.

The same trends are more clearly present among the multi-color redshifts, where we compare the statistics for the ML estimator (see Fig.10): setups A and B provide a much better redshift resolution at the usual working magnitudes. They only fall behind setup C by a rather insignificant margin in the faintest regime, where the redshift estimates are close to unusable to begin with. This advantage of setup C results simply from the broad bands being deeper by 0.6 mag, where the medium-band filters no longer contribute to the result.

For brighter objects, estimates in setup A are better than in setup B by an average factor of two, simply reflecting the difference in spectral resolution. After all, the convolution of any measurement with a 0.03 mag Gaussian (to account for calibration errors) renders better photon statistics useless for objects detected at more than a $\sim$$30\sigma$ level. Thus, only increasing the number of filters improves the result for these objects, while increasing the depth of any filter has no effect.

At this point we would like to emphasize that the calibration uncertainty limits the best achievable performance. A large calibration error of, e.g., 10% would turn an entire survey catalog into a collection of "less-than-10-$\sigma $ objects'', at least within our method. If calibration is expected to be a problem due to instrumentation or observing strategy, this conclusion strongly suggests that a large number of filters giving many noisy datapoints delivers more information than a few long-exposed and formally deep filters that cannot be matched together exactly.

Once more we look into the details within classes: it is no surprise that quiescent galaxies with rather prominent 4000Å-breaks receive more accurate redshift estimates than starburst galaxies with less pronounced continuum features. Comparing at equal accuracy, we find that estimates for quiescent galaxies typically reach one magnitude deeper than for starburst objects. When aiming for a redshift resolution of $\sigma_z \approx 0.03$ among quiescent galaxies, it is interesting to see that either of the medium-band surveys reaches two magnitudes deeper than the broad-band survey (setup C).

The quasar redshifts work best at z>2.2, where the estimation does not depend only on emission lines but can take advantage of a strong continuum feature within the range of the filter set, i.e. the continuum suppression bluewards of the Lyman-$\alpha $ line. As for the galaxies, setups A and B have significantly stronger redshift resolving power than setup C, with setup A again being the best choice.

It is encouraging to conclude from these simulations that photometric redshifts for quasars are feasible and should reach accuracies of $\sigma_z \approx 0.1$ in surveys with medium-band filters. Furthermore, observations from the CADIS survey find a surprising number of faint quasars, whose multi-color redshifts were indeed proven by spectroscopy to be as accurate as expected from the simulations (see Wolf et al. 1999 and Paper II).

Altogether, setup A appears to be the most successful of the setups discussed, for both photometric classification and redshift estimation. It has no disadvantages compared to the other setups; in particular, it does not lack working depth compared to the pure broad-band survey. Given the almost vanishing differences between setups A and B, there might be no incentive to increase the number of filters even further.

Still, setup A shows a selection function for successful classification with some redshift dependence. Among the quasars shown in Fig.5, we can see some vertical stripes containing objects at selected redshifts which are not successfully classified anymore, while the neighboring redshifts still work well. In principle, a set of neighboring medium-band filters touching in wavelength and covering the important spectral range completely would most likely result in a selection function with the smoothest shape and smallest redshift dependence.

4.6 Analytic thoughts on filter choice

In this section, we would like to address the issue of choosing an optimal filter set by analytic considerations based on a simplified picture of the classification problem. We assume that we are still limited by a fixed amount of telescope time, which we can distribute over some set of filters. If different wavebands could be imaged simultaneously, it would be obvious that even a faintly exposed full-resolution spectrum would be better than an unfiltered white-light exposure, as long as the read-out noise of the detector is not an important constraint. Here, we want to discuss the less obvious scenario of consecutive exposures in different wavebands.

As mentioned in the introduction, the choice of the optimum filter set depends entirely on the goal of the survey. For surveys aiming at a particular type of objects with characteristic colors, tailored filter sets can be designed. But if we intend to integrate different survey applications into one observational program on a common patch of sky, then we need a single survey to identify virtually every object unambiguously. In this scenario two choices have to be made:

1. Assuming constant filter width, the choice between
   a) fewer filters with more photons each, or
   b) more filters with fewer photons each;
2. Assuming constant filter number, the choice between
   a) broad filters with more photons and less resolution, or
   b) narrow filters with fewer photons and more resolution.

We first note that if all colors were equally discriminating for each object, the choice would be arbitrary: any distribution over any number of equally wide filters would provide the same total discriminative power and classification performance. In practice, objects can reside at many different redshifts, and usually only part of their spectra carries discriminating features.

We now try to gain some insight into this question based on very basic template assumptions. For simplicity, we assume just two different possible objects posed to the classification algorithm: a quasar distinguished only by an emission line from another object with an otherwise identical spectrum.

Addressing choice (1), we find that concentrating on few filters would mean that only few quasars display their emission line in a filter and can be classified correctly down to some limit, while many objects would be unclassifiable. The classification would lack completeness, but reach deep for a few objects. Distributing exposure time among many filters covering the entire spectrum would give every quasar a chance to show its emission line, which implies that every object is well classifiable, but not to the same depth. The classification would remain rather complete and degrade more sharply than in the few-filter case when reaching its limiting magnitude.

Addressing choice (2), we assume one of the filters to observe the emission line and evaluate the line contrast obtained. As long as the line is completely contained in the filter bandpass, our signal, i.e. the absolute flux difference from the continuum induced by the line, is constant irrespective of the filter width. The noise is given by the square root of the total flux from the object, which increases with the width of the bandpass. The optimum signal-to-noise ratio is obtained with a filter just matching the width of the emission line: any narrower filter would cut off line flux, thus shrinking the signal more than the noise.

Using both conclusions we can ask for the optimum strategy when aiming for high sensitivity and completeness across some redshift range. This goal requires that we observe the emission line in any case, regardless of the redshift. Therefore, we need n filters to cover the entire spectrum in question, with filter width $\Delta \lambda \propto 1/n$. Given a fixed total amount of exposure time, the exposure time per filter and thereby the counts measured from the line are $S_{{\rm line}} \propto 1/n$. The total flux $S_{{\rm tot}}$ in this filter depends on the same exposure factor and on $\Delta \lambda$, so that $S_{{\rm tot}} \propto 1/n^2$ and $\sigma_{{\rm tot}} \propto 1/n$. Therefore, the signal-to-noise ratio $S_{{\rm line}}/\sigma_{{\rm tot}} =$ const, independent of the number of filters in any set providing complete coverage.
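The scaling argument above can be checked numerically. All fluxes and times below are in arbitrary units, and the pure photon-noise model is a simplifying assumption of the sketch:

```python
# With n filters covering a fixed spectral range and a fixed total
# exposure time: line signal ~ 1/n, total flux ~ 1/n^2, photon noise
# ~ 1/n -- so the line signal-to-noise ratio is independent of n.
def line_snr(n, t_total=1.0, f_line=1.0, f_cont=1.0):
    t_per_filter = t_total / n              # exposure time per filter
    width = 1.0 / n                         # filter width (fixed coverage)
    s_line = f_line * t_per_filter          # line counts: width-independent
    s_tot = f_cont * t_per_filter * width   # continuum counts in the filter
    sigma = s_tot ** 0.5                    # photon noise from total flux
    return s_line / sigma

snrs = [line_snr(n) for n in (5, 10, 20, 40)]
print(snrs)  # all equal (up to float rounding)
```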

In summary, for the simple quasar example we have a free choice of filter set, as long as we cover the spectrum. The width of the filters does not seem to affect the magnitude limit for a successful classification, but it directly determines the redshift resolution. Given this free choice, many filters tailored to the typical width of quasar emission lines would be the best solution.

Another example is photometric star-galaxy separation. Some red stars display broad-band colors similar to some redshifted early-type galaxies. Good photometric accuracy is required to tell them apart, especially if only few filters are available. With medium-band filters enclosing the redshifted 4000-Å-break of the galaxies and probing the absorption bands of stars, the two classes can easily be discriminated even at rather noisy flux levels.

Let us assume the most general imaginable case of the classification problem, where the object spectra can have features at potentially any location and of any width (due to redshift as well as class). The arbitrary location calls for a filter set covering the entire spectrum. Again, we are left with the choice of many narrow versus few broad filters discussed in the simple quasar example above. And again, as long as the features are smaller than the filter width, the choice of filters makes no difference to the classification if the same total amount of telescope time is used.

We now consider an abstract information value I obtained by a survey. It depends on the number of filters n, on the photons collected in each of them $N_{{\rm ph}} (f)$ and on the information $I/N_{{\rm ph}} (f)$ that a single photon carries after passing through a given filter. If on average the same amount of information is obtained in every filter, we get:

$\displaystyle I = n \times N_{{\rm ph}} (f) \times \frac{I}{N_{{\rm ph}}}(f) .$     (29)

For complete coverage the number n of filters again scales with the filter width as $\Delta \lambda \propto 1/n$. Given a fixed total amount of telescope time, the exposure time per filter is $\Delta t \propto 1/n$ and thus the number of photons collected is $N_{{\rm ph}} (f) \propto 1/n^2$. Since narrow filters show features with more contrast than broad filters, we can assume that the information per photon is inversely proportional to the filter width: $I/N_{{\rm ph}} (f) \propto 1/\Delta\lambda \propto n$. Altogether, the information content of the survey results in:

$\displaystyle I = n \times 1/n^2 \times n = \rm {const} .$     (30)
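Eqs. (29) and (30) can be verified directly with the stated proportionalities (the normalizations below are arbitrary):

```python
# Survey information content as a function of the number of filters n,
# under the assumptions of the text: photons per filter ~ 1/n^2
# (1/n exposure time times 1/n filter width), information per photon ~ n.
def info(n):
    n_ph = 1.0 / n**2        # N_ph(f), Eq. (29) middle factor
    info_per_photon = n      # I/N_ph(f) ~ 1/filter width ~ n
    return n * n_ph * info_per_photon   # Eq. (29)

vals = [info(n) for n in (4, 8, 16, 32)]
print(vals)  # constant, as stated by Eq. (30)
```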

In theory, the amount of information in terms of classifiability of objects depends only on the total telescope time and not on the characteristic width of the filters, as long as they cover the entire spectral range in question. The smaller number of photons in the medium-band survey is compensated by the larger number of filters and the higher information content per photon. But this conclusion is based on three simplified assumptions:

In practice, there are several advantages for medium-band and mixed surveys compared to broad-band surveys, especially when combined with our classification scheme:

In particular, the last three advantages can cause a medium-band survey to reach even deeper than a broad-band survey in terms of classification and redshift estimation, even though its nominal flux detection limits might intuitively suggest inferior performance.

The disadvantage of a survey project involving many medium-band filters is that it needs a larger minimum amount of telescope time, since a few constraints on observational strategy have to be met. An optimal survey has requirements for:

5 Conclusions for real multi-color applications

5.1 Calibration of colors

Obviously, the measurements also need careful calibration among the wavebands. A large erroneous offset can be disastrous for the photometric classification of narrow class structures in color space. If, e.g., true stars were measured with shifted colors, the classification could place them in the location of library galaxies or quasars, and vice versa. Also, the redshift estimates would be thrown off by color offsets.

Calibration problems are of greatest concern when rare objects are searched for and their class gets contaminated. Especially when class volumes almost touch in color space, even small calibration errors can push objects into the wrong class. E.g., in many filter sets the quasar class is not well separated from stars and galaxies. In the presence of a calibration error, abundant galaxies can be pushed into the quasar class, potentially making up the largest population among the precious candidates. The shape of the class volumes is likely to introduce quite some redshift dependence into the contamination. Objects in some redshift range can then become virtually unidentifiable if they are overwhelmed by contaminants.

If calibration errors were known and quantified, they could as well be removed. If they were present but not recognized, the measurements would look too accurate and a seemingly faithful, but potentially wrong, classification would be derived. Thus, as long as the calibration errors are unknown, it is important to take their potential size into account in the error estimates on which the classification is based. As a result, the performance of the classification for bright objects is indeed limited by the calibration error. We assume calibration errors on the order of 3% for the colors in our surveys, which implies that the quality of the classification saturates for objects that are more than 1.5 mag brighter than the 10-$\sigma $ limits of the survey. On the other hand, if we assume for the moment poor data reduction or uncorrected Galactic reddening changing the colors by, e.g., 10%, this would turn an entire survey catalog into a collection of "less-than-10-$\sigma $ objects'' -- a devastating effect for the survey quality.
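The saturation effect described here can be sketched by adding the photon-noise and calibration errors in quadrature (a simplifying assumption of this sketch; the paper's method works with full probability densities):

```python
# With a calibration error that is a fixed fraction of the flux, the
# effective signal-to-noise ratio saturates no matter how small the
# photon-noise error becomes.
def effective_snr(snr_photon, cal_error=0.03):
    sigma_phot = 1.0 / snr_photon                    # relative photon error
    sigma_eff = (sigma_phot**2 + cal_error**2) ** 0.5
    return 1.0 / sigma_eff

for s in (10, 30, 100, 1000):
    print(s, round(effective_snr(s), 1))
# saturates near 1/0.03 ~ 33; with cal_error=0.10 it would cap at 10
```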

An accurate relative calibration among many wavebands is best ensured by establishing a few spectrophotometric standard stars in each of the survey fields, a successful approach that we have made a standard procedure in CADIS. This task can be carried out in a photometric night by taking spectra of the new standards and connecting them to published standard stars (Oke 1990). This way, spectrophotometric standards are available in every survey exposure, which then requires no further calibration efforts regardless of the conditions under which the regular imaging is carried out. Obviously, the standard-star spectra should cover the entire filter set, but if a mixture of (e.g. optical and infrared) instruments is used, the calibration will involve different procedures that have to be matched.

5.2 The optimum survey strategy

The most basic result of our study on the performance of different multi-color surveys is that even for small systematic errors in the color indices of $s_{g-h} = 0.03$ mag, a survey with 17 bands performs better in classification and redshift estimation than one with only a few bands. For the 17-band case we found that the limiting magnitude for reasonable performance is reached when the typical statistical (i.e. photon noise) errors are on the order of 10%. It is obvious that larger systematic errors will worsen the performance and will allow even higher statistical errors before the survey deteriorates significantly. For the survey strategy this implies that pushing the statistical errors in each band well below the systematic errors will add nothing to the survey performance. If $\Delta t_{{\rm int}}$ is the integration time required to reach $\sigma \approx 1/2\, s_{g-h}$, the optimum number of bands N for a given amount of total time $t_{{\rm tot}}$ is roughly

$\displaystyle N = t_{{\rm tot}} / \Delta t_{{\rm int}} .$     (31)
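A hypothetical usage of Eq. (31); both numbers below are invented for illustration and are not the actual CADIS or COMBO-17 exposure times:

```python
def optimal_bands(t_total, t_int):
    """Eq. (31): N = t_tot / Delta t_int."""
    return t_total / t_int

# E.g.: if reaching sigma ~ s/2 per band takes 6 h of integration and
# 100 h of total telescope time are available:
print(optimal_bands(100.0, 6.0))  # ~17 bands, the order of setup A
```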

Although our present study has been confined to the wavelength region attainable with optical CCDs and did not address the total wavelength coverage of the survey explicitly, it is predictable that further bands extending the wavelength coverage (e.g. by adding NIR bands) will have a larger effect than splitting the optical bands. In particular, the maximum redshift for a reliable galaxy classification will be extended.

As the color indices are the prime observables entering the classification and redshift estimation process, it is clear that any multi-color survey has to be processed such that these indices are measured in an optimal way. For ground-based observations it is of great importance to prevent variable observing conditions from introducing systematic offsets between bands when the observations are taken sequentially. First of all, this requires assessing the seeing point spread function of every dataset very carefully. Second, one has to correct for the effect of variable seeing, which can influence the flux measurement of star-like and extended objects differently.

In CADIS, we essentially convolve each image to a common effective point spread function and measure the central surface brightness of each object (see Paper II for details). This has the disadvantage that the spatial resolution (i.e. the minimum separation between neighboring objects) is limited by the data with the worst seeing. However, it is not clear whether the obvious alternative -- deconvolution techniques -- can be optimized such that the systematic errors are kept below a few percent for a wide variety of objects.

The performance of the MEV estimator depends critically on the assumption that not only the color indices but also their errors are determined correctly. For the survey strategy this implies that minimizing the photon-noise errors at the expense of no longer being able to estimate these errors accurately may lead to worse performance than slightly larger errors which are known accurately.

5.3 Ongoing applications and their scientific goals

In this section, we mention examples of survey applications using this method and comment on the usefulness of our classification approach. A number of multi-color surveys have been conducted where filters and exposure times were chosen to match some primary survey strategy. Although none of these may be optimal choices in terms of a general classification, we have used or intend to use our approach to extract class and redshift data for the objects contained. In chronological order of their beginning, these surveys are:

The Calar Alto Deep Imaging Survey (CADIS): three broad-band and twelve medium-band filters have mostly been chosen to match the needs of the emission-line survey in CADIS, while some of them fill in gaps in the spectral coverage. The multi-color part of CADIS has been used to study the evolution of the galaxy luminosity function, to search for quasars at all visible redshifts, and to use the observed faint stellar population to check models of the Galactic structure and the stellar luminosity function (Meisenheimer et al. 1998; Wolf et al. 1999; Fried et al. 2000; Phleps et al. 2000; Paper II);
A lensing study of the galaxy cluster Abell 1689: Two broad-band and seven medium-band filters have been chosen to separate well between the cluster galaxies at $z \approx 0.19$ and the background population. The galaxy luminosity function in the background of the cluster is compared to a control field taken from CADIS, and the cluster mass is estimated from weak lensing effects on the apparent luminosities (Dye et al. 2000);
A wide-field survey for Classifying Objects by Medium-Band Observations in 17 filters (COMBO-17): the filters (setup A from Sect.4.1) are chosen to provide a selection function and a redshift accuracy for quasars and galaxies which is as independent of redshift as feasible. The data will be used to study the faint end of the quasar luminosity function at all accessible redshifts and the galaxy-quasar correlation, as well as weak lensing effects in the cluster group Abell 901/2 and in the open galaxy field (Wolf et al. 2000);
The Sloan Digital Sky Survey (SDSS): five broad-band filters were chosen, spanning the entire range of presently available CCD sensitivity. We intend to apply our classification to search for quasars and to separate stars from compact galaxies where morphological data are not sufficient.

From simulations of the classification scheme presented in this paper, we expect that in all these projects we should be able to classify virtually all objects above some magnitude limit purely by color, and that the medium-band surveys in particular should have selection functions which depend only weakly on redshift. This way, we can omit morphological criteria when defining catalogs of the stellar, galaxy and quasar populations, which brings a number of practical advantages for our method.

6 Summary

We have presented an innovative method that performs multi-color classification and redshift estimation of astronomical objects in a unified approach. The method is essentially template-based: it evaluates the statistical consistency of a given measurement with a database of spectral knowledge, which serves as a second, crucial input to the algorithm.

The introduction of this method is motivated by the quest for a statistically correct extraction of the information contained in the color vectors of surveys with many filters. The method is derived from basic statistical principles and calculates probability density functions for each survey object, yielding two results simultaneously: the class membership, and redshift estimates according to the Maximum Likelihood (ML) and Minimum Error Variance (MEV) estimators. We add our own version of the MEV technique, featuring improved handling of bimodalities in the probability function.
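The two estimators can be illustrated for a sampled, one-dimensional p(z). This is a minimal sketch under simplified assumptions (uniform redshift grid, unimodal density), not the paper's implementation; in particular, the bimodality handling mentioned above is not reproduced here.

```python
import numpy as np

def ml_mev_estimates(z_grid, pdf):
    """ML and MEV redshift estimates from a sampled p(z) on a uniform
    grid; the MEV error is derived from the density function itself."""
    z = np.asarray(z_grid, dtype=float)
    p = np.asarray(pdf, dtype=float)
    dz = z[1] - z[0]
    p = p / (p.sum() * dz)              # normalize to unit total probability
    z_ml = z[np.argmax(p)]              # ML estimate: mode of p(z)
    z_mev = (z * p).sum() * dz          # MEV estimate: mean of p(z)
    var = ((z - z_mev) ** 2 * p).sum() * dz
    return z_ml, z_mev, np.sqrt(var)    # sigma_z from p(z) itself

# toy example: a unimodal Gaussian p(z) centered on z = 0.5
z = np.linspace(0.0, 2.0, 2001)
p = np.exp(-0.5 * ((z - 0.5) / 0.05) ** 2)
z_ml, z_mev, sigma_z = ml_mev_estimates(z, p)   # ~ (0.5, 0.5, 0.05)
```

For a unimodal density both estimators agree; they diverge when p(z) is bimodal, which is why the handling of such cases matters.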

Our choice for the database is a large, systematically ordered library containing templates for stars, galaxies and quasars, which are intended to cover virtually all members of each of the three object classes apart from some unusual ones. The libraries were established from a few model assumptions and from templates published by various authors.

The method can be implemented in a computationally very efficient way by using color indices directly as object features. We showed that our color-based approach is expected to deliver results consistent with those of flux-based template-fitting algorithms.
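The idea of classifying on color indices can be sketched as follows. All magnitudes and template values here are made-up toy numbers, and the simple chi-square nearest-template assignment is a generic stand-in for the full probabilistic treatment described above.

```python
import numpy as np

def colors(mags):
    """Color indices: differences of magnitudes in neighboring bands."""
    m = np.asarray(mags, dtype=float)
    return m[:-1] - m[1:]

def classify(obs_mags, sigma, library):
    """Assign the observed object to the library template whose color
    indices match best (minimum chi-square in color-index space)."""
    c_obs = colors(obs_mags)
    s = np.asarray(sigma, dtype=float)
    c_err2 = s[:-1] ** 2 + s[1:] ** 2    # color error combines two bands
    best, best_chi2 = None, np.inf
    for name, tmpl_mags in library.items():
        chi2 = np.sum((c_obs - colors(tmpl_mags)) ** 2 / c_err2)
        if chi2 < best_chi2:
            best, best_chi2 = name, chi2
    return best, best_chi2

library = {                                  # toy 4-band templates
    "star":   [20.0, 19.5, 19.2, 19.0],
    "galaxy": [21.0, 20.2, 19.6, 19.3],
    "quasar": [20.5, 20.4, 20.3, 20.1],
}
obj = [21.05, 20.25, 19.55, 19.35]           # noisy galaxy-like measurement
cls, chi2 = classify(obj, [0.05] * 4, library)
```

Because only color indices enter the comparison, the overall flux normalization of each template drops out, which is what makes this feature choice efficient.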

The accuracy of the data calibration is a very important issue: it constrains the design of the libraries and limits the maximum achievable performance of the method via the effective photometric quality. Calibration errors can distort results and shrink the information output.

We carried out Monte-Carlo simulations for three model surveys using the same total exposure time but different filter sets. One of them is a UBVRI broad-band survey, while the other two expose two thirds of the time in various medium-band filters. Altogether, the performance of all three setups was rather similar despite the quite different numbers of collected photons. It thus appears that medium-band filters obtain more information per photon and thereby compensate for the loss of depth in flux detection from which they suffer in comparison to broad bands. Among the differences, the medium-band surveys performed better than the broad-band survey at finding quasars, and they provided much higher redshift resolution in their estimates. Also, in the presence of calibration errors or uncorrected reddening effects, bright objects are not easier to classify than faint ones, and a large number of shallow filters might provide more information than a small number of deeply exposed filters.
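The structure of such a Monte-Carlo test can be sketched in a few lines: perturb the true template colors with photometric noise, classify each realization, and record the success rate. The two-class templates and noise levels below are invented toy numbers, far simpler than the model surveys evaluated in this work.

```python
import numpy as np

rng = np.random.default_rng(1)

# toy templates: mean color indices of two classes in a 4-filter survey
templates = {"star":   np.array([0.5, 0.3, 0.2]),
             "quasar": np.array([0.1, 0.1, 0.2])}

def simulate(true_class, sigma, n=10000):
    """Monte-Carlo classification test: perturb the true template colors
    with Gaussian noise of width sigma, assign each realization to the
    nearest template, and return the fraction classified correctly."""
    truth = templates[true_class]
    noisy = truth + rng.normal(0.0, sigma, size=(n, truth.size))
    correct = 0
    for c in noisy:
        best = min(templates, key=lambda k: np.sum((c - templates[k]) ** 2))
        correct += (best == true_class)
    return correct / n

rate_deep    = simulate("quasar", sigma=0.05)   # low noise: almost no errors
rate_shallow = simulate("quasar", sigma=0.30)   # high noise: confusion sets in
```

Repeating this over magnitude bins and filter sets yields exactly the kind of completeness-versus-depth comparison discussed above.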

Based on simple analytic assumptions, we have discussed the relative information content of surveys with different characteristic filter widths. All surveys using the same amount of total telescope time and filter sets stretching over the entire spectral range of interest should perform equally well in terms of classification. This theoretical conclusion assumes perfect calibration and perfect template knowledge.

In practice, the classification should reach deeper in medium-band surveys than in broad-band surveys, because the former are less affected by inaccuracies in the calibration and in the template library. Furthermore, the filters can be chosen to avoid noise from strong night sky emission lines which is not possible with broad-band filters.

In particular, using the proposed statistical classification approach in a suitable medium-band survey, it should be possible to classify virtually all objects above the magnitude limit by color alone and to obtain redshift estimates for galaxies and quasars with a selection function that depends only weakly on redshift.

This method should be very suitable for many survey-type applications, which usually require only low spectral resolution and finite accuracy in the derivation of physical parameters, but aim for large samples to feed statistical studies and to search for rare and unusual objects. Of course, for a fully secure confirmation of the nature of an individual object, or for high-resolution studies, it provides only a preselection of candidates.

In Paper II we show that this method is very powerful and indeed of great practical relevance for multi-color surveys with many filters, as in the case of CADIS. The results of the simulations shown here compare well with the performance of a real survey, and can therefore be used for testing the performance of future survey designs.

The authors thank H. H. Hippelein for helpful discussions on template fits and H.-M. Adorf for some on classification methods and their fine-tuning. We also thank D. Calzetti for kindly making available the galaxy templates in digital form. We would finally like to thank the referee, Dr. S. C. Odewahn, for detailed comments improving the paper. This work was supported by the DFG (Sonderforschungsbereich 439).



Copyright ESO 2001