A&A 475, 1159-1183 (2007)
DOI: 10.1051/0004-6361:20077638
J. Debosscher1 - L. M. Sarro3,7 - C. Aerts1,2 - J. Cuypers4 - B. Vandenbussche1 - R. Garrido5 - E. Solano6,7
1 - Instituut voor Sterrenkunde, KU Leuven, Celestijnenlaan 200B, 3001 Leuven, Belgium
2 - Department of Astrophysics, Radboud University Nijmegen, PO Box 9010, 6500 GL Nijmegen, The Netherlands
3 - Dpt. de Inteligencia Artificial, UNED, Juan del Rosal, 16, 28040 Madrid, Spain
4 - Royal Observatory of Belgium, Ringlaan 3, 1180 Brussel, Belgium
5 - Instituto de Astrofísica de Andalucía-CSIC, Apdo 3004, 18080 Granada, Spain
6 - Laboratorio de Astrofísica Espacial y Física Fundamental, INSA, Apartado de Correos 50727, 28080 Madrid, Spain
7 - Spanish Virtual Observatory, INTA, Apartado de Correos 50727, 28080 Madrid, Spain
Received 13 April 2007 / Accepted 7 August 2007
Abstract
Context. The fast classification of new variable stars is an important step in making them available for further research. Selection of science targets from large databases is much more efficient if they have been classified first. Defining the classes in terms of physical parameters is also important to get an unbiased statistical view on the variability mechanisms and the borders of instability strips.
Aims. Our goal is twofold: provide an overview of the stellar variability classes that are presently known, in terms of some relevant stellar parameters; use the class descriptions obtained as the basis for an automated ``supervised classification'' of large databases. Such automated classification will compare and assign new objects to a set of pre-defined variability training classes.
Methods. For every variability class, a literature search was performed to find as many well-known member stars as possible, or a considerable subset if too many were present. Next, we searched on-line and private databases for their light curves in the visible band and performed period analysis and harmonic fitting. The derived light curve parameters are used to describe the classes and define the training classifiers.
Results. We compared the performance of different classifiers in terms of percentage of correct identification, of confusion among classes and of computation time. We describe how well the classes can be separated using the proposed set of parameters and how future improvements can be made, based on new large databases such as the light curves to be assembled by the CoRoT and Kepler space missions.
Conclusions. The derived classifiers' performances are so good in terms of success rate and computational speed that we will evaluate them in practice from the application of our methodology to a large subset of variable stars in the OGLE database and from comparison of the results with published OGLE variable star classifications based on human intervention. These results will be published in a subsequent paper.
Key words: stars: variables: general - stars: binaries: general - techniques: photometric - methods: statistical - methods: data analysis
The current rapid progress in astronomical instrumentation provides us with a torrent of new data. For example, the large-scale photometric monitoring of stars with ground-based automated telescopes and space telescopes delivers large numbers of high-quality light curves. The HIPPARCOS space mission is an example of this and led to the discovery of a large number of new variable stars in its huge set of light curves. In the near future, new space missions will deliver even larger numbers of light curves of much higher quality (in terms of sampling and photometric precision). The CoRoT mission (Convection Rotation and planetary Transits, launched on 27 December 2006) has two main scientific goals: asteroseismology and the search for exoplanets using the transit method. The latter purpose requires the photometric monitoring of a large number of stars with high precision. As a consequence, this mission will produce excellent time-resolved light curves for up to 60 000 stars with a sampling rate better than 10 min during 5 months. Even higher numbers of stars (>100 000) will be measured for similar purposes and with comparable sampling rate by NASA's Kepler mission (launch at the end of 2008, duration 4 years). The ESA Gaia mission (launch foreseen in 2011) will map our Galaxy in three dimensions. About one billion stars will be monitored for this purpose, with about 80 measurements over 5 years for each star.
Among these large samples, many new variable stars of known and unknown type will be present. Extracting them, and making their characteristics and data available to the scientific community within a reasonable timescale, will make these catalogues really useful. It is clear that automated methods have to be used here. Mining techniques for large databases are more and more frequently used in astronomy. Although we are far from reproducing the capabilities of the human brain, a lot of work can be done efficiently using intelligent computer codes.
In this paper, we present automated supervised classification methods for variable stars. Special attention is paid to computational speed and robustness, with the intention to apply the methods to the huge datasets expected to come from the CoRoT, Kepler and Gaia satellite missions. We tackle this problem with two parallel strategies. In the first, we construct a Gaussian mixture model. Here, the main goals are to optimize speed, simplicity and interpretability of the model rather than optimizing the classifiers' performance. In the second approach, a battery of state-of-the-art pattern recognition techniques is applied to the same training set in order to select the best performing algorithm by minimizing the misclassification rate. The latter methods are more sophisticated and will be discussed in more detail in a subsequent paper (Sarro et al., in preparation).
For a supervised classification scheme, we need to predefine the classes. Every new object in a database to be classified will then be assigned to one of those classes (called definition or training classes) with a certain probability. The construction of the definition classes for stellar variability is, therefore, an important part of this paper. Not only are these classes necessary for this type of classification method, they also provide us with physical parameters describing the different variability types. They allow us to attain a good view on the separation and overlap of the classes in parameter space. For every variability class, we derive descriptive parameters using the light curves of their known member stars. We use exclusively light curve information for the basic methodology we present here, because additional information is not always available and we want to see how well the classes can be described (and separated) using only this minimal amount of information. This way, the method is broadly applicable. It is easy to adapt the methods when more information such as colors, radial velocities, etc. is available.
The first part of this paper is devoted to the description of the stellar variability classes and the parameter derivation. The classes are visualized in parameter space. In the second part, a supervised classifier based on multivariate statistics is presented in detail. We also summarize the results of a detailed statistical study on Machine Learning methods such as Bayesian Neural Networks. Our variability classes are used to train the classifiers and the performance is discussed. In a subsequent paper, the methods will be applied to a large selection of OGLE (Optical Gravitational Lensing Experiment) light curves, while we plan to update the training classes from the CoRoT exoplanet light curves in the coming two years.
We provide an astrophysical description of the stellar variability classes by means of a fixed set of parameters. These parameters are derived using the light curves of known member stars. An extensive literature search provided us with the object identifiers of well-known class members. We retrieved their available light curves from different sources. The main sources are the HIPPARCOS space data (Perryman & ESA 1997; ESA 1997) and the Geneva and OGLE ground-based data (Soszynski et al. 2002; Udalski et al. 1999a; Wyrzykowski et al. 2004). Other sources include ULTRACAM data (ULTRA-fast, triple-beam CCD CAMera), see Dhillon & Marsh (2001), MOST data (Microvariability and Oscillations of STars, see http://www.astro.ubc.ca/MOST/), WET data (Whole Earth Telescope, see http://wet.physics.iastate.edu/), ROTOR data (Grankin et al. 2007), Lapoune/CFHT data (Fontaine, private communication), and ESO-LTPV data (European Southern Observatory Long-Term Photometry of Variables project), see Sterken et al. (1995). Table 1 lists the number of light curves used from each instrument, together with their average total time span and their average number of measurements. For every considered class (see Table 2), we have tried to find the best available light curves, allowing recovery of the class' typical variability. Moreover, in order to be consistent in our description of the classes, we tried, as much as possible, to use light curves in the visible band (V-mag). This was not possible for all the classes however, due to a lack of light curves in the V-band, or an inadequate temporal sampling of the available V-band light curves. The temporal sampling (total time span and size of the time steps) is of primordial importance when seeking a reliable description of the variability present in the light curves. While HIPPARCOS light curves, for example, are adequate in describing the long term variability of Mira stars, they do not allow recovery of the rapid photometric variations seen in some classes such as rapidly oscillating Ap stars. We used WET or ULTRACAM data in this case, dedicated to this type of object. For the double-mode Cepheids, the RR-Lyrae stars of type RRd and the eclipsing binary classes, we used OGLE light curves, since they both have an adequate total time span and a better sampling than the HIPPARCOS light curves.
Table 1: The sources and numbers of light curves NLC used to define the classes, their average total time span, and their average number of measurements.
For every definition class, mean parameter values and variances are calculated. Every variability class thus corresponds to a region in a multi-dimensional parameter space. We investigate how well the classes are separated with our description and point out where additional information is needed to make a clear distinction. Classes showing a large overlap will have a high probability of resulting in misclassifications when using them in the training set.
The classes considered are listed in Table 2, together with the code we assigned to them and the number of light curves we used to define the class. We use this coding further in this paper, and in the reference list, to indicate which reference relates to which variability type. For completeness, we also list the ranges of the physical stellar parameters given in Table 2, as well as the range of the dominant frequencies and their amplitudes present in the light curves. The physical parameters cannot be measured directly, but are calculated from modeling. We do not use these parameters for classification purposes here because they are in general not available for newly measured stars. Also, for some classes, such as those with non-periodic variability or outbursts, it is not possible to define a reliable range for these parameters. The ranges for the light curve parameters result from our analysis, as described in Sect. 2.1.
We stress that the classes considered in Table 2 constitute the vast
majority of known stellar variability classes, but certainly not all of them. In
particular, we considered only those classes whose members show clear and
well-understood visual photometric variability. Several additional classes exist
which were defined dominantly on the basis of spectroscopic diagnostics or
photometry at wavelengths outside the visible range. For some classes, we were
unable to find good consistent light curves. Examples of omitted classes are
hydrogen-deficient carbon stars, extreme helium stars, X-ray bursters, pulsars, etc. Given that our methods presently use no diagnostics besides light curves at or around visible wavelengths, these classes are not considered
here. In the following we describe our methods in detail. A summary of the different steps is shown in Fig. 1.
Figure 1: Schematic overview of the different steps (sections indicated) in the development and comparison of the classification methods presented in this paper.
Table 2: Stellar variability classes considered in this study, their code, the number of light curves we used (NLC) and their source. Also listed (when relevant for the class) are the ranges of the physical stellar parameters, if they could be determined from the literature. The last two columns list the range for the dominant frequencies (f1) and their amplitudes (A11) present in the light curves, resulting from our analysis (Sect. 2.1).
After removal of bad quality measurements, the photometric time series of the definition stars were subjected to analysis. First, we checked for possible linear trends of the form a+bT, with a the intercept, b the slope and T the time. These were subtracted, as they can have a large influence on the frequency spectrum. For pulsating stars, the larger the trend, the more the frequency values we find can deviate from the stars' real pulsation frequencies.
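As an illustration, this detrending step can be sketched in Python as follows (a minimal sketch with our own function names; the original implementation is part of the FORTRAN analysis code described below):

    import numpy as np

    def remove_linear_trend(time, mag):
        """Fit and subtract a linear trend a + b*T from a light curve.

        The detrended magnitudes are returned together with the fitted slope b,
        which is kept as one of the classification attributes.
        """
        b, a = np.polyfit(time, mag, deg=1)   # slope b and intercept a
        return mag - (a + b * time), b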
Subsequently, we performed a Fourier analysis to find periodicities in the light
curves. We used the well-known Lomb-Scargle method
(Scargle 1982; Lomb 1976). The computer code to calculate the periodograms
was based on an algorithm written by J. Cuypers, following the outlines given by Ponman (1981) and Kurtz (1985) and focussed on fast calculations. As is the case with all frequency determination methods, we needed to specify a search range for the frequencies (a starting frequency f0, a highest frequency fN and a frequency step Δf). Since we were dealing with data coming from different instruments, it was inappropriate to use the same search range for all the light curves. We adapted it to each light curve's sampling, and took the starting frequency as f0 = 1/Ttot, with Ttot the total time span of the observations. An appropriately small frequency step Δf was taken. For the highest frequency, we used the average of the inverse time intervals between the measurements, fN = <1/(2 Δti)> with Δti the individual time intervals, as a pseudo-Nyquist frequency. Note that fN is equal to the Nyquist frequency in the case of equidistant sampling. For particular cases, an even higher upper limit can be used (see Eyer & Bartholdi 1999). Our upper limit should be seen as a compromise between the required resolution to allow a good fitting, and computation time.
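For illustration, the frequency grid and the periodogram can be computed along the following lines in Python (a minimal sketch under our own assumptions: scipy's lombscargle expects angular frequencies, and the oversampling factor used for the frequency step is a free choice, not a value taken from this paper):

    import numpy as np
    from scipy.signal import lombscargle

    def frequency_grid(time, oversampling=10.0):
        """Search grid from f0 = 1/Ttot up to the pseudo-Nyquist frequency fN."""
        t_tot = time.max() - time.min()
        f0 = 1.0 / t_tot
        dt = np.diff(np.sort(time))
        f_nyq = np.mean(1.0 / (2.0 * dt))        # pseudo-Nyquist frequency <1/(2 dt_i)>
        df = 1.0 / (oversampling * t_tot)        # frequency step (assumed oversampling)
        return np.arange(f0, f_nyq, df)

    def periodogram(time, mag, freqs):
        """Lomb-Scargle periodogram of a (detrended) light curve."""
        return lombscargle(time, mag - mag.mean(), 2.0 * np.pi * freqs)

    # f1 is taken as the frequency of the highest peak:
    # freqs = frequency_grid(time)
    # f1 = freqs[np.argmax(periodogram(time, mag, freqs))]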
We searched for up to a maximum of three independent frequencies for every
star. The procedure was as follows: the Lomb-Scargle periodogram was calculated
and the highest peak was selected. The corresponding frequency value f1 was
then used to calculate a harmonic fit to the light curve of the form:
y(t) = b0 + Σj [ a1j sin(2π j f1 t) + b1j cos(2π j f1 t) ],  with j = 1, ..., 4.    (1)

This fit was subtracted from the light curve and the Lomb-Scargle periodogram of the residuals was computed. The frequency of the highest peak in this new periodogram was adopted as f2, and a harmonic fit including both f1 and f2 (each with four harmonics) was made. Repeating this prewhitening step once more yielded the third frequency f3. The final fit, made with all three frequencies simultaneously, has the form

y(t) = b0 + Σi Σj [ aij sin(2π j fi t) + bij cos(2π j fi t) ],  with i = 1, ..., 3 and j = 1, ..., 4.    (2)

From the fitted coefficients, the amplitude and phase of harmonic j of frequency fi follow as

Aij = sqrt(aij^2 + bij^2)  and  PHij = arctan(bij, aij),    (3)

where the phases are expressed relative to the phase of the first harmonic of f1, so that PH11 is zero by construction.
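A least-squares harmonic fit of this form can be sketched as follows (our own minimal Python illustration, not the paper's FORTRAN code; it assumes four harmonics per frequency, as described above):

    import numpy as np

    def harmonic_fit(time, mag, freqs, nharm=4):
        """Least-squares fit of sine/cosine harmonics of the given frequencies.

        Returns amplitudes A[i, j] and phases PH[i, j] for harmonic j+1 of frequency
        freqs[i], plus the ratio of the residual variance to the original variance.
        """
        columns = [np.ones_like(time)]                       # constant term b0
        for f in freqs:
            for j in range(1, nharm + 1):
                columns += [np.sin(2 * np.pi * j * f * time),
                            np.cos(2 * np.pi * j * f * time)]
        design = np.column_stack(columns)
        coef, *_ = np.linalg.lstsq(design, mag, rcond=None)

        a = coef[1::2].reshape(len(freqs), nharm)            # sine coefficients aij
        b = coef[2::2].reshape(len(freqs), nharm)            # cosine coefficients bij
        amp = np.hypot(a, b)                                 # amplitudes Aij
        phase = np.arctan2(b, a)
        phase -= phase[0, 0]                                 # one simple convention making PH11 = 0
        residuals = mag - design @ coef
        return amp, phase, np.var(residuals) / np.var(mag)

Calling this routine with freqs=[f1] gives the variance ratio vf1/v discussed below; calling it with all three frequencies yields the 12 amplitudes and the phases that enter the attribute set.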
Another important parameter, which is also calculated during the fitting procedure, is the ratio of the variances vf1/v in the light curve, after and before subtraction of a harmonic fit with only the frequency f1. This parameter is very useful for discriminating between multi- and monoperiodic pulsators. Its value is much smaller for monoperiodic pulsators, where most of the variance in the light curve can be explained with a harmonic fit with only f1.
In total, we calculated 28 parameters starting from the original time series: the slope b of the linear trend, 3 frequencies, 12 amplitudes, 11 phases (PH11 is always zero and can be dropped) and 1 variance ratio. This way, the original time series, which can vary in length and number of measurements, were transformed into an equal number of descriptive parameters for every star.
We calculated the same parameter set for each star, irrespective of the variability class it belongs to. This set provided us with an overall good description of the light curves of pulsating stars, and even did well for eclipsing binaries. It is clear, however, that the whole parameter set might not be needed for distinguishing, say, between class A and class B. The distinction between a Classical Cepheid and a Mira star is easily made with only the parameters f1 and A11; the other parameters are not necessary and might even be completely irrelevant for this example. For other classes, we have to use more parameters to reach a clear distinction.
With these 28 selected parameters, we found a good compromise between maximum separability of all the classes and a minimum number of descriptive parameters. Our class definitions are based on the entire parameter set described above. A more detailed study on statistical attribute selection methods is presented in Sect. 3.2.1, as this is closely related to the performance of a classifier.
The different variability classes can now be represented as sets of points in multi-dimensional parameter space. Each point in every set corresponds to the light curve parameters of one of the class' member stars. The more the clouds are separated from each other, the better the classes are defined, and the fewer the misclassifications which will occur in the case of a supervised classification, using these class definitions. As an external check for the quality of our class definitions, we performed a visual inspection of phase plots made with only f1, for the complete set. If these were of dubious quality (or the wrong variability type), the objects were deleted from the class definition. It turned out to be very important to retain only definition stars with high-quality light curves. This quality is much more important than the number of stars to define the class, provided that enough stars are available for a good sampling of the class' typical parameter ranges. Visualizing the classes in multi-dimensional space is difficult. Therefore we plot only one parameter at a time for every class. Figures 2, 5, 6-10 show the spread of the derived light curve parameters for all the classes considered. Because the range can be quite large for frequencies and amplitudes, we have plotted the logarithm of the values (base 10 for the frequencies and base 2 for the amplitudes). As can be seen from Fig. 2, using only f1 and A11, we already attain a good distinction between monoperiodically pulsating stars such as Miras, RR-Lyrae and Cepheids. For the multiperiodic pulsators, a lot of overlap is present and more parameters (f2, f3, the A2j and the A3j) are needed to distinguish between those classes. If we look at the frequencies and amplitudes, we see that clustering is less apparent for the non-periodic variables such as Wolf-Rayet stars, T-Tauri stars and Herbig Ae/Be stars. For some of those classes, we only have a small number of light curves, i.e. we do not have a good "sampling'' of the distribution (selection effect). The main reason for their broad distribution is, however, the frequency spectrum: for the non-periodic variables, the periodogram will show a lot of peaks over a large frequency range, and selecting three of them is not adequate for describing the light curve. Selecting more than three, however, entails the danger of picking non-significant peaks. The phase values PH1i corresponding to the harmonics of f1 cluster especially well for the eclipsing binary classes, as can be expected from the nature of their light curves. These parameters are valuable for separating eclipsing binaries from other variables. The phase values for the harmonics of f2 and f3 do not show significant clustering structure. On the contrary, they tend to be rather uniformly distributed for every class and thus, they will likely not constitute very informative attributes for classification. This is not surprising, since these phases belong to less significant signal components and will vary more randomly for the majority of the stars in our training set. In the next section, we discuss more precise methods for assessing the separation and overlap of the classes in parameter space.
Complementary to these plots, we have conducted a more detailed analysis of the statistical properties of the training set. This analysis is of importance for a sensible interpretation of the class assignments obtained for unknown objects, since the class boundaries of the classifiers depend critically on the densities of examples of each class as functions of the classification parameters. This analysis comprises i) the computation of box-and-whiskers plots for all the attributes used in classification (see Figs. 3, 11, and 12 for example); ii) the search for correlations between the different parameters; iii) the computation of 1d, 2d and 3d nonparametric density estimates (see Fig. 4 for an easily interpretable hexagonal histogram); iv) clustering analysis of each class separately and for the complete training set. The results of this analysis are especially useful for guiding the extension of the training set as new examples become available to users, such as those from CoRoT and Gaia.
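A minimal sketch of two of these diagnostics with matplotlib (our illustration; the attribute arrays and class labels are assumed inputs, and the chosen class is a placeholder):

    import numpy as np
    import matplotlib.pyplot as plt

    def training_set_diagnostics(log_f1, log_a11, labels, target_class="CLCEP"):
        """Box-and-whiskers plot per class (cf. Fig. 3) and a hexagonal 2D density (cf. Fig. 4)."""
        log_f1, log_a11, labels = map(np.asarray, (log_f1, log_a11, labels))
        classes = sorted(set(labels))
        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

        ax1.boxplot([log_f1[labels == c] for c in classes])
        ax1.set_xticks(range(1, len(classes) + 1))
        ax1.set_xticklabels(classes, rotation=90)
        ax1.set_ylabel("log f1")

        selected = labels == target_class
        ax2.hexbin(log_f1[selected], log_a11[selected], gridsize=15)
        ax2.set_xlabel("log f1")
        ax2.set_ylabel("log A11")
        fig.tight_layout()
        return fig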
Figure 2: The range for the frequency f1 (in cycles/day), its first harmonic amplitude A11 (in magnitude), the phase PH12 (in radians) and the variance ratio vf1/v (varrat) for all 35 considered variability classes listed in Table 2. For visibility reasons, we have plotted the logarithm of the frequency and amplitude values. Every symbol in the plots corresponds to the parameter value of exactly one light curve. In this way, we attempt to visualize the distribution of the light curve parameters, in addition to their mere range.
Figure 3: Box-and-whiskers plot of the logarithm of f1 for the 29 classes with sufficient members in the training set to define such plots. The central boxes represent the median and the interquartile range (25th to 75th percentile), and the outer whiskers represent rule-of-thumb boundaries for the definition of outliers (1.5 times the interquartile range). The box widths are proportional to the number of examples in the class.
Figure 4: Hexagonal representation of the two-dimensional density of examples of the Classical Cepheids class in the plane of two of the classification attributes.
Figure 5: The range in amplitudes A1j for the 3 higher harmonics of f1, and the linear trend b. For visibility reasons, we have plotted the logarithm of the amplitude values.
Figure 6: The range for the frequencies f2 and f3 and the phases PH1j of the higher harmonics of f1. For visibility reasons, we have plotted the logarithm of the frequency values. Note the split into two clouds of the phase values PH13 for the eclipsing binary classes. This is a computational artefact: since phases are only defined up to multiples of 2π, values close to -π and +π are numerically far apart although they correspond to nearly the same physical phase.
Figure 7: The range in amplitudes A2j for the 4 harmonics of f2. For visibility reasons, we have plotted the logarithm of the amplitude values.
Figure 8: The range in amplitudes A3j for the 4 harmonics of f3. For visibility reasons, we have plotted the logarithm of the amplitude values.
Figure 9: The range in phases PH2j for the 4 harmonics of f2. As can be seen from the plots, the distribution of these parameters is rather uniform for every class. They are unlikely to be good classification parameters, since for none of the classes clear clustering of the phase values is present.
Figure 10: The range in phases PH3j for the 4 harmonics of f3. The same comments as for Fig. 9 apply here.
The class descriptions we attained form the basis of the so-called "Supervised Classification''. This classification method assigns every new object to one of a set of pre-defined classes (called "training classes''), meaning that, given the time series characteristics described above, the system gives a set of probabilities that the source of the time series belongs to each of the classes listed in Table 2.
Figure 11: Box-and-whiskers plot of the logarithm of R21 for all classes in the training set. The central boxes represent the median and the interquartile range (25th to 75th percentile), and the outer whiskers represent rule-of-thumb boundaries for the definition of outliers (1.5 times the interquartile range). The box widths are proportional to the number of examples in the class.
Figure 12: Box-and-whiskers plot of the logarithm of the variance ratio vf1/v (varrat) for all classes in the training set. The central boxes represent the median and the interquartile range (25th to 75th percentile), and the outer whiskers represent rule-of-thumb boundaries for the definition of outliers (1.5 times the interquartile range). The box widths are proportional to the number of examples in the class.
A supervised classification can be done in many ways. The most suitable method depends on the kind of data to be classified, the required performance and the available computational power. We focus here on a statistical method based on multivariate analysis, also known as the "Gaussian Mixture Model''. We opted for a fast and easily adaptable code written in FORTRAN. We also summarize the results of a detailed study of other supervised classification methods, based on Artificial Intelligence techniques.
We assume that the descriptive parameters for every class have a multinormal
distribution. This is a reasonable assumption for a first approach. There is no
reason to assume a more complicated distribution function, unless there is clear evidence. The added advantages of the multinormal distribution are the well-known properties and the relatively simple calculations. We use our derived
light curve parameters to estimate the mean and the variance of the multinormal
distributions. If the vector Xij represents the parameters of light curve
number j belonging to class i, the following quantities are calculated for
every variability class. The class mean vector of length P (number of light
curve parameters, P=28 in our method for example):
Mi = (1/Ni) Σj Xij,  with the sum over j = 1, ..., Ni,    (8)

where Ni is the number of light curves defining class i. The class variance-covariance matrix of dimension P x P is

Si = [1/(Ni - 1)] Σj (Xij - Mi)(Xij - Mi)^T.    (9)
If we want to classify a new object, we first have to calculate the same light
curve parameters as described in Sect. 2. We can then derive the statistical
distance of this object with respect to the different classes, and assign the
object to the nearest (most probable) class. If X denotes the parameters for
the new object, we calculate the following statistical distance for every class:
D_i^2(X) = (X - Mi)^T Si^(-1) (X - Mi),    (10)

i.e. the squared Mahalanobis distance of the object to the centre of class i, which takes the spread and the correlations of the class attributes into account through the variance-covariance matrix Si.
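A minimal sketch of this classifier in Python (our illustration of the same computations, not the FORTRAN implementation; the attribute matrix and class labels are assumed inputs):

    import numpy as np

    def fit_gaussian_classes(X, labels):
        """Estimate the mean vector and variance-covariance matrix of every class, Eqs. (8)-(9)."""
        X, labels = np.asarray(X), np.asarray(labels)
        model = {}
        for c in np.unique(labels):
            Xc = X[labels == c]                              # attribute vectors of class c
            model[c] = (Xc.mean(axis=0), np.cov(Xc, rowvar=False))
        return model

    def classify(x, model):
        """Assign a new attribute vector x to the class with the smallest statistical distance."""
        distances = {}
        for c, (mean, cov) in model.items():
            diff = np.asarray(x) - mean
            distances[c] = diff @ np.linalg.inv(cov) @ diff  # squared Mahalanobis distance, Eq. (10)
        best = min(distances, key=distances.get)
        return best, distances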
We emphasize the difference between a supervised classification method as defined here and an extraction method: a supervised classifier assigns new objects to one of a set of definition classes with a certain probability, given the object's derived parameters. An extractor, on the other hand, will only select those objects in the database for which the derived parameters fall within a certain range. Extractor methods are typically used by scientists only interested in one class of objects. The specified parameter range for an extractor (based on the knowledge of the variability type) can be chosen so as to minimize the number of contaminating objects, to make sure that the majority of the selected objects will indeed be of the correct type. Of course, extraction methods can also be applied to our derived parameter set. The goal of our supervised classifier is much broader, however: we consider all the known variability classes at once and also get a better view on the differences and similarities between the classes. Moreover, our method does not need visual inspection of the light curves, while this was always needed in practice with extraction. On top of this, our class definitions can also be used to specify parameter ranges for extraction methods.
Following standard practice in the field of pattern recognition or statistical learning, we have adopted a parallel approach where we allow for more flexibility in the definition of the models used to classify the data. The Gaussian mixtures model presented in the previous section induces hyperquadratic boundaries between classes (with hyperspheres/hyperellipses as special cases). This has the advantage of providing a fast method for the detection of outliers (objects at large Mahalanobis distances from all centers) and easy interpretation of results. On the other hand, more sophisticated methods offer the flexibility to reproduce more complicated boundaries between classes, at the expense of more complex models with varying degrees of interpretability.
A common problem presented in the development of supervised
classification applications based on statistical learning methods, is
the search for the optimal trade-off between the two components of the
classifier error. In general, this error is composed of two
elements: the bias and the variance. The former is due to the
inability of our models to reproduce the real decision boundaries
between classes. To illustrate this kind of error, we can imagine a two-dimensional set of training examples such that any point above some nonlinear curve y = g(x) belongs to class A and any point below it to class B. Here, classes A and B are separable (unless we add noise to the class assignment), and the decision boundary is precisely the curve y = g(x).
Obviously, if we try to solve this toy classification
problem with a classifier inducing linear boundaries we will
inevitably have a large bias component to the total error. The second
component (the variance) is due to the finite nature of the training
set and the fact that it is only one realization of the random
process of drawing samples from the true (but unknown) probability
density of having an object at a given point in the hyperspace of
attributes.
If the model used to separate the classes in the classification problem is parametric, then we can always reduce the bias term by adding more and more degrees of freedom. In the Gaussian mixtures case, where we model the probability densities with multivariate Gaussians, this would be equivalent to describing each class by the sum of several components (i.e. several multivariate Gaussians). It has to be kept in mind, however, that there is an optimal number of components beyond which the decrease in the bias term is more than offset by an increase of the variance, due to the data being overfitted by the complexity of the model. The natural consequence is the loss of generalization capacities of the classifier, where the generalization ability is understood as the capacity of the model to correctly predict the class of unseen examples based on the inferences drawn from the training set.
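As a hedged illustration of this idea (our sketch with scikit-learn, not part of the methods used in this paper), each class density could be modelled by several Gaussian components, with the number of components selected by an information criterion to limit overfitting:

    from sklearn.mixture import GaussianMixture

    def fit_class_density(Xc, max_components=5):
        """Model one class with a Gaussian mixture; choose the number of components by BIC."""
        candidates = [GaussianMixture(n_components=k, covariance_type="full",
                                      random_state=0).fit(Xc)
                      for k in range(1, max_components + 1)]
        return min(candidates, key=lambda gm: gm.bic(Xc))

    # gm.score_samples(x) then gives the per-class log-likelihood that replaces the
    # single-Gaussian statistical distance when comparing classes.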
We computed models allowing for more complex decision boundaries where the bias-variance trade-off is sought, using standard procedures. Here we present brief outlines of the methods and a summary of the results, while a more detailed analysis will be published in a forthcoming paper (Sarro et al., in preparation). We made use of what is widely known as Feature Selection Methods. These methods can be of several types and are used to counteract the pernicious effect of irrelevant and/or correlated attributes for the performance of classifiers. The robustness of a classifier to the degradation produced by irrelevance and correlation depends on the theoretical grounds on which the learning algorithms are based. Thus, detailed studies have to be conducted to find the optimal subset of attributes for a given problem. The interested reader can find a good introduction to the field and references to the methods used in this paper in Guyon & Elisseeff (2003).
We adopted two strategies: training a unique classifier for the 29 classes with sufficient stars for a reliable estimate of the errors, or adopting a multistage approach where several large groups with vast numbers of examples and well identified subgroups (eclipsing binaries, Cepheids, RR-Lyrae and Long Period Variables) are classified first by specialized modules in a sequential approach and then, objects not belonging to any of these classes are passed to a final classifier of reduced complexity.
Feature selection methods fall into one of two categories: filter and wrapper methods. Filter methods rank the attributes (or subsets of them) based on some criterion independent of the model to be implemented for classification (e.g., the mutual information between the attribute and the class or between attributes, or the statistical correlation between them). Wrapper methods, on the other hand, explore the space of possible attribute subsets and score each combination according to some assessment of the performance of the classifier trained only on the attributes included in the subset. The exhaustive search for an optimal subset in the space of all possible combinations rapidly becomes unfeasible as the total number of attributes in the original set increases. Therefore, some sort of heuristic search, based on expected properties of the problem, has to be used in order to accomplish the selection stage in reasonable times.
We applied several filtering techniques based on Information Theory (Information Gain, Gain Ratio and symmetrical uncertainty) and statistical correlations to the set of attributes described in Sect. 2.1, extended with peak-to-peak amplitudes, harmonic amplitude ratios (within and across frequencies) and frequency ratios. These techniques were combined with appropriate search heuristics in the space of feature subsets. Furthermore, we also investigated feature relevance by means of wrapper techniques applied to Bayesian networks and decision trees, but not to the Bayesian combination of neural networks or to Support Vector Machines due to the excessive computational cost of combining the search for the optimal feature subset and the search for the classifier's optimal set of parameters. The Bayesian model averaging of neural networks in the implementation used here, incorporates automatic relevance determination by means of hyperparameters. For this reason, we have not performed any feature selection.
In general, there is no well-founded way to combine the results of these methods. Each approach conveys a different perspective of the data and it is only by careful analysis of the rankings and selected subsets that particular choices can be made. As a general rule, we have combined the rankings of the different methodologies when dealing with single stage classifiers, whereas for sequential classifiers, each stage had its own feature selection process. When feasible in terms of computation time (e.g. for Bayesian networks), the attribute subsets were scored in the wrapper approach. Otherwise, several filter methods were applied and the best results used.
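As a simple illustration of a filter-type criterion (our sketch; the study itself combined several Information-Theoretic measures and search heuristics), the attributes can be ranked by their mutual information with the class label:

    import numpy as np
    from sklearn.feature_selection import mutual_info_classif

    def rank_attributes(X, labels, attribute_names):
        """Filter-type ranking: mutual information between each attribute and the class."""
        scores = mutual_info_classif(X, labels, random_state=0)
        order = np.argsort(scores)[::-1]
        return [(attribute_names[i], float(scores[i])) for i in order]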
Bayesian networks are probabilistic graphical models where the uncertainty inherent to an expert system is encoded into two basic structures: a graphical structure S representing the conditional independence relations amongst the different attributes, and a joint probability distribution for its nodes (Pearl 1988). The nodes of the graph represent the variables (attributes) used to describe the examples (instances). There is one special node corresponding to the class attribute. Here, we have constructed models of the family known as k-dependent Bayesian classifier (Sahami 1996) with k, the maximum number of parents allowed for a node in the graph, set to a maximum of 3 (it was checked that higher degrees of dependency did not produce improvements in the classifier performance).
The induction of Bayesian classifiers implies finding an optimal structure and probability distribution according to it. We have opted for a score and search approach, where the score is based on the marginal likelihood of the structure as implemented in the K2 algorithm by Cooper & Herskovits (1992). Although there are implementations of the k-dependent Bayesian classifiers for continuous variables, also known as Gaussian networks, we have obtained significantly better results with discretized variables. The discretization process is based on the Minimum Description Length principle as proposed in Fayyad & Irani (1993). It is carried out as part of the cross validation experiments to avoid overfitting the training set.
Artificial neural networks are probably the most popular methodology for classification and clustering, and originate from the world of Artificial Intelligence. In their most frequent implementation, they are defined as feedforward networks made up of several layers of interconnected units or neurons. With appropriate choices for the computations carried out by the neurons, we obtain the well-known multilayer perceptron. Bishop (1995) has written an excellent introductory text to the world of neural networks, statistical learning and pattern recognition.
We do not deviate here from this widely accepted architecture but use a training approach other than the popular error backpropagation algorithm. Instead of the maximum likelihood estimate provided by it, we use Bayesian Model Averaging (BMA). BMA combines the predictions of several models (networks), weighting each by the posterior probability of its parameters (the weights of the network synapses) given the training data. For a more in-depth description of the methods, see Neal (1996) or Sarro et al. (2006). In the following, we use the acronym BAANN to refer to the averaging of artificial neural networks.
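True Bayesian model averaging weights each network by the posterior probability of its weights; the following scikit-learn sketch is only a crude, equally-weighted stand-in (our illustration, not the BAANN implementation used in the study):

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    def train_committee(X, y, n_members=10):
        """Train several multilayer perceptrons differing only in their random initialisation."""
        return [MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000,
                              random_state=seed).fit(X, y)
                for seed in range(n_members)]

    def committee_probabilities(networks, X):
        """Average the class probabilities over the committee (equal weights, unlike true BMA)."""
        return np.mean([net.predict_proba(X) for net in networks], axis=0)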
Support vector machines (SVM) are based on the minimization of the structural risk (Gunn et al. 1997). The structural risk can be proven to be upper-bounded by the sum of the empirical risk and the optimism, a quantity dependent on the Vapnik-Chervonenkis dimension of the chosen set of classifier functions. For linear discriminant functions, minimizing the optimism amounts to finding the hyperplane separating the training data with the largest margin (distance to the closest examples called support vectors). For nonlinearly separable problems, the input space can be mapped into a higher dimensional space using kernels, in the hope that the examples in the new hyperspace are linearly separable. A good presentation of the foundations of the method can be found in Vapnik (1995). Common choices for the kernels are nth degree polynomial and Gaussian radial basis functions. The method can easily incorporate noisy boundaries by introducing regularization terms. We used radial basis functions kernels. The parameters of the method (the complexity or regularization parameter and the kernel scale) are optimized by grid search and 10-fold cross validation.
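A hedged sketch of this optimization with scikit-learn (the parameter grid is an illustrative placeholder, not the grid used in the study):

    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # RBF-kernel SVM: C is the complexity (regularization) parameter, gamma the kernel scale.
    search = GridSearchCV(
        make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        param_grid={"svc__C": [1, 10, 100, 1000],
                    "svc__gamma": [1e-3, 1e-2, 1e-1, 1]},
        cv=10,                 # 10-fold cross validation, as described in the text
        scoring="accuracy",
    )
    # search.fit(X_train, y_train); best_svm = search.best_estimator_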
One of the central problems of statistical learning from samples is that of estimating the expected error of the developed classifiers. The final goal of automatic classification, as mentioned above, is to facilitate the analysis of large amounts of data which would otherwise be left unexplored because the amount of time needed for humans to undertake such an analysis is incommensurably large. This necessity cannot mask the fact that classifiers have errors and these need to be quantified if scientific hypotheses are to be drawn from their products.
When developing a classifier, the goal is to maximize the number of correct classifications of new cases. Given the classification method, the performance of a supervised classification depends, amongst other factors that measure the faithfulness of the representation of the real probability densities by the training set, on the quality of the descriptive parameters used for classifying. We seek a set of classification attributes which describes most light curves well and provides a good separation of the classes in attribute space.
Several methodologies exist to evaluate classifiers. A common way of testing a classifier's performance is feeding it with objects of known class membership and deriving how many of them are correctly classified. This method is called "cross validation'' in the case that the complete set of examples is split up into two disjoint sets: a training set and a set that will be classified, called the validation set. It is also possible to use the complete set for both training and classifying. This is known as "resampling''. This is no longer a cross validation experiment, since the objects used for training and for classifying are the same. For a real cross validation experiment, the objects to classify must be different from the objects in the training set, in order to have statistical independence. The resampling method thus has a bias towards optimistic assessment of the misclassification rate, compared to a cross validation method. Another possibility (called the holdout procedure) consists of training the classifier with a subset of the set of examples and evaluating its error rates with the remainder. Depending on the percentage split it can be biased as well, but this time in the opposite (pessimistic) direction. Finally, the most common approach to validating classification models is called k-fold cross validation. This consists of dividing the set of examples into k folds, repeating k times the process of training the model with k-1 folds and evaluating it with the kth fold not used for training. Several improvements to this method can be implemented to reduce the variance of its estimates, e.g. by assuring a proportional representation of classes in the folds (stratified cross validation). Recent proposals include bolstered resubstitution and several variants. Good and recent overviews of the problem with references to relevant bibliography can be found in Demsar (2006) and Bouckaert (2004).
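For illustration, a (repeated) stratified k-fold estimate of the correct classification rate can be obtained as follows (a sketch; the classifier object stands for any of the models discussed in this paper):

    from sklearn.model_selection import StratifiedKFold, cross_val_score

    def estimate_correct_classification_rate(classifier, X, y, n_folds=10, n_repeats=10):
        """Repeated stratified k-fold cross validation of the correct classification rate."""
        rates = []
        for repeat in range(n_repeats):
            folds = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=repeat)
            rates.extend(cross_val_score(classifier, X, y, cv=folds, scoring="accuracy"))
        return sum(rates) / len(rates)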
For the simplest classifier, we also considered the simplest performance test by adopting the resampling approach. Using this method, we already get an idea of the overlap and separability of the classes in parameter space.
The total number of correct classifications expressed as a percentage, can be rather misleading. For example, if our training set contains many example light curves for the well-separated classes, we will have a high rate of correct classifications, even if the classifier performs very badly for some classes with only a small number of training objects. Therefore, it is better to judge the classifier's performance by looking at the "confusion matrix''. This is a square matrix with rows and columns having a class label. It lists the numbers of objects assigned to every class in the training set after cross validation. The diagonal elements represent the correct classifications, and their sum (trace of the matrix) divided by the total number of objects in the training set, equals the total correct classification rate. The off-diagonal elements show the number of misclassified (confused) objects and the classes they were assigned to. In this way, we get a clear view on the classifier's performance for every class separately. We can see which classes show high misclassification rates and are thus not very well separated using our set of classification attributes.
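As an illustration (our sketch), the confusion matrix, the total correct classification rate and the per-class rates can be computed from the cross-validation predictions as:

    import numpy as np
    from sklearn.metrics import confusion_matrix

    def confusion_summary(y_true, y_pred, class_codes):
        """Confusion matrix, overall correct classification rate (trace/total) and per-class rates."""
        cm = confusion_matrix(y_true, y_pred, labels=class_codes)
        total_rate = np.trace(cm) / cm.sum()
        per_class_rate = np.diag(cm) / cm.sum(axis=1)   # rows are the true classes here
        return cm, total_rate, per_class_rate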
Table 3 shows the confusion matrix for a subset of 25 variability
classes. These are the classes with more than 13 member stars each. We have
chosen not to take the classes with fewer member stars into account here,
because the number of classification attributes is limited by the number of
member stars in the class. This is merely a numerical limitation of the
multivariate Gaussian mixture classifier: if the number of defining class
members is equal to or lower than the number of classification attributes, the
determinant of the variance-covariance matrix will become equal to zero. This
makes it impossible to calculate the statistical distance with respect to the
class using Eq. (10). We used 12 light curve parameters to perform the
classification (the smallest class in this set contains 13 member stars): the
three frequencies fi, the four amplitudes A1j, the phases PH1j, the
linear trend b and the variance ratio. The average correct classification rate
is about
for this experiment. As can be seen from the matrix in
Table 3, the monoperiodic pulsators such as MIRA, CLCEP, DMCEP,
RRAB, RRC and RRD are well separated. Some of the multiperiodic pulsators are
also well identified (SPB, GDOR). A lot of misclassifications (fewer than
correct classifications) occur for the following multiperiodic pulsators:
BE, PVSG, DSCUT. Also, some of the irregular and semi-regular variables show
poor results (SR, WR, HAEBE, TTAU) as was emphasized in Sect. 2.2.
Depending on the intended goal, it can be better to take fewer classes into account. For example, when the interest is focused on a few classes only, using fewer classes will decrease the risk of misclassifying members of those classes. To illustrate this, Table 4 shows the confusion matrix for only 14 classes using the complete set of 28 light curve parameters defined in Sect. 2.1 to perform the classification. We did not include the classes with very irregular light curves or the less well-defined classes such as BE, CP, WR and PVSG.
The average correct classification rate amounts to
for this
experiment. It is clear that the monoperiodically pulsating stars are again very
well separated (MIRA, CLCEP, DMCEP, RRAB, RRC and RRD). Most of the classes with
multiperiodic variables also show high correct classification rates now (SPB,
GDOR). Confusion is still present for the DSCUT and the BCEP classes. This is
normal, as these stars have non-radial oscillations with similar amplitudes and
with overlap in frequencies. For those classes in particular, additional (or
different) attributes are necessary to distinguish them, e.g. the use of a
color index as we will discuss extensively in our future application of the
methods to the OGLE database. Parameter overlap (similarity) with other classes
is the main reason for misclassifications if only light curves in a single
photometric band are available. Note the high correct classification rate for
the three classes of eclipsing binaries (EA, EB and EW). Some of their light
curves (mainly from the EA class) are highly non-sinusoidal, but they are
nevertheless well described with our set of attributes.
The higher correct classification rates for this classification experiment with 14 classes is caused by the removal of the most "confusing'' classes compared to the experiment with 25 classes, and the increased number of discriminating attributes (this was tested separately). Note that the use of fewer classes for classifying also implies more contamination of objects which actually belong to none of the classes in the training set. This can effectively be solved by imposing limits on the Mahalanobis distance to the class centers. Objects with a Mahalanobis distance larger than a certain user-defined value, will then not be assigned to any class.
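A hedged sketch of such a rejection rule (our illustration; the threshold is a user-defined choice, here taken as a chi-squared quantile for the number of attributes, not a value from the paper):

    import numpy as np
    from scipy.stats import chi2

    def classify_with_rejection(x, model, p_threshold=0.999):
        """Assign x to the nearest class centre, unless its Mahalanobis distance is too large.

        `model` maps each class code to (mean vector, variance-covariance matrix),
        as estimated from the training set.
        """
        limit = chi2.ppf(p_threshold, df=len(x))          # user-defined distance limit
        distances = {}
        for c, (mean, cov) in model.items():
            diff = np.asarray(x) - mean
            distances[c] = diff @ np.linalg.inv(cov) @ diff
        best = min(distances, key=distances.get)
        return best if distances[best] <= limit else None  # None: not assigned to any class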
Table 3: The confusion matrix for the Gaussian mixture method, using 25 variability classes and 12 classification attributes. The last but one line lists the total number of light curves (TOT) used to define every class. The last line lists the correct classification rate (CC) for every class separately. The average correct classification rate is about .
Table 4: The confusion matrix for the Gaussian mixture method using 14 variability classes and 28 classification attributes. The last but one line lists the total number of light curves (TOT) used to define every class. The last line lists the correct classification rate (CC) for every class separately. The average correct classification rate is about .
Selecting a methodology amongst several possible choices is in itself a statistical problem. Here we only summarize the results of a complete study comparing the approaches listed in Sect. 3.2, the details of which will be published in a specialized journal in the area of Pattern Recognition.
As explained in Sect. 3.2, one reason models can fail is that they are not flexible enough to describe the decision boundaries required by the data (the bias error). Another reason is because the training set, due to its finite number of samples, is never a perfect representation of the real probability densities (otherwise one would work directly with them and not with examples). Since learning algorithms are inevitably bound to use the training set to construct the model, any deficiency or lack of faithfulness in their representation of the probability densities will translate into errors. The bias-variance trade-off explained above is somehow a way to prevent the learning algorithm from adjusting itself too tightly to the training data (a problem known as overfitting) because its ability to generalize depends critically on it. Finally, irrespective of all of the above, we cannot avoid dealing with confusion regions, i.e., regions of parameter space where different classes coexist.
For the machine learning techniques, we selected the combination of 10 sorted runs of 10-fold cross validation experiments together with the standard t-test (Demsar 2006). This combination assures small bias, a reduced variance (due to the repetition of the cross validation experiments) and replicability, an issue of special importance since these analyses will be iterated as the training set is completed with new instances for the poorly represented classes and new attributes from projects such as CoRoT, Kepler and Gaia.
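As an illustration (our sketch, using a plain paired t-test rather than the sorted-runs variant), two classifiers can be compared on the matched per-fold accuracies from the repeated cross validation experiments:

    from scipy.stats import ttest_rel

    def compare_classifiers(fold_scores_a, fold_scores_b, alpha=0.0005):
        """Paired t-test on matched per-fold accuracies; reports whether the difference is significant."""
        t_statistic, p_value = ttest_rel(fold_scores_a, fold_scores_b)
        return p_value < alpha, t_statistic, p_value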
In the following, we have split the results for single stage and sequential classifiers. It should be borne in mind that the misclassification rates used in the following sections include entries in the confusion matrices which relate eclipsing binary subtypes. These are amongst the largest contributions to the overall misclassification rate and are due to a poor definition of the subtypes, as argued in Sarro et al. (2006) and as is widely known. In future applications of the classifiers (i.e. for CoRoT data) the specialized classifier presented in Sarro et al. (2006) and its classification scheme will be used. Therefore, the misclassification rates quoted below are too pessimistic by an estimated 2%.
Table 5 shows the confusion matrix for the Bayesian model averaging of artificial neural networks. This methodology produces an average correct classification rate of 70%. For comparison, the second best single stage classifier measured by this figure is the 3-dependent Bayesian classifier with an overall rate of success of 66%.
According to the t-test applied to the ten sorted runs of 10-fold cross validation, the probability of finding this difference under the null hypothesis is below 0.05%. However, this difference (equivalent to 73 more instances classified correctly by the ensemble of neural networks) has to be put into the context of the more demanding computational requirements of the method (several hours of training time on a single 2.66 GHz processor) compared to the almost instantaneous search for the Bayesian network. For comparison, the classical C4.5 algorithm (Quinlan 1993) attains only slightly worse performances (averages of 65.2%) at the expense of a more costly determination of the optimal parameters and a greater variance with respect to the training sample.
Support Vector Machines obtain much poorer results (of the order of 50% correct identifications). We searched the parameter space as closely as possible given the computational needs of a cross validation experiment with 30 classes. The best combination found is not able to compete with other approaches. It is always possible that we missed an island of particularly good performance in the grid search but the most plausible explanation for the seemingly poor results is that SVMs are not optimized for multiclass problems. These are typically dealt with by reducing them to many two-class problems, but most implementations assume a common value of the parameters (complexity and radial basis exponent in our case) for all boundaries.
Table 5: The confusion matrix for the Bayesian model averaging of artificial neural networks. The last but one line lists the total number of light curves (TOT) used to define every class. The last line lists the correct classification rate (CC) for every class separately as measured by 10-fold cross validation.
One of the most relevant characteristics of the stellar variability classification problem is the rather high number of classes dealt with. Trying to construct a single stage classifier for such a large number of different classes implies a trade-off between the optimal values of the model parameters in different regions of attribute space. We therefore constructed a multistage classifier, dividing the classification problem into several stages, during each of which a particular subset of the classes is separated from the rest.
Table 6: The confusion matrix for the Bayesian model averaging of artificial neural networks and the two class problem. The last but one line lists the total number of light curves (TOT) to define every class. The last line lists the correct classification rate (CC) for every class separately as measured by 10-fold cross validation. Separation between: A: eclipsing binaries (ECL) and all other types; B: Cepheids (CEP) and all other types; C: long period variables (LPV) and all other types except ECL and CEP; D: RR Lyrae stars (RR) from all other types except ECL, CEP and LPV.
Table 7: The confusion matrix for the Bayesian model averaging of artificial neural networks. The last but one line lists the total number of light curves (TOT) to define every class. The last line lists the correct classification rate (CC) for every class separately as measured by 10-fold cross validation. Separation between: A: Cepheids; B: long period variables; C: RR Lyrae stars.
Table 8: The confusion matrix for the Bayesian model averaging of artificial neural networks for the variables not assigned to any group. The last but one line lists the total number of light curves (TOT) to define every class. The last line lists the correct classification rate (CC) for every class separately as measured by 10-fold cross validation.
Table 9: The complete confusion matrix for the Bayesian model averaging of artificial neural networks. The last but one line lists the total number of light curves (TOT) to define every class. The last line lists the correct classification rate (CC) for every class separately as measured by 10-fold cross validation.
We have selected four subgroups, one for each of the stages of the classifier. The choice was based on the internal similarities between instances in a group (intra cluster distances) and the separations between different groups. The four groups are eclipsing binaries (EA, EB, EW), Cepheids (CLCEP, PTCEP, RVTAU, DMCEP), long period variables (MIRA, SR) and RR-Lyrae stars (RRAB, RRC, RRD). These groups are characterized by having significant entries in the confusion matrices for elements in each group and small contributions to these matrices across groups. We have trained sequential classifiers in the sense that the subsequent classifiers are not trained with the classes identified first. For example, if the first stage classifier is trained to separate eclipsing variables from the others, the second classifier will not have eclipsing variables in its training set. This way, given an instance, we can construct the complete class probability table as a product of conditional probabilities.
The experiments consist of performing 10 runs of 10-fold cross validation for each stage with SVMs, Bayesian k-dependent networks and Bayesian neural network averages. The order in which the groups are filtered is altered in order to test the 24 possible permutations. Each stage is preceded by a feature selection process that selects the optimal subset of features for each particular problem (as opposed to the single feature selection step of single stage classifiers). The results of the experiments consist of several confusion matrices of dimension 2 for each 2-class problem, and several other confusion matrices for the classification of instances within these main groups. These latter matrices do not depend on the order of the assignment of groups to stages. With only one exception, all statistical tests were inconclusive in the sense of not providing enough evidence for the rejection of the null hypothesis (having a threshold of 99.95%) that the classifiers have equal performance. The only exception is the eclipsing binaries classifier, where BAANN clearly outperforms all other methods. In all other cases the similarities in performance are remarkable.
Table 6 shows the BAANN confusion matrices for the different classification stages, while Tables 7 and 8 show the corresponding matrices for the internal classification problem of each group and the remaining classes not assigned to any group. Finally, Table 9 shows the combined confusion matrix constructed by multiplying conditional probabilities. For example, the probability of an instance being a classical Cepheid (CLCEP) is the probability of not being an eclipsing binary (first stage) times the probability of belonging to the Cepheids group (second stage) times the probability of being a classical Cepheid (specialized classifier). The average correct classification rate is about 66% for this classifier.
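The combination of conditional probabilities described above can be sketched as follows (our illustration; the stage names and numerical values are placeholders, not results from the paper):

    def combined_class_probability(stage_probabilities, final_probability):
        """Multiply the conditional probabilities of the successive classification stages.

        For a classical Cepheid: P(not eclipsing binary) from stage 1, times
        P(Cepheid group) from stage 2, times P(CLCEP) from the specialised classifier.
        """
        p = 1.0
        for stage_p in stage_probabilities:
            p *= stage_p
        return p * final_probability

    # Example with placeholder values:
    # p_clcep = combined_class_probability([1.0 - 0.05, 0.90], 0.85)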
We presented a uniform description of the most important stellar variability classes currently known. Our description is based on light curve parameters from well-known member stars for which high quality data are available. The parameters are derived using Fourier analysis and harmonic fitting methods, and can be calculated on short timescales. The class descriptions obtained form the basis for a supervised classification method which produces class probabilities given a set of time series attributes. It is shown that our class descriptions are accurate enough to separate the monoperiodic variables, some of the multiperiodic variables, and eclipsing binaries. An obvious improvement to these capabilities will come from the addition of color information to the class definitions. This will be discussed in a subsequent paper, where our methodology will be applied to the OGLE database.
We obtained overall good classification results. The Gaussian mixture method is relatively simple and robust, and allows for an easy astrophysical interpretation. The machine learning algorithms, on the other hand, achieve a lower rate of misclassifications at the expense of longer training times, reduced interpretability and a higher level of statistical knowledge required of the user. The following extensions/improvements are planned for the future: the addition of colour information to the class definitions, and the extension and updating of the training set with new high-quality light curves from upcoming missions such as CoRoT, Kepler and Gaia.
It is important to realize that the methodology presented here should be
evaluated in a statistical sense, i.e., one can never be sure that each
individual star is correctly and unambiguously classified. Our method was
specifically designed for databases that are so large that individual inspection
of all the stars is impossible. Of course, such inspection can and should be
done after a first classification with our methods has been achieved, for the
specific class of interest to the user. We also note that our basic methods
described here assume the simplest possible input: single-band photometric time
series. Any additional independent information (color indices or time series,
radial velocity time series, spectral type, etc.) will imply an
improved performance as will be shown in our application to the OGLE database
for which such additional attributes were retrieved through the Virtual
Observatory. We will judge the performance of the different classifiers
presented here in a future paper by comparing our classifications for the OGLE
stars with published results obtained with extractor-type methods requiring
human interaction.
Acknowledgements
We are very grateful to the following colleagues for providing us with high quality light curves: Don Kurtz, Simon Jeffery, Gilles Fontaine, Antonio Kanaan, Kepler Oliveira, Vik Dhillon, Suzanna Randall, Laure Lefèvre, Joris De Ridder, Mario van den Ancker, Bram Acke, Chris Sterken and Stéphane Charpinet. This work was made possible thanks to support from the Belgian PRODEX programme under grant PEA C90199 (COROT Mission Data Exploitation II), from the Research Council of Leuven University under grant GOA/2003/04 and from the Spanish Ministerio de Educación y Ciencia through grants AYA2004-08067-C03-03 and AYA2005-04286. LMS wishes to thank M. García for the very useful discussions on the feature selection problem and JD wishes to thank A. Debosscher for the inspiring discussions.