The total amount of (compressed) science data generated in the course
of the five-year mission is about $2 \times 10^{13}$ bytes (20 TB).
Most of this consists of CCD raw or binned pixel values with associated
identification tags. The data analysis aims to
"explain" these values in terms of astronomical objects and their
characteristics. In principle the analysis is done by
adjusting the object, attitude and instrument models until a satisfactory
agreement is found between predicted and observed data (dashed lines
in Fig. 8). Successful implementation of the
data analysis task will require expert knowledge from several
different fields of astronomy, mathematics and computer science to be
merged in a single, highly efficient system (O'Mullane & Lindegren 1999).
The global astrometric reductions must be formulated in a
fully general relativistic framework, including
post-post-Newtonian effects of the spherical Sun at the 1 $\mu$as
level, as well as corrections due to the oblateness and angular
momentum of Solar System bodies.
Processing these vast amounts of data will require highly automated and efficient numerical methods. This is particularly critical for the image centroiding of the elementary astrometric and photometric observations in the astrometric instruments, and for the corresponding analysis of spectral data in the spectrometric instrument.
Accurate and efficient estimation of the centroid coordinate from the noisy CCD samples is crucial for the astrometric performance. Simulations indicate that six samples approximately centred on the peak can be read out from the CCD. The centroiding, as well as the magnitude estimation, must be based on these six values. Results of a large number of Monte Carlo experiments indicate that a rather simple maximum-likelihood algorithm performs extremely well under these idealized conditions, and that six samples are sufficient to determine the centroid accurately. Much work remains to extend the analysis to more complex cases, in particular overlapping stellar images.
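The kind of Monte Carlo experiment described above can be illustrated with a minimal sketch. It is not the estimator used in the study: it assumes a Gaussian image profile of known width `sigma` (in pixels), unit-width pixels, zero background and photon noise only, and all function names are introduced here for illustration. Under those assumptions the flux can be eliminated analytically from the Poisson likelihood, leaving a one-dimensional maximization over the centroid.

```python
import math
import random


def pixel_fractions(x0, sigma, n_pix=6):
    """Fraction of a Gaussian image centred at x0 that falls in each unit pixel."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - x0) / (sigma * math.sqrt(2.0))))
    # Assumption: pixel i covers the interval [i, i + 1).
    return [cdf(i + 1) - cdf(i) for i in range(n_pix)]


def profile_loglike(x0, counts, sigma):
    """Poisson log-likelihood of the samples with the flux profiled out."""
    fractions = pixel_fractions(x0, sigma, len(counts))
    total = sum(counts)
    # The floor guards against log(0) far out in the wings.
    shape = sum(n * math.log(max(f, 1e-300)) for n, f in zip(counts, fractions))
    return shape - total * math.log(sum(fractions))


def ml_centroid(counts, sigma, lo=2.0, hi=4.0, tol=1e-6):
    """Golden-section maximization of the profile likelihood over [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = profile_loglike(c, counts, sigma), profile_loglike(d, counts, sigma)
    while b - a > tol:
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = profile_loglike(c, counts, sigma)
        else:        # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = profile_loglike(d, counts, sigma)
    x0 = 0.5 * (a + b)
    # Maximum-likelihood flux for the fitted centroid (basis for a magnitude).
    flux = sum(counts) / sum(pixel_fractions(x0, sigma, len(counts)))
    return x0, flux


def simulate_samples(x0, sigma, n_photons, rng, n_pix=6):
    """Drop photons from a Gaussian image onto six pixels (photon noise only)."""
    counts = [0] * n_pix
    for _ in range(n_photons):
        x = rng.gauss(x0, sigma)
        if 0.0 <= x < n_pix:
            counts[int(x)] += 1
    return counts


def centroid_rms(trials, n_photons, sigma, seed=0):
    """RMS centroid error over repeated simulated observations."""
    rng = random.Random(seed)
    sq = 0.0
    for _ in range(trials):
        true_x0 = rng.uniform(2.5, 3.5)  # peak roughly centred in the window
        counts = simulate_samples(true_x0, sigma, n_photons, rng)
        est, _flux = ml_centroid(counts, sigma)
        sq += (est - true_x0) ** 2
    return math.sqrt(sq / trials)
```

Calling `centroid_rms(200, 2000, 0.9)` gives the RMS error of the estimator for 2000 detected photons per image. Read-out noise, background and overlapping images are deliberately left out of this sketch.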
A preliminary photometric analysis, for discovery of variables, supernovae, etc., can be carried out using standard photometric techniques immediately after data delivery to the ground. In addition, more detailed modelling of the local background and structure in the vicinity of each target using all the mission data in all the passbands will be required. A final end-of-mission re-analysis may benefit from the astrometric determination of the image centroids, locating a well-calibrated point spread function for photometric analysis. Studies of these photometric reductions have begun.
The high-resolution (radial velocity) spectrometer will produce spectra for
about a hundred million stars, and multi-epoch, multi-band photometry will be
obtained for about one billion stars. The analysis of such large
numbers of spectra and photometric measurements needs to be performed
in a fully automated fashion, with no manual intervention. Automatic
determination of (at least) the surface temperature $T_{\rm eff}$, the
metallicity [M/H], and the relative $\alpha$-element abundance
[$\alpha$/Fe] is necessary; determination of $\log g$ is, given the
availability of parallaxes for most stars, of lesser importance.
A fully automated system for the derivation of
astrophysical parameters from the large number of spectra and
magnitudes collected by GAIA, using all the available information for
each star, has been studied, showing the feasibility of an approach
based on the use of neural networks.
In the classification system foreseen, spectra and photometric
measurements will be sent to an "initial classifier", to sort objects
into stellar and non-stellar. Specialist networks then treat each
class. For example, stellar data sets are passed to an "automated
stellar parameterization" sub-package.
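The two-stage layout can be sketched as a simple dispatch. The sketch shows only the routing: the classifier rule and the two specialist functions are placeholders invented for illustration, standing in for trained networks, and none of the names below come from the study.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataSet:
    source_id: int
    spectrum: List[float]
    photometry: List[float]


def route(data_sets, initial_classifier, specialists):
    """Pass each data set to the specialist registered for its class."""
    results = {}
    for ds in data_sets:
        label = initial_classifier(ds)
        if label not in specialists:
            raise KeyError(f"no specialist registered for class {label!r}")
        results[ds.source_id] = (label, specialists[label](ds))
    return results


def toy_initial_classifier(ds):
    # Placeholder rule (an assumption, not the real classifier): a spectrum
    # dominated by one strong peak is treated as non-stellar.
    mean = sum(ds.spectrum) / len(ds.spectrum)
    return "non-stellar" if max(ds.spectrum) > 3.0 * mean else "stellar"


def stellar_parameterization(ds):
    # Stand-in for the "automated stellar parameterization" sub-package.
    return {"package": "stellar", "n_inputs": len(ds.spectrum) + len(ds.photometry)}


def non_stellar_treatment(ds):
    # Stand-in for a specialist treating non-stellar objects.
    return {"package": "non-stellar", "n_inputs": len(ds.spectrum) + len(ds.photometry)}


SPECIALISTS = {
    "stellar": stellar_parameterization,
    "non-stellar": non_stellar_treatment,
}
```

Because the specialists are held in a table keyed by class, a further class could be added without changing the routing function.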
It is the physical parameters of stars which are really of interest; therefore the proposed system aims to derive physical parameters directly from a stellar spectrum and photometry. Detailed simulations of the automated stellar parameterization system have been completed using a feed-forward neural network operating on the entire set of spectral and photometric measurements. In such a system, the derived values for the stellar parameters are naturally linked to the models used to train the network. Given the extreme rapidity of neural networks, when stellar atmosphere models are improved, re-classification of the entire data set can be done extremely quickly: an archive of $10^8$ spectra or photometric measurements could be reclassified in about a day with the present-day computing power of a scientific workstation.
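The speed of re-classification follows from the structure of a feed-forward network: once trained, applying it to one star requires only a few matrix-vector products. The sketch below illustrates such a forward pass. The single hidden layer, the tanh activation, the layer sizes and the random (untrained) weights are all assumptions made for illustration; the architecture used in the simulations is not specified here.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Random, untrained weights; a real network would be trained on models."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer, activation=None):
    """One fully connected layer: activation(W x + b)."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b
           for row, b in zip(weights, biases)]
    return [activation(o) for o in out] if activation else out


def forward(spectrum, photometry, hidden, output):
    """Map all spectral and photometric inputs to a vector of parameters."""
    x = list(spectrum) + list(photometry)
    return dense(dense(x, hidden, math.tanh), output)


def seconds_per_object(n_objects, days):
    """Time budget per object if n_objects are processed in the given days."""
    return days * 86_400.0 / n_objects
```

With three outputs, the result could stand for (scaled) values of $T_{\rm eff}$, [M/H] and [$\alpha$/Fe]. Reclassifying $10^8$ objects in one day corresponds to `seconds_per_object(1e8, 1.0)`, that is, a little under one millisecond per object.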
The overall data analysis task would be impossible without certain regularizing
assumptions: one must
assume that a substantial fraction of stars follow a very simple model,
viz. (apparently) single stars with little or no photometric
variability, whose motions can be described by the standard five
astrometric parameters ($\alpha$, $\delta$, $\pi$, $\mu_{\alpha*}$,
$\mu_\delta$). For the satellite attitude and instrument
characteristics it must be assumed that sudden changes are rare,
so that time-averaging and smoothing are effective in reducing
observational noise. Without these assumptions the problem would
simply have too many degrees of freedom.
While such regularity conditions must be valid in a broad sense, it
is clear that they cannot be guaranteed to hold in any
particular situation or for a specific object. The data analysis
must be able to filter out cases where the conditions do not
apply, and divert them to a separate analysis branch. The
efficiency of the filtering process depends critically on the
quality of the instrument calibrations and attitude determination,
which initially is quite low. Thus an iterative process is needed
in which the object selection and observations are successively
improved, along with the calibrations and attitude determination.
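The interplay between filtering and calibration can be illustrated with a toy model. This is not the actual reduction scheme, only a sketch under strong assumptions: each observation is the sum of an object value, an offset of the instrument unit that made it, and noise; every object is observed once by every unit; and a minority of objects show excess scatter that the simple model cannot describe. All names and numerical values are hypothetical.

```python
import random
import statistics


def simulate(n_obj, n_units, frac_bad, rng):
    """Observations obs[i][j] of object i made with instrument unit j."""
    true_cal = [rng.uniform(-20.0, 20.0) for _ in range(n_units)]
    bad = set(rng.sample(range(n_obj), int(frac_bad * n_obj)))
    obs = []
    for i in range(n_obj):
        value = rng.uniform(0.0, 100.0)
        noise = 8.0 if i in bad else 1.0  # "bad" objects do not fit the model
        obs.append([value + true_cal[j] + rng.gauss(0.0, noise)
                    for j in range(n_units)])
    return obs, true_cal, bad


def iterate(obs, n_iter=4, clip=3.0):
    """Alternate object estimation, filtering and calibration updates."""
    n_obj, n_units = len(obs), len(obs[0])
    cal = [0.0] * n_units  # start with no calibration knowledge
    flagged, history = set(), []
    for _ in range(n_iter):
        # Object step: estimate each object using the current calibration.
        values, scatter = [], []
        for row in obs:
            corrected = [y - c for y, c in zip(row, cal)]
            values.append(statistics.fmean(corrected))
            scatter.append(statistics.stdev(corrected))
        # Filter step: divert objects whose residual scatter is too large.
        limit = clip * statistics.median(scatter)
        flagged = {i for i, s in enumerate(scatter) if s > limit}
        history.append(len(flagged))
        # Calibration step: use only the objects that passed the filter.
        good = [i for i in range(n_obj) if i not in flagged]
        cal = [statistics.fmean([obs[i][j] - values[i] for i in good])
               for j in range(n_units)]
        mean_cal = statistics.fmean(cal)
        cal = [c - mean_cal for c in cal]  # fix the common zero point
    return cal, flagged, history
```

In the first pass the uncalibrated offsets dominate the residuals of every object, so the filter cannot separate the discrepant objects from the well-behaved ones. Once a first calibration has been derived, the discrepant objects stand out and can be excluded from subsequent calibration updates.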
The computational complexity of the data analysis arises not just from the amount of data to be processed, but even more from the intricate relationships between the different pieces of information gathered by the various instruments throughout the mission. It is difficult to assess the magnitude of the data analysis problem in terms of processing requirements. Certain basic algorithms that have to be applied to large data sets can be translated into a minimum required number of floating-point operations. Various estimates suggest a total of order $10^{19}$ floating-point operations, indicating that very serious attention must be given to the implementation of the data analysis, and that this effort must start very early.
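To give a feeling for this number, the following sketch converts a total operation count into elapsed time for a given sustained processing rate. The rates quoted afterwards are illustrative assumptions, not figures from the study.

```python
SECONDS_PER_DAY = 86_400.0
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY


def processing_time_seconds(total_flop, sustained_flop_per_s):
    """Elapsed time needed to execute total_flop at a sustained rate."""
    if sustained_flop_per_s <= 0:
        raise ValueError("sustained rate must be positive")
    return total_flop / sustained_flop_per_s
```

At an assumed sustained rate of $10^{9}$ floating-point operations per second, $10^{19}$ operations would take about 317 years; at $10^{12}$ per second, about 116 days. Such figures ignore input/output and the repeated passes through the data that the iterative reduction requires.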
Observations of each object are distributed throughout the mission, so that calibrations and analysis must be feasible both in the time domain and in the object domain. Flexibility and interaction are needed to cope with special objects, while calibrations must be protected from unintentional modification. Object Oriented (OO) methodologies for data modelling, storage and processing are well suited to meeting these challenges.
The feasibility of the OO approach has been demonstrated by a short prototyping exercise carried out during the present study phase. Algorithms for three processes were provided and incorporated into the OO model, underlining one important feature of OO design: the ability to have complex data structures and operations described in a single model. Java code was generated from the model and the algorithms implemented. The prototype was highly successful and reinforced confidence in the OO approach for treating the data. The reduction process is inherently distributed, and naturally matched to distributed parallel processors.
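Two of the requirements named above, access in both the time domain and the object domain and protection of calibrations against unintentional modification, can be illustrated with a small object model. The sketch is written in Python for brevity, whereas the prototype generated Java; the class and field names are hypothetical and do not reproduce the prototype's data model.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Calibration:
    """Immutable once created: attribute assignment raises an error."""
    valid_from: float
    valid_to: float
    zero_point: float


@dataclass(frozen=True)
class Transit:
    """One elementary observation of a source."""
    time: float
    source_id: int
    value: float


class TransitStore:
    """Holds transits once, indexed both by time and by source."""

    def __init__(self, transits):
        self._by_time = sorted(transits, key=lambda t: t.time)
        self._times = [t.time for t in self._by_time]
        self._by_source = defaultdict(list)
        for t in self._by_time:
            self._by_source[t.source_id].append(t)

    def in_interval(self, start, end):
        """Time-domain access: all transits with start <= time <= end."""
        lo = bisect_left(self._times, start)
        hi = bisect_right(self._times, end)
        return self._by_time[lo:hi]

    def for_source(self, source_id):
        """Object-domain access: all transits of one source, in time order."""
        return list(self._by_source.get(source_id, []))
```

A calibration step would typically use `in_interval`, while the analysis of an individual object would use `for_source`; both views refer to the same underlying records.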
Copyright ESO 2001