Gaia Data Release 3: Apsis II -- Stellar Parameters

The third Gaia data release contains, beyond the astrometry and photometry, dispersed light for hundreds of millions of sources from the Gaia prism spectra (BP and RP) and the spectrograph (RVS). This data release opens a new window on the chemo-dynamical properties of stars in our Galaxy, essential knowledge for understanding the structure, formation, and evolution of the Milky Way. To provide insight into the physical properties of Milky Way stars, we used these data to produce a uniformly-derived, all-sky catalog of stellar astrophysical parameters (APs): Teff, logg, [M/H], [$\alpha$/Fe], activity index, emission lines, rotation, 13 chemical abundance estimates, radius, age, mass, bolometric luminosity, distance, and dust extinction. We developed the Apsis pipeline to infer APs of Gaia objects by analyzing their astrometry, photometry, BP/RP, and RVS spectra. We validate our results against other literature works, including benchmark stars, interferometry, and asteroseismology. Here we assessed the stellar analysis performance from Apsis statistically. We describe the quantities we obtained, including our results' underlying assumptions and limitations. We provide guidance and identify regimes in which our parameters should and should not be used. Despite some limitations, this is the most extensive catalog of uniformly-inferred stellar parameters to date. These comprise Teff, logg, and [M/H] (470 million using BP/RP, 6 million using RVS), radius (470 million), mass (140 million), age (120 million), chemical abundances (5 million), diffuse interstellar band analysis (1/2 million), activity indices (2 million), H{$\alpha$} equivalent widths (200 million), and further classification of spectral types (220 million) and emission-line stars (50 thousand). More precise and detailed astrophysical parameters based on epoch BP, RP, and RVS are planned for the next Gaia data release.


Introduction
Studying the present-day structure and substructures of the Milky Way is one of the most direct ways of understanding the true nature of the Galaxy formation mechanism and evolutionary history. Gaia is an ambitious space mission of the European Space Agency (ESA) whose primary goal is to provide a three-dimensional map of the Milky Way with an unprecedented volume and precision (Gaia Collaboration et al. 2016). It represents a revolution in Galactic archaeology and a leap forward in revealing how galaxies take shape and in investigating the complexities of our own. Although it observes only one percent of our Galaxy's stellar population, Gaia still characterizes ∼ 1.8 billion stars across the Milky Way, measuring their positions, parallaxes, and proper motions.
In the previous data release, Andrae et al. (2018) published the first set of stellar parameters from the analysis of the integrated photometry and parallaxes available in Gaia DR2 (Gaia Collaboration et al. 2018b). In contrast, Gaia DR3 provides a complex set of astrophysical parameters obtained from the analysis of Gaia's astrometric measurements and the BP, RP, and RVS spectra. This wide variety of information enables a hyper-dimensional analysis of the Milky Way populations that was never possible before the Gaia era.
The present work is one of a series of three papers on the Gaia DR3 astrophysical parameters. Creevey (2022a) presents an overview of the astrophysical parameters inference system (Apsis) and its overall contributions to Gaia DR3. This paper focuses on the stellar content description and quality assessments. The non-stellar content is presented in Delchambre (2022). For more technical details on the Apsis modules, we refer readers to the online documentation 1 (Gaia Collaboration 2022) and specific publications describing some of the modules (GSP-Phot in Andrae 2022; GSP-Spec in Recio-Blanco 2022b; and ESP-CS in Lanzafame 2022). We list the relevant module acronyms in Table 1.
We only process stellar sources down to G = 19 mag for which Gaia provides us with a BP/RP or RVS spectrum, except for ultra-cool dwarfs (UCDs) where we selectively processed 78 739 sources fainter than this limit (see Fig. 1). This limiting magnitude choice was driven primarily by the limited processing time of the BP/RP spectra. The astrophysical parameter dataset contains stellar spectroscopic and evolutionary parameters for 470 million sources. These comprise T eff , log g, and [M/H] (470 million using BP/RP, 6 million using RVS), radius (470 million), mass (140 million), age (120 million), chemical abundances (up to 5 million), diffuse interstellar band analysis (0.5 million), activity indices (2 million), Hα equivalent widths (200 million), and further classification of spectral types (220 million) and emission-line stars (50 thousand).
The work described here was carried out under the umbrella of the Gaia Data Processing and Analysis Consortium (DPAC) within Coordination Unit 8 (CU8; see Gaia Collaboration et al. 2016 for an overview of the DPAC). We realize that one can create more precise, and possibly more accurate, estimates of the stellar parameters by cross-matching Gaia with other survey data, such as GALEX (Morrissey et al. 2007), Pan-STARRS (Chambers et al. 2016), or catWISE (Eisenhardt et al. 2020) and spectroscopic surveys such as LAMOST (Luo et al. 2019), GALAH (Buder et al. 2021), or APOGEE (Jönsson et al. 2020). For example, Fouesneau et al. (2022), Anders et al. (2022), and Huang et al. (2022) combined Gaia data with other photometry and spectroscopic surveys to derive APs for millions of stars. 2 However, the remit of the Gaia-DPAC is to process the Gaia data. Further exploitation, for instance, including data from other catalogs, is left to the community at large. Yet, these "Gaia-only" stellar parameters will assist the exploitation of Gaia DR3 and the validation of such extended analyses.
We continue this article in Sect. 2 with a brief overview of our assumptions and key processing aspects. In Sect. 3, we describe the Gaia DR3 AP content and the validation of our results, their internal consistency, and we compare them against other published results (e.g., benchmark stars, interferometry, and asteroseismology). Finally, we highlight a few applications of our catalog in Sect. 4 and its limitations in Sect. 5 before we summarize in Sect. 6.
1 Gaia DR3 online documentation: https://gea.esac.esa.int/archive/documentation/GDR3
2 Survey acronyms: GALEX: the Galaxy Evolution Explorer; Pan-STARRS: the Panoramic Survey Telescope and Rapid Response System; APOGEE: the Apache Point Observatory Galactic Evolution Experiment; catWISE: the catalog from the Wide-field Infrared Survey Explorer; LAMOST: the Large Sky Area Multi-Object Fibre Spectroscopic Telescope; and GALAH: the Galactic archaeology with HERMES.

Overview of Stellar APs in GDR3
The goal of Apsis is to classify and estimate astrophysical parameters for the Gaia sources using (only) the Gaia data (Bailer-Jones et al. 2013; Creevey 2022a). In addition to assisting the exploitation of Gaia DR3, the DPAC data processing itself uses these APs internally, for example, to help the template-based radial velocity extraction from the RVS spectra, the identification of quasars used to fix the astrometric reference frame, or the optimization of the BP/RP calibration.
We designed the Apsis software to provide estimates for a broad class of objects covering a significant fraction of the Gaia catalog, rather than treating specific types of objects. Apsis consists of several modules with different functions and source selections. Creevey (2022a) presents the architecture and the modules of Apsis separately. We provide in Fig. 2 a schematic overview of the source selection per Apsis module in the Kiel diagram. Some modules do not appear on this diagram as they have a more complex role (e.g., emission lines, classification).

Source processing selection function
This section details the source selection and assumptions we applied during the processing of stellar objects.
First, we processed only sources for which one of the BP, RP, or RVS spectra was available with at least 10 focal plane transits (repeated observations). Which sources are processed by which modules depends on (1) the availability of the necessary data; (2) the signal-to-noise ratio (SNR) of the data, brightness to first order; and (3) potentially the outputs from other modules.
GSP-Phot (Andrae 2022) operates on all sources with BP/RP spectra down to G = 19 mag. As we expect that more than 99% of sources down to this brightness are stars, there is a minor overhead of computation time in applying GSP-Phot to every source and GSP-Spec (Recio-Blanco 2022b) on all sources which have RVS spectra with SNR > 20, i.e., G ≲ 13-14 mag.
Fig. 1. Distribution of the sources in color-magnitude space processed by Apsis according to the available measurements. The top panels show the observed color-magnitude diagram. In contrast, the bottom panels show their absolute magnitude computed using the inverse parallax as the distance and assuming zero extinction for sources with positive parallax measurements. From left to right, the panels show the sources with G, BP, and RP photometry ("all"), those with published BP/RP spectra (gaia_source.has_xp), and those with RVS spectra (gaia_source.has_rvs). The gray density in the middle and right panels indicates the whole sample's distribution for reference. We note the peculiar distribution of BP/RP fainter than G = 17.65 mag in the top middle panel, corresponding to selected UCDs (red sources) and extragalactic sources (blue sources). The inverse parallax used in the bottom panels includes low-quality parallaxes responsible for the non-physical high brightness of many sources.
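The absolute magnitudes described in the caption of Fig. 1 follow from the standard distance modulus with the inverse parallax as distance and zero extinction, M_G = G + 5 log10(ϖ/mas) − 10. The following minimal sketch illustrates that relation; the function name and the choice to return NaN for non-positive parallaxes are our assumptions for illustration and are not part of the Apsis software.

```python
import numpy as np


def absolute_magnitude(g_mag, parallax_mas):
    """Absolute magnitude from apparent magnitude and parallax (in mas).

    Uses distance = 1 / parallax and assumes zero extinction.
    Sources with non-positive parallax get NaN (illustrative choice).
    """
    g = np.asarray(g_mag, dtype=float)
    plx = np.asarray(parallax_mas, dtype=float)
    valid = plx > 0
    # Replace invalid parallaxes by a dummy value to avoid log10 warnings.
    safe_plx = np.where(valid, plx, 1.0)
    m_abs = g + 5.0 * np.log10(safe_plx) - 10.0
    return np.where(valid, m_abs, np.nan)


# A source at 100 pc (parallax of 10 mas) has a distance modulus of 5 mag.
print(absolute_magnitude([10.0, 15.0], [10.0, -0.3]))
```

As the caption notes, low-quality parallaxes inserted into this relation produce non-physical absolute magnitudes, so the result is only a visualization proxy.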
Following these two independent general analyses, Apsis refines the characterization of Gaia sources with specific modules. FLAME operates on a subset of sources with APs of "sufficient" precision from GSP-Phot (G < 18.25 mag) and GSP-Spec (G < 14 mag), based on their reported uncertainties. MSC analyzes all sources with G < 18.25 mag and treats every source as though it were a system of two unresolved stars. The remaining modules, specifically ESP-CS (Lanzafame 2022), ESP-HS, ESP-ELS, and ESP-UCD, only analyze objects of "their" class, i.e., active cool stars, hot stars, emission-line stars, and ultra-cool dwarfs. Apart from ESP-UCD, which analyzes UCDs fainter than G = 19 mag, the other specific modules only produce results for sources with G < 17.65 mag. Finally, GSP-Phot also provides the A 0 estimates used by TGE to produce an all-sky (two-dimensional) map of the total Galactic extinction, meaning the cumulative amount of extinction in front of objects beyond the edge of our Galaxy (see Sect. 3.4 and Delchambre 2022). The various quoted magnitude limits are independent of the stars' physical properties and the quality of the spectra. Instead, these limits came from the Apsis processing scheme and processing time limitations. In addition to, and in contrast with, the classifications from some of these analysis modules, Apsis comprises two modules dedicated to empirical classifications of sources. DSC classifies sources probabilistically into five classes (quasar, galaxy, star, white dwarf, and physical binary star), although it is primarily intended to identify extragalactic sources, and OA complements this classification by clustering those sources with the lowest classification probabilities from DSC. See Sect. 3.6 and details in Creevey (2022a) and Delchambre (2022).
We summarize the Apsis modules' target selections in Fig. 3. We use the inverse parallax as a distance proxy to emphasize the stellar loci of the targets. Even though we did not explicitly select on G BP − G RP colors, we note that most of the sources with G BP − G RP < −0.8 mag in Gaia DR3 are not stellar objects according to the Apsis processing definitions. This reflects the fact that stellar evolution models (e.g., PARSEC) do not predict stars bluer than G BP − G RP = −0.6 mag in the absence of noise in the measurements and within the chemical abundance regime of our analysis.

Stellar processing modules & stellar definition(s)
A principle of Apsis in Gaia DR3 is to use only Gaia data on individual sources when inferring the APs. We only use non-Gaia observations for validation and calibration. We define stellar objects as those that remain after removing other kinds of objects: for instance, extragalactic sources (i.e., galaxies and quasars; Bailer-Jones 2022) through dedicated modules such as DSC and with proper motion, Gaia brightness, and color selections. Apsis presently ignores morphological information (Ducourant 2022) and does not take stellar variability (Rimoldini et al. 2022) into account. As it works with combined epoch spectra (BP, RP, and RVS), some time-variable sources (e.g., Cepheids) received spurious APs from Apsis. Eyer et al. (2022) summarizes the characterization of variable sources with dedicated pipelines. In the future we plan to investigate using epoch data and whether variability information could improve the quality of our results.
A consequence of our analysis design is that Apsis can assign multiple sets of APs to any given source. Figure 2 illustrates the overlap between modules, which for example, leads to four estimates of temperatures for some main-sequence stars. The values we derive not only depend on the data we measure but also on the stellar models we adopt (as embodied in the training data) and other assumptions made, see Creevey (2022a) for a brief overview and the online documentation for details. We can never know a star's "true" APs with 100% confidence. Which estimate to use inevitably remains a decision for the user. For those users who don't want to make this choice, GSP-Phot estimates APs for all the stars, so there is always a homogeneous set of stellar APs available.
The details are even more complex because a few of the modules themselves comprise multiple algorithms or multiple sets of assumptions, each providing separate estimates. One reason for this choice is to cross-validate our results: if two or more algorithms give similar results for the same source (and training data), our confidence in the results may increase. For example, GSP-Spec provides estimates from Matisse-Gauguin (Recio-Blanco et al. 2016) and a neural-network approach (Manteiga et al. 2010) using the same RVS data. Another reason is that we do not use a common set of stellar models: GSP-Phot operates with four different atmospheric libraries with overlapping parameter spaces but significant differences (see Sect. 3.2.1).
Finally, while Gaia DR3 reports APs on a wide range of stellar types, we did not optimize Apsis to derive parameters for white dwarfs (WDs), horizontal-branch (HB) stars, and asymptotic giant-branch (AGB) stars. We did not attempt to model their specific physical conditions (e.g., compositional changes due to dredge-up, atomic diffusion, enriched atmosphere, and circumstellar dust).

Input data of Apsis processing
As Creevey (2022a) describes the Apsis input data and their preprocessing exhaustively, here we briefly summarize the most relevant aspects to the stellar APs. In the context of determining the stellar APs, we used sky positions, the parallaxes, the integrated photometry measurements, and the BP/RP and RVS spectra. However, we note that the classifications by DSC also used proper motions.
Although Apsis mainly processed the sources independently (apart from TGE and OA), their positions on the sky were informative to determine their APs. For instance, we may see a source located near the Galactic center behind a significant amount of extinction, while it would be less likely towards high Galactic latitudes. Therefore, we defined sky position dependent priors, using Rybizki et al. (2020) as a representative view of the Gaia sky, for instance. The details vary from module to module.
We implemented the parallax zero points from Lindegren et al. (2021), which vary with magnitude, color, ecliptic latitude, and astrometric solution type (gaia_source.astrometric_params_solved). A code is provided with Gaia DR3 to compute the parallax zero points. We used the integrated photometry in the G, G BP, and G RP bands, in association with the zero-points provided by Riello et al. (2020). In addition, we also implemented the correction to the G-band photometry from Montegriffo et al. (2022), which depends on G, G BP − G RP color, and the astrometric solution type. We emphasize that the parallax zero-point remains calibrated on the original G-band photometry. However, Gaia DR3 publishes these corrected values in gaia_source.phot_g_mean_mag.
Apsis derived some of the APs from the analysis of the RVS spectra. The RVS processing pipeline provided us with the time- or epoch-averaged spectra, also called mean spectra, after removing potential cosmic rays and deblending overlapping sources. The pipeline delivers the spectra in their stellar rest frame (i.e., corrected for the star's radial velocity, gaia_source.radial_velocity) and normalized at the local (pseudo-)continuum (T eff ≥ 3 500 K). Our analysis used these final spectra re-sampled from 846 to 870 nm, with a constant spacing of 0.01 nm. Seabroke et al. (2022) describe in detail the processing of RVS spectra. However, Apsis modules rebin the spectra for their specific use cases to increase the signal-to-noise ratio of their relevant spectral features (see Creevey 2022a for details).
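To illustrate the sampling and rebinning described above, the sketch below builds a wavelength grid from 846 to 870 nm with 0.01 nm spacing and averages adjacent pixels. We assume here that both end points are included in the grid (2401 samples) and that rebinning is a plain average over a fixed number of pixels; the actual per-module rebinning schemes are described in Creevey (2022a) and may differ. The function name `rebin` is hypothetical.

```python
import numpy as np

# Wavelength grid: 846 to 870 nm at 0.01 nm spacing.
# Including both end points (an assumption) gives 2401 samples.
wavelength_nm = np.linspace(846.0, 870.0, 2401)


def rebin(values, factor):
    """Average groups of `factor` adjacent pixels.

    Trailing pixels that do not fill a complete group are dropped.
    """
    values = np.asarray(values, dtype=float)
    n_bins = values.size // factor
    trimmed = values[: n_bins * factor]
    return trimmed.reshape(n_bins, factor).mean(axis=1)


# Toy example: a flat, normalized continuum with independent Gaussian noise.
rng = np.random.default_rng(42)
flux = 1.0 + rng.normal(0.0, 0.05, wavelength_nm.size)
flux_rebinned = rebin(flux, 3)
print(flux.std(), flux_rebinned.std())
```

For independent pixel noise, averaging n pixels reduces the noise by roughly a factor of √n at the cost of spectral sampling; correlated noise would reduce this gain.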
Most of the Apsis modules produced APs from the analysis of the BP and RP spectra (see examples in Fig. 4). Gaia DR3 provides us with the (epoch) mean BP and RP spectra in a series of coefficients associated with Gauss-Hermite polynomials. This format results from the complexity of the prism observations. Carrasco et al. (2021) describes the processing of the spectra. These coefficients contain a flux-calibrated, continuous (mathematical) representation of the spectra that the Apsis pipeline internally samples approximately uniformly in pseudo-instrumental pixel space, but non-uniformly in wavelength (see Fig. 4 from Creevey 2022a).
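To make the idea of a coefficient-based continuous representation concrete, the sketch below evaluates a linear combination of orthonormal Hermite functions (Hermite polynomials multiplied by a Gaussian) on an arbitrary grid. This is only a schematic illustration: the function names are hypothetical, and we ignore the instrument-specific mapping between the basis argument and pseudo-wavelength as well as any basis optimization applied in the Gaia processing (see Carrasco et al. 2021). For real data, GaiaXPy is the appropriate tool.

```python
import math

import numpy as np
from numpy.polynomial.hermite import hermval


def hermite_function(n, x):
    """Orthonormal Hermite function of order n evaluated at x."""
    x = np.asarray(x, dtype=float)
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0  # select the physicists' Hermite polynomial H_n
    norm = 1.0 / math.sqrt(2.0**n * math.factorial(n) * math.sqrt(math.pi))
    return norm * hermval(x, coeffs) * np.exp(-0.5 * x**2)


def sample_spectrum(coefficients, x):
    """Sample the continuous representation sum_n c_n * phi_n(x) on grid x."""
    x = np.asarray(x, dtype=float)
    result = np.zeros_like(x)
    for n, c in enumerate(coefficients):
        result += c * hermite_function(n, x)
    return result


# Toy coefficients (not real Gaia data), sampled on a uniform grid.
grid = np.linspace(-5.0, 5.0, 11)
print(sample_spectrum([1.0, 0.2, -0.1], grid))
```

Because the representation is continuous, the same coefficients can be sampled on any grid, which is what allows a pipeline to choose a sampling that is uniform in pixel space rather than in wavelength.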

Typical examples and challenges of stellar BP/RP spectra
The BP and RP spectra reside at the boundary between photometry and spectroscopy. Due to the low effective spectral resolution from the prisms, these data present only a few noticeable features, as opposed to individual spectral lines in spectroscopy.
On the other hand, where spectroscopy often provides uncertain determination of a stellar continuum, the BP/RP data provide robust determinations with high signal-to-noise ratios similar to photometric measurements. To illustrate further, Fig. 4 shows how the spectra of dwarf stars vary with the effective temperature. In this figure, we divided the spectrum fluxes by the instrument filter responses as provided by the simulation tool internally available to DPAC (Montegriffo et al. 2022). GaiaXPy provides the community with a similar tool. Ultra-cool stars mainly emit photons in the RP passband, and their spectra depict strong molecular features. The almost featureless A-, B-, and O-type stars exhibit the Balmer hydrogen lines and jump. In between, the F-, G-, K-, and M-type stars are characterized by the appearance of TiO bands and metal line blends. Figure 4 from Creevey (2022a) compares the variation of the BP and RP spectra with effective temperature and extinction using simulations and observational examples. Based on these data, we also classify emission-line stars (ELS) by their stellar class by measuring the Hα line strength and identifying significant emissions in other wavelength domains. In Fig. 5 we plot the spectral energy distribution (SED) of some of the stellar classes that the ESP-ELS module estimated. While one can usually find the strongest features in some planetary nebulae and Wolf-Rayet stars, weaker Hα emission is more challenging to measure due to the low resolving power of BP and RP spectra. 8 The difficulty increases further for the cool ELS stars (T eff ≤ 5 000 K), whose spectra show mainly a weak Hα emission blended into the local pseudocontinuum shaped by the TiO molecular bands. Combining the BP and RP data with higher resolution spectra (e.g., RVS, LAMOST, APOGEE) will become an obvious path of choice for the next decades.
Fig. 4. Variations of the BP and RP spectra of main-sequence stars with effective temperature. The background color-coding follows the effective temperature scale provided by ESP-UCD, GSP-Phot, and ESP-HS (also indicating their optimal T eff performance regimes from our validation). We highlighted some spectra for reference and labelled some spectral features. We normalized the spectra to their integrated flux after correcting the BP/RP by the instrument response (see Montegriffo et al. 2022). We further stretched and vertically shifted the resulting normalized flux (F norm.). We restricted our selection to comparable dwarfs: GSP-Phot stars with 4 ≤ log g < 4.5
8 The effective resolution of BP and RP spectra decreases towards the red wavelengths; the RP response steeply drops on the blue edge at 640 nm.

Typical RVS spectra
The RVS spectra share many similarities with those of RAVE. The RVS spectra cover a slightly shorter wavelength window but at a higher resolution (∼11 500): from 846 to 870 nm, sampled every 0.01 nm. Figure 6 presents a selection of typical Gaia DR3 RVS spectra along the OBAFGKM sequence, from the hottest (O-type) to the coolest (M-type) stars. Each letter class is subdivided using numbers, with 0 being the hottest and 9 the coolest (e.g., A0, A4, A9, and F0 from hotter to cooler). We selected these spectra based on their spectroscopic temperatures and surface gravities.
The RVS spectra vary strongly with the effective temperature. The spectra of F-, G-, and K-type stars present many atomic lines, but their reliable measurement depends strongly on the temperature and gravity of the star. The Gaia Image of the Week 2021-07-09 presents an animation of several Gaia RVS stellar spectra and their element abundances. Figure 6 also illustrates the challenge of characterizing O-type stars, which present nearly featureless RVS observations.

AP content description and performance
This section describes the AP content of Gaia DR3, their performance, and limitations. We first discuss the object APs individually: their distances in Sect. 3.1, their stellar atmospheric parameters in Sect. 3.2 (i.e., T eff , log g, metallicity, individual abundances, rotation, and activity), and their evolution parameters in Sect. 3.3 (i.e., absolute and bolometric luminosities, radius, gravitational redshift, mass, age and evolution stage). These require us to account for dust effects along the line-of-sight summarized in Sect. 3.4 and analyzed in-depth in Delchambre (2022) and Schultheis (2022). In Sect. 3.5, we further assess the quality of our APs by focusing on objects in groups (i.e., clusters and binaries). Finally, we discuss the detection of peculiar cases and outliers in Sect. 3.6.
We emphasize that to avoid repetitions, we summarize only the complete description of some internal precisions of the APs as a function of magnitudes, colors, sky position, and other parameters that appear in other publications (e.g. Andrae
To guide the reader, Appendix D compiles the various estimates of stellar parameters from Gaia DR3 cast into the mentioned categories (corresponding to the following subsections). The compilation indicates which Apsis module produces them, and which table and fields store the values in the Gaia catalog. We emphasize that the field names correspond to the catalog in the Gaia Archive, but names may differ when using partner data centers.

Distances
Two Apsis modules provide distance estimates: GSP-Phot for single stars and MSC for unresolved binary stars. Both modules analyze the BP and RP spectra together with the Gaia parallaxes to derive distance estimates simultaneously with other astrophysical parameters. We list the catalog fields related to both modules' distance estimates in Table D.1. For GSP-Phot, the distances are reliable out to ∼2 kpc. Beyond 2 kpc, GSP-Phot systematically underestimates distances, as is evident, e.g., from star clusters. Fig. 7 compares the median GSP-Phot distances of stellar members for each cluster with their literature values by Cantat-Gaudin et al. (2020) derived using Gaia DR2 data through maximum likelihood. We included the Gaia DR3 variable zero point on parallaxes mentioned in Sect. 2.3. We obtain similar results when comparing to the photometric distances by Kharchenko et al. (2013) and in BOCCE (Bragaglia & Tosi 2006; Cantat-Gaudin et al. 2018) catalogs based on color-magnitude diagram fitting. However, when the parallax measurement is good (about ϖ/σ_ϖ > 10), the GSP-Phot distances remain reliable even out to 10 kpc, as we show in Fig. 8a. The reason for this systematic underestimation of distances by GSP-Phot is an overly harsh distance prior. Andrae (2022) discussed the prior and showed that updating its definition could resolve this issue. A prior optimization remains necessary and will be part of future releases. Figure 8 also compares the distances from Bailer-Jones et al. (2021) and Anders et al. (2022) to the Gaia DR3 parallaxes, and we note that they perform better than GSP-Phot distances. For this reason, various DR3 publications chose not to use the GSP-Phot distances but rather the EDR3 distances from Bailer-Jones et al. (2021) (e.g., Drimmel 2022; Recio-Blanco 2022a; Schultheis 2022). A further comparison of GSP-Phot distances with those from asteroseismic analyses confirmed a good agreement to 2 kpc, and some outliers beyond (see Fig. 9).
MSC provides distance estimates assuming sources are unresolved binaries with luminosity ratios ranging from 5 to 1. At best, MSC's distance estimates would differ from GSP-Phot's estimates (equivalent to an infinite luminosity ratio) by 10% to 50%, respectively. We highlight that distances with luminosity ratios of 5 already differ significantly from single-star assumptions. Figure 10 compares MSC's distance estimates and those from GSP-Phot to the Gaia parallaxes for the spectroscopic binary samples from Pourbaix et al. (2004) (mostly G < 10 mag) and Traven et al. (2020) (mostly between G = 10 and 15 mag). Overall there is a qualitative agreement between the distances from both modules and the measured parallaxes. However, GSP-Phot distances exhibit a significantly tighter agreement with the parallaxes than the ones from MSC, despite the single-star assumption: their mean absolute differences are only half of those for MSC and the RMS differences are more than ten times smaller.
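The order of magnitude of these differences can be checked with a simple flux argument, which is our own illustration and not the MSC algorithm. Assume the same primary star and the same observed flux: adding a secondary that is fainter by a luminosity ratio q increases the total luminosity by a factor (1 + 1/q), so the distance required to match the observed flux increases by √(1 + 1/q). The function name below is hypothetical.

```python
import math


def binary_distance_factor(luminosity_ratio):
    """Ratio of binary-model to single-star distance at fixed observed flux.

    `luminosity_ratio` is L_primary / L_secondary (>= 1); an infinite
    ratio corresponds to the single-star case.
    """
    if luminosity_ratio < 1.0:
        raise ValueError("luminosity_ratio must be >= 1")
    return math.sqrt(1.0 + 1.0 / luminosity_ratio)


for q in (5.0, 1.0, math.inf):
    excess = 100.0 * (binary_distance_factor(q) - 1.0)
    print(f"luminosity ratio {q}: distance larger by {excess:.1f}%")
```

Under these simplifying assumptions, a luminosity ratio of 5 corresponds to a distance larger by about 10%, and equal luminosities to about 41%, which is broadly in line with the range quoted above.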
A&A proofs: manuscript no. gdr3_stellar_aps
However, the RMS differences are dominated by a handful of outliers, whereas the absolute difference at 90% confidence is more robust yet still much higher for MSC than GSP-Phot. One source of this mismatch likely comes from the differences in exploiting the BP and RP spectra information: while both MSC and GSP-Phot make use of the parallax and the apparent G magnitude, MSC normalizes the spectra, whereas GSP-Phot keeps their calibrated amplitudes in their spectra likelihoods (see Andrae 2022, for further details).
Furthermore, interpreting the difference between the two sets of estimates is more complex in practice. Modules adjust all of their APs jointly to fit the observed BP and RP spectra. We emphasize that MSC's double-star assumption allows for more free fit parameters than GSP-Phot's single-star assumption (8 and 5, respectively). The increased number of fit parameters is likely a source of the more significant dispersion in the MSC estimates. We discuss the other APs from MSC in Sect. 3.5.2.
Fig. 11. Parameter space in the Kiel diagram spanned by the stellar atmosphere libraries used by GSP-Phot. Boxes indicate the spans of the libraries producing independent estimates. The density distribution represents the content from gaiadr3.gaia_source, which contains only one set of APs per source using the (statistically) "best" library (libname_gspphot field) for that one source.

Atmospheric APs
The atmospheres of stars produce the photons that Gaia collects. Through these photons, we can infer the physical conditions of these layers, which relate to the fundamental stellar parameters. In this section, we characterize the Gaia DR3 APs that describe the atmospheric state of the observed stars. We loosely split the APs into three groups: first, the basic static (equilibrium) state of an atmosphere defined by T eff, log g, metallicity [M/H], and α-abundance [α/Fe]; then the dynamic (departure from equilibrium) state given by the stellar classes, rotation, line emissions, magnetic activity, and mass loss or accretion; and finally the chemical abundances.
The Gaia data set is primarily magnitude-limited and does not select objects on any specific color or class of stars. Consequently, the atmospheric parameters span a great variety of spectral types, from O to M, and even some L-type stars, some of which require target-specific treatment (partly handled by the ESP-modules in Apsis). Depending on the star (spectral and luminosity) class, we used either empirical or theoretical atmospheric models to estimate the atmospheric parameters of the stars, and sometimes both. The theoretical models try to model the relevant physical processes of the matter-light interaction in stellar atmospheres, while the empirical ones capture some hard-to-model observational effects. The overlap between models and application ranges of Apsis modules allows us to check for consistency or the lack thereof (see overlaps in Figs Table D.2). We first focus on the FGK-type stars as these constitute the majority of stars in the Gaia data set. Mainly GSP-Phot and GSP-Spec overlap on this stellar-type interval. We emphasize that the application range of the Apsis modules varies significantly. To help the reader, we thus organize the description per module. One way to validate the Gaia-based APs and simultaneously quantify their precision is to compare them with large stellar surveys in the literature. The numbers below serve as a guideline for the global precision of the Gaia DR3 results relative to literature works. Accuracy is harder to quantify globally, but we can assess it in some specific cases, for instance, relative to Gaia benchmark stars (e.g., Heiter et al. 2015) and spectroscopic solar analogs (e.g., Tucci Maia et al. 2016).
GSP-Phot. Analyzing BP/RP spectra, GSP-Phot provides multiple sets of APs, one for each of the four supporting theoretical atmospheric libraries: MARCS (Gustafsson et al. 2008), PHOENIX (Brott & Hauschildt 2005), A (Shulyak et al. 2004), and OB (Lanz & Hubeny 2003, 2007). Figure 11 shows their parameter space. GSP-Phot analyzes the BP/RP spectra with a Markov-Chain Monte-Carlo approach (MCMC), which also characterizes the uncertainties (method in Andrae 2022). The reported estimates and uncertainties correspond to the 50th (median) and 16th and 84th percentiles of the (marginalized) MCMC samples, respectively. We also publish the MCMC chains with the catalog through the DataLink protocol (Dowler et al. 2015) implemented by the Gaia Archive. We compared our APs to those reported in the APOGEE (
This loss of sensitivity is typical of optical photometric metallicity indicators, which is one of the reasons behind dedicated passband designs (e.g., Jordi et al. 2010; Starkenburg et al. 2017; López-Sanjuan et al. 2021) and spectral indices (e.g., Johansson et al. 2010). Andrae (2022) interpret this as a consequence of [M/H] having the weakest impact on BP and RP spectra and thus being the parameter that is easiest to compromise.
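The percentile summary quoted above for the GSP-Phot estimates (median with 16th and 84th percentiles of the marginalized samples) can be reproduced from any set of posterior samples, for example from the published MCMC chains. The sketch below uses synthetic samples; the function name and the toy T eff distribution are illustrative assumptions and not catalog values.

```python
import numpy as np


def summarize_samples(samples):
    """Return (median, lower, upper) as the 50th, 16th, and 84th percentiles."""
    lower, median, upper = np.percentile(samples, [16, 50, 84])
    return median, lower, upper


# Synthetic, marginalized samples of one parameter (toy T eff values in K).
rng = np.random.default_rng(0)
teff_samples = rng.normal(5800.0, 50.0, size=2000)

median, lower, upper = summarize_samples(teff_samples)
print(f"T eff = {median:.0f} K (16th: {lower:.0f} K, 84th: {upper:.0f} K)")
```

For a Gaussian posterior, the 16th to 84th percentile interval corresponds approximately to ±1σ around the median; for skewed posteriors, the interval is asymmetric, which is why both bounds are reported separately.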
GSP-Spec. Analyzing RVS spectra with primarily SNR > 20 (i.e., G ≲ 16 mag), GSP-Spec estimates the stellar APs using synthetic spectra based on MARCS models and with two different algorithms ("Matisse-Gauguin" and "ANN"; see Manteiga et al. 2010; Recio-Blanco et al. 2016; Recio-Blanco 2022b for details). Unlike GSP-Phot, GSP-Spec does not exploit additional information like parallax or photometric measurements. GSP-Spec estimates uncertainties per star from the ensemble of APs from 50 Monte-Carlo realizations of the spectra: for each, GSP-Spec draws a spectrum from the noise (i.e., spectral flux covariances estimated by Seabroke et al. 2022) and derives a set of atmospheric parameters and chemical abundances (see Sect. 3.2.3). The reported lower and upper confidence values correspond to the 16th and 84th percentiles of the MC results per star, respectively. In addition, we provide quality flags to identify estimates potentially suffering from bad pixels, low signal-to-noise ratio, significant line broadening due, for instance, to stellar rotation (v sin i), poor radial velocity (RV) correction, and grid border effects. We discuss below the results from the Matisse-Gauguin and ANN algorithms, available in the astrophysical_parameters and astrophysical_parameters_supp tables, respectively.
We validated and quantified the accuracy of the Matisse-Gauguin parameters for FGK stars against literature data. We selected results with the corresponding AP flags equal to zero and compared our estimates with APOGEE-DR17 (Abdurro'uf et al. 2021), GALAH-DR3 (Buder et al. 2021), and RAVE-DR6 (Steinmetz et al. 2020). Relative to APOGEE-DR17, we find median offsets and MADs of (−32, 58) K, (−0.32, 0.12) dex, and (+0.04, 0.08) dex for T eff, log g, and [M/H], respectively. The spectra from RAVE and RVS share very similar wavelength coverage, which led Recio-Blanco (2022b) to extensively compare the GSP-Spec performance against the RAVE stellar parameters. We find similar statistics when comparing with the other catalogs (see details in Recio-Blanco 2022b). [...] Figure 12 compares the dispersion of the [M/H] distributions of member stars per cluster for the GSP-Spec Matisse-Gauguin algorithm before and after the recommended adjustments. The corrections did not affect the overall agreement, although we note that we did not apply filters based on the associated flags. We further restricted ourselves to the FGK members in 162 open clusters of Cantat-Gaudin et al. (2020) and found an average MAD of 0.11 dex per cluster. We noted a larger dispersion and a negative offset (−0.12 dex) for dwarfs. For 64 globular clusters ([M/H] ≤ −0.50 dex), the typical dispersion per cluster is 0.20 dex with a median offset of +0.12 dex. However, these statistics describe the data regardless of the quality flags. If we require the [M/H] flag bit zero to be unset (see details in Recio-Blanco 2022b), the metallicities agree better with the literature, with absolute offsets lower than 0.10 dex and typical dispersions of 0.075 dex for open clusters and 0.05 dex for globular clusters. Note, however, that the filtering also reduces the number of stars significantly, leaving us with 40% of the 2 271 members of open clusters and only 4% of the 1 224 members of globular clusters. 
Most of the removed sources have low-SNR spectra, largely because GCs are far away. These settings also remove fast rotators, hot stars, and some K and M giants in OCs. Finally, they remove stars near the model grid borders, predominantly hot dwarfs in OCs and cool giants in GCs. However, we should not conclude from this test that the performance depends on metallicity, as metal-poor stars are rare and predominantly known in GCs.
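The median offsets and MADs quoted in these comparisons can be computed as in the following minimal sketch. It assumes that the MAD is the unscaled median absolute deviation of the residuals about their median; the function name `offset_and_mad` and the synthetic values are illustrative assumptions, not the validation code used for the catalog.

```python
import numpy as np

def offset_and_mad(ours, reference):
    """Median offset and unscaled MAD of the residuals (ours - reference)."""
    residuals = np.asarray(ours, dtype=float) - np.asarray(reference, dtype=float)
    residuals = residuals[np.isfinite(residuals)]  # drop unmatched/NaN entries
    offset = np.median(residuals)
    mad = np.median(np.abs(residuals - offset))
    return offset, mad

# Synthetic [M/H] values (dex) standing in for a catalog cross-match
ours = np.array([-0.10, 0.05, 0.20, -0.30, 0.00])
reference = np.array([-0.05, 0.00, 0.10, -0.20, 0.10])
offset, mad = offset_and_mad(ours, reference)
```

If a Gaussian-equivalent dispersion is preferred, the MAD is commonly multiplied by about 1.4826; the sketch deliberately leaves the value unscaled.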
The artificial neural network (ANN) algorithm in GSP-Spec provides a different parametrization of the RVS spectra, independent of the Matisse-Gauguin approach. In contrast with Matisse-Gauguin's forward-modeling approach, ANN projects the RVS spectra onto the AP label space. We trained the network on the same grid of synthetic spectra as the Matisse-Gauguin algorithm, in this case adding noise according to the different signal-to-noise scales in the observed spectra (Manteiga et al. 2010). ANN's internal errors are of the order of a fraction of the model-grid resolution and show no significant bias, confirming the consistency of the ANN projections on the synthetic spectra grid. In Recio-Blanco (2022b), we compared the ANN results with literature values and found biases similar to those of Matisse-Gauguin. We therefore also provide calibration relations for T eff, log g, [M/H], and [α/Fe] to correct these biases. Figure 13 compares the APs from both GSP-Spec algorithms for a sample of 1 084 427 sources in Gaia DR3 with estimates from both. We restricted this comparison to sources with good flag status: the first thirteen and first eight values in astrophysical_parameters.flags_gspspec and astrophysical_parameters_supp.flags_gspspec_ann, respectively, equal to zero. Overall, the algorithms agree with each other. Once we apply the calibration relations to the estimates of both algorithms, we find, for spectra with SNR ≥ 150, median deviations of −94 K, −0.05 dex, 0.1 dex, and 0.04 dex for T eff, log g, [M/H], and [α/Fe], respectively. For the same sample, we find MAD values of 93 K, 0.11 dex, 0.10 dex, and 0.05 dex, respectively.
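The flag selection described above can be sketched as follows. The sketch assumes that each flag field is available as a character string with one digit per flag; the helper `leading_flags_are_zero` and the example rows are hypothetical and serve only to illustrate the selection logic.

```python
def leading_flags_are_zero(flag_string, n_leading):
    """True if the first n_leading characters of a flag string are all '0'."""
    if flag_string is None or len(flag_string) < n_leading:
        return False  # missing or too-short strings do not pass the selection
    return all(char == "0" for char in flag_string[:n_leading])

# Hypothetical rows; real values would come from the two archive tables
rows = [
    {"flags_gspspec": "0" * 41, "flags_gspspec_ann": "0" * 8},
    {"flags_gspspec": "0000000000001" + "0" * 28, "flags_gspspec_ann": "00000000"},
    {"flags_gspspec": "0" * 41, "flags_gspspec_ann": "00100000"},
]

# Keep sources whose first 13 Matisse-Gauguin flags and first 8 ANN flags are zero
selected = [
    row for row in rows
    if leading_flags_are_zero(row["flags_gspspec"], 13)
    and leading_flags_are_zero(row["flags_gspspec_ann"], 8)
]
```

Only the first row passes both conditions: the second has its thirteenth Matisse-Gauguin flag set, and the third has a non-zero ANN flag.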
[...] Numbers at the bottom indicate how many estimates were available for representing the distribution. Ideally, all predictions are within a small interval, which agrees with the triangles. We did not filter the estimates using the flags to keep enough stars per cluster, but nevertheless, the agreement is remarkable.
GSP-Phot and GSP-Spec overlaps. Figure 14 compares the temperature and gravity estimates from GSP-Phot and GSP-Spec. The T eff estimates agree well overall, but some outliers remain visible in the plot, most likely originating from the sensitivity of GSP-Phot to low-quality parallaxes. In particular, we traced the plume at log 10 teff_gspphot ∼ 3.8 back to variable stars (see Andrae 2022 for details). On this sample, we found a median offset of 98 K and a MAD of 246 K. In contrast, the log g estimates differ strongly and systematically between the modules. The recalibration prescription from Recio-Blanco (2022b) mitigates the differences but does not remove them completely. We found a median offset of 0.35 dex and a MAD of 0.34 dex. Recio-Blanco (2022b) identified a similar trend in the GSP-Spec log g values when comparing to those of the literature (see their Fig. 10).
Solar analog stars are the stars closest to the Sun in temperature, gravity, and metallicity. We selected 200+ spectroscopic solar analogs from the literature (mostly from Datson et al. 2015 and Tucci Maia et al. 2016) with T eff within ±100 K and log g and [Fe/H] within ±0.1 dex of the solar values. We compare the biases and dispersions of the GSP-Phot and GSP-Spec Matisse-Gauguin APs on this sample of stars. We note that solar analogs are dwarf stars, which are affected little or not at all by the Matisse-Gauguin corrections mentioned above. We find that GSP-Phot underestimates T eff by between 30 K (PHOENIX) and 90 K (MARCS), with a standard deviation σ ∼ 100 K in both cases. 
In contrast, GSP-Spec estimates have essentially no T eff bias (+10 K) but slightly larger dispersion (σ ∼ 130 K). Irrespective of the atmosphere library (libname_gspphot), GSP-Phot underestimated the log g values by 0.12 dex, but with a standard deviation of σ ∼ 0.14 dex they remain statistically compatible with the solar value. GSP-Spec results are as accurate as those from GSP-Phot around the solar locus, but they present a larger dispersion of 0.42 dex (calibration of log g does not change this value). We recall that GSP-Spec uses only the RVS spectra as input, while GSP-Phot also uses parallaxes and constraints from isochrones. [M/H] values are nearly solar for GSP-Spec with an offset of 0.1 dex and σ ∼ 0.05 (again, without significant impact of the recommended corrections), but we found larger offsets for GSP-Phot when using PHOENIX (−0.4 ± 0.2 dex) and MARCS models (−0.2 ± 0.2 dex). Andrae (2022) discussed the systematic and significant discrepancies between APs based on the PHOENIX and MARCS libraries. For solar-like stars, they found substantial differences in the original atmosphere models that are still under investigation at the time of writing this manuscript.
Ideally, GSP-Phot and GSP-Spec would return results in perfect agreement with each other. In practice, they do not, but rather complement each other. The two modules analyze data with different spectroscopic resolutions and wavelength ranges. To first order, GSP-Phot relies on the stellar continuum over the whole optical range from the BP/RP low-resolution spectra (from 330 to 1050 nm). In contrast, GSP-Spec investigates atomic and molecular lines in the continuum-normalized medium-resolution spectra in the narrow infrared window of RVS (from 846 to 870 nm). Hence the modules analyze different aspects of the light emitted from stars. Additionally, interstellar extinction significantly affects the BP and RP spectra, but it affects RVS data only in the region of the diffuse interstellar band around 860 nm (e.g., Schultheis 2022). Therefore, GSP-Phot's AP determination depends significantly on determining the amount of extinction correctly, while extinction has little impact on GSP-Spec's AP inference (see Sect. 3.4).
[...] We plotted in gray the ∼ 3.2 million sources in Gaia DR3 with both astrophysical_parameters.teff_gspphot, teff_gspspec and astrophysical_parameters.logg_gspphot, logg_gspspec. The highlighted distribution corresponds to those with the first thirteen values in flags_gspspec equal to zero (∼ 1 million sources). We indicated the identity lines and the identified divergence in log g between the modules. We note that the GSP-Spec recommended calibration of log g does not significantly affect this comparison.
ESP-HS. Stars hotter than 7 500 K (O, B, and A-type stars) undergo a specific analysis by the ESP-HS module. It operates in two modes: simultaneous analysis of BP, RP, and RVS spectra ("BP/RP+RVS"), or BP and RP only. ESP-HS first estimates the star's spectral type (see footnote 11) from its BP and RP spectra (astrophysical_parameters.spectraltype_esphs: CSTAR, M, K, G, F, A, B, and O) and then further analyzes only the O, B, and A-type stars. Hot stars of these spectral types are inherently massive and short-lived according to stellar evolution, and consequently they are young stars (see footnote 12). Hence, ESP-HS assumes a solar chemical composition and does not provide any metallicity estimate. See module details in the Gaia DR3 online documentation, Sect. 3.3.8. For the stars hotter than 7 500 K, the overlap between GSP-Phot and ESP-HS allows us to cross-validate our effective temperature estimates. We find that ESP-HS tends to provide T eff greater than the GSP-Phot values due to different internal ingredients. We further quantify the potential systematics from ESP-HS with respect to catalogs in the literature. Figures 15 and 16 show the residuals relative to literature compilations for T eff and log g, respectively. Below 25 000 K, we obtain reasonable agreement of ESP-HS's temperatures with the catalog estimates. Overall, the dispersion in T eff increases with temperature from ∼ 300 K for the A-type stars to 500 − 2 000 K for B-type stars. Above 25 000 K, we find, relative to the T eff vs. spectral type scale of Weidner & Vink (2010), a systematic underestimation of our temperatures by 1 000 K to 5 000 K for the Galactic O-type stars, while it can be up to 10 000 K for their LMC target samples. However, we also recall that this particular LMC sample has subsolar metallicity, i.e., it lies outside the model limits of ESP-HS. Similarly, the dispersion in log g increases from about 0.2 dex in the A-type temperature range to ∼0.4 dex for the O-type stars. More detailed numbers for the offset and dispersion of T eff and log g relative to the catalogs considered in Fig. 15 and Fig. 16 are available in Gaia Collaboration (2022). We found that ESP-HS underestimated uncertainties by a factor of 5 to 10 in the BP/RP+RVS mode while reporting the correct order of magnitude in the BP/RP-only mode. We did not inflate the reported uncertainties in the Gaia DR3 catalog accordingly. The first digit of astrophysical_parameters.flags_esphs reports which mode the ESP-HS estimates come from (i.e., 0: "BP/RP+RVS", 1: "BP/RP-only"). We emphasize that we filtered out a significant number of bad fits of ESP-HS, but known outliers remain present (e.g., T eff > 50 000 K). In addition, ESP-HS processed white dwarfs (WD) despite not using a suitable library. Finally, some classes of stars intrinsically cooler than 7 500 K (e.g., RR Lyrae stars) were misclassified as O, B, or A-type stars, and ESP-HS analyzed and reported on them assuming a correct classification.
11 Originally produced by ESP-HS, the spectral type classification procedure moved to the ESP-ELS module for practical reasons.
12 We assume that our data are dominated by disk stars, and we therefore ignore horizontal branch stars from the halo. ESP-HS also does not include models for white dwarf atmospheres.
ESP-UCD. At the faint end of the luminosity distribution, we transition between the "standard" stars burning hydrogen and the brown dwarfs, which are not massive enough to sustain hydrogen fusion. We define ultracool dwarfs (UCDs) as sources of spectral type M7 or later (Kirkpatrick et al. 1997), which corresponds to T eff ≤ 2 656 K according to the calibration by Stephens et al. (2009). Using a combination of parallaxes, color indices, and RP spectra, we identified 94 158 UCD candidates in Gaia DR3 with T eff < 2 700 K, despite the Gaia instruments being suboptimal for observing these intrinsically faint sources. We note that, unsurprisingly, the flux in the BP band is negligible (or even absent) for these very red and faint sources. The adopted threshold (2 700 K) is slightly hotter and more inclusive than the quoted 2 656 K to take into account the uncertainties in the T eff estimates. Creevey (2022a) detail our characterization module, the complete UCD selection criteria, our quality filters, and our training set definition. ESP-UCD produced effective temperatures for 94 158 UCD candidates in Gaia DR3, the vast majority of them (78 108) having T eff > 2 500 K. Gaia DR3 provides the temperature estimates from ESP-UCD (astrophysical_parameters.teff_espucd), but it does not include the corresponding log g or [M/H] estimates because of the poor performance of ESP-UCD on these properties and a severe lack of literature references in this regime. We plan to publish them in Gaia DR4. ESP-UCD provides a flag (astrophysical_parameters.flags_espucd) that encodes the quality of the data in one of three categories, based on the Euclidean distance between a given RP spectrum and the closest template in the training set and on the signal-to-noise ratio of the integrated RP flux. 
Quality flag 0 corresponds to the best RP spectra, with distances below 0.005; quality flag 1 corresponds to sources with distances between 0.005 and 0.01 and SNR > 30 (relative uncertainties σ RP / f RP <= 0.03); and finally quality flag 2 corresponds to sources with distances between 0.005 and 0.01 but SNR < 30.
(The Gaia DR3 online documentation provides a more detailed description of the quality flags.) Figure 17 shows the color-absolute magnitude diagram (CAMD) of all the UCD candidates we detected, for the three ESP-UCD quality categories. We find good consistency between the CAMD positions and the inferred effective temperatures: as expected for these stars, their temperatures strongly correlate with M G. We note that Fig. 17 uses the inverse parallax as a distance proxy to approximate M G, which is reasonable because 95% of the sources have a parallax SNR ϖ/σ ϖ > 5 (the median parallax SNRs in the three quality categories, 0, 1, and 2, are 25, 11, and 7.5, respectively). Overall, as the quality degrades, the vertical sequence spreads and becomes noisier with respect to the temperature scale.
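Using the inverse parallax as a distance proxy amounts to the standard distance-modulus relation M_G = G + 5 log10(ϖ / mas) − 10, which the following minimal sketch evaluates. The function name and the example values are illustrative assumptions, and the sketch neglects extinction, which is not necessarily how Fig. 17 was produced.

```python
import math

def absolute_magnitude_from_parallax(g_mag, parallax_mas):
    """Approximate M_G using d = 1000 / parallax_mas (pc), neglecting extinction."""
    if parallax_mas <= 0.0:
        raise ValueError("the inverse-parallax proxy requires a positive parallax")
    return g_mag + 5.0 * math.log10(parallax_mas) - 10.0

# Hypothetical faint, nearby source: G = 18 mag, parallax = 20 mas (d = 50 pc)
m_g = absolute_magnitude_from_parallax(18.0, 20.0)
```

The proxy degrades quickly at low parallax SNR, which is why the high fraction of sources with parallax SNR above 5 matters for this diagram.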
More quantitatively, we compare our inferred temperatures with those of the Gaia UltraCool Dwarf Sample (GUCDS; Smart et al. 2017, 2019). We translated the GUCDS spectral types into temperatures using the calibration by Stephens et al. (2009), and we found an RMS of 103 K and a MAD of 88 K for the entire sample (see Fig. 18). We note that these statistics include the low-metallicity and young sources. Figure 19 compares the ESP-UCD effective temperatures with SIMBAD spectral types when available; this sample includes and extends the GUCDS. We indicate the two spectral type-T eff calibration relations by Stephens et al. (2009) for optical and infrared spectral types to provide a comparison reference. These two relations are those we used to define the empirical training set of the ESP-UCD module. We note that the spectral type M6 corresponds to an effective temperature of ∼ 2 800 K, which is hotter than the ESP-UCD parameter space limit. However, ESP-UCD attributed cooler T eff values to some of these stars; we published these values, which leads to the apparent negative bias for the M6V bin in Fig. 19. Creevey (2022b, Sect. 7) further explore the stellar population of UCDs in the Galaxy and their properties.

Secondary atmospheric estimates: stellar classes, rotation, emission, activity
Classification. There are four main stellar classifications from Apsis (see fields in Table D.3). First, DSC primarily distinguishes between extragalactic sources (quasars and galaxies) and stars (single stars, physical binaries, and white dwarfs). Users can classify sources using DSC's probabilities that a source belongs to a given class. However, 99% of Gaia DR3 sources processed by Apsis are most certainly stars (or binaries). Hence DSC's classification is not the most relevant for stellar objects (see Bailer-Jones et al. 2021; Creevey 2022a). Second, OA measures similarities between the observed BP and RP spectra of different sources to produce an unsupervised classification using self-organizing maps (SOMs; Kohonen 2001). Once labeled, these maps can be used to find groups of similar stars (details in Creevey 2022a) and peculiar or outlier sources (see Sect. 3.6). Finally, the user might prefer the spectral types from ESP-HS and the ESP-ELS classification of emission-line star types for stellar sources. This section focuses on the ESP-HS and ESP-ELS classifications tailored to stellar objects.
ESP-HS estimates the spectral type of a source from its BP/RP spectra. While primarily focused on hot stars, it provides the following main classes: CSTAR, M, K, G, F, A, B, and O. From a cross-match with the LAMOST OBA catalog of Xiang et al. (2021), we find that ESP-HS recovered 62% of the Galactic A- and B-type stars (assuming that catalog is complete). In contrast, we find only 186 (30%) of the 612 Galactic O-type stars published in the Galactic O-type Stars catalog (GOSC, Maíz Apellániz et al. 2013). This low fraction reflects the persisting difficulties of deriving reliable hot star APs from Gaia BP and RP spectra.
Article number, page 13 of 37 A&A proofs: manuscript no. gdr3_stellar_aps
Fig. 18. Comparison of the effective temperatures (in Kelvin) between ESP-UCD estimates and those obtained by converting the GUCDS spectral types using the calibration by Stephens et al. (2009). Black circles correspond to quality 0, dark gray squares to quality 1, and light gray triangles to quality 2. Cyan symbols denote low-metallicity sources and red symbols denote young sources.
ESP-ELS identifies the BP and RP spectra that present emission features and classifies the corresponding target into one of the seven ELS classes listed in Table 2. We recall that ESP-ELS processed stars with G < 17.65 mag (see Sect. 2). The ESP-ELS classification as ELS relies on detecting line emission and primarily on measuring the Hα pseudo-equivalent width (see below). We tagged particular failure modes with the quality flag (astrophysical_parameters.classlabel_espels_flag; see Table 2). Primarily, this flag takes values ranging from 0 (best) to 4 (worst) depending on the relative strength of the two most probable classes (ESP-ELS publishes the random forest classifier class probability estimates in astrophysical_parameters.classprob_espels_wcstar, classprob_espels_wnstar, etc.).
In addition, this flag indicates when the GSP-Phot AP values we used to make the classification were removed by the final Gaia DR3 filtering or when those APs disagreed with the spectral type estimated by ESP-ELS. These two modes correspond to classlabel_espels_flag first bit 1 and 2, respectively.
Stellar rotation. While deriving the astrophysical parameters, ESP-HS also measures the line broadening in the RVS spectrum by adopting a rotation kernel. This by-product of the ESP-HS processing corresponds to a projected rotational velocity (v sin i; astrophysical_parameters.vsini_esphs) obtained on co-added mean RVS spectra (Seabroke et al. 2022). It therefore differs from gaia_source.vbroad, obtained on epoch data by the radial velocity determination pipeline (Frémat et al. 2022). The ESP-HS estimate suffers from the same limitations as vbroad (mostly the limited resolving power of the RVS), compounded by the poor v sin i-related information for OBA stars in this wavelength domain. In addition, the determination of vsini_esphs is affected by the higher uncertainty of the epoch RV determination expected for stars hotter than 10 000 K (Blomme et al. 2022), and by the use of a Gaussian mean ALong-scan LSF with a resolving power of 11 500 (Creevey 2022a, Sect. 2.2).
In Fig. 20, we compare the v sin i measurements by ESP-HS to those obtained in the framework of the LAMOST survey for OBA stars, which presents the largest overlap with the results of ESP-HS compared to other surveys. The agreement rapidly decreases with magnitude and effective temperature, as the features most sensitive to rotational broadening disappear from the RVS domain. The half inter-quantile dispersion (i.e., 14.85 % - 85.15 %) varies from 25 km s −1 to 40 km s −1 in the A-type T eff domain when the magnitude G ranges from 8 to 12, respectively. At hotter temperatures, it varies from 60 km s −1 to 75 km s −1 at G = 8 and G = 12, respectively.
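A half inter-quantile dispersion of this kind can be computed as in the minimal sketch below. It assumes that the statistic is half the difference between the 85.15% and 14.85% quantiles of the v sin i residuals (a pair symmetric about the median); the function name and the synthetic residuals are illustrative assumptions rather than the validation code itself.

```python
import numpy as np

def half_interquantile_dispersion(residuals, lower=0.1485, upper=0.8515):
    """Half the difference between the upper and lower quantiles of the residuals."""
    residuals = np.asarray(residuals, dtype=float)
    residuals = residuals[np.isfinite(residuals)]  # drop missing measurements
    q_low, q_high = np.quantile(residuals, [lower, upper])
    return 0.5 * (q_high - q_low)

# Synthetic v sin i residuals (km/s) with a Gaussian scatter of 30 km/s
rng = np.random.default_rng(0)
residuals = rng.normal(loc=0.0, scale=30.0, size=100_000)
dispersion = half_interquantile_dispersion(residuals)
```

For Gaussian residuals this statistic is close to, but slightly larger than, the standard deviation, and it is much less sensitive to outliers.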
Hα emission. The ESP-ELS classification of a star as ELS relies primarily on measuring the Hα pseudo-equivalent width (pEW; astrophysical_parameters.ew_espels_halpha). However, measuring the Hα emission line is challenging due to the low resolving power of BP and RP spectra and the steep loss of transmission at that wavelength (blue side). We compared our estimates with published measurements ([...] Newton et al. 2017; Silaj et al. 2010; Manoj et al. 2006; see Fig. 21). We found a general consistency between the estimates, except for stars cooler than 4 000 K, for which overlapping molecular bands significantly alter the local continuum. We mitigated this effect using synthetic spectra and the GSP-Phot APs. However, mismatches between the observed and theoretical spectra, and some systematics in the APs we used to select the synthetic spectra, led us to misclassify active M dwarfs and T Tauri stars. For the hotter targets, we attempted to link the ESP-ELS estimate, pEW(Hα), to the published measurements presented in Fig. 21 with a linear relation, for which Table 3 provides the coefficients, α and β, with their uncertainties. We indicated the fitted relations by the orange lines in Fig. 21.
The activity index is the excess of the Ca II IRT lines obtained by comparing the observed RVS spectrum with a purely photospheric model (assuming radiative equilibrium). The latter depends on a set of T eff, log g, and [M/H] from either GSP-Spec or GSP-Phot (activityindex_espcs_input set to "M1" or "M2", respectively), and on a line-broadening estimate, gaia_source.vbroad, when available. We measure the excess equivalent width in the core of the Ca II IRT lines by computing the observed-to-template ratio spectrum in a ±∆λ = 0.15 nm interval around the core of each of the triplet lines. This measurement traces the stellar chromospheric activity and, in more extreme cases, the mass accretion rate in pre-main-sequence stars. Lanzafame (2022) detail the ESP-CS module, method, and scientific validation.

Chemical Abundances
In Gaia DR3, GSP-Spec, and more specifically the Matisse-Gauguin algorithm, provides 13 chemical abundance ratios from 12 individual elements (N, Mg, Si, S, Ca, Ti, Cr, Fe, Ni, Zr, Ce, and Nd; with the Fe I and Fe II species) as well as equivalent-width estimates of the CN line at 862.9 nm. These chemical indexes rely on the line list and models from Contursi et al. (2021) and Recio-Blanco (2022b), respectively. For each of the 13 abundance estimates, GSP-Spec reports two quality flag bits, a confidence interval, the number of spectral lines used, and the line-to-line scatter (when more than one line [...]). Figure 22 shows the spatial extent of the abundance estimates in a top-down Galactic view. The coverage indicates that Gaia DR3 provides abundance estimates for a significant fraction of the stars observed by Gaia within 4 kpc as indicated by the 99% quantile contour. The contours indicate the 50, 90, and 99% quantiles of the distribution, corresponding to ∼ 1, 3, and 6 kpc, respectively. [...] abundance estimates in the context of the chemistry and Milky Way structure, stellar kinematics, and orbital parameters. The validation of individual abundances is challenging as no fundamental standards exist for stars other than the Sun. Particular attention is needed when comparing with literature data, which suffer from different zero points and underlying assumptions (e.g., an assumed solar-scaled composition).
We expect our derived abundances to have the usual limitations discussed in the literature, ranging from model assumptions (e.g., 1D or 3D model atmospheres, hydrostatic equilibrium, local thermodynamic equilibrium, the atomic line list) to observational effects (e.g., possible line blends, limited resolution of RVS, instrumental noise). These effects can lead to systematic offsets in the abundance determinations that depend on the atmospheric parameters. However, we could estimate (and correct) these systematic offsets using the GSP-Spec outputs alone and specific samples of stars. For instance, we selected stars from the immediate solar neighborhood (±250 pc from the Sun), with metallicities close to solar (±0.25 dex) and velocities close to the local standard of rest (±25 km/s). In this sample, any ratio of abundances (i.e., [X 1 /X 2 ] for two elements X 1 and X 2 ) deviating from zero (i.e., the solar value) indicates systematics independent of the atmospheric parameters. In Recio-Blanco (2022b), we detail our samples and analysis, and we provide log g-dependent calibration relations for 10 of the 13 chemical abundances, in the form of polynomials (of the third or fourth order). In particular, Table 3 of Recio-Blanco (2022b) lists the coefficient values as well as the log g intervals over which the calibration is

Evolutionary APs
Gaia DR3 provides several parameters describing the evolution of a star that we group in two sets. GSP-Phot and FLAME produce these parameters (see Table D.4). We emphasize that FLAME produces two sets of estimates: one using GSP-Phot's APs and one using GSP-Spec's obtained from the BP/RP and RVS spectra analysis, respectively, in addition to using photometry and distance (or parallax).
We first discuss in Sect. 3.3.1 the "observed" parameters: luminosity L, absolute magnitude M G, radius R, and gravitational redshift rv GR. These are relatively model-independent, in contrast with the mass M, age τ, and evolutionary stage, which strongly depend on evolution models. We discuss the latter in Sect. 3.3.2.
3.3.1. Radius, luminosity, absolute magnitude, and gravitational redshift
Stellar radius. From the analysis of the BP and RP spectra, GSP-Phot estimates the stellar radii (astrophysical_parameters.radius_gspphot) and the distances (astrophysical_parameters.distance_gspphot). We validate the ratio of twice the estimated radius to the estimated distance, 2R/d, by comparing it with interferometric measurements of angular diameters. Figure 24 presents the excellent agreement with the samples from Boyajian et al. (2012a,b, 2013), Duvert (2016), and van Belle et al. (2021). We note that all of these targets are brighter than G = 9.6 mag and more than 90% of them have high-quality parallaxes with ϖ/σ ϖ > 20, such that the GSP-Phot results should be very reliable (Andrae 2022). FLAME also provides radius estimates with a different approach, based on the APs from either GSP-Phot or GSP-Spec combined with the Gaia photometry and parallaxes. The top panels in Fig. 25 compare astrophysical_parameters.radius_flame and astrophysical_parameters_supp.radius_flame_spec with asteroseismic radii for giants from Pinsonneault et al. (2018). The agreement is at the 1% level with a scatter of 4%. Comparisons with other similar catalogs show agreement at the 1-2% level (see further comparisons in the online documentation).
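To compare 2R/d with interferometric angular diameters, the ratio has to be expressed as an angle. The sketch below shows one way to do this conversion to milliarcseconds; the function name is hypothetical, and the constants (IAU 2015 nominal solar radius, IAU 2012 astronomical unit) are choices made for this example rather than values taken from the validation itself.

```python
import math

R_SUN_M = 6.957e8                                 # nominal solar radius (m), IAU 2015
AU_M = 149_597_870_700.0                          # astronomical unit (m), IAU 2012
PARSEC_M = 648_000.0 / math.pi * AU_M             # parsec (m)
RAD_TO_MAS = 180.0 / math.pi * 3600.0 * 1000.0    # radians to milliarcseconds

def angular_diameter_mas(radius_rsun, distance_pc):
    """Angular diameter 2R/d in mas (small-angle approximation)."""
    theta_rad = 2.0 * radius_rsun * R_SUN_M / (distance_pc * PARSEC_M)
    return theta_rad * RAD_TO_MAS

# Hypothetical Sun-like star at 10 pc
theta = angular_diameter_mas(1.0, 10.0)
```

With radius_gspphot in solar radii and distance_gspphot in parsecs, the result is directly comparable to a limb-darkened angular diameter quoted in mas, up to the limb-darkening conventions of each interferometric catalog.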
Bolometric luminosity. FLAME estimates the bolometric luminosities, L, using bolometric corrections based on the GSP-Phot and GSP-Spec APs. We compared the L estimates with bolometric fluxes from Stevens et al. (2017), selecting a random subset of 90 000 main-sequence sources with Gaia DR3 parallaxes (panels in the second row of Fig. 25). We found that astrophysical_parameters.lum_flame and astrophysical_parameters_supp.lum_flame_spec agree well with the literature, with a median offset of 2-3% and a dispersion of around 5-6%. We also compared our estimates with other catalogs, such as Casagrande et al. (2011), finding a median offset of +0.01 L ⊙ and a similar dispersion.
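The link between an absolute magnitude, a bolometric correction, and a luminosity can be sketched as follows. The sketch assumes the convention BC_G = M_bol − M_G and the IAU 2015 solar absolute bolometric magnitude of 4.74 mag; the function names are hypothetical, and the code illustrates the standard relation rather than the FLAME implementation.

```python
import math

M_BOL_SUN = 4.74  # solar absolute bolometric magnitude (IAU 2015 Resolution B2)

def luminosity_lsun(abs_mag_g, bc_g):
    """Luminosity in solar units from M_G and BC_G, with M_bol = M_G + BC_G."""
    m_bol = abs_mag_g + bc_g
    return 10.0 ** (-0.4 * (m_bol - M_BOL_SUN))

def abs_mag_g_from_luminosity(lum_lsun, bc_g):
    """Inverse relation: M_G from a luminosity (solar units) and BC_G."""
    if lum_lsun <= 0.0:
        raise ValueError("luminosity must be positive")
    return M_BOL_SUN - 2.5 * math.log10(lum_lsun) - bc_g

# Hypothetical star: M_G = 5.0 mag with BC_G = -0.1 mag
lum = luminosity_lsun(5.0, -0.1)
m_g_back = abs_mag_g_from_luminosity(lum, -0.1)
```

The inverse function is the relation that turns a FLAME-like luminosity and bolometric correction back into an absolute magnitude, so the two functions round-trip by construction.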
Absolute magnitude M G. Apsis provides two sets of absolute magnitudes: one from GSP-Phot, obtained from the direct analysis of the BP and RP spectra, G magnitude, and parallax; and one from FLAME, if we use its luminosity L and the bolometric correction [...] Figure 26 compares these two magnitude estimates. We find that most of the stars follow the bisector, indicating consistent results. However, we find a median absolute deviation of the order of 0.1 mag, and some artifacts. For instance, there are a couple of vertical stripes (e.g., at mg_gspphot = 3 mag), which could indicate anomalies due to GSP-Phot's models. In general, we find that FLAME tends to overestimate the luminosity, and hence to underestimate M G, when it uses parallaxes with fractional uncertainties on the order of 15-20%. In contrast, but not surprisingly, we find a stronger agreement when FLAME uses distance_gspphot than when it uses the parallax as a distance proxy. The field flags_flame indicates which distance proxy led to the luminosity estimates.
Gravitational redshift. [...] Typical values range from 0.05 to 0.8 km s −1. Figure 27 compares gravredshift_flame and gravredshift_flame_spec. We found a good consistency between the two flavors, with median offset values of −0.05 km s −1. This small difference directly reflects the different input data used to produce the values: log g and T eff from GSP-Spec and GSP-Phot, and R from FLAME. Additionally, from a random subset of 2 million stars from Gaia DR3, we selected solar-analog stars as those for which GSP-Phot gave T eff within 100 K and log g within 0.2 dex of the Sun's values. This selection contained 46 667 stars, with a mean rv GR of 588 ± 15 m s −1, in agreement with the expected value of 600.4 ± 0.8 m s −1 for the Sun (Roca Cortés & Pallé 2014). We repeated this test for the GSP-Spec-based result and obtained a mean rv GR of 590 ± 8 m s −1. Although this second sample contained only 386 sources, it also agrees well with the known solar value.

Mass, age and evolution stage
This section focuses on the most intrinsic evolution parameters: the mass M, age τ, and evolution stage. These are unique products of FLAME (with both GSP-Phot- and GSP-Spec-based flavors). These parameters are strongly model-dependent as they directly relate to the stellar evolution models, here the BASTI models (Hidalgo et al. 2018). In addition, we emphasize that FLAME assumes solar metallicity during the determination of those parameters. Hence, we recommend using those estimates cautiously for stars with [M/H] < −0.5 dex.
Stellar masses. We compared FLAME's masses with those from Casagrande et al. (2011) for main-sequence stars (see third panel in Fig. 25). Although we do not expect a significant influence, we note that Casagrande et al. (2011) also used the BASTI models in their analysis, but in an older version (Pietrinferni et al. 2004). We find excellent agreement between the two estimates, with a MAD of 0.002 M ⊙ and a scatter of 0.042 M ⊙. Overall, FLAME produces results comparable to the literature, with some outliers or disagreements with other catalogs that we traced back to the different input T eff or log g estimates. In particular, one can reduce these outliers for giants by restricting the M estimates (mass_flame, mass_flame_spec) to those with (i) 1.0 < M < 2.0 M ⊙ and (ii) τ > 1.5 Gyr.
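The two conditions recommended above for giants translate into a simple selection, sketched below. The helper name `keep_giant_mass` and the example rows are hypothetical; masses are assumed to be in solar masses and ages in Gyr, as in the text.

```python
def keep_giant_mass(mass_msun, age_gyr):
    """True if 1.0 < M < 2.0 Msun and age > 1.5 Gyr."""
    if mass_msun is None or age_gyr is None:
        return False  # missing estimates never pass the selection
    return 1.0 < mass_msun < 2.0 and age_gyr > 1.5

# Hypothetical giants with (mass_flame, age_flame) pairs
giants = [
    {"mass_flame": 1.3, "age_flame": 4.2},   # passes both conditions
    {"mass_flame": 2.6, "age_flame": 0.6},   # too massive and too young
    {"mass_flame": 1.1, "age_flame": 1.0},   # too young
    {"mass_flame": None, "age_flame": 5.0},  # no mass estimate
]
kept = [g for g in giants if keep_giant_mass(g["mass_flame"], g["age_flame"])]
```

NaN values also fail the selection, because every comparison involving NaN evaluates to False in Python.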
Stellar ages. Overall, we find agreement between the ages from FLAME and the literature for non-evolved stars (i.e., main-sequence stars). The bottom panel of Fig. 25 compares astrophysical_parameters.age_flame and astrophysical_parameters_supp.age_flame_spec with ages from Casagrande et al. (2011). In this comparison, we found a mean offset on the order of 0.1 to 0.3 Gyr with a dispersion around 0.25 Gyr. However, it is more difficult to estimate ages reliably for giant stars because their ages depend strongly on their fitted mass. In addition, FLAME relies only on L and T eff to obtain ages and masses, a combination that suffers from significant degeneracies, and the ages rely heavily on the solar abundance assumption in the FLAME processing. One can trace most differences with the literature to the different input T eff and L estimates. To support this statement, we compared FLAME's ages to those we obtained with the SPinS public code (Lebreton & Reese 2020). We generated random sets of 600 stars with the SPinS code using the same Gaia DR3 APs that FLAME uses and compared the output ages in four different magnitude intervals. Figure 28 compares these estimates with astrophysical_parameters.age_flame. For main-sequence stars, the two sets of ages always agree within 1σ. The agreement is poorer for evolved stars, but it remains within 3σ (see Creevey & Lebreton 2022 for more details).
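The 1σ and 3σ statements above refer to differences normalized by the joint uncertainties of the two estimates. A minimal sketch of that normalization follows, assuming symmetric, independent Gaussian uncertainties (an assumption; the catalog reports lower and upper confidence bounds). Function names and numbers are illustrative.

```python
import numpy as np


def normalized_difference(x1, sigma1, x2, sigma2):
    """(x1 - x2) / sqrt(sigma1^2 + sigma2^2), element-wise."""
    x1, sigma1, x2, sigma2 = (
        np.asarray(v, dtype=float) for v in (x1, sigma1, x2, sigma2)
    )
    return (x1 - x2) / np.sqrt(sigma1**2 + sigma2**2)


def fraction_within(z, n_sigma):
    """Fraction of normalized differences with |z| <= n_sigma."""
    z = np.asarray(z, dtype=float)
    return float(np.mean(np.abs(z) <= n_sigma))


# Made-up ages [Gyr] from two codes, with their uncertainties.
z = normalized_difference(
    [4.0, 6.0, 2.0], [0.3, 0.4, 0.3],
    [4.4, 5.0, 2.0], [0.4, 0.3, 0.4],
)
frac_1sigma = fraction_within(z, 1.0)
frac_3sigma = fraction_within(z, 3.0)
print(z, frac_1sigma, frac_3sigma)
```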
Section 3.5 presents further analysis of the masses and ages using clusters, together with further comparisons of mass and age with external data. We also present the analysis of the turn-off ages of some clusters in the online documentation.
Evolution stage. This parameter is an integer that takes values between 100 and 1 300, representing the time step along a stellar evolutionary sequence. To first order, we tag main-sequence stars with values between 100 and 420, subgiant stars with values between 420 and 490, and giants with values above, as defined in the BASTI models (Hidalgo et al. 2018). Figure 29 represents the evolution stage for members of four open star clusters (top panels; roughly solar metallicity) and four metal-poor globular clusters (bottom panels). We took the system members from Gaia Collaboration et al. (2018a). These clusters were selected to contain a statistically significant number of stars in the three evolution phases estimated from FLAME. Overall, the main-sequence and giant evolution stages cover the expected color-magnitude space. Although less numerous, the subgiant evolution stages are consistent with the expected color-magnitude space. However, we also find discrepancies with expectations because the stellar models only cover the range from the zero-age main sequence (ZAMS) to the tip of the red giant branch. The bottom panels in Fig. 29 clearly show horizontal giant branch stars incorrectly labeled as main-sequence stars. Outside the ZAMS to the tip of the red giant branch phases, FLAME labels any star incorrectly. Again, the assumption of solar abundance in FLAME is challenged in those metal-poorer globular clusters.

Fig. 25. Comparison of R, L, M, and age from FLAME to literature values. The left and right panels compare the estimates based on GSP-Phot and GSP-Spec from the astrophysical_parameters and astrophysical_parameters_supp tables, respectively. The top panel compares radius_flame and radius_flame_spec for giants with asteroseismic radii from Pinsonneault et al. (2018). The second panel compares main-sequence luminosities lum_flame and lum_flame_spec with those from Stevens et al. (2017) using a random selection of 90 000 stars. The third panel compares mass_flame and mass_flame_spec with masses from Casagrande et al. (2011), and the bottom panel compares age_flame and age_flame_spec with ages from that same catalog.
As no other module produces M, age, or evolution stage parameters, the only other method to assess their quality is to determine their consistency within other open clusters or wide binaries, which we discuss in Sect. 3.5.
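The first-order tagging of the evolution stage described above can be sketched as a small lookup. The treatment of the boundary values (420 assigned to subgiants, 490 to giants) is our assumption, since the text does not say which side they belong to; the function name is hypothetical.

```python
import numpy as np


def evolution_stage_label(evolstage):
    """Map integer evolution stages to coarse labels.

    100 <= stage < 420   -> "main sequence"
    420 <= stage < 490   -> "subgiant"
    490 <= stage <= 1300 -> "giant"
    anything else (including NaN) -> "out of range"
    """
    stage = np.atleast_1d(np.asarray(evolstage, dtype=float))
    labels = np.full(stage.shape, "out of range", dtype=object)
    labels[(stage >= 100) & (stage < 420)] = "main sequence"
    labels[(stage >= 420) & (stage < 490)] = "subgiant"
    labels[(stage >= 490) & (stage <= 1300)] = "giant"
    return labels


stage_labels = evolution_stage_label([100, 419, 420, 489, 490, 1300, 50])
print(stage_labels)
```

As noted in the text, these labels are only meaningful between the ZAMS and the tip of the red giant branch; a horizontal-branch star would still receive one of the three labels.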

Extinction, Dust & ISM
When estimating the intrinsic stellar APs, it is also necessary to consider the effect of interstellar extinction on the observed SED, resulting in an estimation of the line-of-sight extinction for each star. We thus have extinction estimates from GSP-Phot, ESP-HS (for hot stars), and MSC (for double stars) as one of the spectroscopic parameters estimated from BP/RP spectra (A 0 , A G , A BP , A RP , E(G BP − G RP )). We also have an independent extinction estimate by GSP-Spec based on the analysis of the diffuse interstellar bands (DIB) (see field details in Table D.6).

GSP-Phot.
For all processed sources, GSP-Phot primarily estimates the monochromatic extinction A 0 at 541.4 nm (astrophysical_parameters.azero_gspphot) by fitting the observed BP and RP spectra, parallax, and apparent G magnitude. However, GSP-Phot also estimates the broadband extinctions A G , A BP , and A RP , as well as E(G BP − G RP ) obtained from the models (astrophysical_parameters.ag_gspphot, abp_gspphot, arp_gspphot, and ebpminrp_gspphot, respectively). Extinction is a positive quantity, thus GSP-Phot imposes a non-negativity constraint on all estimates. Consequently, it can lead to a small systematic overestimation of extinction in truly low-extinction regions (A 0 < 0.1 mag) 14 . Andrae (2022) demonstrated this effect for the Local Bubble, where GSP-Phot estimates a mean extinction of A 0 = 0.07 mag instead of zero. Yet, a decreasing exponential approximates reasonably well the distribution of GSP-Phot's A 0 in the Local Bubble, and it is also the maximum-entropy distribution of a non-negative random variate with a true value of zero. In other words, the exponential plays the role here that Gaussian noise plays in more common contexts. Consequently, the exponential's standard deviation (identical to its mean value) of 0.07 mag provides an error estimate for A 0 . Andrae (2022) reported similar values of 0.07 mag for A BP , 0.06 mag for A G , and 0.05 mag for A RP within the Local Bubble. These values are in agreement with Leike & Enßlin (2019), who find 0.02 mag. While one could allow small negative extinctions such that results for low-extinction stars may scatter symmetrically around zero, Andrae (2022) showed that this is not sufficient in the case of StarHorse2021 (Anders et al. 2022), whose av50 in the Local Bubble peaks around 0.2 mag, twice as much as GSP-Phot. We found that StarHorse2021 extinction av50 estimates appear globally larger than A 0 from GSP-Phot by 0.1 mag, which is likely a bias in the StarHorse2021 catalog (see Anders et al. 2022, their Fig. 15). Andrae (2022) also observed that in high-extinction regions av50 can become significantly larger than A 0 . It is currently unclear whether this is an overestimation by StarHorse2021 or an underestimation by GSP-Phot (or both).

14 The mean or median of a positive distribution is always strictly positive, never zero.

Fig. 26. Comparison of luminosities (left) and absolute magnitudes (right) from GSP-Phot and FLAME for all Gaia DR3 sources with estimates from both modules. Numbers quote the median absolute difference (MAD) and the root mean squared error (RMS). We indicated the equations we used to construct the luminosities from GSP-Phot from the radius and temperatures, and the absolute magnitudes from FLAME from the luminosities and bolometric corrections.

Fig. 28. Difference between astrophysical_parameters.age_flame and the age derived using the SPinS code, normalized by their joint uncertainties. The Gaussian represents the ideal case but centered on the peak difference (−0.4σ) of the results using all stars irrespective of their evolutionary status. The input data are identical, and we assumed a solar-metallicity prior for both codes. We highlighted the sample of MS stars discussed in Sect. 3.3.2.

Fig. 30. Comparison of TGE and GSP-Phot extinction estimates A 0 limited to giant stars. We calculated the mean extinction astrophysical_parameters.azero_gspphot per HEALPix level 9 to compare to the TGE optimized map total_galactic_extinction_map.a0. We partially included the TGE tracer selection: 3 000 < teff_gspphot < 5 700 K and −10 < mg_gspphot < 4 mag (we did not filter on distances). This represents 21 244 458 and 9 271 775 stars for the MARCS and PHOENIX libraries, respectively.
Using solar-like stars, Creevey (2022b) investigated the G BP − W 2 color, which combines the Gaia and AllWISE passbands, for two reasons: (i) a color is a quantity independent of distance, and (ii) as the extinction in the AllWISE W 2 band is negligible, we can safely associate any correlation with G BP (i.e., a proxy for A BP ). We find that the G BP − W 2 color closely follows a linear trend with GSP-Phot's A BP estimate, to within 0.087 mag RMS scatter, which is consistent with the 0.07 mag obtained for A BP in the Local Bubble. We also found that the linear relation holds from the low-extinction to the high-extinction regimes. Additionally, Fig. 31 also shows good agreement of our A 0 estimates with our expectations in open clusters, with only a mild overestimation of ∼ 0.1 mag (see Sect. 3.5.1).
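The statement that an exponential distribution has a standard deviation identical to its mean, which is why 0.07 mag serves as an error estimate for A 0 at low extinction, can be checked numerically. The sketch below draws a mock sample (not Gaia data) with a scale of 0.07 mag; the sample size and random seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(42)

# Mock non-negative "extinction estimates" for stars whose true A0 is zero,
# drawn from an exponential distribution with mean 0.07 mag.
a0_mock = rng.exponential(scale=0.07, size=200_000)

mean_a0 = float(a0_mock.mean())
std_a0 = float(a0_mock.std())
print(f"mean = {mean_a0:.3f} mag, std = {std_a0:.3f} mag")
```

Both statistics converge to the scale parameter, so quoting the mean of the low-extinction sample is equivalent to quoting its dispersion.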
TGE. GSP-Phot also provides the A 0 estimates used by TGE to produce an all-sky (two-dimensional) map of the total Galactic extinction, meaning the cumulative amount of extinction in front of objects beyond the edge of our Galaxy. TGE selects giant star "tracers" at the edge of the Milky Way, more specifically, stars with gaia_source.classprob_dsc_combmod_star > 0.5, gaia_source.teff_gspphot between 3 000 and 5 700 K, gaia_source.mg_gspphot between −10 and 4 mag, and distances from the Galactic plane beyond 300 pc using gaia_source.distance_gspphot. Once selected, TGE groups the tracers per HEALPix, with levels adapting from 6 (∼ 0.84 deg²) to 9 (∼ 0.01 deg²) to have at least 3 stars per group. Finally, TGE estimates A 0 from the median and standard deviation of the ensemble of gaia_source.azero_gspphot values per defined HEALPix. We emphasize that TGE provides two tables: total_galactic_extinction_map, which contains the map with a variable HEALPix resolution (healpix_level), and total_galactic_extinction_map_opt, which contains the resampled information at HEALPix level 9. It is important to remark that TGE primarily uses gaia_source.azero_gspphot, which contains estimates with a mixture of atmosphere libraries, the so-called "best fit" estimates. Figure 30 compares the TGE estimates to those of GSP-Phot for the MARCS and PHOENIX atmosphere libraries, which provide APs for the giant stars. Although one could expect some AP variations from one set of atmosphere models to another, we find no statistically significant differences between the two libraries and the TGE estimates. The large dispersion along the y-axis mostly reflects the low numbers of stars beyond 16 kpc from the Galactic center, especially with high extinction values. Delchambre (2022) provides a more detailed description of the methodology and performance assessment of the TGE maps, especially comparisons with non-stellar tracers (e.g., Planck).
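The tracer selection and per-pixel aggregation described above can be sketched as follows. This is not the TGE code: the height above the plane is approximated as d·|sin b| (our assumption), the boundaries are taken as strict inequalities, the adaptive choice of HEALPix level is omitted, and all function names are hypothetical. The pixel area follows from the standard HEALPix count of 12·4^level equal-area pixels.

```python
import math

import numpy as np


def healpix_pixel_area_deg2(level: int) -> float:
    """Area of one HEALPix pixel at a given level (12 * 4**level pixels)."""
    full_sky_deg2 = 4.0 * math.pi * (180.0 / math.pi) ** 2
    return full_sky_deg2 / (12 * 4**level)


def select_tge_tracers(prob_star, teff, mg, distance_pc, gal_lat_deg):
    """Boolean mask for giant-star tracers, following the cuts in the text."""
    prob_star, teff, mg, distance_pc, gal_lat_deg = (
        np.asarray(v, dtype=float)
        for v in (prob_star, teff, mg, distance_pc, gal_lat_deg)
    )
    # Assumed geometry: height above the Galactic plane = d * |sin b|.
    height_pc = np.abs(distance_pc * np.sin(np.radians(gal_lat_deg)))
    return (
        (prob_star > 0.5)
        & (teff > 3000.0) & (teff < 5700.0)
        & (mg > -10.0) & (mg < 4.0)
        & (height_pc > 300.0)
    )


def a0_per_pixel(pixel_id, a0, min_stars=3):
    """Median and standard deviation of A0 per pixel with >= min_stars tracers."""
    pixel_id = np.asarray(pixel_id)
    a0 = np.asarray(a0, dtype=float)
    result = {}
    for pix in np.unique(pixel_id):
        values = a0[pixel_id == pix]
        if values.size >= min_stars:
            result[int(pix)] = (float(np.median(values)), float(np.std(values)))
    return result


# Illustrative inputs only.
tracer_mask = select_tge_tracers(
    [0.9, 0.9], [4500.0, 4500.0], [1.0, 1.0], [1000.0, 1000.0], [30.0, 5.0]
)
pixel_stats = a0_per_pixel([1, 1, 1, 2, 2], [0.1, 0.2, 0.3, 0.5, 0.6])
print(tracer_mask, pixel_stats, healpix_pixel_area_deg2(9))
```

In this toy example the second star fails the height cut (about 87 pc) and pixel 2 is dropped because it holds only two tracers; in TGE such a pixel would instead be merged at a coarser HEALPix level.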

ESP-HS.
For hot stars with G < 17.65 mag, ESP-HS also estimates A 0 by fitting the observed BP and RP spectra (azero_esphs). Like GSP-Phot, ESP-HS also provides A G and E(G BP − G RP ). We compared the extinction A 0 from GSP-Phot and ESP-HS using star clusters for the hotter stars (Fig. 31). Both modules find consistent A 0 estimates when deriving extinctions greater than 0.3 mag. However, over this hot star sample, we find that GSP-Phot tends to overestimate extinction by a constant offset of about 0.1 mag, and ESP-HS overestimates it by a factor of 1.2. Overall, for all stars with GSP-Phot and ESP-HS estimates, we found a MAD of 0.120 mag and an RMS of 0.380 mag. However, we emphasize that these differences, especially the RMS statistics, also vary with the spectral libraries (gaia_source.libname_gspphot or astrophysical_parameters.libname_gspphot). If we restrict the comparison to the OB star library that best describes this temperature regime, we find an improved RMS of 0.170 mag. This illustrates the importance of choosing or exploring which spectral library is appropriate for the sources of interest.
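The MAD and RMS statistics quoted in this comparison, and their dependence on the spectral library, can be reproduced with a few lines. The sketch below takes MAD as the median absolute difference, as in the caption of Fig. 26; the arrays and library labels are made-up placeholders, not catalog values.

```python
import numpy as np


def mad_and_rms(estimate_a, estimate_b):
    """Median absolute difference and root mean square of (a - b)."""
    diff = np.asarray(estimate_a, dtype=float) - np.asarray(estimate_b, dtype=float)
    diff = diff[np.isfinite(diff)]  # ignore sources missing either estimate
    mad = float(np.median(np.abs(diff)))
    rms = float(np.sqrt(np.mean(diff**2)))
    return mad, rms


# Illustrative values only.
a0_gspphot = np.array([0.50, 0.80, 1.20, 0.30])
a0_esphs = np.array([0.40, 0.90, 1.00, 0.30])
libname = np.array(["OB", "A", "OB", "OB"])

mad_all, rms_all = mad_and_rms(a0_gspphot, a0_esphs)

# Same statistics restricted to sources whose best library is the OB one.
is_ob = libname == "OB"
mad_ob, rms_ob = mad_and_rms(a0_gspphot[is_ob], a0_esphs[is_ob])
print(mad_all, rms_all, mad_ob, rms_ob)
```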
MSC. MSC also estimates the A 0 parameter by assuming that the BP and RP spectra represent a composite of an unresolved binary: two blended coeval stars at the same distance (azero_msc). MSC's performance is similar to that of GSP-Phot (see Sect. 3.5.2).

GSP-Spec-DIBs
In addition to the stellar APs, GSP-Spec estimated the equivalent width of diffuse interstellar bands (DIBs) in the RVS spectra for 476 117 stars in Gaia DR3. The DIB spectral feature arises from largely unidentified molecules ubiquitously present in the interstellar medium (ISM). GSP-Spec measures the DIB profile parameters, namely the equivalent width (astrophysical_parameters.dibew_gspspec) and the characteristic central wavelength (astrophysical_parameters.dib_gspspec_lambda), using a Gaussian profile fit for cool stars and a Gaussian process for hot stars. We described the DIB measurement procedure in detail in Recio-Blanco (2022b) (Sect. 6.5) and further assessed its performance in Schultheis (2022). We emphasize that one should only use DIB estimates with quality flags astrophysical_parameters.dibqf_gspspec ≤ 2 (defined in Table 2 of Schultheis 2022). Although one can question the standard analysis in this field, we applied the approach to compare our results with the literature. We estimated a linear relation between dibew_gspspec and ebpminrp_gspphot as

$E(G_{\rm BP} - G_{\rm RP}) = 4.508\,(\pm 0.137) \times {\rm EW}_{862} - 0.027\,(\pm 0.047)$. (2)

We identified the strong outliers to this relation as having an overestimated E(G BP − G RP ) from GSP-Phot (linked to an incorrect temperature estimate; see Schultheis 2022). GSP-Spec also measured DIBs for hot stars (T eff > 7500 K), providing us with a total of 1 142 high-quality DIB measurements. We compared these with the extinction estimates from ESP-HS (astrophysical_parameters.ebpminrp_esphs) and found excellent agreement with the relation we obtained above (see Fig. 9 of Schultheis 2022). We further compared the DIB EW with the A 0 values of the TGE HEALPix level 5 map (total_galactic_extinction_map), where we found a strong linear correlation given by EW = 0.07 × A 0 + 0.03 up to A 0 ∼ 1.5 mag, after which we found a shallower trend. We suspect the slope change originates from TGE providing total extinction far beyond the distance of stars with DIB λ862 measurements.

We compute the GSP-Phot and ESP-HS median estimates using stars with T eff > 7 500 K only. We color-coded the data by the number of hot star members with estimates we found in the cluster (w.r.t. colorbar at the top). On both panels, the gray lines represent the identity relation, and the blue lines a linear regression through the data points. The insets show the normalized distribution of the differences, A 0 (GSP-Phot or ESP-HS) − A 0 (literature).

(1): median estimates of the residuals; (2): mean absolute deviation (MAD) of the residuals. flags_gspspec with f1,f2,f4,f5,f8=0.
Finally, we estimated the standard quantity E(B-V)/EW to be 3.105 ± 0.048, which lies in the range of the ratios derived in the literature (see the compilation in Table 3 of Schultheis 2022).
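The two linear relations quoted in this section, together with the recommended quality-flag cut, can be applied as in the sketch below. Only the central coefficients are used (the quoted uncertainties are not propagated), the function names are hypothetical, and the lower bound on the quality flag is a precaution of ours against possible negative "not measured" values rather than something stated in the text.

```python
import numpy as np


def ebpminrp_from_dib_ew(ew_862):
    """E(G_BP - G_RP) predicted from the DIB equivalent width (Eq. 2)."""
    return 4.508 * np.asarray(ew_862, dtype=float) - 0.027


def dib_ew_from_a0(a0):
    """DIB equivalent width from the TGE A0, quoted as valid up to A0 ~ 1.5 mag."""
    return 0.07 * np.asarray(a0, dtype=float) + 0.03


def high_quality_dib(dibqf):
    """Mask for DIB measurements with quality flag <= 2 (and >= 0, our choice)."""
    dibqf = np.asarray(dibqf)
    return (dibqf >= 0) & (dibqf <= 2)


# Illustrative values, in the same units as dibew_gspspec.
ew = np.array([0.05, 0.10, 0.30])
qf = np.array([0, 3, 2])
good = high_quality_dib(qf)
predicted_reddening = ebpminrp_from_dib_ew(ew[good])
print(good, predicted_reddening, dib_ew_from_a0(1.0))
```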

Clusters
Star clusters are very effective in assessing the quality of the stellar parameters, as proven in previous Gaia data releases. Open star clusters are coeval populations: same age, same metallicity, and about the same extinction and distance.
Apsis processed all the stars independently and, in particular, did not exploit the coevolution of stars. This section presents some of the key results concerning the global quality of the APs in star clusters. We provide additional validation, known issues, some calibration relations, and the optimal use of the quality flags in Andrae (2022), Recio-Blanco (2022b) and Fabricius (2022).
We selected a sample of star clusters from the Cantat-Gaudin et al. (2020) catalog. Drimmel (2022) refined the cluster memberships using Gaia EDR3 astrometry. Our selection corresponds to about 230 000 stars; the number of stars per cluster varies significantly, from 40 to more than 700, with an average of ∼ 60 stars. Open clusters contain mostly main-sequence stars with a median G = 15.6 mag, but their populations vary significantly with the ages of the systems. We approximated the stellar population of each cluster by an isochrone to obtain reference estimates for T eff , log g, mass, age, and distance. Additionally, we assumed that A 0 and [M/H] are homogeneous throughout the color-magnitude diagram. For the former, we avoid cases where differential extinction is more likely to be present by excluding clusters younger than 100 Myr from our samples. We use the PARSEC isochrones 15 for this purpose, associated with the clusters' age, distance, extinction, and metallicity from our literature catalog. Here, we summarize the statistical analysis of the accuracy of the relevant APs over the cluster members.
We compare the atmospheric and evolution APs from GSP-Phot, GSP-Spec, and FLAME to the cluster isochrones. We emphasize that when analyzing the GSP-Spec results, we selected the stars having astrophysical_parameters.flags_gspspec with f1,f2,f4,f5,f8=0. Table 4 presents the median and MAD of the residuals to the isochrones for T eff , log g, A G , M, and τ derived by GSP-Phot, GSP-Spec, and FLAME.
GSP-Phot. We found that T eff , log g, and A G from GSP-Phot are in general agreement with expectations, albeit with sometimes large dispersions. It is important to note that we analyzed the "best" library estimates (e.g., astrophysical_parameters.teff_gspphot), but the results may vary with different choices of library (e.g., astrophysical_parameters_supp.teff_marcs_gspphot).
GSP-Phot performs better for G < 16 mag, where the SNR of the BP/RP spectra remains high (SNR > 100). Figure 32 illustrates our analysis with the example of Messier 67 (aka NGC 2682). In this cluster, we found 4% of outliers, defined as ∆T eff /T eff > 0.5, but this fraction varies across the entire Gaia DR3 sample. Overall, we identified that GSP-Phot overestimated T eff values for giants and underestimated them for supergiants (see Fig. 33). In detail, we find that the distribution of GSP-Phot's log g values has a long tail towards overestimated values on the main sequence. In contrast, GSP-Phot underestimates gravity for hot stars and giants. We also note the issue with metallicity and the extinction estimates reported in Sect. 3.2.1. Messier 67 is at ∼ 850 pc from us, a close distance that GSP-Phot's prior assumes to be mostly free of extinction. This prior leads to underestimating the reddening of these stars. As a result, to preserve the observed stellar SEDs, GSP-Phot underestimates [M/H]. Andrae (2022) discusses this extinction-distance prior and related issues in detail.
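The outlier fraction quoted for Messier 67 can be computed as below. The text does not specify the sign convention or which temperature enters the denominator, so this sketch assumes the absolute difference relative to the isochrone reference value; names and numbers are illustrative.

```python
import numpy as np


def teff_outlier_fraction(teff_estimate, teff_reference, threshold=0.5):
    """Fraction of stars with |T_est - T_ref| / T_ref > threshold.

    Assumption: the reference (isochrone) temperature is the denominator.
    Stars lacking either value are ignored.
    """
    est = np.asarray(teff_estimate, dtype=float)
    ref = np.asarray(teff_reference, dtype=float)
    rel = np.abs(est - ref) / ref
    rel = rel[np.isfinite(rel)]
    return float(np.mean(rel > threshold))


# Made-up cluster members: estimated vs. isochrone temperatures [K].
frac_outliers = teff_outlier_fraction(
    [5000.0, 9500.0, 6100.0, 4000.0],
    [5100.0, 6000.0, 6000.0, 4100.0],
)
print(frac_outliers)
```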

GSP-Spec.
We also analyzed GSP-Spec's APs and found that log g from GSP-Spec could show biases up to −0.3 dex compared to isochrone predictions (similarly to Sect. 3.2.1). In particular, we found significant underestimation for hot stars, and we caution the user against using GSP-Spec's log g values for AGB stars, as we find them to be of poorer quality. We refer to Recio-Blanco (2022b) for the details and especially emphasize that these comparison results depend strongly on the quality flag selection. Recio-Blanco (2022b) also encouraged users to define calibration relations for their specific use cases.
FLAME. We also found that the FLAME APs are in good agreement when we restrict our analysis to the best-measured stars, those with astrophysical_parameters.flag_flame=00,01. Unsurprisingly, the fact that FLAME assumes solar metallicity produced poor τ and M estimates in low-metallicity clusters. However, in the solar metallicity regime, M is in good agreement with expectations (see Table 4). It also seems that FLAME overestimated τ for young stars and underestimated it for old stars, with the most significant discrepancies with the literature appearing for cool main-sequence stars.
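Since the comparisons above depend strongly on the quality-flag selection, the sketch below shows one way to apply the two selections mentioned in this section. It assumes that flags_gspspec is a string of digits in which f1 is the first character, and that flag_flame is stored as a two-character string; both are assumptions to check against the catalog documentation, and the function names are hypothetical.

```python
def gspspec_flags_ok(flags, positions=(1, 2, 4, 5, 8)):
    """True if the flag characters f1, f2, f4, f5, f8 are all '0'.

    Assumption: `flags` is a string and position n means its n-th character.
    """
    if not isinstance(flags, str) or len(flags) < max(positions):
        return False
    return all(flags[p - 1] == "0" for p in positions)


def flame_flag_ok(flag_flame):
    """True for the best-measured FLAME sources (flag_flame = 00 or 01)."""
    return flag_flame in {"00", "01"}


# Illustrative flag strings (41 characters here, an assumed length).
all_clear = "0" * 41
f3_raised = "001" + "0" * 38   # f3 is not among the tested positions
f4_raised = "0001" + "0" * 37  # f4 is tested, so this source is rejected
print(gspspec_flags_ok(all_clear), gspspec_flags_ok(f3_raised),
      gspspec_flags_ok(f4_raised), flame_flag_ok("01"))
```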
Using star clusters also has the advantage of assessing whether the reported uncertainties are overall of the correct order. FLAME reported underestimated uncertainties on M and τ derived either from GSP-Spec or GSP-Phot APs. Figure 34 demonstrates that the M residuals between GSP-Phot and the isochrones disperse significantly more than the uncertainties (on average of the size of the symbols).

ESP-HS.
By comparison to the PARSEC isochrones, we found estimates commonly to the right of the isochrones in the Kiel diagrams, suggesting somewhat older cluster ages than the literature references. Such findings may relate to a systematic underestimation of T eff and log g. Although unlikely, the literature may underestimate the clusters' ages. More likely, our results suffer from the effect of gravitational darkening due to axial rotation on the spectral energy distribution of OBA stars.
ESP-UCD. As we detailed in the online documentation, ESP-UCD detects significant overdensities at the positions of several clusters and star-forming regions. We used BANYAN Σ (Gagné et al. 2018) to identify UCD members of nearby young associations within 150 pc from the Sun. Table C.1 contains the number of sources with membership probability greater than 0.5 in each association and the effective temperature of the coolest UCD. We also include entries for associations beyond 150 pc derived from our clustering analysis using the OPTICS algorithm (Ankerst et al. 1999) in the space of Galactic coordinates, proper motions, and parallax. We did not use these stars to assess the performance of ESP-UCD, but we reported our strong UCD candidates.

(2020). We selected estimates with flag_flame=00. Error bars indicate FLAME's uncertainties.

Unresolved binaries
In Apsis, the MSC module aims to distinguish between the two components of binaries by analyzing their composite BP/RP spectra. It assumes these sources are blended coeval stars (same distance, extinction, and metallicity). We could not create sufficiently high-quality synthetic models of BP and RP spectra of unresolved binaries, as these could not fully model the instrumental (and data reduction) effects on these sources. Instead, MSC implements an empirical set of models constructed from observed BP and RP spectra of spectroscopic binary stars (see Creevey 2022a, for details). As a result of the limited number of reference unresolved binaries with APs, MSC adopted a strong [M/H] prior centered on solar values.
MSC analyzes all sources with G < 18.25 mag and therefore inherently analyzes single stars as well (assuming a binary source). Similarly, GSP-Phot takes all sources to be single stars. As MSC internally operates very similarly to GSP-Phot, we can compare their overlapping results more robustly than those of any other Apsis modules. Figure 36 compares APs from MSC and GSP-Phot with those from the binary sample of El-Badry et al. (2018). It is not surprising that we find a negative bias in temperature and log g from GSP-Phot, since it assumed these sources are single stars. Its estimates correspond to a luminosity-weighted average between the primary and the secondary. Commonly, this leads to a lower T eff and log g to reach the observed brightness of the binary system with a single star. We find that, despite its strong solar metallicity prior, the posteriors of [M/H] from MSC are broad. Overall, MSC performs better than GSP-Phot on this particular sample of binaries.
The GALAH survey (Martell et al. 2017) provides another set of 11 263 spectroscopic binaries (Traven et al. 2020) with a component flux ratio of less than 5 (i.e., within the MSC parameter ranges). As above, we compared MSC with GSP-Phot on this sample, and we find their APs have comparable accuracies. Figure 37 compares the seven APs from MSC with those from GALAH. We note that the plots' color-coding indicates the goodness-of-fit (using astrophysical_parameters.logposterior_msc) rather than a source density. Except for A 0 , the goodness-of-fit is best around the identity line. Such behavior confirms that MSC fits the composite spectra of binaries well when the MCMC procedure converges. The goodness-of-fit also indicates that MSC did not converge properly for many sources. We can flag bad convergence as sources with low logposterior_msc values. Finding a unique threshold for all science applications is challenging. However, Table 5 provides the evolution of the residual statistics with the GALAH sample when changing the goodness-of-fit threshold. By construction, the residuals and the overall biases improve as the threshold increases, but we remove a significant number of sources from the sample. Regardless of this filtering, MSC tends to overestimate log g 1 , log g 2 , and [M/H] for the GALAH sample. We suspect that MSC's prior favoring solar metallicity leads to overestimating [M/H]. As a result, to match the BP and RP spectra, MSC compensates for high [M/H] by decreasing the intrinsic luminosity, requiring higher log g values. However, we cannot exclude the existence of biases in the GALAH data (Traven et al. 2020, Sect. 8.3). This open issue is also supported by the discrepancies with APs reported by the APOGEE binary sample (El-Badry et al. 2018) with 26 sources in common.

(Table D.2 for the corresponding catalog fieldnames). On each panel, we indicate the 1:1 line for reference, and color corresponds to the average astrophysical_parameters.logposterior_msc of all stars per bin. We provide associated statistics in Table 5.

We also found chemically homogeneous spectroscopic parameters from Gaia for the components of wide binaries when compared with high-resolution data from Hawkins et al. (2020). In their sample of 25 wide binaries, 20 had a metallicity difference of less than 0.05 dex, while the remaining five showed deviations of ∼0.1 dex. From Table 3 of Hawkins et al. (2020), we selected the 20 homogeneous binaries (excluding WB02, WB05, WB09, WB16, WB21) and compared the metallicities from Apsis for each of the two components 16 , without applying any calibrations to the data. These are dwarf stars with T eff between 5000 and 6400 K and metallicities above −0.8 dex. For 16 out of the 20 homogeneous binaries according to Hawkins et al. (2020), the metallicities from GSP-Phot (astrophysical_parameters.mh_gspphot) agree within 0.15 dex. For the remaining 4 binaries, they deviate by 0.2 to 0.3 dex (WB08, WB13, WB18, WB22). Eighteen of the 20 binaries have metallicity determinations from GSP-Spec (astrophysical_parameters.mh_gspspec) for both components, and all except two agree within 0.15 dex. The exceptions are WB14, with a difference of 0.16 dex, and WB15, with a difference of 0.5 dex. WB15 also has a difference in log g (astrophysical_parameters.logg_gspspec) of 1.1 dex, whereas the two components should have equal surface gravity according to Hawkins et al. (2020). This indicates that the Gaia metallicities are reliable (at least in a statistical sense) in the parameter space covered by the binary sample.
We further explored the possibility of "cleaning" the MSC results by excluding sources with possibly spurious astrometric solutions. It is not a surprise that Gaia astrometry may be affected by binarity. We applied the method from Rybizki et al. (2021) and kept sources with fidelity_v2 > 0.5. After this selection, the GALAH sample shrank from 11 263 to 9 836 sources. The RMS for the distance comparison improves from 617 to 429 pc, and its bias from −184 to −157 pc (when we assume inverse parallax as the "true" distance). It also improves the statistics of the other parameters and the overall agreement with GSP-Phot's APs.

16 The Gaia DR2 source IDs listed in Table 3 of Hawkins et al. (2020) are the same as the Gaia DR3 source IDs, except for WB13B, which has DR3 source ID 3230677874682668672.
Overall, MSC's performance remains challenging to estimate. Only a few reference catalogs exist, and they rarely provide statistically significant samples (many thousands) with APs. In addition, one needs to use the astrometric measurements of binary systems with caution. We expect Gaia DR4 to provide a significant improvement in the future.
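The two cleaning steps discussed in this section, a goodness-of-fit threshold on logposterior_msc and the astrometric fidelity cut, can be combined as in the sketch below, which also evaluates the distance bias and RMS against the inverse parallax. We take the bias to be the mean difference (an assumption), the log-posterior threshold value is purely hypothetical since the text notes that no unique threshold exists, and the inverse parallax is only meaningful for positive, well-measured parallaxes.

```python
import numpy as np


def msc_clean_mask(logposterior, fidelity, logpost_min, fidelity_min=0.5):
    """Keep sources above a goodness-of-fit threshold with reliable astrometry."""
    logposterior = np.asarray(logposterior, dtype=float)
    fidelity = np.asarray(fidelity, dtype=float)
    return (logposterior > logpost_min) & (fidelity > fidelity_min)


def distance_bias_rms(distance_pc, parallax_mas):
    """Bias (mean) and RMS of distance minus the inverse-parallax distance."""
    distance_pc = np.asarray(distance_pc, dtype=float)
    parallax_mas = np.asarray(parallax_mas, dtype=float)
    reference_pc = 1000.0 / parallax_mas  # requires parallax > 0
    diff = distance_pc - reference_pc
    return float(np.mean(diff)), float(np.sqrt(np.mean(diff**2)))


# Illustrative values only.
distance = np.array([100.0, 210.0, 500.0])
parallax = np.array([10.0, 5.0, 2.5])
logpost = np.array([-500.0, -800.0, -5000.0])
fidelity_v2 = np.array([0.9, 0.8, 0.2])

keep = msc_clean_mask(logpost, fidelity_v2, logpost_min=-1000.0)  # hypothetical threshold
bias_all, rms_all = distance_bias_rms(distance, parallax)
bias_clean, rms_clean = distance_bias_rms(distance[keep], parallax[keep])
print(bias_all, rms_all, bias_clean, rms_clean)
```

Repeating the last two lines over a grid of thresholds gives the kind of trade-off between sample size and residual statistics that Table 5 reports.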

Identification and analysis of peculiar cases (outliers)
Galactic sources dominate the content of Gaia DR3. Those with BP and RP spectra are essentially intermediate-mass stars of FGK spectral types with G < 17.65 mag, with the addition of a set of UCDs and extragalactic objects (see Fig. 1). Outliers in this context mean objects that are not "similarly consistent" with the rest of the sample. The similarity in this context relates to the distance metric implemented in the clustering algorithm in the OA module summarized below.
On the one hand, Apsis provides multiple classifications and flags that one can use to identify outliers (see Table D.3). For instance, one can remove stars with emission lines using ESP-ELS parameters, or one can generate a pure sample of solar analogs by combining APs and flags from GSP-Phot and GSP-Spec (see Creevey 2022b, and other examples herein). However, these derive from supervised classifications and comparisons against models, limiting discoveries of peculiar objects.
On the other hand, the OA (outlier analysis) software is an Apsis module that aims at identifying groups of similar objects in the Gaia DR3 sample according to their BP and RP spectra exclusively. OA's approach to unsupervised clustering is entirely empirical by implementing self-organizing maps (Kohonen 2001). One can further explore the resulting clusters and label them or identify new classes of objects. However, OA analyzes only 10% of the sources processed by DSC, those with the lowest DSC combined probabilities of membership to astronomical classes. These represent about 56 million sources in Gaia DR3. We note that the analysis scope will expand in Gaia DR4.
To compare the results from OA to those of DSC, we identified OA's clusters associated with the DSC classes (see Section 11.3.12.3.4 in the online documentation for further details). Table 6 presents the resulting confusion matrix between DSC and OA. We find an 83% agreement between the two classifications for galaxies, but only 35% agreement for quasars, which OA confused with stars and white dwarfs. We assume that the extragalactic classification from DSC is accurate, as shown in Delchambre (2022). We note that DSC includes astrometric information in its analysis, which OA does not. It is thus not surprising to find significant differences. These results show that both classifications are complementary.
One way to analyze OA's neurons (or clusters) is to compare their prototype spectra with templates. We constructed our templates from averaged spectra having reliable spectral classifications in the literature, mainly from APOGEE-DR17 and GALAH-DR3. The online documentation (Section 11.3.12) details our procedure. Based on these stellar templates, OA attributed spectral labels (A, F, G, K, and M-type stars) to its relevant clusters. We compared these labels to the GSP-Phot temperatures (teff_gspphot). We cast the T eff scale of GSP-Phot stars into: O (T eff ≥ 30000 K), B (10000 ≤ T eff < 30000 K), A (7300 ≤ T eff < 10000 K), F (5950 ≤ T eff < 7300 K), G (5200 ≤ T eff < 5950 K), K (3760 ≤ T eff < 5200 K), and M (T eff < 3760 K), and we constructed the confusion matrix shown in Table 7, which shows the agreement between the two modules.
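The temperature binning listed above, and the construction of a confusion matrix between two sets of labels, can be sketched as follows. The bin edges are those quoted in the text; the function names and the example labels are illustrative and do not reproduce Table 7.

```python
import numpy as np

# Lower edges of the K, G, F, A, B, and O classes [K], as quoted in the text.
_TEFF_EDGES = np.array([3760.0, 5200.0, 5950.0, 7300.0, 10000.0, 30000.0])
_SPECTRAL_TYPES = np.array(["M", "K", "G", "F", "A", "B", "O"], dtype=object)


def spectral_type_from_teff(teff):
    """Assign O/B/A/F/G/K/M from T_eff; lower edges are inclusive."""
    teff = np.atleast_1d(np.asarray(teff, dtype=float))
    # digitize returns i such that edges[i-1] <= teff < edges[i].
    types = _SPECTRAL_TYPES[np.digitize(teff, _TEFF_EDGES)]
    types[~np.isfinite(teff)] = None  # no label without a temperature
    return types


def confusion_matrix(labels_row, labels_col, classes):
    """Counts of (row label, column label) pairs; unknown labels are skipped."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = np.zeros((len(classes), len(classes)), dtype=int)
    for r, c in zip(labels_row, labels_col):
        if r in index and c in index:
            matrix[index[r], index[c]] += 1
    return matrix


teff_types = spectral_type_from_teff([3000, 3760, 5777, 7300, 12000, 30000])
# Hypothetical OA labels compared with temperature-based labels.
cm = confusion_matrix(["G", "K", "M"], ["G", "K", "K"], list("OBAFGKM"))
print(teff_types, cm.trace(), cm.sum())
```

The trace of the matrix divided by its sum gives the overall agreement fraction, and off-diagonal entries far from the diagonal (for example, M versus O) flag the kind of conflicting cases discussed next.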
Overall, the agreement between both classifications is very high. However, we found 51 O-type stars, 6 B-type stars, and 10 A-type stars from GSP-Phot that OA classified as late-type stars. Figure 38 shows 18 BP/RP spectra from stars labeled as M-type by OA but with GSP-Phot T eff > 30 000 K. All these objects have their SED peaked around 850 nm, as typically expected for cool stars. Visual inspection thus shows that OA identified erroneous T eff labels from GSP-Phot.
On the one hand, Gaia DR3 contains a rich variety of information about Milky Way stars. On the other hand, the different interpretations and inconsistencies among the analyses we provide in the catalog warn the reader to proceed with caution.
Fig. 38. BP and RP spectra of 18 stars labeled as M-type by OA, but having T eff > 30 000 K from GSP-Phot. The dashed line indicates the best stellar template for this cluster, corresponding to an M-type star.

Candidates for deeper science analyses
We list six example use cases below.
First is the identification of sources within some AP ranges. One should use the confidence intervals to find all sources of interest. For instance, Drimmel (2022) select upper main sequence stars from their apparent colors. Creevey (2022b) defined various "golden" samples of stars using our APs, stars with the most accurate and precise astrophysical parameters: for example, FGK star samples supporting many Galactic surveys, solar analogs, ultra-cool dwarfs, carbon stars, and OBA stars challenging our stellar evolution and atmosphere models.
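A minimal Python sketch of a selection based on confidence intervals rather than point estimates follows. The field names teff_gspphot, teff_gspphot_lower, and teff_gspphot_upper are taken to match the Gaia DR3 source table. The three records, the target range, and the helper function are hypothetical.

```python
def interval_overlaps(lower, upper, target_min, target_max):
    """True if the confidence interval [lower, upper] intersects the target range."""
    return lower <= target_max and upper >= target_min


# Hypothetical records for illustration only (T_eff in K).
sources = [
    {"source_id": 1, "teff_gspphot": 5750.0,
     "teff_gspphot_lower": 5700.0, "teff_gspphot_upper": 5810.0},
    {"source_id": 2, "teff_gspphot": 5950.0,
     "teff_gspphot_lower": 5850.0, "teff_gspphot_upper": 6040.0},
    {"source_id": 3, "teff_gspphot": 6500.0,
     "teff_gspphot_lower": 6400.0, "teff_gspphot_upper": 6600.0},
]

TEFF_MIN, TEFF_MAX = 5600.0, 5900.0  # assumed range of interest

# Selection on the point estimate alone.
by_point = [s["source_id"] for s in sources
            if TEFF_MIN <= s["teff_gspphot"] <= TEFF_MAX]

# Selection that keeps every source whose interval reaches into the range.
by_interval = [s["source_id"] for s in sources
               if interval_overlaps(s["teff_gspphot_lower"],
                                    s["teff_gspphot_upper"],
                                    TEFF_MIN, TEFF_MAX)]
```

In this toy case the interval-based selection also retains source 2, whose point estimate lies just outside the range but whose interval overlaps it.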
The second is constructing the chemodynamical distribution of stars in some region of space. For instance, Recio-Blanco (2022a) analyzed the chemical patterns in the positions and orbital motions of stars to reveal the flared structure of the Milky Way disk and the various orbital substructures associated with chemical patterns.
The third is constructing the three-dimensional spatial properties of the ISM. Using published extinctions and distances, Dharmawardena et al. (2021) inferred the individual structure of the Orion, Taurus, Perseus, and Cygnus X star-forming regions and found the coherent ISM filaments that may link the Taurus and Perseus regions. One could easily replace those estimates with the ones (or a subset) we presented. Similarly, Schultheis (2022) explores the ISM kinematics using our DIB measurements.
A fourth is the age dating of wide binaries in the field. If an MS star has a white dwarf (WD) companion and a known distance, the age of such a binary system can then be determined precisely from the WD cooling sequence, as long as the MS companion gives the chemical composition, which is much harder to obtain from the WD directly (e.g., Fouesneau et al. 2019; Qiu et al. 2021).
A fifth is providing the largest uniformly derived set of APs that one could use to calibrate theoretical or data-driven stellar models. For instance, Green et al. (2021) developed a data-driven modeling technique to map stellar parameters (e.g., T eff , log g, [M/H]) accurately to spectrophotometric space, supporting more accurate 3D mapping of the Milky Way.
A sixth application could be understanding the details of star formation and the dynamical evolution of star clusters. For instance, Fig. 39 compares FLAME's (current) mass estimates with a simulation of stars drawn from a universal initial mass function (IMF; here assumed to be that of Kroupa 2001). We created this simulation by sampling the mass function (over the given mass range) for each cluster, drawing as many stars as the cluster has Gaia-identified members with mass estimates. Although we compare current with initial stellar masses, the agreement is overall very good. The lower-mass end is affected by how many low-mass stars Gaia can extract from these clusters and thus cannot be well reproduced without a selection function. The upper-mass end agrees perfectly with our predictions from a single IMF. We note that FLAME cannot predict masses above 10 M☉ with its current models. Such analysis could support the study of cluster evaporation and mass segregation when also accounting for stellar mass loss.
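The sampling step can be sketched in Python as follows. This is an illustrative implementation, not the code used for Fig. 39. It assumes the standard two-segment Kroupa (2001) power law above 0.08 solar masses (slopes 1.3 and 2.3 with a break at 0.5) and uses inverse-transform sampling. The function name, the member count, and the mass range are hypothetical.

```python
import random

# Kroupa (2001) segments as (lower mass, upper mass, alpha), with dN/dm ∝ m^-alpha.
# Masses are in solar masses; the sub-stellar segment below 0.08 is omitted.
KROUPA_SEGMENTS = [(0.08, 0.5, 1.3), (0.5, float("inf"), 2.3)]


def sample_imf(n_stars, m_min, m_max, rng):
    """Draw n_stars masses from the broken power law restricted to [m_min, m_max]."""
    if not 0.08 <= m_min < m_max:
        raise ValueError("need 0.08 <= m_min < m_max")
    pieces, weights = [], []
    coeff, prev_alpha = 1.0, None
    for lo, hi, alpha in KROUPA_SEGMENTS:
        if prev_alpha is not None:
            # Keep the mass function continuous across the break at `lo`.
            coeff *= lo ** (alpha - prev_alpha)
        prev_alpha = alpha
        a, b = max(lo, m_min), min(hi, m_max)
        if a < b:
            p = 1.0 - alpha
            pieces.append((a, b, p))
            # Relative number of stars expected in this piece.
            weights.append(coeff * (b**p - a**p) / p)
    masses = []
    for a, b, p in rng.choices(pieces, weights=weights, k=n_stars):
        u = rng.random()
        # Inverse-transform sampling of a single power law on [a, b].
        masses.append((a**p + u * (b**p - a**p)) ** (1.0 / p))
    return masses


# Hypothetical cluster: a deliberately large member count so that sampled
# fractions are stable; real clusters have far fewer members.
masses = sample_imf(20000, 0.08, 10.0, random.Random(1))
low_mass_fraction = sum(m < 0.5 for m in masses) / len(masses)
```

Comparing a histogram of such draws with the catalog masses of a cluster's members reproduces the kind of comparison described above, with the caveat that no selection function is applied.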
Of course, this list is not exhaustive. The previous Gaia data releases led to thousands of studies ranging from solar system

Limitations
Users should keep in mind the following assumptions and limitations of our Gaia DR3 catalog. We summarized many-dimensional posterior distributions using only summary statistics such as mean, median, and percentile values (computed on one-dimensional marginal distributions). These summary statistics cannot capture the full complexity of the posterior distributions, and it is rarely possible to recover that complexity per object from them. Where available, one can query the MCMC chains published by GSP-Phot and MSC. One should not ignore the confidence intervals.
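The following Python sketch illustrates why one-dimensional summaries can mislead. It builds a hypothetical bimodal T eff chain and computes its median together with the 16th and 84th percentiles, assumed here to be the interval bounds. The chain, the mode locations, and the percentile helper are illustrative and are not catalog values or Apsis code.

```python
import random


def percentile(sorted_samples, q):
    """Linear-interpolation percentile (q in [0, 100]) of a pre-sorted list."""
    pos = (len(sorted_samples) - 1) * q / 100.0
    i = int(pos)
    j = min(i + 1, len(sorted_samples) - 1)
    return sorted_samples[i] + (sorted_samples[j] - sorted_samples[i]) * (pos - i)


rng = random.Random(42)

# Hypothetical chain with two well-separated solutions of equal weight (T_eff in K).
chain = ([rng.gauss(4500.0, 50.0) for _ in range(5000)]
         + [rng.gauss(6000.0, 50.0) for _ in range(5000)])
chain.sort()

median = percentile(chain, 50)
lower, upper = percentile(chain, 16), percentile(chain, 84)

# Fraction of samples within 100 K of the median.
near_median = sum(abs(x - median) < 100.0 for x in chain) / len(chain)
```

Here the median falls in the gap between the two modes, where almost no samples lie, while the interval from `lower` to `upper` spans both solutions. This is why the interval, or the full chain, should be inspected alongside the point estimate.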
Most sources in Gaia DR3 have substantial fractional parallax uncertainties. Hence, the spectro-photometric data (BP/RP) often dominate the inference of our distances and APs. However, the parallax remains generally sufficient to limit the dwarf versus giant degeneracies.
The poorer the data, the more our prior dominates our estimates. Our prior varies significantly per Apsis module, none of which included a three-dimensional extinction map or a detailed Milky Way model. One should expect significant differences with other AP catalogs when the prior dominates. However, in reality, if the actual stellar population, extinction, or reddening distributions are very different from Galactic models, those differences may partially hint at these deviations.
To derive stellar APs, we implicitly assumed that all Gaia sources are single stars in the Galaxy (apart from MSC). Those estimates are most likely incorrect for any non-single star (binaries, extended sources, extragalactic).
Furthermore, our stellar models also had intrinsic limitations in the range of parameters they could handle. For instance, our models did not include specific physics inherent to WDs, AGBs, and HB-stars.
Finally, by design, we infer properties for each source independently. If a set of stars is known to be in a cluster, they share similar distances, extinctions, chemical patterns, and ages. This constitutes a prior that one should exploit to infer the properties of the individual stars more accurately than what we have done here.
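As a deliberately simplified illustration of exploiting such shared properties, the Python sketch below combines independent per-star distance estimates into one cluster distance with an inverse-variance weighted mean. This assumes Gaussian, independent uncertainties and a single common distance for all members. It is a stand-in for a proper hierarchical inference, and the function name and input values are hypothetical.

```python
def combine_shared_parameter(values, sigmas):
    """Inverse-variance weighted mean and its uncertainty.

    Assumes independent Gaussian errors and one true value shared by all members.
    """
    weights = [1.0 / s**2 for s in sigmas]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    return mean, (1.0 / total) ** 0.5


# Made-up per-star distances (pc) and uncertainties for members of one cluster.
distances = [100.0, 110.0, 104.0, 96.0]
uncertainties = [10.0, 10.0, 5.0, 20.0]
cluster_distance, cluster_sigma = combine_shared_parameter(distances, uncertainties)
```

The combined uncertainty is smaller than that of any single member, which shows the gain available from cluster membership under these assumptions.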

Summary
We have produced a catalog of distances, astrophysical, and dust extinction parameters using the Gaia BP, RP, RVS spectra, integrated G photometry, and parallaxes available with Gaia DR3. More specifically, we provide:
- 470 million distances, T eff , log g, and [M/H] estimates using BP/RP;
- 6 million T eff , log g, [M/H], [α/Fe] estimates using RVS;
- 470 million radius estimates;
- 140 million mass, and 120 million age estimates;
- 5 million chemical abundance ratios;
- half-a-million diffuse interstellar band analysis parameters;
- 2 million stellar activity indices;
- 200 million Hα equivalent widths;
- further stellar classification with 220 million spectral types and 50 thousand emission-line stars.
We presented only a high-level overview of the validation and performance of these data products. We detail some of these tests and results in Creevey (2022a), Delchambre (2022), Andrae (2022), Recio-Blanco (2022b), Lanzafame (2022), Fabricius (2022) and the online documentation. Our tests comprised checking the astrophysical consistency of our data through, for example, HR or Kiel diagrams, which help to point out weaknesses in our analyses or failure in specific regions of the stellar parameter spaces. In addition, we compared our estimates with external literature data to assess the performance of Apsis. The complexity and spread of our products often led us to restrict our tests to sub-samples and extrapolate our conclusions.
We emphasize that we did not calibrate Apsis APs to mimic external catalogs. Many of these external catalogs are not consistent with each other. As we do not know the true absolute scale of each AP dimension, we used external catalogs sometimes to obtain statistical relations to anchor our APs to a common ground. We recommend using these relations, but we did not apply them before the publication and instead provided the community with internally consistent APs.
First and foremost, our models have limitations in the range of parameters they can handle, and we made assumptions that we discussed in Sect. 5.
Our data necessarily demanded several extreme simplifications and assumptions. Therefore, one should use the data with great care. We recommend always using the flags/filters, defined in Appendix A.
Our catalog increases the availability of APs in the literature while offering results based on assumptions that differ from previous works. Such works helped to validate our results. In addition, it provides the community with values of reference to explore and understand better the content of Gaia DR3.
Gaia DR3 is not an incremental improvement of the Gaia data. It multiplies the quantities of multi-messenger information of Gaia with new data products (e.g., BP, RP, RVS, APs). We increased the volume of sources with APs by a factor of 5, but also increased the number of APs from two to ∼ 40. Gaia DR3 represents a significant step forward to anchor all current and future spectroscopic surveys to a common ground, and it provides us with the most comprehensive view of our Galaxy.
Table C.1 lists the young associations for which we have identified candidate UCD members using BANYAN Σ (Gagné et al. 2018) or the OPTICS clustering algorithm (Ankerst et al. 1999).
Appendix D: AP estimates, producers, and where to find them
In this section, we compile the various estimates of stellar parameters from Gaia DR3, which Apsis module produces them, and which table and field store the values in the Gaia catalog.