Press Release
Open Access
Issue
A&A
Volume 691, November 2024
Article Number A98
Number of page(s) 19
Section Catalogs and data
DOI https://doi.org/10.1051/0004-6361/202451427
Published online 31 October 2024

© The Authors 2024

Licence Creative CommonsOpen Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This article is published in open access under the Subscribe to Open model. Subscribe to A&A to support open access publication.

1 Introduction

Galactic Astronomy has become an exceedingly data-driven field with a vast and rapidly growing collection of multi-temporal and multi-wavelength data being observed by a multitude of stellar surveys. The Gaia mission (Gaia Collaboration 2016), which is currently charting a six-dimensional (6D) phase-space map of the Milky Way, is currently providing accurate measurements for almost two billion stars in our Galaxy and the Local Group (Gaia Collaboration 2018b, 2021, 2023c). The volume of astronomical data available passed the 100 Terabyte (1014 bytes) mark by the end 20th century and is expected to continue rapidly expanding towards the Petabyte (1015 bytes) and Exabyte (1018 bytes) scales in the near future (Fluke & Jacobs 2020; Vavilova et al. 2020).

In response to the growing amount of data being collected, a variety of machine learning (ML) techniques have been integrated to complement more classical statistical methods of data analysis. These ML techniques are now at the forefront of data mining and analytics, thanks to their ability to cope with the current scale of the data and in anticipation of even greater volumes of data being released. Furthermore, the versatility of ML techniques allow us to use them in performing a wide variety of tasks, such as classification, regression, outlier detection, and data compression on large scales (e.g. Ivezić et al. 2014). Overall, ML techniques have quickly become effective tools for astronomers to to turn a copious amount of raw data into useful knowledge (for reviews, see e.g. Baron 2019; Fluke & Jacobs 2020; Sen et al. 2022).

Recently, the scope of Gaia data has been expanded by the inclusion of more than 219 million low-resolution XP spectra (Carrasco et al. 2021; De Angeli et al. 2023; Montegriffo et al. 2023) in the third Gaia data release, DR3 (Gaia Collaboration 2023c). For Gaia DR3, the mean XP spectra have primarily been used to infer much more precise stellar parameters, such as: Teff, log ɡ, d, A0, and [M/H] (Andrae et al. 2023a; Fouesneau et al. 2023). Various groups have already shown that this new dataset can potentially be used for a variety of Galactic archaeology purposes: to estimate rough stellar α-element abundances (Gavel et al. 2021; Hattori 2024; Laroche & Speagle 2024; Li et al. 2024), efficiently locate extremely metal-poor stars (Witten et al. 2022; Xylakis-Dornbusch et al. 2022, 2024; Lucey et al. 2023; Yao et al. 2024), classify white dwarfs (Echeverry et al. 2022; Gaia Collaboration 2023b), study hydrogen emission and absorption lines (Weiler et al. 2023), or improve the accuracy of stellar parameters determined from Gaia RVS spectra (Guiglion et al. 2024).

Here, we aim to take advantage of the Gaia DR3 XP spectra by combining this dataset with additional information, both from Gaia and photometric databases. This paper aims to investigate the use of a relatively new but robust and well-tested regression technique, xgboost, trained using spectroscopically derived data from a variety of stellar surveys, to determine the following stellar parameters: extinction, effective temperature, metallicity, surface gravity, and mass from astrometric and photometric data provided by the Gaia mission. Similarly to other ML techniques, xgboost achieves this by learning the complex, non-linear relationships between the raw input data and the desired outputs.

For the training set, we mainly use the spectroscopy-derived data from Queiroz et al. (2023, hereafter Q23) obtained with StarHorse (Queiroz et al. 2018, 2020), a Bayesian isochrone-fitting code that yields precise results for spectroscopic surveys. When only supplied with astrometric and multi-band photometric data, StarHorse still produces very competitive results, compared to other techniques (as we have shown in Anders et al. 2019, 2022). However, there are two main disadvantages with Bayesian isochrone fitting for these large datasets: (1) it is considerably slower and computationally more expensive, since the likelihood for each star has to be computed for a larger stellar-parameter range; and (2) it cannot make direct use of additional information (e.g. Gaia proper motions, colour-excess factor, or XP spectra1). Therefore, in this paper we explore the possibility of circumventing these problems with a well-trained ML algorithm, thus freeing computational resources and greatly improving the CO2 budget of the new catalogue. In addition, by taking into account the information contained in the Gaia DR3 XP spectra, we are able to improve, for instance, the inferred metallicities for a large sample of stars covering large portions of the Galaxy. We stress, however, that in this work we do not attempt to derive new age or distance estimates, since the information about these quantities is not directly encoded in the Gaia DR3 XP spectra.

This paper is structured as follows: in Sect. 2 we describe the data used to train and predict stellar parameters for Gaia stars. In Sect. 3 we explain the concept and implementation of xgboost, the supervised regression algorithm we use in this paper. Section 4 presents the scope of the produced catalogues, including a discussion of the caveats of the present catalogue (Sect. 4.2). Some immediate results derived from the catalogue are shown in Sect. 5. We conclude the paper with a short discussion and our conclusions in Sect. 6. The appendices of the paper contain comparisons to other catalogues (Appendix A), data model of the produced catalogue (Appendix B), and a possible re-calibration of the metallicity scale based on open and globular clusters (Appendix C).

2 Data

In supervised machine learning, the most important ingredient for building a well-performing classifier or regressor is the availability of a sufficiently large, precise, and accurate training dataset that covers as much of the parameter space as possible. For the case of the vast Gaia DR3 dataset (totalling more than 1.8 billion stars), a small but sizeable subset (≳7 million stars) has already been co-observed spectroscopically by several major multiplex stellar surveys. This subset covers all major components of the Milky Way (discs, halo, bar, and bulge) and, in particular, the Apache Point Observatory Galactic Evolution Experiment (APOGEE; Majewski et al. 2017) even exceeds the volume reached by Gaia towards the inner Galaxy. We can therefore consider the part of the Gaia DR3 catalogue observed by ground-based spectroscopic surveys a well-sampled subset of the full catalogue, at least down to the typical limiting magnitude of the most relevant spectroscopic surveys.

2.1 Spectroscopic labels

Our training set consists of roughly 7 million stars that have been observed by large ground-based spectroscopic stellar surveys, such as APOGEE, GALactic Archaeology with HERMES survey (GALAH; De Silva et al. 2015), Gaia-ESO Survey (Gilmore et al. 2012, 2022), LAMOST surveys (Cui et al. 2012; Deng et al. 2012), RAdial Velocity Experiment (RAVE; Steinmetz et al. 2006, 2020), and Sloan Extension for Galactic Understanding and Exploration (SEGUE; Yanny et al. 2009). For all these surveys, Q23 reported precise distances, extinctions, and stellar parameters using the Bayesian isochrone-fitting code StarHorse (Queiroz et al. 2018, 2020). The data cover the main parts of the Hertzsprung-Russell diagram, from the hot main sequence (although sparsely sampled) to M-type stars (including dwarfs and asymptotic giant-branch stars). For more information on the individual spectroscopic survey datasets, we refer to Sects. 34 and Figs. 15 of Q23.

To cover an important part the parameter range that has been missed by StarHorse due to the lack of consistent stellar models in this regime, we added the HotPayne catalogue of LAMOST OBA stars (Xiang et al. 2022), a curated set of very metal-poor (VMP) stars2, along with the Gaia EDR3 white-dwarf catalogue of Gentile Fusillo et al. (2021) and the Gaia EDR3 hot sub-dwarf catalogue of Culpan et al. (2022) to this training set. The de-reddened and distance-corrected colour-magnitude diagrams (CMDs) of each of the training datasets are shown in Fig. 1. After joining the different datasets, we cleaned the sample from duplicate stars, giving preference to higher-resolution surveys and higher signal-to-noise (S/N) observations if multiple entries of the same stars were included.

thumbnail Fig. 1

Distance- and extinction-corrected Gaia DR3 CMDs for each of the spectroscopic stellar surveys used in the training and test data for this work. The bottom right panel shows the joint dataset. Note: not all data points were used in the training and testing of the xgboost models; duplicates were removed and different quality cuts applied for each training label.

Table 1

Cleaning conditions applied to each of the training labels.

2.2 Training and test datasets

We have aimed to keep the training data simple: we only used as our input columns (aka ‘features’) the Gaia DR3 XP spectral coefficients (Carrasco et al. 2021; De Angeli et al. 2023), Gaia EDR3 astrometry (l, b, ϖ, σϖ,µra, µdec), and broad-band {G, BP, RP} photometry (Gaia Collaboration 2021), and Gaia EDR3 astrometric fidelity flag fidelity_v2 (Rybizki et al. 2022), as well as infrared (IR) JHKs and W1W2W3W4 magnitudes from 2MASS (Cutri et al. 2003) and AllWISE (Cutri et al. 2013), respectively. The XP coefficients have been normalised per star by the value of the first coefficient.

The ML algorithms (especially neural networks) are often susceptible to the varying quality of the training data. To test their impact, we performed tests using different quality cuts for the input data (i.e. varying the maximum uncertainty for each parameter; see second line in Table 1), finding no significant improvements. We therefore chose to use the full set of columns for each of the training datasets.

For each label (extinction AV, effective temperature, log Teff, surface gravity, log ɡ, metallicity, [M/H], and stellar mass, M), we applied a set of criteria to clean the training data from poorly determined or unreliable labels, as detailed in Table 1. For each label, 80% of the data were used during the training phase (see the next section), while 20% of the data are reserved for estimating the overall performance (test dataset; see Fig. 2).

thumbnail Fig. 2

Performance of the xgboost models for the test datasets for each of the training labels. In the top row, we show the SHBoost (mean) parameters predicted from Gaia DR3, 2MASS, and AllWISE against the spectroscopic values (test labels). The middle row shows the residuals (predicted: ‘true’). The bottom row shows the formal uncertainties (derived with xgboost-distributions). Each panel contains logarithmic density plots of the full sample of 217 million stars. The lines and shaded regions in the middle and bottom rows show the running median and 1σ quantiles, respectively.

3 xgboost regression

A large and still growing variety of ML algorithms exists on the market, each of them optimised for slightly different tasks. The problem we are dealing with here boils down to regression on tabular data. Borisov et al. (2021) and Grinsztajn et al. (2022) recently surveyed the performance of several state-of-the art ML regression techniques on tabular data. They concluded that algorithms based on gradient-boosted tree ensembles still outperform deep learning models for this kind of task (although some neural-network implementations, if sufficiently trained, can perform similarly well on large enough datasets; see e.g. Klambauer et al. 2017). Also, xgboost has the additional advantage that it requires much less training time and parameter tuning (Shwartz-Ziv & Armon 2021).

After some initial tests and complementary efforts using artificial neural networks, we therefore chose the best-performing tree-based algorithm considered by Borisov et al. (2021, see their Fig. 4), xgboost, for the purpose of this work. Following previous implementations (see Anders et al. 2023b), our approach was successfully adopted by other groups (Rix et al. 2022; Andrae et al. 2023b). Overall, xgboost (Chen & Guestrin 2016) stands short for extreme gradient-boosted trees and it is a supervised ML algorithm that can perform both classification and regression. The algorithm has been used to some extent in astronomy for classification tasks (e.g. Bethapudi & Desai 2018; Yi et al. 2019; Li et al. 2021; Cunha & Humphrey 2022; Tolamatti et al. 2023; Xu et al. 2024); more recently also its regression capability has been explored in some astronomical use cases – e.g. photometric redshifts (Li et al. 2022a; Jia et al. 2023), sunspot prediction (Dang et al. 2022), or spectroscopic stellar ages (Hayden et al. 2022; He et al. 2022; Anders et al. 2023a).

In principle, there are various ways to estimate uncertainties with xgboost (e.g. training the model on various subsets of the training data, or training models with a wide range of hyper-parameters). Perhaps the most rigorous way is to modify xgboost’s cost function to include the prediction of confidence intervals in the inference (Duan et al. 2019). Thankfully, a fast implementation of this method based on the latest versions (>2.0) of the xgboost library3 is available in the python module xgboost-distribution4, which we therefore used to build our models. The results of this module can be interpreted as pseudo-posterior distributions (in the Bayesian terminology). As in Anders et al. (2023a), we provide both the ‘traditional’ xgboost point estimates as well as the mean and standard deviation provided by xgboost-distribution.

In addition to high accuracy, stability to noise, and relatively short training time, another advantage of using xgboost is that it can be coupled with a powerful feature-attribution code, SHAP (Lundberg et al. 2020), which makes tree-based regression methods much more interpretable, even at the level of individual data points. In this manner, xgboost is no longer an ML ‘black box’, but it also allows us to measure which of the input data (features) carry most of the information about the desirable parameters (labels). We analyse the SHAP values in more detail in Sect. 3.3.

3.1 Tuning hyper-parameters

Like most ML techniques, xgboost has a number of hyper-parameters that can be tuned to optimise the performance. We tested each of the main parameters individually as well as with scikit-learn’s GridSearchCV, a cross-validation algorithm that can be used to loop over hyper-parameter configurations. Our tests confirmed that xgboost is extremely stable under different configurations. The range of hyper-parameters that can be considered close to optimal is quite large. Using a sub-sample of 1% of the training and test data, we found that for our case the following configuration yields optimal results in an acceptable runtime: learning_rate=0.1–0.2, max_depth= 6–10, min_child_weight= 1, and subsample= 0.8.

thumbnail Fig. 3

Uncertainty distributions. Top row: logarithmic histograms of the uncertainties for the training (yellow), test (green), and full Gaia DR3 XP (blue) samples. The median uncertainties are indicated in each panel. Second row: uncertainties as a function of Gaia G magnitude. Bottom row: Uncertainties as a function of distance. In the second and third row, the white lines and shaded bands show the median trends and corresponding 1σ quantiles, respectively.

3.2 Typical accuracy and precision

Figure 2 shows the performance of xgboost regression on the Gaia DR3 XP spectra dataset for each of the desirable parameter labels for the test dataset not used in the xgboost training phase. The top panels of Fig. 2 show direct comparisons between the labels and the label predictions, the middle panels show the residuals, and the bottom panels the estimated uncertainties. Focusing on the middle row, we find that the residuals are typically small in the regions where the bulk of the training and test data are located; furthermore, they typically grow in regimes where training data is sparse (e.g. log Teff > 3.9, [M/H] < −0.8, M > 1.8 M). This behaviour is expected and typical for machine-learning regression techniques, both tree-based and neural networks. It can, in some cases, be mitigated by carefully weighting or balancing the training data (e.g. Ciucă et al. 2021) or by defining a different cost function in the fitting process (Ivezić et al. 2014, Chapters 8.3 and 8.9). In our case, we tried both attempts but then found worse performances for the full dataset, which in the end led us to keep the unbalanced, but large training set.

The test data uncertainties derived with xgboost-distribution are shown in the bottom row of Fig. 2. They nicely follow the expected trends with the corresponding stellar parameter (e.g. greater AV, Teff, or mass values are associated with a greater uncertainty). In particular, the derived uncertainties are always larger than (or at least compatible with) the residuals shown in the middle row of the figure, which shows that the uncertainties are not underestimated, at least for the test dataset.

Some particular features in Fig. 2 may be noteworthy. For example, it appears that the extinction estimates in the low-extinction regime are typically overestimated, but part of this trend stems from (nonphysical) negative extinction estimates in the training data that xgboost is compensating for. Another expected feature is the rectangular structure in the log g comparison (top middle panel in Fig. 2), which hints at an occasional confusion between dwarfs and red-clump stars in cases where the surface gravity is poorly constrained by the Gaia parallax. Finally, the thin vertical stripes in the metallicity panels (fourth column in Fig. 2) are inherited from the training set; namely, the StarHorse runs of Queiroz et al. (2023). A similar (although less pronounced) effect is observed also in the panels that decribe the mass (last column).

The typical precisions achieved for each label in the full Gaia DR3 XP sample are presented in Fig. 3. In particular, we achieve median precisions of 0.20 mag in AV, 0.014 dex in log Teff (or 174 K in Teff), 0.20 mag in log ɡ, 0.18 dex in [M/H], and 12% in mass. These uncertainties are highly heteroscedastic: for example, more than 50 million stars (using the 25th percentile of the label uncertainty distributions) have uncertainties smaller than 0.11 mag in AV, 0.010 dex in log Teff, 0.13 dex in log ɡ, 0.14 dex in [M/H], or 8% in mass, respectively. As expected, the uncertainty distributions for the full dataset (top row of Fig. 3) are broader than for the training and test data (since the bulk of the sample is fainter than most of the training set), but the typical uncertainties are still smaller than the ones obtained from pure astro-photometric isochrone fitting (as performed in e.g. Anders et al. 2022; see Appendix A.2). As expected, the reported uncertainties increase with magnitude for most of the labels (except for stellar mass, which is also expected). The uncertainties also increase with distance, as expected.

thumbnail Fig. 4

Feature importance plots for each output parameter determined with SHapley Additive exPlanation values (SHAP; Lundberg & Lee 2017). In each panel, the SHAP values for 20 000 random stars are aggregated. The rows in each panel are ordered by decreasing feature importance (from top to bottom). Only the 20 most important columns are shown; the abscissa ranges are cut to 99.8% of the SHAP value range for better visibility.

3.3 Feature importance

The concept of SHAP (SHapley Additive exPlanations) values provides an elegant way to understand the output of a machine-learning model (Lundberg & Lee 2017). They provide an empirical explanation of how each feature of a given data point impacts the predictions of the model (as also done in Anders et al. 2023a for the case of spectroscopic stellar age determination). Positive SHAP values mean that the respective feature increases the output’s model value, while a negative SHAP decreases it. The exact behaviour depends on the combination of all other feature values, so that in many cases similar feature values for two particular stars may have opposite effects on the label prediction. For illustrative examples and a detailed explanation, we refer to the documentation of the shap package5.

Figure 4 shows the SHAP beeswarm summary plots6 for each of the five predicted labels, calculated for a random subset of 20 000 stars. We can think of each star as a pixel in each row. Each subplot shows, from top to bottom, the 20 most influential features (and the exact effect that each of them has) in the prediction of the corresponding label (AV, Teff, log ɡ, [M/H], and M). It is clear that the Gaia XP coefficients contain very valuable information for the prediction of the stellar labels. For example, for AV the most important features are the fifth BP coefficient and the first RP coefficient, followed by Galactic latitude (which is obviously a good first prior for the presence of dust), the third RP coefficient, the eighth BP coefficient, and the 2MASS Ks and J magnitudes. In the case of effective temperature (second panel in Fig. 4), we observe that the first BP coefficient (obviating the zeroth coefficient that was used to normalise the XP spectra) is the most influential piece of information (in accordance with e.g. Figs. 25 and 26 in De Angeli et al. 2023). For most labels, the Gaia parallax also plays an important role in the determination of the output.

Recently, Guiglion et al. (2024) also explored feature importance using neural network gradients in the context of their hybrid-CNN method for the Gaia RVS spectra. The hybrid-CNN method employs an entirely different machine learning approach and input dataset, with a combination of Gaia DR3 RVS spectra, photometry (G, G_BP, G_RP), parallaxes, and XP coefficients. They demonstrated that for each label (i.e. Teff, log ɡ, [M/H], [α/Fe], [Fe/H]) their neural network learns from the characteristic spectral features in the Gaia RVS spectra and the specific Gaia XP coefficients, in addition to the parallax and magnitudes. The difference in the machine learning methodology, the dataset and the method of estimation of the feature importance can result in a different relative importance of the features.

For Teff values found by the hybrid-CNN, in decreasing order of importance, the fifth, fourth, second, sixteenth, and first BP coefficients provide the most information. For log g, the fourth, sixteenth, second, fifth and eighth BP coefficients are found to be important. For [M/H], the sixteenth BP coefficient was found to be most important followed by the fourth, sixth, fourteenth and eleventh RP coefficients. The parallax information is also found to be very crucial for log ɡ and Teff (as previously demonstrated by Guiglion et al. 2020 for the RAVE spectra). The most important XP coefficients identified for the hybrid-CNN method (as listed above and also see their Fig. 10) are found to be common with most important XP coefficients we present in Fig. 4 for the xgboost method. A more detailed comparison with the stellar parameter catalogue provided by Guiglion et al. (2024) is given in Appendix A.1.

4 The SHBoost Gaia DR3 catalogue

In this paper, we use an xgboost regression to produce a catalogue of stellar properties derived from Gaia DR3 XP spectra, astrometry, and multi-wavelength photometry. This catalogue, referred to as SHBoost, comprises the extinction, effective temperature, surface gravity, [M/H], and mass estimates for more than 217 million stars and is made available via CDS/VizieR. The data model is described in Appendix B.

As mentioned in Sect. 3.2, Fig. 3 shows a summary plot of the uncertainties for each parameter, as histograms (top row) and as a function of magnitude (second row) and distance (bottom row), respectively. Below, we give a few examples of how our catalogue can be used to study stellar populations and interstellar dust in the Milky Way and the Local Group. Comparisons with other stellar-parameter catalogues based on Gaia DR3 are presented in Appendix A.

4.1 Kiel and colour-magnitude diagrams

In Fig. 5, we show the Kiel diagrams (log ɡ vs. log Teff) obtained from the full Gaia DR3 XP run in four magnitude bins (G < 14, 14 < G ≤ 16, 16 < G ≤ 17, and 17 < G ≤ 19). Figure 6 shows the colour-magnitude diagrams (corrected for distance and extinction) for the full Gaia DR3 XP sample for which GBP, GRP, and parallaxes are available (217.97M stars) and for a sub-sample of white-dwarf stars (one of the novelties of the present work with respect to Anders et al. 2022).

At a first glance, the diagrams are unsurprising and show expected trends (for example, the Kiel diagrams in the four magnitude bins shown in Fig. 5 look similar, and the full CMD in Fig. 6 resembles the one in the lower right panel of Fig. 1). However, upon a closer inspection, we notice a series of interesting details.

The top panels of Fig. 5 show Kiel diagrams that are almost fully consistent with our expectations from stellar evolutionary models. As our composite training set covers most of the areas of Kiel diagram actually occupied by the Milky Way’s stellar populations and is indirectly based on stellar models, this is expected. However, since our predictions of Teff and log g are independent (in the sense that we train an xgboost model for each label separately), this is not a completely trivial result. In fact, the upper right panel, for example, shows that some stars around log Teff ≃ 3.6 fall in unphysical log ɡ ranges, which indicates that our capacity to determine their surface gravity is not at the same level as our Teff accuracy.

The brightest magnitude bin (upper left panel of Fig. 5 does not contain any WDs, since these are too intrinsically faint. The bulk of the WDs is contained in the faintest magnitude bin (beyond the magnitude limit of G = 17.65 for non-WDs in Gaia DR3; Montegriffo et al. 2023). Nevertheless, the total number of white dwarfs contained in the XP dataset is low (see also Fig. 6; lower row).

The second magnitude bin (upper right panel of Fig. 5) contains a small number of stars classified as white dwarfs (log ɡ ≃ 8) and hot sub-dwarfs (log ɡ ≃ 6), the latter well separated from hot main-sequence stars (log ɡ ≃ 4). This separation becomes less obvious in the fainter magnitude bins (the hot part of the MS tilts towards higher log g in the lower panels of Fig. 5), which means that our results do not allow for a completely clean separation between O-type main-sequence stars and hot sub-dwarfs (consistent with the findings of Gaia Collaboration 2018a).

The lowest-mass MS is missing in our training set, which results in overestimated effective temperatures for ultra-cool dwarfs. These objects accumulate at the lower end of the temperature sequence in Fig. 5.

The top left panel of Fig. 6 shows that there is a sizeable number of (mainly red-clump, but also upper-MS) stars with inaccurate/uncertain extinction estimates, which results in the unphysical diagonal stripes (the most obvious one extending above and below the red clump). This feature is also visible in parts of the training set; in particular in the StarHorse LAM-OST LRS and MRS data (central panels of Fig. 1). We tested the exclusion of these data from the training set, but achieved worse results – indicating that the fainter magnitude limit of LAMOST makes these stars crucial during the training phase. We thus look forward to a re-calibration of all spectroscopic survey data to a common stellar parameter scale (see e.g. the recent efforts of Thomas et al. 2024), which should lead to a significant improvement of the stellar-parameter homogenisation problem in the future.

The second and third panels of Fig. 6 show the intrinsic CMD colour-coded by metallicity (median xgbdist_met_mean per pixel) and metallicity uncertainty (median xgbdist_met_std per pixel), respectively. In particular, the right panel highlights the CMD regions in which the XP spectra are most sensitive to metallicity (FGK dwarfs and giants, as expected). The derived metallicities of WDs and stars in edge regions of the Kiel diagram/CMD should not be used.

4.2 Caveats

In this subsection, we reiterate the observational and methodological caveats associated with our SHBoost catalogue. Some of them were already discussed with the validation plots shown in the previous subsections or in the comparisons to other stellar-parameter catalogues shown in Appendix A. The most important caveats of our method are described in this section.

Most importantly, the accuracy of any supervised machine learning model is limited by the accuracy of the training set it uses. In our case, most of our training set is coming from the StarHorse catalogues of Queiroz et al. (2023) derived from ground-based spectroscopic surveys and Gaia DR3. These stellar parameters are based on state-of-the-art evolutionary models, but do not take into account unresolved or interacting binaries.

A second (and related) caveat is that in order to cover a larger parameter space of the Hertzsprung-Russell diagram, we joined the Queiroz et al. (2023) catalogues with other catalogues of hot stars, white dwarfs, very metal-poor stars, and hot sub-dwarfs (see Fig. 1). At the margin between these regimes, systematics can be expected. These systematics certainly contribute to both the statistical and the systematic uncertainty budget of the SHBoost catalogue and, thus, the dedicated catalogues for smaller parts of the Hertzsprung-Russell diagram will likely be more precise within their specified realm of validity.

A minor caveat in this respect is that there are also systematic differences in the input spectroscopic stellar parameters of the different StarHorse catalogues. Our training set is based on the StarHorse posteriors, so the systematics between different surveys are to a large degree homogenised, but some systematics might still be present (see discussion in Queiroz et al. 2023). A true inter-survey homogenisation should follow approaches such as Ness et al. (2015); Ting et al. (2019); Guiglion et al. (2020); Ambrosch et al. (2023), or Thomas et al. (2024). Homogeneous stellar parameter catalogues might become available for more surveys in the near- and mid-term future (Tsantaki et al. 2022).

Due to our philosophy of using the StarHorse posteriors as the input label scale, the produced xgb_met metallicities are based on [M/H] (assuming the Salaris correction for α-enhancement). Thus, we see significant trends with respect to [Fe/H] values, particularly at low metallicities (see Appendix C).

Although we included a significant set of very metal-poor stars in the training data, our catalogue results also comes with much larger uncertainties at low metallicities.

By construction, xgboost, unlike StarHorse or similar Bayesian stellar inference codes, cannot take into account co-variances between the stellar parameters. Correlations between, for example, Teff and log ɡ are therefore not taken into account. This weakness cannot be overcome with currently available tree-based methods, while it is intrinsically taken care of by many multi-label output neural networks (e.g. Fallows & Sanders 2022, 2024; Anders et al. 2023b; Guiglion et al. 2024). Nevertheless, we have achieved physically meaningful results for the Kiel diagram (see Fig. 5), for instance. This means that many stellar-parameter degeneracies that appear in purely photo-astrometric inferences (e.g. Anders et al. 2019, 2022) disappear when using the Gaia XP spectra.

We obtained stellar parameter and extinction estimates for all objects with Gaia DR3 XP spectra (217 974770). This sample includes also a small portion of bright (G < 17.65) galaxies and quasars. For a significant portion of stars, some of the derived quantities are very uncertain or even meaningless (see e.g. the metallicities of white dwarfs in Fig. 6). All these results should be filtered out either by their uncertainties or the corresponding parameter flags.

In this work, we did not attempt to derive new age or distance estimates, since the information about these quantities is not directly encoded in the Gaia DR3 XP spectra. All the maps in this paper use Bayesian distance estimates from Anders et al. (2022) or Bailer-Jones et al. (2021), both of which are based on Gaia (E)DR3 astrometry and photometry.

thumbnail Fig. 5

Unfiltered Kiel diagrams of the full Gaia DR3 XP sample, in four broad bins of observed G magnitude.

5 Results

5.1 Metallicity maps

Figures 7 and 8 show examples of metallicity maps that can be created from our dataset. The left panel of Fig. 7 shows a face-on map of the Milky Way and its surroundings from afar (box size: 100 kpc × 100 kpc), based on distances from Anders et al. (2022) when available, otherwise from Bailer-Jones et al. (2021). The right panel of the figure shows the same map (but colour coded by the median xgbdist_met_mean per pixel) demonstrating that it is possible to see some astrophysical signatures in the data (e.g. we appreciate the expected metallicity difference between the Large and Small Magellanic Clouds), while others have not been recovered (e.g. a negative metallicity gradient in the MW halo; e.g. Castellani et al. 1983; Carney et al. 1990; Tunçel Güçtekin et al. 2019, but see also Monachesi et al. 2016; Das & Binney 2016; Conroy et al. 2019; Vickers et al. 2021). We suggest that this is partly due to uncertain distance estimates and partly due to the poor performance of the xgbdist_met estimator in the low-metallicity regime, which can be mitigated by using calibrated [Fe/H] estimates (see Appendix C).

In Fig. 8, we show an edge-on view of the Milky Way disc (RGal vs. ZGal), using only red-clump stars with small xgbdist_met_std. The median metallicity map (bottom panel) displays the expected signatures of the radial and vertical metallicity gradients, covering a much larger range of the disc with good statistics than currently possible with high- or medium-resolution spectroscopic surveys. A detailed quantitative comparison is out of the scope of the present paper, but the plot clearly reaffirms that the information content of the low-resolution Gaia XP spectra should not be underestimated (see also e.g. Andrae et al. 2023b; Lucey et al. 2023; Weiler et al. 2023; Li et al. 2024).

thumbnail Fig. 6

Extinction-corrected colour-magnitude diagrams. Top row: Full Gaia DR3 XP sample (before filtering results with large uncertainties). Top left: 2D histogram. Top middle: colour-coded by median metallicity per bin. Top right: Colour-coded by median metallicity uncertainty per bin. Middle row: Same as top row, but flag-cleaned (using xgb_av_outputflag, xgb_logteff_outputflag, and xgb_met_outputflag). Bottom row: white-dwarf portion of the colour-magnitude diagram (filtered by log g > 7). Bottom left: 2D histogram. Bottom middle: mass colour-coded by median mass. Bottom right: mass colour-coded by mass uncertainty.

5.2 Hot stars

One of the strong points of our method lies in the diversity of the training set. Since we explicitly included hot stars in our training set, the stellar-parameter predictions for these stars are significantly better than in Anders et al. (2022) or Andrae et al. (2023a). As an example, we show in Fig. 9, a map of about 375 000 B star candidates selected only based on their effective temperature and surface gravity. This map extends further than most previous maps of hot stars (e.g. Zari et al. 2021; Pantaleoni González et al. 2021) and provides fully consistent results with the Gaia-derived maps of young upper main-sequence stars (UMS, Poggio et al. 2021; Gaia Collaboration 2023a).

Figure 9 demonstrates that even without a dedicated effort to determine better distances for these stars (which would certainly be necessary for further science exploitation), all known major over- and under-densities have been recovered (the plot can be compared to e.g. Fig. 11 in Zari et al. 2021 or Fig. 5 in Pantaleoni González et al. 2021). In addition, the new over-densities in the Milky Way’s fourth quadrant seen in the Gaia EDR3 maps of Poggio et al. (2021) (at (−9.5, −2) or at (−9, −3)) have been confirmed. As pointed out in Gaia Collaboration (2023a), they also coincide with over-densities of young open clusters seen in recent catalogues (Cantat-Gaudin et al. 2020; Hunt & Reffert 2023).

The hot star map shown in Fig. 9 also highlights two under-densities (dashed ellipses) that appear in many different samples of young stellar tracers (star clusters, O stars, upper main-sequence stars; e.g. Castro-Ginard et al. 2019)7. In both cases, we find a significant number of sources that have been detected in the same line of sight but further away; in other words, they have not been produced by extinction cones. Recent 3D extinction maps (e.g. Vergely et al. 2022, Fig. 11) also indicate that there is little absorption in the respective areas. The sample of IPHAS-selected A stars investigated by Ardèvol et al. (2023) is limited to longitudes 30 deg < l < 215 deg; thus, it does not reach this region.

Comparing Fig. 9 to the map of Gaia EDR3 upper main-sequence stars (Poggio et al. 2021, see Fig. 1), we find that both these two over-densities and the under-density near Vela can also be sensed, albeit close to the limit of their sample. This is a further point indication that these features are not biases of this new sample. We argue that the mapping of stellar under-densities should be just as important as mapping their overdensities. We see more and more evidence for lumpy or flocculent structure in the solar neighbourhood, as opposed to a smooth distribution with well-defined spiral arms (e.g. Shetty & Ostriker 2008; Dobbs et al. 2011; Dobbs & Baba 2014; Castro-Ginard et al. 2021).

thumbnail Fig. 7

Large-scale Cartesian map (box size: 100 × 100 kpc2) of the Gaia DR3 XP sample. Left panel: density map. Right panel: median metallicity map of the same volume (integrating over all ZGal), using all 204 million stars with σ[M/H] < 0.3 dex and only showing bins containing more than 3 stars. Some salient features of the map are annotated.

thumbnail Fig. 8

Distribution of red-clump stars with Gaia DR3 XP spectra in the Galactic disc. Top panel: Density distribution in cylindrical Galactocentric coordinates. Bottom panel: Median metallicity map of 7.5 million red-clump stars with σ[M/H] < 0.2 dex.

thumbnail Fig. 9

Spatial distribution of a Gaia DR3 XP sample of 376 321 B star candidates (selected as 4.0 < xgbdist_logteff_mean < 4.5 & xgbdist_logg_mean < 6 &xgbdist_logteff_std < 0.1 & xgbdist_logg_std < 0.5) . Known over-densities corresponding to OB associations are annotated. The dotted ellipses correspond to (potential) star formation voids.

5.3 Extinction maps

Figure 10 shows an example of how our results perform in terms of extinction. The four panels correspond to consecutive distance bins of a 30 deg × 30 deg region around the Orion nebula; each panel shows a differential extinction map (i.e. the amount of extinction that is added in the respective distance range). We thus observe how the extinction from the molecular dust clouds gradually fills the Galactic plane with increasing distance. This effect is in principle already observable in the median StarHorse Gaia DR2 extinction maps of the Orion region shown in Fig. 11 of Anders et al. (2019), but now the scatter in extinction per star (and thus also per pixel) is lower and the amount of unphysical artefacts has diminished.

It is thus possible to infer the 3D dust distribution almost by eye from sequential 2D slices such as those shown in Fig. 10. In the case of farther regions and/or greater statistical noise, however, the inversion problem of inferring the 3D dust distribution in the extended solar neighbourhood from photo-astrometric data requires significant conceptional and computational effort (Lallement et al. 2014, 2019, 2022; Sale & Magorrian 2018; Green et al. 2019; Leike et al. 2020; Rezaei Kh. et al. 2020, e.g.).

thumbnail Fig. 10

Differential ∆AV extinction maps for the Orion region (30 deg × 30 deg, d < 1.5 kpc; left) in four bins of distance, as indicated in each panel. Some artefacts from the Gaia scanning law are visible in these maps, but the distance-sliced AV maps are generally complete down to AV ≳ 6 mag, with a high resolution, and correspond very well to existing 3D dust maps.

5.4 Metal-poor stars

The plethora of recent works providing improved stellar metal-licity estimates from Gaia XP spectra (e.g. Andrae et al. 2023b; Zhang et al. 2023; Xylakis-Dornbusch et al. 2024; Yao et al. 2024) is (to some degree) driven by the need for reliable input catalogues of ongoing and upcoming surveys targeting metal-poor and very metal-poor stars (e.g. Christlieb et al. 2019; Conroy et al. 2019; Cioni et al. 2019; Chiappini et al. 2019; Arentsen et al. 2020a).

Pre-Gaia DR3 surveys of metal-poor stars, such as SkyMap-per (Keller et al. 2007), J-PAS/J-PLUS (Marín-Franch et al. 2012), Pristine (Starkenburg et al. 2017), or PIGS (Arentsen et al. 2020b), have fundamentally relied on the combination of narrow- and intermediate-band photometry to select candidates (e.g. Frebel & Norris 2015; Youakim et al. 2017; Whitten et al. 2019; Chiti et al. 2021; Galarza et al. 2022). The high signal-to-noise ratios of the Gaia XP spectra make it possible to construct synthetic narrow-band filters from these spectra, for example using the gaia-xpy package (Ruz-Mieres 2022), without the need for costly narrow-band observations with ground-based telescopes.

In this work, we have used the Gaia DR3 XP spectra in their native form (using the coefficients of the Hermite basis functions that describe the spectra; Carrasco et al. 2021). Other works, such as Andrae et al. (2023b), have also used the synthetic narrow-band magnitudes and focussed on a narrower portion of the Hertzsprung-Russell diagram. This enabled their study to achieve a higher formal precision in the obtained metallicities (see Appendix A.3).

The comparison with open and globular clusters in Appendix C shows that our xgb_met global metallicity estimates are systematically overestimated for low metallicities (see also Fig. 2 and Appendix A.3). However, these estimates are still useful to select a pure sample of low-metallicity candidates to be followed up with spectroscopic surveys. Figure 11 shows an example selection of bona-fide metal-poor stars (selected as xgbdist_met_mean + xgbdist_met_std < −1) closer than 5 kpc to the Galactic plane. A comparison with Fig. C.1 shows that while this selection is not optimised for very or extremely metal-poor stars (see Xylakis-Dornbusch et al. 2024 or Yao et al. 2024 for dedicated attempts), it does result in a disc sample that has a purity of metal-poor stars (i.e. spectroscopic [M/H] < −1) greater than 90%. The metallicity range −2 ≲ [M/H] ≲ −1 is in itself a very interesting regime that provides crucial hints to deciphering the early evolution and merger history of the Milky Way’s disc (e.g. Nepal et al. 2024b). The selection shown in Fig. 11 was therefore chosen as the base sample for the 4MIDABLE-LR (Chiappini et al. 2019) sub-survey on metal-poor disc stars, which will commence observations as part of 4MOST project (de Jong et al. 2019) in 2025 (de Jong et al. 2022).

thumbnail Fig. 11

ZGal vs. RGal map of low-metallicity candidate stars (defined as stars that have xgbdist_met_mean < −1 and xgb_met_outputflag== “0”). There are 766k stars in the region covered by the map (943k in total). The white dashed ellipse marks the typical extent of a classical bulge region.

thumbnail Fig. 12

Galactic maps of super-metal-rich stars (xgbdist_met > +0.5). The lower panel shows a top-down view of the Galaxy, marking the solar position (dotted lines), the approximate extent of the Galactic bar (ellipse), and the bulge and bar region studied by Queiroz et al. (2021).

5.5 Super-metal-rich stars

As shown in Appendix A, our metallicity estimates are most reliable in the solar and super-solar regime. An immediate application is thus the mapping of super-metal-rich (SMR) stars. Figure 12 shows the distribution of stars with xgbdist_met > +0.5 in the Galaxy, highlighting that most of these stars are located in the Galactic bar and bulge region (which is known to host a rich reservoir of such stars; see e.g. Queiroz et al. 2021; Rix et al. 2024). It also shows a few artefacts that are likely to be related to erroneous distances (in particular, the ring-like structure at ~l kpc from the Sun; see also Anders et al. 2019).

Figure 13 demonstrates another possible application of our catalogue: the slicing of the XP sample into metallicity bins, which allows for statistical stellar-population studies that are otherwise difficult to carry out. In particular, Fig. 13 shows the distribution of orbital guiding radii (calculated assuming a flat rotation curve, roughly equivalent to angular momentum) in bins of metallicity between −0.5 and +0.5 for over 7 million stars in the disc (|ZGal| < 1 kpc and heliocentric distances ≲4 kpc). The most obvious observation is of course the gradual inward shift of the Rguide distributions with growing metallicity, caused by the disc’s negative radial abundance gradient (e.g. Janes 1979; Grenon 1987; Anders et al. 2014; Luck 2018). We also observe a second effect: the super-solar metallicity stars (xgb_met> +0.1) show peaks in the Rguide distribution similar to the bi-modality, which was found and discussed in Nepal et al. (2024a) using the much smaller Gaia DR3 RVS CNN dataset (Guiglion et al. 2024, containing ~ 169 000 MSTO stars). The XP sample, thanks to its larger statistics, now allows us to also confirm another signature that was already hinted at by the data shown in Nepal et al. (2024a): There is an abrupt scarcity of metal-rich stars (xgb_met> +0.1) beyond ~9.2 kpc (see also Khoperskov & Gerhard 2022). It has been suggested that the Galactic bar’s outer Lindblad resonance (OLR) may limit the migration of stars from the inner galaxy beyond this location (see e.g. Halle et al. 2015; Khoperskov et al. 2020). We therefore suggest that this break in density at ~9.2 kpc could mark the position of the bar’s OLR, which provides a natural barrier for the radial migration of stars on cold orbits.

These trends displayed by the metal-rich stars, previously shown on the basis of smaller samples, is impossible to explain only by selection biases and hint at a major role of the galactic bar in the chemo-dynamical evolution of the disc. In a future work (in preparation), we aim to use the SMR stars from the SHBoost catalogue to constrain the bar properties.

thumbnail Fig. 13

Guiding-radius distributions of the XP sample with radial velocities from Gaia RVS, in bins of xgbdist_met. The curves of SMR stars are highlighted as thicker lines. The dotted vertical line at Rguide highlights the point after which the density of metal-rich stars reaches a floor, which might possibly be related to the outer Lindblad resonance of the Galactic bar.

6 Conclusions and lessons for future work

In this paper, we present the SHBoost catalogue, derived from a set of xgboost regression models that determine stellar parameters and line-of-sight extinctions for more than 217 million stars with Gaia DR3 XP spectra. Through the curation of a high-quality training set (mostly coming from the spectroscopic survey data compiled in Queiroz et al. 2023), the code yields results that are competitive with its classical isochrone-fitting counter-part, the StarHorse Gaia EDR3 data presented in Anders et al. (2022). For reference, Table 2 summarises the stellar-parameter, distance, and extinction catalogues available from our group.

Our new SHBoost catalogue comes with reliable uncertainties estimated via xgboost-distributions. We achieve median (best 25% percentile in brackets) precisions of 0.20 (0.11) mag in line-of-sight extinction, AV, 0.014 (0.010) dex for effective temperature, logo Teff (or 174 [119] K in Teff), 0.20 (0.13) mag in surface gravity, log g, 0.18 (0.14) dex in metallicty [М/Н], and 12% (8%) in mass. The validation of the individual uncertainty estimates shows that they are correctly estimated. In addition, our xgboost models can be interpreted by using SHAP values (see Sect. 3.3 and Fig. 4).

We demonstrate that the combination of training data from several different surveys produces satisfactory results in parameter regimes that are traditionally not well covered by large-scale spectroscopic surveys (e.g. white dwarfs, OB stars, sdO stars) and other catalogues based on Gaia XP spectra. Other available stellar-parameter catalogues derived from XP spectra mostly rely on APOGEE data (e.g. Andrae et al. 2023b; Zhang et al. 2023; Fallows & Sanders 2024).

The whole inference pipeline (building the training set, training of the xgboost regression of stellar parameters, postprocessing) could be run in 2 days on a 48-core machine at the Leibniz-Institut für Astrophysik Potsdam. This is a significant improvement in computational speed compared to the latest StarHorse EDR3 run, which took roughly three weeks to complete on a similar machine. We emphasise, however, that the need for classical Bayesian inference (at least for the construction of large robust training sets) will persist in the future.

In Sect. 5, we showcase several potential applications of our new catalogue, including differential extinction maps, metal-licity trends in the Milky Way, and extended maps of young massive stars, metal-poor stars, and metal-rich stars. In particular, our hot star map reveals a new stellar void adjacent to the Vela region, suggesting that more such inhomogeneities are hiding in the extended solar neighbourhood.

The distribution of metal-poor stars shows homogeneity around the Sun and a spherical over-density in the Galactic centre, corroborating previous findings of an old bulge population distinct from the boxy peanut bulge (Barbuy et al. 2018; Zoccali 2019; Rix et al. 2022; Ardern-Arentsen et al. 2024). In addition, mapping metal-rich stars across the Galaxy reveals a significant reservoir of these stars in the inner regions (Queiroz et al. 2021; Rix et al. 2024). The large number of metal-rich stars in our catalogue provides new opportunities to study the formation and evolution of the inner disk and bar structure. As recently found in Nepal et al. (2024a), but now with a much larger number of stars, we have identified a bi-modality in the guiding radius distribution of super-metal-rich stars and a scarcity of such stars beyond a Galactocentric distance of ~9.2 kpc, which may be related to the outer Lindblad resonance of the Galactic bar.

In view of Gaia DR4, which will contain a much greater data volume (including many millions of previously unavailable XP and RVS spectra), we foresee that convolutional neural-network approaches will likely prevail over future tree-based implementations. This is supported by recent developments, which allow for similar levels of performance in terms of precision and computational cost, while naturally allowing for the inclusion of uncertain and correlated output labels (e.g. Fallows & Sanders 2024).

Table 2

Stellar-parameter catalogues produced by the StarHorse group over the past years.

Data availability

Table B.1 provides the data model for the provided SHBoost output tables. It is available at the CDS via anonymous ftp to cdsarc.cds.unistra.fr (130.79.128.5) or via https://cdsarc.cds.unistra.fr/viz-bin/cat/J/A+A/691/A98. Alternatively, the data can also be accessed via a dedicated web page (https://data.aip.de/projects/shboost2024.html), where we provide detailed instructions and python notebooks that illustrate how to download and use the catalogue efficiently: data.aip.de/projects/shboost2024.html.

Acknowledgements

This work was partially funded by the Spanish MICIN/AEI/10.13039/501100011033 and by the “ERDF A way of making Europe” funds by the European Union through grants RTI2018-095076-B-C21, PID2021-122842OB-C21, PID2021-125451NA-I00, and CNS2022-135232, and the Institute of Cosmos Sciences University of Barcelona (ICCUB, Unidad de Excelencia ‘Maria de Maeztu’) through grant CEX2019-000918-M. FA acknowledges financial support from MCIN/AEI/10.13039/501100011033 through grants IJC2019-04862-I and RYC2021-031638-I (the latter co-funded by European Union NextGenerationEU/PRTR). GG acknowledges support by Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – project-IDs: eBer-22-59652 (GU 2240/1-1 “Galactic Archaeology with Convolutional Neural-Networks: Realising the potential of Gaia and 4MOST”). This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant agreement No. 949173). This work made use of the following software packages: astropy (Astropy Collaboration 2013, 2018, 2022), Jupyter (Perez & Granger 2007; Kluyver et al. 2016), matplotlib (Hunter 2007), numpy (Harris et al. 2020), pandas (Wes McKinney 2010; The pandas development team 2023), python (Van Rossum & Drake 2009), scipy (Virtanen et al. 2020; Gommers et al. 2024), astroquery (Ginsburg et al. 2019; Ginsburg et al. 2024), Cython (Behnel et al. 2011), h5py (Collette 2013; Collette et al. 2023), scikit-learn (Pedregosa et al. 2011; Buitinck et al. 2013; Grisel et al. 2024) and seaborn (Waskom 2021). This research has made use of NASA’s Astrophysics Data System. Some of the results in this paper have been derived using healpy and the HEALPix package8 (Zonca et al. 2019; Górski et al. 2005; Zonca et al. 2024). Software citation information were aggregated using The Software Citation Station (Wagg & Broekgaarden 2024a; Wagg & Broekgaarden 2024b). This work has made use of data from the European Space Agency (ESA) mission Gaia (http://www.cosmos.esa.int/gaia), processed by the Gaia Data Processing and Analysis Consortium (DPAC, http://www.cosmos.esa.int/web/gaia/dpac/consortium). Funding for the DPAC has been provided by national institutions, in particular the institutions participating in the Gaia Multilateral Agreement. Funding for the SDSS Brazilian Participation Group has been provided by the Ministério de Ciência e Tecnologia (MCT), Fundação Carlos Chagas Filho de Amparo à Pesquisa do Estado do Rio de Janeiro (FAPERJ), Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), and Financiadora de Estudos e Projetos (FINEP). Funding for the Sloan Digital Sky Survey IV has been provided by the Alfred P. Sloan Foundation, the U.S. Department of Energy Office of Science, and the Participating Institutions. SDSS-IV acknowledges support and resources from the Center for High-Performance Computing at the University of Utah. The SDSS web site is www.sdss.org. SDSS-IV is managed by the Astrophysical Research Consortium for the Participating Institutions of the SDSS Collaboration including the Brazilian Participation Group, the Carnegie Institution for Science, Carnegie Mellon University, the Chilean Participation Group, the French Participation Group, Harvard-Smithsonian Center for Astrophysics, Instituto de Astrofísica de Canarias, The Johns Hopkins University, Kavli Institute for the Physics and Mathematics of the Universe (IPMU) / University of Tokyo, Lawrence Berkeley National Laboratory, Leibniz-Institut für Astrophysik Potsdam (AIP), Max-Planck-Institut für Astronomie (MPIA Heidelberg), Max-Planck-Institut für Astrophysik (MPA Garching), Max-Planck-Institut für Extraterrestrische Physik (MPE), National Astronomical Observatory of China, New Mexico State University, New York University, University of Notre Dame, Observatário Nacional / MCTI, The Ohio State University, Pennsylvania State University, Shanghai Astronomical Observatory, United Kingdom Participation Group, Universidad Nacional Autónoma de México, University of Arizona, University of Colorado Boulder, University of Oxford, University of Portsmouth, University of Utah, University of Virginia, University of Washington, University of Wisconsin, Vanderbilt University, and Yale University. Guoshoujing Telescope (the Large Sky Area Multi-Object Fiber Spectroscopic Telescope LAMOST) is a National Major Scientific Project built by the Chinese Academy of Sciences. Funding for the project has been provided by the National Development and Reform Commission. LAMOST is operated and managed by the National Astronomical Observatories, Chinese Academy of Sciences. Funding for RAVE has been provided by: the Australian Astronomical Observatory; the Leibniz-Institut für Astrophysik Potsdam (AIP); the Australian National University; the Australian Research Council; the French National Research Agency; the German Research Foundation (SPP 1177 and SFB 881); the European Research Council (ERC-StG 240271 Galactica); the Istituto Nazionale di Astrofisica at Padova; The Johns Hopkins University; the National Science Foundation of the USA (AST-0908326); the W. M. Keck foundation; the Macquarie University; the Netherlands Research School for Astronomy; the Natural Sciences and Engineering Research Council of Canada; the Slovenian Research Agency; the Swiss National Science Foundation; the Science & Technology Facilities Council of the UK; Opticon; Strasbourg Observatory; and the Universities of Groningen, Heidelberg and Sydney. The RAVE web site is at https://www.rave-survey.org. This work has also made use of data from Gaia-ESO based on data products from observations made with ESO Telescopes at the La Silla Paranal Observatory under programme ID 188.B-3002.

Appendix A Comparison to previous catalogues

An exhaustive comparison to similar efforts in the literature is beyond the scope of this paper; the comparisons shown in this appendix serve as illustrations of the magnitude of systematic differences one can expect between the various stellar-parameter catalogues based on Gaia DR3 that have become available in the past four years.

A.1 Gaia DR3 RVS CNN catalogue (Guiglion et al. 2024)

In Fig. A.1 we compare the SHBoost parameterisation with the Gaia RVS atmospheric parameters derived by Guiglion et al. (2024) based on a hybrid Convolutional Neural-Network (CNN). In their study, Guiglion et al. (2024) combined the full Gaia DR3 data products in the form of Gaia DR3 RVS spectra, parallaxes, G, GBP, and GRP magnitudes, as well as XP coefficients. This unique combination provides precise atmospheric parameters and robust [α/M] estimates even for low signal-to-noise RVS spectra (down to S/NRVS = 15), for about 800000 stars. The authors characterised the [α/M] vs [M/H] bi-modality from the inner to the outer disc, for the first time with Gaia RVS data. The CNN was trained on a high-quality training sample of ~40 000 stars in common between the Gaia DR3 RVS sample and APOGEE DR17. We cross-matched the RVS-CNN catalogue of Guiglion et al. (2024) with SHBoost catalogue. After the recommended quality cuts for the RVS-CNN catalogue (see Guiglion et al. (2024) for details) and requiring xgb_inputflag = = ‘XP5pGBPRPJHKsW1W2W3W4’ for the SHBoost catalogue, we obtained a total of 655 950 stars.

As can be appreciated in Fig. A.1, we obtain very similar results to Guiglion et al. (2024) for the three stellar parameters in common (Teff, log 𝑔, and [M/H]), modulo some small systematic trends. This is reassuring, especially since Guiglion et al. (2024) used also the Gaia RVS spectra (with spectral resolution R ~ 11000), which add much more precise information on metallicity than can be obtained from the low-resolution XP spectra.

By construction, our allowed range of Teff is larger than in the catalogue of Guiglion et al. (2024), which used APOGEE DR17 as a training set. This explains the deviating behaviour at the edges of the Teff range shown in the left panels of Fig. A.1. The log g comparison (middle panels) shows systematic differences below the 0.2 dex level (mean difference with respect to Guiglion et al. 2024 is +0.08 dex). For metallicity (right panels of Fig. A.1), we find a very good agreement in the disc-like metallicity regime (> −0.6 dex), while at lower metallicities our values are systematically higher than the ones of Guiglion et al. (2024), suggesting that our metallicities are overestimated in this regime (see also the sections below). We also inspected the weak horizontal branch visible in the top right panel of Fig. A.1 and find that for these stars our metallicities agree very well with the high-resolution metallicity scale of APOGEE, suggesting that for this sample the Guiglion et al. (2024) metallicities are underestimated.

A.2 Gaia EDR3 StarHorse (Anders et al. 2022)

In Fig. A.2 we compare our SHBoost results with the results obtained from traditional Bayesian isochrone fitting to photo-astrometric data with StarHorse (Anders et al. 2022). In that paper we published a catalogue of distances, extinctions, and stellar parameters for 362 million Gaia EDR3 stars brighter than G < 18.5 (see Table 2).

We observe that the effective temperature estimates between both works agree well for the vast majority of the data (between 3000 K and 10000 K), albeit with a considerable scatter (MAE of 211 K). At greater Teff, we had already shown in Anders et al. (2022) that the StarHorse EDR3 Teff scale is not very reliable, probably due to the sparse sampling of massive-star isochrones, binarity, and the initial-mass-function prior. We also refer to the discussion of hot luminous stars in Sect. 5.2.

For log 𝑔 and AV (second and forth panel of Fig. A.2, respectively), we see a very good concordance between StarHorse EDR3 and SHBoost, with little global shifts (−0.02 dex and −0.11 mag, respectively) and small scatter around the one-to-one correspondence (MAEs of 0.10 dex and 0.20 mag, respectively). The group of stars with StarHorse log g ≃ 3 – 4 which has xgb_logg around 4.5–5 seems to be composed of genuine dwarf stars, which were affected by uncertain Gaia (E)DR3 parallax measurements, which lead to a StarHorse solution that is compatible for stars in either the dwarf, sub-giant, or giant evolutionary stages.

The greatest discrepancy of our SHBoost catalogue with the StarHorse EDR3 results arises for the metallicities (third column of Fig. A.2). This is generally expected because the StarHorse metallicity estimates are only based on broad-band photometry (in the best cases including 𝑔riz photometry from PanSTARRS-1 or SkyMapper); whereas the SHBoost results use the low-resolution Gaia XP spectra, which contain much more precise information about both narrow and broad absorption features (Weiler et al. 2023). The large scatter with respect to the StarHorse EDR3 metallicity scale is thus fully anticipated. The systematic trend at low metallicity, however, is less intuitive. Since our SHBoost metallicities agree well with the ones obtained by Guiglion et al. (2024), as shown in Sect. A.1, we suggest that the StarHorse EDR3 photo-astrometric metallicities should be used with great caution, and not on an individual-star level.

A.3 Gaia DR3 XP spectra + xgboost: Andrae et al. (2023b)

The first published stellar parameters derived from Gaia XP spectra using xgboost regression were delivered by Andrae et al. (2023b). These authors also included as input columns, in addition to the XP coefficients, synthetic narrow-band photometry derived from the XP spectra, which allowed them to obtain more precise metallicity estimates.

In Fig. A.3 we compare our results to the catalogue of Andrae et al. (2023b). Some of the patterns in the Teff and log g comparisons (left and middle column, respectively) are similar to the ones seen in Figs. A.1 and A.2. The larger range of effective temperatures in our training set allows for a better coverage of especially upper main-sequence stars. The diagonal branch in the lower left panel of Fig. A.3 is due to a set of stars potentially misclassified in the Andrae et al. (2023b) catalogue. The log g comparison in the middle panels of Fig. A.3 shows little dispersion (MAE: 0.11 dex) and only small offsets (mean: +0.03 dex). We also see similar trends of the log 𝑔 systematics with respect to both the Andrae et al. (2023b) and Anders et al. (2022) catalogues.

The most discrepant results with respect to Andrae et al. (2023b) arise for metallicity. On the one hand, this is due to the different metallicity definitions used (Andrae et al. 2023b trained on the APOGEE DR17 [Fe/H] labels, while we use [M/H], which results in significant offsets for [α/Fe]-enhanced populations; see also Sect. C). On the other hand, the training set of Andrae et al. (2023b) itself is much smaller, but less heterogeneous than ours, resulting in better performance at least for G < 16, which is the range well covered by their training data. In addition, it appears that their approach to include also synthetic photometry in the training data further improved the precision. In principle, xgboost is able to fit also complex non-linear dependencies of multiple input quantities accurately, but the work of Andrae et al. (2023b) shows that in particular for metallicity it does help to include more explicit information about metallicity-sensitive features (such as the synthetic Pristine CaHK filter magnitude) in the training data.

thumbnail Fig. A.1

Comparison to the stellar-parameter catalogue of Guiglion et al. (2024), based on Gaia DR3 data (especially RVS spectra, but also XP spectra, parallaxes, and broad-band photometry). Their catalogue was obtained with a convolutional neural network; the labels were trained on APOGEE DR17.

thumbnail Fig. A.2

Comparison of the SHBoost parameters with the StarHorse Gaia EDR3 parameters (Anders et al. 2022). Top panels: One-to-one comparisons of effective temperature, surface gravity, metallicity, and extinction (from left to right). Bottom panels: Residuals.

thumbnail Fig. A.3

Comparison to the stellarparameter catalogue of Andrae et al. (2023b), trained mostly on APOGEE DR17.

Appendix B Data model

Table B.1 provides the data model for the provided SHBoost output tables, available at the CDS and the AIP web page9, or through its DOI10.

Table B.1

Data model of the Gaia DR3 SHBoost catalogue.

Appendix C Empirical a posteriori metallicity calibration

When comparing the metallicity estimates obtained with xgboost to a large sample of open and globular clusters (see Figs. C.1 and C.2), we find important systematic offsets that scale both with metallicity itself, but also with effective temperature and surface gravity. Since our raw xgb_met results correspond to [M/H] rather than [Fe/H], part of the trend with [Fe/H] (left panels of Fig. C.1) is partly expected due to α-enhancement at low metallicities. In addition, overestimated metallicities for globular clusters have been found to be common even in spectroscopic surveys (Soubiran et al. 2022). However, we also find important residual trends with stellar parameters (see left panels of Fig. C.2). We therefore decided to derive cluster-calibrated estimates of [Fe/H] for the full dataset, based on a third-order polynomial fit of {xgb_met, xgb_logteff, xgb_logg, phot_g_mean_mag} to the residuals.

We cross-matched our Gaia DR3 XP sample with the recent star-cluster membership catalogue of Hunt & Reffert (2023), which was in turn cross-matched with the [Fe/H] compilations of Joshi et al. (2024) and Harris (2010), resulting in a sample of 61660 stars in 202 open clusters and 9687 stars in globular clusters (blue and orange symbols in Fig. C.1, respectively). We find that two separate trends are discernible for stars with xgb_met > −0.6 and ≤ −0.6, so these regimes were treated separately. The results of our [Fe/H] calibration are shown in the right panels of Figs. C.1 and C.2, demonstrating that it is possible to (at least partly) calibrate out the large biases in xgb_met when compared to literature [Fe/H] measurements. In the case of the open cluster M67 (a.k.a. NGC 2682; top panels of Fig. C.2), for example, the calibrated [Fe/H] values (right panel) show a very small spread and bias with respect to the literature, while the xgb_met values do show significant trends as a function of position in the CMD. In other cases, however, the calibration does not improve the results significantly.

thumbnail Fig. C.1

[Fe/H] calibration based on members of open and globular clusters. Top left panel: One-to-one comparison of the xgb_met values for open and globular cluster members with spectroscopic [Fe/H] measurements from the literature (using Joshi et al. 2024 for open clusters and Harris 2010 for globular clusters). Second row: residuals between xgb_met and literature [Fe/H] (left panel) and feh_calibrated and literature [Fe/H], showing the improvement achieved by our proposed calibration. Top right panel: comparison of the three metallicity distributions (literature, xgb_met, and feh_calibrated) for the cluster sample.

thumbnail Fig. C.2

Effect of the [Fe/H] calibration for a few example open and globular clusters. Each panel shows the observed Gaia DR3 colour-magnitude diagrams (G vs GGRP), colour-coded by either xgb_met (left panels) or feh_calibrated (right panels). The colour code limits are different in each row to highlight the differences with respect to the respective literature [Fe/H]. Top row: The solar-metallicity open cluster M67 (NGC 2682). Middle row: The metal-poor open cluster NGC 2158. Bottom row: Bulge globular cluster NGC 6752.

In this paper, however (in particular in Sects. 5.1 and 5.4), we use only the uncalibrated xgb_met values, keeping in mind that these tend to be significantly overestimated for low-metallicity objects, and slightly underestimated for super-solar metallicities (left panels of Fig. C.1). This is an effect that is commonly seen in machine-learning regression estimates of unbalanced datasets (e.g. Guiglion et al. 2024; see also Appendix A.1).

References

  1. Ambrosch, M., Guiglion, G., Mikolaitis, Š., et al. 2023, A&A, 672, A46 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  2. Anders, F., Chiappini, C., Santiago, B. X., et al. 2014, A&A, 564, A115 [Google Scholar]
  3. Anders, F., Chiappini, C., Santiago, B. X., et al. 2018, A&A, 619, A125 [Google Scholar]
  4. Anders, F., Khalatyan, A., Chiappini, C., et al. 2019, A&A, 628, A94 [Google Scholar]
  5. Anders, F., Khalatyan, A., Queiroz, A. B. A., et al. 2022, A&A, 658, A91 [Google Scholar]
  6. Anders, F., Gispert, P., Ratcliffe, B., et al. 2023a, A&A, 678, A158 [Google Scholar]
  7. Anders, F., Khalatyan, A., Queiroz, A., Nepal, S., & Chiappini, C. 2023b, in Highlights on Spanish Astrophysics XI, 349 [Google Scholar]
  8. Andrae, R., Fouesneau, M., Sordo, R., et al. 2023a, A&A, 674, A27 [Google Scholar]
  9. Andrae, R., Rix, H.-W., & Chandra, V. 2023b, ApJS, 267, 8 [Google Scholar]
  10. Ardern-Arentsen, A., Monari, G., Queiroz, A. B. A., et al. 2024, MNRAS, 530, 3391 [Google Scholar]
  11. Ardèvol, J., Monguió, M., Figueras, F., Romero-Gómez, M., & Carrasco, J. M. 2023, A&A, 678, A111 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  12. Arentsen, A., Starkenburg, E., Martin, N. F., et al. 2020a, MNRAS, 496, 4964 [Google Scholar]
  13. Arentsen, A., Starkenburg, E., Martin, N. F., et al. 2020b, MNRAS, 491, L11 [Google Scholar]
  14. Astropy Collaboration (Robitaille, T. P., et al.) 2013, A&A, 558, A33 [Google Scholar]
  15. Astropy Collaboration (Price-Whelan, A. M., et al.) 2018, AJ, 156, 123 [Google Scholar]
  16. Astropy Collaboration (Price-Whelan, A. M., et al.) 2022, ApJ, 935, 167 [Google Scholar]
  17. Bailer-Jones, C. A. L., Rybizki, J., Fouesneau, M., Demleitner, M., & Andrae, R. 2021, AJ, 161, 147 [Google Scholar]
  18. Barbuy, B., Chiappini, C., & Gerhard, O. 2018, ARA&A, 56, 223 [Google Scholar]
  19. Baron, D. 2019, arXiv e-prints [arXiv:1904.07248] [Google Scholar]
  20. Behnel, S., Bradshaw, R., Citro, C., et al. 2011, Comput. Sci. Eng., 13, 31 [Google Scholar]
  21. Bethapudi, S., & Desai, S. 2018, Astron. Comput., 23, 15 [NASA ADS] [CrossRef] [Google Scholar]
  22. Borisov, V., Leemann, T., Seßler, K., et al. 2021, arXiv e-prints [arXiv:2110.01889] [Google Scholar]
  23. Buitinck, L., Louppe, G., Blondel, M., et al. 2013, in ECML PKDD Workshop: Languages for Data Mining and Machine Learning, 108 [Google Scholar]
  24. Cantat-Gaudin, T., Anders, F., Castro-Ginard, A., et al. 2020, A&A, 640, A1 [Google Scholar]
  25. Carney, B. W., Aguilar, L., Latham, D. W., & Laird, J. B. 1990, AJ, 99, 201 [NASA ADS] [CrossRef] [Google Scholar]
  26. Carrasco, J. M., Weiler, M., Jordi, C., et al. 2021, A&A, 652, A86 [Google Scholar]
  27. Castellani, V., Maceroni, C., & Tosi, M. 1983, A&A, 128, 64 [NASA ADS] [Google Scholar]
  28. Castro-Ginard, A., Jordi, C., Luri, X., Cantat-Gaudin, T., & Balaguer-Núñez, L. 2019, A&A, 627, A35 [Google Scholar]
  29. Castro-Ginard, A., McMillan, P. J., Luri, X., et al. 2021, A&A, 652, A162 [Google Scholar]
  30. Chen, T., & Guestrin, C. 2016, arXiv e-prints [arXiv:1603.02754] [Google Scholar]
  31. Chiappini, C., Minchev, I., Starkenburg, E., et al. 2019, The Messenger, 175, 30 [NASA ADS] [Google Scholar]
  32. Chiti, A., Frebel, A., Mardini, M. K., et al. 2021, ApJS, 254, 31 [Google Scholar]
  33. Christlieb, N., Battistini, C., Bonifacio, P., et al. 2019, The Messenger, 175, 26 [NASA ADS] [Google Scholar]
  34. Cioni, M. . R. L., Storm, J., Bell, C. P. M., et al. 2019, The Messenger, 175, 54 [Google Scholar]
  35. Ciucă, I., Kawata, D., Miglio, A., Davies, G. R., & Grand, R. J. J. 2021, MNRAS, 503, 2814 [Google Scholar]
  36. Collette, A. 2013, Python and HDF5 (O’Reilly) [Google Scholar]
  37. Collette, A., Kluyver, T., Caswell, T. A., et al. 2023, https://doi.org/10.5281/zenodo.7560547 [Google Scholar]
  38. Conroy, C., Bonaca, A., Cargile, P., et al. 2019, ApJ, 883, 107 [Google Scholar]
  39. Cui, X.-Q., Zhao, Y.-H., Chu, Y.-Q., et al. 2012, Res. Astron. Astrophys., 12, 1197 [Google Scholar]
  40. Culpan, R., Geier, S., Reindl, N., et al. 2022, A&A, 662, A40 [Google Scholar]
  41. Cunha, P. A. C., & Humphrey, A. 2022, A&A, 666, A87 [Google Scholar]
  42. Cutri, R. M., Skrutskie, M. F., van Dyk, S., et al. 2003, 2MASS All Sky Catalog of point sources [Google Scholar]
  43. Cutri, R. M., Wright, E. L., Conrow, T., et al. 2013, Explanatory Supplement to the AllWISE Data Release Products, Tech. rep. [Google Scholar]
  44. Dang, Y., Chen, Z., Li, H., & Shu, H. 2022, Appl. Artif. Intell., 36, 1 [Google Scholar]
  45. Das, P., & Binney, J. 2016, MNRAS, 460, 1725 [Google Scholar]
  46. De Angeli, F., Weiler, M., Montegriffo, P., et al. 2023, A&A, 674, A2 [Google Scholar]
  47. de Jong, R. S., Agertz, O., Berbel, A. A., et al. 2019, The Messenger, 175, 3 [NASA ADS] [Google Scholar]
  48. de Jong, R. S., Bellido-Tirado, O., Brynnel, J. G., et al. 2022, SPIE Conf. Ser., 12184, 1218414 [Google Scholar]
  49. Deng, L.-C., Newberg, H. J., Liu, C., et al. 2012, Res. Astron. Astrophys., 12, 735 [Google Scholar]
  50. De Silva, G. M., Freeman, K. C., Bland-Hawthorn, J., et al. 2015, MNRAS, 449, 2604 [Google Scholar]
  51. Dobbs, C., & Baba, J. 2014, PASA, 31, e035 [Google Scholar]
  52. Dobbs, C. L., Burkert, A., & Pringle, J. E. 2011, MNRAS, 417, 1318 [Google Scholar]
  53. Duan, T., Avati, A., Ding, D. Y., et al. 2019, Thirty-seventh International Conference on Machine Learning 2020, [arXiv:1910.03225] [Google Scholar]
  54. Echeverry, D., Torres, S., Rebassa-Mansergas, A., & Ferrer-Burjachs, A. 2022, A&A, 667, A144 [Google Scholar]
  55. Fallows, C. P., & Sanders, J. L. 2022, MNRAS, 516, 5521 [Google Scholar]
  56. Fallows, C. P., & Sanders, J. L. 2024, MNRAS, 531, 2126 [Google Scholar]
  57. Fluke, C. J., & Jacobs, C. 2020, WIREs Data Mining Knowledge Discov., 10, e1349 [Google Scholar]
  58. Fouesneau, M., Frémat, Y., Andrae, R., et al. 2023, A&A, 674, A28 [Google Scholar]
  59. Frebel, A., & Norris, J. E. 2015, ARA&A, 53, 631 [Google Scholar]
  60. Gaia Collaboration (Prusti, T., et al.) 2016, A&A, 595, A1 [Google Scholar]
  61. Gaia Collaboration (Babusiaux, C., et al.) 2018a, A&A, 616, A10 [Google Scholar]
  62. Gaia Collaboration (Brown, A. G. A., et al.) 2018b, A&A, 616, A1 [Google Scholar]
  63. Gaia Collaboration (Brown, A. G. A., et al.) 2021, A&A, 649, A1 [Google Scholar]
  64. Gaia Collaboration (Drimmel, R., et al.) 2023a, A&A, 674, A37 [Google Scholar]
  65. Gaia Collaboration (Montegriffo, P., et al.) 2023b, A&A, 674, A33 [Google Scholar]
  66. Gaia Collaboration (Vallenari, A., et al.) 2023c, A&A, 674, A1 [Google Scholar]
  67. Galarza, C. A., Daflon, S., Placco, V. M., et al. 2022, A&A, 657, A35 [Google Scholar]
  68. Gavel, A., Andrae, R., Fouesneau, M., Korn, A. J., & Sordo, R. 2021, A&A, 656, A93 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  69. Gentile Fusillo, N. P., Tremblay, P. E., Cukanovaite, E., et al. 2021, MNRAS, 508, 3877 [Google Scholar]
  70. Gilmore, G., Randich, S., Asplund, M., et al. 2012, The Messenger, 147, 25 [NASA ADS] [Google Scholar]
  71. Gilmore, G., Randich, S., Worley, C. C., et al. 2022, A&A, 666, A120 [Google Scholar]
  72. Ginsburg, A., Sipocz, B. M., Brasseur, C. E., et al. 2019, AJ, 157, 98 [Google Scholar]
  73. Ginsburg, A., Sipo?cz, B., Brasseur, C. E., et al. 2024, https://doi.org/10.5281/zenodo.10799414 [Google Scholar]
  74. Gommers, R., Virtanen, P., Haberland, M., et al. 2024, https://doi.org/10.5281/zenodo.10909890 [Google Scholar]
  75. Górski, K. M., Hivon, E., Banday, A. J., et al. 2005, ApJ, 622, 759 [Google Scholar]
  76. Green, G. M., Schlafly, E., Zucker, C., Speagle, J. S., & Finkbeiner, D. 2019, ApJ, 887, 93 [Google Scholar]
  77. Grenon, M. 1987, J. Astrophys. Astron., 8, 123 [Google Scholar]
  78. Grinsztajn, L., Oyallon, E., & Varoquaux, G. 2022, arXiv e-prints [arXiv:2207.08815] [Google Scholar]
  79. Grisel, O., Mueller, A., Lars, et al. 2024, https://doi.org/10.5281/zenodo.11237090 [Google Scholar]
  80. Guiglion, G., Matijevic?, G., Queiroz, A. B. A., et al. 2020, A&A, 644, A168 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  81. Guiglion, G., Nepal, S., Chiappini, C., et al. 2024, A&A, 682, A9 [Google Scholar]
  82. Halle, A., Di Matteo, P., Haywood, M., & Combes, F. 2015, A&A, 578, A58 [Google Scholar]
  83. Harris, W. E. 2010, arXiv e-prints [arXiv:1012.3224] [Google Scholar]
  84. Harris, C. R., Millman, K. J., van der Walt, S. J., et al. 2020, Nature, 585, 357 [Google Scholar]
  85. Hattori, K. 2024, AJ, submitted [arXiv:2404.01269] [Google Scholar]
  86. Hayden, M. R., Sharma, S., Bland-Hawthorn, J., et al. 2022, MNRAS, 517, 5325 [Google Scholar]
  87. He, X.-J., Luo, A. L., & Chen, Y.-Q. 2022, MNRAS, 512, 1710 [Google Scholar]
  88. Hunt, E. L., & Reffert, S. 2023, A&A, 673, A114 [Google Scholar]
  89. Hunter, J. D. 2007, Comput. Sci. Eng., 9, 90 [Google Scholar]
  90. Ivezić, Ž., Connelly, A. J., VanderPlas, J. T., & Gray, A. 2014, Statistics, Data Mining, and Machine Learning in Astronomy [Google Scholar]
  91. Janes, K. A. 1979, ApJS, 39, 135 [Google Scholar]
  92. Jia, Y., Guo, S., Zhu, C., et al. 2023, Res. Astron. Astrophys., 23, 105012 [Google Scholar]
  93. Joshi, Y. C., Deepak, & Malhotra, S. 2024, Front. Astron. Space Sci., 11, 1348321 [Google Scholar]
  94. Keller, S. C., Schmidt, B. P., Bessell, M. S., et al. 2007, PASA, 24, 1 [Google Scholar]
  95. Khoperskov, S., & Gerhard, O. 2022, A&A, 663, A38 [Google Scholar]
  96. Khoperskov, S., Di Matteo, P., Haywood, M., Gómez, A., & Snaith, O. N. 2020, A&A, 638, A144 [Google Scholar]
  97. Klambauer, G., Unterthiner, T., Mayr, A., & Hochreiter, S. 2017, arXiv e-prints [arXiv:1706.02515] [Google Scholar]
  98. Kluyver, T., Ragan-Kelley, B., Pérez, F., et al. 2016, in ELPUB, 87 [Google Scholar]
  99. Lallement, R., Babusiaux, C., Vergely, J. L., et al. 2019, A&A, 625, A135 [Google Scholar]
  100. Lallement, R., Vergely, J. L., Valette, B., et al. 2014, A&A, 561, A91 [Google Scholar]
  101. Lallement, R., Vergely, J. L., Babusiaux, C., & Cox, N. L. J. 2022, A&A, 661, A147 [Google Scholar]
  102. Laroche, A., & Speagle, J. S. 2024, ApJ, submitted [arXiv:2404.07316] [Google Scholar]
  103. Leike, R. H., Glatzle, M., & Enßlin, T. A. 2020, A&A, 639, A138 [Google Scholar]
  104. Li, C., Zhang, Y., Cui, C., et al. 2021, MNRAS, 506, 1651 [Google Scholar]
  105. Li, C., Zhang, Y., Cui, C., et al. 2022a, MNRAS, 509, 2289 [Google Scholar]
  106. Li, H., Aoki, W., Matsuno, T., et al. 2022b, ApJ, 931, 147 [Google Scholar]
  107. Li, J., Wong, K. W. K., Hogg, D. W., Rix, H.-W., & Chandra, V. 2024, ApJS, 272, 2 [NASA ADS] [CrossRef] [Google Scholar]
  108. Lucey, M., Al Kharusi, N., Hawkins, K., et al. 2023, MNRAS, 523, 4049 [Google Scholar]
  109. Luck, R. E. 2018, AJ, 156, 171 [Google Scholar]
  110. Lundberg, S. M. & Lee, S.-I. 2017, in Advances in Neural Information Processing Systems 30, eds. I. Guyon, U. V. Luxburg, S. Bengio, et al. (Curran Associates, Inc.), 4765 [Google Scholar]
  111. Lundberg, S. M., Erion, G., Chen, H., et al. 2020, Nat. Mach. Intell., 2, 2522 [Google Scholar]
  112. Majewski, S. R., Schiavon, R. P., Frinchaboy, P. M., et al. 2017, AJ, 154, 94 [Google Scholar]
  113. Marín-Franch, A., Chueca, S., Moles, M., et al. 2012, SPIE Conf. Ser., 8450, 84503S [Google Scholar]
  114. Monachesi, A., Bell, E. F., Radburn-Smith, D. J., et al. 2016, MNRAS, 457, 1419 [Google Scholar]
  115. Montegriffo, P., De Angeli, F., Andrae, R., et al. 2023, A&A, 674, A3 [Google Scholar]
  116. Nepal, S., Chiappini, C., Guiglion, G., et al. 2024a, A&A, 681, L8 [Google Scholar]
  117. Nepal, S., Chiappini, C., Queiroz, A. B., et al. 2024b, A&A, 688, A167 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  118. Ness, M., Hogg, D. W., Rix, H. W., Ho, A. Y. Q., & Zasowski, G. 2015, ApJ, 808, 16 [Google Scholar]
  119. Pantaleoni González, M., Maíz Apellániz, J., Barbá, R. H., & Reed, B. C. 2021, MNRAS, 504, 2968 [Google Scholar]
  120. Pedregosa, F., Varoquaux, G., Gramfort, A., et al. 2011, J. Mach. Learn. Res., 12, 2825 [Google Scholar]
  121. Perez, F., & Granger, B. E. 2007, Comput. Sci. Eng., 9, 21 [Google Scholar]
  122. Poggio, E., Drimmel, R., Cantat-Gaudin, T., et al. 2021, A&A, 651, A104 [Google Scholar]
  123. Queiroz, A. B. A., Anders, F., Santiago, B. X., et al. 2018, MNRAS, 476, 2556 [Google Scholar]
  124. Queiroz, A. B. A., Anders, F., Chiappini, C., et al. 2020, A&A, 638, A76 [Google Scholar]
  125. Queiroz, A. B. A., Chiappini, C., Perez-Villegas, A., et al. 2021, A&A, 656, A156 [Google Scholar]
  126. Queiroz, A. B. A., Anders, F., Chiappini, C., et al. 2023, A&A, 673, A155 [Google Scholar]
  127. Rezaei Kh., S., Bailer-Jones, C. A. L., Soler, J. D., & Zari, E. 2020, A&A, 643, A151 [EDP Sciences] [Google Scholar]
  128. Rix, H.-W., Chandra, V., Andrae, R., et al. 2022, ApJ, 941, 45 [Google Scholar]
  129. Rix, H.-W., Chandra, V., Zasowski, G., et al. 2024, ApJ, submitted [arXiv:2406.01706] [Google Scholar]
  130. Ruz-Mieres, D. 2022, https://doi.org/10.5281/zenodo.6674521 [Google Scholar]
  131. Rybizki, J., Green, G. M., Rix, H.-W., et al. 2022, MNRAS, 510, 2597 [Google Scholar]
  132. Sale, S. E., & Magorrian, J. 2018, MNRAS, 481, 494 [Google Scholar]
  133. Sen, S., Agarwal, S., Chakraborty, P., & Singh, K. P. 2022, Exp. Astron., 53, 1 [Google Scholar]
  134. Shetty, R., & Ostriker, E. C. 2008, ApJ, 684, 978 [NASA ADS] [CrossRef] [Google Scholar]
  135. Shwartz-Ziv, R., & Armon, A. 2021, arXiv e-prints [arXiv:2106.03253] [Google Scholar]
  136. Soubiran, C., Brouillet, N., & Casamiquela, L. 2022, A&A, 663, A4 [Google Scholar]
  137. Starkenburg, E., Martin, N., Youakim, K., et al. 2017, MNRAS, 471, 2587 [Google Scholar]
  138. Steinmetz, M., Zwitter, T., Siebert, A., et al. 2006, AJ, 132, 1645 [Google Scholar]
  139. Steinmetz, M., Matijevic?, G., Enke, H., et al. 2020, AJ, 160, 82 [NASA ADS] [CrossRef] [Google Scholar]
  140. Suda, T., Katsuta, Y., Yamada, S., et al. 2008, PASJ, 60, 1159 [NASA ADS] [Google Scholar]
  141. The pandas development team, T. 2023, https://doi.org/10.5281/zenodo.10426137 [Google Scholar]
  142. Thomas, G. F., Battaglia, G., Gran, F., et al. 2024, A&A, 690, A54 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  143. Ting, Y.-S., Conroy, C., Rix, H.-W., & Cargile, P. 2019, ApJ, 879, 69 [Google Scholar]
  144. Tolamatti, A., Singh, K. K., & Yadav, K. K. 2023, MNRAS, 523, 5341 [Google Scholar]
  145. Tsantaki, M., Pancino, E., Marrese, P., et al. 2022, A&A, 659, A95 [Google Scholar]
  146. Tunçel Güçtekin, S., Bilir, S., Karaali, S., Plevne, O., & Ak, S. 2019, Adv. Space Res., 63, 1360 [CrossRef] [Google Scholar]
  147. Van Rossum, G. & Drake, F. L. 2009, Python 3 Reference Manual (Scotts Valley, CA: CreateSpace) [Google Scholar]
  148. Vavilova, I., Pakuliak, L., Babyk, I., et al. 2020, in Knowledge Discovery in Big Data from Astronomy and Earth Observation, eds. P. Škoda, & F. Adam, 57 [Google Scholar]
  149. Vergely, J. L., Lallement, R., & Cox, N. L. J. 2022, A&A, 664, A174 [Google Scholar]
  150. Vickers, J. J., Li, Z.-Y., Smith, M. C., & Shen, J. 2021, ApJ, 912, 32 [NASA ADS] [CrossRef] [Google Scholar]
  151. Virtanen, P., Gommers, R., Oliphant, T. E., et al. 2020, Nature Methods, 17, 261 [Google Scholar]
  152. Wagg, T., & Broekgaarden, F. 2024a, The Software Citation Station [Google Scholar]
  153. Wagg, T., & Broekgaarden, F. S. 2024b, arXiv e-prints [arXiv:2406.04405] [Google Scholar]
  154. Waskom, M. L. 2021, J. Open Source Softw., 6, 3021 [Google Scholar]
  155. Weiler, M., Carrasco, J. M., Fabricius, C., & Jordi, C. 2023, A&A, 671, A52 [Google Scholar]
  156. Wes McKinney. 2010, in Proceedings of the 9th Python in Science Conference, eds. S. van der Walt, & J. Millman, 56 [Google Scholar]
  157. Whitten, D. D., Placco, V. M., Beers, T. C., et al. 2019, A&A, 622, A182 [Google Scholar]
  158. Witten, C. E. C., Aguado, D. S., Sanders, J. L., et al. 2022, MNRAS, 516, 3254 [Google Scholar]
  159. Xiang, M., Rix, H.-W., Ting, Y.-S., et al. 2022, A&A, 662, A66 [Google Scholar]
  160. Xu, X.-j., Shao, Y., & Li, X.-D. 2024, ApJ, 962, 126 [NASA ADS] [CrossRef] [Google Scholar]
  161. Xylakis-Dornbusch, T., Christlieb, N., Lind, K., & Nordlander, T. 2022, A&A, 666, A58 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  162. Xylakis-Dornbusch, T., Christlieb, N., Hansen, T. T., et al. 2024, A&A, 687, A177 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  163. Yanny, B., Rockosi, C., Newberg, H. J., et al. 2009, AJ, 137, 4377 [Google Scholar]
  164. Yao, Y., Ji, A. P., Koposov, S. E., & Limberg, G. 2024, MNRAS, 527, 10937 [Google Scholar]
  165. Yi, Z., Chen, Z., Pan, J., et al. 2019, ApJ, 887, 241 [NASA ADS] [CrossRef] [Google Scholar]
  166. Yong, D., Da Costa, G. S., Bessell, M. S., et al. 2021, MNRAS, 507, 4102 [Google Scholar]
  167. Youakim, K., Starkenburg, E., Aguado, D. S., et al. 2017, MNRAS, 472, 2963 [Google Scholar]
  168. Zari, E., Rix, H. W., Frankel, N., et al. 2021, A&A, 650, A112 [Google Scholar]
  169. Zhang, X., Green, G. M., & Rix, H.-W. 2023, MNRAS, 524, 1855 [Google Scholar]
  170. Zoccali, M. 2019, Bol. Asoc. Argentina Astron. Plata Argentina, 61, 137 [Google Scholar]
  171. Zonca, A., Singer, L., Lenz, D., et al. 2019, J. Open Source Softw., 4, 1298 [Google Scholar]
  172. Zonca, A., Singer, L., crosset, et al. 2024, https://doi.org/10.5281/zenodo.11337740 [Google Scholar]

1

In principle, it is possible to use narrow- or intermediate-band photometry derived from the Gaia XP spectra to include their information content – however, this transformation adds additional systematic uncertainty, and in the case of StarHorse, the execution time scales with the number of photometric observations included in the inference.

2

Primarily taken from the 2021 version of the SAGA database (http://sagadatabase.jp/?p=43; Suda et al. 2008), selecting Milky-Way stars observed with spectral resolution R > 5000, and adding stars from Yong et al. (2021) and Li et al. (2022b).

All Tables

Table 1

Cleaning conditions applied to each of the training labels.

Table 2

Stellar-parameter catalogues produced by the StarHorse group over the past years.

Table B.1

Data model of the Gaia DR3 SHBoost catalogue.

All Figures

thumbnail Fig. 1

Distance- and extinction-corrected Gaia DR3 CMDs for each of the spectroscopic stellar surveys used in the training and test data for this work. The bottom right panel shows the joint dataset. Note: not all data points were used in the training and testing of the xgboost models; duplicates were removed and different quality cuts applied for each training label.

In the text
thumbnail Fig. 2

Performance of the xgboost models for the test datasets for each of the training labels. In the top row, we show the SHBoost (mean) parameters predicted from Gaia DR3, 2MASS, and AllWISE against the spectroscopic values (test labels). The middle row shows the residuals (predicted: ‘true’). The bottom row shows the formal uncertainties (derived with xgboost-distributions). Each panel contains logarithmic density plots of the full sample of 217 million stars. The lines and shaded regions in the middle and bottom rows show the running median and 1σ quantiles, respectively.

In the text
thumbnail Fig. 3

Uncertainty distributions. Top row: logarithmic histograms of the uncertainties for the training (yellow), test (green), and full Gaia DR3 XP (blue) samples. The median uncertainties are indicated in each panel. Second row: uncertainties as a function of Gaia G magnitude. Bottom row: Uncertainties as a function of distance. In the second and third row, the white lines and shaded bands show the median trends and corresponding 1σ quantiles, respectively.

In the text
thumbnail Fig. 4

Feature importance plots for each output parameter determined with SHapley Additive exPlanation values (SHAP; Lundberg & Lee 2017). In each panel, the SHAP values for 20 000 random stars are aggregated. The rows in each panel are ordered by decreasing feature importance (from top to bottom). Only the 20 most important columns are shown; the abscissa ranges are cut to 99.8% of the SHAP value range for better visibility.

In the text
thumbnail Fig. 5

Unfiltered Kiel diagrams of the full Gaia DR3 XP sample, in four broad bins of observed G magnitude.

In the text
thumbnail Fig. 6

Extinction-corrected colour-magnitude diagrams. Top row: Full Gaia DR3 XP sample (before filtering results with large uncertainties). Top left: 2D histogram. Top middle: colour-coded by median metallicity per bin. Top right: Colour-coded by median metallicity uncertainty per bin. Middle row: Same as top row, but flag-cleaned (using xgb_av_outputflag, xgb_logteff_outputflag, and xgb_met_outputflag). Bottom row: white-dwarf portion of the colour-magnitude diagram (filtered by log g > 7). Bottom left: 2D histogram. Bottom middle: mass colour-coded by median mass. Bottom right: mass colour-coded by mass uncertainty.

In the text
thumbnail Fig. 7

Large-scale Cartesian map (box size: 100 × 100 kpc2) of the Gaia DR3 XP sample. Left panel: density map. Right panel: median metallicity map of the same volume (integrating over all ZGal), using all 204 million stars with σ[M/H] < 0.3 dex and only showing bins containing more than 3 stars. Some salient features of the map are annotated.

In the text
thumbnail Fig. 8

Distribution of red-clump stars with Gaia DR3 XP spectra in the Galactic disc. Top panel: Density distribution in cylindrical Galactocentric coordinates. Bottom panel: Median metallicity map of 7.5 million red-clump stars with σ[M/H] < 0.2 dex.

In the text
thumbnail Fig. 9

Spatial distribution of a Gaia DR3 XP sample of 376 321 B star candidates (selected as 4.0 < xgbdist_logteff_mean < 4.5 & xgbdist_logg_mean < 6 &xgbdist_logteff_std < 0.1 & xgbdist_logg_std < 0.5) . Known over-densities corresponding to OB associations are annotated. The dotted ellipses correspond to (potential) star formation voids.

In the text
thumbnail Fig. 10

Differential ∆AV extinction maps for the Orion region (30 deg × 30 deg, d < 1.5 kpc; left) in four bins of distance, as indicated in each panel. Some artefacts from the Gaia scanning law are visible in these maps, but the distance-sliced AV maps are generally complete down to AV ≳ 6 mag, with a high resolution, and correspond very well to existing 3D dust maps.

In the text
thumbnail Fig. 11

ZGal vs. RGal map of low-metallicity candidate stars (defined as stars that have xgbdist_met_mean < −1 and xgb_met_outputflag== “0”). There are 766k stars in the region covered by the map (943k in total). The white dashed ellipse marks the typical extent of a classical bulge region.

In the text
thumbnail Fig. 12

Galactic maps of super-metal-rich stars (xgbdist_met > +0.5). The lower panel shows a top-down view of the Galaxy, marking the solar position (dotted lines), the approximate extent of the Galactic bar (ellipse), and the bulge and bar region studied by Queiroz et al. (2021).

In the text
thumbnail Fig. 13

Guiding-radius distributions of the XP sample with radial velocities from Gaia RVS, in bins of xgbdist_met. The curves of SMR stars are highlighted as thicker lines. The dotted vertical line at Rguide highlights the point after which the density of metal-rich stars reaches a floor, which might possibly be related to the outer Lindblad resonance of the Galactic bar.

In the text
thumbnail Fig. A.1

Comparison to the stellar-parameter catalogue of Guiglion et al. (2024), based on Gaia DR3 data (especially RVS spectra, but also XP spectra, parallaxes, and broad-band photometry). Their catalogue was obtained with a convolutional neural network; the labels were trained on APOGEE DR17.

In the text
thumbnail Fig. A.2

Comparison of the SHBoost parameters with the StarHorse Gaia EDR3 parameters (Anders et al. 2022). Top panels: One-to-one comparisons of effective temperature, surface gravity, metallicity, and extinction (from left to right). Bottom panels: Residuals.

In the text
thumbnail Fig. A.3

Comparison to the stellarparameter catalogue of Andrae et al. (2023b), trained mostly on APOGEE DR17.

In the text
thumbnail Fig. C.1

[Fe/H] calibration based on members of open and globular clusters. Top left panel: One-to-one comparison of the xgb_met values for open and globular cluster members with spectroscopic [Fe/H] measurements from the literature (using Joshi et al. 2024 for open clusters and Harris 2010 for globular clusters). Second row: residuals between xgb_met and literature [Fe/H] (left panel) and feh_calibrated and literature [Fe/H], showing the improvement achieved by our proposed calibration. Top right panel: comparison of the three metallicity distributions (literature, xgb_met, and feh_calibrated) for the cluster sample.

In the text
thumbnail Fig. C.2

Effect of the [Fe/H] calibration for a few example open and globular clusters. Each panel shows the observed Gaia DR3 colour-magnitude diagrams (G vs GGRP), colour-coded by either xgb_met (left panels) or feh_calibrated (right panels). The colour code limits are different in each row to highlight the differences with respect to the respective literature [Fe/H]. Top row: The solar-metallicity open cluster M67 (NGC 2682). Middle row: The metal-poor open cluster NGC 2158. Bottom row: Bulge globular cluster NGC 6752.

In the text

Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.

Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.

Initial download of the metrics may take a while.