Open Access
Issue
A&A
Volume 662, June 2022
Article Number A125
Number of page(s) 22
Section Catalogs and data
DOI https://doi.org/10.1051/0004-6361/202141828
Published online 30 June 2022

© M. Fouesneau et al. 2022

Licence: Creative Commons. Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Open Access funding provided by Max Planck Society.

1. Introduction

Understanding the structure, formation, and evolution of our Galaxy requires a detailed study of its stars from the points of view of both their dynamics and their physical properties. The Gaia satellite’s main technological advance is the accurate determination of parallaxes and proper motions for over one billion stars (Gaia Collaboration 2016). But without knowledge of the stellar properties, the resulting three-dimensional maps and velocity distributions one can derive from these are of limited value to our understanding of our Galaxy. With kinematics and stellar properties, one can find members of streams and stellar populations to model the formation of the Galactic disk (e.g., Mo et al. 1998; Rix & Bovy 2013; Ibata et al. 2019). Gaia is therefore equipped with two low-resolution blue and red photometers (BP and RP) and a high-resolution Radial Velocity Spectrometer (RVS) instrument. These instruments operate over the entire optical range and from 845−872 nm, respectively (see Gaia Collaboration 2016 for the payload description).

The second Gaia data release (hereafter Gaia DR2; Gaia Collaboration 2018) contained 1.33 billion sources with positions, parallaxes, proper motions, and G-band photometry based on 22 months of mission observations. Gaia DR2 also includes the integrated G, BP, and RP fluxes. Andrae et al. (2018) used just data from Gaia DR2 to infer stellar parameters. They provided the first inference of effective temperatures, radii, and luminosities for 160 million stars and line-of-sight extinctions for 88 million of them. This exercise demonstrated that three broad Gaia passbands (G, BP, and RP) contain relatively little information that would enable one to discriminate between effective temperature and interstellar extinction (see Andrae et al. 2018 for details).

Deriving the astrophysical parameters (APs) of individual stars from their integrated photometry is a challenging project. A single set of multiwavelength photometry only contains limited amounts of information: whether limited by signal-to-noise or physical stellar degeneracies, one needs additional measurements and assumptions to distinguish details of apparently similar stars. Previous work and AP catalogs of Gaia sources are available in, for example, McDonald et al. (2017), Stevens et al. (2017), Mints & Hekker (2017, 2018), Bayestar (Green et al. 2019), and StarHorse (Queiroz et al. 2018; Anders et al. 2019). The growing number of these catalogs originates from the difficulty of identifying the perfect set of assumptions. Some may rely on open questions, such as the three-dimensional geometry of the Milky Way or the location of the Galactic Bar. Although the catalogs globally agree, they differ in the details due to diverse and subtle systematic errors. For instance, Bailer-Jones et al. (2018) discussed the significant effects of various prior distributions on estimating distances from parallaxes. But, of course, these effects are more complex when inferring the temperature, gravity, chemical pattern, and other APs simultaneously, as these estimates are conditionally dependent on one another and the distance to that star.

The more stringent the prior assumptions in an analysis, the greater the tensions between the models and the data (i.e., model discrepancy), since models are not perfect representations of the real stars (e.g., Kennedy & O’Hagan 2001; see footnote 1). Strains require compromises, which often lead to incorrect or incomplete results (e.g., significant uncertainties or biases). When results replicate our priors, we may wonder if we have learned anything from the data. It is often hard to recognize false results in Gaia-scale catalogs (millions to billions of sources). Our approach in this present work is to relax the commonly used strong assumptions. We do not assume the Milky Way three-dimensional stellar population structures, for instance. We analyze the stars for which Gaia DR2 provides us with Gaia integrated photometry (Riello et al. 2018; Evans et al. 2018) and where all external photometry is available from 2MASS (the Two Micron All-Sky Survey; Skrutskie et al. 2006; Cutri et al. 2003) and AllWISE (extension of the Wide-field Infrared Survey Explorer; Cutri et al. 2014). Where available, we also make use of Gaia DR2 parallaxes (Lindegren et al. 2018). This represents about 120 million stars. We first describe the construction of the photometric input catalog in Sect. 2. In Sect. 3 we detail the modeling of the spectral energy distribution (SED) of stars and the fitting procedure we use to infer stellar parameters (age, mass, temperature, luminosity, gravity, distance, and dust extinction). In Sect. 4, we validate our results against other published results (e.g., benchmark stars and interferometry), and we discuss our results in Sect. 5 before summarizing in Sect. 6.

2. Crossmatched photometric catalog

In the context of determining the APs of stars, one can achieve more reliable estimates by combining data from various surveys than those from an individual one. One could combine multiple photometric surveys, multiple spectroscopic surveys, or both (e.g., Serenelli et al. 2013; Wang et al. 2016; Santiago et al. 2016; McMillan et al. 2018). Stellar parameter estimates will then be more accurate and precise than those derived from any individual survey itself as long as the models are consistent with the data. However, one must be careful since systematics may arise if the data are not fully compatible, for example, source mismatch, selection functions of the surveys, measurement calibration, or contamination.

We constructed a crossmatch catalog by considering the “obvious” all-sky non-Gaia data provided by DPAC (see footnote 2) that span a different wavelength range, that is, 2MASS (Skrutskie et al. 2006; Cutri et al. 2003) and AllWISE (Cutri et al. 2014). For the latter, WISE provides us with four photometric bands spanning wavelengths from 1 to 30 μm. Because interstellar dust emits at those wavelengths and contaminates the W3 and W4 photometry, we only included W1 and W2 in our analysis. We initially considered including the Galaxy Evolution Explorer (GALEX) data (Bianchi et al. 2011) in our analysis. As shown in Fig. 1, GALEX would offer an essential advantage for analyzing blue and hot stars and would help us infer dust properties and chemical patterns (e.g., Kaviraj et al. 2007). However, DPAC did not provide us with a crossmatch between Gaia and GALEX. Furthermore, the GALEX catalog possesses complex photometric systematics and a broad point spread function. As a result, constructing a reliable crossmatch is a significant endeavor on its own, for instance, involving forced photometry approaches as in Lang (2014) and Meisner et al. (2017). Our selected photometric bands cover the wavelength range from ∼300 nm to ∼5 μm. Special attention went into the passbands and the various photometric/magnitude systems present across the different surveys (see Sect. 3 for details).

Fig. 1.

Photometric filters covering the various compiled data sets (details in Sect. 2) compared with spectra of typical stars: Vega (A0V), a G2V star (Sun-like star), and an M5III star. Bottom panel: Gaia DR2 transmissions of the three Gaia passbands and the selected all-sky survey ones (2MASS and WISE). Top panel: selected spectral templates from Pickles (1998), and we overlaid the Fitzpatrick (1999) dust extinction curve and its variations with R0 for reference. We include GALEX for reference, but we did not include the survey in this work (see Sect. 2).

Typical reported photometric uncertainties for Gaia, 2MASS, and AllWISE are listed in Table 1. At first glance, it may look plausible to have uncertainties of ∼0.1 mag (i.e., a few percent in flux measurements). However, a significant fraction of sources has reported uncertainties lower than 0.05 mag, which is not reasonable if one accounts for external calibration uncertainties (e.g., knowledge of the passbands). When combining photometric surveys, reported uncertainties may not account for all the limits of the calibration. For our analysis, we imposed a floor of 0.05 mag on the uncertainties, that is, we adopted max(σ, 0.05) (see Table 1).
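As a minimal illustration of the uncertainty floor max(σ, 0.05), the sketch below applies it to a set of made-up reported uncertainties. The function names and the input values are hypothetical and are not taken from the catalog; the conversion to a fractional flux error is the usual first-order relation.

```python
import math

def floor_uncertainty(sigma_mag, floor_mag=0.05):
    """Adopted uncertainty max(sigma, floor), in magnitudes."""
    return max(sigma_mag, floor_mag)

def fractional_flux_error(sigma_mag):
    """First-order fractional flux error for a magnitude error: 0.4 ln(10) sigma."""
    return 0.4 * math.log(10.0) * sigma_mag

# Hypothetical reported uncertainties (mag) in the eight bands of one source.
reported = [0.002, 0.010, 0.008, 0.030, 0.060, 0.045, 0.025, 0.110]
adopted = [floor_uncertainty(s) for s in reported]

# Fractional flux error that corresponds to the 0.05 mag floor (about 4.6 percent).
floor_flux_fraction = fractional_flux_error(0.05)
```

Values already above the floor are left unchanged, so the floor only affects the sources with the most optimistic reported uncertainties.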

Table 1.

Typical uncertainties in magnitudes of the (raw) input photometric catalogs: minimum and a few quantiles.

The Gaia DR2 photometry and parallaxes are also not free from systematic errors (Lindegren 2018; Maíz Apellániz & Weiler 2018; Zinn et al. 2019). Various studies (including the DPAC release papers) detail prescriptions to apply to the Gaia data to account in part for these systematics. In the present work, we applied the calibration rules summarized in Table 2. The recalibrated values are also reported in our catalog (see Sect. 5.1).

Table 2.

Prescriptions applied to the input data and references.

We also emphasize that here we consider crossmatch catalogs with very different astrometric accuracies. We are merging information from surveys with varying coverages of wavelengths and magnitude limits. Therefore, where we obtain incorrect matches between surveys, sources will inevitably have corrupted SEDs despite the careful compilation from Marrese et al. (2019). This issue will occur most often in dense stellar regions (e.g., clusters) and regions with high dust content (bulge, star-forming regions). In Sect. 3 we describe how we rendered our model more flexible to account for (additional) unknown systematics.

In Sect. 5.1, we describe the catalog content. The final input catalog contains a total of 123 076 271 sources with Gaia parallaxes and all eight photometric bands from Gaia, 2MASS, and AllWISE.

Figure 2 shows the completeness of our catalog with respect to Gaia DR2. We can see that regions of high star density are the most incomplete because the spatial resolution of the 2MASS and AllWISE surveys is substantially lower than that of Gaia. While including non-Gaia data in the analysis will certainly help improve the results, non-Gaia data are not available for all the stars. One may hope that forced photometry catalogs of 2MASS and AllWISE using Gaia (and possibly GALEX) will become available in the future. Nevertheless, dedicated Gaia-only studies such as Andrae et al. (2018) have merit in their own right. They analyze self-consistent data, report their limitations, and set a baseline for further analysis.

Fig. 2.

Completeness of our AP catalog with respect to all Gaia DR2 sources. Top panel: completeness and the catalog source count as a function of G magnitude in blue and orange, respectively. Middle and lower panels: completeness over the sky (in Mollweide projection and Galactic coordinates) for the magnitude ranges of the high-completeness and most-sources samples, respectively, as indicated in the top panel with the solid and dashed vertical lines, i.e., for sources with 9 < G < 13 (15 < G < 19).

3. Dust-attenuated stellar models and assumptions

Various authors have described approaches to extract stellar properties using models trained on empirical and synthetic SED data (e.g., support vector machine, forward-modeling, neural-networks). We do not need to reinvestigate these methods. Instead, we first briefly detail our adopted models and our fitting procedure.

We define a model that predicts the dust-attenuated magnitudes in the same photometric bands as our data, {Mk} with k ∈ [1…K], given a set of APs θ that fully defines a star for our model. This set first includes the atmosphere parameters (log10(Teff), log10(L/L⊙), [Fe/H]), that is, temperature, luminosity, and metallicity, as well as the age, log10(age/yr), to uniquely map the evolution stage. One could rightfully argue that age, (initial) mass, and metallicity should be sufficient to set the stellar SED uniquely (e.g., stellar tracks, isochrones). However, an (age, mass, [Fe/H]) model grid requires the sampling density to vary over multiple orders of magnitude to represent stellar evolution fairly. The choice of (age, mass, [Fe/H]) therefore generates a suboptimal grid structure to support our accurate interpolation scheme (e.g., Andrae et al., in prep.). Instead, we adopt the atmosphere parameters, which translate directly into the SEDs and spread the grid points more evenly, but at the expense of an additional intrinsic dimension to uniquely tie the grid points to the stellar evolution. However, we eventually marginalize over the metallicity dimension. We discuss this point in Sect. 4.7. Additionally, we account for the effect of the interstellar medium (ISM) on the observed spectrum of the star through two extinction parameters, (A0, R0). Afterward, we can compute the final photometry with the eight passbands mentioned above (see footnote 3). Our parameters additionally include the distance modulus to the star, μ, and our noise model jitter, log10η.

We detail the various ingredients of this model below along with the prior assumptions we use in our inference.

Evolution models. For this work, we generated an extensive collection of models by combining the PARSEC isochrones (PARSEC 1.2s + Colibri (PR16), Chen et al. 2014; Marigo et al. 2013; Rosenfield et al. 2016) and the ATLAS9 atmosphere models (Castelli & Kurucz 2003, 3500 < Teff < 50 000). We had to recompute the spectra to apply the dust prescription detailed below adequately. We did not include non-local thermal equilibrium (non-LTE) models in this analysis for two reasons: (i) adopting the same atmospheric models allowed us to directly validate our models against the PARSEC predictions and (ii) LTE and non-LTE models differ overall by only ∼0.01 mag in integrated photometry across various wavelength ranges (e.g., Young & Short 2017). We did not include pre-main-sequence, post thermally pulsating asymptotic giant branch (AGB), or white-dwarf stars in our model grid. The first two evolution phases are intrinsically rare, and the ultraviolet and near-infrared predictions are very model-dependent for all three evolution stages. The final set of evolution grid points covers log(A/yr) from 6.6 to 10.2 dex (0.2 dex steps) and [M/H] from −2 to 0.2 dex (0.2 dex steps). The isochrones define the sampling along the mass dimension to represent all evolution stages regardless of their durations.

Dust prescription. We must account for the fact that photons travel from the star to the observer through the ISM. Dust along the line of sight will absorb or scatter some photons. As a result, the star’s light will be dimmer (similarly to a distance effect) and redder than its intrinsic color. We adopted a model of dust extinction following Fitzpatrick (1999). This prescription depends on A0 and R0, which are, respectively, the dust extinction at λ = 550 nm and the average extinction per unit color excess (R0 = 3.1 is the average value in their model for the Milky Way; R0 is also referred to as the average grain size parameter). The literature also refers to these parameters as AV and RV. We adopted the notations from the Gaia Consortium, which aim at avoiding confusion with the effects on integrated photometry. We applied the dust extinction to the nominal spectra for every point from the evolution model while varying the A0 and R0 values. Our grid spans A0 from 0 to 20 mag (0.2 mag steps) and R0 from 2.5 to 3.7 (0.3 steps). Only then did we compute the photometry of the individual spectra through the relevant passbands. As discussed in detail in, for example, Gordon et al. (2016), one must apply the dust before the filters to avoid nonlinear effects.
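The need to attenuate the spectrum before integrating through a passband can be illustrated with a toy calculation. The sketch below is not the prescription used in this work: it assumes blackbody spectra, a hypothetical top-hat band from 400 to 1000 nm, and a toy extinction law A(λ) = A0 (550 nm/λ) in place of the Fitzpatrick (1999) curve. All function names are illustrative.

```python
import math

def blackbody(wl_nm, teff):
    """Planck function B_lambda (arbitrary normalization) for Teff in K."""
    x = 1.438776877e7 / (wl_nm * teff)  # hc / (lambda k T), lambda in nm
    return 1.0 / (wl_nm**5 * math.expm1(x))

def toy_extinction(wl_nm, a0):
    """Toy law A(lambda) = A0 * (550 nm / lambda); NOT Fitzpatrick (1999)."""
    return a0 * 550.0 / wl_nm

def band_extinction(teff, a0, wl_min=400.0, wl_max=1000.0, n=601):
    """Extinction (mag) in a top-hat band, attenuating the spectrum first."""
    step = (wl_max - wl_min) / (n - 1)
    clean = dusty = 0.0
    for i in range(n):
        wl = wl_min + i * step
        flux = blackbody(wl, teff)
        clean += flux
        dusty += flux * 10.0 ** (-0.4 * toy_extinction(wl, a0))
    return -2.5 * math.log10(dusty / clean)

# Same A0, different stars: the band extinction depends on the spectrum.
a_cool = band_extinction(4000.0, 3.0)
a_hot = band_extinction(20000.0, 3.0)

# Same star, different A0: the band extinction is not proportional to A0.
ratio_low = band_extinction(4000.0, 1.0) / 1.0
ratio_high = band_extinction(4000.0, 6.0) / 6.0
```

In this toy setup the hot star suffers more extinction in the band than the cool star for the same A0, and the ratio of band extinction to A0 decreases as A0 grows. Both effects are lost if a single extinction coefficient per band is applied after the filter integration.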

Model interpolation scheme. For our application of fitting the integrated magnitudes with model predictions computed from synthetic spectra, we must keep in mind that most of the computation cost comes from the generation of the models. As we also approach this problem with Markov chain Monte Carlo (MCMC) sampling, we need to generate the models on the fly for every source at a given θ. The computation cost of a traditional interpolation scheme (e.g., multi-linear) rapidly becomes prohibitive, even for a single source. Instead, our solution resides in “emulating” our stellar models: we replace our grids of SED models with an accurate representation that is fast to evaluate. To do so, we adopted a multivariate adaptive regression with splines (MARS) approach (Friedman 1991; see footnote 4). This approach is very similar to a simplified neural net with rectified linear unit (ReLU) activation functions. However, there are differences from a neural network approach. First, the model construction uses simple principles: making a local regression model in the stellar model space; each bin in the parameter space has a set of equations to preserve continuity between the bins. Second, the model is rigid by limiting the use of arbitrary functions; thus, overfitting is less of a concern in this approach, and we obtain smooth predictions. Finally, the model’s interpretability allows us to rapidly validate the model locally and globally; the final model is a series of analytic expressions that are extremely fast to evaluate. MARS models are essentially analytic spline equations, which are much faster to evaluate than real-time grid interpolation. The speed gain is significant compared with multidimensional interpolations or support vector machines, for instance.
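To make the analogy with ReLU units concrete, the following sketch evaluates a MARS-like model built from hinge functions max(0, ±(x − knot)) and their products. It only illustrates the functional form described by Friedman (1991); the two inputs, the knots, and the coefficients are invented for the example and do not correspond to the emulator trained for this work.

```python
def hinge(x, knot, sign=1):
    """MARS basis function max(0, sign * (x - knot)); same form as a ReLU."""
    return max(0.0, sign * (x - knot))

def mars_predict(x, intercept, terms):
    """Evaluate intercept + sum_i coef_i * prod_j hinge(x[var_j], knot_j, sign_j)."""
    total = intercept
    for coef, factors in terms:
        product = 1.0
        for var, knot, sign in factors:
            product *= hinge(x[var], knot, sign)
        total += coef * product
    return total

# Hypothetical model for one band magnitude with inputs x = (log10 Teff, A0).
# Each term is (coefficient, [(input index, knot, sign), ...]).
toy_terms = [
    (-2.0, [(0, 3.7, +1)]),                # slope above the knot in log10 Teff
    (+1.5, [(0, 3.7, -1)]),                # slope below the same knot
    (+0.8, [(1, 0.0, +1)]),                # linear response to extinction
    (-0.1, [(0, 3.7, +1), (1, 0.0, +1)]),  # interaction term (product of hinges)
]

toy_magnitude = mars_predict((3.9, 1.0), intercept=5.0, terms=toy_terms)
```

Evaluating such a model costs a handful of comparisons and multiplications per term, which is what makes this family of emulators attractive inside an MCMC loop.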

Noise model and bad SEDs. Corrupted SEDs and systematics can affect the photometric measurements (Sect. 2). We include in our likelihood a photometric jitter noise model that allows us to inflate the uncertainties on a source-by-source basis. More specifically, we define for each star a quantity η, common to all the bands, that we add in quadrature to all of our photometric uncertainties (on top of the prescriptions in Table 2). For the band k, the probability density of the observed data given the parameters, that is, the likelihood, is

$$ \mathcal{P}\left(m_k \mid m, \sigma_k, \eta\right) = \frac{1}{\sqrt{2\pi (\sigma_k^2 + \eta^2)}} \exp\left(-\frac{1}{2}\frac{(m_k - m)^2}{\sigma_k^2 + \eta^2}\right), $$ (1)

where (mk, σk) are the magnitude measurements, and m is the true (unknown) magnitude value. The normalization of the Gaussian is the key that ensures η cannot arbitrarily increase: an optimization procedure will compromise between the exponential and the term in front of it.
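The compromise between the exponential and the normalization term can be checked numerically for a single band. The sketch below uses hypothetical values for the residual and the reported uncertainty, and scans η on a grid up to 0.3 mag. For one band, minimizing −2 ln P with respect to η gives η² = max(0, (mk − m)² − σk²), which the grid search should reproduce.

```python
import math

def neg2_log_like_band(m_obs, m_true, sigma, eta):
    """-2 ln of the Gaussian density with variance sigma^2 + eta^2."""
    var = sigma**2 + eta**2
    return math.log(2.0 * math.pi * var) + (m_obs - m_true) ** 2 / var

sigma = 0.05      # hypothetical reported uncertainty (mag)
residual = 0.15   # hypothetical data-minus-model offset (mag)

# Grid of jitter values from 0 to 0.3 mag in steps of 0.001 mag.
etas = [i * 0.001 for i in range(301)]
best_eta = min(etas, key=lambda e: neg2_log_like_band(residual, 0.0, sigma, e))

# Single-band analytic optimum: eta^2 = max(0, residual^2 - sigma^2).
analytic_eta = math.sqrt(max(0.0, residual**2 - sigma**2))
```

When the residual is smaller than the reported uncertainty, the same search returns η = 0: the logarithmic term penalizes any inflation that the data do not require.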

The jitter term captures issues in our data assuming these are random effects on the magnitudes, in contrast with systematic ones coming from incorrect zero points. Photometric or parallax offsets are fully degenerate with distance, and there is not much we can do in our star-by-star inference, unfortunately.

We did not consider a jitter per survey or even per band. Preliminary tests showed that the individual survey jitter parameters attempt to equalize the signal to noise of each survey, especially affecting the Gaia bands. While this could help in capturing more systematics, it increases the fitting complexity with highly nonlinear terms and reduces the gain from high precision photometry.

Likelihood per source. For every source, we considered the set of the K = 8 photometric measurements {mk}k ∈ [1…8] and the parallax ϖ (all with Gaussian uncertainties σx). Given the model above, the log-likelihood of a star for a set of APs θ, a distance modulus μ, and a jitter η is

$$ \begin{aligned} -2 \ln \mathcal{P}\left(\left\{ m_k\right\}_{k\in [1\ldots 8]}, \varpi \mid \boldsymbol{\theta}, \eta\right) &= \ln\left(2\pi \sigma_\varpi^2\right) + \frac{1}{\sigma_\varpi^2}\left(\frac{\varpi}{1\,\mathrm{mas}} - \frac{1\,\mathrm{kpc}}{r}\right)^2 \\ &\quad + \sum_{k=1}^{K}\left[\ln\left(2\pi (\sigma_k^2+\eta^2)\right) + \frac{\left(m_k - M_k(\boldsymbol{\theta}) - \mu\right)^2}{\sigma_k^2 + \eta^2}\right]. \end{aligned} $$ (2)
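A direct transcription of this expression could look like the sketch below. It assumes the standard definition μ = 5 log10(r / 10 pc) to link the distance r to the distance modulus, takes the model magnitudes Mk(θ) as a precomputed list, and uses hypothetical input values. The function and argument names are illustrative only.

```python
import math

def modulus_to_kpc(mu):
    """Distance in kpc from the standard definition mu = 5 log10(r / 10 pc)."""
    return 10.0 ** (mu / 5.0 - 2.0)

def neg2_log_like(mags, sigmas, model_mags, mu, eta, plx_mas, plx_err_mas):
    """-2 ln P for one source: parallax term plus the sum over the bands."""
    r_kpc = modulus_to_kpc(mu)
    total = math.log(2.0 * math.pi * plx_err_mas**2)
    total += (plx_mas - 1.0 / r_kpc) ** 2 / plx_err_mas**2
    for m_k, sigma_k, model_k in zip(mags, sigmas, model_mags):
        var = sigma_k**2 + eta**2
        total += math.log(2.0 * math.pi * var)
        total += (m_k - model_k - mu) ** 2 / var
    return total

# Hypothetical star at 1 kpc (mu = 10 mag, parallax = 1 mas) with noise-free data.
model_mags = [4.8, 5.1, 4.4, 3.6, 3.3, 3.2, 3.2, 3.2]  # made-up M_k(theta)
mu_true = 10.0
mags = [big_m + mu_true for big_m in model_mags]
sigmas = [0.05] * 8

value_at_truth = neg2_log_like(mags, sigmas, model_mags, mu_true, 0.0, 1.0, 0.1)
value_off = neg2_log_like(mags, sigmas, model_mags, mu_true + 0.2, 0.0, 1.0, 0.1)
```

With noise-free data, only the logarithmic normalization terms remain at the true parameters, and any shift in μ increases both the photometric and the parallax contributions.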

Priors. The more stringent the prior assumptions in an analysis, the more tensions there will be between the models and the data. In this present work, we avoid detailed assumptions such as the Milky Way three-dimensional stellar population structures. Using less informed priors should not alter the results when the data are “good” – that is, when the (multidimensional) likelihood is sufficiently narrow and the photometry agrees with the parallax. When the likelihood distribution is broader, weak priors do not impose our knowledge on the data, allowing for potential discoveries.

Table 3 summarizes our prior assumptions. We detail these choices below.

Table 3.

Model priors.

First, the dust extinction prior allows for substantial values A0 (up to 20 mag). We also allow slightly negative values to avoid edge effects with the MCMC sampling. R0 is very likely to vary with environment (e.g., Cardelli et al. 1989; Bianchi et al. 1996; Valencic et al. 2004; Gordon et al. 2009) but there is no model of these variations yet. We assume variations to be around the average value of R0 = 3.1 from Fitzpatrick (1999) with a standard deviation of 0.2 (which includes the 2.7 mentioned in the cited paper).

Second, our distance prior is uniform in distance modulus over [−5, 19] mag, which corresponds to distances from 1 pc to roughly 70 kpc. The upper distance limit allows us to include the Magellanic Clouds (∼40 − 50 kpc) and leaves some space to account for outliers and spurious parallaxes.
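The correspondence between the distance modulus bounds and physical distances follows from the standard definition μ = 5 log10(r / 10 pc). The short sketch below evaluates the two bounds and the density in distance implied by a prior that is uniform in μ; the helper names are illustrative. With this definition the upper bound evaluates to about 63 kpc, of the same order as the value quoted above.

```python
import math

def modulus_to_pc(mu):
    """Distance in pc from mu = 5 log10(r / 10 pc)."""
    return 10.0 ** (mu / 5.0 + 1.0)

def pc_to_modulus(r_pc):
    """Distance modulus from a distance in pc."""
    return 5.0 * math.log10(r_pc) - 5.0

def prior_density_in_distance(r_pc, mu_lo=-5.0, mu_hi=19.0):
    """Density in r implied by a uniform prior in mu: proportional to 1 / r."""
    if not modulus_to_pc(mu_lo) <= r_pc <= modulus_to_pc(mu_hi):
        return 0.0
    return 5.0 / (math.log(10.0) * r_pc * (mu_hi - mu_lo))

r_min_pc = modulus_to_pc(-5.0)
r_max_pc = modulus_to_pc(19.0)
```

A prior that is flat in μ is therefore flat in log r: it assigns equal prior mass to each decade in distance rather than to each unit of volume.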

Third, we include isochrones in our model. Therefore, we impose a Hertzsprung-Russell diagram (HRD) in the allowed range of temperatures and luminosities. We also enable the metallicity to vary over the model range from −2 to 0.2 dex. However, we found that we do not constrain the metallicity with our analysis (see Sect. 4.7). The metallicity prior is a Gaussian centered on the solar value with a 0.2 dex dispersion for ages up to 1 Gyr, which then shifts to a mean of −1 dex and a 0.5 dex dispersion at 10 Gyr (see the right panel of Fig. 18). We consider a uniform distribution of logarithmic ages from 6 to 13 dex, bounded by our isochrone selection. However, to avoid over-predicting very massive stars with a lot of extinction, we adopt the initial mass function (IMF) from Kroupa (2001) and use the stellar mass our model predicts.

Finally, our noise model parameter is assumed to be as small as possible through a Gaussian prior centered on zero. We also impose that η is smaller than 0.3 mag, which we set to avoid significant outliers.

It is essential to highlight that the above priors do not have any particular Galactic position dependence (including distance). As we allow more flexibility in general, we expect more significant uncertainties than studies with a Milky Way model prior (e.g., StarHorse; Anders et al. 2019; Bailer-Jones et al. 2018).

4. Validation

We sampled the posterior distribution over {log10Teff, log10(L),A0, R0, μ, [Fe/H], log10(age)} for 123 097 070 stars observed by Gaia, 2MASS, and AllWISE. We now assess the quality and limitation of our results including predicted log g values using statistical diagnostics and literature reference catalogs.

The reader will find a detailed analysis below, but we briefly highlight our main findings here. We find a very good agreement with the distances from Bailer-Jones et al. (2021), with 2 to 8% absolute fractional differences within 5 kpc. One can trace these differences to our distance prior. Our results may overall overestimate the extinction by ∼0.1 mag by comparing against literature catalogs and samples of red-clump stars. This may result from a lack of strong extinction prior in our analysis. We also qualitatively find that our R0 maps agree with previous work (limited overlaps) and the variations correlate. Our temperatures appear biased high by ∼300 K with respect to literature values. They are in better agreement with the StarHorse catalog than the values we compiled from various surveys. Our log g values agree with asteroseismology with a median absolute difference below 0.3 dex. We find that our ages, distances, and extinction estimates agree well for wide binaries.

4.1. CMD and HRD

Figure 3 presents an overview of the observables and their predictions from our inference. We also report the inferred photometry of each source as part of our inference outputs, which allows us to compare the observed and model color-magnitude diagrams (CMDs; top panels). They are nearly identical. The distance-corrected ones (lower panels) show a few more discrepancies: the lower main sequence seems thinner, and our inference leads to many blue sources being hot stars. Nevertheless, overall, we obtain a robust agreement. We also emphasize that, around the red clump, the red-giant branch (RGB) and asymptotic-giant branch (AGB) bumps are visible on the distance-corrected CMD (lower-right panel). We show the same figure for the entire catalog in Fig. B.1.

Fig. 3.

Overview of our analysis procedure on the validation sample that contains 853 610 stars. Top panels: observed CMDs, with the left and right panels showing the input data and their median predictions, respectively. Lower panels: inverse parallax distance-corrected CMD and the distance-corrected one obtained from the AP estimates. The quantities on the y axes of these two panels would be identical in the absence of parallax noise. The entire catalog is shown in Fig. B.1, and the corresponding residuals between the top panels are shown in Fig. B.3.

Figure 4 shows our results after removing the inferred dust contribution. It compares with the lower-right panel of Fig. 3. The significant change in the extent of the red clump after removing the dust component indicates that our inference reduces the extinction effects. No obvious spurious feature appears after correcting for extinction effects in the color absolute magnitude diagram (CAMD). The hot stars create the usual hook, although we note that this does not confirm that these are real hot stars. (If their distances are also in agreement with their parallax values, this could be a confirmation; see below.) The apparent gap at the top of the main sequence results from the sample selection. All the stars on the right-hand side of the giant branch move back to the giant branch and main sequence. The AGB sequence (RGB tip) is well populated. We also note the over-density on the left side of the RGB above the red clump: these are He-burning stars. We show the same figure for the entire catalog in Fig. B.2.

Fig. 4.

Inferred CAMD after accounting for the dust extinction. (The CAMD before accounting for the dust corresponds to the lower-right panel of Fig. 3). The entire catalog is shown in Fig. B.2.

We do not find significant issues in these results so far. We now look at the HRDs and Kiel diagrams of these sources as shown in Fig. 5. For reference, the right-hand side panel shows the Kiel diagram made from the literature values. On all these diagrams, all features are smooth: thin main-sequence, giant branch, clear red-clump. We note that the literature values have issues with the hot stars and a double giant sequence. Also, we note that the literature shows a wiggle in the MS, which cannot be physical. Overall, the agreement is excellent.

Fig. 5.

HRDs and Kiel diagrams of the validation sample (853 610 stars). Right-hand panel: literature values. The features on the graphs are smooth, and the red clump is narrow. One can also see the blue helium-burning giants.

4.2. Distances

Figure 6 compares our distance with the Gaia parallaxes. In the left panel, we plot the product of distance and parallax as a function of the parallax signal-to-noise ratio. We corrected for the unity offset to center the distribution on 0. The more precise the parallax, the tighter our distance estimates as expected. A bias is visible when parallax uncertainties are large. This trend comes from the significant number of negative parallaxes (below 0). For these sources, we must infer a positive distance and within our Galaxy. Many of these sources also have values inconsistent with null parallaxes, making the distance difficult to reconcile. The right panel shows our estimates when selecting positive parallaxes only (regardless of the signal-to-noise ratio). The agreement is excellent. It should be noted that we doubt objects with inverse parallax further than 30 kpc (∼upper limit of the x axis) to be actual single stars from our Galaxy.
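The diagnostic in the left panel of Fig. 6 can be written in a few lines. The sketch below computes the parallax signal-to-noise ratio and the parallax-distance product minus one for three invented sources; names and values are hypothetical. It also shows why negative parallaxes necessarily produce a one-sided offset: with a positive inferred distance, the product is negative and the residual falls below −1.

```python
def parallax_distance_residual(plx_mas, dist_kpc):
    """Parallax x distance minus 1; zero when the distance equals 1 / parallax."""
    return plx_mas * dist_kpc - 1.0

def parallax_snr(plx_mas, plx_err_mas):
    """Parallax signal-to-noise ratio."""
    return plx_mas / plx_err_mas

# Hypothetical sources: (parallax [mas], parallax error [mas], distance [kpc]).
sources = [
    (2.00, 0.05, 0.50),   # precise parallax, distance equal to 1 / parallax
    (0.50, 0.10, 2.10),   # moderate precision, small residual
    (-0.20, 0.30, 6.00),  # negative parallax, residual necessarily below -1
]

diagnostics = [
    (parallax_snr(p, e), parallax_distance_residual(p, r)) for p, e, r in sources
]
```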

Fig. 6.

Distance estimates with respect to the input parallaxes of the entire catalog. Left panel: parallax distance product as a function of the parallax signal-to-noise ratio. Right panel: distance vs. inverse parallax distribution for positive parallax objects. The distance saturation on this panel is due to our distance modulus prior (19 mag).

Bailer-Jones et al. (2021) inferred distances for all stars in Gaia Early Data Release 3 (EDR3) that have parallaxes (including negative ones). These geometric distances used the parallax together with a direction-dependent distance prior, which they defined using a Galaxy model. For stars with BP − RP colors, they additionally inferred photogeometric distances using this color along with the G magnitude and an HRD prior to keep distances consistent with stellar models. Figure 7 compares our distances with both of these estimates, using a million stars randomly selected in common to both catalogs. Both their study and ours provide posterior medians, so a direct comparison is possible. (Comparing a median to a mode or mean would introduce a bias.) We observe excellent agreement up to 1 kpc, which is not surprising because the high precision parallaxes dominate the distance inference within this range. At further distances, more significant differences occur due to the increasing importance of photometric information in our estimates and the difference in the methods’ priors. For the whole sample, the median fractional difference with respect to the geometric distances, defined as (rours − rBJ2021)/rBJ2021, is +0.07. This is a measure of the bias. A measure of the scatter is the median absolute fractional difference, |rours − rBJ2021|/rBJ2021, which is 0.15 for the geometric distances. Comparing our distances to the photogeometric distances, these two measures are +0.10 (bias) and 0.15 (scatter). The scatter amplitude is reasonable given that the median fractional parallax uncertainty (fpu), |σϖ/ϖ|, is 0.20 for this sample (we recall that all parallaxes in the comparison are positive). We should also recall that we use Gaia DR2 parallaxes whereas Bailer-Jones et al. (2021) use Gaia EDR3 parallaxes, which are 20% more precise on average.
There are several possible causes of the bias, not least the different distance priors and additional assumptions about the relationship between color and absolute magnitude (i.e., the underlying stellar models). As we use a weaker distance prior than was adopted in Bailer-Jones et al. (2021), our estimates can extend to much larger distances. When limiting the comparison to sources with rours < 5 kpc (median fpu is 0.13), we find median and median absolute fractional differences of 0.02 and 0.08 relative to the geometric distances (respectively), and 0.03 and 0.08 relative to the photogeometric distances. So clearly, the bias is dominated by distant sources.
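The two summary statistics used in this comparison, the median fractional difference (bias) and the median absolute fractional difference (scatter), can be computed as in the sketch below. The distance values are invented for illustration and are unrelated to either catalog.

```python
from statistics import median

def fractional_difference_stats(r_ours, r_ref):
    """Return (bias, scatter): median and median absolute fractional difference."""
    frac = [(ours - ref) / ref for ours, ref in zip(r_ours, r_ref)]
    bias = median(frac)
    scatter = median([abs(f) for f in frac])
    return bias, scatter

# Hypothetical distances in kpc for five sources in common.
r_ours = [1.1, 2.0, 2.7, 4.4, 5.0]
r_ref = [1.0, 2.0, 3.0, 4.0, 5.0]

bias, scatter = fractional_difference_stats(r_ours, r_ref)
```

Restricting the input lists to sources below a given distance, as done above for 5 kpc, only requires filtering the pairs before calling the function.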

Fig. 7.

Comparison of the median distance in our catalog (vertical axis) with the median geometric (left) and median photogeometric (right) distances from the Gaia EDR3-based catalog of Bailer-Jones et al. (2021). The color scale indicates the density of sources in each panel on a log10 scale relative to the maximum. We note the logarithmic distance axes. The diagonal line is the identity line.

4.3. Dust extinction A0, R0

Red-clump stars. We first compare our distances to red-clump star samples. The absolute luminosities of stars in the red-clump are reasonably independent of their chemical composition, mass, or age. For instance, Stanek & Garnavich (1998) measured the absolute magnitudes for the red-clump at solar metallicity at −0.22 mag in the I band with a variance of about 0.15 mag. Therefore, red-clump stars are good proxies to isolate dust reddening and extinction effects, as argued, for example, in Andrae et al. (2018).

In Fig. 8, we compare our reddening and extinction estimates to the colors and magnitudes of red-clump stars from Bovy et al. (2014). Our reddening estimates (top panel) are in excellent agreement, whereas the lower panel of this figure suggests some additional scatter coming from either our AG or our distance estimates.

Fig. 8.

Comparison of color vs. our median reddening estimates (top panel) and reddened absolute magnitude vs. our median AG estimates (bottom panel) for red-clump stars from Bovy et al. (2014). Horizontal dashed lines indicate the intrinsic color and magnitude, whereas diagonal dashed lines indicate the expected identity relation.

The statistical validity of our A0 and R0 estimates is further attested to by Fig. 9. We recall that we do not use any sky position during our inference: each star estimate remains independent of any other. This plot shows sharp features distinct between the two panels and significantly different from just plotting the Gaia color. The sky maps of our extinction estimates highlight the complexity of the Milky Way disk’s ISM. We recover a wealth of features across a wide range of scales, from thin filaments to large cloud complexes. The Perseus, Taurus, and Orion complexes dominate the anti-central region (far left and right sides of the map, respectively), and the Ophiuchus molecular cloud complex (above the Galactic center) shows exquisite substructures. Using our extinctions and distances, Dharmawardena et al. (2022) inferred the structure of the Orion, Taurus, Perseus, and Cygnus X star-forming regions. They locate Cygnus X at 1300−1500 pc, in line with very-long-baseline interferometry (VLBI) measurements. They concluded our catalog would support studies of the changes in grain size or composition of dust as processed in molecular clouds.

Fig. 9.

Sky distribution in Galactic coordinates (averaged over all distances) of the dust extinction parameters A0 (middle) and R0 (bottom). We indicate some molecular regions of our Galaxy by the rectangles in the top panel (overlay of the gray-scaled A0 map). The maps are centered on the Galactic center, with longitudes increasing toward the left. We only plot the 60° centered on the Milky Way disk in these panels. Zoomed-in views of the highlighted regions are shown in Figs. B.4 and B.5.

Local bubble. We also investigated the distribution of our extinction estimates for stars in the Local Bubble, that is, stars with ϖ > 20 mas (within ∼50 pc) and good parallax measurements (ϖ/σϖ > 5). In this very nearby sample, we do not expect any significant extinction (e.g., Vergely et al. 2010). Figure 10 shows that results in the Local Bubble have virtually no differences between median and maximum posterior (“best”) estimates. However, our extinction values appear overall too high for this sample: half of the stars have A0 > 0.572 mag, and 25% of stars even have A0 > 0.996 mag. This trend may arise because the median of a truncated distribution is not the most suitable statistic (e.g., Andrae et al. 2018), or because of spurious astrometry (Rybizki et al. 2021). It may also well be the result of the weak priors we adopted.
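The Local Bubble selection quoted above amounts to two cuts on the parallax. A minimal sketch follows, assuming parallaxes and uncertainties in mas; the function name and the toy rows are hypothetical.

```python
def in_local_bubble_sample(parallax, parallax_sigma,
                           min_parallax=20.0, min_snr=5.0):
    """Return True if a source passes the Local Bubble cuts quoted above.

    parallax and parallax_sigma are in mas; a parallax above 20 mas
    corresponds to a distance below roughly 50 pc.
    """
    if parallax_sigma <= 0:
        return False  # no usable uncertainty: reject the source
    return parallax > min_parallax and parallax / parallax_sigma > min_snr


# Toy rows (made-up values): (parallax, parallax_sigma, median A0).
rows = [(25.0, 0.1, 0.05), (25.0, 10.0, 0.60), (10.0, 0.1, 0.30)]
local_a0 = [a0 for plx, sig, a0 in rows if in_local_bubble_sample(plx, sig)]
```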

Fig. 10.

Distribution of A0 (top panel) and AG (bottom panel) estimates in the Local Bubble (ϖ > 20 mas and ϖ/σϖ > 5) for our median values (black histogram) and maximum posterior values (blue histogram). Numbers quote A0 statistics from our median values.

StarHorse catalog. Finally, we compared our extinction estimates, AG, to those of Anders et al. (2019), a catalog that similarly inferred properties of stars using Gaia data. The values in both methods are not a constant scaling from the models’ A0 parameters but account for the shape of the stellar spectrum. Overall, Fig. 11 shows a good agreement between the two sets despite the estimates differing in their constructions. As Anders et al. (2019) provided median estimate values, it is not surprising that our median AG values agree slightly better than our maximum posterior estimates with those of StarHorse.

Fig. 11.

Differences in AG extinction estimates between our best result (top panel) and our median result (bottom panel) compared to values from Anders et al. (2019) (they provide median statistics). Numbers quote various statistics to summarize the differences.

Bayestar catalog. In Fig. 12 we compare our extinctions (median A0) to the Bayestar19 catalog extinctions (AV, Green et al. 2019) for a random sample of ∼1 million sources in common to both catalogs. For 62% of the random sources, we find that our median A0 values are larger than the estimates from Bayestar19, with a mean difference of 0.1 mag. Given that Bayestar19 predicts extinction with finite (grid) resolution while we do not impose such a constraint, the systematic difference may reflect unresolved structures for the Bayestar approach.

Fig. 12.

Comparison of our extinctions (median A0) to the Bayestar19 catalog AV for a random sample of 925 527 sources in common. The dashed black lines in each plot represent the 1:1 line. Top left: our median A0 compared to Bayestar19 AV. Top right: residuals of Bayestar19 AV – our median A0 as a distribution on sky in Galactic coordinates. Bottom left: same residuals compared to our median R0 estimates. Bottom right: same residuals compared to our median estimate of intrinsic absolute magnitude, Gabs.

Figure 12 also compares the extinction residuals between the two catalogs with respect to our R0 and absolute magnitudes. There is no significant correlation of the residuals with R0. In contrast, the comparison with absolute magnitudes suggests we slightly overestimate extinction at low luminosity (i.e., the lower part of the main sequence). This may result from our weak prior in comparison to Bayestar’s.

We compared our R0 values to those presented in Schlafly et al. (2017, their Fig. 1). The distributions agree qualitatively. A further quantitative comparison remains difficult because their definition differs from the standard one and their values are not readily accessible. We will continue to investigate this issue in the future.

4.4. Temperature, gravity, mass, and radius

To further validate our stellar parameters, we created a reference sample by compiling literature values from SDSS/APOGEE (Abolfathi et al. 2018), GALAH (Buder et al. 2018), Gaia-ESO (Gilmore et al. 2012, iDR5), LAMOST (Wu et al. 2011, 2014), and RAVE (Kunder et al. 2017; Kordopatis et al. 2013)5. Details on the filtering of each of the catalogs are given in Appendix A.

We compare the residuals of Teff and log g to the reference sample in Fig. 13. This figure shows residuals on both axes: the differences from literature values for Teff and log g on the y axis and the fit residuals of the apparent G magnitude (left panels) and the parallax (right panels) on the x axis. While log g is relatively well behaved throughout the sample, the Teff differences rise sharply for fit residuals below ΔG < −0.09 and above $\frac{\varpi-1/d}{\sigma_\varpi} > 4.5$ (indicated by the vertical dotted lines on the plots). However, this affects less than 0.2% of our results. Unfortunately, there is also a systematic overestimation of Teff by ∼300 K, which is also visible on the top-right panel in Fig. 13.

Fig. 13.

Residuals of Teff (top panels) and log g (bottom panels) with respect to the literature reference sample (Sect. 4.4; Appendix A) as functions of fit residuals for apparent G (left panels) and parallax ϖ (right panels). The solid lines indicate the median values in each bin, and the shaded regions are the central 68% and 90% intervals.

Figure 14 further compares our Teff estimates (median and maximum posterior) with the literature sample and the StarHorse catalog (Anders et al. 2019). This figure shows that our median values compare substantially better to literature temperatures than our “best” values. Unfortunately, it confirms a systematic overestimation of Teff by ∼300 K (left panels). However, our Teff estimates agree substantially better with the StarHorse ones, with a departure for hot stars (Teff > 7000 K). It is unclear if the bias is due to our weak priors or a stellar model mismatch with the data. The larger discrepancies with the literature may be the result of combining various heterogeneous catalogs.

Fig. 14.

Differences in median Teff (top panels) and maximum posterior Teff (best; bottom panels) compared to literature values (Appendix A, left panels) and StarHorse values from Anders et al. (2019) (right panel). In each panel we quote several statistics that summarize the differences.

We also compare our log g estimates to asteroseismic values from Serenelli et al. (2017) and Yu et al. (2018). Figure 15 shows that our best log g values compare well to asteroseismic values, with median absolute differences below 0.3 dex. However, our approach appears to overestimate the median log g values, especially for giant stars (red points).

Fig. 15.

Differences in our maximum posterior (top panel) and median (bottom panel) log g estimates compared to asteroseismic values from Serenelli et al. (2017) (main sequence stars; blue points) and Yu et al. (2018) (giant stars; red points).

4.5. Wide binaries

We also considered a sample of wide binaries from El-Badry et al. (2021) where Gaia observes both components individually. These systems are commonly coeval, making them one of the best testbeds to assess the quality of distances, extinctions along the lines of sight, and ages of field stars. Figure 16 compares the distance moduli, extinction A0, and log10(age) of both components. The distance moduli and the log-ages agree very well between the components. However, in some cases the extinction estimates (middle panel of Fig. 16) are spuriously large for one component but not both. The discrepancy could result from our weak priors, but it could also indicate where the crossmatch between surveys went wrong, producing incorrect input SEDs for our analysis. Nevertheless, the bulk of the wide binary pairs in the sample agree within 0.2 mag.

Fig. 16.

Wide binaries from El-Badry et al. (2021) and a comparison of distance moduli (left panel), A0 (middle panel), and log-ages (right panel) for the primary and secondary. Left panel: red points mark pairs where one component is more than 1 kpc away. Such cases are excluded from the number statistics and from all other panels.

4.6. Uncertainties

Assessing the quality of reported uncertainties is always a challenge. We must assume that the reference data sets are unbiased and that their uncertainties are correctly calibrated. In this section, we first use the above wide binary sample, which allows us to check the internal consistency of our estimates. Then we use the literature sample defined above for further assessment.

Using wide binaries. We used the sample of wide binaries of El-Badry et al. (2021) (Sect. 4.5) and investigated to what extent the confidence intervals overlap for distance moduli, extinctions, and ages, which should be similar in both components. Table 4 gives the number of cases where the 68% and 50% confidence intervals do not overlap. Unfortunately, even if we assume that literature values and our estimates are unbiased or have identical systematic errors, we cannot predict how often this should happen6. Instead, we have to rely on these numbers being “plausible”. The “mismatch” seems to happen far too often for the distance modulus (μ) and extinction (A0), and virtually never for our age estimates (age is not a sampled parameter). These numbers could reflect underestimated uncertainties or the MCMC converging to local minima that are inconsistent with a binary system. It could also be that some pairs from El-Badry et al. (2021) are not genuine binaries at all. However, Fig. 16 suggests that the solutions of both components are in excellent agreement with each other, consistent with most pairs from El-Badry et al. (2021) very likely being real binaries. Table 4 thus suggests that we systematically underestimate our uncertainties for distance and A0. For log-age, the results from Table 4 suggest that the uncertainties are too large, leading to too few cases where the intervals do not overlap.
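The overlap test described in this paragraph can be sketched as below. This is an illustration only: it assumes the 68% interval of each component is given by its 16th and 84th percentiles, and the function names and toy distance moduli are hypothetical.

```python
def intervals_overlap(lo_a, hi_a, lo_b, hi_b):
    """True if the closed intervals [lo_a, hi_a] and [lo_b, hi_b] intersect."""
    return lo_a <= hi_b and lo_b <= hi_a


def count_non_overlapping(pairs):
    """Count binary pairs whose confidence intervals do not overlap.

    Each pair is ((lo_primary, hi_primary), (lo_secondary, hi_secondary)).
    """
    return sum(
        1
        for (lo_a, hi_a), (lo_b, hi_b) in pairs
        if not intervals_overlap(lo_a, hi_a, lo_b, hi_b)
    )


# Toy 68% intervals on the distance modulus (made-up values, in mag).
pairs = [
    ((7.9, 8.1), (8.0, 8.3)),  # overlapping
    ((7.9, 8.0), (8.2, 8.4)),  # disjoint: counted as a mismatch
    ((5.0, 5.5), (5.5, 6.0)),  # touching at one end: treated as overlapping
]
n_mismatch = count_non_overlapping(pairs)
```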

Table 4.

Number of instances where the central 68% and 50% confidence intervals do not overlap between components of the 27 125 wide binaries taken from El-Badry et al. (2021).

Literature reference catalog. We inspected the distribution of differences between our Teff and log g estimates and literature values (sample described in Sect. 4.4). Figure 17 shows this distribution normalized by the symmetrized 68% confidence interval and the quoted literature error (we note the log-scale on the y axis). These normalized residuals do not peak at zero, thus indicating biases. We already noted the ∼300 K overestimation of Teff in Sect. 4.4. Otherwise, they are reasonably close to a Gaussian with unit standard deviation (for log g, the tails are a bit heavier than a unit Gaussian). One can attribute further departure from the unit Gaussian to the non-Gaussianity of the posterior distribution (multivariate or marginalized). This comparison with spectroscopic values implies that at least for Teff and log g our 68% confidence intervals are reasonable estimates of the random errors in our results7.
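One way to form such a normalized residual is sketched below. Two assumptions are made here that the text does not spell out: the symmetrized 68% interval is taken as half the distance between the 16th and 84th percentiles, and it is combined with the literature error in quadrature. The function name and the numbers are hypothetical.

```python
from math import hypot


def normalized_residual(p16, p50, p84, lit_value, lit_error):
    """Residual of our median with respect to a literature value, in sigma.

    Assumes a symmetrized uncertainty (p84 - p16) / 2 added in
    quadrature to the literature error (both must not be zero together).
    """
    sigma_ours = 0.5 * (p84 - p16)
    return (p50 - lit_value) / hypot(sigma_ours, lit_error)


# Toy Teff example in K (made-up values): sigma_ours = 30 K, literature 40 K.
z = normalized_residual(5700.0, 5730.0, 5760.0, lit_value=5630.0, lit_error=40.0)
```

If the uncertainties are well calibrated and the estimates unbiased, values of z over a large sample should follow a Gaussian with zero mean and unit standard deviation.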

Fig. 17.

Distribution of normalized residuals to literature values for Teff (top panel) and log g (bottom panel). Gray curves show a Gaussian that has the same mean and a unit standard deviation for comparison.

4.7. Metallicity

As we mentioned before, it is not a surprise that we cannot constrain metallicity from our data. Because the Gaia passbands are broad, the photometry barely changes with variations in the optical metal lines. Hence, [Fe/H] becomes prior dominated. Figure 18 illustrates the prior-dependent estimates of metallicity [Fe/H]. If one adopted a uniform prior (left panel), the impact would be substantial as the degeneracies create multimodal solutions to match young stars’ photometry. In regions such as Orion in this figure, we can affirm that the metallicity estimates are not plausible.

Fig. 18.

Distribution of metallicity estimates as a function of age compared to its prior. Each panel shows the posterior metallicity for a random subset of stars in the direction of the Orion star formation using both a uniform and an age-dependent prior. The median estimates are indicated by the point density, and the prior is shown by the solid lines (mean, 1-, and 5-sigma intervals). We note that the prior rejects anything outside the model range (below −2 and above 0.2 dex). The rightmost panel corresponds to our prior.

As a result, we adopted an age-dependent prior (right panel) that allows some metallicity range at older ages as one would expect chemical enrichment to produce a similar trend to first order. We did not find these values to be scientifically exploitable individually. Our tests did not suggest we could see population variations within the Galaxy. As a result, we marginalized over them when reporting our estimates. This allows us to produce more meaningful uncertainties (see Sect. 4.6).

To improve upon the determination of metallicity, one could include narrower photometric bands (e.g., SDSS and PS1). This would lead to more potential crossmatch issues and potentially more tensions between models and data. The third Gaia data release (Gaia DR3) will contain the BP/RP spectra (i.e., the dispersed light corresponding to the optical coverage of Gaia). If one thinks of these spectra as a series of narrow photometric bands, the metallicity issue should be resolved without these additional limitations8.

4.8. Photometric jitter

We introduced a photometric jitter, η, in our model. This term acts as a random uncertainty in the photometry that we estimate per source. It aims at capturing (random) mismatches between our models and data, which could arise for a variety of reasons. For example, photometric calibration errors could be large (e.g., 0.1%), or the data could be affected by crossmatch errors (especially in dense regions). Figure 19 summarizes the jitter behavior. The left panel shows the distribution of the jitter in the observed Gaia CMD. One can immediately see the prominence of the giants with small jitter values (yellow in this diagram). The other panels show the distributions after correcting the y axis for distance and further dust extinction (middle and right panels, respectively). The jitter becomes substantial for stars in short-lived evolutionary stages, for instance OB or AGB stars. One can also note that binaries (middle panel) disagree with our models and thus force the jitter values to be larger than for single stars on the main sequence.
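To illustrate how such a term can absorb model mismatch, the sketch below adds a jitter in quadrature to the per-band uncertainties of a Gaussian likelihood on magnitudes. This specific likelihood form is an assumption made for illustration, not a transcription of the model used for the catalog; the function name and the toy photometry are hypothetical.

```python
from math import log, pi


def lnlike_with_jitter(observed, predicted, sigma, jitter):
    """Gaussian log-likelihood over bands with a common jitter term.

    The variance of each band is sigma**2 + jitter**2 (assumed form),
    with all quantities in magnitudes.
    """
    total = 0.0
    for m_obs, m_mod, s in zip(observed, predicted, sigma):
        var = s * s + jitter * jitter
        total += -0.5 * ((m_obs - m_mod) ** 2 / var + log(2.0 * pi * var))
    return total


# Toy photometry (made-up values): the third band is off by 0.2 mag.
observed = [12.0, 11.5, 11.0]
predicted = [12.0, 11.5, 11.2]
sigma = [0.01, 0.01, 0.01]

lnl_no_jitter = lnlike_with_jitter(observed, predicted, sigma, jitter=0.0)
lnl_jitter = lnlike_with_jitter(observed, predicted, sigma, jitter=0.1)
```

With a mismatched band, a nonzero jitter yields a higher likelihood than zero jitter, whereas for a perfect fit the normalization term penalizes any unnecessary jitter. This is the behavior that makes large inferred η values a flag for model or data problems.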

Fig. 19.

Distribution of the photometric jitter, η, for all 123 076 271 stars in the observed, distance-corrected, and distance-extinction-corrected CMDs, from left to right respectively. The color scale is identical for all three panels. Large values of the jitter indicate where models and data statistically deviate, i.e., mostly in rapid evolution phases. We note that the binary sequence parallel to the main sequence requires some substantial jitter values.

5. Catalog

5.1. Content

The catalog includes 123 076 271 sources from Gaia DR2 that have a parallax and all G, BP, RP, J, H, Ks, W1, and W2 photometric measurements. Per star, we provide for each output quantity:

  • X: the input (recalibrated) quantity

  • X_sigma: the input (recalibrated) uncertainty

  • X_best: the value of the best posterior sample from the MCMC chains

  • X_min, X_max: the minimum and maximum values of the MCMC samples

  • X_p16, X_p50, X_p84: the 16th, 50th, and 84th percentiles from the MCMC samples

The set {X_best} represents the best model (i.e., prediction set preserving physical information of the model). In addition, we report the input features used during the fit that were “recalibrated” using the Gaia DR2 prescriptions (i.e., photometry and parallaxes, see Table 2). The catalog contains the following quantities:

  • source_id: the Gaia DR2 identifier

  • parallax, parallax_sigma: the recalibrated parallax and uncertainty values

  • J: the 2MASS J photometry [vegamag]

  • H: the 2MASS H photometry [vegamag]

  • Ks: the 2MASS Ks photometry [vegamag]

  • BP: the Gaia BP magnitude (bright or faint) [vegamag]

  • G: the Gaia G magnitude [vegamag]

  • RP: the Gaia RP magnitude [vegamag]

  • W1: the AllWISE W1 magnitude [vegamag]

  • W2: the AllWISE W2 magnitude [vegamag]

  • A0: the extinction parameter [mag]

  • R0: the average dust grain size extinction parameter [unitless]

  • A_G: the extinction in the Gaia G-band [mag]

  • A_BP: the extinction in the Gaia BP-band [mag]

  • A_RP: the extinction in the Gaia RP-band [mag]

  • dmod: the distance modulus [mag]

  • lnlike: the log likelihood [unitless]

  • lnp: the log posterior [unitless]

  • log10jitter: the log photometric likelihood jitter common to all bands [log10 mag]

  • logA: the log10 (age/yr)

  • logL: the log10 (luminosity/L⊙)

  • logM: the log10 (mass/M⊙)

  • logT: the log10 (Teff/K)

  • logg: the log10 (gravity/cgs)
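Several of these quantities are stored as logarithms or as a distance modulus. Because percentiles are preserved under monotonic transformations, the p16, p50, and p84 values can be converted directly to linear units. The sketch below assumes column names built as the quantity name plus the suffixes listed above (e.g., dmod_p50), and the row values are made up.

```python
def distance_pc(dmod):
    """Distance in parsec from a distance modulus: d = 10^(dmod / 5 + 1)."""
    return 10.0 ** (dmod / 5.0 + 1.0)


# Hypothetical catalog row (made-up values).
row = {"dmod_p16": 9.8, "dmod_p50": 10.0, "dmod_p84": 10.3, "logT_p50": 3.76}

# Percentiles of the distance follow from the percentiles of dmod.
d16, d50, d84 = (distance_pc(row[k]) for k in ("dmod_p16", "dmod_p50", "dmod_p84"))

# Effective temperature in K from logT = log10(Teff / K).
teff = 10.0 ** row["logT_p50"]
```

The same shortcut does not apply to means or to combinations of several parameters, for which the marginal percentiles are not sufficient.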

One should recall that the source_id identifier of 3% of the sources will have been updated in Gaia DR3 compared to Gaia DR2 (Fabricius et al. 2021). Thus, one should use the source_id crossmatch table dr2_neighbourhood provided with Gaia DR3 to find the best match before conducting any source-by-source comparisons between the two releases.
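As an illustration of this matching step, the sketch below picks one DR3 counterpart per DR2 source from rows of the dr2_neighbourhood table. The column names (dr2_source_id, dr3_source_id, angular_distance) are assumed to match the Gaia Archive table and should be checked against its documentation. Choosing the smallest angular distance as the "best" match is a simplifying assumption; a more careful selection could also use the magnitude difference. The identifiers below are placeholders, not real sources.

```python
def best_dr3_match(neighbourhood_rows):
    """Map each dr2_source_id to the dr3_source_id of its closest neighbour."""
    best = {}
    for row in neighbourhood_rows:
        key = row["dr2_source_id"]
        if key not in best or row["angular_distance"] < best[key]["angular_distance"]:
            best[key] = row
    return {dr2: row["dr3_source_id"] for dr2, row in best.items()}


# Toy rows (placeholder identifiers, angular distances in mas).
rows = [
    {"dr2_source_id": 1, "dr3_source_id": 101, "angular_distance": 35.0},
    {"dr2_source_id": 1, "dr3_source_id": 102, "angular_distance": 1.2},
    {"dr2_source_id": 2, "dr3_source_id": 201, "angular_distance": 0.4},
]
matches = best_dr3_match(rows)
```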

We have not filtered out any results from our catalog. Sources with spurious parallaxes remain. One must proceed with any filtering with care; any rules are most unlikely to generalize. Instead, selections need to adapt to specific use-cases and locations in the Galaxy.

5.2. Use cases

Seven example use cases are as follows.

The first is the look-up of distance (or distance modulus) for particular sources of interest using their source_id or another identifier matched to this. The Gaia data releases include a crossmatch to many existing catalogs. Positional crossmatch can also be done on the data site or using TAP uploads and at other sites that host our catalog.

Second is identification of sources within some AP ranges. One should use the confidence intervals to find all sources of interest. For instance, Poggio et al. (2021) select upper main sequence stars from their apparent colors; one could switch to Teff, or the absolute magnitude predictions from our catalog.
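A selection based on confidence intervals rather than point estimates could look like the following sketch. The column names (logT_p16, logT_p84) are assumed from the naming scheme of Sect. 5.1, and the temperature range and rows are hypothetical.

```python
from math import log10


def matches_range(p16, p84, lo, hi, mode="overlap"):
    """Test a 68% interval [p16, p84] against a target range [lo, hi].

    mode="overlap":   keep sources whose interval intersects the range
                      (inclusive selection of candidates).
    mode="contained": keep sources whose interval lies fully inside
                      the range (stricter selection).
    """
    if mode == "overlap":
        return p16 <= hi and p84 >= lo
    if mode == "contained":
        return lo <= p16 and p84 <= hi
    raise ValueError("mode must be 'overlap' or 'contained'")


# Hypothetical target: 10 000 K < Teff < 30 000 K, expressed in log10(Teff/K).
lo, hi = log10(10_000.0), log10(30_000.0)

rows = [
    {"source_id": 1, "logT_p16": 3.95, "logT_p84": 4.05},
    {"source_id": 2, "logT_p16": 4.10, "logT_p84": 4.20},
    {"source_id": 3, "logT_p16": 3.70, "logT_p84": 3.80},
]
candidates = [r["source_id"] for r in rows
              if matches_range(r["logT_p16"], r["logT_p84"], lo, hi)]
secure = [r["source_id"] for r in rows
          if matches_range(r["logT_p16"], r["logT_p84"], lo, hi, mode="contained")]
```

The choice between the two modes trades completeness against purity of the selected sample.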

Third is construction of CAMDs. This is one of the reasons why we provide quantiles on the distance modulus, the individual photometric band predictions, and their extinction values. (This would not be possible if we reported the mean or mode, for example.)
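A single point of a dereddened CAMD can be assembled from the catalog columns as sketched below. The example uses the input magnitudes together with the median distance modulus and extinctions. The suffixed column names (dmod_p50, A_G_p50, A_BP_p50, A_RP_p50) are assumed from the naming scheme of Sect. 5.1, and the row values are made up.

```python
def camd_point(row):
    """Return (intrinsic BP-RP color, absolute G magnitude) for one source."""
    abs_mag = row["G"] - row["dmod_p50"] - row["A_G_p50"]
    color0 = (row["BP"] - row["A_BP_p50"]) - (row["RP"] - row["A_RP_p50"])
    return color0, abs_mag


# Hypothetical catalog row (made-up values, magnitudes in the Vega system).
row = {
    "G": 15.0, "BP": 15.6, "RP": 14.4,
    "dmod_p50": 10.0,
    "A_G_p50": 0.5, "A_BP_p50": 0.6, "A_RP_p50": 0.3,
}
color0, abs_mag = camd_point(row)
```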

The fourth is constructing the three-dimensional spatial distribution of stars in some region of space. It may also assist in selecting candidates for targeted follow-up surveys.

The fifth is constructing the three-dimensional spatial properties of the ISM. Using our extinctions and distances, Dharmawardena et al. (2022) inferred the individual structure of the Orion, Taurus, Perseus, and Cygnus X star-forming regions and found the coherent ISM filaments that may link the Taurus and Perseus regions.

The sixth is linking the spatial variations in R0 to local stellar properties to understand ISM processing cycles (e.g., Chastenet et al. 2017).

And finally, one can use our catalog as a baseline for comparison of APs or absolute magnitude estimates inferred by other means.

5.3. Data access

Our catalog is available from the German Astrophysical Virtual Observatory (GAVO)9 where one can query it via ADQL and the table access protocol (TAP). This server also hosts a reduced version of the main Gaia DR2 and Gaia DR3 catalogs. Typical queries are likely to involve a join of one of these two catalogs. A bulk download for the catalog is also available at the URL given above. Our catalog will also become available soon in the Gaia Archive10 and CDS VizieR and their partner data centers.

5.4. Limitations and discussions

Users should keep in mind the following assumptions and limitations of our catalog.

We summarize the six-dimensional posterior distributions using only quantile numbers (computed on marginal one-dimensional distributions). The _best estimates only give one point that was randomly sampled and turned out to obtain the best posterior value of the MCMC chains. These summary statistics cannot capture the full complexity of these distributions. One should not ignore the confidence intervals.

Most sources in Gaia DR2 have substantial fractional parallax uncertainties. Hence, the photometric data often dominate the inference of our distances and APs. However, the parallax remains generally sufficient to limit the dwarf versus giant degeneracies.

The poorer the data, the more our prior dominates our estimates. Our prior is not a sophisticated model of the Galaxy that includes three-dimensional extinction. One should expect significant differences with other AP catalogs when the prior dominates. However, if the true stellar population, extinction, or reddening distributions are very different from Galactic models, our catalog (and disagreements with other studies) may partially hint at these deviations.

We implicitly assumed that all sources are single stars in the Galaxy. Our estimates will be incorrect for any non-single star (binaries, extended sources, extragalactic).

We assumed that all crossmatches produce correct results. Matching sources becomes a complex challenge in some cases (e.g., dense regions, high extinction). Wrong associations between surveys result in incorrect photometric data, leading to “bad fits”. We introduced η to capture some of these issues. In combination with lnlike in our catalog, one can assess how well the model predictions match the sources’ photometric information.

Finally, by design, we infer properties for each source independently. If a set of stars is known to be in a cluster, they have a similar distance, extinction, chemical patterns, and age. It constitutes a prior that one should exploit to infer the properties of the individual stars more accurately than what we have done here.

6. Summary

We have produced a catalog of distance moduli and astrophysical and dust extinction parameters for 123 076 271 stars using the photometry from Gaia, 2MASS, and AllWISE and the Gaia parallaxes. These estimates, and their uncertainties, can also be used as estimates of the distances. We provide additional photometric estimates and their dust extinction values.

Our catalog increases the availability of APs in the literature while offering results based on assumptions that differ from those of previous works. Such works helped to validate our results.

In addition, we provide one of the first extensive catalogs of average extinction per color-excess unit (R0) derived uniformly across the whole sky.

We used external data, so-called “hybrid catalogs”, to leverage current multiwavelength information from independent data sources. However, combining heterogeneous data inevitably leads to inconsistencies, biases, and complex selection functions.

With Gaia DR3, DPAC will publish APs based upon analysis of the BP/RP spectra, that is, only Gaia data with detailed optical information according to the Gaia data release Scenario11. Such a product will eventually open up the possibility to further “hybrid” analysis. Gaia DR3 represents a critical first step toward anchoring all current and future spectroscopic surveys to a common ground and providing us with the most comprehensive view of our Galaxy.


1

One could make the parallel with the assumption of independent measurements from a single instrument: in very high signal-to-noise data, the systematics in the calibration challenge this assumption.

2

DPAC: (Gaia) Data Processing and Analysis Consortium.

3

We used the pyphot suite; https://github.com/mfouesneau/pyphot

4

Python or R packages use the name “earth”.

5

Survey acronyms: SDSS/APOGEE: Sloan Digital Sky Survey/Apache Point Observatory Galactic Evolution Experiment; Gaia-ESO: Gaia and the European Southern Observatory spectroscopic survey; LAMOST: the Large Sky Area Multi-Object Fibre Spectroscopic Telescope; and RAVE: the Radial Velocity Experiment.

6

Assuming Gaussian uncertainties, one could estimate the probability that two estimated values are outside each other’s confidence intervals for each source individually but not for the whole sample. Yet, our uncertainties are certainly not Gaussian.

7

Systematic errors are generally not accounted for in confidence intervals since those characterize only random errors.

8

The main challenge will become modeling the BP/RP spectra.

Acknowledgments

We thank our referee for their constructive and valuable comments. We also thank H.-W. Rix and D.W. Hogg for the fruitful discussions during this project. We also thank C. Subiran for providing us with compiled catalogs from the literature. M.F. thanks K. Gordon for sharing his expertise on dust properties. This work was funded in part by the DLR (German space agency) via grant 50 QG 1403. It has made use of data from the European Space Agency (ESA) mission Gaia (http://www.cosmos.esa.int/gaia), processed by the Gaia Data Processing and Analysis Consortium (DPAC, http://www.cosmos.esa.int/web/gaia/dpac/consortium). Funding for the DPAC has been provided by national institutions, in particular the institutions participating in the Gaia Multilateral Agreement. This publication uses data products from the Two Micron All Sky Survey, a joint project of the University of Massachusetts and the Infrared Processing and Analysis Center/California Institute of Technology, funded by the National Aeronautics and Space Administration and the National Science Foundation. This publication uses data products from the Wide-field Infrared Survey Explorer, a joint project of the University of California, Los Angeles, and the Jet Propulsion Laboratory/California Institute of Technology funded by the National Aeronautics and Space Administration. This research made use of matplotlib (Hunter 2007), NumPy (Harris et al. 2020), pyphot (https://github.com/mfouesneau/pyphot), the IPython package (Pérez & Granger 2007), Vaex (Breddels & Veljanoski 2018), TOPCAT (Taylor 2005), QuantStack xtensor (http://xtensor.readthedocs.io/).

References

  1. Abolfathi, B., Aguado, D. S., Aguilar, G., et al. 2018, ApJS, 235, 42
  2. Anders, F., Khalatyan, A., Chiappini, C., et al. 2019, A&A, 628, A94
  3. Andrae, R., Fouesneau, M., Creevey, O., et al. 2018, A&A, 616, A8
  4. Bailer-Jones, C. A. L., Rybizki, J., Fouesneau, M., Mantelet, G., & Andrae, R. 2018, AJ, 156, 58
  5. Bailer-Jones, C. A. L., Rybizki, J., Fouesneau, M., Demleitner, M., & Andrae, R. 2021, AJ, 161, 147
  6. Bianchi, L., Clayton, G. C., Bohlin, R. C., Hutchings, J. B., & Massey, P. 1996, ApJ, 471, 203
  7. Bianchi, L., Herald, J., Efremova, B., et al. 2011, Ap&SS, 335, 161
  8. Bovy, J., Nidever, D. L., Rix, H.-W., et al. 2014, ApJ, 790, 127
  9. Breddels, M. A., & Veljanoski, J. 2018, A&A, 618, A13
  10. Buder, S., Asplund, M., Duong, L., et al. 2018, MNRAS, 478, 4513
  11. Cardelli, J. A., Clayton, G. C., & Mathis, J. S. 1989, ApJ, 345, 245
  12. Castelli, F., & Kurucz, R. L. 2003, in Modelling of Stellar Atmospheres, eds. N. Piskunov, W. W. Weiss, & D. F. Gray, 210, A20
  13. Chastenet, J., Bot, C., Gordon, K. D., et al. 2017, A&A, 601, A55
  14. Chen, Y., Girardi, L., Bressan, A., et al. 2014, MNRAS, 444, 2525
  15. Cutri, R. M., Skrutskie, M. F., van Dyk, S., et al. 2003, VizieR Online Data Catalog: II/246
  16. Cutri, R. M., Wright, E. L., Conrow, T., et al. 2014, VizieR Online Data Catalog: II/328
  17. Dharmawardena, T. E., Bailer-Jones, C. A. L., Fouesneau, M., & Foreman-Mackey, D. 2022, A&A, 658, A166
  18. El-Badry, K., Rix, H.-W., & Heintz, T. M. 2021, MNRAS, 506, 2269
  19. Evans, D. W., Riello, M., De Angeli, F., et al. 2018, A&A, 616, A4
  20. Fabricius, C., Luri, X., Arenou, F., et al. 2021, A&A, 649, A5
  21. Fitzpatrick, E. L. 1999, PASP, 111, 63
  22. Friedman, J. H. 1991, Ann. Stat., 19, 1
  23. Gaia Collaboration (Prusti, T., et al.) 2016, A&A, 595, A1
  24. Gaia Collaboration (Brown, A. G. A., et al.) 2018, A&A, 616, A1
  25. Gilmore, G., Randich, S., Asplund, M., et al. 2012, The Messenger, 147, 25
  26. Gordon, K. D., Cartledge, S., & Clayton, G. C. 2009, ApJ, 705, 1320
  27. Gordon, K. D., Fouesneau, M., Arab, H., et al. 2016, ApJ, 826, 104
  28. Green, G. M., Schlafly, E., Zucker, C., Speagle, J. S., & Finkbeiner, D. 2019, ApJ, 887, 93
  29. Harris, C. R., Millman, K. J., van der Walt, S. J., et al. 2020, Nature, 585, 357
  30. Hunter, J. D. 2007, Comput. Sci. Eng., 9, 90
  31. Ibata, R. A., Malhan, K., & Martin, N. F. 2019, ApJ, 872, 152
  32. Kaviraj, S., Rey, S. C., Rich, R. M., Yoon, S. J., & Yi, S. K. 2007, MNRAS, 381, L74
  33. Kennedy, M. C., & O’Hagan, A. 2001, J. R. Stat. Soc.: Ser. B, 63, 425
  34. Kordopatis, G., Gilmore, G., Steinmetz, M., et al. 2013, AJ, 146, 134
  35. Kroupa, P. 2001, MNRAS, 322, 231
  36. Kunder, A., Kordopatis, G., Steinmetz, M., et al. 2017, AJ, 153, 75
  37. Lang, D. 2014, AJ, 147, 108
  38. Lindegren, L. 2018, Considerations for the use of DR2 Astrometry, Tech. Rep., Gaia DPAC
  39. Lindegren, L., Hernández, J., Bombrun, A., et al. 2018, A&A, 616, A2
  40. Maíz Apellániz, J., & Weiler, M. 2018, A&A, 619, A180
  41. Marigo, P., Bressan, A., Nanni, A., Girardi, L., & Pumo, M. L. 2013, MNRAS, 434, 488
  42. Marrese, P. M., Marinoni, S., Fabrizio, M., & Altavilla, G. 2019, A&A, 621, A144
  43. Mathur, S., Huber, D., Batalha, N. M., et al. 2017, ApJS, 229, 30 [NASA ADS] [CrossRef] [Google Scholar]
  44. McDonald, I., Zijlstra, A. A., & Watson, R. A. 2017, MNRAS, 471, 770 [Google Scholar]
  45. McMillan, P. J., Kordopatis, G., Kunder, A., et al. 2018, MNRAS, 477, 5279 [NASA ADS] [CrossRef] [Google Scholar]
  46. Meisner, A. M., Lang, D., & Schlegel, D. J. 2017, AJ, 154, 161 [NASA ADS] [CrossRef] [Google Scholar]
  47. Mints, A., & Hekker, S. 2017, A&A, 604, A108 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  48. Mints, A., & Hekker, S. 2018, A&A, 618, A54 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  49. Mo, H. J., Mao, S., & White, S. D. M. 1998, MNRAS, 295, 319 [Google Scholar]
  50. Molenda-Żakowicz, J., Bruntt, H., Sousa, S., et al. 2010, Astron. Nachr., 331, 981 [CrossRef] [Google Scholar]
  51. Pérez, F., & Granger, B. E. 2007, Comput. Sci. Eng., 9, 21 [Google Scholar]
  52. Pickles, A. J. 1998, PASP, 110, 863 [NASA ADS] [CrossRef] [Google Scholar]
  53. Poggio, E., Drimmel, R., Cantat-Gaudin, T., et al. 2021, A&A, 651, A104 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  54. Queiroz, A. B. A., Anders, F., Santiago, B. X., et al. 2018, MNRAS, 476, 2556 [Google Scholar]
  55. Ramírez-Agudelo, O. H., Sana, H., de Koter, A., et al. 2017, A&A, 600, A81 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  56. Riello, M., De Angeli, F., Evans, D. W., et al. 2018, A&A, 616, A3 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  57. Rix, H.-W., & Bovy, J. 2013, A&ARv, 21, 61 [NASA ADS] [CrossRef] [Google Scholar]
  58. Rosenfield, P., Marigo, P., Girardi, L., et al. 2016, ApJ, 822, 73 [NASA ADS] [CrossRef] [Google Scholar]
  59. Rybizki, J., Green, G. M., Rix, H.-W., et al. 2021, MNRAS, 510, 2597 [Google Scholar]
  60. Santiago, B. X., Brauer, D. E., Anders, F., et al. 2016, A&A, 585, A42 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  61. Schlafly, E. F., Peek, J. E. G., Finkbeiner, D. P., & Green, G. M. 2017, ApJ, 838, 36 [NASA ADS] [CrossRef] [Google Scholar]
  62. Serenelli, A. M., Bergemann, M., Ruchti, G., & Casagrande, L. 2013, MNRAS, 429, 3645 [NASA ADS] [CrossRef] [Google Scholar]
  63. Serenelli, A., Johnson, J., Huber, D., et al. 2017, ApJS, 233, 23 [Google Scholar]
  64. Simón-Díaz, S., Godart, M., Castro, N., et al. 2017, A&A, 597, A22 [CrossRef] [EDP Sciences] [Google Scholar]
  65. Skrutskie, M. F., Cutri, R. M., Stiening, R., et al. 2006, AJ, 131, 1163 [Google Scholar]
  66. Soubiran, C., Lecampion, J., & Chemin, L. 2014, Auxiliary Data for CU6 – Atmospheric Parameters – version 2, gAIA-C6-TN-LAB-CS-011 [Google Scholar]
  67. Stanek, K. Z., & Garnavich, P. M. 1998, ApJ, 503, L131 [Google Scholar]
  68. Stevens, D. J., Stassun, K. G., & Gaudi, B. S. 2017, AJ, 154, 259 [NASA ADS] [CrossRef] [Google Scholar]
  69. Taylor, M. B. 2005, in Astronomical Data Analysis Software and Systems XIV, eds. P. Shopbell, M. Britton, & R. Ebert, ASP Conf. Ser., 347, 29 [Google Scholar]
  70. Valencic, L. A., Clayton, G. C., & Gordon, K. D. 2004, ApJ, 616, 912 [NASA ADS] [CrossRef] [Google Scholar]
  71. Vergely, J. L., Valette, B., Lallement, R., & Raimond, S. 2010, A&A, 518, A31 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  72. Wang, J., Shi, J., Pan, K., et al. 2016, MNRAS, 460, 3179 [NASA ADS] [CrossRef] [Google Scholar]
  73. Wu, Y., Luo, A.-L., Li, H.-N., et al. 2011, Res. Astron. Astrophys., 11, 924 [Google Scholar]
  74. Wu, Y., Du, B., Luo, A., Zhao, Y., & Yuan, H. 2014, in Statistical Challenges in 21st Century Cosmology, eds. A. Heavens, J. L. Starck, & A. Krone-Martins, IAU Symp., 306, 340 [NASA ADS] [Google Scholar]
  75. Young, M. E., & Short, C. I. 2017, ApJ, 835, 292 [NASA ADS] [CrossRef] [Google Scholar]
  76. Yu, J., Huber, D., Bedding, T. R., et al. 2018, ApJS, 236, 42 [NASA ADS] [CrossRef] [Google Scholar]
  77. Zinn, J. C., Pinsonneault, M. H., Huber, D., & Stello, D. 2019, ApJ, 878, 136 [Google Scholar]

Appendix A: Validation target sample

This appendix briefly describes how we compiled a “reference catalog” from various literature sources. First, we present the general quality requirements that the literature values of any star must satisfy for the star to enter our reference catalog. We then go through the individual literature catalogs and describe their additional, catalog-specific selection criteria. Table A.1 summarizes the contributions of the literature catalogs to our reference sample.

Table A.1.

Contributions to reference catalog from various literature catalogs.

A.1. General quality requirements for all catalogs

For any star with literature properties, the following two general quality requirements must be satisfied to admit that star into our reference catalog: (i) the star must have estimated values for all three stellar parameters, Teff, log g, and [M/H], and (ii) the star must have uncertainty estimates for all three of these parameters.
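The two requirements above can be sketched as a simple row filter. This is only an illustration, not the pipeline used for the catalog: the column names (`teff`, `logg`, `mh` and their `_err` counterparts) are hypothetical, and treating missing or non-finite entries as "no estimate" is an assumption.

```python
import math

# Hypothetical column names for the three stellar parameters.
PARAMS = ("teff", "logg", "mh")


def passes_general_requirements(star, require_uncertainties=True):
    """Return True if all three parameters (and, optionally, their
    uncertainties) are present and finite for this star."""

    def is_valid(value):
        return value is not None and math.isfinite(value)

    for name in PARAMS:
        if not is_valid(star.get(name)):
            return False
        if require_uncertainties and not is_valid(star.get(name + "_err")):
            return False
    return True


# Toy rows: only the first satisfies both requirements.
stars = [
    {"teff": 5770.0, "teff_err": 50.0, "logg": 4.44, "logg_err": 0.10,
     "mh": 0.0, "mh_err": 0.05},
    {"teff": 4800.0, "teff_err": 80.0, "logg": 2.50, "logg_err": None,
     "mh": -0.3, "mh_err": 0.10},
    {"teff": 6100.0, "teff_err": 60.0, "logg": float("nan"), "logg_err": 0.10,
     "mh": 0.1, "mh_err": 0.05},
]
selected = [s for s in stars if passes_general_requirements(s)]
```

The `require_uncertainties` switch mirrors the catalogs below for which the uncertainty requirement is relaxed.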

A.2. Catalog-specific selection criteria

APOGEE DR14 – Abolfathi et al. (2018)

We took the data file allStar-l31c.2.fits. It contains 277 371 stars with ASPCAP parameter estimates. We applied the following quality cuts:

  • ASPCAPFLAG is not 2^23, which excludes “bad overall for star: set if any of TEFF, LOGG, CHI2, COLORTE, ROTATION, SN error are set, or any parameter is near grid edge”.

  • COMMISS equals 0, which excludes APOGEE-1 commissioning data.

  • APOGEE_TARGET1 is not 2^11 + 2^12 + 2^13, which excludes short, intermediate, and long cohort target stars from APOGEE-1 (i.e., commissioning data again).

  • Signal-to-noise ratio larger than 50.

This filtering reduces the number of selected stars to 69 572. This sample still contains several hundred duplicated entries, which one can identify via their APOGEE IDs.
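A minimal sketch of these cuts, reading the quoted flag values as bit masks (bit 23 of ASPCAPFLAG; bits 11, 12, and 13 of the targeting flag). Rejecting a star when any of the listed bits is set is one plausible reading of the cuts, not a confirmed description of the original selection, and the dictionary keys are illustrative stand-ins for the catalog columns.

```python
# Bit masks built from the bit positions quoted in the cuts (assumed reading).
ASPCAP_STAR_BAD = 1 << 23
COHORT_BITS = (1 << 11) | (1 << 12) | (1 << 13)
MIN_SNR = 50.0


def passes_apogee_cuts(row):
    """Apply the four cuts to one catalog row given as a dictionary."""
    if row["ASPCAPFLAG"] & ASPCAP_STAR_BAD:
        return False  # flagged as bad overall
    if row["COMMISS"] != 0:
        return False  # commissioning data
    if row["APOGEE_TARGET1"] & COHORT_BITS:
        return False  # any of the three cohort bits is set
    return row["SNR"] > MIN_SNR


# Toy rows: only the first passes every cut.
rows = [
    {"ASPCAPFLAG": 0, "COMMISS": 0, "APOGEE_TARGET1": 0, "SNR": 120.0},
    {"ASPCAPFLAG": 1 << 23, "COMMISS": 0, "APOGEE_TARGET1": 0, "SNR": 200.0},
    {"ASPCAPFLAG": 0, "COMMISS": 0, "APOGEE_TARGET1": 1 << 12, "SNR": 90.0},
    {"ASPCAPFLAG": 0, "COMMISS": 1, "APOGEE_TARGET1": 0, "SNR": 90.0},
    {"ASPCAPFLAG": 0, "COMMISS": 0, "APOGEE_TARGET1": 0, "SNR": 35.0},
]
kept = [r for r in rows if passes_apogee_cuts(r)]
```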

Gaia-ESO iDR5 – Gilmore et al. (2012)

Starting from the internal DR5 with 82 123 stars, we applied the general quality requirements without any additional cuts. The final sample contains 36 159 stars.

GALAH DR2 – Buder et al. (2018)

GALAH DR2 published stellar parameters for 342 682 stars, including uncertainty estimates. We apply the following quality cuts (private communication with S. Buder):

  • flag_cannon equals 0, which excludes spectra that are unusual (e.g., binaries), had a problematic reduction, or lie outside the range of model validity (e.g., cool stars).

  • snr_c2 larger than 25 (in the green band; the official GALAH DR2 end-of-survey goal is 50).

  • Teff < 7 000 K, a limit set because the empirical training sample of The Cannon under-represents hotter stars. (Cool stars are already caught by the flag.)

The general and additional quality cuts leave 223 089 stars.

LAMOST DR4 – Wu et al. (2011, 2014)

The LAMOST DR4 contains parameter estimates for 4 540 986 AFGK stars. We apply the general quality requirements and the following additional cuts (private communication with C. Liu):

  • Signal-to-noise ratio in the g filter (snrg) larger than 20.

  • Estimated temperature in the range 4 000 K < Teff < 7 500 K.

We obtained 3 251 173 stars.

RAVE DR5 – Kunder et al. (2017), Kordopatis et al. (2013)

The RAVE DR5 contains parameter estimates for 520 701 stars. We applied additional quality cuts:

  • Stellar classification 1 equals d, g, h, n or o.

  • Stellar classification 2 equals d, g, h, n or o.

  • Stellar classification 3 equals d, g, h, n or o.

  • algo_conv equals 0.

  • Error of the heliocentric radial velocity is less than 10 km/s.

  • Signal-to-noise ratio in the K band is larger than 50.

  • 4 000 K < Teff < 7 750 K

  • log g > 1

  • Removal of multiple entries, keeping the one with highest signal-to-noise ratio in the K band.

These cuts leave 123 988 stars, for which we use the calibrated parameter estimates only.
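The last cut, removing multiple entries while keeping the one with the highest K-band signal-to-noise ratio, can be sketched as follows. The keys `star_id` and `snr_k` are hypothetical names chosen for illustration, not RAVE column names.

```python
def keep_highest_snr(rows, id_key="star_id", snr_key="snr_k"):
    """Keep one row per star: the one with the highest signal-to-noise ratio."""
    best = {}
    for row in rows:
        current = best.get(row[id_key])
        if current is None or row[snr_key] > current[snr_key]:
            best[row[id_key]] = row
    return list(best.values())


# Toy example: star "A" was observed twice.
rows = [
    {"star_id": "A", "snr_k": 60.0},
    {"star_id": "B", "snr_k": 75.0},
    {"star_id": "A", "snr_k": 82.0},
]
unique = keep_highest_snr(rows)
```

In this toy example, the second observation of star "A" replaces the first because its signal-to-noise ratio is higher.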

SDSS DR14 stars – Abolfathi et al. (2018)

For SDSS DR14, we queried their CAS server with the following query:

SELECT
   p.objid,
   p.ra, p.dec,
   p.u, p.g, p.r, p.i, p.z,
   p.run, p.rerun, p.camcol, p.field,
   s.specobjid,
   s.plate, s.mjd, s.fiberid,
   s.TEFFADOP,s.TEFFADOPUNC,
   s.TEFFSPEC,s.TEFFSPECUNC,
   s.LOGGADOP,s.LOGGADOPUNC,
   s.LOGGSPEC,s.LOGGSPECUNC,
   s.FEHADOP,s.FEHADOPUNC,
   s.FEHSPEC,s.FEHSPECUNC,
   s.FLAG,s.SNR,s.QA
FROM PhotoObj AS p
   JOIN sppParams AS s ON s.bestobjid = p.objid
WHERE
   p.g BETWEEN 0 AND 21
   AND (s.TEFFADOP>0.0 OR s.TEFFSPEC>0.0)

This request produced 430 164 stars. We use the “spectroscopic” parameters (not the “adopted” ones). In addition to the general quality requirements, we also required FLAG to equal nnnnn (i.e., “normal”). This resulted in 309 934 selected stars.

Raw and revised Kepler Input Catalog – Molenda-Żakowicz et al. (2010) and Mathur et al. (2017)

For the Kepler raw catalog, we downloaded the online data file kic_ct_join_12142009.txt. It contains 6 569 685 stars. This catalog has known limitations (Molenda-Żakowicz et al. 2010) and should be used with caution. Furthermore, it does not contain any uncertainty estimates on stellar parameters, so we could not apply the uncertainty requirement. Nonetheless, requiring all three stellar parameters (Teff, log g, [M/H]) resulted in 1 704 630 stars in this sample.

Mathur et al. (2017) provide revised stellar properties for 197 096 Kepler targets. After applying our general quality requirements, we obtain 185 321 stars. For any star that also appears in the raw input catalog, we replaced the raw entry with the parameters of this revised catalog.
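The replacement of raw entries by revised ones amounts to a keyed merge in which the revised catalog takes precedence. A minimal sketch, assuming both catalogs share an identifier (here the hypothetical key `kic_id`):

```python
def merge_with_precedence(raw, revised, id_key="kic_id"):
    """Merge two catalogs; revised rows overwrite raw rows with the same ID."""
    merged = {row[id_key]: row for row in raw}
    merged.update({row[id_key]: row for row in revised})
    return list(merged.values())


# Toy example: star 2 appears in both catalogs.
raw = [
    {"kic_id": 1, "teff": 5000.0, "source": "raw"},
    {"kic_id": 2, "teff": 5800.0, "source": "raw"},
]
revised = [{"kic_id": 2, "teff": 5900.0, "source": "revised"}]
merged = merge_with_precedence(raw, revised)
```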

APOGEE red-clump catalog – Bovy et al. (2014)

We downloaded the APOGEE DR14 red-clump catalog, which contains 19 937 red-clump stars that satisfy our general quality requirements.

OB stars – Ramírez-Agudelo et al. (2017) and Simón-Díaz et al. (2017)

Ramírez-Agudelo et al. (2017) publish effective temperatures, log g, radius, mass, and bolometric luminosities for 72 OB stars. Unfortunately, they do not publish estimates of [M/H]. Simón-Díaz et al. (2017) publish effective temperatures and bolometric luminosities for 382 OB stars. Unfortunately, they do not publish estimates of log g or [M/H] nor their uncertainty estimates.

Nevertheless, since OB stars with parameters are scarce, we still admit both sets of 72 and 382 OB stars into our reference catalog.

Gaia CU6 Auxiliary catalog – Soubiran et al. (2014, private communication)

We adopt the Gaia DPAC CU6 Auxiliary catalog compiled to validate Gaia RVS spectra and CU8 stellar parameter estimates. In its latest version (V240114), it contains 1 930 105 stars. We apply the general quality cuts, but we do not insist on uncertainty estimates on log g or [Fe/H]. This selection leaves us with 368 153 stars.

A.3. Crossmatch to Gaia DR2

Given the reference catalog of 6 292 410 stars compiled from the literature, we crossmatch the coordinates to Gaia DR2. Further restrictions for the crossmatch process are:

  • an apparent magnitude G ≤ 20 mag,

  • at least two transits of the source observed by Gaia,

  • a maximum crossmatch distance of 3 arcsec,

  • and, if the target source has an expected G magnitude, a magnitude difference |ΔG| < 1 mag.

Of the 6 292 410 targets in the reference catalog, we found 5 080 846 matches in Gaia DR2. Duplicate Gaia source IDs remain, however, because the same star can occur in more than one literature catalog. The final reference catalog contains 4 751 603 unique Gaia source IDs.
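The crossmatch restrictions listed above can be illustrated with a brute-force sketch. Several things here are assumptions rather than a description of the actual crossmatch: the field names are hypothetical, the nearest acceptable source is taken as the match, and a real crossmatch over millions of sources would use a spatial index instead of a loop over all candidates.

```python
import math

ARCSEC_IN_DEG = 1.0 / 3600.0


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a))))


def best_match(target, gaia_sources, max_sep_arcsec=3.0, max_g=20.0,
               min_transits=2, max_delta_g=1.0):
    """Return the nearest Gaia source satisfying all restrictions, or None."""
    best, best_sep = None, max_sep_arcsec * ARCSEC_IN_DEG
    expected_g = target.get("expected_g")
    for src in gaia_sources:
        if src["g_mag"] > max_g or src["n_transits"] < min_transits:
            continue
        if expected_g is not None and abs(src["g_mag"] - expected_g) >= max_delta_g:
            continue
        sep = angular_separation_deg(target["ra"], target["dec"],
                                     src["ra"], src["dec"])
        if sep <= best_sep:
            best, best_sep = src, sep
    return best


# Toy example: offsets are applied in declination only.
target = {"ra": 10.0, "dec": -5.0, "expected_g": 12.0}
gaia_sources = [
    {"id": "too_faint_vs_expected", "ra": 10.0, "dec": -5.0 + 1.0 * ARCSEC_IN_DEG,
     "g_mag": 15.0, "n_transits": 10},
    {"id": "good", "ra": 10.0, "dec": -5.0 + 2.0 * ARCSEC_IN_DEG,
     "g_mag": 12.3, "n_transits": 10},
    {"id": "too_far", "ra": 10.0, "dec": -5.0 + 5.0 * ARCSEC_IN_DEG,
     "g_mag": 12.0, "n_transits": 10},
    {"id": "too_few_transits", "ra": 10.0, "dec": -5.0 + 0.5 * ARCSEC_IN_DEG,
     "g_mag": 12.0, "n_transits": 1},
]
match = best_match(target, gaia_sources)
```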

Appendix B: Additional figures

This appendix shows additional figures. Figures B.1 and B.2 are the equivalents of Figs. 3 and 4, respectively, for the 123 076 271 stars of the whole catalog. Figure B.3 presents the residuals between the observed and inferred CMDs (top panels of Fig. B.1) for the same stars. Figures B.4 and B.5 show zoomed-in versions of the regions highlighted in Fig. 9.

Fig. B.1.

Overview of our analysis procedure on the whole catalog, which contains 123 076 271 stars. The top panels present observed CMDs, with the left and right panels showing the input data and their median predictions, respectively. The lower panels show the inverse parallax distance-corrected CMD and the distance corrections obtained from the AP estimates. The quantities on the y axes of these two panels would be identical in the absence of parallax noise.

Fig. B.2.

Inferred CAMD before (left) and after (right) accounting for the dust extinction for the entire catalog.

Fig. B.3.

Residuals between the observed and inferred CMDs for the 123 076 271 stars of the whole catalog (top panels of Fig. B.1). The residuals are calculated on a 512 × 512 binning scheme (0.01 × 0.03 mag) within the axis ranges and normalized to the predicted counts, i.e., (obs − pred)/pred: red and blue colors indicate overestimated and underestimated counts, respectively. Systematics are discussed in Sect. 4. Stars bluer than what the stellar evolution model predicts pile up in the residuals at BP-RP ∼ 1 mag.
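As a sketch of how such a residual map can be computed, the example below bins two point sets on a common grid and forms the normalized residual, read here as (obs − pred)/pred per bin. The grid size, axis ranges, and the choice to leave bins without predicted counts undefined are assumptions made for illustration only.

```python
def bin_counts(points, x_range, y_range, n_bins):
    """Count (x, y) points on an n_bins x n_bins grid; counts[row][col]."""
    counts = [[0] * n_bins for _ in range(n_bins)]
    (x0, x1), (y0, y1) = x_range, y_range
    for x, y in points:
        if not (x0 <= x < x1 and y0 <= y < y1):
            continue  # ignore points outside the axis ranges
        col = int((x - x0) / (x1 - x0) * n_bins)
        row = int((y - y0) / (y1 - y0) * n_bins)
        counts[row][col] += 1
    return counts


def normalized_residuals(observed, predicted):
    """Per-bin (obs - pred) / pred; None where no counts are predicted."""
    return [
        [(o - p) / p if p > 0 else None for o, p in zip(obs_row, pred_row)]
        for obs_row, pred_row in zip(observed, predicted)
    ]


# Toy example on a 2 x 2 grid (color on x, magnitude on y).
obs_points = [(0.25, 0.25), (0.25, 0.25), (0.75, 0.75)]
pred_points = [(0.25, 0.25), (0.75, 0.75), (0.75, 0.75)]
obs = bin_counts(obs_points, (0.0, 1.0), (0.0, 1.0), 2)
pred = bin_counts(pred_points, (0.0, 1.0), (0.0, 1.0), 2)
residuals = normalized_residuals(obs, pred)
```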

Fig. B.4.

Sky distribution in Galactic coordinates (averaged over all distances) of the dust extinction parameters A0 (left) and R0 (right) of the regions highlighted in Fig. 9. The region’s names are indicated in the top-right corner of each panel. Continued in Fig. B.5.

Fig. B.5.

Continuation of Fig. B.4.

