A&A
Volume 659, March 2022
Article Number A95
Number of page(s) 24
Section Catalogs and data
DOI https://doi.org/10.1051/0004-6361/202141702
Published online 11 March 2022

© ESO 2022

1. Introduction

Over the last decades, Galactic astronomy has undergone a revolution, mainly thanks to the large data sets from surveys monitoring our Galaxy. In particular, spectroscopic data have grown exponentially, with sample sizes increasing from hundreds of stars to several hundred thousand, observed principally with multi-object spectrographs. Dedicated Galactic spectroscopic surveys have used this technology to deliver a plethora of results on the structure and evolution of the Milky Way. The main ones are the Radial Velocity Experiment (RAVE, Steinmetz et al. 2006), the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST, Zhao et al. 2012), the Gaia-ESO Survey (GES, Gilmore et al. 2012; Randich et al. 2013), the GALactic Archaeology with HERMES (GALAH, De Silva et al. 2015), and the Apache Point Observatory Galactic Evolution Experiment (APOGEE, Majewski et al. 2017), among others. Among these surveys, the Gaia space mission (Gaia Collaboration 2016) holds a unique place because it provides high-precision astrometric, photometric, and spectroscopic data for an unprecedented number of sources: 1.3 billion, 1.6 billion, and 7.2 million sources, respectively.

To fully exploit the wealth of data delivered by these surveys, such as radial velocities (RVs), stellar atmospheric parameters, and chemical abundances, it is essential to understand their properties in detail, as well as their capabilities and limitations. Moreover, each survey focusses on different parts of the Galaxy or regions of parameter space and uses different instruments and data analysis techniques, each with its own strengths and weaknesses. A homogenisation process is therefore necessary when combining data sets from various sources to reveal subtle physical phenomena in our Galaxy, for instance the uniform chemical and kinematic properties of star clusters. As standard procedure, each survey team compares its output parameters with benchmark samples to understand systematic effects, assess random errors, and evaluate analysis methods in different resolution regimes, as also highlighted in other works (e.g. Lee et al. 2015; Xiang et al. 2015; Deepak & Reddy 2018; Anguiano et al. 2018). However, only a few studies have employed homogenisation techniques to merge different literature samples into comprehensive catalogues, and they mostly focus on high-resolution spectroscopy (e.g. Hinkel et al. 2014; Soubiran et al. 2016).

In this proof-of-concept study, we aim to combine the radial velocity measurements from the six large surveys mentioned above and merge them into a homogeneous catalogue, the Survey of Surveys (SoS). The SoS will be a valuable tool for the community for data-mining studies of our Galaxy and its satellites. In a future study, we will also add atmospheric parameters (effective temperature, Teff; surface gravity, log g; metallicity, [M/H]) and chemical abundances to the SoS. Precise and accurate RVs play an important role in understanding the structure of the Galaxy (e.g. Binney et al. 2000). They are also key to detecting substructures in the halo and disc, and thus unravelling the merger history of the Galaxy (e.g. Helmi 2020). Finally, they are needed to study the kinematics and membership of star clusters (e.g. Kharchenko et al. 2013), the orbits of globular clusters (e.g. Balbinot & Gieles 2018), and the orbital parameters and properties of binaries (e.g. Hełminiak et al. 2017) and planetary systems (e.g. Trifonov et al. 2020).

To achieve our goal, we first place all surveys on a common (albeit perhaps arbitrary) reference system, defined by the RV zero point (ZP). We then remove from each survey any trends with the parameters on which the RVs depend, such as magnitude, Teff, log g, iron metallicity ([Fe/H]), or signal-to-noise ratio (S/N). Since no survey is completely free from bias, comparing the RVs of each survey with those of the others reveals its intrinsic biases. Using Gaia as the reference system is advantageous because, thanks to its sheer size and all-sky coverage, it is the only survey with a significant number of stars in common with each of the ground-based surveys (see Fig. 1).

Fig. 1.

Surface density distribution, in a Mollweide projection of Galactic coordinates, of the six surveys used in this work, obtained with a HEALPix (Hierarchical Equal Area isoLatitude Pixelisation) tessellation at different resolutions. The colour scales of the maps are logarithmic and differ between surveys.

In Sect. 2, we describe the data used in this work, and in Sect. 3 the cross-match of the surveys with Gaia. In Sect. 4, we explore the treatment of the RV errors using repeated measurements and the three-cornered hat method. In Sect. 5, we perform the internal RV homogenisation. We present our unified catalogue in Sect. 6, together with comparisons with external samples to validate the absolute calibration. Finally, in Sect. 7, we present the science validation of our results with open clusters.

2. Data sources

The core of the SoS is built on five ground-based spectroscopic surveys and Gaia. Each survey targets different stellar types, but all of them share a common goal: to map the kinematics and chemistry of stars across the Galaxy in order to reconstruct its present-day structure and past evolutionary history. In SoS I, we use the data releases (DR) of the surveys listed in Table 1, with their corresponding numbers of stars. Figures 1 and 2 show the surface density and magnitude distributions of the data sets. In Fig. 3, we present the Hertzsprung-Russell diagram (HRD) of each of the six surveys, for stars with available Gaia photometry, to showcase the stellar populations each survey targets. These diagrams use the same filters as those adopted for the Gaia DR2 HRD by Gaia Collaboration (2018a), given in Eq. (1) below. They also include the extinction and reddening values from Gaia DR2.

Fig. 2.

G magnitude distribution of the surveys in this work for stars in common with Gaia.

Fig. 3.

HR diagrams of the surveys used in this work, built from Gaia photometry and parallaxes and colour-coded by stellar density on a logarithmic scale.

Table 1.

Number of objects included per survey.

2.1. Gaia DR2

Apart from astrometric parameters (e.g. positions, parallaxes), Gaia DR2 (Gaia Collaboration 2018b) provides median radial velocities for 7.2 million stars with effective temperatures between 3550 and 6900 K. The spectra were collected by the on-board Radial Velocity Spectrometer (RVS) at a resolution R = λ/Δλ ≃ 11 500 over the wavelength range 8450−8720 Å, centred on the calcium triplet. Each star is observed many times, and an RV is measured at each transit. This is done by a series of pipeline modules that compare the three CCD spectra of each field-of-view transit with synthetic template spectra (Sartoretti et al. 2018). DR2 reports the median of the RVs from the multiple transits. The properties and validation of the Gaia RVs are described in detail in Katz et al. (2019). They report that the Gaia RV differences with respect to the ground-based surveys do not exceed 0.25−0.30 km s−1, and that the Gaia RVs show a positive trend with magnitude (see Sect. 5). The overall precision estimated from the Gaia RV uncertainties is 1.05 km s−1 and depends mainly on effective temperature and magnitude. The targets with available RVs cover the G magnitude range 4−13 (see Fig. 2). Gaia DR3, to be published in 2022, will include updated radial velocities together with atmospheric parameters. Throughout this paper, we use the Gaia DR2 source IDs. These should be treated independently from the DR3 IDs, even though the IDs change for only 2−3% of the stars (Torra et al. 2021).

For the RV calibrations in Sect. 5, we apply the quality criteria suggested by Evans et al. (2018) and Arenou et al. (2018). These criteria are based on Gaia photometry and astrometry and mitigate both astrometric calibration problems and contamination from double stars:

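$$1.0 + 0.015\,(G_{\rm BP} - G_{\rm RP})^2 < E < 1.3 + 0.06\,(G_{\rm BP} - G_{\rm RP})^2, \qquad \sqrt{\chi^2/(\nu - 5)} < 1.2\,\max\left[1, \exp\left(-0.2\,(G - 19.5)\right)\right], \qquad (1)$$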

where GBP is the magnitude from the Gaia Blue Photometer (BP), GRP the magnitude from the Gaia Red Photometer (RP), G the Gaia G magnitude, E the phot_bp_rp_excess_factor, χ² the astrometric goodness-of-fit, and ν the number of good observations.
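For illustration, the cuts can be applied to a single source as follows. This is a minimal sketch that assumes Eq. (1) takes the form used by Gaia Collaboration (2018a) for the DR2 HRD. That form has two parts: a colour-dependent band 1.0 + 0.015 (GBP − GRP)² < E < 1.3 + 0.06 (GBP − GRP)² on the excess factor, and an upper limit of 1.2 max[1, exp(−0.2 (G − 19.5))] on the unit-weight error sqrt(χ²/(ν − 5)). The function and argument names are hypothetical.

```python
import math

def passes_gaia_quality_cuts(g, bp_rp, excess_factor, chi2_al, n_good_obs_al):
    """Return True if a Gaia DR2 source passes the (assumed) HRD quality cuts.

    g: G magnitude; bp_rp: G_BP - G_RP colour;
    excess_factor: phot_bp_rp_excess_factor (E);
    chi2_al: astrometric goodness-of-fit chi^2; n_good_obs_al: good observations (nu).
    """
    # Photometric cut: E must lie inside a colour-dependent band.
    lower = 1.0 + 0.015 * bp_rp ** 2
    upper = 1.3 + 0.06 * bp_rp ** 2
    photometric_ok = lower < excess_factor < upper

    # Astrometric cut on the unit-weight error u = sqrt(chi2 / (nu - 5)).
    if n_good_obs_al <= 5:
        return False  # u is undefined with five or fewer good observations
    u = math.sqrt(chi2_al / (n_good_obs_al - 5))
    astrometric_ok = u < 1.2 * max(1.0, math.exp(-0.2 * (g - 19.5)))

    return photometric_ok and astrometric_ok
```

In this form, bright sources get a looser astrometric limit, because the exponential term exceeds one for G < 19.5.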

We further flag Gaia sources whose RVs may be spurious because of contamination by nearby objects, as identified by Boubert et al. (2019) (see Sect. 6 for the flags). These stars, about 1% of the sample, receive the worst quality flag. Gaia DR2 does not contain duplicate sources.

2.2. APOGEE DR16

APOGEE and its successor APOGEE-2 form a high-resolution (R ≃ 22 500), high signal-to-noise (S/N > 100 per half resolution element) spectroscopic survey. It uses the 2.5 m Sloan telescope in the northern hemisphere and the du Pont telescope at Las Campanas Observatory in the southern hemisphere. The survey operates in the near-infrared H band and mainly targets red giants.

In this work, we use APOGEE DR16, the first release to include data from APOGEE-2 and therefore from across the entire sky (Ahumada et al. 2020). The spectroscopic analysis of the 473 307 spectra of 437 303 unique stars, observed with different telescope configurations, is described in Jönsson et al. (2020). APOGEE computes RVs in two main ways. The first (i) iteratively cross-correlates each visit spectrum against the combined observed spectrum of the star, which is possible because most stars are observed multiple times. The second (ii) cross-correlates against a best-matching synthetic template. The final RV (VHELIO_AVG in DR16) is taken from whichever method yields the smaller scatter among the individual visit RVs.

We apply the following filters to the APOGEE data to exclude stars from the calibration routines in Sect. 5. The excluded stars are nevertheless calibrated later and assigned lower quality flags. We exclude stars with significant differences between the RVs derived from the synthetic template and from the combined template (SUSPECT_RV_COMBINATION in STARFLAG) and assign them the lowest RV quality flag. We also exclude spectra with low S/N (< 5) or erroneous stellar parameters (LOW_SNR in STARFLAG, STAR_BAD in ASPCAPFLAG) and assign them an intermediate RV quality flag. After this filtering, about 88% of the measurements retain the highest RV quality flag.

2.3. GALAH DR2

The GALAH survey uses the High Efficiency and Resolution Multi-Element Spectrograph (HERMES; Sheinis et al. 2015) at the Anglo-Australian Telescope. It observes at high resolution (R ≃ 28 000) in four discrete optical wavelength channels: 4713−4903 Å, 5648−5873 Å, 6478−6737 Å, and 7585−7887 Å. GALAH is an ongoing spectroscopic survey aiming to observe one million stars with V magnitudes of 12−14 across the southern sky. The DR2 data products include RVs, stellar atmospheric parameters, and chemical abundances for 32 elements (Buder et al. 2018).

The RVs of GALAH DR2 were derived for 342 682 stars following the methodology of Zwitter et al. (2018). Stacked median observed spectra from multiple visits were used to build a reference library already shifted to rest wavelength. The RV of each star was then obtained by cross-correlation with this library. It was further refined by comparison with synthetic spectra from models that account for three-dimensional convective motions in stellar atmospheres. GALAH DR2 does not contain duplicate entries. The best-quality stellar parameters come from the survey's spectral analysis pipeline, CANNON (Buder et al. 2018; flag_cannon = 0). We use these stars, 78% of the sample, for the RV calibrations in Sect. 5. This cleaner sample is used only for calibration purposes. We provide calibrated RVs for the whole sample, with a lower RV quality flag for the remaining stars (see Sect. 6). The typical accuracy of the GALAH RVs is estimated at 0.1 km s−1 (Zwitter et al. 2018). The third GALAH release has recently been published (Buder et al. 2021). It includes observations from the K2 and TESS follow-up programmes and other ancillary observations, increasing the sample by 30%.

2.4. Gaia-ESO DR3

The Gaia-ESO survey is a large public survey (Gilmore et al. 2012; Randich et al. 2013) carried out at the ESO Very Large Telescope (UT-2 Kueyen) with the FLAMES multi-object instrument (Pasquini et al. 2002). The survey has obtained high-quality spectra with the GIRAFFE spectrograph, in different wavelength ranges depending on the setting used (R ≃ 16 000−25 000). The brighter stars, about 10% of the total sample, were observed with UVES (R ≃ 45 000). The GES targets span a wide range of properties, from dwarfs to giants and from O to M stars. The survey focuses on relatively faint stars (mainly V > 16 mag), for which Gaia will not be able to provide accurate RVs and abundances.

The RVs from GIRAFFE spectra are derived in two steps. An initial estimate comes from cross-correlation with a grid of synthetic spectra, and it is refined by direct spectral fitting with a polynomial (Koposov et al. 2011). The RVs from UVES, on the other hand, are derived by standard cross-correlation against a grid of synthetic template spectra covering a range of temperatures, metallicities, and gravities (Sacco et al. 2014).

Jackson et al. (2015) studied repeated RV measurements of the same stars. They found that the best RV precision reached for most GES GIRAFFE spectra is 0.22−0.26 km s−1, for stars with low rotational broadening and high S/N, and that it depends on the instrumental configuration. The typical RV error of the UVES spectra is 0.40 km s−1 (Sacco et al. 2014). GES DR3 provides RVs for 25 533 stars, and we apply no further quality filters to this survey. We obtained GES DR3 from the public ESO archive.

2.5. RAVE DR6

We use the final data release of RAVE (Steinmetz et al. 2020a), a magnitude-limited (9 < I < 12) survey of the southern hemisphere. RAVE used the 6dF multi-object spectrograph at the UK Schmidt Telescope from 2003 to 2013. Its medium-resolution spectra (R ≃ 7500) cover the calcium triplet region (8410−8795 Å), a wavelength range very similar to that of the Gaia RVS.

Among other products, RAVE DR6 provides radial velocities, stellar atmospheric parameters, chemical abundances, and cross-matches with other relevant catalogues for 518 387 observations of 451 783 stars (Steinmetz et al. 2020a,b). The RVs are derived with the SPARV pipeline. SPARV matches the observations to a grid of synthetic spectra with a standard cross-correlation algorithm in a two-step process (see Zwitter et al. 2008 for details). The internal RV error distribution peaks at around 1.0 km s−1.

Steinmetz et al. (2020a) suggest a set of quality criteria to select a clean sample of radial velocities, based on three quantities: (i) the zero-point correction applied to the radial velocity (|correctionRV| < 10 km s−1), (ii) the RV error (hrv_error_sparv < 8 km s−1), and (iii) the Tonry–Davis correlation coefficient (correlationCoeff > 10). These criteria define the core sample of the survey (Steinmetz et al. 2020a). The filtering retains 69% of the RV measurements, and the remainder are assigned the lowest RV quality flag.
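The three RAVE criteria translate directly into a per-entry test. In this sketch, the argument names mirror the catalogue columns quoted above (correctionRV, hrv_error_sparv, correlationCoeff), while the function name is hypothetical. Entries failing the test would receive the lowest RV quality flag.

```python
def in_rave_core_sample(correction_rv, hrv_error_sparv, correlation_coeff):
    """Apply the Steinmetz et al. (2020a) clean-RV criteria to one RAVE entry."""
    return (
        abs(correction_rv) < 10.0       # zero-point correction, km/s
        and hrv_error_sparv < 8.0       # SPARV RV error, km/s
        and correlation_coeff > 10.0    # Tonry-Davis correlation coefficient
    )
```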

2.6. LAMOST DR5

LAMOST is a national scientific research facility operated by the Chinese Academy of Sciences. It observes the northern hemisphere at low resolution (R ≃ 1800) in the optical wavelength range 3650−9000 Å. The LAMOST Experiment for Galactic Understanding and Exploration (LEGUE) is an ongoing Galactic survey with a current sample of more than five million stellar spectra (Deng et al. 2012). It is the largest survey considered here besides Gaia. LAMOST DR5 includes around 9 million objects, of which 5 348 712 are stars with published parameters: 85 845 A-type, 1 694 182 F-type, 2 739 467 G-type, and 829 218 K-type stars.

The LAMOST pipeline (Luo et al. 2015) measures the RVs using the cross-correlation method. The pipeline recognises the stellar spectral classes and simultaneously determines the RVs from the best-fit correlation function between the observed spectra and the template. The RV error distribution of LAMOST peaks around 5 km s−1.

For LAMOST, we adopt the quality cuts recommended by Luo et al. (2015). In particular, we select spectra with S/N > 15, above which the survey considers the parameters reliable, and exclude stars with negative RV errors. The 8% of the total sample that does not satisfy these criteria is assigned the lowest RV quality flag.

3. The cross match with Gaia

An essential part of this work is identifying the Gaia counterparts of the stars in the spectroscopic surveys. The algorithms for the cross-match (XM) of the Gaia DR2 astrometric data with external catalogues are described in Marrese et al. (2017, 2019). These algorithms are positional. They exploit, on an object-by-object basis, the enormous number of Gaia sources with accurate positions, proper motions, and parallaxes. The XM is not only a source-to-source problem but also a local one. The neighbourhood of each possible match is investigated by assigning probabilities to all neighbours, based on their angular distances and on the local surface density of the external catalogue. The best neighbour is then chosen among them.

The XM algorithm differs slightly between large dense surveys and sparse catalogues. The ground-based spectroscopic surveys in this work are sparse, so the XM treats them as the leading catalogues and Gaia as the second. In this case, each object in the leading catalogue is matched with all nearby objects in the second catalogue whose positions are compatible within the position errors. These nearby objects are defined as neighbours. When a single neighbour is found, it is the best neighbour, that is, the counterpart (case 1 in Fig. 4). When more than one neighbour is found, the best neighbour is selected according to a figure of merit (case 3 in Fig. 4).
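To make the neighbour logic concrete, a simplified sketch follows. It is not the Marrese et al. (2017, 2019) algorithm. We assume a small-angle separation, a search radius of n_sigma times the combined positional error, and a figure of merit equal to the error-normalised separation. The sketch ignores proper motions, parallaxes, epoch propagation, and local surface density. All names (find_neighbours, best_neighbour, the dict keys) are hypothetical.

```python
import math

def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Small-angle separation in arcsec between two positions given in degrees."""
    dra = (ra1 - ra2) * math.cos(math.radians(0.5 * (dec1 + dec2)))
    ddec = dec1 - dec2
    return 3600.0 * math.hypot(dra, ddec)

def find_neighbours(leading, gaia_sources, n_sigma=3.0):
    """Gaia sources whose positions are compatible with a leading-catalogue source.

    Each source is a dict with 'ra' and 'dec' in degrees and 'pos_err' in arcsec.
    Returns (normalised separation, source) pairs.
    """
    neighbours = []
    for src in gaia_sources:
        sep = angular_sep_arcsec(leading["ra"], leading["dec"], src["ra"], src["dec"])
        sigma = math.hypot(leading["pos_err"], src["pos_err"])  # combined error
        if sep <= n_sigma * sigma:
            neighbours.append((sep / sigma, src))
    return neighbours

def best_neighbour(leading, gaia_sources, n_sigma=3.0):
    """Neighbour with the smallest normalised separation (simplified figure of merit)."""
    neighbours = find_neighbours(leading, gaia_sources, n_sigma)
    if not neighbours:
        return None  # no counterpart within the search radius
    return min(neighbours, key=lambda item: item[0])[1]
```

A single returned neighbour corresponds to case 1 in Fig. 4, several to case 3, and None to a source without a Gaia counterpart.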

Fig. 4.

Sketch of four possible scenarios for the XM algorithm. Case 1: one-to-one match for isolated sources. Case 2: two stars have the same match, which in most cases indicates a duplicated source. Case 3: the best match is selected from a neighbourhood. Case 4: wrong match, possibly because the right one is missing.

If two or more objects from the leading catalogue are matched to the same object in the second, these sources are referred to as mates. For sparse catalogues such as the spectroscopic ones in this work, mates are not allowed, because a one-to-many match is forced given Gaia's higher spatial resolution. Blends are therefore not expected. This differs from dense catalogues, for which Gaia, with its higher resolution, is the leading catalogue. Mates in sparse catalogues are usually duplicate sources from repeated measurements and carry the same identification in the leading catalogue (case 2 in Fig. 4). APOGEE, LAMOST, and RAVE report duplicates, whereas GALAH and GES provide only unique sources. There are cases, however, in which two or more objects have the same Gaia match but are not identified as the same source in the original catalogue. Either the survey mislabelled one of the sources and they are actually duplicates (case 2 in Fig. 4), or a source is missing in Gaia and one of the objects has no counterpart (case 4 in Fig. 4). Nevertheless, we expect most of the mates found to be mislabelled duplicates.

We examine the cases where the XM has found mates and investigate whether these sources are in fact duplicates or problematic matches (case 2 in Fig. 4). The XM considers five astrometric parameters: right ascension, declination, parallax, and the two proper-motion components. Here we use additional information, namely magnitudes and radial velocities, to classify the possible duplicates. For each pair of possible duplicates, we calculate the differences in magnitude and in radial velocity. We then use the Mahalanobis distance (MD, Mahalanobis 1936) of these parameters as a metric for identifying outliers. The MD generalises the one-dimensional distance in units of standard deviation to several dimensions, accounting for the covariance between the parameters. It is defined as:

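$$\mathrm{MD}(\mathbf{X}_i) = \sqrt{(\mathbf{X}_i - \bar{\mathbf{X}})^{\rm T}\,\mathbf{C}^{-1}\,(\mathbf{X}_i - \bar{\mathbf{X}})}, \qquad (2)$$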

where $\mathbf{X}_i$ is the vector of paired differences in magnitude and in radial velocity for pair i, $\bar{\mathbf{X}}$ the vector of the mean values of these parameters, and $\mathbf{C}$ their covariance matrix. If the MD of a pair is smaller than a threshold, the mates are the same star, that is, a duplicate. Outliers, on the other hand, indicate a possible mismatch and are flagged. $\mathrm{MD}^2$ is known to follow a chi-squared distribution, $\chi^2_n$, with n degrees of freedom; here n = 2 (Gnanadesikan & Kettenring 1972). The adopted rule for identifying outliers is therefore based on the q% quantile of this distribution, $\chi^2_{n,q}$. Observations are flagged as outliers if $\mathrm{MD}^2 > \chi^2_{n,q}$.
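A minimal pure-Python sketch of the duplicate test for one survey is given below. It assumes that the mean and the sample covariance are computed from all mate pairs of that survey. It also uses the fact that χ² with two degrees of freedom has the cumulative distribution 1 − exp(−x/2). Its q quantile is therefore −2 ln(1 − q), which is about 7.38 for q = 0.975. The function names and the 'duplicate' label are hypothetical; 'mismatch' follows the flag_xm convention of the text.

```python
import math

def mahalanobis_sq(points):
    """Squared Mahalanobis distance of each (delta_mag, delta_rv) pair from the mean."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Sample covariance matrix C = [[sxx, sxy], [sxy, syy]].
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    # Elements of C^-1 for a symmetric 2x2 matrix.
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det
    out = []
    for x, y in points:
        dx, dy = x - mx, y - my
        out.append(ixx * dx * dx + 2.0 * ixy * dx * dy + iyy * dy * dy)
    return out

def chi2_quantile_2dof(q):
    """q quantile of chi^2 with 2 dof, from its CDF 1 - exp(-x/2)."""
    return -2.0 * math.log(1.0 - q)

def classify_mates(points, q=0.975):
    """Label each mate pair 'duplicate' or 'mismatch' (MD^2 above the chi^2 quantile)."""
    threshold = chi2_quantile_2dof(q)
    return ["mismatch" if d2 > threshold else "duplicate" for d2 in mahalanobis_sq(points)]
```

The plain sample covariance is inflated by the mismatches themselves. A robust covariance estimate would reduce this effect, but it is omitted here for simplicity.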

We confirm that most of the mates identified by the XM are indeed duplicates (see Table 2). We update their identification in each survey so that they are treated as duplicates in the rest of the analysis, and we keep the initial identification in a separate column for completeness. We tested three cut-off thresholds: the 99% quantile, the 97.5% quantile, and 3σ (standard deviations) as a more conservative cut-off. Even in the most conservative case, 82−97% of the mates appear to be duplicates in all surveys. The problematic matches are flagged as mismatches in a separate column (flag_xm). Stars for which the MD cannot be computed, because either the magnitude or the radial velocity is missing, are flagged as possible_mismatch. For this analysis, we selected the 97.5% cut-off, which lies in the middle of the three. The numbers of true duplicates and problematic matches are given in Table 2.

Table 2.

Results from the analysis of the duplicated sources.

As mentioned above, the spectroscopic catalogues are the leading ones during the XM and Gaia is the second, and a one-to-many match is forced. Because this process evaluates the environment, the XM defines neighbours for each object in the leading catalogue. These are the nearby Gaia objects with positions compatible within the errors, and they are provided in a separate table for checking purposes. When more than one neighbour is found, the best neighbour is chosen among them; this is the most probable counterpart according to a figure of merit. Although this process is flawless in most cases, the selection of the best match can occasionally be problematic. For instance, in cluster regions the counterpart may be missing from Gaia, and another star close enough may be matched instead (case 4 in Fig. 4).

We therefore further evaluate the best matches selected by the XM algorithm, using parameters other than positions. We divide each catalogue into two categories: (i) stars with only one Gaia match, that is, with no neighbours (case 1 in Fig. 4), and (ii) stars whose Gaia match is selected from a neighbourhood of possible matches (case 3 in Fig. 4).

3.1. Stars with no neighbourhood

First, we analyse stars assigned a unique match in Gaia. This category comprises the majority of the stars in each catalogue (95% for APOGEE, 98% for GALAH, 93% for GES, and 97% for LAMOST). We use photometry, namely the magnitudes, to verify the XM selection. We convert the magnitudes of each survey to the Gaia G system, using the conversion functions of Evans et al. (2018) within their range of applicability.

If the match is correct, the G magnitude of the Gaia DR2 source should agree within 3σ with the converted G magnitude of the source in the leading catalogue. Figure 5 shows the difference in G magnitude between the Gaia best match and the survey object. We compute σ per magnitude bin so that the wings of the distribution are not excluded. We assume that the stars outside ±3σ (black lines in Fig. 5) are outliers due to mismatches. Some of the differences are large, typically larger than expected from the photometric catalogues from which the magnitudes are taken. Effects other than cross-match failures could play a role here, so these identifications should be treated with caution. In particular, we compare with G magnitudes obtained by photometric transformation of the spectroscopic catalogue magnitudes. The precision of this transformation is set by its σ of 0.369 mag (Evans et al. 2018). We therefore expect some stars, such as variable stars or stars with large photometric errors, to show large differences.
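The magnitude-dependent 3σ test can be sketched as follows. We assume ten equal-width bins in Gaia G (as in Fig. 5), with σ equal to the standard deviation of the G differences in each bin and limits centred on zero. The Evans et al. (2018) conversion to G is taken as already applied, and the function name is hypothetical.

```python
import statistics

def flag_mag_outliers(g_gaia, g_converted, n_bins=10, n_sigma=3.0):
    """Flag stars whose G difference lies outside +/- n_sigma * sigma of its bin.

    g_gaia: G magnitudes of the Gaia best matches.
    g_converted: survey magnitudes already converted to G (conversion not shown).
    """
    diffs = [a - b for a, b in zip(g_gaia, g_converted)]
    lo, hi = min(g_gaia), max(g_gaia)
    width = (hi - lo) / n_bins or 1.0  # guard against a single-valued sample
    # Equal-width bins in Gaia G; the brightest star falls in bin 0.
    bins = [min(int((g - lo) / width), n_bins - 1) for g in g_gaia]
    flags = [False] * len(diffs)
    for b in range(n_bins):
        idx = [i for i, k in enumerate(bins) if k == b]
        if len(idx) < 2:
            continue  # at least two stars are needed for a dispersion
        sigma = statistics.stdev([diffs[i] for i in idx])
        for i in idx:
            flags[i] = abs(diffs[i]) > n_sigma * sigma
    return flags
```

Computing σ per bin lets the limits widen at faint magnitudes, where the photometric scatter grows, instead of flagging the whole faint tail.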

Fig. 5.

G magnitude difference between the Gaia match and the surveys APOGEE, GALAH, GES, LAMOST, and RAVE, respectively. The plots are colour-coded by stellar number density. The horizontal black lines indicate the ±3σ threshold for outliers per magnitude bin (10 bins in total).

Nevertheless, the fraction of possible mismatches among the stars with available photometry is very small: 0.9% for APOGEE, 0.5% for GALAH, 0.8% for GES, 1.4% for LAMOST, and 0.6% for RAVE. We flag these stars as mismatches (flag_xm = mismatch).

3.2. Stars with neighbourhood

Fewer than 7% of the stars have multiple neighbours in Gaia. These occur mainly in crowded fields, such as the bulge or stellar clusters. We investigate whether the best match selected among the neighbours is indeed the correct one, or whether another neighbour in Gaia would be better suited. We calculate the G magnitudes of the sources in the leading catalogues using the same conversions as before. If the G-magnitude difference between the survey star and a Gaia neighbour is smaller than that of the best match, the neighbour is considered the best match instead. In these cases, we flag the star as a mismatch (flag_xm = mismatch). The fraction of possible mismatches among stars with a neighbourhood is also small: 4.1% for APOGEE, 4.9% for GALAH, 4.6% for GES, and 4.9% for LAMOST. These tests confirm the robustness of the XM, and we are confident that most of our stars are correctly matched to Gaia DR2.
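The magnitude-based re-selection among neighbours can be written as a short function. This is a sketch with a hypothetical data layout (dicts with a 'g' key) and a hypothetical function name. It picks the neighbour closest in G, which gives the same flagging outcome as testing each neighbour against the original best match: the flag is raised whenever any neighbour is closer.

```python
def reselect_by_magnitude(g_survey, best_match, neighbours):
    """Re-examine an XM best match using G magnitudes.

    g_survey: survey magnitude converted to G.
    best_match, neighbours: dicts with a 'g' key (hypothetical layout).
    Returns the chosen Gaia source and 'mismatch' if it changed, else None.
    """
    chosen = best_match
    for nb in neighbours:
        # A neighbour wins only if it is strictly closer in G than the current choice.
        if abs(nb["g"] - g_survey) < abs(chosen["g"] - g_survey):
            chosen = nb
    flag_xm = "mismatch" if chosen is not best_match else None
    return chosen, flag_xm
```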

4. The RV error analysis

In this section, we normalise the internal RV errors of each survey to make them homogeneous across surveys. Homogenised errors allow us to combine the surveys with weighted averages and derive final RVs for stars observed in more than one survey. This step is necessary because each survey has heteroscedastic errors, which should reflect its precision and accuracy. For surveys with duplicated sources (APOGEE, RAVE, LAMOST), we use the repeated RV measurements to evaluate the internal errors. For surveys with few or no duplicates (Gaia, GALAH, GES), we use the three-cornered hat method. Homogenising the errors is not an easy task, because we have to account for both the systematic and the random errors of each survey. We focus here on the random component, since it can be treated statistically, unlike the systematic one.
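To illustrate how normalised errors enter the combination, the sketch below uses a standard inverse-variance weighted mean. Each formal error is first multiplied by its survey's normalisation factor. The weighting scheme is an assumption for illustration and not necessarily the exact one adopted for the SoS catalogue; the function name is hypothetical.

```python
import math

def combine_rvs(measurements):
    """Inverse-variance weighted mean RV and its error.

    measurements: list of (rv, formal_error, norm_factor) tuples, in km/s.
    Each formal error is multiplied by its survey's normalisation factor.
    """
    weights = [1.0 / (err * factor) ** 2 for _, err, factor in measurements]
    total = sum(weights)
    mean_rv = sum(w * rv for w, (rv, _, _) in zip(weights, measurements)) / total
    return mean_rv, 1.0 / math.sqrt(total)
```

For example, consider two RVs of 10.0 and 12.0 km s−1, each with a formal error of 1.0 km s−1, where the second survey has a normalisation factor of 2.0. They combine to 10.4 km s−1 with an error of about 0.89 km s−1, because the inflated error reduces the weight of the second measurement by a factor of four.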

4.1. Error normalisation from repeated measurements

The analysis of the repeated RV measurements gives insight into the precision of each survey. We have a sufficient number of duplicates from APOGEE, RAVE, and LAMOST. For these surveys, we calculate the paired differences of the repeated measurements and give the corresponding statistics in Table 3. GES has only 62 multiple observations, and GALAH has none. These two surveys are treated differently in the next section, but we also include GES here for comparison. The results in Table 3 are obtained after the basic filtering described in Sect. 2, which mainly affects APOGEE and RAVE. APOGEE and GES exhibit the smallest median absolute deviation (MAD), and therefore the highest precision. This is expected, as these surveys operate at higher resolution than RAVE and LAMOST. Indeed, the MAD decreases with increasing spectral resolution.

Table 3.

Statistics for the paired RV differences from the repeated measurements.

If we assume that each observed RV is a random measurement, it follows a Gaussian distribution centred on the true RV, with a dispersion given by the RV uncertainty. Consider the difference between two independent repeated measurements, ΔRV = RV1 − RV2, with respective errors σRV1 and σRV2. It follows a Gaussian distribution centred on zero with dispersion $\sigma_{\Delta{\rm RV}} = \sqrt{\sigma_{\rm RV1}^2 + \sigma_{\rm RV2}^2}$. If the RVs and their uncertainties are well determined, ΔRV normalised by $\sigma_{\Delta{\rm RV}}$ should follow a Gaussian with zero mean and unit dispersion. Any deviation from unity indicates that the errors are over- or underestimated. Figure 6 shows the normalised ΔRV distributions for the repeated measurements. Hereafter, we multiply the survey errors by the normalised MAD from Table 3, chosen because it is less sensitive to outliers, and use it as the normalisation factor. The MAD is the best choice in our case because we do not strictly filter the surveys for outliers. We therefore expect some RVs to deviate strongly from the true value, and these are corrected in subsequent steps. GES and RAVE have MADs around unity, while LAMOST overestimates its errors by a factor of ∼2.5. APOGEE, on the other hand, underestimates its errors by a factor of ∼6. Cottaar et al. (2014) also find that the APOGEE errors are underestimated, by at least a factor of 3. We note that the distributions are not fully Gaussian. They have extended tails, better represented by Cauchy–Lorentzian distributions (see the Gaussian fits in Fig. 6). Although we have an estimate of the error scaling for GES, it may not be representative of the whole survey, as it relies on only 31 ΔRV measurements for 62 stars.
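The normalisation factor from repeated measurements can be sketched in a few lines. We assume that the "normalised MAD" is the MAD of the normalised differences rescaled by 1.4826, so that it equals the standard deviation for a Gaussian. If the raw MAD is meant, that constant should be dropped. The function names are hypothetical, and a factor above one indicates underestimated errors.

```python
import math
import statistics

MAD_TO_SIGMA = 1.4826  # MAD-to-sigma scaling for a Gaussian distribution

def normalised_differences(pairs):
    """(RV1 - RV2) / sqrt(err1**2 + err2**2) for each pair of repeated RVs."""
    return [(rv1 - rv2) / math.hypot(e1, e2) for rv1, e1, rv2, e2 in pairs]

def normalisation_factor(pairs):
    """Scaled MAD of the normalised differences; close to 1 if errors are right."""
    z = normalised_differences(pairs)
    med = statistics.median(z)
    mad = statistics.median([abs(v - med) for v in z])
    return MAD_TO_SIGMA * mad
```

If the same differences are quoted with errors twice as large, the factor halves. This mirrors the LAMOST case, where the errors are overestimated.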

Fig. 6.

Distributions of the RV paired differences of the stars with multiple measurements in each survey normalised to their errors (solid lines). The Gaussian fits are plotted with dotted lines. The statistics of these distributions are given in Table 3.

4.2. Error normalisation from the three-cornered-hat method

To estimate the random errors of the surveys with few or no duplicates, namely GES, GALAH, and Gaia, we use the three-cornered-hat (TCH) method. This method was developed to investigate the frequency stability of atomic clocks (e.g. Gray & Allan 1974). It has since been applied to noise analyses of various data, in particular astronomical and geodetic time series (e.g. Malkin 2013a). The TCH method is applied to three independent data sets, in our case three catalogues. Assuming they are uncorrelated, they are described by the following system:

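$$\sigma_{12}^2 = \sigma_1^2 + \sigma_2^2, \qquad \sigma_{13}^2 = \sigma_1^2 + \sigma_3^2, \qquad \sigma_{23}^2 = \sigma_2^2 + \sigma_3^2, \qquad (3)$$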

with the solution:

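$$\sigma_1^2 = \tfrac{1}{2}\left(\sigma_{12}^2 + \sigma_{13}^2 - \sigma_{23}^2\right), \qquad \sigma_2^2 = \tfrac{1}{2}\left(\sigma_{12}^2 + \sigma_{23}^2 - \sigma_{13}^2\right), \qquad \sigma_3^2 = \tfrac{1}{2}\left(\sigma_{13}^2 + \sigma_{23}^2 - \sigma_{12}^2\right), \qquad (4)$$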

where $\sigma_i^2 = \mathrm{Var}(\mathrm{RV}_i - \mathrm{RV}_{\rm true})$ is the unknown RV variance of catalogue i that we want to determine, with the true RVs also unknown. Similarly, $\sigma_{ij}^2 = \mathrm{Var}(\mathrm{RV}_i - \mathrm{RV}_j)$ is the observed variance of the RV paired differences between catalogues i and j. Unfortunately, the method can produce negative variances if the data under investigation are correlated. We did not encounter this problem in the cases considered here, which indicates that the covariances are very small.
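Equation (4) can be evaluated directly from three matched RV lists. This sketch assumes that the lists refer to the same stars in the same order, and it uses plain sample variances. With plain variances, a constant ZP offset between two catalogues cancels in the variance of their differences. The function name is hypothetical.

```python
import statistics

def tch_variances(rv1, rv2, rv3):
    """Three-cornered-hat variances (Eq. 4) from RVs of the same stars in three catalogues."""
    v12 = statistics.variance([a - b for a, b in zip(rv1, rv2)])
    v13 = statistics.variance([a - c for a, c in zip(rv1, rv3)])
    v23 = statistics.variance([b - c for b, c in zip(rv2, rv3)])
    # Solution of sigma_ij^2 = sigma_i^2 + sigma_j^2 for the three pairs.
    s1 = 0.5 * (v12 + v13 - v23)
    s2 = 0.5 * (v12 + v23 - v13)
    s3 = 0.5 * (v13 + v23 - v12)
    # A negative value signals correlated catalogues, where the method breaks down.
    return s1, s2, s3
```

The random error of catalogue i is the square root of its variance. Dividing it by the median formal error gives the normalisation factor, as for GALAH (0.23/0.12 ≈ 2).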

To derive σGALAH with the TCH method, we use the combination of surveys with the most stars in common: APOGEE, LAMOST, and GALAH (210 stars). Before calculating the variances of the RV paired differences ($\sigma_{ij}^2$), we correct for the ZP differences between the catalogues. Following Eq. (4), we obtain σGALAH = 0.23 km s−1. To derive the error normalisation factor, this value has to be compared with the formal RV errors of GALAH, whose median for this sample is 0.12 km s−1. The normalisation factor of GALAH from the TCH method is therefore 2.0, meaning that the errors of this survey are underestimated by this factor.

Similarly, for GES we use the combination of surveys with the highest number of stars in common: APOGEE, RAVE, and GES. Unfortunately, only 28 stars in common have available RVs, giving σGES = 0.38 km s−1 against a median formal error of 0.56 km s−1. The error normalisation factor from the TCH method is therefore 0.7, slightly lower than the value we obtain from the duplicates in the previous section. It is difficult to establish which of the two methods is more accurate because both are based on small samples. We therefore adopt the average of the two, 0.8, which is still close to unity for GES.

For Gaia, we have the advantage of more combinations with an adequate number of stars to infer the normalisation factor: (APOGEE, LAMOST, Gaia), (APOGEE, RAVE, Gaia), and (LAMOST, RAVE, Gaia). Applying the same method to each of the three sets, we find σGaia = (1.03, 0.77, 0.90) km s−1 with median errors of (0.87, 0.57, 0.48) km s−1, respectively, which leads to an average factor of 1.5 for Gaia. We summarise the normalisation factors for all surveys in Table 4.
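The normalisation factor is simply the TCH scatter divided by the median formal error, averaged over triplets when several are available. The short sketch below reproduces that arithmetic from the rounded values quoted in this section; the helper name `normalisation_factor` is hypothetical.

```python
import numpy as np

def normalisation_factor(sigma_tch, median_formal_error):
    """TCH-based RV scatter divided by the median formal RV error.

    A factor > 1 means the formal errors are underestimated; < 1 overestimated.
    """
    return np.asarray(sigma_tch, float) / np.asarray(median_formal_error, float)

# Values quoted in the text (km/s); they are rounded, so ratios may differ
# in the last digit from the adopted factors.
factor_galah = normalisation_factor(0.23, 0.12)
factor_ges_tch = normalisation_factor(0.38, 0.56)
# Gaia: one estimate per triplet (APOGEE, LAMOST, Gaia), (APOGEE, RAVE, Gaia),
# (LAMOST, RAVE, Gaia), then a plain average
factors_gaia = normalisation_factor([1.03, 0.77, 0.90], [0.87, 0.57, 0.48])
factor_gaia = factors_gaia.mean()

print(f"GALAH {factor_galah:.1f}, GES (TCH) {factor_ges_tch:.1f}, Gaia {factor_gaia:.1f}")
# prints: GALAH 1.9, GES (TCH) 0.7, Gaia 1.5
```

The GALAH ratio of the rounded inputs is 1.9 rather than the adopted 2.0, presumably because the quoted σ and median error are themselves rounded.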

Table 4.

Summary of the error normalisation factor for all surveys.

5. The RV homogenisation process

Once the previous steps are completed, we can compare the RVs of the stars in common with Gaia and calculate the statistics of their RV differences (ΔRV = RVGaia–RVsurvey). The results in Table 5 show that LAMOST has both the largest systematic offset (ZP) and the largest standard deviation. We adopt the median ΔRV as the ZP in this work. As expected, σ and MAD depend on the spectral resolution. The ΔRV distributions are shown in Fig. 7. In this section, we split our calibration methodology into two parts. First, we calibrate the Gaia RVs to an arbitrary frame in which, relative to the higher resolution ground-based surveys, they show no trends as a function of the main parameters the RVs depend on (Sect. 5.1). Second, we calibrate the ground-based surveys for their intrinsic RV trends (Sect. 5.2).
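The statistics listed in Table 5 can be sketched as follows. This is an illustration under assumptions: the function name is hypothetical, σ is the sample standard deviation, and the MAD is taken as the unscaled median absolute deviation about the median, since the text does not state whether a Gaussian scaling factor is applied.

```python
import numpy as np

def delta_rv_stats(rv_gaia, rv_survey):
    """Summary statistics of dRV = RV_Gaia - RV_survey (km/s).

    The median is adopted as the zero point (ZP). The MAD is the unscaled
    median absolute deviation about the median (an assumption).
    """
    delta = np.asarray(rv_gaia, float) - np.asarray(rv_survey, float)
    delta = delta[np.isfinite(delta)]  # drop stars with missing RVs
    zp = np.median(delta)
    return {
        "N": delta.size,
        "ZP": zp,
        "sigma": np.std(delta, ddof=1),
        "MAD": np.median(np.abs(delta - zp)),
    }
```

Using the median and MAD rather than the mean and σ keeps the ZP insensitive to the long tails (e.g. binaries or mismatches) visible in Fig. 7.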

Fig. 7.

Histograms of the RV differences computed as: ΔRV = RVGaia–RVsurvey. The statistics of these distributions are given in Table 5.

Table 5.

Statistics of the ΔRV (=RVGaia–RVsurvey).

5.1. Gaia RV calibration

Figure 8 shows ΔRV as a function of various parameters the RVs depend on. The trends of ΔRV as a function of G magnitude, metallicity, and Teff are almost identical for most surveys. The ΔRV deviations are larger for fainter stars (G > 12 mag) and for hotter stars (Teff > 6000 K). We also notice a negative trend with metallicity. In this section, we investigate these trends under the assumption that correlations present in most surveys, in particular in those with the highest resolution, must originate from Gaia; our goal is to remove them.

Fig. 8.

Initial RV differences of stars in common with Gaia as a function of: G magnitude, RV, Teff, log g, iron metallicity, and signal-to-noise ratio of the surveys. The S/N is scaled for visual convenience and GES does not provide S/N measurements. The RV differences are binned to contain more than 10 entries for each bin.
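The binned curves in Fig. 8 and later figures keep only bins with more than 10 entries and show the median ΔRV with the MAD as the spread. A minimal sketch of that binning, assuming user-supplied bin edges (the edges actually used are not given in the text) and a hypothetical function name:

```python
import numpy as np

def binned_median(x, delta_rv, bin_edges, min_count=11):
    """Median dRV and MAD in bins of x, keeping bins with >= min_count stars.

    min_count=11 reproduces the 'more than 10 entries' rule. Bins are
    half-open, [edge_i, edge_{i+1}); values equal to the last edge are dropped.
    """
    x = np.asarray(x, float)
    delta_rv = np.asarray(delta_rv, float)
    idx = np.digitize(x, bin_edges) - 1  # bin index of each star
    centres, medians, mads, counts = [], [], [], []
    for i in range(len(bin_edges) - 1):
        sel = delta_rv[idx == i]
        if sel.size < min_count:
            continue  # too few stars for a reliable median
        med = np.median(sel)
        centres.append(0.5 * (bin_edges[i] + bin_edges[i + 1]))
        medians.append(med)
        mads.append(np.median(np.abs(sel - med)))
        counts.append(sel.size)
    return np.array(centres), np.array(medians), np.array(mads), np.array(counts)
```

The returned counts correspond to the histograms shown in the bottom panels of Figs. 9 to 11.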

5.1.1. The G magnitude trend

Figure 8 shows a clear trend of ΔRV with G magnitude for all surveys, even for LAMOST, which shows a large ZP offset and significant deviations at the extremes of the magnitude range. A positive trend of ΔRV with magnitude was also reported by Katz et al. (2019) in the validation of the Gaia RVs against ground-based surveys; they pointed out that the trend begins at G ≃ 9 mag and reaches 0.5 km s−1 at G ≃ 11.75 mag. We confirm this and show that the trend extends to fainter stars, reaching differences of around 0.7 km s−1 (see Fig. 9). We calibrated the trend of the Gaia RVs with G magnitude by fitting, with weighted least squares using the normalised errors as weights, a second-order polynomial of the form:

$$\Delta RV = a\,G^2 + b\,G + c. \tag{5}$$

Fig. 9.

Calibration of Gaia RVs for stars in common with APOGEE, GALAH, GES, and RAVE as a function of G mag, derived by fitting a second-degree polynomial (Eq. (5)). The blue and red points represent the median ΔRV of each bin with > 10 entries before and after the calibration, respectively. The grey points are the binned ΔRV of the fit. The blue and red shaded areas are the MAD of each bin. The bottom panels of each plot show the number of stars in each bin.

The choice of this form is empirical; a linear function would not properly fit the faintest stars. For the fitting process, we apply the filtering criteria for the Gaia DR2 photometry suggested by Evans et al. (2018) and Arenou et al. (2018) based on Eq. (1). We fitted the ΔRV–G mag relation with weighted least squares for each survey separately and verified that the trends observed in the different surveys are indeed compatible with each other, so that these biases can be attributed to the Gaia data set (see Fig. 9). We then define the global coefficients of Eq. (5) as the weighted mean of the per-survey coefficients and use them to correct all RVs in the entire Gaia DR2 data set, including the stars that are not in common with the ground-based surveys. We exclude LAMOST when calculating the global coefficients, since this survey is carried out at lower resolution than Gaia and other trends intrinsic to the survey (see next section) could weaken the genuine trend in the Gaia data set. The best-fit and final coefficients of Eq. (5) are presented in Table B.1. The errors of the best-fit coefficients are the standard errors, that is, the square roots of the diagonal elements of the covariance matrix of the fit. As shown in Fig. 9, the correction successfully removes the magnitude trend when comparing Gaia with APOGEE, GALAH, GES, and RAVE. This calibration covers the full Gaia magnitude range, 4−16 mag (see the limits of our calibrations in Table 6), and does not alter the ZP because the surveys are shifted for their ZP with respect to Gaia before the fitting process. The errors of the dependent variable, ΔRV, are calculated from the covariance matrix, including all variances (diagonal terms) and covariances (off-diagonal terms), re-scaled by the χ2 of the fit (e.g. Bevington 1969). We adopt a 95% confidence interval.

The errors of the calibrated Gaia RVs, δRVcalib, are then the quadratic sum of the aforementioned errors and the normalised Gaia errors (see Sect. 4): $\delta RV_{\rm calib} = (\delta\Delta RV^2 + \delta RV^2)^{1/2}$. We use the same method to calculate the errors of all least-squares fits throughout this paper.
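The fitting and error-propagation steps can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the authors' pipeline: `np.polyfit` with weights `w = 1/σ` and `cov=True` performs the weighted fit and returns a covariance matrix scaled by the reduced χ²; the prediction error uses the full covariance matrix (diagonal and off-diagonal terms); the 95% interval is taken as 1.96σ under a Gaussian assumption; and the correction subtracts the fitted ΔRV from RVGaia, which follows from the definition ΔRV = RVGaia–RVsurvey. All function names are hypothetical.

```python
import numpy as np

def fit_delta_rv_vs_g(g_mag, delta_rv, err_norm, deg=2):
    """Weighted least-squares polynomial fit of dRV against G magnitude.

    np.polyfit expects w = 1/sigma; with cov=True the covariance matrix is
    scaled by the reduced chi-square. Coefficients run from highest power down.
    """
    w = 1.0 / np.asarray(err_norm, float)
    coeffs, cov = np.polyfit(g_mag, delta_rv, deg, w=w, cov=True)
    return coeffs, cov

def predicted_delta_rv_error(g_mag, cov, z=1.96):
    """Error on the fitted dRV from the full covariance matrix.

    z = 1.96 gives a two-sided 95% interval under a Gaussian assumption.
    """
    g = np.atleast_1d(np.asarray(g_mag, float))
    deg = cov.shape[0] - 1
    jac = np.vander(g, deg + 1)  # d(poly)/d(coeffs): [G^deg, ..., G, 1]
    var = np.einsum("ij,jk,ik->i", jac, cov, jac)  # J C J^T for each star
    return z * np.sqrt(var)

def calibrate_gaia_rv(rv_gaia, err_gaia_norm, g_mag, coeffs, cov):
    """Subtract the fitted dRV(G) trend and add its error in quadrature."""
    rv_calib = np.asarray(rv_gaia, float) - np.polyval(coeffs, g_mag)
    err_fit = predicted_delta_rv_error(g_mag, cov)
    err_calib = np.sqrt(err_fit**2 + np.asarray(err_gaia_norm, float)**2)
    return rv_calib, err_calib
```

Propagating through the full covariance matrix matters here because the coefficients of a quadratic in G are strongly correlated; using only the diagonal terms would overestimate the error of the fitted curve.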

Table 6.

Parameter space covered by all the Gaia internal RV calibrations computed in Sect. 5.1.

5.1.2. The iron metallicity trend

Apart from the magnitude trend, Fig. 8 shows that all surveys except RAVE exhibit a linear correlation of ΔRV with metallicity, predominantly between −2.0 and 0.5 dex. The ΔRV–[Fe/H] trend is fitted with weighted least squares using the following form:

$$\Delta RV = a\,[\mathrm{Fe/H}] + b. \tag{6}$$

Interestingly, RAVE shows the opposite trend to the other surveys, suggesting that this bias is related to RAVE itself. Before applying a linear fit, we have to ensure that the metallicities of all surveys are on the same scale, because we do not want to propagate errors from metallicity discrepancies into the Gaia RVs. We thus compare the metallicities of the surveys in pairs and find that the metallicities of stars in common agree very well, within 1σ (< 0.11 dex), except for RAVE, which shows a trend⁷ with σ ≃ 0.2 dex (see Fig. A.1). The metallicity comparison plots between the surveys are given in Appendix A. We note that for RAVE we used the iron metallicities from the GAUGUIN pipeline (see references in Guiglion et al. 2016), filtered for the clean sample as suggested by the latest RAVE release (Steinmetz et al. 2020b). We apply a calibration to the RAVE metallicities to bring them onto the scale of the high resolution surveys (see Appendix A), but the negative ΔRV–[Fe/H] trend remains strong, indicating that this trend arises from the RAVE RVs, which we investigate further in the next section. Steinmetz et al. (2020a) also noticed this dependence of the RV shift between RAVE and Gaia DR2 on overall metallicity ([M/H]), amounting to a 0.5 km s−1 difference between metal-poor ([M/H] < −1.0 dex) and metal-rich stars ([M/H] > 0.0 dex), which is similar to what we observe here.

RAVE is not included in the fits used to obtain the ΔRV–[Fe/H] calibration coefficients of Eq. (6). LAMOST is also excluded because it operates at low resolution. The best-fit coefficients of Eq. (6) are presented in Table B.1 and the limits of the calibration in Table 6. For the final coefficients, we again use the weighted mean coefficients shown in Table B.1, and we present the results in Fig. 10 for the higher resolution surveys. Even though RAVE and LAMOST are not included in the calculation of the calibration coefficients, we still use their metallicities in Eq. (6) to calibrate the Gaia RVs of their stars in common with Gaia, since we have demonstrated that their metallicities (after calibration, in the case of RAVE) are in agreement with those of the other surveys.
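The global coefficients are a weighted mean of the per-survey best-fit coefficients, with some surveys left out. The text does not specify the weights, so the sketch below assumes inverse-variance weights built from the standard errors of each coefficient; the function name and the example numbers in the test are hypothetical and are not the values of Table B.1.

```python
import numpy as np

def global_coefficients(per_survey, exclude=("RAVE", "LAMOST")):
    """Inverse-variance weighted mean of best-fit coefficients across surveys.

    per_survey maps a survey name to (coeffs, std_errors), two equal-length
    sequences. Surveys in `exclude` are skipped (RAVE and LAMOST for Eq. (6)).
    Returns the weighted coefficients and their formal errors.
    """
    coeffs, errors = [], []
    for name, (c, e) in per_survey.items():
        if name in exclude:
            continue
        coeffs.append(np.asarray(c, float))
        errors.append(np.asarray(e, float))
    coeffs, errors = np.array(coeffs), np.array(errors)
    w = 1.0 / errors**2  # one weight per survey and per coefficient
    mean = np.sum(w * coeffs, axis=0) / np.sum(w, axis=0)
    err = 1.0 / np.sqrt(np.sum(w, axis=0))
    return mean, err
```

Excluding a survey from the mean does not stop its stars from being corrected afterwards; the global coefficients are applied to every star that has the relevant parameter.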

Fig. 10.

Calibration of Gaia RVs for stars in common with APOGEE, GALAH, and GES as a function of iron metallicity. The colours and symbols are described in Fig. 9.

Katz et al. (2019) also observed the metallicity trend in the Gaia DR2 RV validation against RAVE but, unlike in this work, not against APOGEE. A possible reason is that their comparison lacked metal-poor stars in common with APOGEE (their Fig. 12).

5.1.3. The Teff trend

The last trend we notice is an increase in ΔRV with effective temperature for Teff > 5900 K for all surveys, but at different rates (see upper right panel of Fig. 8). This correlation is more difficult to investigate because the different Teff scales of the surveys could play a role. Moreover, since hotter stars have fewer lines in their spectra, their parameters are harder to determine precisely via spectroscopy than those of, for instance, solar-type stars. These stars could also exhibit higher rotational velocities, which affect the determination of both their RVs and Teff.

To investigate to what extent this trend arises from the Gaia RVs, it is important to check first whether the Teff scales of the surveys are compatible and whether the ΔRV–Teff trend appears when the ground-based surveys are compared with each other. A comparison of Teff between the surveys shows significant differences for the lowest resolution surveys, RAVE and LAMOST (see Appendix A), whereas the Teff scales of APOGEE and GALAH agree. Moreover, the RV differences between APOGEE and GALAH are flat as a function of Teff. For GES, there are not enough hot stars in common to draw conclusions. Since APOGEE and GALAH agree on the temperature scale and their mutual ΔRV–Teff relation is flat, yet both deviate from Gaia, there could indeed be some dependence of the Gaia RVs on Teff. For the hotter stars, ΔRV reaches 0.5 km s−1 for APOGEE and around 1.0 km s−1 for GALAH.

Because the accuracy of Teff for the hotter stars is low, we follow a conservative approach here. We fit a second-degree polynomial to account for this bias, but using only APOGEE stars:

$$\Delta RV = a\,T_{\rm eff}^2 + b\,T_{\rm eff} + c. \tag{7}$$

This is equivalent to assuming that the APOGEE scale is the true one, and that the Teff trends caused by Gaia can thus be estimated from the comparison with APOGEE. If this assumption is wrong, we would have biased the SoS RV behaviour as a function of Teff by up to 0.5 km s−1 for the hot stars (but see Sect. 6.2, where we compare with external catalogues).

The best-fit coefficients of Eq. (7) are shown in Table B.1 and their limits in Table 6. The results of the APOGEE calibration are presented in Fig. 11. The coefficients of Table B.1 are used to correct the Gaia RVs using the Teff of each survey. The effective temperatures of LAMOST show significant discrepancies, with the same trend, relative to all the other surveys; we therefore correct them with a linear polynomial, shown in Fig. A.2. The calibrated LAMOST Teff scale is then used in Eq. (7) to calibrate the RVs of stars in common with Gaia.

Fig. 11.

Calibration of Gaia RVs as a function of Teff, derived from APOGEE stars in common with Gaia. The colours and symbols are described in Fig. 9.

We point out that the Gaia DR2 RVs are computed only for stars in the Teff range 3550−6900 K, because of the degraded RV performance outside this range and the restricted grid of templates. Deviations for the hotter stars are therefore expected. Katz et al. (2019) also note that the Gaia RV precision depends on Teff, among other parameters, in the same sense as we see here. Another possible cause of this trend is the increasing dominance of broad Paschen lines in emission in the calcium triplet region covered by Gaia and RAVE. The final errors of the Gaia RVs are obtained by propagating the errors of the three individual fit corrections and adding them in quadrature to the original error.
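Applying the three Gaia corrections (G magnitude, [Fe/H], and Teff) and combining their errors in quadrature can be sketched as below. The function name is hypothetical, each correction is assumed to be already evaluated per star, and treating a missing parameter (NaN) as "no correction for that trend" is an assumption chosen to mirror the Gaia-only stars, which receive only the magnitude term.

```python
import numpy as np

def apply_gaia_corrections(rv_gaia, err_gaia_norm, corrections):
    """Apply successive dRV corrections to Gaia RVs.

    corrections: list of (delta_rv, delta_rv_err) pairs, one per trend
    (e.g. G magnitude, [Fe/H], Teff), each evaluated for every star.
    Each fitted dRV is subtracted from the RV and its error is added in
    quadrature to the normalised Gaia error.
    """
    rv = np.asarray(rv_gaia, float).copy()
    var = np.asarray(err_gaia_norm, float) ** 2
    for delta_rv, delta_err in corrections:
        # Stars lacking the parameter for this trend (NaN) are left
        # uncorrected for it (assumption)
        d = np.nan_to_num(np.asarray(delta_rv, float), nan=0.0)
        e = np.nan_to_num(np.asarray(delta_err, float), nan=0.0)
        rv -= d
        var += e ** 2
    return rv, np.sqrt(var)
```

Because the corrections are additive and independent, their order does not change the result; only the set of trends that can be evaluated for a given star matters.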

5.2. Surveys RV calibration

After investigating and correcting the possible biases in the Gaia RVs, we study the remaining trends in the RV differences between Gaia and the surveys, which are caused by the internal biases of each survey. These trends are more pronounced for the surveys with lower spectral resolution than Gaia, such as RAVE and LAMOST. Once the dependencies caused primarily by Gaia are minimised, we apply a multivariate correction of the RVs as a function of the most relevant parameters, Teff, log g, [Fe/H], and S/N, to bring these surveys onto the corrected Gaia RV scale. The shape of the function depends on the behaviour of each survey with respect to Gaia. For instance, GALAH and LAMOST show a clear second-order dependence on Teff, while the others do not. In the case of LAMOST, it is necessary to add a linear term in RV for RVs within ±100 km s−1 (see the top centre panel of Fig. 8), while RVs outside these limits are only corrected for the ZP. Finally, GES does not provide unique S/N measurements because its results are based on the star-by-star combination of varying sets of spectra, taken with different instruments (UVES and GIRAFFE) and in different combinations of spectral ranges. We adopt the following general, empirical function for the RV calibration of all surveys:

(8)

For a more detailed analysis, we split the stars of the surveys into dwarfs and sub-giants (log g > 3.5 dex) and giants (log g < 3.5 dex). We do this because dwarfs show a linear correlation in the ΔRV–log g plot, whereas the relation for giants appears flatter (see the bottom left panel of Fig. 8). Dividing the samples into dwarfs and giants produced better least-squares fits in terms of χ2, at least for GALAH and APOGEE, but did not significantly change the overall statistics of their median RV differences. The LAMOST sample was further split into high and low Teff (with a limit of 6200 K; see the top right panel of Fig. 8) because LAMOST contains an enormously larger number of cool stars, so the temperature correlation could not otherwise be fitted properly. The fits are performed on the filtered samples shown in Appendix B, but the correction is applied to the full surveys with the corresponding coefficients in Table B.2.
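Since the exact terms of Eq. (8) vary from survey to survey, the sketch below uses a hypothetical set of regressors (Teff², Teff, log g, [Fe/H], S/N, and a constant) to illustrate the idea: a weighted multivariate least-squares fit performed separately for dwarfs/sub-giants and giants, split at log g = 3.5 dex. Teff is rescaled to units of 1000 K only to keep the problem well conditioned; all names are hypothetical, and the LAMOST-specific RV term and Teff split are omitted.

```python
import numpy as np

def design_matrix(teff, logg, feh, snr):
    """Hypothetical regressors for a multivariate dRV model.

    Columns: t^2, t, log g, [Fe/H], S/N, constant, with t = Teff / 1000 K.
    """
    t = np.asarray(teff, float) / 1000.0
    return np.column_stack([t**2, t, np.asarray(logg, float),
                            np.asarray(feh, float), np.asarray(snr, float),
                            np.ones_like(t)])

def weighted_lstsq(A, y, sigma):
    """Weighted least squares: scale each row by 1/sigma, then solve."""
    w = 1.0 / np.asarray(sigma, float)
    coeffs, *_ = np.linalg.lstsq(A * w[:, None], np.asarray(y, float) * w,
                                 rcond=None)
    return coeffs

def fit_by_gravity_class(teff, logg, feh, snr, delta_rv, sigma, logg_split=3.5):
    """Fit dwarfs/sub-giants (log g > split) and giants (log g < split) separately."""
    logg = np.asarray(logg, float)
    A = design_matrix(teff, logg, feh, snr)
    y = np.asarray(delta_rv, float)
    s = np.asarray(sigma, float)
    classes = {"dwarfs": logg > logg_split, "giants": logg < logg_split}
    return {name: weighted_lstsq(A[m], y[m], s[m]) for name, m in classes.items()}
```

Fitting each class separately lets the log g slope differ between dwarfs and the flatter giant branch, which a single global fit would average out.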

The ZP of all surveys is now shifted to the new Gaia reference. The final corrected statistics for the whole samples are shown in the bottom part of Table 5. The spreads are now much smaller than for the initial samples, indicating better agreement between the surveys. The comparison of the Gaia RVs with the calibrated RVs of the surveys is shown in Fig. 12. In Appendix B, we also show the calibrated paired RV differences between surveys (see Fig. 13).

Fig. 12.

Calibrated RV differences of stars in common with Gaia as a function of: G magnitude, RV, Teff, log g, iron metallicity, and signal-to-noise ratio of the surveys. The symbols are the same as in Fig. 8.

Fig. 13.

Distribution of the normalised errors for all surveys.

6. The SoS catalogue

Once we have the calibrated RVs from the previous steps, we merge all surveys into a unified catalogue in a two-step process. First, we compute the weighted mean RV and the corresponding weighted error for the duplicate sources in each catalogue to obtain a unique entry for each star in each survey⁸. Second, for stars in common across the surveys, we compute the best RV estimate as a weighted average, using the normalised RV errors of each survey as weights.
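The two-step merge can be sketched as follows, assuming inverse-variance weights built from the normalised errors (the text says "weighted" without giving the weights) and input records of the hypothetical form `(survey, star_id, rv, err)`. The function names are also hypothetical.

```python
import numpy as np
from collections import defaultdict

def weighted_mean(values, errors):
    """Inverse-variance weighted mean and its formal error."""
    v = np.asarray(values, float)
    w = 1.0 / np.asarray(errors, float) ** 2
    return np.sum(w * v) / np.sum(w), 1.0 / np.sqrt(np.sum(w))

def combine(entries):
    """Two-step merge of (survey, star_id, rv, err) records.

    Step 1: collapse duplicate observations of a star within each survey.
    Step 2: combine the per-survey values of each star across surveys.
    Returns {star_id: (rv, err)}.
    """
    per_survey = defaultdict(lambda: ([], []))
    for survey, star, rv, err in entries:
        rvs, errs = per_survey[(survey, star)]
        rvs.append(rv)
        errs.append(err)
    per_star = defaultdict(lambda: ([], []))
    for (survey, star), (rvs, errs) in per_survey.items():
        rv, err = weighted_mean(rvs, errs)  # one entry per star per survey
        per_star[star][0].append(rv)
        per_star[star][1].append(err)
    return {star: weighted_mean(rvs, errs) for star, (rvs, errs) in per_star.items()}
```

Collapsing duplicates first prevents a survey that observed a star many times from dominating the cross-survey average purely through its number of visits.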

The final catalogue contains around 11 million unique stars, making it the largest catalogue of RV measurements published so far. It combines around 5.1 million stars with RVs from the ground-based spectroscopic surveys and 7.2 million stars with Gaia RVs. Only ∼52 000 stars observed by the ground-based surveys have no counterpart in Gaia. About half of the SoS (52%) contains RVs only from Gaia, while the rest is combined with ground-based spectroscopy.

We provide a quality flag, flag_rv, to facilitate the scientific exploitation of the SoS catalogue:

  • flag_rv = 0. These stars have passed the filtering criteria of each survey (Sect. 2) and have reliable Teff, log g, [Fe/H], and available S/N (apart from GES) for all calibrations in the previous sections. These stars amount to 4.1 million in the SoS (≃40%).

  • flag_rv = 1. Stars lacking any of the parameters required for the RV calibration (Teff, log g, [Fe/H], and S/N) are shifted only by the ZP to the calibrated Gaia reference. This flag also marks stars with parameters outside the calibration limits. These stars amount to ≃8% of the SoS.

  • flag_rv = 1.5. Stars with RV measurements only from Gaia DR2 have their RVs calibrated only as a function of their G magnitude; they amount to 5.8 million stars.

  • flag_rv = 2. Finally, a few stars (≃1%) have problematic RVs as defined by the corresponding surveys. These lower quality measurements should be used with care. We do not use them in the validation and analysis in the following sections, but they are included in the catalogue for completeness. Among these, we include 70 365 sources from Gaia DR2 that were flagged as potentially contaminated by Boubert et al. (2019).
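The flag definitions above amount to a small decision rule. The sketch below is a hypothetical encoding of it: the argument names are illustrative, `has_all_params` is taken to mean "all parameters required for that survey's calibration" (so S/N is not required for GES), and the precedence of the checks, with problematic RVs overriding everything else, is an assumption not stated in the text.

```python
def flag_rv(problematic, gaia_only, has_all_params, within_limits):
    """Hypothetical assignment of flag_rv following the definitions above.

    Order of checks (an assumption): problematic RVs first, then Gaia-only
    stars, then fully calibrated stars; everything else gets a ZP-only shift.
    """
    if problematic:
        return 2.0  # RVs flagged as problematic by the surveys
    if gaia_only:
        return 1.5  # Gaia DR2 RVs calibrated only against G magnitude
    if has_all_params and within_limits:
        return 0.0  # full calibration applied
    return 1.0      # missing parameters or outside calibration limits
```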

In this first version, SoS contains the basic parameters related to the RV homogenisation process, as shown in Table 7. The calibrated RVs of each survey from the intermediate steps are also kept separately in different tables (see Table 8). The distribution of the SoS sources and their RVs is shown in Fig. 15. The star density in SoS is higher towards the Galactic centre, mainly due to Gaia, and towards the Galactic anticentre and the northern hemisphere, mainly due to the LAMOST fields. We also notice an overdensity at the Kepler field around (l, b) ∼ (76°, 14°), which is predominantly covered by LAMOST, APOGEE, and Gaia. The rotation pattern of stars in the Galaxy is evident in the right panel, where stars in blue move towards the Sun and stars in red move away from it, similar to the pattern presented from the Gaia DR2 data by Katz et al. (2019). The centre of the Galaxy is at the centre of the map. The RVs of the Small and Large Magellanic Clouds stand out around (l, b) ∼ (−57°, −44°) and ∼ (280°, −33°), respectively. Figure 14 shows the final HR diagram of SoS, based on Gaia DR2 parameters, colour coded by star density.

Fig. 14.

HR diagram for SoS colour coded to stellar density in log scale.

Fig. 15.

Left panel: surface density distribution in Mollweide projection for the stars in SoS in logarithmic scale. Right panel: same as left panel but colour coded by the final SoS RVs (median RV per pixel) for stars with |RV| < 40 km s−1. Both plots are in Galactic coordinates with a pixel size of ∼0.46°.

Table 7.

SoS I.

Table 8.

Auxiliary catalogues to include original data from the catalogues and intermediate products before final homogenisation.

Finally, the errors of the RV homogenisation process (δRV) are shown in Fig. 16 as a function of magnitude and stellar parameters, illustrating the dependence of the RV precision on these parameters. We note that not all SoS stars have stellar parameters, but 99% of them have G magnitudes. In addition to the weighted RV error, we also calculated an RV error that includes the scatter among multiple surveys, σRV, to account for sources of error other than systematics (e.g. Malkin 2013b):

(9)

Fig. 16.

RV errors in SoS after the homogenisation process as a function of magnitude and stellar parameters. The errors (δRV and σRV) are binned to their median value. The black background corresponds to the 2D hexagonal binned values of the whole δRV in SoS.

where $\overline{RV}$ is the average RV over surveys 1 to $i$. These errors are plotted in Fig. 16 for stars with RVs derived from two or more surveys. For the brightest stars (G < 12 mag), the precision is estimated to be better than 1 km s−1. The precision worsens sharply for hot and very metal-poor stars but depends more weakly on surface gravity, with dwarf stars showing higher errors. The distribution of errors in Fig. 17 has peaks at 0.05, 0.2, 0.6, and 1.5 km s−1, depending on the number of multiple measurements and the resolution of the surveys. The σRV distribution is smoother, peaking at 0.09 km s−1 for stars observed in more than one survey (1.3 million stars), and provides a more robust indicator of the precision in SoS.

Fig. 17.

Distribution of errors (δRV and σRV) in SoS after the homogenisation process.

6.1. Binaries

Binary systems affect the measured RVs because of the motion of the stars around the centre of mass. We collected information on binarity from dedicated literature studies for each survey. Whenever a star is indicated as a binary (or binary candidate) in one of these studies, we assigned a separate flag (flag_binary = 1); otherwise, the flag was set to zero. In particular, Price-Whelan et al. (2020) detected 19 635 close-in binaries in APOGEE, Traven et al. (2020) found 12 234 double-lined spectroscopic binaries in GALAH, Merle et al. (2017) 641 spectroscopic binaries in GES, Birko et al. (2019) 3838 single-lined binary stars in RAVE, Qian et al. (2019) more than 256 000 spectroscopic binary or variable star candidates in LAMOST, and Tian et al. (2020) ∼800 000 binary candidates in Gaia DR2. These stars amount to around 10% of the SoS, a sample that is far from complete, as the fraction of binaries in the Galaxy could be between 20% and 80%, depending also on spectral type (e.g. Duchêne & Kraus 2013).

We expect these systems to show a larger dispersion in their RVs taken at different epochs. Figure 18 shows the standard deviation (σ) calculated for stars with RVs derived from more than one survey. The binary population exhibits a higher σ as a function of the stellar parameters than the rest of the sample. Apart from σ, other parameters in the Gaia data can also indicate binarity, such as the Renormalised Unit Weight Error (RUWE, Lindegren 2018, see also Appendix C).

Fig. 18.

Binaries in SoS. The y axis is the standard deviation in RV from SoS for stars observed by more than one survey. The x axis shows the stellar parameters from the surveys. The red points are the median binned values for the binary population and the blue points those for the rest of the SoS sample, defined here as non-binaries.

6.2. Comparison with external catalogues

We perform a set of comparisons with external catalogues to confirm the robustness of our results. First, we compare SoS with the Gaia RV standard stars (Gaia-STD, Soubiran et al. 2018a), which comprise 4813 FGK-type stars with RVs obtained by combining numerous individual measurements from five high resolution spectrographs (R > 45 000): ELODIE (Baranne et al. 1996) and SOPHIE (Perruchot et al. 2008) on the 1.93 m telescope at Observatoire de Haute-Provence (OHP), CORALIE on the Euler telescope at La Silla Observatory, HARPS at the ESO La Silla 3.6 m telescope (Mayor et al. 2003), and NARVAL (Aurière 2003) on the Télescope Bernard Lyot at Pic du Midi Observatory. The ZP of Gaia-STD was established using asteroids, whose RVs were calculated from the dynamics of the solar system with an accuracy of a few m s−1. Their ZP is set at 0.38 km s−1, which is interestingly very similar to the offset we find in our comparison of Gaia-STD with SoS (0.34 km s−1, see Table 9). We define this value as our absolute ZP and shift our final RVs to agree with Gaia-STD. Figure 19 shows the difference in RVs between the two samples as a function of the stellar atmospheric parameters and magnitude. The stellar parameters in this plot are average values taken from the spectroscopic surveys. We notice very good agreement in the RVs, with a σ of 0.64 km s−1 and a MAD of 0.16 km s−1.

Fig. 19.

RV comparison between SoS and the Gaia-STD stars as a function of magnitude, Teff, log g, and [Fe/H]. The red points are the median binned values and the shadowed area the MAD of each bin.

Table 9.

Statistics for the RV differences of the external catalogues with SoS.

The Geneva-Copenhagen (GC) survey provides, among other parameters, very good quality RVs for F- and G-type dwarf stars in the solar neighbourhood (Nordström et al. 2004). The catalogue contains 14 139 RVs for disc stars obtained with the high resolution photoelectric cross-correlation spectrometers CORAVEL (Baranne et al. 1979; Mayor 1985), operated at the Swiss 1 m telescope at OHP and at the Danish 1.5 m telescope at La Silla. The GC survey has 8097 stars in common with SoS, of which we used 7352 for the comparison (see Table 9) after excluding the spectroscopic binaries flagged in the two samples. In Fig. 20, we show the differences in RV as a function of magnitude, Teff, log g, and [Fe/H]. The Teff for this comparison is taken from the GC catalogue because it covers most stars, whereas log g and [Fe/H] come from the spectroscopic surveys included in SoS. The plots in Fig. 20 appear flat, even for log g, which shows a slight trend for dwarf stars in the comparison with Gaia-STD. Similar behaviour is seen in the Teff comparisons of SoS with Gaia-STD and GC. The median RV difference between SoS and GC is 0.67 km s−1, which decreases to 0.33 km s−1 after the ZP shift of SoS. We note that the difference between Gaia-STD and GC is 0.32 km s−1, in excellent agreement with our results after correcting for the ZP defined by asteroids.

Fig. 20.

RV comparison between SoS and the GC survey as a function of magnitude, Teff, log g, and [Fe/H]. The red points are the median binned values and the shadowed area the MAD of each bin. The dashed line is their median difference.

At this level, the differences and the scatter in RVs in both comparisons can be attributed to the limitations of spectroscopic measurements due to gravitational shift, convective shift, other astrophysical effects (stellar rotation, stellar activity, granulation, etc.), low-mass companions (stars and exoplanets), and, last but not least, in the case of systematic offsets, instrumental effects. The dispersion in ΔRV can be further reduced by using the cleaner SoS sample (flag_rv = 0).

6.3. Future updates and the SEGUE survey

This release of SoS includes surveys with strong overlap with Gaia. However, there are other important spectroscopic surveys mapping the Galaxy with reliable parameters that could be added to SoS in the future. For instance, the Sloan Extension for Galactic Understanding and Exploration (SEGUE, Yanny et al. 2009) and SEGUE-2 surveys have derived astrophysical parameters for ∼562 000 stars at low resolution (R ∼ 2000). The overlap between SEGUE and the Gaia stars with available RVs is small because SEGUE targets much fainter stars. This survey is therefore not well suited to the Gaia RV calibration methods used above, but it can be added to SoS following a different procedure, in which the entire SoS is used as the reference and the external catalogue is calibrated into the SoS reference frame.

The stellar parameters for SEGUE are derived with the SEGUE Stellar Parameter Pipeline (SSPP)⁹ (Lee et al. 2008a,b; Allende Prieto et al. 2008), with typical RV uncertainties of 2.4 km s−1. In Fig. 21, we show the RV comparison of stars in common between SEGUE and SoS after selecting stars with high quality parameters in both samples. We find a zero point offset of −4.00 km s−1 with respect to SoS, and the differences correlate mainly with effective temperature and metallicity. Figure 21 also shows the calibrations we apply to the SEGUE RVs to place them in the SoS RV frame, after fitting linear functions as in Eq. (8) (with α = 0 and ζ = 0).

Fig. 21.

RV comparison between SoS and SEGUE as a function of magnitude, Teff, log g, and [Fe/H]. The y-axis shows the ΔRV = RVSoS–RVSEGUE. The blue and red points represent the median ΔRV of each bin with more than 3 entries before and after the calibration respectively. The blue and red shadowed areas are the MAD of each bin.

We look forward to upcoming surveys such as SDSS-V (Kollmeier et al. 2017), WEAVE (Dalton et al. 2018), 4MOST (de Jong et al. 2019), and PFS (Takada et al. 2014), which can also be appended to SoS. Depending on the size of their overlap with SoS and their accuracy, they can also be used to revise and update the current SoS reference system.

7. Science validation with open clusters

Open clusters (OCs) are good examples to showcase some of the applications of SoS, since radial velocities, along with proper motions, allow us to study their three-dimensional kinematics, trace their orbits, and relate them to the spiral structure of the Galactic disc. Moreover, RVs have proved to be an efficient tool for membership determination, since the stars in an OC formed together from the same material and share the same kinematics. Mermilliod et al. (2008, 2009) present RVs for 1309 red giants in 166 OCs and 2565 solar-type dwarfs in 179 nearby OCs, respectively, obtained with high resolution CORAVEL-type spectrographs at OHP and La Silla.

A comparison of SoS with homogeneous and independent literature analyses of OCs demonstrates the precision and accuracy of our results, in particular when comparing the average RVs of the clusters and their dispersion. The sample of Mermilliod et al. (2008, 2009), hereafter MM, is ideal for this comparison because of its size and its high accuracy of 0.20−0.30 km s−1 (Duquennoy & Mayor 1991; Baranne et al. 1996). From this sample, we select clusters with more than 3 stars in common with SoS, resulting in 1064 stars in 55 clusters, excluding stars flagged by MM as non-members or binaries. We calculate their weighted mean RV (RVOC) and MAD after a 3σ outlier removal for both the SoS and MM samples. The average difference in RVOC between SoS and MM is 0.21 km s−1, with a MAD of 0.26 km s−1. Figure 22 shows that the distributions of the MAD for the 55 clusters are very similar in SoS and MM, which indicates that both are of similar accuracy and precision. As an example, we show the RV distributions of the OCs with the highest number of stars in common (more than 15). The weighted mean RVs and MADs presented in Fig. 23 are very close for both samples, a very good indication that our homogenisation works.
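The per-cluster statistics can be sketched as an iterative 3σ clipping followed by a weighted mean and a MAD. The text specifies only "a 3σ outlier removal", so clipping around the median with the sample standard deviation, iterating until the member list stops changing, and using inverse-variance weights are all assumptions; the function name is hypothetical.

```python
import numpy as np

def cluster_rv(rv, err, nsigma=3.0, max_iter=10):
    """Weighted mean RV and MAD of a cluster after iterative n-sigma clipping.

    Returns (weighted mean RV, MAD, number of stars kept).
    """
    rv = np.asarray(rv, float)
    err = np.asarray(err, float)
    keep = np.ones(rv.size, dtype=bool)
    for _ in range(max_iter):
        med = np.median(rv[keep])
        std = np.std(rv[keep])
        new_keep = np.abs(rv - med) <= nsigma * std
        if np.array_equal(new_keep, keep):
            break  # converged: membership no longer changes
        keep = new_keep
    w = 1.0 / err[keep] ** 2  # inverse-variance weights (assumption)
    rv_oc = np.sum(w * rv[keep]) / np.sum(w)
    mad = np.median(np.abs(rv[keep] - np.median(rv[keep])))
    return rv_oc, mad, int(keep.sum())
```

For example, a cluster of ten stars near 10 km s−1 plus one interloper at 30 km s−1 yields RVOC ≈ 10.0 km s−1 and a MAD of 0.1 km s−1, with the interloper rejected in the first iteration.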

Fig. 22.

Distribution of the MAD of the RVs derived from SoS in red and Mermilliod et al. (2008, 2009) in blue for the 55 clusters. The OCs contain more than 3 stars and the MAD is calculated after a 3σ outlier removal for both samples.

Fig. 23.

RV distributions of the 15 clusters from Mermilliod et al. (2008, 2009) with more than 15 stars. Red histograms represent the SoS results and blue histograms the literature. The Gaussian kernel-density estimates are plotted as shaded areas. The median and MAD values after the 3σ outlier removal are also given for both samples.

We note that for 27 of the 55 clusters the SoS RVs are derived from the homogenisation of two or more surveys, while for the rest they come only from the homogenised Gaia RVs. The SoS RVs for the clusters in Fig. 23 come from the homogenisation of all six surveys, except for NGC 2506 and NGC 6475, where the only RV source is Gaia. Interestingly, these two clusters exhibit the highest MAD among the most populous OCs in this sample. The results of Figs. 22 and 23 show that the homogenised RVs in SoS agree within 0.26 km s−1 with the high-quality CORAVEL studies, which were designed to reach an accuracy of 0.20−0.30 km s−1. Therefore, our homogenisation and calibration procedures allowed us to effectively overcome the limitations of each individual survey.

The study of open clusters has received a tremendous boost from Gaia data. Analyses based on astrometric and kinematic criteria (e.g. Cantat-Gaudin et al. 2018; Liu & Pang 2019; Castro-Ginard et al. 2020; Cantat-Gaudin & Anders 2020) have provided catalogues of more than 2000 clusters, hundreds of which are newly discovered and have yet to be characterised (e.g. Soubiran et al. 2018b). The present release of SoS can contribute significantly to this effort, for instance by investigating the kinematics of a sample of a few thousand OCs from Gaia data with an accuracy of 0.26 km s−1 in RVOC. We select the recent sample of Cantat-Gaudin & Anders (2020), comprising 2014 OCs, and keep the member stars with a membership probability higher than 70%. From this sample, we select the clusters with more than 3 stars in common with SoS, which amount to a total of 532 OCs. We note that not all OCs observed with Gaia have RV measurements in Gaia DR2. Even where the Gaia RV coverage of these OCs is incomplete, SoS incorporates ground-based RVs, which demonstrates the synergy between the surveys and the full exploitation of the available data.
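The cluster selection can be sketched as a simple filter over a membership table cross-matched on Gaia source_id. This is an illustrative sketch only. The (source_id, cluster, probability) tuple layout, the function name, and the toy data are hypothetical, and "more than 3 stars" is encoded as at least four.

```python
from collections import defaultdict


def select_clusters(members, sos_ids, min_proba=0.7, min_stars=4):
    """Keep clusters with at least `min_stars` high-probability members in SoS.

    members: iterable of (source_id, cluster_name, membership_probability)
    sos_ids: Gaia source_ids that have an RV in SoS
    min_stars=4 encodes "more than 3 stars in common".
    """
    sos_ids = set(sos_ids)
    per_cluster = defaultdict(list)
    for source_id, cluster, proba in members:
        # "higher than 70%" is taken as a strict inequality.
        if proba > min_proba and source_id in sos_ids:
            per_cluster[cluster].append(source_id)
    return {name: ids for name, ids in per_cluster.items() if len(ids) >= min_stars}


# Hypothetical membership list. Cluster "A" keeps four stars because star 5
# fails the probability cut. Cluster "B" has only three stars with SoS RVs and
# is dropped.
example_members = [
    (1, "A", 0.9), (2, "A", 0.8), (3, "A", 1.0), (4, "A", 0.75), (5, "A", 0.5),
    (6, "B", 0.9), (7, "B", 0.9), (8, "B", 0.9), (9, "B", 0.9),
]
example_sos_ids = {1, 2, 3, 4, 5, 6, 7, 8}
selected = select_clusters(example_members, example_sos_ids)
print(selected)
```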

As in the previous OC analysis, we calculate the weighted mean RV of each cluster and its corresponding error, which are listed in Table 10. The distribution of these clusters in the Galaxy is shown in Fig. 24, which also reveals the Galactic rotation in the plane. Some of these OCs are poorly studied, such as some of the 41 new clusters detected by Cantat-Gaudin et al. (2019) and some of the few hundred clusters reported by Castro-Ginard et al. (2020).

Fig. 24.

Spatial distribution of the 532 OCs observed with Gaia, colour coded by their median SoS RVs. The X and Y distances are taken from Cantat-Gaudin & Anders (2020).

Table 10.

Parameters of 532 OCs from Cantat-Gaudin & Anders (2020) with more than 3 stars in SoS.

This test is a first step towards a self-consistent study of OCs with SoS. In the next release, we will provide homogeneous metallicities and abundances for these clusters. In fact, we already have iron metallicity measurements for 294 OCs of this sample, which makes it a unique sample for tracing the history of our Galaxy.

8. Conclusions

The SoS is a project to gather RV measurements from the largest spectroscopic surveys (APOGEE, GALAH, GES, RAVE, and LAMOST), together with Gaia. Its goal is to deliver homogeneous RV determinations to the community in what is possibly the largest catalogue to date. Combining data from different catalogues is not trivial because different surveys suffer from different biases. An additional challenge in our homogenisation process was the large volume of data, which forced us to find time- and resource-efficient solutions for the data manipulation. Our homogenisation followed the steps below:

  • We pre-process the data sets to understand the strengths and limitations of each survey and to select samples of reliable stars to use in the homogenisation procedure.

  • We perform the XM of the ground-based surveys with Gaia and evaluate its efficiency by identifying duplicated sources and assessing the best Gaia matches based on magnitudes and RVs.

  • We normalise the RV errors of each survey based on (i) the repeated measurements and (ii) the TCH method. This process delivers normalisation factors by which the RV errors of each survey are multiplied. According to our analysis, LAMOST and RAVE overestimate their errors, APOGEE and GALAH underestimate them, and Gaia and GES have better-determined errors.

  • We first calibrate the Gaia RVs as a function of magnitude, metallicity, and effective temperature against each ground-based survey separately. From these fits we derive calibration coefficients to apply to the full Gaia DR2 sample.

  • Then we calibrate the RVs of the five ground-based surveys to the new Gaia reference frame as a function of effective temperature, surface gravity, metallicity, and S/N, using a multiple regression function.

  • The external ZP is set by comparison with the Gaia RV standard stars.

  • We further validate SoS by comparison with the GC survey, which gives an accuracy of 0.31 km s−1. Moreover, we calculate the median RVs of the OCs in the MM sample and compare them and their MAD with SoS, finding excellent agreement with this high-resolution, homogeneous sample. We also provide median RVs and MAD for 532 OCs from the Gaia-based catalogue of Cantat-Gaudin & Anders (2020), some of them poorly studied.
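The two error-normalisation ideas in the list above can be illustrated on simulated data. This sketch assumes uncorrelated, Gaussian errors. For repeated measurements, it takes the normalisation factor to be the standard deviation of the paired RV differences divided by their combined quoted errors. For the TCH method, it uses the standard three-cornered-hat relations, in which the scatter of each survey follows from the variances of the three pairwise differences. Converting a TCH scatter into a normalisation factor (e.g. dividing it by a typical quoted error) is left out. The robust estimators and sample cuts used for SoS are not reproduced. All names and numbers are hypothetical.

```python
import numpy as np


def repeat_factor(rv1, err1, rv2, err2):
    """Error scale factor from pairs of repeated RV measurements of the same stars.

    If the quoted errors are right, (rv1 - rv2) / sqrt(err1**2 + err2**2)
    has unit standard deviation; its actual standard deviation is the factor
    by which the quoted errors should be multiplied.
    """
    z = (np.asarray(rv1, float) - np.asarray(rv2, float)) / np.hypot(err1, err2)
    return np.std(z)


def three_cornered_hat(a, b, c):
    """Scatter of three independent RV sets measured for the same stars.

    With uncorrelated errors, var(a - b) = s_a**2 + s_b**2, etc., so each
    s**2 follows from the three pairwise difference variances.
    """
    a, b, c = (np.asarray(x, float) for x in (a, b, c))
    v_ab, v_ac, v_bc = np.var(a - b), np.var(a - c), np.var(b - c)
    s2 = {
        "a": 0.5 * (v_ab + v_ac - v_bc),
        "b": 0.5 * (v_ab + v_bc - v_ac),
        "c": 0.5 * (v_ac + v_bc - v_ab),
    }
    # Noise or correlated errors can make an estimate negative; clip at zero.
    return {k: float(np.sqrt(max(v, 0.0))) for k, v in s2.items()}


rng = np.random.default_rng(42)
n = 100_000
true_rv = rng.normal(0.0, 30.0, n)

# Repeats from one survey: true scatter 0.5 km/s but quoted errors of 1 km/s.
quoted = np.full(n, 1.0)
factor = repeat_factor(true_rv + rng.normal(0, 0.5, n), quoted,
                       true_rv + rng.normal(0, 0.5, n), quoted)

# Three surveys whose scatters (0.5, 1 and 2 km/s) are unknown to the method.
sigmas = three_cornered_hat(true_rv + rng.normal(0, 0.5, n),
                            true_rv + rng.normal(0, 1.0, n),
                            true_rv + rng.normal(0, 2.0, n))
print(f"repeat factor = {factor:.2f}; TCH scatters = {sigmas}")
```

A factor below 1, as in this toy case, corresponds to overestimated errors, and a factor above 1 to underestimated ones.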

The SoS catalogue contains RVs for around 11 million sources distributed in both hemispheres. The catalogue's precision is set by its error distribution, which peaks at 0.09 km s−1. Its accuracy, estimated by comparison with external catalogues, is 0.16−0.31 km s−1. Our next goal is to perform a similar analysis of the main stellar parameters (effective temperature, surface gravity, and metallicity) and of chemical abundances, to provide an unbiased compilation of the main properties of the stars in our Galaxy.


1

Usually the iron abundance is used as a proxy for the overall metallicity of a star.

2

phot_bp_rp_excess_factor is the ratio of the sum of GBP and GRP fluxes over the G integrated flux.

5

LAMOST DR5: http://dr5.lamost.org/

7

We note here that, although good parameters and abundances were later published for RAVE, its initial goal was to derive accurate RVs, not accurate chemistry for individual stars.

8

Weighted mean and its error:

Acknowledgments

We thank the anonymous referee for the useful comments and suggestions that helped improve this work. We thank Lucio Angelo Antonelli, Matteo Perri, Peter B. Stetson, Ricardo Carrera, Matteo Monelli, Germano Sacco, and Michele Fabrizio for their contributions to this work. This research has been partially supported by the following grants: MIUR Premiale “Gaia-ESO survey” (PI Sofia Randich), MIUR Premiale “MiTiC: Mining the Cosmos” (PI Bianca Garilli), the ASI-INAF contract 2014-049-R.O: “Realizzazione attività tecniche/scientifiche presso ASDC” (PI Angelo Antonelli), Fondazione Cassa di Risparmio di Firenze, project “Know the star, know the planet” (PI Elena Pancino), and Progetto Main Stream INAF: “Chemo-dynamics of globular clusters: the Gaia revolution” (PI Elena Pancino). CG acknowledges support from the State Research Agency (AEI) of the Spanish Ministry of Science, Innovation and Universities (MCIU) and the European Regional Development Fund (FEDER) under grant AYA2017-89076-P. TM acknowledges financial support from the Spanish Ministry of Science and Innovation (MICINN) through the Spanish State Research Agency, under the Severo Ochoa Programme 2020-2023 (CEX2019-000920-S). This work uses data from the European Space Agency (ESA) space mission Gaia. Gaia data are being processed by the Gaia Data Processing and Analysis Consortium (DPAC). Funding for the DPAC is provided by national institutions, in particular the institutions participating in the Gaia Multi-Lateral Agreement (MLA). We acknowledge the use of the public data products from RAVE (https://www.rave-survey.org). 
Funding for RAVE has been provided by: the Leibniz Institute for Astrophysics Potsdam; the Australian Astronomical Observatory; the Australian National University; the Australian Research Council; the French National Research Agency; the German Research Foundation (SPP 1177 and SFB 881); the European Research Council (ERC-StG 240271 Galactica); the Istituto Nazionale di Astrofisica at Padova; The Johns Hopkins University; the National Science Foundation of the USA (AST-0908326); the W.M. Keck foundation; the Macquarie University; the Netherlands Research School for Astronomy; the Natural Sciences and Engineering Research Council of Canada; the Slovenian Research Agency; the Swiss National Science Foundation; the Science & Technology Facilities Council of the UK; Opticon; Strasbourg Observatory; and the Universities of Basel, Groningen, Heidelberg and Sydney. This work made use of GALAH data (https://galah-survey.org) acquired through the Australian Astronomical Observatory, under programmes: A/2013B/13 (The GALAH pilot survey); A/2014A/25, A/2015A/19, A2017A/18 (The GALAH survey). We acknowledge the traditional owners of the land on which the AAT stands, the Gamilaraay people, and pay our respects to elders past and present. We acknowledge the use of the public data products from APOGEE (https://www.sdss.org). Funding for the Sloan Digital Sky Survey IV has been provided by the Alfred P. Sloan Foundation, the US Department of Energy Office of Science, and the Participating Institutions. SDSS acknowledges support and resources from the Center for High-Performance Computing at the University of Utah. 
SDSS is managed by the Astrophysical Research Consortium for the Participating Institutions of the SDSS Collaboration including the Brazilian Participation Group, the Carnegie Institution for Science, Carnegie Mellon University, Center for Astrophysics | Harvard & Smithsonian, the Chilean Participation Group, the French Participation Group, Instituto de Astrofísica de Canarias, The Johns Hopkins University, Kavli Institute for the Physics and Mathematics of the Universe/University of Tokyo, the Korean Participation Group, Lawrence Berkeley National Laboratory, Leibniz Institut für Astrophysik Potsdam, Max-Planck-Institut für Astronomie (MPIA Heidelberg), Max-Planck-Institut für Astrophysik (MPA Garching), Max-Planck-Institut für Extraterrestrische Physik, National Astronomical Observatories of China, New Mexico State University, New York University, University of Notre Dame, Observatório Nacional/MCTI, The Ohio State University, Pennsylvania State University, Shanghai Astronomical Observatory, United Kingdom Participation Group, Universidad Nacional Autónoma de México, University of Arizona, University of Colorado Boulder, University of Oxford, University of Portsmouth, University of Utah, University of Virginia, University of Washington, University of Wisconsin, Vanderbilt University, and Yale University. Support to the development of the GES (https://www.gaia-eso.eu) has been provided in part by the European Science Foundation Gaia Research for European Astronomy Training (GREAT-ESF) Research Network Programme. Guoshoujing Telescope (the Large Sky Area Multi-Object Fiber Spectroscopic Telescope LAMOST) is a National Major Scientific Project built by the Chinese Academy of Sciences. Funding for the project has been provided by the National Development and Reform Commission. LAMOST (http://www.lamost.org) is operated and managed by the National Astronomical Observatories, Chinese Academy of Sciences. 
This research has made use of the Gaia Portal catalogues access tool, Agenzia Spaziale Italiana (ASI) – Space Science Data Center (SSDC), Rome, Italy (http://gaiaportal.ssdc.asi.it). In Sect. 5.1 we used the Kapteyn packages (Terlouw & Vogelaar 2014) for the fits. Some of the figures in this paper have been plotted using the healpy and HEALPix packages (https://healpy.readthedocs.io, Zonca et al. 2019; Górski et al. 2005). We also used the python packages: astropy (http://www.astropy.org, Astropy Collaboration 2013, 2018), numpy (https://numpy.org, Harris et al. 2020), scipy (https://www.scipy.org, Virtanen et al. 2020), pandas (https://pandas.pydata.org, McKinney 2010), and matplotlib (https://matplotlib.org, Hunter 2007). We used NASA Astrophysics Data System Bibliographic Services, the arXiv pre-print server operated by Cornell University, and the VizieR catalogue access tool, CDS, Strasbourg, France (DOI: 10.26093/cds/vizier).

References

  1. Ahumada, R., Allende Prieto, C., Almeida, A., et al. 2020, ApJS, 249, 3
  2. Allende Prieto, C., Sivarani, T., Beers, T. C., et al. 2008, AJ, 136, 2070
  3. Anguiano, B., Majewski, S. R., Allende-Prieto, C., et al. 2018, A&A, 620, A76
  4. Arenou, F., Luri, X., Babusiaux, C., et al. 2018, A&A, 616, A17
  5. Astropy Collaboration (Robitaille, T. P., et al.) 2013, A&A, 558, A33
  6. Astropy Collaboration (Price-Whelan, A. M., et al.) 2018, AJ, 156, 123
  7. Aurière, M. 2003, in EAS Pub. Ser., eds. J. Arnaud, & N. Meunier, 9, 105
  8. Balbinot, E., & Gieles, M. 2018, MNRAS, 474, 2479
  9. Baranne, A., Mayor, M., & Poncet, J. L. 1979, Vistas Astron., 23, 279
  10. Baranne, A., Queloz, D., Mayor, M., et al. 1996, A&AS, 119, 373
  11. Bevington, P. R. 1969, Data Reduction and Error Analysis for the Physical Sciences (New York: McGraw-Hill)
  12. Binney, J., Merrifield, M., & Wegner, G. A. 2000, Am. J. Phys., 68, 95
  13. Birko, D., Zwitter, T., Grebel, E. K., et al. 2019, AJ, 158, 155
  14. Boubert, D., Strader, J., Aguado, D., et al. 2019, MNRAS, 486, 2618
  15. Buder, S., Asplund, M., Duong, L., et al. 2018, MNRAS, 478, 4513
  16. Buder, S., Sharma, S., Kos, J., et al. 2021, MNRAS, 506, 150
  17. Cantat-Gaudin, T., & Anders, F. 2020, A&A, 633, A99
  18. Cantat-Gaudin, T., Jordi, C., Vallenari, A., et al. 2018, A&A, 618, A93
  19. Cantat-Gaudin, T., Krone-Martins, A., Sedaghat, N., et al. 2019, A&A, 624, A126
  20. Castro-Ginard, A., Jordi, C., Luri, X., et al. 2020, A&A, 635, A45
  21. Cottaar, M., Covey, K. R., Meyer, M. R., et al. 2014, ApJ, 794, 125
  22. Dalton, G., Trager, S., Abrams, D. C., et al. 2018, in Ground-based and Airborne Instrumentation for Astronomy VII, eds. C. J. Evans, L. Simard, & H. Takami, SPIE Conf. Ser., 10702, 107021B
  23. Deepak, & Reddy, B. E. 2018, AJ, 156, 170
  24. de Jong, R. S., Agertz, O., Berbel, A. A., et al. 2019, The Messenger, 175, 3
  25. Deng, L.-C., Newberg, H. J., Liu, C., et al. 2012, Res. Astron. Astrophys., 12, 735
  26. De Silva, G. M., Freeman, K. C., Bland-Hawthorn, J., et al. 2015, MNRAS, 449, 2604
  27. Duchêne, G., & Kraus, A. 2013, ARA&A, 51, 269
  28. Duquennoy, A., & Mayor, M. 1991, A&A, 500, 337
  29. Evans, D. W., Riello, M., De Angeli, F., et al. 2018, A&A, 616, A4
  30. Gaia Collaboration (Prusti, T., et al.) 2016, A&A, 595, A1
  31. Gaia Collaboration (Babusiaux, C., et al.) 2018a, A&A, 616, A10
  32. Gaia Collaboration (Brown, A. G. A., et al.) 2018b, A&A, 616, A1
  33. Gilmore, G., Randich, S., Asplund, M., et al. 2012, The Messenger, 147, 25
  34. Gnanadesikan, R., & Kettenring, J. R. 1972, Biometrics, 28, 81
  35. Górski, K. M., Hivon, E., Banday, A. J., et al. 2005, ApJ, 622, 759
  36. Gray, J., & Allan, D. 1974, 28th Annual Symposium on Frequency Control, 243
  37. Guiglion, G., de Laverny, P., Recio-Blanco, A., et al. 2016, A&A, 595, A18
  38. Harris, C. R., Millman, K. J., van der Walt, S. J., et al. 2020, Nature, 585, 357
  39. Helmi, A. 2020, ARA&A, 58, 205
  40. Hełminiak, K. G., Ukita, N., Kambe, E., et al. 2017, MNRAS, 468, 1726
  41. Hinkel, N. R., Timmes, F. X., Young, P. A., Pagano, M. D., & Turnbull, M. C. 2014, AJ, 148, 54
  42. Hunter, J. D. 2007, Comput. Sci. Eng., 9, 90
  43. Jackson, R. J., Jeffries, R. D., Lewis, J., et al. 2015, A&A, 580, A75
  44. Jönsson, H., Holtzman, J. A., Prieto, C. A., et al. 2020, AJ, 160, 120
  45. Katz, D., Sartoretti, P., Cropper, M., et al. 2019, A&A, 622, A205
  46. Kharchenko, N. V., Piskunov, A. E., Schilbach, E., Röser, S., & Scholz, R. D. 2013, A&A, 558, A53
  47. Kollmeier, J. A., Zasowski, G., Rix, H. W., et al. 2017, ArXiv e-prints [arXiv:1711.03234]
  48. Koposov, S. E., Gilmore, G., Walker, M. G., et al. 2011, ApJ, 736, 146
  49. Lee, Y. S., Beers, T. C., Sivarani, T., et al. 2008a, AJ, 136, 2022
  50. Lee, Y. S., Beers, T. C., Sivarani, T., et al. 2008b, AJ, 136, 2050
  51. Lee, Y. S., Beers, T. C., Carlin, J. L., et al. 2015, AJ, 150, 187
  52. Lindegren, L. 2018, Re-normalising the Astrometric Chi-square in Gaia DR2, GAIA-C3-TN-LU-LL-124
  53. Liu, L., & Pang, X. 2019, ApJS, 245, 32
  54. Luo, A. L., Zhao, Y.-H., Zhao, G., et al. 2015, Res. Astron. Astrophys., 15, 1095
  55. Mahalanobis, P. C. 1936, Proc. Natl. Inst. Sci. India, 1, 49
  56. Majewski, S. R., Schiavon, R. P., Frinchaboy, P. M., et al. 2017, AJ, 154, 94
  57. Malkin, Z. 2013a, A&A, 558, A29
  58. Malkin, Z. M. 2013b, Astron. Rep., 57, 882
  59. Marrese, P. M., Marinoni, S., Fabrizio, M., & Giuffrida, G. 2017, A&A, 607, A105
  60. Marrese, P. M., Marinoni, S., Fabrizio, M., & Altavilla, G. 2019, A&A, 621, A144
  61. Mayor, M. 1985, in Stellar Radial Velocities, eds. A. G. D. Philip, & D. W. Latham, 21
  62. Mayor, M., Pepe, F., Queloz, D., et al. 2003, The Messenger, 114, 20
  63. McKinney, W. 2010, in Proceedings of the 9th Python in Science Conference, eds. S. van der Walt, & J. Millman, 56
  64. Merle, T., Van Eck, S., Jorissen, A., et al. 2017, A&A, 608, A95
  65. Mermilliod, J. C., Mayor, M., & Udry, S. 2008, A&A, 485, 303
  66. Mermilliod, J. C., Mayor, M., & Udry, S. 2009, A&A, 498, 949
  67. Nordström, B., Mayor, M., Andersen, J., et al. 2004, A&A, 418, 989
  68. Pasquini, L., Avila, G., Blecha, A., et al. 2002, The Messenger, 110, 1
  69. Perruchot, S., Kohler, D., Bouchy, F., et al. 2008, in Ground-based and Airborne Instrumentation for Astronomy II, eds. I. S. McLean, & M. M. Casali, SPIE Conf. Ser., 7014, 70140J
  70. Price-Whelan, A. M., Hogg, D. W., Rix, H.-W., et al. 2020, ApJ, 895, 2
  71. Qian, S.-B., Shi, X.-D., Zhu, L.-Y., et al. 2019, Res. Astron. Astrophys., 19, 064
  72. Randich, S., Gilmore, G., & Gaia-ESO Consortium 2013, The Messenger, 154, 47
  73. Sacco, G. G., Morbidelli, L., Franciosini, E., et al. 2014, A&A, 565, A113
  74. Sartoretti, P., Katz, D., Cropper, M., et al. 2018, A&A, 616, A6
  75. Sheinis, A., Anguiano, B., Asplund, M., et al. 2015, J. Astron. Telesc. Instrum. Syst., 1, 035002
  76. Soubiran, C., Le Campion, J.-F., Brouillet, N., & Chemin, L. 2016, A&A, 591, A118
  77. Soubiran, C., Jasniewicz, G., Chemin, L., et al. 2018a, A&A, 616, A7
  78. Soubiran, C., Cantat-Gaudin, T., Romero-Gómez, M., et al. 2018b, A&A, 619, A155
  79. Steinmetz, M., Zwitter, T., Siebert, A., et al. 2006, AJ, 132, 1645
  80. Steinmetz, M., Matijevič, G., Enke, H., et al. 2020a, AJ, 160, 82
  81. Steinmetz, M., Guiglion, G., McMillan, P. J., et al. 2020b, AJ, 160, 83
  82. Takada, M., Ellis, R. S., Chiba, M., et al. 2014, PASJ, 66, R1
  83. Terlouw, J. P., & Vogelaar, M. G. R. 2014, Kapteyn Package, version 2.3b3, Kapteyn Astronomical Institute, Groningen, http://www.astro.rug.nl/software/kapteyn/
  84. Tian, H.-J., El-Badry, K., Rix, H.-W., & Gould, A. 2020, ApJS, 246, 4
  85. Torra, F., Castañeda, J., Fabricius, C., et al. 2021, A&A, 649, A10
  86. Traven, G., Feltzing, S., Merle, T., et al. 2020, A&A, 638, A145
  87. Trifonov, T., Tal-Or, L., Zechmeister, M., et al. 2020, A&A, 636, A74
  88. Virtanen, P., Gommers, R., Oliphant, T. E., et al. 2020, Nat. Meth., 17, 261
  89. Xiang, M. S., Liu, X. W., Yuan, H. B., et al. 2015, MNRAS, 448, 822
  90. Yanny, B., Rockosi, C., Newberg, H. J., et al. 2009, AJ, 137, 4377
  91. Zhao, G., Zhao, Y.-H., Chu, Y.-Q., Jing, Y.-P., & Deng, L.-C. 2012, Res. Astron. Astrophys., 12, 723
  92. Zonca, A., Singer, L., Lenz, D., et al. 2019, J. Open Sour. Softw., 4, 1298
  93. Zwitter, T., Siebert, A., Munari, U., et al. 2008, AJ, 136, 421
  94. Zwitter, T., Kos, J., Chiavassa, A., et al. 2018, MNRAS, 481, 645

Appendix A: Comparisons of [Fe/H] and Teff between surveys

RAVE was primarily designed as a Galactic archaeology survey focusing on reliable trends for stellar populations rather than on precise stellar parameters for individual stars. Therefore, differences in metallicity with respect to the other surveys are expected. In Fig. A.1, we present the differences in [Fe/H] between pairs of surveys. RAVE stars show the same trends with respect to each of the other surveys. We therefore calibrate the RAVE metallicities with a correction function obtained by fitting a second-order polynomial:

(A.1)

Fig. A.1.

Iron metallicity comparisons of stars in common for the paired surveys.

where α = −0.089 ± 0.002, β = 0.432 ± 0.011, γ = 0.55 ± 0.02.

Similarly, Fig. A.2 shows the Teff comparisons between pairs of surveys. In this case, LAMOST shows strong trends, indicating that its Teff scale needs a correction. Analogously to the RAVE metallicities, we calibrate the LAMOST Teff, in this case with a linear fit:

(A.2)

Fig. A.2.

Differences in Teff for stars in common for the paired surveys.

where α = 0.122 ± 0.004, β = −606 ± 2.
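The explicit forms of Eqs. (A.1) and (A.2) are not reproduced here, so the following sketch only illustrates the general procedure with numpy.polyfit. The offset between a reference scale and a survey is fitted as a polynomial in the survey value (second order for [Fe/H], first order for Teff) and then added back to the survey values. The choice of variable, the sign convention, and the toy coefficients are assumptions. The published α, β, and γ should not be plugged into this code without checking the exact form of the equations.

```python
import numpy as np


def fit_trend(x_survey, x_reference, degree):
    """Polynomial fit of (reference - survey) as a function of the survey value."""
    x_survey = np.asarray(x_survey, float)
    delta = np.asarray(x_reference, float) - x_survey
    return np.polyfit(x_survey, delta, degree)  # highest power first


def apply_trend(x_survey, coeffs):
    """Put survey values on the reference scale by adding the fitted trend."""
    x_survey = np.asarray(x_survey, float)
    return x_survey + np.polyval(coeffs, x_survey)


rng = np.random.default_rng(1)
n = 5000

# [Fe/H]: quadratic trend, as for RAVE (toy coefficients, not those of Eq. A.1).
feh_survey = rng.uniform(-1.5, 0.4, n)
feh_trend = np.array([-0.1, 0.3, 0.05])
feh_ref = feh_survey + np.polyval(feh_trend, feh_survey) + rng.normal(0, 0.05, n)
feh_coeffs = fit_trend(feh_survey, feh_ref, degree=2)
feh_residual = np.median(feh_ref - apply_trend(feh_survey, feh_coeffs))

# Teff: linear trend, as for LAMOST (toy coefficients, not those of Eq. A.2).
teff_survey = rng.uniform(4000, 7000, n)
teff_ref = teff_survey + 0.05 * teff_survey - 300 + rng.normal(0, 50, n)
teff_coeffs = fit_trend(teff_survey, teff_ref, degree=1)
print(feh_coeffs, teff_coeffs, feh_residual)
```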

Appendix B: The RV calibration coefficients

Table B.1 summarises all the coefficients for the calibration of the Gaia RVs presented in Sect. 5.1, and the results of the fits are plotted in Figs. 9–11. In turn, Table B.2 lists the coefficients used to calibrate the survey RVs to the calibrated Gaia RV reference frame, as discussed in Sect. 5.2, and the results of these fits are plotted in Fig. B.1. Additionally, Fig. B.2 compares the calibrated RVs of the stars in common between pairs of surveys.
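The survey-to-Gaia calibration in Table B.2 is a multiple regression on Teff, log g, [Fe/H], and S/N. The following is a sketch under a stated assumption: if Eq. 8 is linear in each parameter (its exact form is not reproduced here), the fit reduces to ordinary least squares with numpy.linalg.lstsq. The keyword names, the toy coefficients, and the simulated stars are hypothetical. A survey without S/N measurements, such as GES, simply omits that keyword.

```python
import numpy as np


def fit_rv_offset(delta_rv, **params):
    """Least-squares fit of delta_rv = c0 + sum_k c_k * params[k].

    delta_rv: calibrated Gaia RV minus survey RV for stars in common.
    params:   named 1D arrays, e.g. teff, logg, feh, snr (omit snr for GES).
    """
    names = list(params)
    design = np.column_stack(
        [np.ones(len(delta_rv))] + [np.asarray(params[k], float) for k in names]
    )
    coeffs, *_ = np.linalg.lstsq(design, np.asarray(delta_rv, float), rcond=None)
    return dict(zip(["const"] + names, coeffs))


def apply_rv_offset(rv_survey, coeffs, **params):
    """Shift survey RVs onto the calibrated Gaia frame using the fitted model."""
    model = coeffs["const"] + sum(coeffs[k] * np.asarray(v, float) for k, v in params.items())
    return np.asarray(rv_survey, float) + model


rng = np.random.default_rng(7)
n = 20_000
stars = {
    "teff": rng.uniform(4000, 7000, n),
    "logg": rng.uniform(1.0, 5.0, n),
    "feh": rng.uniform(-2.0, 0.5, n),
    "snr": rng.uniform(10, 200, n),
}
toy_truth = {"const": 0.3, "teff": -5e-5, "logg": 0.1, "feh": -0.2, "snr": 0.002}
rv_gaia = rng.normal(0, 40, n)
model_true = toy_truth["const"] + sum(toy_truth[k] * stars[k] for k in stars)
rv_survey = rv_gaia - model_true + rng.normal(0, 0.3, n)

coeffs = fit_rv_offset(rv_gaia - rv_survey, **stars)
rv_calibrated = apply_rv_offset(rv_survey, coeffs, **stars)
print({k: round(float(v), 5) for k, v in coeffs.items()})
```

In this form, each coefficient can be compared directly with a column of a table such as Table B.2. Non-linear terms would need extra design-matrix columns.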

Fig. B.1.

Calibration of the survey RVs for stars in common with Gaia based on Eq. 8 as a function of Teff, log g, [Fe/H], and S/N. GES does not have S/N measurements. The ΔRV on the y-axis is the calibrated Gaia RV minus the survey RV before (blue points) and after the calibration (red points). The colours and symbols are also described in Fig. 9.

Fig. B.2.

Comparison of the calibrated RVs for stars in common between surveys.

Table B.1.

The coefficients for all the Gaia RV calibrations from Eqs. 5–7 in Sect. 5.1 for each survey and the final coefficients derived from their weighted mean.

Table B.2.

The coefficients for the RV calibration of Eq. 8 for each survey. The respective plots are shown in Fig. B.1.

Appendix C: Binarity parameters

Another parameter that can indicate binarity is the Renormalised Unit Weight Error (RUWE), which assesses the quality and reliability of the Gaia DR2 astrometric solution. Binarity is one of the causes of inconsistent astrometric solutions, which lead to large RUWE values (> 1.4). We calculated the RUWE for the stars with Gaia parameters and find that binary stars indeed show higher RUWE values (see Fig. C.1).
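For reference, the RUWE follows from two Gaia DR2 columns. The unit weight error, UWE = sqrt(astrometric_chi2_al / (astrometric_n_good_obs_al − 5)), is divided by a reference value u0 that depends on G magnitude and BP−RP colour and is tabulated with the technical note of Lindegren (2018). The sketch below assumes that the caller supplies u0 from that table, which is not reproduced here. The placeholder u0 = 1 in the example is for illustration only. A RUWE above 1.4 flags a poor single-star fit, which binarity may cause but need not.

```python
import numpy as np

RUWE_LIMIT = 1.4  # threshold quoted in the text


def unit_weight_error(chi2_al, n_good_obs_al):
    """UWE of a five-parameter Gaia DR2 solution.

    Inputs correspond to the DR2 columns astrometric_chi2_al and
    astrometric_n_good_obs_al.
    """
    dof = np.asarray(n_good_obs_al, float) - 5.0
    return np.sqrt(np.asarray(chi2_al, float) / dof)


def ruwe(chi2_al, n_good_obs_al, u0):
    """RUWE = UWE / u0.

    u0(G, BP-RP) must come from the reference table distributed with
    Lindegren (2018); it is passed in here, not computed.
    """
    return unit_weight_error(chi2_al, n_good_obs_al) / np.asarray(u0, float)


# Two toy sources with placeholder u0 = 1 (NOT real table values).
values = ruwe([300.0, 1200.0], [305, 305], [1.0, 1.0])
flagged = values > RUWE_LIMIT
print(values, flagged)
```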

Fig. C.1.

Binaries in SoS. The y axis is the RUWE for stars with Gaia parameters. The x axis shows the stellar parameters from the surveys. The red points are the median binned values for the binary population, and the blue points are those for the rest of the SoS sample, defined here as non-binaries.

All Tables

Table 1.

Description of number of objects included per survey.

Table 2.

Results from the analysis of the duplicated sources.

Table 3.

Statistics for the paired RV differences from the repeated measurements.

Table 4.

Summary of the error normalisation factor for all surveys.

Table 5.

Statistics of the ΔRV (=RVGaia–RVsurvey).

Table 6.

Parameter space covered by all the Gaia internal RV calibrations computed in Sect. 5.1.

Table 8.

Auxiliary catalogues to include original data from the catalogues and intermediate products before final homogenisation.

Table 9.

Statistics for the RV differences of the external catalogues with SoS.

Table 10.

Parameters of 532 OCs from Cantat-Gaudin & Anders (2020) with more than 3 stars in SoS.

Table B.1.

The coefficients for all the Gaia RV calibrations from Eqs. 57 in Sect. 5.1 for each survey and the final coefficients derived from their weighted mean.

Table B.2.

The coefficients for the RV calibration of Eq. 8 for each survey. The respective plots are shown in Fig. B.1.

All Figures

thumbnail Fig. 1.

Surface density distribution in a Mollweide projection of the galactic coordinates of the six surveys used in this work, obtained using a HEALPix (Hierarchical Equal Area isoLatitude Pixelisation) tessellation with different resolutions. The colour scales of the maps are in logarithmic scale and are different for each survey.

In the text
thumbnail Fig. 2.

G magnitude distribution of the surveys in this work for stars in common with Gaia.

In the text
thumbnail Fig. 3.

HR diagrams of the surveys used in this work using Gaia photometry and parallaxes, colour coded to the stellar density in log scale.

In the text
thumbnail Fig. 4.

Sketch of four possible scenarios for the XM algorithm. Case 1: one to one match for isolated sources. Case 2: two stars have the same match which in most cases it is a duplicated source. Case 3: the best match is selected from a neighbourhood. Case 4: wrong match possibly because the right one is missing.

In the text
thumbnail Fig. 5.

G magnitude difference between Gaia match and the surveys: APOGEE, GALAH, GES, LAMOST, and RAVE respectively. The plots are colour coded to the stellar number density. The horizontal black lines indicate the ±3σ threshold for outliers per magnitude bin (10 bins in total).

In the text
thumbnail Fig. 6.

Distributions of the RV paired differences of the stars with multiple measurements in each survey normalised to their errors (solid lines). The Gaussian fits are plotted with dotted lines. The statistics of these distributions are given in Table 3.

In the text
thumbnail Fig. 7.

Histograms of the RV differences computed as: ΔRV = RVGaia–RVsurvey. The statistics of these distributions are given in Table 5.

In the text
thumbnail Fig. 8.

Initial RV differences of stars in common with Gaia as a function of: G magnitude, RV, Teff, log g, iron metallicity, and signal-to-noise ratio of the surveys. The S/N is scaled for visual convenience and GES does not provide S/N measurements. The RV differences are binned to contain more than 10 entries for each bin.

In the text
thumbnail Fig. 9.

Calibration of Gaia RVs for stars in common with APOGEE, GALAH, GES, and RAVE as a function of G mag derived by fitting a second degree polynomial (Eq. (5)). The blue and red points represent the median ΔRV of each bin with > 10 entries before and after the calibration respectively. The grey points are the binned ΔRV of the fit. The blue and red shadowed areas are the MAD of each bin. The bottom panels of each plot show the number of stars in each bin.

In the text
thumbnail Fig. 10.

Calibration of Gaia RVs for stars in common with APOGEE, GALAH, and GES as a function of iron metallicity. The colours and symbols are described in Fig. 9.

In the text
thumbnail Fig. 11.

Calibration of Gaia RVs as a function of Teff for APOGEE stars in common with Gaia to calibrate the Gaia RVs. The colours and symbols are described in Fig. 9.

In the text
thumbnail Fig. 12.

Calibrated RV differences of stars in common with Gaia as a function of: G magnitude, RV, Teff, log g, iron metallicity, and signal-to-noise ratio of the surveys. The symbol are the same as Fig. 8.

In the text
thumbnail Fig. 13.

Distribution of the normalised errors for all surveys.

In the text
thumbnail Fig. 14.

HR diagram for SoS colour coded to stellar density in log scale.

In the text
thumbnail Fig. 15.

Left panel: surface density distribution in Molleview projection for the stars in SoS in logarithmic scale. Right panel: same as left panel but colour coded to the final SoS RVs (median RV per pixel) for stars with |RV|< 40 km s−1. Both plots are in Galactic coordinates with pixel size of ∼0.46°.

In the text
thumbnail Fig. 16.

RV errors in SoS after the homogenisation process as a function of magnitude and stellar parameters. The errors (δRV and σRV) are binned to their median value. The black background corresponds to the 2D hexagonal binned values of the whole δRV in SoS.

In the text
thumbnail Fig. 17.

Distribution of errors (δRV and σRV) in SoS after the homogenisation process.

In the text
thumbnail Fig. 18.

Binaries in SoS. The y axis is the standard deviation in RV from SoS as derived for stars observed from more than one survey. The x axis shows the stellar parameters from the surveys. The red points are the median binned values for the binary population and the blue points the rest of the SoS sample defined here as non binaries.

Fig. 19.

RV comparison between SoS and the Gaia-STD stars as a function of magnitude, Teff, log g, and [Fe/H]. The red points are the median binned values and the shaded area is the MAD of each bin.

Fig. 20.

RV comparison between SoS and the GC survey as a function of magnitude, Teff, log g, and [Fe/H]. The red points are the median binned values and the shaded area is the MAD of each bin. The dashed line is the median difference.

Fig. 21.

RV comparison between SoS and SEGUE as a function of magnitude, Teff, log g, and [Fe/H]. The y-axis shows ΔRV = RVSoS − RVSEGUE. The blue and red points represent the median ΔRV of each bin with more than 3 entries before and after the calibration, respectively. The blue and red shaded areas are the MAD of each bin.

Fig. 22.

Distribution of the MAD of the RVs derived from SoS (red) and from Mermilliod et al. (2008, 2009) (blue) for the 55 clusters. Each OC contains more than 3 stars, and the MAD is calculated after 3σ outlier removal for both samples.
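
The per-cluster statistic in this caption (median and MAD after 3σ outlier removal) can be sketched as below. The caption does not specify the clipping details. We assume iterative clipping about the median using the standard deviation of the remaining stars, and the unscaled MAD. The function names are hypothetical.

```python
import numpy as np


def sigma_clip(values, nsigma=3.0, max_iter=10):
    """Iteratively drop points more than nsigma standard deviations from the median (assumed scheme)."""
    kept = np.asarray(values, dtype=float)
    for _ in range(max_iter):
        med, std = np.median(kept), np.std(kept)
        mask = np.abs(kept - med) <= nsigma * std
        if mask.all():  # nothing left to reject
            break
        kept = kept[mask]
    return kept


def cluster_median_mad(rvs, nsigma=3.0, min_stars=4):
    """Median RV and unscaled MAD of one cluster after clipping; None if it has 3 stars or fewer."""
    rvs = np.asarray(rvs, dtype=float)
    if rvs.size < min_stars:  # the caption keeps clusters with more than 3 stars
        return None
    clean = sigma_clip(rvs, nsigma)
    med = np.median(clean)
    return med, np.median(np.abs(clean - med))
```

Note that for very small clusters, a single outlier inflates the standard deviation enough that a 3σ cut may fail to reject it. This is one reason a minimum membership is useful.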

Fig. 23.

RV distributions of the 15 clusters from Mermilliod et al. (2008, 2009) with more than 15 stars. Red histograms represent the SoS results and blue histograms the literature values. The Gaussian kernel-density estimates are plotted as shaded areas. The median and MAD values after 3σ outlier removal are also given for both samples.
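
The shaded curves in this caption are Gaussian kernel-density estimates. A minimal NumPy sketch is shown below. The bandwidth choice is an assumption, since the caption does not state it. We use Scott's rule, h = σ n^(−1/5), which is the default rule of scipy.stats.gaussian_kde. The function name is hypothetical.

```python
import numpy as np


def gaussian_kde_1d(samples, grid, bandwidth=None):
    """Gaussian kernel-density estimate of 1D samples, evaluated on grid."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    if bandwidth is None:
        # Scott's rule in 1D: sample standard deviation times n**(-1/5)
        bandwidth = np.std(samples, ddof=1) * samples.size ** (-0.2)
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / (np.sqrt(2.0 * np.pi) * bandwidth)
    return kernels.mean(axis=1)  # mean of unit-area Gaussians, so the result integrates to 1
```

To overlay a KDE on a histogram of counts rather than densities, multiply it by the number of stars and the histogram bin width.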

Fig. 24.

Spatial distribution of the 532 OCs observed with Gaia, colour-coded by their median RVs from SoS. The X and Y distances are taken from Cantat-Gaudin & Anders (2020).

Fig. A.1.

Iron metallicity comparisons of stars in common for the paired surveys.

Fig. A.2.

Differences in Teff for stars in common for the paired surveys.

Fig. B.1.

Calibration of the survey RVs for stars in common with Gaia based on Eq. (7), as a function of Teff, log g, [Fe/H], and S/N. GES does not provide S/N measurements. The ΔRV on the y-axis is the calibrated Gaia RV minus the survey RV, before (blue points) and after (red points) the calibration. The colours and symbols are also described in Fig. 9.

Fig. B.2.

Comparison of the calibrated RVs for stars in common between surveys.

Fig. C.1.

Binaries in SoS. The y-axis is the RUWE for stars with Gaia parameters. The x-axis shows the stellar parameters from the surveys. The red points are the median binned values for the binary population, and the blue points those for the rest of the SoS sample, defined here as non-binaries.
