Open Access
Issue
A&A
Volume 684, April 2024
Article Number A148
Number of page(s) 31
Section Catalogs and data
DOI https://doi.org/10.1051/0004-6361/202347558
Published online 19 April 2024

© The Authors 2024

Licence: Creative Commons. Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This article is published in open access under the Subscribe to Open model.

1 Introduction

The Gaia-ESO Survey is a European Southern Observatory (ESO) public spectroscopic survey designed to observe 10⁵ stars. It sampled the main populations of the Milky Way: the disk, bulge, and halo. The observing programme and the science goals of the Gaia-ESO Survey are described in Gilmore et al. (2022) and Randich et al. (2022). The Gaia-ESO observing campaign was undertaken using the Fibre Large Array Multi Element Spectrograph (FLAMES), the multi-object, intermediate- and high-resolution spectrograph of the Very Large Telescope (VLT) (Pasquini et al. 2002), using both the GIRAFFE spectrograph (R ≃ 20 000) and the Ultraviolet and Visual Echelle Spectrograph (UVES; R ≃ 50 000). In each fibre configuration, approximately 100 stars were observed at medium resolution and six stars at high resolution.

The observed data were divided between five working groups (WGs) as follows: WG10, FGK medium-resolution stars; WG11, FGK high-resolution stars (Smiljanic et al. 2014); WG12, pre-main-sequence stars (Lanzafame et al. 2015); WG13, OBA stars (Blomme et al. 2022); and WG14, non-standard objects and quality flags (Van Eck et al., in prep.). Each WG was composed of multiple analysis teams whose results were combined into a per-star catalogue for that WG. These results were then combined by the top-level working group (WG15) to produce the final Gaia-ESO per-star catalogue. This is described in Hourihane et al. (2023).

The WG10 homogenisation of the analyses of the medium-resolution FGK stars for the final Gaia-ESO data release is described in this work. The observations at medium resolution were made in four of the available GIRAFFE setups: HR10, HR21, HR15N, and HR9B, for which the characteristics are given in Table 1. Throughout this paper, ‘SETUP’ refers to the four setups used in the Gaia-ESO Survey. In total, 92 348 stars (158 809 spectra) were observed at medium resolution and analysed within WG10.

As is described below, the WG10 homogenisation particularly relies on the WG11 homogenisation of the analyses of the UVES observations. The WG11 homogenisation of the fourth data release is described in Smiljanic et al. (2014); however, the WG11 homogenisation was updated for this release, so we provide an updated description in Sect. 5. In total there were 6987 stars (16 350 spectra) observed at high resolution for WG11.

Section 2 presents a description of the GIRAFFE spectral dataset, Sect. 3 shows the analysis methods of the WG10 nodes, and in Sect. 4 we define the reference sets used in the homogenisation. In Sect. 5, we outline the Bayesian inference homogenisation method developed by WG11 and describe the WG11 parameter homogenisation and chemical abundance homogenisation. In Sect. 6, we describe the WG10 parameter homogenisation, Sect. 7 presents the WG10 chemical abundance homogenisation, and the final catalogue and conclusions are presented in Sect. 8.

Table 1

Overview of WG10 spectral dataset.

2 GIRAFFE medium-resolution spectral dataset

Table 1 summarises the number of spectra for the relevant observing programmes within WG10. These are a subset of the full list of observing programmes within Gaia-ESO. (See Gilmore et al. 2022; Randich et al. 2022; Hourihane et al. 2023 for the full list.) The GES_TYPE is the associated code per observing programme that allows these sub-samples to be easily identified in the Gaia-ESO catalogue.

The two main science programmes are the open clusters (OCs) and the Milky Way (MW). The observing strategies for these programmes are explained in full in Randich et al. (2022), Bragaglia et al. (2022), and Gilmore et al. (2022). The OC and MW programmes contain the bulk of the spectra. The remaining programmes are part of the calibration strategy of Gaia-ESO (Pancino et al. 2017a).

The four key values provided to WG10 for use in the analysis are the signal-to-noise ratio, the radial velocity and its uncertainty, and the rotational velocity. The calculation of these values for the GIRAFFE spectra is described in Gilmore et al. (2022). The distribution of these values per SETUP is shown in Fig. 1. The signal-to-noise (S/N) distribution shows a large contribution of stars with an S/N of less than ten. These can mainly be attributed to filler stars, which were used to fill the fibres once the observing programme targets for a field of view were exhausted. Particularly for the MW fields, the main effect is that for many of these filler stars, only the radial velocity could be reliably determined out of the set of stellar parameters. The majority of the stars have a rotational velocity of less than 20 km s⁻¹. This indicates that they are mainly slowly rotating stars, as expected for the observing programmes. The radial velocity distribution is centred on zero and the bulk of the stars lie within −200 to 200 km s⁻¹. The distribution of the uncertainty on the radial velocity shows that the bulk of the stars have a precision better than 2 km s⁻¹.

Table 2

SETUP and phase analyses carried out by each WG10 node.

3 Working Group 10 node analysis methods

Seven analysis teams (hereafter referred to as ‘nodes’) analysed subsets of the GIRAFFE SETUPs within WG10 for the final Gaia-ESO data release, determining stellar parameters, chemical abundances, or both. Table 2 lists the nodes and the SETUPs that each node analysed in each analysis phase.

To standardise the node analyses, the nodes were required to use the MARCS stellar atmosphere models (Gustafsson et al. 2008), the solar abundances of Grevesse et al. (2007), and the Gaia-ESO line list (Heiter et al. 2021). Pregenerated synthetic spectra for Gaia-ESO were also available, calculated as described in de Laverny et al. (2012). The following describes the analysis process of each node in turn.

3.1 Active in both the parameter and abundance phases

EPINARBO. Equivalent widths were measured with DAOSPEC (Stetson & Pancino 2008). Atmospheric parameters and abundances were determined with the Fast Automatic MOOG Analysis code (FAMA, Magrini et al. 2013), which automatises the use of MOOG (Sneden et al. 2012). The HR15N and HR9B SETUPs were analysed.

Lumba. The Lumba GIRAFFE analysis pipeline makes use of the Spectroscopy Made Easy code (SME, Valenti & Piskunov 1996; Piskunov & Valenti 2017) to compute on-the-fly synthetic spectra that are used to determine atmospheric parameters and chemical abundances. Departures from local thermodynamic equilibrium (LTE) line formation were included for Li, Mg, Al, Si, and Fe lines. The HR15N, HR10, and HR21 SETUPs were analysed. This pipeline has also been used for UVES analysis (Gavel et al. 2019) and is very similar to the pipeline used for the second and third data releases of the GALAH survey (Buder et al. 2018, 2021).

MaxPlanck. The MaxPlanck node used neural networks to determine stellar parameters and magnesium abundance (Kovalev et al. 2019). A training set of synthetic spectra was generated using the MARCS stellar atmosphere models and the Gaia-ESO line list. The MaxPlanck node investigated analysing the HR15N, HR10, and HR21 spectra (HR10+HR21 as a single analysis) but determined that the results from their analysis of HR10 were the only reliable results and thus provided results for that SETUP only. Results for the HR21 SETUP for the bulge fields and standard stars were also provided.

Fig. 1

Distribution of key values available for use in the WG10 analysis per SETUP: signal-to-noise, radial velocity, error on radial velocity, and rotational velocity. The bin size (Bin) for each parameter is given.

3.2 Active only in the parameter phase

IAC. The code FERRE (see Allende Prieto et al. 2014, and references therein) was used. The strategy was to search for the atmospheric parameters of the best fitting model among a grid of pre-computed synthetic spectra for each observed spectrum. The HR10 and HR21 SETUPs were analysed together as a single analysis, and HR10 was also analysed separately. Results for the HR21 SETUP for the bulge fields and standard stars were also provided.

OACT. The OACT node used the code ROTFIT (Frasca et al. 2003, 2006). The method consists of a χ² minimisation of the residuals between the observed spectrum and a set of reference spectra. In this case, a library of observed spectra from the ELODIE archive (Prugniel & Soubiran 2001) was used as reference. The HR15N and HR9B SETUPs were analysed.

3.3 Active only in the abundance phase

Arcetri. Equivalent widths of the Li line and the nearby Fe line were measured with Gaussian fitting. Abundances were determined with a set of curves of growth (Franciosini et al. 2022) determined from the Gaia-ESO stellar spectra grid. The HR15N SETUP was analysed.

CAUP. Equivalent widths were measured with the Automatic Routine for line Equivalent widths in the stellar Spectra code (ARES, Sousa et al. 2007, 2015). Atmospheric parameters and chemical abundances were determined with MOOG (Sneden et al. 2012). This was carried out on the HR15N, HR10, and HR21 SETUPs.

Vilnius. Equivalent widths were measured with DAOSPEC (Stetson & Pancino 2008). The node developed its own wrapper to automatise the use of MOOG (Sneden et al. 2012) for the determination of chemical abundances. The HR10 and HR21 SETUPs were analysed.

4 Definition of reference sets

An underlying difficulty in the analysis of large stellar datasets is ensuring that the parameters and abundances that are produced are as close to the truth as is possible in our current understanding of stellar physics. The sheer number of spectra make it impossible to carry out a detailed ‘by hand’ analysis of each spectrum, so automated analyses must be used, such as those described previously. During their development, automated analyses are calibrated and validated against reference sets. For the homogenisation of the node results into the per star catalogue for both WG10 and WG11, the results from each node analysis were compared to known results of key reference sets to verify the node results and, where necessary, correct them onto the reference set scale prior to the node results being combined. The WG11 analysis, use of reference sets, and final homogenisation is described in Sect. 5 as an update to the process described in Smiljanic et al. (2014).

For WG10, there were a reasonable number of stars in common with WG11 such that the results could be combined with the FGK benchmark stars in order to construct a larger reference set for which the parameter space was more filled in. This also meant that the WG10 results would be calibrated directly onto the WG11 parameter scale.

5 Working Group 11: Bayesian inference homogenisation method

The WG10 and WG11 node parameters were homogenised separately but used the same Bayesian inference method. Thus, the description below supersedes the previous WG11 homogenisation strategy described in Smiljanic et al. (2014).

The Bayesian homogenisation for the WG11 results was first developed by Andrew R. Casey (2014–2017, priv. comm.) and used for the Gaia-ESO internal data release 5 (iDR5). For the final data release (which corresponds to iDR6), we built upon his initial work and developed a different implementation of the method. The homogenisation process and Bayesian modelling were written in R (R Core Team 2021) using JAGS (Plummer 2003) and a number of related packages.

The problem we want to solve is that of finding the best estimate of a stellar parameter, given the multiple values determined by the different nodes. The parameter can be any of Teff, log g, or [Fe/H]. For the microturbulence velocity (ξ), the procedure was slightly different (see Sect. 5.4).

Let us consider that a star ‘n’ is characterised by a true value of a given parameter, true.param_n. When a node ‘i’ attempts to estimate that value, the analysis returns a measurement, param_{i,n}, that is affected by systematic and random errors introduced by the methodology that was used. We made the assumption that these errors are independent and can be separated if parameters are reported for enough repeat spectra. The systematic error accounts for any and all zero point offsets and biases. The random error accounts for any and all effects that are stochastic in nature. We further assumed that the bias error is itself a function of the atmospheric parameter in question and that the random errors can be described by a Gaussian distribution.

Numerically, we would write

$\mathrm{param}_{i,n} \sim dnorm\left(\mathrm{true.param}_n, \mathrm{random.err}_i\right) + \mathrm{bias.param}_{i,n},$ (1)

where dnorm(μ, σ) stands for the Gaussian distribution of mean = μ and standard deviation = σ. Equation (1) states that the measurement provided by the node i is a random draw (which is the meaning of the symbol ‘~’ in the equation) from the distribution centred on true.param_n affected by the random error random.err_i. This random error is postulated to be a property of the node. Further, the measurement is affected by an offset bias.param_{i,n}. This bias value comes from a function that is also a property of the node and was computed at the value of the parameter that characterises the star n. For this bias function, we assumed that the variation of the bias in the parameter space can be described by a quadratic function:

$\mathrm{bias.param}_{i,n} = \alpha_1 + \alpha_2\,\mathrm{param}_{i,n} + \alpha_3\,\left(\mathrm{param}_{i,n}\right)^2.$ (2)

Strictly speaking, we should write Eq. (2) as a function of the true parameter and not of the parameter value measured by the node. However, that makes the problem circular (i.e. to know the true value, we need to correct for the bias, but to compute the bias, we need to use the true value). As it stands, Eq. (2) should work reasonably well if the difference between the true and measured values is not too big, although exactly what that means has to be checked a posteriori. In any case, differences that cannot be accounted for by the bias will tend to inflate the random component of the error. Our tests with the final results showed that such choice for the modelling worked well (which does not mean that things could not be improved by assuming a different model).
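As an illustration of how Eq. (2) is used in practice, the following minimal Python sketch evaluates a quadratic bias at the value reported by a node and subtracts it from that value. The function names and the coefficient values are hypothetical and chosen only for illustration; the actual homogenisation was written in R with JAGS and is not reproduced here.

```python
def bias_param(param, alpha):
    """Quadratic bias of Eq. (2), evaluated at the node's measured value."""
    a1, a2, a3 = alpha
    return a1 + a2 * param + a3 * param ** 2


def debias(param, alpha):
    """Remove the modelled bias from a single node measurement."""
    return param - bias_param(param, alpha)


# Hypothetical coefficients for one node's Teff bias (illustration only).
alpha_teff = (-100.0, 0.02, 0.0)

# The bias is evaluated at the measured value, as in Eq. (2):
# -100 + 0.02 * 6000 = +20 K, so the corrected value is 20 K lower.
corrected = debias(6000.0, alpha_teff)
```

Evaluating the bias at the measured value rather than at the true value is what avoids the circularity discussed above, at the cost of being accurate only when the two are close.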

For numerical reasons, we actually write Eq. (1) as

$\mathrm{param}_{i,n} \sim dnorm\left(\mathrm{true.param}_n + \mathrm{bias.param}_{i,n},\ \mathrm{random.err}_i\right).$ (3)

This choice means that we assumed it is equivalent to say that the measured value was shifted by an offset or to say that when a node is affected by a certain bias, the measurement was made from a distribution centred around a ‘biased true parameter’ (true.param_n + bias.param_{i,n}).

Because there are actually multiple nodes making the measurements, we can write the problem using a multi-dimensional normal distribution (where each node is one dimension):

$\mathbf{node.params}_n \sim dmnorm\left(\boldsymbol{\mu}_n, \Sigma_{\mathrm{param}}\right),$ (4)

where dmnorm(μ, Σ) stands for the multi-dimensional normal distribution of mean vector = μ and covariance matrix = Σ. The vector node.params_n = (param_{1,n}, …, param_{K,n}) combines the measurements of all K different nodes for star n. The covariance matrix, Σ_param, takes into account the random errors of each node and the correlations between their measurements. The mean vector μ_n combines together the ‘mean’ that we would write for each node separately in Eq. (3); in other words, it is made of repeated entries for each node with the true.param_n of the star and the corresponding node bias:

$\boldsymbol{\mu}_n = \left(\mathrm{true.param}_n + \mathrm{bias.param}_{1,n}, \ldots, \mathrm{true.param}_n + \mathrm{bias.param}_{K,n}\right).$ (5)
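To illustrate what Eqs. (4) and (5) imply once the biases and the covariance matrix are held fixed, the sketch below computes the value of the true parameter that maximises the multi-dimensional normal likelihood, assuming a flat prior on that value. This is the standard generalised least-squares estimate, (1ᵀΣ⁻¹(x − b)) / (1ᵀΣ⁻¹1). It is a simplified closed-form stand-in, not the Monte Carlo sampling used for the survey; the function names and all numerical values are hypothetical.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [x - factor * y for x, y in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def combine_nodes(node_params, node_biases, cov):
    """Flat-prior estimate of the true parameter and its standard error.

    Assumes the node biases and the (symmetric) covariance matrix are known.
    """
    residual = [p - b for p, b in zip(node_params, node_biases)]
    weights = solve(cov, [1.0] * len(node_params))  # Sigma^-1 applied to 1
    norm = sum(weights)                              # 1^T Sigma^-1 1
    estimate = sum(w * r for w, r in zip(weights, residual)) / norm
    return estimate, norm ** -0.5


# Two hypothetical nodes with random errors of 100 K and 50 K, correlation 0.3.
cov = [[100.0 ** 2, 0.3 * 100.0 * 50.0],
       [0.3 * 100.0 * 50.0, 50.0 ** 2]]
teff, teff_err = combine_nodes([5850.0, 5700.0], [50.0, 0.0], cov)
```

With a diagonal covariance matrix this reduces to the familiar inverse-variance weighted mean of the bias-corrected node values; the off-diagonal terms change the weights when the node results are correlated.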

To be able to apply the Bayesian inference to the homogenisation, we first needed to estimate the coefficients that define the bias function of each node (Eq. (2)) and the covariance matrix (Eq. (4)). Once the covariance matrix and the biases are defined, the only unknown in Eq. (4) is the true.param_n of star n. For these calculations, we have relied on a set of reference objects with known values of their true parameters. Of course, how well such true values are known can be discussed. In practice, what we write for the reference objects is that the known parameter values, and their uncertainties, are priors of the true values:

$\mathrm{true.param}_n \sim dnorm\left(\mathrm{reference.param}_n, \mathrm{reference.error}_n\right).$ (6)

This means that the simulation is free to adapt the true value of the parameter, within the error of the estimate given for that reference star. Each spectral SETUP analysed by the WG10 nodes and by the WG11 nodes (UVES 520 and 580 in the latter case) was homogenised separately, in order to account for the possibility of different biases in the analysis of these different spectra. To estimate the biases and the covariance matrix, we ran a Bayesian Monte Carlo simulation using the tools mentioned above. We ran diagnostic tests to ensure that the simulations converged and that the autocorrelations were low (below 1–2%).
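For intuition about the bias estimation step, the sketch below fits the three coefficients of the quadratic bias function for a single node by ordinary least squares, using the differences between the node values and the reference values. This is a deliberately simplified, non-Bayesian stand-in: it treats the reference values as exact, ignores correlations between nodes, and does not estimate the covariance matrix. The function names and the synthetic data are hypothetical.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [x - factor * y for x, y in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_quadratic_bias(measured, reference):
    """Least-squares fit of (measured - reference) = a1 + a2*p + a3*p**2."""
    normal = [[0.0] * 3 for _ in range(3)]
    rhs = [0.0] * 3
    for p, ref in zip(measured, reference):
        basis = (1.0, p, p * p)  # the bias is a function of the measured value
        for j in range(3):
            rhs[j] += basis[j] * (p - ref)
            for k in range(3):
                normal[j][k] += basis[j] * basis[k]
    return solve(normal, rhs)


# Synthetic log g values with a known, noise-free bias (illustration only).
true_alpha = (0.10, 0.02, -0.01)
measured = [0.5 + 0.5 * k for k in range(9)]
reference = [p - (true_alpha[0] + true_alpha[1] * p + true_alpha[2] * p * p)
             for p in measured]
alpha = fit_quadratic_bias(measured, reference)
```

For parameters with large numerical values, such as Teff in kelvin, the normal equations become poorly conditioned, so rescaling the parameter before fitting would be advisable in such a sketch.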

For WG10 the reference sets were constructed as described in Sect. 4. For WG11, different sets of reference stars were used, depending on the parameter being homogenised. Details are given in the subsections below.

Nine analysis nodes participated in the WG11 analysis. They are identified in Table 3. A summary of the methods, with references to the codes employed, is given in the companion paper by Gilmore et al. (2022) and is not repeated here. A description of the methodologies can also be found in Appendix A of Smiljanic et al. (2014) or, in the case of the OACT node, in Lanzafame et al. (2015).

Table 3

WG11 nodes that participated in the analysis of the final Gaia-ESO data release.

5.1 Working Group 11: homogenisation of effective temperature

When estimating values of Teff, we used as reference the FGK benchmark stars (Heiter et al. 2015) and the cool M-dwarf benchmark candidates from Pancino et al. (2017a, see Table 5).

However, not all of these cool benchmarks were successfully analysed, as they fall in a region of the parameter space where the WG11 methods do not work well. In total, the WG11 nodes analysed 640 individual spectra for 35 benchmark stars. In terms of Teff, those that were successfully analysed cover the interval between 3224 and 6635 K.

One problem we faced in our method is that nodes do not provide results for each and every spectrum. For the purposes of the MCMC simulations, the missing values were substituted with a broad non-informative uniform prior (from 3000 to 8000 K). As a result of the simulations, we obtained the coefficients of Eq. (2) and the covariance matrix of Eq. (4), along with their uncertainties.

As a second step, the bias function and covariance matrix were applied to compute the best estimate of Teff for each star in the sample (including to the reference stars themselves). We note that the errors of the atmospheric parameters provided by the nodes are not used (neither for the homogenisation of Teff nor of the other parameters). Each node estimated their errors in a different way, some providing internal errors of the method, while others applied more complicated prescriptions. Consequently, these values cannot be directly compared. We let the comparison between reference and measured parameters define the intrinsic node random errors.

At the end of this second step, we found that the Teff values of the reference stars were recovered with a standard deviation of 85 K. We assumed that this value represents the external accuracy of our final Teff scale, even though, in truth, this value is a combination of our accuracy and the errors of the reference scale. The internal errors of our Teff values have a median of 65 K, with the first and third quartiles at 60 and 75 K. A comparison against the isochrones of open and globular clusters seemed to validate the final Teff results (see Sect. 5.5).

5.2 Working Group 11: homogenisation of surface gravity

For the WG11 log g homogenisation, two models of the node biases had to be combined. The first model, valid for dwarfs and metal-rich giants, used as the reference set the same sample of 35 benchmark stars employed in the analysis of Teff. The benchmark stars cover the interval between 0.68 and 5.05 dex.

However, a second model had to be built for the metal-poor giants ([Fe/H] ≤ −0.50 and log g ≤ 3.50). In this case, in addition to the benchmark stars, a sample of giants with asteroseismic values of log g was used. The sample included 62 stars with data from K2 (Worley et al. 2020) and 88 stars with data from CoRoT (Masseron et al., in prep.). The K2 stars have log g values between 1.74 and 3.41. The CoRoT stars have log g values between 1.75 and 2.99. The combination of the two models was found necessary to reproduce the Teff–log g diagram of the globular clusters. Conversely, the solution with the seismic values degraded the quality of the diagrams and tests for other types of clusters and field stars. A few iterations were needed to assign the model that should be used for the stars at the edges of the parameter space division.

For the purposes of the model, the uncertainty of the seismic log g values was fixed at a value of ±0.02 dex. The missing values in log g were substituted by a broad uniform prior (between 0.0 and 5.0 dex). At the end of the homogenisation, we found that the reference log g values were recovered with a standard deviation of ±0.14 dex. The internal errors have a median value of 0.15 dex, with the first and third quartiles at 0.14 and 0.18 dex. A comparison with cluster isochrones is shown in Sect. 5.5.

We performed a number of tests attempting to use Gaia log g priors as additional constraints in the Bayesian model. However, none of the attempts produced results of higher quality than the ones obtained with the approach described above.

5.3 Working Group 11: homogenisation of [Fe/H]

For the [Fe/H] homogenisation, the model was obtained using the results provided for 34 of the 35 FGK benchmark stars (with results for a total of 565 spectra). One benchmark (HD 140283) had no reference [Fe/H] value but good values of Teff and log g. The stars cover the interval between −2.64 and +0.35 dex (including the cool benchmarks). As the error in the reference [Fe/H], we used the standard deviation of the Fe I lines given in Jofré et al. (2014).

However, the homogenisation of metallicities using only the benchmarks was found to make the globular clusters become too metal rich. At the same time, it was also making the known metal-rich open clusters become too metal poor. To correct this we were forced to add additional references for metallicities. These references are the open and globular clusters from Tables 7 and 8 of Pancino et al. (2017a). The open clusters used were NGC 6253, NGC 6705, NGC 2477, NGC 3532, and Melotte 71. The globular clusters used were NGC 4372, NGC 5927, NGC 2808, M 15, NGC 4833, NGC 6752, NGC 104, NGC 1904, NGC 6553, NGC 1261, and M 12. For the simulation, a typical value of ±0.05 was used as the reference error of the metallicities of the clusters.

Cluster members were either adopted from Pancino et al. (2017b) or a two-sigma cut around the mean of the radial velocities was used. For computing the biases, we essentially needed to select a majority of members (but not necessarily only members). This crude membership criterion was found to produce acceptable results. At the end of the homogenisation, we found that the reference [Fe/H] values were recovered with a standard deviation of ±0.09 dex. The internal errors have a median value of 0.07 dex, with the first and third quartiles at 0.06 and 0.10 dex.
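The crude membership criterion can be sketched in a few lines of Python. The example below applies a single-pass two-sigma cut around the mean radial velocity, using the sample standard deviation. Whether the original selection was iterated, and which estimator of the scatter was used, is not stated here, so both choices are assumptions; the function name and the velocities are hypothetical.

```python
import statistics


def two_sigma_members(radial_velocities, n_sigma=2.0):
    """Flag stars whose radial velocity lies within n_sigma of the mean."""
    mean = statistics.fmean(radial_velocities)
    sigma = statistics.stdev(radial_velocities)  # sample standard deviation
    return [abs(rv - mean) <= n_sigma * sigma for rv in radial_velocities]


# Hypothetical radial velocities (km/s) for stars in a cluster field.
rvs = [10.0, 10.5, 9.5, 10.2, 9.8, 10.1, 9.9, 60.0]
members = two_sigma_members(rvs)
```

Because the mean and standard deviation are computed before any rejection, a strong outlier inflates the scatter and the cut stays permissive, which is consistent with the aim of selecting a majority of members rather than a clean sample.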

A few important remarks regarding the final WG11 metallicities should be made: (i) Most of the globular cluster stars are bright giants, with log g ≤ 2.0, and are thus in an area of the parameter space where the WG11 pipelines usually do not perform very well. (ii) Better agreement with literature values for the metal-rich open clusters and metal-poor globular clusters was achieved at the expense of an increased scatter in the homogenised [Fe/H] values of the benchmark stars. This seems to indicate that the two scales (literature clusters and benchmarks) have important differences. (iii) The final metallicity for three of the cool benchmark stars (GJ205, GJ436, and GJ581) was very low. An inspection after the homogenisation revealed that the input node values are always much lower than the reference values. The conclusion seems to be that the WG11 metallicities for cool stars (Teff ≤ 4000 K) are not reliable.

5.4 Working Group 11: homogenisation of the microturbulence

The homogenisation of the microturbulence is different from the other parameters since there are no benchmark values that can be used as reference. Instead, we made use of the Gaia-ESO microturbulence calibration derived using iDR5 results to write the prior for the true values of ξ:

$\mathrm{true.xi}_n \sim dnorm\left(\mathrm{calib.xi}_n, 0.25\right),$ (7)

where calib.xi_n is computed using the homogenised values of Teff, log g, and [Fe/H], and we assumed a typical uncertainty of 0.25 km s⁻¹. When writing Eq. (4) for ξ, we did not consider biases. In the Bayesian simulation, to homogenise this parameter, both the true values and the covariance matrix were determined at the same time.

In essence, the Bayesian modelling of ξ is an elaborated way of finding the mean of the distribution of multiple node values, with the advantage of taking into account the correlations between the nodes and of using the calibration as a prior. We remark, however, that the final results are indeed different from both the simple mean of the individual node results and from the direct application of the calibration.
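The statement that the result differs from both the simple mean and the calibration can be illustrated with the textbook Gaussian update. The sketch below combines a Gaussian prior centred on the calibration value (standard deviation 0.25 km/s, as in Eq. (7)) with bias-free node values. It assumes that the node random errors are known and that the nodes are uncorrelated, which is a simplification: in the actual homogenisation the full covariance matrix was inferred together with the true values. The function name and the numbers are hypothetical.

```python
def xi_posterior(calib_xi, node_xi, node_sigma, prior_sigma=0.25):
    """Posterior mean and standard deviation of xi for independent nodes."""
    precision = 1.0 / prior_sigma ** 2          # contribution of the prior
    weighted_sum = calib_xi / prior_sigma ** 2
    for xi, sigma in zip(node_xi, node_sigma):  # contribution of each node
        precision += 1.0 / sigma ** 2
        weighted_sum += xi / sigma ** 2
    return weighted_sum / precision, precision ** -0.5


# Hypothetical case: calibration gives 1.0 km/s, two nodes report 1.4 and 1.6.
xi, xi_err = xi_posterior(1.0, [1.4, 1.6], [0.25, 0.25])
```

In this example the posterior mean falls between the calibration value (1.0 km/s) and the simple mean of the node values (1.5 km/s), with each term weighted by its precision.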

5.5 Consistency checks of the final Working Group 11 stellar parameters

Here, we discuss the final stellar parameters obtained within WG11. We remark that these are not necessarily the final Gaia-ESO parameters for the stars we analysed, as there is still a process of survey-wide homogenisation. This final homogenisation process is described in the companion paper by Hourihane et al. (2023).

Figure 2 shows the Teff-log g diagram of the homogenised results, in bins of metallicity and in comparison to isochrones. The agreement with the location of the isochrones is in general very good. The location of the main sequence and of the red giant branch are in general well reproduced.

At the lowest metallicity bin (where no isochrones are plotted), the scatter does seem to be excessive. There is the possibility that some of these stars are not real metal-poor stars, but artefacts of the analysis. We recall that for the cool benchmarks, the WG11 homogenised metallicities were too low. Investigation showed that some hot stars (>7000 K) included in the sample also ended up with very low metallicities. Care is therefore advised when using the results for the most metal-poor stars.

Figures 3 and 4 show Teff-log g diagrams for a few open and globular clusters. When possible, membership information was obtained from previous Gaia-ESO papers (Spina et al. 2014; Magrini et al. 2017; Pancino et al. 2017b; Randich et al. 2018). If that was not possible, a simple two-sigma cut in radial velocity was used as a first estimate of membership. (We note that since this analysis was carried out Jackson et al. 2022 has provided key cluster membership lists. These are used in the verification of the final Gaia-ESO dataset in Hourihane et al. 2023.)

As can be seen, the agreement for the open clusters is excellent. The member stars tend to follow the isochrones, in particular for the case of red giants in older open clusters. In the case of young clusters, the main-sequence stars usually sit slightly above the isochrone. We did find a few issues, however. For M67, the subgiants seem to have a homogenised log g that is too high for their temperatures. In addition, the analysis of the Pleiades spectra did not return good atmospheric parameters, although in this last case the spectra are not from UVES, and we believe there is something different in the data that creates a systematic problem in the analysis.

The agreement for the globular clusters is good in many cases, but there are cases of disagreement. In particular we mention NGC 1904, NGC 4833, NGC 5927, NGC 4372, and M 15, nearly all of which, except NGC 5927, are metal poor. In these cases, the stars tend to have temperatures that are cooler than expected from the position of the isochrones. In general, the stars in the globular clusters are bright giants (log g < 1.5). For such stars, the WG11 analysis does not seem to be very robust. We recommend care when using results for these stars.

We also checked for trends between metallicity values and Teff or log g for the same selection of open and globular clusters. In most cases, trends were not seen, were very small, or were driven by one outlier whose membership could be questioned. However, there are cases (e.g. NGC 2243 and Trumpler 20) where correlations were detected. In other cases, large scatter can be present, as is particularly seen in younger clusters (see e.g. NGC 2516 or IC 4665). We point out that some of the trends and large scatter are not errors induced by the homogenisation, but are effects that arise from limitations in our methodology. The recent work by Baratella et al. (2020) suggests that traditional methods of analysis that rely on the various equilibria of Fe lines fail for young stars because the microturbulence is overestimated. Another example is the work of Semenova et al. (2020), which indicates that abundance trends in the 2 Gyr open cluster NGC 2420 can be explained by neglected 3D non-LTE effects.

Fig. 2

Teff-log g diagram with the WG11 recommended results. PARSEC isochrones (Bressan et al. 2012) are shown for ages of 1 and 12.5 Gyr (violet and orange, respectively) and for the minimum and maximum metallicity indicated in each panel (dashed and solid lines, respectively). Red crosses are stars in open clusters, blue circles are stars in globular clusters, and the black starred symbols are the remaining stars.

Fig. 3

Teff-log g diagrams for the open clusters IC 2602, NGC 6663, M67, and Trumpler 20.

Fig. 4

Teff-log g diagrams for the globular clusters M2, NGC 104, NGC 362, and NGC 1851.

5.6 Working Group 11: homogenisation of chemical abundances

Overall, the WG11 nodes attempted to derive abundances for 38 atomic species and two molecules. The atomic and molecular data that were used in the analysis are those described in Heiter et al. (2021). Abundances of O I using the forbidden line at 6300 Å, of carbon from molecular C2, and nitrogen from CN bands were derived using spectrum synthesis by the Vilnius node only (see Tautvaišienė et al. 2015, for details). Abundances and upper limits of Li I come from measurements by the Arcetri node (Franciosini et al. 2022). None of these abundances have gone through a homogenisation process.

For all other atomic species, we used individual line abundances for the homogenisation. The measurements come from a mix of equivalent width (by the CAUP, EPINARBO, and Vilnius nodes) and spectrum synthesis analyses (by LUMBA and, in the case of Mg I, Ba II, Ce II, La II, Pr II, Y II, Zr I, Zr II, and Nd II, by the Vilnius node). The final list of species from which abundances were estimated, in addition to the metallicity itself, is: Li I, C (from C2), C I, N (from CN), O I, Na I, Mg I, Al I, Si I, Si II, S I, Ca I, Ca II, Sc I, Sc II, Ti I, Ti II, V I, Cr I, Cr II, Mn I, Co I, Ni I, Cu I, Zn I, Y II, Zr I, Zr II, Nb I, Mo I, Ba II, La II, Ce II, Pr II, Nd II, Sm II, and Eu II (i.e., 30 different chemical elements).

We advise particular care when using the abundances from S I, Ca II, Sc I, Mo I, and Nb I. These abundances come from a single (or a few) weak and/or blended lines. They were measured with equivalent widths, but precise results probably require spectrum synthesis. Quality control led us to reject all abundances from Sr I, Ru I, and Dy II, and therefore they are not part of the release.

Before homogenisation, we ran quality checks on the individual line abundances. For each line, we produced three plots of abundance as a function of Teff, log g, and [Fe/H]. In these plots, we visually checked for trends, excessive scatter, and offsets among the nodes, and we removed anything that appeared suspicious. We also excluded lines that had been measured only in a small number of stars or by only one or two nodes (if there were other lines that had been measured by several nodes).

Homogenisation was also performed using Bayesian modelling, with an adapted version of Eq. (4) for a given chemical species:

$\mathbf{node.abundances}_n \sim dmnorm\left( \mathbf{abun.\mu}_n, \Sigma_{\rm species} \right),$ (8)

where node.abundances_n combines all line abundances measured by every node for star ‘n’. The covariance matrix Σ_species has dimensions equal to the sum of the number of lines used by all nodes, and, by itself, it introduced a large number of free parameters in the model. The mean vector abun.μ_n, expressed in a similar manner to Eq. (5), combines the true abundance, true.abun_n, of that species in star ‘n’ with the line biases line.bias_j.

The line bias was not introduced as a property of the node but of the spectral line j. This was meant to take into account a possible bias coming from uncertainties in the log gf value of the lines. In principle, variation of this line bias in the parameter space could be introduced in order to model the changing importance of blends in different types of stars. However, this was not implemented, as it would introduce too many additional free parameters in the model. Distinct priors for the line biases were introduced depending on their quality flags in the form

$\mathbf{line.bias}_j \sim dnorm(0.0, {\rm sigma.bias}),$ (9)

where sigma.bias is equal to 0.01, 0.02, 0.05, or 0.1, for lines with (SYNFLAG,LOGGFFLAG) = (Y,Y); (Y,U) or (U,Y); (U,U); (N,?) or (?,N), respectively.$^5$ To prevent the line bias from diverging when only one or two lines are measured, we found it necessary to change sigma.bias to 0.002, 0.01, 0.02, and 0.05.
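As an illustration only, the mapping from the flag combinations to the prior widths quoted above can be written as a small lookup. The names `flag_class` and `sigma_bias` are hypothetical, and the sketch assumes that the second set of values replaced the first for every line, which the text does not state explicitly.

```python
# Prior widths for line.bias_j, keyed by the four flag classes quoted in the text.
SIGMA_BIAS_INITIAL = {"YY": 0.01, "YU": 0.02, "UU": 0.05, "N": 0.1}
SIGMA_BIAS_ADOPTED = {"YY": 0.002, "YU": 0.01, "UU": 0.02, "N": 0.05}


def flag_class(synflag: str, loggfflag: str) -> str:
    """Collapse (SYNFLAG, LOGGFFLAG) into one of the four classes in the text."""
    flags = {synflag, loggfflag}
    if not flags <= {"Y", "U", "N"}:
        raise ValueError(f"unexpected flag values: {synflag!r}, {loggfflag!r}")
    if "N" in flags:
        return "N"   # (N,?) or (?,N)
    if flags == {"Y"}:
        return "YY"  # (Y,Y)
    if flags == {"U"}:
        return "UU"  # (U,U)
    return "YU"      # (Y,U) or (U,Y)


def sigma_bias(synflag: str, loggfflag: str, adopted: bool = True) -> float:
    """Return the standard deviation of the Gaussian prior on the line bias."""
    table = SIGMA_BIAS_ADOPTED if adopted else SIGMA_BIAS_INITIAL
    return table[flag_class(synflag, loggfflag)]
```
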

Apart from the solar abundances, there are no other fundamental reference values that can be used to constrain the covariance matrix and the line biases. Because of that, the homogenisation of abundances was run as a single step, similar to what was done for ξ. Priors were used for the true.abun_n of each star. For the Sun, the abundances from Grevesse et al. (2007) were used as a strong Gaussian prior with σ = 0.001. For the abundances of Mg, Ti, Ni, Mn, and V, we found it helpful to introduce the abundances of the benchmark stars as additional priors (Jofré et al. 2015). For the other stars, we used a Gaussian distribution as the prior, with the mean at the metallicity-scaled solar abundance and σ = 0.4. For Ba II, Cr I, Cr II, Ca II, Ni I, Y II, Mn I, Zn I, Si II, Sc I, and V II, this had to be changed to σ = 0.1 in order to decrease the final scatter of the abundances.

Essentially, although it looks more complicated, the method can be considered as a sophisticated way to define a weighted mean. The sophistication lies in estimating the random errors of each node (i.e. the weights) directly from the data and in allowing for the line biases.
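The weighted-mean analogy can be made concrete with a simplified sketch. It is not the Bayesian model itself: it assumes that the line biases and the per-measurement random errors are already known, and it ignores the off-diagonal terms of the covariance matrix, so that the estimate reduces to an inverse-variance weighted mean of the bias-corrected line abundances. The function name and arguments are hypothetical.

```python
import numpy as np


def weighted_abundance(line_abund, line_bias, sigma):
    """Inverse-variance weighted mean of bias-corrected line abundances.

    line_abund : line abundances from all nodes for one star and species
    line_bias  : bias of the spectral line behind each measurement
    sigma      : random error assigned to each measurement (the weights)
    """
    corrected = np.asarray(line_abund, float) - np.asarray(line_bias, float)
    weights = 1.0 / np.asarray(sigma, float) ** 2
    mean = np.sum(weights * corrected) / np.sum(weights)
    error = np.sqrt(1.0 / np.sum(weights))  # valid for independent errors only
    return float(mean), float(error)
```

In the actual model the weights and biases are inferred jointly with the abundances rather than supplied as inputs.
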

6 Working Group 10: homogenisation of stellar parameters

In this section, we describe the homogenisation of the WG10 stellar parameters. As shown in Table 2, for WG10 there were five nodes that provided parameters across the four GIRAFFE SETUPs. The specific parameters and number of spectra analysed per SETUP per node are shown in Table 4. For each SETUP there were two to four sets of node results available with which to perform the homogenisation.

For the MW observing programme (not including the BL fields) two SETUPs were observed, HR10 and HR21. These were selected, as they contain key lines that have a different sensitivity to surface gravity depending on whether the star is a giant or a dwarf, thus breaking the dwarf-giant degeneracy due to effective temperature. See Sect. 7 for details on the wavelength regions.

For this reason, for the MW fields, it was recommended that the nodes analysing the MW SETUPs combine HR10 and HR21 in a single analysis. However, nodes were free to analyse the data as suited their method, and all the data they provided were used in the homogenisation.

In particular, IAC provided two sets of analysis for the MW fields, the analysis of HR10 combined with HR21 (the HR10|HR21 SETUP) and the analysis of HR10-only. MaxPlanck provided results for the HR10-only SETUP. During the quality control phase, MaxPlanck investigated combining the HR10 and HR21 spectra (SETUP=HR10|HR21) as a single analysis but concluded that these results were not reliable for their process and therefore did not provide them.

As the HR10-only and HR10|HR21 analyses effectively covered the same sample, the Lumba HR10|HR21 results, the IAC HR10|HR21 and HR10-only results and the MaxPlanck HR10-only results were all used for the homogenisation of the MW fields. For the remainder of this work, the HR10|HR21 homogenisation refers to these four sets of node results.

The MW BL fields were observed at a higher S/N in the HR21 SETUP than the main MW fields and it was decided to not observe the same fields in HR10. (See Gilmore et al. 2022 for more details on the observing strategy.) The nodes analysed the HR21-only BL fields and the standard fields that had also been observed in HR21. These samples were used in the homogenisation of the HR21-only SETUP.

The OC SETUPs, HR15N and HR9B, covered different samples of open cluster stars, so the spectra of these SETUPs were analysed separately. (See Randich et al. 2022; Bragaglia et al. 2022 for further details on the observing strategy.) For HR15N three nodes provided results, while for HR9B, two nodes provided results.

Figure 5 shows the distribution of Teff, log g and [Fe/H] provided by each node for each SETUP. The node results and associated reports were reviewed before homogenisation.

The flag information (TECH, PECULI, REMARK) was inspected, and the WG14 flags that were used by each node were assessed to determine whether the associated results should be used in the homogenisation. The flags were assigned by each node based on the definitions in the WG14 Flag Dictionary.

There were 20 flags that were determined at the WG10 homogenisation level to mean that the associated results, if present, should not be used in the homogenisation. Which flags were reported, as well as whether or not the results were provided, varied between nodes, so if the flag was present, the result (null or otherwise) was not used. The flag prefixes, the WG14 flag descriptions, and the number of spectra per node for which they were used are listed in Table 5.

Inspection of the resulting node datasets showed that both the IAC and MaxPlanck analyses had results lying at the parameter grid limits that were not flagged. The MaxPlanck analysis also showed a non-physical feature at Teff=4000 K, log g=4 dex that is not present in the other node results. These are indicated as red points in Fig. 5 and these results were removed prior to the homogenisation.

Table 4

Number of parameters provided per NODE per SETUP.

Fig. 5

Distribution of Teff with log g and Teff with [Fe/H] for each node for each SETUP. Unflagged but rejected IAC and MaxPlanck results are shown in red.

Table 5

WG14 Flags used by WG10 NODES.

Table 6

Cross-match of stars (NCN) and spectra (NSP) between WG11 and WG10 SETUPs.

6.1 Parameter reference set

Constructing a reference set using stars in common with WG11 was explored for each of the WG10 SETUPs for both the parameter phase and the abundance phase. Table 6 gives the cross-match of each WG10 SETUP to WG11 and between WG10 SETUPs.

The decision to use the WG11 results as the source of the reference set against which to derive the WG10 results was driven by the reasoning that this would immediately put the WG10 results onto the WG11 scale. The cross-match between WG11 and WG10 then comprised a larger, more comprehensive set of stars in common than would be obtained from the reference sets that have sparser coverage.

While the WG11 and WG10 observing programmes were not designed with an overlap in the parameter space for calibration purposes, the cross-match of each WG10 SETUP to WG11 was reasonably well sampled, in particular for Teff and log g. The cross-match between WG10 SETUPs was also explored for use in the parameter phase as another way to expand the reference set for each SETUP.

As shown in Table 6, the SETUP with the largest per star (CNAME) cross-match to WG11 is the HR15N dataset. The cross-match of the HR10 dataset to WG11 is almost three times smaller than the HR15N dataset cross-match to WG11. However, the cross-match of the HR10 dataset to the HR15N dataset is over 21 times greater than the HR10 dataset cross-match to WG11. This is particularly due to the CoRoT sample for which all of the CoRoT fields were observed in all three SETUPs: HR10, HR21, and HR15N. The cross-match between HR21 and HR15N reflects that of HR10, as the HR21 targets were observed either in combination with HR10 or specifically for the BL fields. Therefore, the cross-match of HR21 to HR10 is particularly good. Targets in HR9B were observed to complement HR15N, so the cross-match with the HR15N dataset is almost four times greater than that with WG11 targets.

From this assessment, a bootstrapping approach was taken to ensure all the SETUPs were homogenised onto a common scale. However, the last gap that needed to be covered was the lack of metal-poor reference stars, particularly for the HR10|HR21 (MW) and HR21 (BL) SETUPs in which metal-poor stars were most likely to be found, rather than in HR15N and HR9B (OC).

6.1.1 Filling in the metal poor tail with globular clusters

Galactic globular clusters (GCs) were included as part of the Gaia-ESO calibration strategy to provide a reference set across the metal-poor regime. (See Pancino et al. 2017a for more details.) Inspection of the cross-match between WG11 and the WG10 SETUPs revealed that there were not many globular cluster stars in common and that those in common were not sufficient to sample the metal-poor end well. However, there were globular cluster stars present that could be used as [Fe/H] reference stars, assuming they could be confirmed as globular cluster members and thus could be assigned an associated [Fe/H] value.

To that end, a detailed globular cluster membership analysis was carried out using Gaia DR2 data in order to identify which stars in both WG11 and WG10 were globular cluster members (Worley et al., in prep.). The WG11 members per cluster were used to provide an average [Fe/H] and dispersion for that cluster. Then, for each WG10 cluster, if a cluster did not include a WG11 star that was a cluster member (none in the cross-match sample), the three most probable WG10 cluster members present in that SETUP were assigned the WG11 average [Fe/H] and dispersion as the value and its uncertainty. These then became the reference values against which those stars per node were compared. There are no globular cluster stars in the HR9B dataset, and the HR21 to HR10|HR21 overlap was already sufficient in globular cluster stars, so this fill-in procedure was not implemented for these two SETUPs, and it was used only for HR10|HR21 and HR15N. HR15N was used primarily to observe open cluster stars, which are mainly of solar metallicity, so metal-poor stars were not expected. However, archival data of globular clusters were available in HR15N for some of the globular clusters that were observed in HR10 and HR21. This expanded the globular cluster star sample and provided another inter-SETUP calibration sample.
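The fill-in step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the argument layout, and the use of the sample standard deviation as the dispersion are all hypothetical, and the membership probabilities are taken as given.

```python
import statistics


def gc_reference(wg11_feh, wg10_members, has_wg11_member_in_crossmatch=False, n_top=3):
    """Assign the WG11 cluster mean [Fe/H] to the most probable WG10 members.

    wg11_feh     : [Fe/H] values of the WG11 members of one globular cluster
    wg10_members : (cname, membership_probability) pairs for one WG10 SETUP
    Returns {cname: (reference_feh, uncertainty)}; empty if no fill-in is needed.
    """
    if has_wg11_member_in_crossmatch:
        return {}  # the cross-match already provides a reference star
    mean = statistics.mean(wg11_feh)
    # Sample standard deviation as the dispersion (an assumption of this sketch).
    disp = statistics.stdev(wg11_feh) if len(wg11_feh) > 1 else float("nan")
    top = sorted(wg10_members, key=lambda m: m[1], reverse=True)[:n_top]
    return {cname: (mean, disp) for cname, _ in top}
```
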

6.1.2 Construction of per SETUP reference sets for parameter homogenisation

For the parameter homogenisation phase, to maximise the size of the reference set for each SETUP, a bootstrapping procedure was implemented as listed in Table 7.

Figures 6 and 7 show the construction of the per SETUP reference sets as Kiel diagrams and [Fe/H] against Teff. The Kiel diagram of the full sample for each SETUP is shown, then the final reference set and the cross-match with WG11 are shown for comparison. The key samples as described above that are present in each reference set are also shown in order to show what part of the parameter space each key sample occupies.

Table 7

Bootstrapping SETUP order and reference set content.

6.2 Corrections to node parameters based on reference sets

As described in Sect. 4 the node results per SETUP were first assessed against the reference set to determine a bias correction. An example of the difference between the reference set values and the calculated bias corrections for the nodes and parameters for HR15N are shown in Fig. 8.

Table 8 gives the coefficients for each bias correction per SETUP per node per parameter as well as the independent parameter against which each correction was calculated. The mean and standard deviation of the reference values per parameter are given per node. These values vary between the nodes as each node did not necessarily provide values for the complete set of reference stars. These values were used to normalise the node and reference values before the correction function was determined as described in Sect. 5. The median and standard deviation of the difference between the node and reference values are also given per parameter per SETUP in Table 8.

Each dataset was investigated in great detail using the relevant reference set in order to identify the polynomial function and independent parameter that provided the optimal correction. A variety of quality criteria were used to assess the agreement of the homogenised values to the reference values, such as difference measures (median and standard deviation) on the whole sample and sub-samples. In an extensive quality control process, results from different combinations of reference sets, polynomial fits, and independent parameters were compared to finally converge on the corrections provided in Table 8 using the WG11 Bayesian implementation (see Sect. 5).

In the majority of cases, using [Fe/H] as the independent parameter provided the optimal correction. For HR10|HR21, the investigations showed that for all the nodes, there was a different trend with [Fe/H] between dwarfs and giants. Using log g or Teff as the independent parameter did not capture the correction sufficiently either. Ultimately a two-parameter correction against both [Fe/H] and log g was used in those cases, as listed in Table 8.

Fig. 6

Reference sets for HR15N and HR10|HR21 setups. Top row: Kiel diagrams for full HR15N sample, final HR15N reference set and WG11 cross-match with HR15N. Second row: same as for the top row but for [Fe/H] versus Teff. Third row: Kiel diagrams of the four main samples: benchmarks (yellow), OC (magenta), GCs (green), and MW (blue) within the HR15N reference set overlaid on the full reference set. Fourth row: same as for the third row but for [Fe/H] versus Teff. Fifth to eighth rows: same but for HR10|HR21.

Fig. 7

Reference sets for HR21-only and HR9B SETUPs. Top to fourth rows: same as in Fig. 6 but for HR21-only. The reference samples are benchmarks (yellow), BL (red), GCs (green) and MW (blue). Fifth to eighth rows: same as in Fig. 6 but for HR9B. The reference samples are benchmarks (yellow), OC (magenta), and MW (blue).

Table 8

Coefficients for the WG10 bias corrections.

Fig. 8

Difference of node (EPINARBO, Lumba, OACT) values to reference set values for Teff, log g and [Fe/H] for HR15N. The difference of the final homogenised reference set values from the reference set values per parameter for HR15N is shown in the bottom row.

6.3 Combining SETUPs for final Working Group 10 parameter homogenisation

The final step in the WG10 homogenisation process was to combine the per SETUP homogenisation into the per CNAME homogenisation, which is the single star catalogue for all CNAMEs analysed within WG10. Due to the bootstrapping procedure used to construct the reference sets, each homogenisation per SETUP was ultimately bootstrapped onto the WG11 scale.

Table 9 lists the mean and standard deviation of the difference between the homogenised values and reference values per SETUP. In all cases, the offsets are close to zero and within the spread of the differences (Δ) given by the standard deviation, which indicates very good agreement between the per SETUP homogenised values and the reference values. The dispersion of the difference (standard deviation) is generally two or three times higher than the typical uncertainties of the homogenised stellar parameters. Therefore, due to the bootstrapping procedure, the homogenised values per SETUP were all assumed to be on the WG11 parameter scale and can thus be combined without further correction.

The results per SETUP were combined into the final WG10 per CNAME catalogue. The majority of CNAMEs were observed using only one SETUP. However, for the cases in which results from multiple SETUPs were available (e.g. the reference sets), an order of priority was implemented reflecting the science programmes and calibration samples as specified by GES_TYPE (see Table 1 for definitions). The priority order depending on the GES_TYPE is given in Table 10.
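The selection by priority can be sketched as a simple lookup. The function name is hypothetical, and the priority list used in the usage note is a placeholder rather than the order in Table 10, which depends on GES_TYPE.

```python
def select_setup(results, priority):
    """Pick the result of the highest-priority SETUP available for one CNAME.

    results  : dict mapping SETUP name to the homogenised parameters
    priority : SETUP names ordered from most to least preferred
    Returns (setup, parameters), or None if no listed SETUP has a result.
    """
    for setup in priority:
        if setup in results:
            return setup, results[setup]
    return None
```

For example, with the placeholder order `["HR10|HR21", "HR15N", "HR21", "HR9B"]`, a CNAME with results in HR21 and HR15N would take its HR15N values. No averaging across SETUPs takes place, consistent with the statement that the values were not combined.
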

Figure 9 shows the homogenised parameters per SETUP and the final WG10 homogenised parameters as Kiel diagrams with a metallicity colour map. As no combining of values was performed, features in each per SETUP Kiel diagram can be identified within the final Kiel diagram, which itself coherently displays the morphology of the branches of stellar evolution.

Figure 10 characterises the final WG10 stellar parameters with respect to the key quality measures of S/N, number of nodes (NN) contributing to the final result, error on Teff, error on log g, and error on [Fe/H]. The top row shows the histogram of each of these quantities, the middle row shows the Kiel diagrams of the final stellar parameters binned with respect to each quantity, and the bottom row shows the metallicity distribution as a histogram also binned with respect to each quantity.

The S/N and NN both show a decrease in scatter and a more refined stellar evolution morphology in the Kiel diagrams with better quality results (e.g. more signal and more node results contributing, respectively). For Teff, log g, and [Fe/H], the errors have a significant peak around a particular value (~65 K, ~0.17, ~0.08, respectively), which is reflected in the binning of those quantities in the Kiel diagrams. However, the scatter is reduced with bins of decreasing error.

The metallicity distributions are less informative on this aspect. For the error quantities shown in Fig. 10 (m, n, and o), the bulk of the values lie about a single value and thus fall mainly in a single bin.

However, Fig. 10k shows the peak of the metallicity distribution moving towards solar with bins of increasing S/N. This reflects the sampling of the medium-resolution data, as the fainter targets are typically more distant, and so the peak reflects the more metal-poor populations of the thick disk and the halo. The brighter targets are typically closer, sampling the thin disk and the solar neighbourhood, which are typically solar metallicity. Hence the metallicity distribution of the medium-resolution data reflects the expected trend of metallicity with stellar populations.

Figure 10l shows the metallicity distribution binned with NN. The peak does not shift between bins and the shapes of the distributions are similar, indicating that the bins with fewer than the maximum NN contain a similar sample in general, and that NN does not necessarily track with S/N.

This discussion illustrates how the quality measures can, and indeed should, be used to refine the WG10 dataset for any study in Galactic Archaeology. To be most effective, these quality measures should be considered both individually and together.

Table 9

Difference (Δ) of WG10 SETUP homogenisations to reference.

Fig. 9

Kiel diagram with a metallicity colour map for the per SETUP homogenised parameters: Panels from left to right: a) HR15N, b) HR10|HR21, c) HR21, d) HR9B, and e) the final WG10 homogenised parameters. All panels are on the same colour map scale.

Table 10

Priority order for selecting final result in cases of results from multiple SETUPs.

6.4 Verification of the homogenised Working Group 10 stellar parameters

For the parameter homogenisation, as shown in Figs. 6 and 7, key sub-samples were included within the parameter reference set. Verification of the WG10 homogenisation as part of the greater homogenisation of Gaia-ESO is explored in detail in Hourihane et al. (2023) with particular attention to these sub-samples. As such, only the FGK benchmark stellar parameters, the WG11 cross-match stellar parameters, and the GC metallicities are reviewed in this section.

Figure 11 shows a comparison of the reference stellar parameters for the FGK benchmark stars with the values determined in both WG11 and WG10, and a comparison of the WG11 stellar parameters with the final WG10 stellar parameters for the cross-match between WG11 and WG10. The median and standard deviation of the differences are also given. Overall, the agreement between these reference sets and the final parameters is good, with relatively small offsets and small spread in differences within the typical errors of the stellar parameters.

Inspecting further the stars with large differences (> 3σ) to the benchmark parameters, for WG11, 61_Cyg_B shows a notable disagreement in both Teff (−215 K) and [Fe/H] (0.34 dex). It is a close-to-solar-metallicity K dwarf, which the node analyses should have dealt with quite well. However, the spectrum analysed was in fact a non-UVES archive spectrum from the benchmark spectral library, made UVES-like for the WG11 node analyses in an expansion of the calibration effort. Making the spectrum UVES-like may have caused an issue with the archived data, although this star is not in common with WG10 and thus was not used in the WG10 homogenisation.

HD 122563, the very metal-poor ([Fe/H] = −2.64) luminous giant (log g = 1.61), represents a difficult combination of parameters. The difficulty shows up as a significant difference compared to the benchmark parameters in Teff for the WG11 result (−383 K), and in log g for the WG10 result (0.84). HD 84937 is also a metal-poor ([Fe/H] = −2.03), albeit dwarf, star for which there was a significant difference in Teff for the WG10 result compared to the FGK benchmark result.

The WG10 node analyses also struggled with luminous giants, as shown by the trio of low log g stars with large differences compared to the benchmark log g.

Finally there is a difference in [Fe/H] of 0.34 dex for the WG10 results for the K giant, Arcturus, placing it as more metal poor than the FGK benchmark accepted value.

Overall, these discrepancies indicate that the WG10 and WG11 results in the parameter space of metal-poor stars and luminous giants are not as robust as in the parameter space of more metal-rich, high gravity stars within the survey dataset. This is not unexpected, as metal-poor stars and luminous giants were not the primary FGK science targets of the Gaia-ESO survey (Gilmore et al. 2022; Randich et al. 2022), thus ensuring robust parameters for these types of stars was not the main focus of the node analyses.

However, the metal-poor stars, whilst few and only comprising two benchmarks, end up in quite good agreement with the reference [Fe/H] values for the WG10 homogenisation. As described above, to supplement the very few metal-poor benchmarks, the mean WG11 [Fe/H] value per GC was used to try to anchor the metal-poor end in the WG10 analysis by imposing that value on the respective highly probable cluster members in the WG10 and including them in the parameter reference set.

Figure 12 shows the outcome of this effort, by comparing the mean [Fe/H] values of the GC members in WG11 and WG10 to the reference values (Harris 1996, Harris 2010 edition), where the WG11 values are those that were imposed in the reference set if needed. The mean and standard deviation of each WG sample to the reference values are also given.

The majority of the mean GC values for both WG10 and WG11 are within 0.1 dex of the reference values, with M2 being the main outlier. There is a large spread in [Fe/H] values for the WG10 stars defined here as members of NGC 4833, although the mean value agrees well with the reference value. Otherwise, the spread in [Fe/H] per GC, particularly at the metal-poor end, is reasonable, and the mean [Fe/H] of each GC for WG10 generally tracks that of WG11, indicating that the attempt to anchor the metal-poor end of the WG10 dataset with the WG11 GC mean values was relatively successful.

Fig. 10

Characterisation of the final WG10 stellar parameters with histograms (top row), bins in Kiel diagrams (middle row) and bins in metallicity distribution (bottom row). Specific panel contents are: (a, f, k) S/N; (b, g, l) number of nodes (NN); (c, h, m) error on Teff; (d, i, n) error on log g; (e, j, o) error on [Fe/H].

Fig. 11

Comparison of WG10 (black) and WG11 (red) stellar parameters for: Left column: the FGK benchmark stars against the reference values. The mean difference and standard deviation are given. Three sigma limits are shown as dashed lines. Right column: the cross-match between WG10 and WG11 against the WG11 values. The mean difference and standard deviation are given.

Fig. 12

Comparison of mean [Fe/H] per GC for WG10 (black) and WG11 (red) against the reference values. The mean difference and standard deviation are given.

7 Working Group 10 homogenisation of chemical abundances

The strategy of the chemical abundance homogenisation was to combine in a single step the per spectral line element abundances derived by each node for each SETUP per CNAME. We refer to these element abundances as the line-by-line (LbL) abundances. Hence, all SETUPs were combined at once per CNAME rather than homogenising the results for one CNAME one SETUP at a time and then combining the per SETUP results.

The wavelength ranges across the solar spectrum for each of the four WG10 SETUPs, and the location of the spectral lines used by the nodes to measure abundances are shown in Fig. 13. The list of all lines measured is provided in Table A.1. These are taken from the Gaia-ESO line list (Heiter et al. 2021). In the following tables and figures, a capitalised format for designating the elements is used in which the final digit indicates the ionisation state (1=neutral, 2=singly ionised). This matches the data model used within the survey.

Table 11 gives the number of CNAMEs analysed by each node per element species per SETUP as well as the specific line list references. Two numbers are provided: ‘D’ is the number of detections, and ‘L’ is the average number of spectral lines measured per species. The number of CNAMEs in the WG11 cross-match to all the WG10 SETUPs with WG11 abundances per element species is also given. There was no requirement on the nodes to measure the abundance of every possible element in all four SETUPs. Hence, as can be seen in Table 11, the node results are a complex dataset with varied coverage of the chemical abundance space.

7.1 Removing extrema from node results

Table 12 gives some broad rejection criteria that were applied to specific element datasets, as extreme values (in absolute abundance) were identified that were not reasonable compared to the bulk of the distribution. Further cleaning was of course possible, but the goal was to take the node analyses as provided and to try to use as much information as possible. Quality measures such as S/N, NN, and errors should be used to refine the dataset as needed. This allows for differences between scientific studies regarding the tolerable level of uncertainty in the data.

7.2 Homogenisation procedure for Working Group 10 chemical abundances

It was important to follow a methodical procedure to obtain the optimal homogenisation of the WG10 LbL chemical abundances. The Bayesian inference method used in the parameter phase was not used here due to the range of incompleteness in the measurements, as the gaps made it difficult to apply the method consistently.

A simple procedure was therefore employed, of which the key steps are as follows. (1) Calculate the correction to the WG11 element abundance scale for each element based on LbL abundances for each SETUP for each node, using the set of cross-matched stars of the SETUP to WG11. (2) On a per star per element basis, apply the correction for each node for each setup. (3) Reject LbL abundances following rules set by WG11. (4) Take the median of the corrected LbL abundances across all nodes and SETUPs per element per star to calculate the final abundance for that star. (5) Take the standard deviation of the corrected LbL abundances for the element for the star as the error on the element abundance. (6) Take the number of NODE+SETUP analysed as the NN contributions to the abundance determination.
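Steps (2) and (4) to (6) above can be sketched for a single star and element as follows. This is an illustration under assumptions, not the survey code: the correction is treated as a single additive offset per NODE+SETUP (the adopted corrections may instead be functions of a stellar parameter), the rejection of step (3) is assumed to have been applied to the input already, and the sample standard deviation is used. The function and argument names are hypothetical.

```python
import numpy as np


def homogenise_abundance(measurements, corrections):
    """Combine line-by-line (LbL) abundances for one star and one element.

    measurements : (node, setup, lbl_abundance) tuples, after rejection
    corrections  : dict mapping (node, setup) to an additive correction
    Returns (abundance, error, nn).
    """
    corrected = np.array(
        [abund + corrections.get((node, setup), 0.0) for node, setup, abund in measurements]
    )
    abundance = float(np.median(corrected))        # step (4)
    if corrected.size > 1:
        error = float(np.std(corrected, ddof=1))   # step (5)
    else:
        error = float("nan")                       # undefined for a single line
    nn = len({(node, setup) for node, setup, _ in measurements})  # step (6)
    return abundance, error, nn
```
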

The homogenised abundances were then assessed using the quality control samples that are described in the following section. As the full distribution could then be inspected, this revealed issues with the corrections that could not be detected on the much smaller cross-match with WG11. Each element distribution was inspected, and adjustments to the correction were made when warranted such that the homogenisation was run again. This iterative process from correction to homogenisation to quality control to correction was repeated several times to home in on the optimal homogenisation.

Fig. 13

ESO Solar spectrum (https://www.eso.org/observing/dfo/quality/UVES/pipeline/FLAMES_solar_spectrum.html) reduced with the Gaia-ESO GIRAFFE reduction pipeline for the WG10 SETUPs. Spectral lines analysed by the WG10 nodes are indicated by vertical lines coloured by groupings of elements.

7.3 Correction to Working Group 11 cross-match reference set

The strategy used in the parameter homogenisation (i.e. bootstrap each SETUP onto a reference set based on the previous SETUP plus other reference stars) could not be employed for the abundance analysis due to the decision to homogenise all spectral lines for an element across all SETUPs at once. Thus, for each CNAME, all the spectral line abundances from all the possible setups from all the possible nodes were combined to derive the final abundance. There was no homogenisation of each setup in turn. Thus the only reference set available was the cross-match to WG11. No bootstrapping between SETUPs was possible.

The WG11 cross-match was not complete in the parameter space; in particular, there were gaps in the [Fe/H] space. When the cross-match was deemed insufficient, alternative procedures were adopted, which are explained below. The number of stars (CNAMEs) in the cross-match between WG11 and each of the WG10 SETUPs is given in Table 6. However, depending on the node and the SETUP, abundances were not necessarily available for the full set of stars in each cross-match. In some cases, there were no node abundances available for the WG11 cross-match stars, or there were no WG11 abundances available at all.

In the general case, for each ELEMENT+SETUP+NODE combination, a set of corrections was calculated between the node values and the reference set values for each parameter, as given in Table 13.

The corrections were calculated on the binned difference between the node values and the WG11 values in the associated SETUP cross-match. The corrections were calculated for the whole sample as well as separately for the dwarf (log g > 3.4) and the giant (log g < 3.4) samples.

An example of the set of corrections that was calculated for a particular element for a particular node for a particular SETUP is shown in Fig. 14. This example (CAUP+HR21+MG1) shows how the difference between the node values and the reference values can behave differently depending on the independent variable that is used and on whether or not the sample is separated into dwarfs and giants. In this case, the median offset was applied for the dwarf sample, while the quadratic fit against FEH was applied for the giant sample (see Table B.1).

For each NODE+SETUP+ELEMENT combination, the difference was calculated between the LbL abundances and the reference abundance value for the WG11 cross-match (black points in Fig. 14). The set was then divided into ten evenly distributed bins spanning the range of the reference values for the respective parameter (Teff, log g, or [Fe/H]). The median and standard deviation of the differences in each bin were then calculated (orange points with error bars in Fig. 14). The median difference and standard deviation were then calculated, along with a linear fit and a quadratic fit to the binned data points (shown as the orange dot-dashed line and the red dot-dashed line, respectively, in Fig. 14). The coefficients and goodness of fit for the range of corrections were returned and examined.
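The binning and fitting described above can be sketched as follows. This is an illustrative reading of the text, not the WG10 implementation: equal-width bins, the computation of the median offset from the unbinned differences, unweighted `numpy.polyfit` fits to the bin medians, and all names are assumptions.

```python
import numpy as np

def binned_correction_fits(ref_param, diff, n_bins=10):
    """Bin (node - reference) differences and fit candidate corrections.

    ref_param : reference values of Teff, log g, or [Fe/H] for the cross-match
    diff      : node abundance minus reference abundance for the same stars
    """
    ref_param = np.asarray(ref_param, dtype=float)
    diff = np.asarray(diff, dtype=float)

    # Equal-width bins spanning the range of the reference values (assumption).
    edges = np.linspace(ref_param.min(), ref_param.max(), n_bins + 1)
    idx = np.clip(np.digitize(ref_param, edges) - 1, 0, n_bins - 1)

    centres, medians, spreads = [], [], []
    for b in range(n_bins):
        sel = idx == b
        if sel.any():  # skip empty bins
            centres.append(0.5 * (edges[b] + edges[b + 1]))
            medians.append(np.median(diff[sel]))
            spreads.append(np.std(diff[sel]))
    centres = np.array(centres)
    medians = np.array(medians)

    return {
        "centres": centres,
        "bin_median": medians,
        "bin_std": np.array(spreads),
        "median_offset": float(np.median(diff)),
        "offset_std": float(np.std(diff)),
        # Coefficients are returned highest order first, as in numpy.polyfit.
        "linear": np.polyfit(centres, medians, 1) if centres.size >= 2 else None,
        "quadratic": np.polyfit(centres, medians, 2) if centres.size >= 3 else None,
    }
```

Running the function once on the whole sample and once each on the dwarf and giant subsets would give the kind of panel-by-panel comparison shown in Fig. 14.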

Table B.1 gives the coefficients of the fit and the parameter range for the final set of corrections. While useful numbers with which to derive a correction were returned for the majority of SETUP+NODE+ELEMENT combinations (indicated as WG11xmat in the ‘Calibration’ column of the table), there were nonetheless cases for which there were not enough data points with which to work.

Notes to Table 15. List of SETUP+NODE+ELEMENT combinations for which the sample was compared to the solar chemical abundance to derive the correction offset (Scaled Solar).

There were three exceptions to the general case: (1) Insufficient element abundances in the WG11 cross-match sample but a reasonably useful number in the rest of the WG11 dataset (WG11-full). (2) No WG11 abundances at all for that element (Scaled Solar). (3) Super-solar trend in HR21 compared to HR10 (HR21toHR10-WG11).

In the first case, there were nine SETUP+NODE+ELEMENT combinations for which the full WG11 element abundance distribution was used to estimate a correction. These combinations are listed in Table 14.

In the second case, there were two combinations for which the only option was to scale to the solar abundance. These two combinations are listed in Table 15.

In the third case, comparison of the HR21 BL to HR10 BL abundances revealed an exaggerated upturn to enhanced abundances at the metal-rich end. Further exploration showed that this was a difference between the giants and the dwarfs: the dwarf sample did not show the upturn in either HR21 or HR10. This upturn seemed extreme for an astrophysical effect, but it could not be compared to the reference sample as there were no stars in common between HR21 and WG11 for the bulge sample.

However, the cross-match sample between HR21 and HR10 was available to examine. The equivalent set in HR10 did not show such an extreme upturn at the metal-rich end, though for some abundances it was slightly present, which could indicate an astrophysical effect. The goal was to put the giants in HR21 onto the same scale as the giants in HR10, but not to remove the feature completely if it was present in both sets of results.

Thus the HR21 giants cross-matched to the HR10 giants were used to remove any systematic offset without erasing a potential astrophysical signature. However, the fact that the giants in HR21 behaved differently from the dwarfs in HR21 did not necessarily mean that the dwarfs in HR21 behaved the same as those in HR10. It was therefore also necessary to investigate a correction to HR10 for the dwarf targets in HR21, to ensure that all targets were put onto the HR10 scale. As HR10 was corrected onto the WG11 scale separately, this was carried out first. Then HR21 was corrected onto HR10, which had already been corrected onto the WG11 scale. The SETUP+NODE+ELEMENT combinations for which these corrections needed to be calculated are listed in Table 16.

Figure 15 illustrates the process of determining and applying the correction using the Vilnius Ti I results as an example. The panels show the Ti I abundances against [Fe/H], comparing HR21 with the uncorrected HR10 (HR10uncor), HR10 corrected to WG11 (HR10cor), and WG11. The first row shows the cross-match to HR10 for the bulge sample (BL), the second row shows the giant sample (GT), and the third row shows the dwarf sample (DW). The first column shows HR21 uncorrected. The upturn at super-solar metallicity is clear in the HR21 giant sample (we note that the bulge stars are also giants) when compared to the HR10 giant sample, the dwarfs in both HR21 and HR10, and WG11. The second column shows the linear, quadratic, and cubic fits to the difference in Ti I values of the cross-match between HR21 and both HR10uncor and HR10cor for the bulge, giants, and dwarfs. The third column applies the correction from the quadratic fit to the HR21 values in each case. The procedure successfully scales HR21 and HR10 to WG11 while retaining any subtle, potentially astrophysical effects.
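The two-stage logic (HR10 onto WG11 first, then HR21 onto the corrected HR10) can be illustrated with synthetic numbers. Everything in this sketch is hypothetical: the helper names, the invented abundance trend and offsets, and the choice of an unweighted quadratic fit in [Fe/H] for both stages.

```python
import numpy as np

def fit_offset(feh, values, reference, degree=2):
    """Fit a polynomial in [Fe/H] to the difference (values - reference)."""
    feh = np.asarray(feh, dtype=float)
    delta = np.asarray(values, dtype=float) - np.asarray(reference, dtype=float)
    return np.polyfit(feh, delta, degree)

def remove_offset(feh, values, coeffs):
    """Subtract the fitted offset so that values move onto the reference scale."""
    feh = np.asarray(feh, dtype=float)
    return np.asarray(values, dtype=float) - np.polyval(coeffs, feh)

def true_trend(feh):
    """Invented 'astrophysical' trend that the correction should preserve."""
    return 0.1 * feh

# Stage 1: HR10 onto the WG11 scale, using the HR10-WG11 cross-match.
feh_a = np.linspace(-1.0, 0.4, 30)
wg11_a = true_trend(feh_a)
hr10_a = wg11_a + 0.05                     # invented constant offset
coeffs_hr10 = fit_offset(feh_a, hr10_a, wg11_a)

# Stage 2: HR21 giants onto the corrected HR10 scale, using the HR21-HR10 cross-match.
feh_b = np.linspace(-0.8, 0.5, 40)
hr10_b = true_trend(feh_b) + 0.05
hr21_b = true_trend(feh_b) + 0.02 + 0.3 * feh_b**2   # invented metal-rich upturn
hr10cor_b = remove_offset(feh_b, hr10_b, coeffs_hr10)
coeffs_hr21 = fit_offset(feh_b, hr21_b, hr10cor_b)
hr21cor_b = remove_offset(feh_b, hr21_b, coeffs_hr21)
```

Because the second fit is made against the corrected HR10 values rather than against a flat line, any trend shared by both SETUPs (here the invented slope of 0.1) survives the correction, while the HR21-only upturn is removed.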

Table 11

Summary of element abundance detections (D) and lines measured (L).

Table 12

Element measurements rejection criteria.

Table 13

Set of corrections calculated for each parameter for each ELEMENT+SETUP+NODE combination.

Fig. 14

Difference between node values and reference values in the example case of the CAUP analysis of Mg I measured in the HR21 spectrum. Top row: differences against [Fe/H] for the full sample (left, black), the giants (middle, red), and the dwarfs (right, blue). Middle row: same but against log g. Bottom row: same but against Teff. The median differences calculated per bin are shown as orange points with error bars. The linear fit (orange dot-dash) and quadratic fit (red dot-dash) are shown in each case.

Table 14

Abundance corrections derived using the full WG11 dataset.

Table 15

Abundance corrections derived using the solar chemical composition.

Table 16

Extreme abundance enhancement present at super-solar metallicity.

Fig. 15

Comparing HR21 (by cross-match) with HR10 uncorrected (HR10uncor), HR10 corrected to WG11 (HR10cor), and WG11. Top row: the bulge sample (BL); middle row: the giant sample (GT); bottom row: the dwarf sample (DW). First column: Ti I abundances against [Fe/H]; second column: difference between HR21 and HR10 abundances (HR10uncor and HR10cor) with linear, quadratic and cubic fits; third column: application of correction from quadratic fit to HR21 (HR21cor).

Fig. 16

Errors on WG10-recommended abundances (in absolute abundance) against S/N on a log scale. Sub-samples per NN are shown as specified in the top-left panel.

7.4 Errors in the homogenised abundances

Figure 16 shows the error distributions against S/N for the homogenised WG10 abundances. For the LbL abundances, the error on each abundance was calculated as the standard deviation of the set of node LbL abundances used per target in the homogenisation. In cases where a single line abundance from a single node was the only abundance available, the error provided by that node was reported as the error.

There were three situations in which errors would potentially end up missing from the final homogenisation: (1) No error was provided with the single line abundance measurement although errors for other measurements for the same element for that node were provided. (2) No errors were provided at all by the node for the LbL abundances of that element. (3) No errors were provided at all by the node for the LbL abundances of any element.

Three relations were derived to complete the final errors. (1) An error relation with S/N was generated for each NODE+SETUP+ELEMENT combination. (2) An error relation with S/N was generated by combining all reported errors for all abundances provided by the node for the SETUP. (3) For each CNAME and SETUP, the spread in each abundance that had more than one line was calculated across the node values and was used to calculate an uncertainty relation with S/N per abundance per SETUP.

Thus, in the homogenisation procedure, if the final abundance was based on a single node value that did not have an associated error, the appropriate relation was used to provide an estimate of the uncertainty, which was then reported as the final error on that abundance. In this way, all values in the WG10 abundance homogenisation have associated error values as shown in Fig. 16.
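The fallback logic for the final error can be sketched as below. The functional form of the error relations is not stated in the text, so the power law in S/N used here (a straight line in log-log space) is purely an assumption, as are the function names and the argument layout.

```python
import numpy as np

def fit_error_relation(snr, err):
    """Fit log10(err) = intercept + slope * log10(S/N) to reported errors.

    The power-law form is an assumption made for this sketch only.
    """
    snr = np.asarray(snr, dtype=float)
    err = np.asarray(err, dtype=float)
    ok = np.isfinite(snr) & np.isfinite(err) & (snr > 0) & (err > 0)
    slope, intercept = np.polyfit(np.log10(snr[ok]), np.log10(err[ok]), 1)
    return slope, intercept

def final_error(values, node_errors, snr, relation):
    """Return the error on a homogenised abundance.

    values      : corrected LbL abundances that entered the median
    node_errors : errors reported by the nodes (None or NaN if missing)
    snr         : S/N of the spectrum
    relation    : (slope, intercept) from fit_error_relation
    """
    values = np.asarray(values, dtype=float)
    if values.size > 1:
        return float(np.std(values))           # spread of the contributing values
    err = node_errors[0]
    if err is not None and np.isfinite(err):
        return float(err)                      # single value with a node error
    slope, intercept = relation                # single value, no node error
    return float(10 ** (intercept + slope * np.log10(snr)))
```

Which of the three relations described above is passed as `relation` would depend on which errors the node did provide; that choice is left to the caller in this sketch.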

However, we observed some extreme outliers and a large spread in error values for some elements. Figure 16 shows the errors by sub-sample of NN, in which the lowest errors and highest S/N are typically represented by the maximum NN sub-sample. The highest errors typically occur when fewer than three node results are combined. In particular, there are cases in which the homogenised error is high (> 2) even though the S/N is high (~100), such as for MG1, SI1, TI1, and FE1. This may be attributed not only to the better result obtained by combining more nodes, but also to differences in the level of data quality for which nodes reported results. A consistent error model imposed across the nodes would have improved the resulting dataset.

Fig. 17

Final WG10 abundances against WG11 abundances for the CNAMEs in common with WG11 for each WG10 homogenised element. The median difference and spread are provided for each element.

Fig. 18

Final WG10 stellar parameters with S/N≥25 and NN≥2 as a) Kiel diagram with metallicity colour map, and b) Metallicity distribution.

Fig. 19

Final WG10 chemical abundances as [X/Fe] against [Fe/H] illustrating the process of selecting by the NN contributing to the final abundance.

7.5 Verification of the Working Group 10 homogenised chemical abundances

The reference set used for the calibration of the WG10 chemical abundances was the cross-match with WG11 as described in Sect. 7.3. Detailed quality checks on the WG10 homogenisation regarding key sub-samples in the context of the full survey have been carried out in Hourihane et al. (2023). In this work, we only inspect the comparison to WG11 in Fig. 17.

The median difference and standard deviation are given for each element. In general, the agreement is very good across the elements, with a spread on the order of typical uncertainties in abundance measurements. S I shows an issue for a subset of the cross-match, though the bulk of the cross-match is in reasonable agreement. The error bars per CNAME are also large for this element, indicating greater uncertainty in its measurement.
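The per-element statistics quoted in Fig. 17 amount to a median and a standard deviation of the WG10 minus WG11 differences over the stars in common. A minimal sketch, with a hypothetical function name and with missing values assumed to be stored as NaN:

```python
import numpy as np

def compare_to_reference(wg10, wg11):
    """Median and spread of (WG10 - WG11) for stars with both values present."""
    delta = np.asarray(wg10, dtype=float) - np.asarray(wg11, dtype=float)
    delta = delta[np.isfinite(delta)]          # drop stars missing either value
    if delta.size == 0:
        return np.nan, np.nan, 0
    return float(np.median(delta)), float(np.std(delta)), int(delta.size)
```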

8 Conclusions

The homogenisation of the WG10 results across four SETUPs with analyses from multiple nodes that covered, often sparsely, different ranges in stellar parameters and chemical abundances was a challenging process. The goal was to produce a robust and well-calibrated single-star catalogue that could be homogenised with the rest of the survey results. This meant optimally combining the node results following the WG11 Bayesian inference method for the WG10 and WG11 stellar parameters as well as for the WG11 chemical abundances, while for the WG10 chemical abundances, a simple per-analysis calibration to WG11 was carried out. Crucial to the robustness of the final WG10 catalogue is understanding the quality of the results. In particular, the S/N, NN, and errors are key to refining the sample for any scientific study.

The final stellar parameters, with a simple cleaning of S/N≥25 and NN≥2, are shown in Fig. 18 as a Kiel diagram and a metallicity distribution. The scatter is considerably reduced in the Kiel diagram with respect to the full sample. The shift of the RGB with metallicity is clearly discernible. The metallicity distribution shows a left-handed asymmetry indicative of the metal-poor contribution from the thick disk. Small peaks at −1.5 and −2.4 coincide with globular cluster samples.

The final WG10 chemical abundances are shown in Fig. 19 as [X/Fe] against [Fe/H]. The abundances are binned by the NN that contributed to each abundance. The greater the NN, the clearer the morphology of the distribution, illustrating how these quality measures can be used to interpret this complex and intriguing dataset.
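The two selections used for Figs. 18 and 19 (a threshold on S/N and NN, and a grouping of abundances by NN) can be sketched as below. The function names and the array-based layout are assumptions; the thresholds default to the S/N≥25 and NN≥2 values quoted above.

```python
import numpy as np

def quality_mask(snr, nn, min_snr=25.0, min_nn=2):
    """Boolean mask selecting results with S/N >= min_snr and NN >= min_nn."""
    snr = np.asarray(snr, dtype=float)
    nn = np.asarray(nn)
    return (snr >= min_snr) & (nn >= min_nn)

def group_by_nn(nn, values):
    """Split values into sub-samples keyed by the number of contributions NN."""
    nn = np.asarray(nn)
    values = np.asarray(values, dtype=float)
    return {int(n): values[nn == n] for n in np.unique(nn)}
```

A user of the catalogue could tighten `min_snr` or `min_nn` to trade sample size against the scatter of the resulting distributions.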

Acknowledgements

Based on data products from observations made with ESO Telescopes at the La Silla Paranal Observatory under programme ID 188.B-3002. These data products have been processed by the Cambridge Astronomy Survey Unit (CASU) at the Institute of Astronomy, University of Cambridge (supported by UKRI-STFC grants: ST/N005805/1, ST/T003081/1 and ST/X001857/1), and by the FLAMES/UVES reduction team at INAF/Osservatorio Astrofisico di Arcetri. These data have been obtained from the Gaia-ESO Survey Data Archive, prepared and hosted by the Wide Field Astronomy Unit, Institute for Astronomy, University of Edinburgh, which is funded by the UK Science and Technology Facilities Council. This work was partly supported by the European Union FP7 programme through ERC grant number 320360 and by the Leverhulme Trust through grant RPG-2012-541. We acknowledge the support from INAF and Ministero dell’Istruzione, dell’Università e della Ricerca (MIUR) in the form of the grant “Premiale VLT 2012”. The results presented here benefit from discussions held during the Gaia-ESO workshops and conferences supported by the ESF (European Science Foundation) through the GREAT Research Network Programme. D.M. acknowledges financial support from the Agencia Estatal de Investigacion 10.13039/501100011033 of the Ministerio de Ciencia e Innovacion and the ERDF “A way of making Europe” through project PID2019-109522GBC54. F.J.E. acknowledges support from ESA through the Faculty of the European Space Astronomy Centre (ESAC) – Funding reference 4000139151/22/ES/CM. H.M.T. acknowledges financial support from the Agencia Estatal de Investigacion (AEI/10.13039/501100011033) of the Ministerio de Ciencia e Innovacion and the ERDF “A way of making Europe” through project PID2019-109522GB-C51. S.V. gratefully acknowledges the support provided by Fondecyt reg. 1220264 and by the ANID BASAL projects ACE210002 and FB210003. S.M. thanks the COST Action CA18104: MW-Gaia. U.H. acknowledges support from the Swedish National Space Agency (SNSA/Rymdstyrelsen). E.J.A. acknowledges financial support from the State Agency for Research of the Spanish MCIU through the “Center of Excellence Severo Ochoa” award to the Instituto de Astrofisica de Andalucia (CEX2021-001131-S). T.B. was funded by the project grant no. 2018-04857 from the Swedish Research Council. G.G. acknowledges support by the Collaborative Research Centre SFB 881 (projects A5, A10), Heidelberg University, of the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) and by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement 949173). P.J. acknowledges support from Fondecyt Regular Ladder Number 1231057, Millenium Nucleus ERIS NCN2021_017, Centros ANID Iniciativa Milenio. J.I.G.H. acknowledges financial support from the Spanish Ministry of Science and Innovation (MICINN) project PID2020-117493GB-I00. E.M. acknowledges financial support through a “Margarita Salas” postdoctoral fellowship from Universidad Complutense de Madrid (CT18/22), funded by the Spanish Ministerio de Universidades with NextGeneration EU funds.

Appendix A Working Group 10 spectral line information

Table A.1

Spectral lines used by WG10 Nodes.

Appendix B Working Group 10 element abundance corrections

Table B.1

Coefficients of WG10 bias corrections.

References

  1. Allende Prieto, C., Fernández-Alvar, E., Schlesinger, K. J., et al. 2014, A&A, 568, A7
  2. Baratella, M., D’Orazi, V., Carraro, G., et al. 2020, A&A, 634, A34
  3. Bard, A., & Kock, M. 1994, A&A, 282, 1014
  4. Biemont, E., Grevesse, N., Hannaford, P., & Lowe, R. M. 1981, ApJ, 248, 867
  5. Biemont, E., Quinet, P., & Zeippen, C. J. 1993, A&AS, 102, 435
  6. Biémont, E., Lefèbvre, P., Quinet, P., Svanberg, S., & Xu, H. L. 2003, Eur. Phys. J. D, 27, 33
  7. Biémont, É., Blagoev, K., Engström, L., et al. 2011, MNRAS, 414, 3350
  8. Blomme, R., Daflon, S., Gebran, M., et al. 2022, A&A, 661, A120
  9. Bragaglia, A., Alfaro, E. J., Flaccomio, E., et al. 2022, A&A, 659, A200
  10. Bressan, A., Marigo, P., Girardi, L., et al. 2012, MNRAS, 427, 127
  11. Buder, S., Asplund, M., Duong, L., et al. 2018, MNRAS, 478, 4513
  12. Buder, S., Sharma, S., Kos, J., et al. 2021, MNRAS, 506, 150
  13. de Laverny, P., Recio-Blanco, A., Worley, C. C., & Plez, B. 2012, A&A, 544, A126
  14. Den Hartog, E. A., Lawler, J. E., Sneden, C., & Cowan, J. J. 2003, Astrophys. J. Suppl. Ser., 148, 543
  15. Denwood, M. J. 2016, J. Stat. Softw., 71
  16. Franciosini, E., Randich, S., de Laverny, P., et al. 2022, A&A, 668, A49
  17. Frasca, A., Alcalá, J. M., Covino, E., et al. 2003, A&A, 405, 149
  18. Frasca, A., Guillout, P., Marilli, E., et al. 2006, A&A, 454, 301
  19. Fuhr, J. R., Martin, G. A., & Wiese, W. L. 1988, J. Phys. Chem. Ref. Data, 17
  20. García, G., & Campos, J. 1988, J. Quant. Spec. Radiat. Transf., 39, 477
  21. Garz, T. 1973, A&A, 26, 471
  22. Gavel, A., Gruyters, P., Heiter, U., et al. 2019, A&A, 629, A74
  23. Gilmore, G., Randich, S., Worley, C. C., et al. 2022, A&A, 666, A120
  24. Grevesse, N., Asplund, M., & Sauval, A. J. 2007, Space Sci. Rev., 130, 105
  25. Gustafsson, B., Edvardsson, B., Eriksson, K., et al. 2008, A&A, 486, 951
  26. Hannaford, P., Lowe, R. M., Grevesse, N., Biemont, E., & Whaling, W. 1982, ApJ, 261, 736
  27. Harris, W. E. 1996, AJ, 112, 1487
  28. Heiter, U., Jofré, P., Gustafsson, B., et al. 2015, A&A, 582, A49
  29. Heiter, U., Lind, K., Bergemann, M., et al. 2021, A&A, 645, A106
  30. Hourihane, A., Francois, P., Worley, C. C., et al. 2023, A&A, 676, A129
  31. Ivarsson, S., Litzén, U., & Wahlgren, G. M. 2001, Physica Scripta, 64, 455
  32. Jackson, R. J., Jeffries, R. D., Wright, N. J., et al. 2022, MNRAS, 509, 1664
  33. Jofré, P., Heiter, U., Soubiran, C., et al. 2014, A&A, 564, A133
  34. Jofré, P., Heiter, U., Soubiran, C., et al. 2015, A&A, 582, A81
  35. Kovalev, M., Bergemann, M., Ting, Y.-S., & Rix, H.-W. 2019, A&A, 628, A54
  36. Kurucz, R. L. 2004, Robert L. Kurucz on-line database of observed and predicted atomic transitions, http://kurucz.harvard.edu/atoms/
  37. Kurucz, R. L. 2007, Robert L. Kurucz on-line database of observed and predicted atomic transitions, http://kurucz.harvard.edu/atoms/
  38. Kurucz, R. L. 2008, Robert L. Kurucz on-line database of observed and predicted atomic transitions, http://kurucz.harvard.edu/atoms/
  39. Kurucz, R. L. 2009, Robert L. Kurucz on-line database of observed and predicted atomic transitions, http://kurucz.harvard.edu/atoms/
  40. Kurucz, R. L. 2010, Robert L. Kurucz on-line database of observed and predicted atomic transitions, http://kurucz.harvard.edu/atoms/
  41. Kurucz, R. L. 2011, Robert L. Kurucz on-line database of observed and predicted atomic transitions, http://kurucz.harvard.edu/atoms/
  42. Kurucz, R. L. 2012, Robert L. Kurucz on-line database of observed and predicted atomic transitions, http://kurucz.harvard.edu/atoms/
  43. Kurucz, R. L. 2013, Robert L. Kurucz on-line database of observed and predicted atomic transitions, http://kurucz.harvard.edu/atoms/
  44. Kurucz, R. L., & Peytremann, E. 1975, SAO Special Rep., 362, 1
  45. Lanzafame, A. C., Frasca, A., Damiani, F., et al. 2015, A&A, 576, A80
  46. Lawler, J. E., & Dakin, J. T. 1989, J. Opt. Soc. Am. B Opt. Phys., 6, 1457
  47. Lawler, J. E., Bonvallet, G., & Sneden, C. 2001a, ApJ, 556, 452
  48. Lawler, J. E., Wickliffe, M. E., den Hartog, E. A., & Sneden, C. 2001b, ApJ, 563, 1075
  49. Lawler, J. E., Sneden, C., Cowan, J. J., Ivans, I. I., & Den Hartog, E. A. 2009, ApJS, 182, 51
  50. Lawler, J. E., Guzman, A., Wood, M. P., Sneden, C., & Cowan, J. J. 2013, ApJS, 205, 11
  51. Lindgård, A., & Nielson, S. E. 1977, Atomic Data Nuclear Data Tables, 19, 533
  52. Magrini, L., Randich, S., Friel, E., et al. 2013, A&A, 558, A38
  53. Magrini, L., Randich, S., Kordopatis, G., et al. 2017, A&A, 603, A2
  54. Meggers, W. F., Corliss, C. H., & Scribner, B. F. 1975, Tables of spectral-line intensities. Part I, II_- arranged by elements., eds. W. F. Meggers, C. H. Corliss, & B. F. Scribner
  55. Miles, B. M., & Wiese, W. L. 1969, Atomic Data, 1, 1
  56. Nitz, D. E., Wickliffe, M. E., & Lawler, J. E. 1998, ApJS, 117, 313
  57. O’Brian, T. R., Wickliffe, M. E., Lawler, J. E., Whaling, W., & Brault, J. W. 1991, J. Opt. Soc. Am. B Opt. Phys., 8, 1185
  58. Pancino, E., Lardo, C., Altavilla, G., et al. 2017a, A&A, 598, A5
  59. Pancino, E., Romano, D., Tang, B., et al. 2017b, A&A, 601, A112
  60. Pasquini, L., Avila, G., Blecha, A., et al. 2002, The Messenger, 110, 1
  61. Pinnington, E. H., Ji, Q., Guo, B., et al. 1993, Can. J. Phys., 71, 470
  62. Piskunov, N., & Valenti, J. A. 2017, A&A, 597, A16
  63. Pitts, R. E., & Newsom, G. H. 1986, J. Quant. Spec. Radiat. Transf., 35S, 383
  64. Plummer, M. 2003, JAGS: a program for analysis of Bayesian graphical models using Gibbs sampling, eds. Hornik, K., Leisch, F. & Zeileis, A.
  65. Plummer, M. 2022, Bayesian Graphical Models using MCMC, CRAN, Vienna, Austria
  66. Plummer, M., Best, N., Cowles, K., & Vines, K. 2006, R News, 6, 7
  67. Prugniel, P., & Soubiran, C. 2001, A&A, 369, 1048
  68. R Core Team. 2021, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria
  69. Ralchenko, Y., Kramida, A., Reader, J., & NIST ASD Team. 2010, NIST Atomic Spectra Database (ver. 4.0.0), [Online]
  70. Randich, S., Tognelli, E., Jackson, R., et al. 2018, A&A, 612, A99
  71. Randich, S., Gilmore, G., Magrini, L., et al. 2022, A&A, 666, A121
  72. Seaton, M. J., Yan, Y., Mihalas, D., & Pradhan, A. K. 1994, MNRAS, 266, 805
  73. Semenova, E., Bergemann, M., Deal, M., et al. 2020, A&A, 643, A164
  74. Smiljanic, R., Korn, A. J., Bergemann, M., et al. 2014, A&A, 570, A122
  75. Smith, G. 1981, A&A, 103, 351
  76. Smith, G. 1988, J. Phys. B At. Mol. Phys., 21, 2827
  77. Smith, G., & Raggett, D. S. J. 1981, J. Phys. B At. Mol. Phys., 14, 4015
  78. Sneden, C., Bean, J., Ivans, I., Lucatello, S., & Sobeck, J. 2012, Astrophysics Source Code Library [record ascl:1202.009]
  79. Sousa, S. G., Santos, N. C., Israelian, G., Mayor, M., & Monteiro, M. J. P. F. G. 2007, A&A, 469, 783
  80. Sousa, S. G., Santos, N. C., Adibekyan, V., Delgado-Mena, E., & Israelian, G. 2015, A&A, 577, A67
  81. Spina, L., Randich, S., Palla, F., et al. 2014, A&A, 567, A55
  82. Stetson, P. B., & Pancino, E. 2008, PASP, 120, 1332
  83. Tautvaišienė, G., Drazdauskas, A., Mikolaitis, Š., et al. 2015, A&A, 573, A55
  84. Theodosiou, C. E. 1989, Phys. Rev. A, 39, 4880
  85. Valenti, J. A., & Piskunov, N. 1996, A&AS, 118, 595
  86. Wallace, L., & Hinkle, K. 2009, ApJ, 700, 720
  87. Whaling, W., & Brault, J. W. 1988, Phys. Scr, 38, 707
  88. Wickliffe, M. E., & Lawler, J. E. 1997, ApJS, 110, 163
  89. Wickliffe, M. E., Lawler, J. E., & Nave, G. 2000, JQSRT, 66, 363
  90. Wiese, W. L., Smith, M. W., & Miles, B. M. 1969, Atomic Transition Probabilities. Vol. 2: Sodium through Calcium. A Critical Data Compilation, eds. W. L. Wiese, M. W. Smith, & B. M. Miles (US Government Printing Office)
  91. Wood, M. P., Lawler, J. E., Sneden, C., & Cowan, J. J. 2013, ApJS, 208, 27
  92. Worley, C. C., Jofré, P., Rendle, B., et al. 2020, A&A, 643, A83

3 ‘JAGS’ stands for Just Another Gibbs Sampler: http://mcmc-jags.sourceforge.net/

5 The SYNFLAG and LOGGFFLAG take values yes (Y), uncertain (U), or no (N), as explained in Heiter et al. (2021). SYNFLAG comes from tests of the blending of the lines in the spectra of the Sun and Arcturus. LOGGFFLAG is related to the confidence in the log gf value of the line.

All Tables

Table 1

Overview of WG10 spectral dataset.

Table 2

SETUP and phase analyses carried out by each WG10 node.

Table 3

WG11 nodes that participated in the analysis of the final Gaia-ESO data release.

Table 4

Number of parameters provided per NODE per SETUP.

Table 5

WG14 Flags used by WG10 NODES.

Table 6

Cross-match of stars (NCN) and spectra (NSP) between WG11 and WG10 SETUPs.

Table 7

Bootstrapping SETUP order and reference set content.

Table 8

Coefficients for the WG10 bias corrections.

Table 9

Difference (Δ) of WG10 SETUP homogenisations to reference.

Table 10

Priority order for selecting final result in cases of results from multiple SETUPs.

Table 11

Summary of element abundance detections (D) and lines measured (L).

Table 12

Element measurements rejection criteria.

Table 13

Set of corrections calculated for each parameter for each ELEMENT+SETUP+NODE combination.

Table 14

Abundance corrections derived using the full WG11 dataset.

Table 15

Abundance corrections derived using the solar chemical composition.

Table 16

Extreme abundance enhancement present at super-solar metallicity.

Table A.1

Spectral lines used by WG10 Nodes.

Table B.1

Coefficients of WG10 bias corrections.

All Figures

thumbnail Fig. 1

Distribution of key values available for use in the WG10 analysis per SETUP: signal-to-noise, radial velocity, error on radial velocity, and rotational velocity. The bin size (Bin) for each parameter is given.

In the text
thumbnail Fig. 2

Teff-log g diagram with the WG11 recommended results. PARSEC isochrones (Bressan et al. 2012) are shown for ages of 1 and 12.5 Gyr (violet and orange, respectively) and for the minimum and maximum metallicity indicated in each panel (dashed and solid lines, respectively). Red crosses are stars in open clusters, blue circles are stars in globular clusters, and the black starred symbols are the remaining stars.

In the text
thumbnail Fig. 3

Teff-log g diagrams for the open clusters IC 2602, NGC 6663, M67, and Trumpler 20.

In the text
thumbnail Fig. 4

Teff-log g diagrams for the globular clusters M2, NGC 104, NGC 362, and NGC 1851.

In the text
thumbnail Fig. 5

Distribution of Teff with log g and Teff with [Fe/H] for each node for each SETUP. Unflagged but rejected IAC and MaxPlanck results are shown in red.

In the text
thumbnail Fig. 6

Reference sets for HR15N and HR10|HR21 setups. Top row: Kiel diagrams for full HR15N sample, final HR15N reference set and WG11 cross-match with HR15N. Second row: same as for the top row but for [Fe/H] versus Teff. Third row: Kiel diagrams of the four main samples: benchmarks (yellow), OC (magenta), GCs (green), and MW (blue) within the HR15N reference set overlaid on the full reference set. Fourth row: same as for the third row but for [Fe/H] versus Teff. Fifth to eighth rows: same but for HR10|HR21.

In the text
thumbnail Fig. 7

Reference sets for HR21-only and HR9B SETUPs. Top to fourth rows: same as in Fig. 6 but for HR21-only. The reference samples are benchmarks (yellow), BL (red), GCs (green) and MW (blue). Fifth to eighth rows: same as in Fig. 6 but for HR9B.The reference samples are benchmarks (yellow), OC (magenta), and MW (blue).

In the text
thumbnail Fig. 8

Difference of node (EPINARBO, Lumba, OACT) values to reference set values for Teff, log g and [Fe/H] for HR15N. The difference of the final homogenised reference set values from the reference set values per parameter for HR15N are shown in the bottom row.

In the text
thumbnail Fig. 9

Kiel diagram with a metallicity colour map for the per SETUP homogenised parameters: Panels from left to right: a) HR15N, b) HR10|HR21, c) HR21, d) HR9B, and e) the final WG10 homogenised parameters. All panels are on the same colour map scale.

Fig. 10

Characterisation of the final WG10 stellar parameters with histograms (top row), bins in Kiel diagrams (middle row), and bins in metallicity distribution (bottom row). The panel contents are: (a, f, k) S/N; (b, g, l) number of nodes (NN); (c, h, m) error on Teff; (d, i, n) error on log g; (e, j, o) error on [Fe/H].

Fig. 11

Comparison of WG10 (black) and WG11 (red) stellar parameters. Left column: the FGK benchmark stars against the reference values. The mean difference and standard deviation are given, and three-sigma limits are shown as dashed lines. Right column: the cross-match between WG10 and WG11 against the WG11 values. The mean difference and standard deviation are given.

Fig. 12

Comparison of mean [Fe/H] per GC for WG10 (black) and WG11 (red) against the reference values. The mean difference and standard deviation are given.

Fig. 13

ESO Solar spectrum (https://www.eso.org/observing/dfo/quality/UVES/pipeline/FLAMES_solar_spectrum.html) reduced with the Gaia-ESO GIRAFFE reduction pipeline for the WG10 SETUPs. Spectral lines analysed by the WG10 nodes are indicated by vertical lines coloured by groupings of elements.

Fig. 14

Difference between node values and reference values for the example case of the CAUP analysis of Mg I measured in the HR21 spectrum. Top row: differences against [Fe/H] for the full sample (left, black), the giants (middle, red), and the dwarfs (right, blue). Middle row: same but against log g. Bottom row: same but against Teff. Median differences calculated per bin are shown as orange points with error bars. The linear fit (orange dot-dash) and quadratic fit (red dot-dash) are shown in each case.
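
The binned medians and polynomial fits described in this caption can be illustrated with a minimal sketch. It is not the survey's homogenisation code: the function names, the equal-width binning, the minimum bin occupancy, the MAD-based error bars, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def binned_median(x, diff, n_bins=8, min_count=3):
    """Median of `diff` in equal-width bins of `x` (binning scheme is an assumption)."""
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    # Interior edges only, so indices run from 0 to n_bins - 1.
    idx = np.digitize(x, edges[1:-1])
    centres, medians, errs = [], [], []
    for b in range(n_bins):
        sel = idx == b
        if sel.sum() < min_count:
            continue  # skip sparsely populated bins
        d = diff[sel]
        med = np.median(d)
        mad = np.median(np.abs(d - med))
        centres.append(0.5 * (edges[b] + edges[b + 1]))
        medians.append(med)
        # Robust scatter scaled to an approximate error on the bin median.
        errs.append(1.4826 * mad / np.sqrt(sel.sum()))
    return np.array(centres), np.array(medians), np.array(errs)

def fit_trends(centres, medians):
    """Linear and quadratic polynomial coefficients, highest power first."""
    linear = np.polyfit(centres, medians, 1)
    quadratic = np.polyfit(centres, medians, 2)
    return linear, quadratic

# Synthetic stand-in for node-minus-reference differences against [Fe/H].
rng = np.random.default_rng(0)
feh = rng.uniform(-2.0, 0.5, 2000)
diff = 0.10 + 0.05 * feh + rng.normal(0.0, 0.05, feh.size)

centres, medians, errs = binned_median(feh, diff)
linear, quadratic = fit_trends(centres, medians)
```

Fitting the binned medians rather than the individual points limits the influence of outliers on the fitted trend, at the cost of a dependence on the chosen bin width.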

Fig. 15

Comparison of HR21 (by cross-match) with HR10 uncorrected (HR10uncor), HR10 corrected to WG11 (HR10cor), and WG11. Top row: the bulge sample (BL); middle row: the giant sample (GT); bottom row: the dwarf sample (DW). First column: Ti I abundances against [Fe/H]; second column: difference between HR21 and HR10 abundances (HR10uncor and HR10cor) with linear, quadratic, and cubic fits; third column: application of the correction from the quadratic fit to HR21 (HR21cor).

Fig. 16

Errors on WG10-recommended abundances (in absolute abundance) against S/N on a log scale. Sub-samples per NN are shown as specified in the top-left panel.

Fig. 17

Final WG10 abundances against WG11 abundances for the CNAMEs in common with WG11 for each WG10 homogenised element. The median difference and spread are provided for each element.

Fig. 18

Final WG10 stellar parameters with S/N ≥ 25 and NN ≥ 2 as (a) a Kiel diagram with a metallicity colour map and (b) the metallicity distribution.
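
The selection quoted in this caption (S/N ≥ 25 and NN ≥ 2) amounts to a simple boolean cut. The sketch below assumes the catalogue columns have already been loaded into NumPy arrays; the function name, array names, and sample values are hypothetical and do not correspond to actual catalogue column names.

```python
import numpy as np

def quality_mask(snr, nn, snr_min=25.0, nn_min=2):
    """True where a star passes both the S/N and number-of-nodes thresholds."""
    snr = np.asarray(snr, dtype=float)
    nn = np.asarray(nn)
    # A NaN S/N compares as False, so stars without an S/N value are excluded.
    return (snr >= snr_min) & (nn >= nn_min)

# Hypothetical values for five stars.
snr = np.array([10.0, 25.0, 40.0, 80.0, np.nan])
nn = np.array([3, 2, 1, 4, 2])

mask = quality_mask(snr, nn)
n_selected = int(mask.sum())
```

The same mask can then index the Teff, log g, and [Fe/H] arrays before plotting.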

Fig. 19

Final WG10 chemical abundances as [X/Fe] against [Fe/H] illustrating the process of selecting by the NN contributing to the final abundance.
