The Gaia-ESO Survey: The DR5 analysis of the medium-resolution GIRAFFE and high-resolution UVES spectra of FGK-type stars

The Gaia-ESO Survey is a European Southern Observatory (ESO) public spectroscopic survey that targeted $10^5$ stars in the Milky Way covering the major populations of the disk, bulge and halo. The observations were made using FLAMES on the VLT, obtaining both UVES high ($R\sim47,000$) and GIRAFFE medium ($R\sim20,000$) resolution spectra. The analysis of the Gaia-ESO spectra was the work of multiple analysis teams (nodes) within five working groups (WG). The homogenisation of the stellar parameters within WG11 (high resolution observations of FGK stars) and the homogenisation of the stellar parameters within WG10 (medium resolution observations of FGK stars) are described here. In both cases, the homogenisation was carried out using a Bayesian inference method developed specifically for the Gaia-ESO Survey by WG11. The WG10 homogenisation primarily used the cross-match of stars with WG11 as the reference set in both the stellar parameter and chemical abundance homogenisation. In this way the WG10 homogenised results have been placed directly onto the WG11 stellar parameter and chemical abundance scales. The reference set for the metal-poor end was sparse, which limited the effectiveness of the homogenisation in that regime. For WG11, the total number of stars for which stellar parameters were derived was 6,231 with typical uncertainties for Teff, log g and [Fe/H] of 32~K, 0.05 and 0.05 respectively. One or more chemical abundances out of a possible 39 elements were derived for 6,188 of the stars. For WG10, the total number of stars for which stellar parameters were derived was 76,675 with typical uncertainties for Teff, log g and [Fe/H] of 64~K, 0.15 and 0.07 respectively. One or more chemical abundances out of a possible 30 elements were derived for 64,177 of the stars.


Introduction
The Gaia-ESO Survey is a European Southern Observatory (ESO) public spectroscopic survey designed to observe $10^5$ stars.
[Table caption: Characteristics of each WG10 observed spectral range, resolution, number of spectra and number of stars.]
[Table caption: Summary of number of spectra observed for each science programme and calibration sample as labelled in GES_TYPE.]
[...] provide an updated description in Section 5. In total there were [...]

Working Group 10 node analysis methods
Seven analysis teams (hereafter referred to as 'nodes') undertook either stellar parameter or chemical abundance analyses, or both, of subsets of the GIRAFFE SETUPs within WG10 for the final Gaia-ESO data release. The list of nodes and the SETUPs each node employed in each analysis phase is presented in Table 2.
To standardise the node analyses, the nodes were required to use the MARCS Stellar Atmosphere Models (Gustafsson et al. 2008), the solar abundances from Grevesse et al. (2007), and the Gaia-ESO Line list (Heiter et al. 2021). Pre-generated synthetic spectra for Gaia-ESO were also available, calculated as described in de Laverny et al. (2012). The following describes the analysis process of each node in turn.
[...] were analysed. This pipeline has also been used for UVES analysis (Gavel et al. 2019) and is very similar to the pipeline used for the second and third data releases of the GALAH survey (Buder et al. 2018, 2021).
MaxPlanck: The MaxPlanck node used neural networks to determine stellar parameters and magnesium abundance (Kovalev et al. 2019). A training set of synthetic spectra was generated using the MARCS stellar atmosphere models and the Gaia-ESO line list. The MaxPlanck node investigated analysing the HR15N, HR10, and HR21 spectra (HR10+HR21 as a single analysis) but determined that the results from their analysis of HR10 were the only reliable results and thus provided results for that SETUP only. Results for the HR21 SETUP for the bulge fields and standard stars were also provided. The HR10 and HR21 SETUPs were analysed together as a single analysis, and HR10 was also analysed separately.

OACT: The OACT node used the code ROTFIT (Frasca et al. 2003, 2006). The method consists of a $\chi^2$ minimisation of the residuals between the observed spectrum and a set of reference spectra. In this case, a library of observed spectra from the ELODIE archive (Prugniel & Soubiran 2001) was used as reference. The HR15N and HR9B SETUPs were analysed.
[...] were measured with a Gaussian fitting. Abundances were determined with a set of curves of growth (Franciosini et al. 2022) determined from the Gaia-ESO stellar spectra grid. The HR15N SETUP was analysed.
CAUP: Equivalent widths were measured with the Automatic Routine for line Equivalent widths in the stellar Spectra code (ARES, Sousa et al. 2007, 2015). Atmospheric parameters and chemical abundances were determined with MOOG (Sneden et al. 2012). This was carried out on the HR15N, HR10, and HR21 SETUPs.
Vilnius: Equivalent widths were measured with DAOSPEC (Stetson & Pancino 2008). The node developed its own wrapper to automate the use of MOOG (Sneden et al. 2012) for the determination of chemical abundances. The HR10 and HR21 SETUPs were analysed.

Definition of reference sets
An underlying difficulty in the analysis of large stellar datasets is ensuring that the parameters and abundances that are produced are as close to the truth as is possible in our current understanding of stellar physics. The sheer number of spectra makes it impossible to carry out a detailed 'by hand' analysis of each spectrum, so automated analyses must be used, such as those described previously. During their development, automated analyses are calibrated and validated against reference sets. For the homogenisation of the node results into the per star catalogue for both WG10 and WG11, the results from each node analysis were compared to known results of key reference sets to verify the node results and, where necessary, correct them onto the reference set scale prior to the node results being combined. The WG11 analysis, use of reference sets, and final homogenisation are described in Section 5 as an update to the process described in Smiljanic et al. (2014).
For WG10, there were a reasonable number of stars in common with WG11 such that the results could be combined with the FGK benchmark stars in order to construct a larger reference set for which the parameter space was more filled in.This also meant that the WG10 results would be calibrated directly onto the WG11 parameter scale.

Working Group 11: Bayesian inference homogenisation method
The WG10 and WG11 node parameters were homogenised separately but used the same Bayesian inference method. Thus, the description below supersedes the previous WG11 homogenisation strategy described in Smiljanic et al. (2014).
The Bayesian homogenisation for the WG11 results was first developed by Andrew R. Casey (2014-2017, private communication) and used for the Gaia-ESO internal data release 5 (iDR5).
For the final data release (which corresponds to iDR6), we built upon his initial work and developed a different implementation of the method. The homogenisation process and Bayesian modelling were written in R (R Core Team 2021) using JAGS (Plummer 2003) and a number of related packages.
The problem we want to solve is that of finding the best estimate of a stellar parameter, given the multiple values determined by the different nodes. The parameter can be any of Teff, log g, or [Fe/H]. For the microturbulence velocity (ξ), the procedure was slightly different (see Sec. 5.4 below).
Let us consider that a star 'n' is characterised by a true value of a given parameter, $\mathrm{true.param}_{n}$. When a node 'i' attempts to estimate that value, the analysis returns a measurement, $\mathrm{param}_{i,n}$, that is affected by systematic and random errors introduced by the methodology that was used. We made the assumption that these errors are independent and can be separated if parameters are reported for enough repeat spectra. The systematic error accounts for any and all zero point offsets and biases. The random error accounts for any and all effects that are stochastic in nature. We further assumed that the bias error is itself a function of the atmospheric parameter in question and that the random errors can be described by a Gaussian distribution.

Numerically, we would write

$$\mathrm{param}_{i,n} \sim \mathrm{dnorm}(\mathrm{true.param}_{n},\ \mathrm{random.err}_{i}) + \mathrm{bias.param}_{i,n}, \qquad (1)$$

where dnorm(µ, σ) stands for the Gaussian distribution of mean = µ and standard deviation = σ. Equation 1 states that the measurement provided by the node i is a random draw (which is the meaning of the symbol '∼' in the equation) from the distribution centred on $\mathrm{true.param}_{n}$ affected by the random error $\mathrm{random.err}_{i}$. This random error is postulated to be a property of the node. Further, the measurement is affected by an offset $\mathrm{bias.param}_{i,n}$. This bias value comes from a function that is also a property of the node and was computed at the value of the parameter that characterises the star n. For this bias function, we assumed that the variation of the bias in the parameter space can be described by a quadratic function (Eq. 2).

Strictly speaking, we should write Eq. 2 as a function of the true parameter and not of the parameter value measured by the node. However, that makes the problem circular (i.e. to know the true value, we need to correct for the bias, but to compute the bias, we need to use the true value). As it stands, Eq. 2 should work reasonably well if the difference between the true and measured values is not too big, although exactly what that means has to be checked a posteriori. In any case, differences that cannot be accounted for by the bias will tend to inflate the random component of the error. Our tests with the final results showed that this choice for the modelling worked well (which does not mean that things could not be improved by assuming a different model).
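As an illustration only, a quadratic bias function of the kind described for Eq. 2 can be sketched as below. The coefficient names $a_i$, $b_i$, and $c_i$ are placeholders introduced here for the three per-node coefficients; they are an assumption and not necessarily the notation of the original equation.

```latex
\begin{equation*}
\mathrm{bias.param}_{i,n} = a_i + b_i\,\mathrm{param}_{i,n} + c_i\,\mathrm{param}_{i,n}^{2}
\end{equation*}
```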

For numerical reasons, we actually write Eq. 1 as

$$\mathrm{param}_{i,n} \sim \mathrm{dnorm}(\mathrm{true.param}_{n} + \mathrm{bias.param}_{i,n},\ \mathrm{random.err}_{i}). \qquad (3)$$

This choice means that we assumed it is equivalent to say that the measured value was shifted by an offset or to say that when a node is affected by a certain bias, the measurement was made from a distribution centred around a 'biased true parameter' ($\mathrm{true.param}_{n} + \mathrm{bias.param}_{i,n}$).
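The single-node model of Eq. 3 can be illustrated with mock data. The sketch below is not the R/JAGS implementation used for the survey: it draws measurements for one hypothetical node with an assumed quadratic bias and Gaussian random error, then recovers the bias with an ordinary least-squares polynomial fit against known reference values. All numbers (reference Teff range, bias coefficients, random error) are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical "true" Teff values (K) of 200 reference stars.
true_teff = rng.uniform(4000.0, 6500.0, size=200)


def node_bias(teff, c0=50.0, c1=40.0, c2=-30.0):
    """Illustrative quadratic bias (K) in the scaled variable t = (Teff - 5000) / 1000."""
    t = (teff - 5000.0) / 1000.0
    return c0 + c1 * t + c2 * t**2


random_err = 60.0  # assumed Gaussian random error of the node (K)

# Eq. 3: a measurement is a draw from a Gaussian centred on the biased true value.
# The bias is evaluated at the true value here only to generate the mock data.
measured = rng.normal(true_teff + node_bias(true_teff), random_err)

# Recover the bias as a quadratic function of the measured value (least squares).
t_meas = (measured - 5000.0) / 1000.0
fit = np.polyfit(t_meas, measured - true_teff, deg=2)
corrected = measured - np.polyval(fit, t_meas)

residuals = corrected - true_teff
print(f"offset after correction:  {residuals.mean():.2f} K")
print(f"scatter after correction: {residuals.std(ddof=3):.1f} K")
```

After the correction the mean offset vanishes by construction, and the remaining scatter is an estimate of the node's random error, which is the quantity the full Bayesian model infers jointly with the bias coefficients.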

Because there are actually multiple nodes making the measurements, we can write the problem using a multi-dimensional normal distribution (where each node is one dimension; Eq. 4), where dmnorm($\vec{\mu}$, $\Sigma$) stands for the multi-dimensional normal distribution of mean vector = $\vec{\mu}$ and covariance matrix = $\Sigma$. [...]

To be able to apply the Bayesian inference to the homogenisation, we first needed to estimate the coefficients that define the bias function of each node (Eq. 2) and the covariance matrix (Eq. 4). Once the covariance matrix and the biases are defined, [...] these calculations, we have relied on a set of reference objects [...] the sample (including to the reference stars themselves). We note that the errors of the atmospheric parameters provided by the nodes are not used (neither for the homogenisation of Teff nor of the other parameters). Each node estimated their errors in a different way, some providing internal errors of the method, while others applied more complicated prescriptions. Consequently, these values cannot be directly compared. We let the comparison between reference and measured parameters define the intrinsic node random errors.
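To make the role of the covariance matrix concrete, the sketch below combines bias-corrected values from several nodes for one star, given a known covariance matrix. It uses the closed-form generalised least-squares estimate, which is the maximum-likelihood solution for a common mean under a multi-dimensional normal distribution with a flat prior. This is a simplified stand-in for the Bayesian sampling described in the text; the node values, errors, and correlation are invented for the example.

```python
import numpy as np


def combine_nodes(values, cov):
    """Best estimate and standard error of one quantity from correlated measurements.

    estimate = (1^T S^-1 x) / (1^T S^-1 1),  variance = 1 / (1^T S^-1 1)
    """
    values = np.asarray(values, dtype=float)
    cov = np.asarray(cov, dtype=float)
    ones = np.ones_like(values)
    weights = np.linalg.solve(cov, ones)  # S^-1 * 1 (S is symmetric)
    variance = 1.0 / (ones @ weights)
    estimate = variance * (weights @ values)
    return estimate, np.sqrt(variance)


# Three hypothetical nodes: bias-corrected Teff values (K) for one star.
values = [5780.0, 5820.0, 5750.0]
sigmas = np.array([60.0, 90.0, 120.0])  # assumed node random errors (K)
corr = np.array([
    [1.0, 0.5, 0.0],  # assumed correlation of 0.5 between the first two nodes
    [0.5, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
cov = corr * np.outer(sigmas, sigmas)

teff, teff_err = combine_nodes(values, cov)
print(f"combined Teff = {teff:.0f} +/- {teff_err:.0f} K")
```

With a diagonal covariance matrix this reduces to the familiar inverse-variance weighted mean; the off-diagonal terms down-weight nodes whose errors are correlated.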
At the end of this second step, we found that the Teff values of the reference stars were recovered with a standard deviation of ±85 K. We assumed that this value represents the external accuracy of our final Teff scale, even though, in truth, this value is a composition of our accuracy and the errors of the reference scale. The internal errors of our Teff values have a median of 65 K, with the first and third quartiles at 60 and 75 K. A comparison against the isochrones of open and globular clusters seemed to validate the final Teff results (see Sec. 5.5).

Working Group 11: Homogenisation of surface gravity
For the WG11 log g homogenisation, two models of the node biases had to be combined. The first model, valid for dwarfs and metal-rich giants, used as the reference set the same sample of 35 benchmark stars employed in the analysis of Teff. The benchmark stars cover the interval between 0.68 and 5.05 dex.
However, a second model had to be built for the metal-poor giants ([Fe/H] ≤ −0.50 and log g ≤ 3.50). In this case, in addition to the benchmark stars, a sample of giants with asteroseismic values of log g was used. The sample included 62 stars with data from K2 (Worley et al. 2020) and 88 stars with data from CoRoT (Masseron et al., in preparation). The K2 stars have log g in the interval between 1.74 and 3.41. The CoRoT stars have log g in the interval between 1.75 and 2.99. The combination of the two models was found necessary to reproduce the Teff-log g diagram of the globular clusters. Conversely, the solution with the seismic values degraded the quality of the diagrams and tests for other types of clusters and field stars. A few iterations were needed to assign the model that should be used for the stars at the edges of the parameter space division.
For the purposes of the model, the uncertainty of the seismic log g values was fixed at a value of ±0.02 dex. The missing values in log g were substituted by a broad uniform prior (between 0.0 and 5.0 dex). At the end of the homogenisation, we found that the reference log g values were recovered with a standard deviation of ±0.14 dex. The internal errors have a median value of 0.15 dex, with the first and third quartiles at 0.14 and 0.18 dex. A comparison with cluster isochrones is shown in Sec. 5.5.
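The division of the parameter space between the two log g bias models can be written as a simple mask. The sketch below only encodes the stated boundary ([Fe/H] ≤ −0.50 and log g ≤ 3.50); the function name and the example values are hypothetical, and the iteration used for stars near the boundary is not reproduced.

```python
import numpy as np


def use_metal_poor_giant_model(feh, logg):
    """Boolean mask: True where the second (seismic-based) bias model applies."""
    feh = np.asarray(feh, dtype=float)
    logg = np.asarray(logg, dtype=float)
    return (feh <= -0.50) & (logg <= 3.50)


# Hypothetical stars: metal-poor giant, metal-rich giant, metal-poor dwarf, boundary case.
feh = [-1.2, 0.0, -1.2, -0.5]
logg = [2.0, 2.0, 4.5, 3.5]
mask = use_metal_poor_giant_model(feh, logg)
print(mask)
```

Because the mask depends on the homogenised [Fe/H] and log g themselves, a star near the boundary can change model between passes, which is consistent with the few iterations mentioned above.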
We performed a number of tests attempting to use Gaia log g priors as additional constraints in the Bayesian model. [...]
[...] than the ones obtained with the approach described above. The conclusion seems to be that the WG11 metallicities for cool stars (Teff ≤ 4000 K) are not reliable.

[...]

$$\mathrm{true.xi}_{n} \sim \mathrm{dnorm}(\mathrm{calib.xi}_{n},\ 0.25), \qquad (7)$$

where $\mathrm{calib.xi}_{n}$ is computed using the homogenised values of Teff, log g, and [Fe/H], and we assumed a typical uncertainty of 0.25 km s$^{-1}$. When writing Eq. 4 for ξ, we did not consider biases. In the Bayesian simulation, to homogenise this parameter, both the true values and the covariance matrix were determined at the same time.
In essence, the Bayesian modelling of ξ is an elaborate way of finding the mean of the distribution of multiple node values, with the advantage of taking into account the correlations between the nodes and of using the calibration as a prior. We remark, however, that the final results are indeed different from both the simple mean of the individual node results and from the direct application of the calibration.
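The effect of the calibration prior on ξ can be illustrated with a closed-form Gaussian update. The sketch below assumes the node covariance matrix is already known, whereas in the text it is inferred at the same time as the true values, so this is a simplification. The node values, node errors, and calibration value are invented; only the prior width of 0.25 km/s comes from the text.

```python
import numpy as np


def xi_posterior(node_xi, cov, calib_xi, prior_sigma=0.25):
    """Posterior mean and sigma of xi for a Gaussian prior and Gaussian node values (no biases)."""
    node_xi = np.asarray(node_xi, dtype=float)
    ones = np.ones_like(node_xi)
    weights = np.linalg.solve(np.asarray(cov, dtype=float), ones)  # S^-1 * 1
    precision = 1.0 / prior_sigma**2 + ones @ weights
    mean = (calib_xi / prior_sigma**2 + weights @ node_xi) / precision
    return mean, 1.0 / np.sqrt(precision)


node_xi = [1.10, 1.35, 1.20]            # hypothetical node values (km/s)
cov = np.diag([0.15, 0.20, 0.25]) ** 2  # assumed uncorrelated node errors (km/s)
xi, xi_err = xi_posterior(node_xi, cov, calib_xi=1.05)

print(f"posterior xi = {xi:.3f} +/- {xi_err:.3f} km/s")
print(f"simple mean  = {np.mean(node_xi):.3f} km/s")
```

The posterior value lies between the calibration and the node values and differs from both the simple mean and the calibration alone.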

Consistency checks of the final Working Group 11 stellar parameters
Here, we discuss the final stellar parameters obtained within WG11. We remark that these are not necessarily the final Gaia-ESO parameters for the stars we analysed, as there is still a process of survey-wide homogenisation. This final homogenisation process is described in the companion paper by Hourihane et al. (2023).
Figure 2 shows the Teff-log g diagram of the homogenised results, in bins of metallicity and in comparison to isochrones. The agreement with the location of the isochrones is in general very good. The locations of the main sequence and of the red giant branch are in general well reproduced.
At the lowest metallicity bin (where no isochrones are plotted), the scatter does seem to be excessive. There is the possibility that some of these stars are not real metal-poor stars, but artefacts of the analysis. We recall that for the cool benchmarks, the WG11 homogenised metallicities were too low. Investigation showed that some hot stars (> 7000 K) included in the sample also ended up with very low metallicities. Care is therefore advised when using the results for the most metal-poor stars.
Figures 3 and 4 show Teff-log g diagrams for a few open and globular clusters. When possible, membership information was obtained from previous Gaia-ESO papers (Spina et al. 2014; Magrini et al. 2017; Pancino et al. 2017b; Randich et al. 2018). If that was not possible, a simple two-sigma cut in radial velocity was used as a first estimate of membership. (We note that since this analysis was carried out, Jackson et al. (2022) have provided key cluster membership lists. These are used in the verification of the final Gaia-ESO dataset in Hourihane et al. (2023).)
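A two-sigma radial-velocity cut of the kind used as a first membership estimate can be sketched as follows. The choice of the median as the cluster velocity and of the sample standard deviation as the spread are assumptions made for this example, as are the velocities themselves.

```python
import numpy as np


def rv_members(rv, n_sigma=2.0):
    """Boolean mask of stars within n_sigma standard deviations of the median radial velocity."""
    rv = np.asarray(rv, dtype=float)
    centre = np.median(rv)
    spread = np.std(rv, ddof=1)
    return np.abs(rv - centre) <= n_sigma * spread


# Hypothetical radial velocities (km/s): twelve cluster stars and two field stars at the end.
rv = [34.1, 33.8, 34.5, 33.9, 34.2, 34.0, 33.7, 34.3, 34.1, 33.9, 34.4, 34.0, 12.0, 60.3]
members = rv_members(rv)
print(f"{members.sum()} of {len(rv)} stars retained")
```

A single pass is sensitive to contamination because outliers inflate the standard deviation, which is one reason such a cut is only a first estimate of membership.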
As can be seen, the agreement for the open clusters is excellent. The member stars tend to follow the isochrones, in particular for the case of red giants in older open clusters. In the case of young clusters, the main sequence stars usually sit slightly above the isochrone. We did find a few issues, however. For M67, the subgiants seem to have a homogenised log g that is too high for their temperatures. In addition, the analysis of the Pleiades spectra did not return good atmospheric parameters. In this last case, however, the spectra are not from UVES, and we believe there is something different in the data creating some kind of systematic problem in the analysis.
The agreement for the globular clusters is good in many cases, but there are cases of disagreement. In particular we mention NGC 1904, NGC 4833, NGC 5927, NGC 4372, and M 15, nearly all of which, except NGC 5927, are metal poor. In these cases, the stars tend to have temperatures that are cooler than expected from the position of the isochrones. In general, the stars in the globular clusters are bright giants (log g < 1.5). For such stars, the WG11 analysis does not seem to be very robust. We recommend care when using results for these stars.
In most cases, trends were not seen, are very small, or are driven by one outlier whose membership could be questioned. However, there are cases (e.g. NGC 2243 and Trumpler 20) where correlations were detected. In other cases, large scatter can be present, as is particularly seen in younger clusters (see e.g. NGC 2516 or IC 4665). We point out that some of the trends and large scatter are not errors induced by the homogenisation, but are effects that appear from limitations in our methodology. The recent work by Baratella et al. (2020) suggests that traditional methods of analysis that rely on the various equilibria of Fe lines fail for young stars because the microturbulence is overestimated. Another example is the work of Semenova et al. (2020), which indicates that abundance trends in the 2 Gyr open cluster NGC 2420 can be explained by neglected 3D non-LTE effects.

Working Group 11: Homogenisation of chemical abundances
Overall, the WG11 nodes attempted to derive abundances for 38 atomic species and two molecules. The atomic and molecular data that were used in the analysis are those described in Heiter et al. (2021). Abundances of O i using the forbidden line at 6300 Å, of carbon from molecular C$_2$, and nitrogen from CN bands were derived using spectrum synthesis by the Vilnius node only (see Tautvaišienė et al. 2015, for details). Abundances and upper limits of Li i come from measurements by the Arcetri node (Franciosini et al. 2022). None of these abundances have gone through a homogenisation process.
For all other atomic species, we used individual line abundances for the homogenisation. The measurements come from a mix of equivalent width (by the CAUP, EPINARBO, and Vilnius nodes) and spectrum synthesis analyses (by LUMBA and, in the case of Mg i, Ba ii, Ce ii, La ii, Pr ii, Y ii, Zr i, Zr ii, and Nd ii, by the Vilnius node). The final list of species from which abundances were estimated includes, in addition to the metallicity itself, [...] Nd ii, Sm ii, and Eu ii (i.e., 30 different chemical elements).
We advise particular care when using the abundances from S i, Ca ii, Sc i, Mo i, and Nb i. These abundances come from a single (or a few) weak and/or blended lines. They were measured with equivalent widths, but precise results probably require spectrum synthesis. Quality control led us to reject all abundances from Sr i, Ru i, and Dy ii, and therefore they are not part of the release.
Before homogenisation, we ran quality checks on the individual line abundances. For each line, we produced three plots of abundance as a function of Teff, log g, and [Fe/H]. In these plots, we visually checked for trends, excessive scatter, and offsets among the nodes, and we removed anything that appeared suspicious. We also excluded lines that had been measured only in a small number of stars or by only one or two nodes (if there were other lines that had been measured by several nodes).
Homogenisation was also performed using Bayesian modelling, with an adapted version of Eq. 4 for a given chemical species: [...] $\mathrm{abundances}_{n}$, expressed in a similar manner to Eq. 5, combines the true abundance, $\mathrm{true.abun}_{n}$, of that species in star 'n' with the line biases $\mathrm{line.bias}_{j}$.

The line bias was not introduced as a property of the node but of the spectral line j. This was meant to take into account a possible bias coming from uncertainties in the log gf value of the lines. In principle, variation of this line bias in the parameter space could be introduced in order to model the changing importance of blends in different types of stars. However, this was not implemented, as it would introduce too many additional free parameters in the model. Distinct priors for the line biases were introduced depending on their quality flags in the form

$$\overrightarrow{\mathrm{line.bias}_{j}} \sim \mathrm{dnorm}(0.0,\ \mathrm{sigma.bias}), \qquad (9)$$

where sigma.bias is equal to 0.01, 0.02, 0.05, or 0.1, for lines with (SYNFLAG, LOGGFFLAG) = (Y,Y); (Y,U) or (U,Y); (U,U); (N,?) or (?,N), respectively. To prevent this line bias from diverging when only one or two lines are measured, we found it necessary to change sigma.bias to 0.002, 0.01, 0.02, and 0.05.
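The mapping from line-list quality flags to the prior width sigma.bias can be written as a small lookup. The sketch below assumes that the reduced values (0.002, 0.01, 0.02, 0.05) apply to the same four flag classes in the same order as the default values; the function name and the `few_lines` switch are hypothetical.

```python
VALID_FLAGS = {"Y", "U", "N"}
DEFAULT_SIGMA = (0.01, 0.02, 0.05, 0.1)
REDUCED_SIGMA = (0.002, 0.01, 0.02, 0.05)  # used when only one or two lines are measured


def sigma_bias(synflag, loggfflag, few_lines=False):
    """Prior width of the line bias from the (SYNFLAG, LOGGFFLAG) quality flags."""
    if synflag not in VALID_FLAGS or loggfflag not in VALID_FLAGS:
        raise ValueError("flags must be one of 'Y', 'U', 'N'")
    if "N" in (synflag, loggfflag):
        rank = 3  # (N,?) or (?,N)
    elif (synflag, loggfflag) == ("Y", "Y"):
        rank = 0
    elif (synflag, loggfflag) == ("U", "U"):
        rank = 2
    else:
        rank = 1  # (Y,U) or (U,Y)
    table = REDUCED_SIGMA if few_lines else DEFAULT_SIGMA
    return table[rank]


print(sigma_bias("Y", "Y"), sigma_bias("U", "Y"), sigma_bias("N", "Y"))
```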

Apart from the solar abundances, there are no other fundamental reference values that can be used to constrain the covariance matrix and the line biases. Because of that, the homogenisation of abundances was run as a single step, similar to what was done for ξ. Priors were used for the $\mathrm{true.abun}_{n}$ of each star. For the Sun, the abundances from Grevesse et al. (2007) were used as a strong Gaussian prior with σ = 0.001. For the abundances of Mg, Ti, Ni, Mn, and V, we found it helpful to introduce the abundances of the benchmark stars as additional priors (Jofré et al. 2015). For the other stars, we used a Gaussian distribution as the prior, with the mean at the metallicity-scaled solar abundance and σ = 0.4. For Ba ii, Cr i, Cr ii, Ca ii, Ni i, Y ii, Mn i, Zn i, Si ii, Sc i, and V ii, this had to be changed to σ = 0.1 in order to decrease the final scatter of the abundances.
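The prior for a generic star can be sketched as follows, taking the metallicity-scaled solar abundance to mean the solar abundance of the element shifted by the star's [Fe/H]. The species labels, the function name, and the solar abundance values passed in the example are illustrative inputs rather than values quoted in the text.

```python
# Species for which the text states the prior width was tightened to 0.1.
TIGHT_PRIOR_SPECIES = {
    "Ba II", "Cr I", "Cr II", "Ca II", "Ni I", "Y II",
    "Mn I", "Zn I", "Si II", "Sc I", "V II",
}


def abundance_prior(species, solar_abundance, feh):
    """Mean and sigma of the Gaussian prior on the true abundance of a star."""
    mean = solar_abundance + feh  # metallicity-scaled solar abundance
    sigma = 0.1 if species in TIGHT_PRIOR_SPECIES else 0.4
    return mean, sigma


# Illustrative inputs: a star with [Fe/H] = -0.5 and assumed solar abundances.
print(abundance_prior("Mg I", solar_abundance=7.53, feh=-0.5))
print(abundance_prior("Ba II", solar_abundance=2.17, feh=-0.5))
```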

Essentially, although it looks more complicated, the method can be considered as a sophisticated way to define a weighted mean. The sophistication lies in estimating the random errors of each node (i.e. the weights) directly from the data and in allowing for the line biases.

In this section, we describe the homogenisation of the WG10 stellar parameters. As shown in Table 2, for WG10 there were five nodes that provided parameters across the four GIRAFFE SETUPs. The specific parameters and number of spectra analysed per SETUP per node are shown in Table 4. For each SETUP there were two to four sets of node results available with which to perform the homogenisation.

For the MW observing programme (not including the BL fields) two SETUPs were observed, HR10 and HR21. These were selected, as they contain key lines that have a different sensitivity to surface gravity depending on whether the star is a giant [...] in a single analysis. However, nodes were free to analyse the data as suited their method, and all the data they provided were used in the homogenisation.

In particular, IAC provided two sets of analysis for the MW fields, the analysis of HR10 combined with HR21 (the HR10|HR21 SETUP) and the analysis of HR10-only. MaxPlanck provided results for the HR10-only SETUP. During the quality control phase, MaxPlanck investigated combining the HR10 and HR21 spectra (SETUP=HR10|HR21) as a single analysis but concluded that these results were not reliable for their process and therefore did not provide them.
As the HR10-only and HR10|HR21 analyses effectively covered the same sample, the Lumba HR10|HR21 results, the IAC HR10|HR21 and HR10-only results and the MaxPlanck HR10-only results were all used for the homogenisation of the MW fields. For the remainder of this work, the HR10|HR21 homogenisation refers to these four sets of node results.
The MW BL fields were observed at a higher S/N in the HR21 SETUP than the main MW fields and it was decided not to observe the same fields in HR10. (See Gilmore et al. (2022) for more details on the observing strategy.) The nodes analysed the HR21-only BL fields and the standard fields that had also been observed in HR21. These samples were used in the homogenisation of the HR21-only SETUP.
The OC SETUPs, HR15N and HR9B, covered different samples of open cluster stars, so the spectra of these SETUPs were analysed separately. (See Randich et al. (2022); Bragaglia et al. (2022) for further details on the observing strategy.) For HR15N three nodes provided results, while for HR9B, two nodes provided results.
Figure 5 shows the distribution of Teff, log g and [Fe/H] provided by each node for each SETUP. The node results and associated reports were reviewed before homogenisation.
The flag information (TECH, PECULI, REMARK) was inspected, and the WG14 flags that were used by each node were assessed to determine whether the associated results should be used in the homogenisation. The flags were assigned by each node based on the definitions in the WG14 Flag Dictionary. There were 20 flags that were determined at the WG10 homogenisation level to mean that the associated results, if present, should not be used in the homogenisation. Which flags were reported, as well as whether or not the results were provided, varied between nodes, so if the flag was present, the result (null or otherwise) was not used. The flag prefixes, the WG14 flag descriptions, and the number of spectra per node for which they were used are listed in Table 5.
Inspection of the resulting node datasets showed that both the IAC and MaxPlanck analyses had results lying at the parameter grid limits that were not flagged. The MaxPlanck analysis also showed a non-physical feature at Teff = 4000 K, log g = 4 dex that is not present in the other node results. These are indicated as red points in Figure 5 and these results were removed prior to the homogenisation.
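The flag-based exclusion can be sketched as a filter over node results. The flag syntax assumed here (flags separated by "|", with the prefix being the text before the first "-"), the two prefixes in the exclusion set, and the example rows are all hypothetical; the 20 prefixes actually used are those listed in Table 5.

```python
# Hypothetical flag prefixes marked as "do not use"; the real list is the one in Table 5.
EXCLUDED_PREFIXES = {"10005", "13020"}


def has_excluded_flag(flag_field, excluded=EXCLUDED_PREFIXES, sep="|"):
    """True if any flag in a separator-delimited field starts with an excluded prefix."""
    flags = [f.strip() for f in flag_field.split(sep) if f.strip()]
    return any(f.split("-")[0] in excluded for f in flags)


def keep_for_homogenisation(row):
    """Drop a result if TECH, PECULI, or REMARK carries an excluded flag, whether or not a value is present."""
    return not any(has_excluded_flag(row.get(col, "")) for col in ("TECH", "PECULI", "REMARK"))


rows = [
    {"CNAME": "star_a", "TECH": "", "PECULI": "", "REMARK": "", "TEFF": 5600.0},
    {"CNAME": "star_b", "TECH": "10005-10-05-00-A", "PECULI": "", "REMARK": "", "TEFF": None},
    {"CNAME": "star_c", "TECH": "11100-10-05-00-A|13020-10-05-00-B", "PECULI": "", "REMARK": "", "TEFF": 4800.0},
]
kept = [row["CNAME"] for row in rows if keep_for_homogenisation(row)]
print(kept)
```

Note that star_c is dropped even though it has a Teff value, mirroring the rule that a flagged result is not used whether it is null or not.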

Parameter reference set
Constructing a reference set using stars in common with WG11 was explored for each of the WG10 SETUPs for both the parameter phase and the abundance phase. Table 6 gives the cross-match of each WG10 SETUP to WG11 and between WG10 SETUPs.
The decision to use the WG11 results as the source of the reference set against which to derive the WG10 results was driven by the reasoning that this would immediately put the WG10 results onto the WG11 scale. The cross-match between WG11 and WG10 then comprised a larger, more comprehensive set of stars in common than the process of using the reference sets that have more sparse coverage.
While the WG11 and WG10 observing programmes were not designed with an overlap in the parameter space for calibration purposes, the cross-match of each WG10 SETUP to WG11 was reasonably well sampled, in particular for Teff and log g. The cross-match between WG10 SETUPs was also explored for use in the parameter phase as another way to expand the reference set for each SETUP.
As shown in Table 6, the SETUP with the largest per star (CNAME) cross-match to WG11 is the HR15N dataset. The cross-match of the HR10 dataset to WG11 is almost three times smaller than the HR15N dataset cross-match to WG11. However, the cross-match of the HR10 dataset to the HR15N dataset is over 21 times greater than the HR10 dataset cross-match to WG11. This is particularly due to the CoRoT sample for which all of the CoRoT fields were observed in all three SETUPs: HR10, HR21, and HR15N. The cross-match between HR21 and HR15N reflects that of HR10, as the HR21 targets were observed either in combination with HR10 or specifically for the BL fields. Therefore, the cross-match of HR21 to HR10 is particularly good. Targets in HR9B were observed to complement HR15N, so the cross-match with the HR15N dataset is almost four times greater than that with WG11 targets.
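A per-star cross-match of this kind amounts to counting shared CNAMEs between samples. The sketch below uses set intersections; the sample contents are invented placeholders and do not reproduce the counts in Table 6.

```python
from itertools import combinations


def crossmatch_counts(samples):
    """Number of CNAMEs shared by each pair of samples (keys are sorted sample names)."""
    return {
        (a, b): len(samples[a] & samples[b])
        for a, b in combinations(sorted(samples), 2)
    }


# Hypothetical CNAME sets for three samples.
samples = {
    "WG11": {"s01", "s02", "s03"},
    "HR10": {"s02", "s04", "s05", "s06"},
    "HR15N": {"s01", "s02", "s04", "s05", "s07"},
}
counts = crossmatch_counts(samples)
for pair, n_common in counts.items():
    print(pair, n_common)
```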

From this assessment a bootstrapping approach was taken to ensure all the SETUPs were homogenised onto a common scale. However, the last gap that needed to be covered was the lack of metal-poor reference stars, particularly for the HR10|HR21 (MW) and HR21 (BL) SETUPs in which metal-poor stars were most likely to be found, rather than in HR15N and HR9B (OC).

Table 8 gives the coefficients for each bias correction per SETUP per node per parameter as well as the independent parameter against which each correction was calculated. The mean and standard deviation of the reference values per parameter are given per node. These values vary between the nodes as each node did not necessarily provide values for the complete set of reference stars. These values were used to normalise the node and reference values before the correction function was determined as described in Section 5. The median and standard deviation of the difference between the node and reference values are also given per parameter per SETUP in Table 8.
Each dataset was investigated in great detail using the relevant reference set in order to identify the polynomial function and independent parameter that provided the optimal correction. A variety of quality criteria were used to assess the agreement of the homogenised values to the reference values, such as difference measures (median and standard deviation) on the whole sample and sub-samples. In an extensive quality control process, results from different combinations of reference sets, polynomial fits, and independent parameters were compared to finally converge on the corrections provided in Table 8 using the WG11 Bayesian implementation (see Section 5).
In the majority of cases, using [Fe/H] as the independent parameter provided the optimal correction. For HR10|HR21, the investigations showed that for all the nodes, there was a different trend with [Fe/H] between dwarfs and giants. Using log g or Teff as the independent parameter did not capture the correction sufficiently either. Ultimately a two-parameter correction against both [Fe/H] and log g was used in those cases, as listed in Table 8.
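A two-parameter correction of this type can be sketched as a least-squares fit of the node-minus-reference difference against normalised [Fe/H] and log g. The linear (planar) form, the normalisation of the independent variables by their own mean and standard deviation, and the mock data are assumptions made for this example; the functional forms and coefficients actually adopted are those of Table 8, obtained with the Bayesian implementation.

```python
import numpy as np


def fit_plane_correction(node_value, ref_value, feh, logg):
    """Fit node - reference = c0 + c1*z_feh + c2*z_logg, with z the normalised variables."""
    feh = np.asarray(feh, dtype=float)
    logg = np.asarray(logg, dtype=float)
    stats = (feh.mean(), feh.std(), logg.mean(), logg.std())
    design = np.column_stack([
        np.ones_like(feh),
        (feh - stats[0]) / stats[1],
        (logg - stats[2]) / stats[3],
    ])
    diff = np.asarray(node_value, dtype=float) - np.asarray(ref_value, dtype=float)
    coeffs, *_ = np.linalg.lstsq(design, diff, rcond=None)
    return coeffs, stats


def apply_plane_correction(node_value, feh, logg, coeffs, stats):
    """Subtract the fitted bias from the node values."""
    z_feh = (np.asarray(feh, dtype=float) - stats[0]) / stats[1]
    z_logg = (np.asarray(logg, dtype=float) - stats[2]) / stats[3]
    bias = coeffs[0] + coeffs[1] * z_feh + coeffs[2] * z_logg
    return np.asarray(node_value, dtype=float) - bias


# Mock reference set: a node whose Teff bias depends on both [Fe/H] and log g.
rng = np.random.default_rng(1)
n = 150
feh = rng.uniform(-1.0, 0.4, n)
logg = rng.uniform(1.0, 4.8, n)
ref_teff = rng.uniform(4200.0, 6400.0, n)
node_teff = ref_teff + 30.0 + 60.0 * feh - 25.0 * (logg - 3.0) + rng.normal(0.0, 10.0, n)

coeffs, stats = fit_plane_correction(node_teff, ref_teff, feh, logg)
corrected = apply_plane_correction(node_teff, feh, logg, coeffs, stats)
print(f"median difference after correction: {np.median(corrected - ref_teff):.2f} K")
print(f"scatter after correction: {np.std(corrected - ref_teff):.1f} K")
```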

Combining SETUPs for final Working Group 10 parameter homogenisation
The final step in the WG10 homogenisation process was to combine the per SETUP homogenisation into the per CNAME homogenisation, which is the single star catalogue for all CNAMEs analysed within WG10.Due to the bootstrapping procedure used to construct the reference sets, each homogenisation per SETUP was ultimately bootstrapped onto the WG11 scale.
Table 9 lists the mean and standard deviation of the difference between the homogenised values and reference values per SETUP.In all cases, the offsets are close to zero and within the spread of the differences (∆) given by the standard deviation, which indicates very good agreement between the per SETUP homogenised values and the reference values.The dispersion of the difference (standard deviation) is generally two or three times higher than the typical uncertainties of the homogenised stellar parameters.Therefore, due to the bootstrapping procedure, the homogenised values per SETUP were all assumed to be on the WG11 parameter scale and can thus be combined without further correction.
The results per SETUP were combined into the final WG10 per CNAME catalogue. The majority of CNAMEs were observed using only one SETUP. However, for the cases in which results from multiple SETUPs were available (e.g. the reference sets), an order of priority was implemented reflecting the science programmes and calibration samples as specified by GES_TYPE (see Table 1 for definitions). The priority order for each GES_TYPE is given in Table 10.

The S/N and NN both show a decrease in scatter and a more refined stellar evolution morphology in the Kiel diagrams with better quality results (e.g. more signal and more node results contributing, respectively). For Teff, log g, and [Fe/H], the errors have a significant peak around a particular value (∼65 K, ∼0.17, ∼0.08, respectively), which is reflected in the binning of those quantities in the Kiel diagrams. However, the scatter is reduced with bins of decreasing error.
The metallicity distributions are less informative on this aspect. For the error quantities shown in Figure 10 (m, n, and o), the bulk of the values lie about a single value and thus fall mainly in a single bin.
However, Figure 10k shows the peak of the metallicity distribution moving towards solar with bins of increasing S/N. This reflects the sampling of the medium-resolution data, as the fainter targets are typically more distant, and so the peak reflects the more metal-poor populations of the thick disk and the halo.
This discussion illustrates how the quality measures can, and indeed should, be used to refine the WG10 dataset for any study in Galactic Archaeology. To be most effective, these quality measures should be considered both individually and together.
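A minimal sketch of such a refinement is given below, assuming the catalogue is held as a list of dictionaries. The default thresholds S/N ≥ 25 and NN ≥ 2 are the simple cleaning used for Figure 18; the column names (`SNR`, `NN`, `E_TEFF`), the example rows, and the 100 K error limit are hypothetical and chosen only for illustration.

```python
def select_quality(rows, min_snr=25.0, min_nn=2, max_err=None):
    """Keep rows that pass the S/N, NN and (optional) error cuts together."""
    max_err = max_err or {}
    selected = []
    for row in rows:
        if row["SNR"] < min_snr or row["NN"] < min_nn:
            continue
        # Optional upper limits on any error column, e.g. {"E_TEFF": 100.0}.
        if any(row[key] > limit for key, limit in max_err.items()):
            continue
        selected.append(row)
    return selected


# Hypothetical catalogue rows.
catalogue = [
    {"CNAME": "star_a", "SNR": 80.0, "NN": 3, "E_TEFF": 60.0},
    {"CNAME": "star_b", "SNR": 15.0, "NN": 3, "E_TEFF": 70.0},
    {"CNAME": "star_c", "SNR": 40.0, "NN": 1, "E_TEFF": 65.0},
    {"CNAME": "star_d", "SNR": 30.0, "NN": 2, "E_TEFF": 150.0},
]
clean = select_quality(catalogue, max_err={"E_TEFF": 100.0})
```

Applying the cuts jointly, rather than one at a time, reflects the point that the quality measures are most effective when considered together; the appropriate limits depend on the uncertainty a given study can tolerate.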

Verification of the homogenised Working Group 10 stellar parameters
For the parameter homogenisation, as shown in Figures 6 and 7, key sub-samples were included within the parameter reference set. Verification of the WG10 homogenisation as part of the greater homogenisation of Gaia-ESO is explored in detail in Hourihane et al. (2023) with particular attention to these sub-samples. As such, only the FGK benchmark stellar parameters, the WG11 cross-match stellar parameters, and the GC metallicities are reviewed in this section.
Figure 11 shows a comparison of the reference stellar parameters for the FGK benchmark stars with the values determined in both WG11 and WG10, and a comparison of the WG11 stellar parameters with the final WG10 stellar parameters for the cross-match between WG11 and WG10. The median and standard deviation of the differences are also given. Overall, the agreement between these reference sets and the final parameters is good, with relatively small offsets and small spread in differences within the typical errors of the stellar parameters.
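The comparison statistics and the selection of stars beyond the 3σ limits can be sketched as follows. This is only an illustration: centring the limits on the median difference, the use of the sample standard deviation, and the function and variable names are assumptions, not a description of the code used to produce Figure 11.

```python
from statistics import median, stdev


def compare_to_reference(names, values, reference, n_sigma=3.0):
    """Return the median and standard deviation of (value - reference)
    and the names of stars lying more than n_sigma from the median."""
    diffs = [v - r for v, r in zip(values, reference)]
    med = median(diffs)
    sigma = stdev(diffs)  # sample standard deviation (assumed choice)
    outliers = [
        name for name, d in zip(names, diffs)
        if abs(d - med) > n_sigma * sigma
    ]
    return med, sigma, outliers


# Hypothetical Teff values in K: 20 well-behaved stars and one discrepant star.
names = [f"s{i}" for i in range(20)] + ["outlier"]
reference = [5000.0] * 21
values = [5010.0, 4990.0] * 10 + [5500.0]
med, sigma, flagged = compare_to_reference(names, values, reference)
```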
Looking more closely at the stars with large differences (> 3σ) from the benchmark parameters, for WG11, 61_Cyg_B shows a notable disagreement in both Teff (-215 K) and [Fe/H] (0.34 dex).
It is a close-to-solar-metallicity K dwarf, which the node analyses should have dealt with quite well. However, the spectrum analysed was in fact a non-UVES archive spectrum from the benchmark spectral library, made to be UVES-like for the WG11 node analyses in an expansion of the calibration effort. Making the spectrum UVES-like may have caused an issue with the archived data, although this star is not in common with WG10 and thus was not used in the WG10 homogenisation.
HD122563, the very metal-poor ([Fe/H] = -2.64) luminous giant (log g = 1.61), represents a difficult combination of parameters. The difficulty shows up as a significant difference compared to the benchmark parameters in Teff for the WG11 result (-383 K), and in log g for the WG10 result (0.84). HD84937 is also a metal-poor ([Fe/H] = -2.03) star, albeit a dwarf, for which there was a significant difference in Teff for the WG10 result compared to the FGK benchmark result.
The WG10 node analyses also struggled with luminous giants, as shown by the trio of low log g stars with large differences compared to the benchmark log g.
Finally there is a difference in [Fe/H] of 0.34 dex for the WG10 results for the K giant, Arcturus, placing it as more metal poor than the FGK benchmark accepted value.
Overall, these discrepancies indicate that the WG10 and WG11 results in the parameter space of metal-poor stars and luminous giants are not as robust as in the parameter space of more metal-rich, high gravity stars within the survey dataset. This is not unexpected, as metal-poor stars and luminous giants were not the primary FGK science targets of the Gaia-ESO survey (Gilmore et al. 2022; Randich et al. 2022), and thus ensuring robust parameters for these types of stars was not the main focus of the node analyses.
However, the metal-poor stars, whilst few and only comprising two benchmarks, end up in quite good agreement with the reference [Fe/H] values for the WG10 homogenisation. As described above, to supplement the very few metal-poor benchmarks, the mean WG11 [Fe/H] value per GC was used to try to anchor the metal-poor end in the WG10 analysis by imposing that value on the respective highly probable cluster members in WG10 and including them in the parameter reference set.
Figure 12 shows the outcome of this effort, by comparing the mean [Fe/H] values of the GC members in WG11 and WG10 to the reference values (Harris 1996, 2010 edition), where the WG11 values are those that were imposed in the reference set if needed. The mean and standard deviation of the difference of each WG sample to the reference values are also given.

The majority of the mean GC values for both WG10 and WG11 are within 0.1 dex of the reference values, with M2 being the main outlier. There is a large spread in [Fe/H] values for the WG10 stars defined here as members of NGC4833, although the mean value agrees well with the reference value. Otherwise, the spread in [Fe/H] per GC, particularly at the metal-poor end, is reasonable, and the mean [Fe/H] of each GC for WG10 generally tracks that of WG11, indicating that the attempt to anchor the metal-poor end of the WG10 dataset with the WG11 GC mean values was relatively successful.

The strategy of the chemical abundance homogenisation was to combine in a single step the per spectral line element abundances derived by each node for each SETUP per CNAME. We refer to these element abundances as the line-by-line (LbL) abundances. Hence, all SETUPs were combined at once per CNAME rather than homogenising the results for one CNAME one SETUP at a time and then combining the per SETUP results.

The wavelength ranges across the solar spectrum for each of the four WG10 SETUPs, and the location of the spectral lines used by the nodes to measure abundances are shown in Figure 13. The list of all lines measured is provided in Table A.1. These are taken from the Gaia-ESO line list (Heiter et al. 2021). In the following tables and figures, a capitalised format for designating the elements is used in which the final digit indicates the ionisation state (1=neutral, 2=singly ionised). This matches the data model used within the survey.

Table 11 gives the number of CNAMEs analysed by each node per element species per SETUP as well as the specific line list references. Two numbers are provided: 'D' is the number of detections, and 'L' is the average number of spectral lines measured per species. The number of CNAMEs in the WG11 cross-match to all the WG10 SETUPs with WG11 abundances per element species is also given. There was no requirement on the nodes to measure the abundance of every possible element in all four SETUPs. Hence, as can be seen in Table 11, the node results are a complex dataset with varied coverage of the chemical abundance space.

Table 12 gives some broad rejection criteria that were applied to specific element datasets, as extreme values (in absolute abundance) were identified that were not reasonable compared to the bulk of the distribution. Further cleaning was of course possible, but the goal was to take the node analyses as provided and to try to use as much information as possible. Quality measures such as S/N, NN, and errors should be used to refine the dataset as needed. This allows for differences between scientific studies regarding the tolerable level of uncertainty in the data.

The homogenised abundances were then assessed using the quality control samples that are described in the following section. As the full distribution could then be inspected, this revealed issues with the corrections that could not be detected on the much smaller cross-match with WG11. Each element distribution was inspected, and adjustments to the correction were made when warranted such that the homogenisation was run again. This iterative process from correction to homogenisation to quality control to correction was repeated several times to home in on the optimal homogenisation.

The strategy used in the parameter homogenisation (i.e. bootstrap each SETUP onto a reference set based on the previous SETUP plus other reference stars) could not be employed for the abundance analysis due to the decision to homogenise all spectral lines for an element across all SETUPs at once. Thus, for each CNAME, all the spectral line abundances from all the possible setups from all the possible nodes were combined to derive the final abundance. There was no homogenisation of each setup

were no abundances available for the WG11 cross-match stars, or there were no WG11 abundances available at all.
In the general case, for each ELEMENT+SETUP+NODE combination a set of corrections was calculated between the node values and the reference set values for each parameter as given in Table 13.
Table 13. Set of corrections calculated for each parameter for each ELEMENT+SETUP+NODE combination. The corrections were calculated on the binned difference between the node values and the WG11 values in the associated SETUP cross-match. The corrections were calculated for the whole sample as well as separately for the dwarf (log g > 3.4) and the giant (log g ≤ 3.4) samples.
An example of the set of corrections that was calculated for a particular element for a particular node for a particular SETUP is shown in Figure 14. This example (CAUP+HR21+MG1) shows how the difference between the node values and the reference values can behave differently depending on the independent variable that is used and how the sample is or is not separated. In this

In the second case, there were two combinations for which the only option was to scale to the solar abundance. These two combinations are listed in Table 15.

In the third case, comparison of the HR21 BL to HR10 BL abundances revealed an exaggerated upturn towards enhanced abundances at the metal-rich end. Further exploration showed that this was a difference between the giants and dwarfs. The dwarf sample did not show this in either HR21 or HR10. This upturn seemed extreme for an astrophysical effect, but it could not be compared to the reference sample as there were no stars in common between HR21 and WG11 for the bulge sample.
However, there was the cross-match sample between HR21 and HR10 to examine. The equivalent set in HR10 did not show such an extreme upturn at the metal-rich end, though for some abundances it was slightly present, which could indicate an astrophysical effect. The goal was to put giants in HR21 onto the same scale as giants in HR10 but not to remove the feature completely if present in both sets of results.
Thus the HR21 giants cross-matched to the HR10 giants sample were used to remove any systematic difference without erasing a potential astrophysical signature. However, just because the giants in HR21 behaved differently compared to the dwarfs in HR21, this did not necessarily mean that the dwarfs in HR21 behaved the same as those in HR10. It was necessary to investigate a correction to HR10 for the dwarf targets in HR21 to also ensure all targets were put onto the HR10 scale. As HR10 was corrected onto the WG11 scale separately, this was carried out first. Then HR21 was corrected onto HR10, which had already been corrected onto the WG11 scale. The SETUP+NODE+ELEMENT combinations for which the corrections needed to be calculated are listed in Table 16.

Notes. List of SETUP+NODE+ELEMENT combinations for which the sample showed an upturn at super-solar [Fe/H] in HR21 but not in HR10, HR21toHR10-WG11.
Figure 15 illustrates the process of determining and applying the correction using the Vilnius and Ti I results as an example. The panels show the Ti I abundances against [Fe/H], comparing HR21 with the uncorrected HR10 (HR10uncor), HR10 corrected to WG11 (HR10cor), and WG11. The first row shows the cross-match to HR10 for the bulge sample (BL), the second row shows the giant sample (GT) and the third row shows the dwarf sample (DW). The first column shows HR21 uncorrected. The upturn at super-solar [Fe/H] is clear in the HR21 giant sample (we note that the bulge stars are also giants) when compared to the HR10 giant sample, the dwarfs in both HR21 and HR10, and WG11. The second column shows the linear, quadratic and cubic fit to the difference in Ti I values of the cross-match between HR21 and HR10uncor and HR10cor for the bulge, giants, and dwarfs. The third column applies the correction from the quadratic fit to the HR21 values in each case. The procedure successfully scales HR21 and HR10 to WG11 while retaining any subtle potentially astrophysical effects.
Thus, in the homogenisation procedure, if the final abundance was based on a single node value that did not have an associated error, the appropriate relation was used to provide an estimate of the uncertainty, which was then reported as the final error on that abundance. In this way, all values in the WG10 abundance homogenisation have associated error values, as shown in Figure 16.
However, we observed some extreme outliers and a large spread in error values for some elements. Figure 16 shows the errors by sub-sample of NN, in which the lowest errors and highest S/N are typically represented by the maximum NN sub-sample. The highest errors typically occur when fewer than three node results are combined. In particular, even when the S/N is high (∼100), the homogenised error can also be high (> 2), such as for MG1, SI1, TI1, and FE1. This may not just be attributed to a better result by combining more nodes, but also to differences in the level of data quality for which nodes reported results. A consistent error model imposed across the nodes would have improved the resulting dataset.

The reference set used for the calibration of the WG10 chemical abundances was the cross-match with WG11 as described in Section 7.3. Detailed quality checks on the WG10 homogenisation regarding key sub-samples in the context of the full survey have been carried out in Hourihane et al. (2023). In this work, we only inspect the comparison to WG11 in Figure 17.

Fig. 1. Distribution of key values available for use in the WG10 analysis per SETUP: signal-to-noise, radial velocity, error on radial velocity, and rotational velocity. The bin size (Bin) for each parameter is given.

3.2. Active only in the parameter phase
IAC: The code FERRE (see Allende Prieto et al. 2014, and references therein) was used. The strategy was to search for the atmospheric parameters of the best fitting model among a grid of pre-computed synthetic spectra for each observed spectrum.

3.3. Active only in the abundance phase
Arcetri: Equivalent widths of the Li and the nearby Fe line

5.3. Working Group 11: Homogenisation of [Fe/H]
For the [Fe/H] homogenisation, the model was obtained using the results provided for 34 of the 35 FGK benchmark stars (with results for a total of 565 spectra). One benchmark (HD140283) had no reference [Fe/H] value but good values of Teff and log g.

The stars cover the interval between −2.64 and +0.35 dex (in

indicate that the two scales (literature clusters and benchmarks) have important differences. iii) The final metallicity for three of the cool benchmark stars (GJ205, GJ436, and GJ581) was very low. An inspection after the homogenisation revealed that the input node values are always much lower than the reference values.

the microturbulence is different from the other parameters since there are no benchmark values that can be used as reference. Instead, we made use of the Gaia-ESO microturbulence calibration derived using iDR5 results to write the prior for the true values of ξ:

Fig. 2. Teff-log g diagram with the WG11 recommended results. PARSEC isochrones (Bressan et al. 2012) are shown for ages of 1 and 12.5 Gyr (violet and orange, respectively) and for the minimum and maximum metallicity indicated in each panel (dashed and solid lines, respectively). Red crosses are stars in open clusters, blue circles are stars in globular clusters, and the black starred symbols are the remaining stars.

Fig. 5. Distribution of Teff with log g and Teff with [Fe/H] for each node for each SETUP. Unflagged but rejected IAC and MaxPlanck results are shown in red.

Figure 9 shows the homogenised parameters per SETUP and the final WG10 homogenised parameters as Kiel diagrams with a metallicity colour map. As no combining of values was performed, features in each per SETUP Kiel diagram can be iden-

Fig. 6. Reference sets for HR15N and HR10|HR21 setups. Top row: Kiel diagrams for full HR15N sample, final HR15N reference set and WG11 cross-match with HR15N. Second row: Same as for the top row but for [Fe/H] versus Teff. Third row: Kiel diagrams of the four main samples: benchmarks (yellow), OC (magenta), GCs (green), and MW (blue) within the HR15N reference set overlaid on the full reference set. Fourth row: Same as for the third row but for [Fe/H] versus Teff. Fifth to eighth rows: Same but for HR10|HR21.
Fig. 8. Difference of node (EPINARBO, Lumba, OACT) values to reference set values for Teff, log g and [Fe/H] for HR15N. The difference of the final homogenised reference set values from the reference set values per parameter for HR15N is shown in the bottom row.
Figure 10 characterises the final WG10 stellar parameters

Fig. 9. Kiel diagram with a metallicity colour map for the per SETUP homogenised parameters. Panels from left to right: a) HR15N, b) HR10|HR21, c) HR21, d) HR9B, and e) the final WG10 homogenised parameters. All panels are on the same colour map scale.

Fig. 10. Characterisation of the final WG10 stellar parameters with histograms (top row), bins in Kiel diagrams (middle row) and bins in metallicity distribution (bottom row). Specific panel contents are: a,f,k) S/N; b,g,l) Number of Nodes (NN); c,h,m) Error on Teff; d,i,n) Error on log g; e,j,o) Error on [Fe/H].

7.2. Homogenisation procedure for Working Group 10 chemical abundances
It was important to follow a methodical procedure to obtain the optimal homogenisation of the WG10 LbL chemical abun-

Fig. 11. Comparison of WG10 (black) and WG11 (red) stellar parameters for: Left column) the FGK benchmark stars against the reference values. The mean difference and standard deviation are given. Three sigma limits are shown as dashed lines. Right column) the cross-match between WG10 and WG11 against the WG11 values. The mean difference and standard deviation are given.

Fig. 12. Comparison of mean [Fe/H] per GC for WG10 (black) and WG11 (red) against the reference values. The mean difference and standard deviation are given.

A simple procedure was therefore employed, of which the key steps are as follows.
(1) Calculate the correction to the WG11 element abundance scale for each element based on LbL abundances for each SETUP for each node, using the set of cross-matched stars of the SETUP to WG11.
(2) On a per star per element basis, apply the correction for each node for each setup.
(3) Reject LbL abundances following rules set by WG11.
(4) Take the median of the corrected LbL abundances across all nodes and SETUPs per element per star to calculate the final abundance for that star.
(5) Take the standard deviation of the corrected LbL abundances for the element for the star as the error on the element abundance.
(6) Take the number of NODE+SETUP analysed as the NN contributions to the abundance determination.
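The steps above can be sketched for one star and one element as follows. This is a simplified illustration, not the survey code: the input layout, the treatment of each correction as a constant offset (node minus WG11), the use of the sample standard deviation, and the `keep` predicate standing in for the WG11 rejection rules are all assumptions.

```python
from statistics import median, stdev


def homogenise_abundance(lbl, corrections, keep=lambda row: True):
    """Combine line-by-line (LbL) abundances of one element for one star.

    lbl         -- list of dicts with 'node', 'setup', 'abundance' (assumed layout)
    corrections -- dict mapping (node, setup) to an offset, node minus WG11
    keep        -- placeholder for the rejection rules, which are not specified here
    """
    corrected = []
    contributors = set()
    for row in lbl:
        key = (row["node"], row["setup"])
        if key not in corrections or not keep(row):
            continue  # no correction available, or rejected (steps 1 and 3)
        corrected.append(row["abundance"] - corrections[key])  # step 2
        contributors.add(key)
    if not corrected:
        return None
    return {
        "abundance": median(corrected),  # step 4
        "error": stdev(corrected) if len(corrected) > 1 else None,  # step 5
        "nn": len(contributors),  # step 6
    }


# Hypothetical LbL measurements and offsets.
lbl = [
    {"node": "A", "setup": "HR10", "abundance": 7.60},
    {"node": "A", "setup": "HR10", "abundance": 7.70},
    {"node": "B", "setup": "HR21", "abundance": 7.50},
]
corrections = {("A", "HR10"): 0.10, ("B", "HR21"): -0.10}
result = homogenise_abundance(lbl, corrections)
```

When only one corrected value survives, the sketch returns no error; in that situation the error must come from the node or from an estimated relation.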

7.3. Correction to Working Group 11 cross-match reference set

Fig. 14. Difference between node values and reference values in the example case of the CAUP analysis of Mg I measured in the HR21 spectrum. Top row: Differences against [Fe/H] for the full sample (left-black), the giants (middle-red), and the dwarfs (right-blue). Middle row: Same but against log g. Bottom row: Same but against Teff. The differences as median differences calculated per bin are shown as orange points with error bars. The linear fit (orange dot-dash) and quadratic fit (red dot-dash) are shown in each case.

1. Median and standard deviation of the difference
2. Linear fit to the difference against:

For each NODE+SETUP+ELEMENT combination, the difference was calculated between the LbL abundances and the reference abundance value for the WG11 cross-match (black points in Figure 14). The set was then divided into ten evenly distributed bins spanning the range of the reference values for the respective parameter (Teff, log g, or [Fe/H]). The median and standard deviation of the differences in each bin were then calculated (orange points with error bars in Figure 14). The median difference and standard deviation, linear fit, and quadratic fit to the binned data points were then calculated (shown as orange dot-dashed line and red dot-dashed line, respectively in Figure 14). The coefficients and goodness of fit for the range of corrections were returned and examined.

Table B.1 gives the coefficients of the fit and the parameter range for the final set of corrections. While useful numbers with which to derive a correction were returned for the majority of SETUP+NODE+ELEMENT combinations (indicated as WG11xmat in the 'Calibration' column of the table), there were nonetheless cases for which there were not enough data points with which to work.

There were three exceptions to the general case: (1) Insufficient element abundances in the WG11 cross-match sample but a reasonably useful number in the rest of the WG11 dataset (WG11-full). (2) No WG11 abundances at all for that element (Scaled Solar). (3) Super-solar trend in HR21 compared to HR10 (HR21toHR10-WG11).
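The binning and fitting described above can be sketched with the standard library alone. The sketch assumes that "evenly distributed" means ten equal-width bins, that the fits are unweighted least-squares fits to the bin medians at the bin centres, and that the population standard deviation is used per bin; all function names are hypothetical.

```python
from statistics import median, pstdev


def binned_differences(ref_param, diff, n_bins=10):
    """Return (centre, median, std) for each non-empty equal-width bin."""
    lo, hi = min(ref_param), max(ref_param)
    width = (hi - lo) / n_bins
    if width == 0:
        raise ValueError("reference values span zero range")
    bins = [[] for _ in range(n_bins)]
    for x, d in zip(ref_param, diff):
        index = min(int((x - lo) / width), n_bins - 1)  # maximum goes in last bin
        bins[index].append(d)
    return [
        (lo + (i + 0.5) * width, median(members), pstdev(members))
        for i, members in enumerate(bins) if members
    ]


def polyfit(xs, ys, degree):
    """Least-squares coefficients [c0, c1, ...] from the normal equations."""
    n = degree + 1
    a = [
        [sum(x ** (i + j) for x in xs) for j in range(n)]
        + [sum(y * x ** i for x, y in zip(xs, ys))]
        for i in range(n)
    ]
    for col in range(n):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def fit_corrections(ref_param, diff):
    """Median offset plus linear and quadratic fits to the binned medians."""
    binned = binned_differences(ref_param, diff)
    centres = [b[0] for b in binned]
    medians = [b[1] for b in binned]
    return {
        "offset": median(diff),
        "linear": polyfit(centres, medians, 1),
        "quadratic": polyfit(centres, medians, 2),
    }
```

Running `fit_corrections` separately on the full, dwarf (log g > 3.4), and giant (log g ≤ 3.4) samples gives the set of candidate corrections to compare. The fit requires at least as many non-empty bins as coefficients; with fewer, the normal equations are singular.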

7.4.
Figure 16 shows the error distributions against S/N for the homogenised WG10 abundances. For the LbL abundances, the error in the abundances was calculated as the standard deviation of the set of node LbL abundances used per target in the homogenisation. In cases where a single line abundance from a single node was the only abundance available, the error provided by that node was reported as the error.

There were three situations in which errors would potentially end up missing from the final homogenisation:
(1) No error was provided with the single line abundance measurement although errors for other measurements for the same element for that node were provided.
(2) No errors were provided at all by the node for the LbL abundances of that element.
(3) No errors were provided at all by the node for the LbL abundances of any element.

Three relations were derived to complete the final errors:
(1) An error relation with S/N was generated for each NODE+SETUP+ELEMENT combination.
(2) An error relation with S/N was generated by combining all reported errors for all abundances provided by the node for the SETUP.
(3) For each CNAME and SETUP, the spread in each abundance that had more than one line was calculated across the node values and was used to calculate an uncertainty relation with S/N per abundance per SETUP.
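One way to picture how the three relations fill in missing errors is as a cascade that uses the most specific relation available. The sketch below assumes that relation (1) serves situation (1), relation (2) situation (2), and relation (3) situation (3), which is our reading of the text rather than a documented rule. The dictionary keys and the example relation, including its functional form, are hypothetical.

```python
def fallback_error(snr, relations):
    """Return (label, error) from the most specific S/N relation available.

    relations -- dict that may hold callables error(snr) under the
                 hypothetical keys listed below, from most to least specific.
    """
    for label in ("node_setup_element", "node_setup", "line_spread"):
        relation = relations.get(label)
        if relation is not None:
            return label, relation(snr)
    raise LookupError("no error relation available")


# Hypothetical relation: a floor plus a term falling with S/N (assumed form).
relations = {"node_setup": lambda snr: 0.02 + 1.0 / snr}
label, error = fallback_error(50.0, relations)
```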

Fig. 16. Errors on WG10-recommended abundances (in absolute abundance) against S/N on a log scale. Sub-samples per NN are shown as specified in the top-left panel.
the WG10 results across four SETUPs with analyses from multiple nodes that covered, often sparsely, different ranges in stellar parameters and chemical abundances was a challenging process. The goal was to produce a robust and well-calibrated single star catalogue that could be homogenised with the rest of the survey results. This meant optimally combining the node results following the WG11 Bayesian inference method for the WG10 and WG11 stellar parameters as well as for the WG11 chemical abundances, while for the WG10 chemical abundances, a simple per analysis calibration to WG11 was carried out.

Crucial to the robustness of the final WG10 catalogue is understanding the quality of the results. In particular, the S/N, NN, and errors are key to refining the sample for any scientific study. The final stellar parameters as a Kiel diagram and metallicity distribution, with a simple cleaning of S/N≥25 and NN≥2, are shown in Figure 18. The scatter is considerably reduced in the Kiel diagram with respect to the full sample. The shift of the RGB with metallicity is clearly discernible. The metallicity distribution shows a left-handed asymmetry indicative of the metal-poor contribution from the thick disk. Small peaks at -1.5 and -2.4 coincide with globular cluster samples.

The final WG10 chemical abundances are shown in Figure 19 as [X/Fe] against [Fe/H]. The abundances are binned by

Fig. 17. Final WG10 abundances against WG11 abundances for the CNAMEs in common with WG11 for each WG10 homogenised element. The median difference and spread are provided for each element.
the NN that contributed to each abundance. The greater the NN, the clearer the morphology of the distribution, illustrating how these quality measures can be used to interpret this complex and intriguing dataset.

Fig. 18. Final WG10 stellar parameters with S/N≥25 and NN≥2 as a) Kiel diagram with metallicity colour map, and b) Metallicity distribution.

Table 1. Overview of WG10 spectral dataset

Table 2. SETUP and phase analyses carried out by each WG10 node.
that is, it combines the measurements of all K different nodes for star n. The covariance matrix, $\Sigma_{\rm param}$, takes into account the random errors of each node and the correlations between their measurements. The mean vector $\vec{\mu}_n$ combines together the 'mean' that we would write for each node separately in Eq. 3; in other words, it is made of repeated entries for each node with the true.param$_n$ of the star and the corresponding node bias:

$\vec{\mu}_n = (\mathrm{true.param}_n + \mathrm{bias.param}_{1,n}, \ldots, \mathrm{true.param}_n + \mathrm{bias.param}_{K,n})$,

Table 3. WG11 nodes that participated in the analysis of the final Gaia-ESO data release.

Table 4. Number of parameters provided per NODE per SETUP.
For this reason, for the MW fields, it was recommended that the nodes analysing the MW SETUPs combine HR10 and HR21

Table 5. WG14 flags used by WG10 nodes.
Notes. WG14 flags and associated descriptions as per the WG14 Dictionary used by the WG10 nodes in the parameter determination per spectrum. EP=EPINARBO, LM=Lumba, OT=OACT, IC=IAC, MP=MaxPlanck

Table 6. Cross-match of stars ($N_{CN}$) and spectra ($N_{SP}$) between WG11 and WG10 SETUPs.
Worley et al.: The Gaia-ESO Survey: medium and high resolution FGK analyses
eff, log g, [Fe/H]). The mean and standard deviation of the error for each parameter are also given.
A&A proofs: manuscript no. output
case, the median offset was applied for the dwarf sample, while the quadratic fit against FEH was applied for the giant sample,

Table 14. Abundance corrections derived using the full WG11 dataset.
a correction. These combinations are listed in Table 14.

Table 15. Abundance corrections derived using the Solar Chemical Composition.
Notes. List of SETUP+NODE+ELEMENT combinations for which the sample was compared to the solar chemical abundance for deriving the correction offset, Scaled Solar.

Table 16. Extreme abundance enhancement present at super-solar metallicity.