A catalogue of asteroseismically calibrated ages for APOGEE DR17. The predictions of a CatBoost machine learning model based on the [Mg/Ce] chemical clock and other stellar parameters

Context. Understanding the Milky Way's formation and evolution across cosmic epochs necessitates precise stellar age determination across all Galactic components. Recent advancements in asteroseismology, spectroscopy, stellar modelling, and machine learning, coupled with all-sky surveys, now offer highly reliable stellar age estimates. Aims. This study aims to furnish accurate age assessments for the Main Red Star Sample within the APOGEE DR17 catalogue. Leveraging asteroseismic age constraints, we employ machine learning to achieve this goal. Methods. We explore optimal non-asteroseismic stellar parameters, including T$_{eff}$, L, [CI/N], [Mg/Ce], [$\alpha$/Fe], U(LSR) velocity, and 'Z' vertical height from the Galactic plane, to predict ages via categorical gradient boost decision trees. Training merges samples from the TESS Southern Continuous Viewing Zone and Second APOKASC catalogue to mitigate data shifts, enhancing prediction reliability. Validation employs an independent dataset from the K2 Galactic Archaeology Program. Results. Our model yields a median fractional age error of 20.8%, with a prediction variance of 4.77%. Median fractional errors range from 7% to 23% for stars older than 3 Gyr and from 26% to 28% for stars between 1 and 3 Gyr, and reach 43% for stars younger than 1 Gyr. Applicable to 125,445 stars in the APOGEE DR17 Main Red Star Sample, our analysis confirms previous findings on the young Galactic disc's flaring and reveals an age gradient among the youngest Galactic plane stars. Additionally, we identify two groups of metal-poor ([Fe/H]<-1 dex) young stars (Age<2 Gyr) exhibiting similar chemical abundances and halo kinematics, likely remnants of the predicted third gas infall episode (~2.7 Gyr ago).


Introduction
Galactic archaeology is the study of the formation and evolution of the Milky Way (Miglio et al. 2017), with stellar age preci-
As red giants are the preferred targets for studying the Galaxy, the APOGEE survey (Majewski et al. 2017) stands out as the most suited mission, having observed the largest number of them across a large fraction of the celestial sphere in both the Northern and Southern Hemispheres in the infrared H band (1.51 µm-1.70 µm). The latest public release (APOGEE DR17; Abdurro'uf et al. 2022) contains data on 657,000 unique stars.
However, there is a limitation to using asteroseismic constraints for dating methods. The majority of stars observed by all-sky spectroscopic surveys do not benefit from asteroseismic data. The disparity between the availability of asteroseismic data and chemical abundance data has motivated the search for age-abundance relations, also known as 'chemical clocks' (da Silva et al. 2012; Nissen 2015).
Chemical clock modelling improved dating precision within the solar neighbourhood (Feuillet et al. 2018; Delgado Mena et al. 2019; Sharma et al. 2021; Hayden et al. 2022; Sharma et al. 2022; Moya et al. 2022). Nevertheless, applying chemical clocks to vast regions of the Galaxy beyond the solar neighbourhood is inefficient because of the significant scatter in abundance. This was observed, for example, in red clump stars by Casamiquela et al. (2021) beyond 1 kpc from the Sun.
Recent advancements relying on a data-driven approach have enabled the expansion of dating capabilities from the solar neighbourhood to the entire Milky Way Galactic disc. This approach is based on the spectroscopic determination of age for red giants, and relies on a function to model the flux of reference stars at each wavelength.
The Cannon (Ness et al. 2015; Ness 2018) was a pioneering method to implement this data-driven approach using APOGEE DR12 (Holtzman et al. 2015) stellar spectra from stars sampled from the Second APOKASC asteroseismic catalogue (APOKASC-2; Pinsonneault et al. 2018). Following the release of The Cannon, two other methods based on the same principle were developed. "ASTRO-NN" (Leung & Bovy 2019) relies on a neural network to deal with high-resolution spectra (R∼22 500) from APOGEE DR14 (Holtzman et al. 2018). On the other hand, "DD-Payne" (Xiang et al. 2019) uses the same training scheme as "The Cannon" but combines it with a flexible and efficient tool for the simultaneous determination of several stellar parameters with full spectral fitting called "Payne" (Ting et al. 2019). DD-Payne was used to predict stellar parameter values for 6 million stars from ∼8 million low-resolution (R∼1800) spectra from LAMOST DR5 (Zhao et al. 2012). These three methods have achieved a maximum age precision of 30% because of the inherent limitations in extracting information from the subtle differences in red giant stellar spectra.
In order to avoid the limitations of these data-driven methods, a promising approach was developed by Anders et al. (2023). Instead of relying on stellar spectra, it directly utilises the stellar parameters from the APOGEE-Kepler catalogue (Miglio et al. 2021) as features for training a machine learning model, specifically an XGBoostRegressor (Chen & Guestrin 2016).
The work presented in this article differs from that of Anders et al. (2023) in three major respects. Firstly, the model presented in the present article is trained with a CatBoostRegressor (Prokhorenkova et al. 2018)  Zone (TESS SCVZ, hereafter). TESS, a recent asteroseismological mission, overcomes the limitations of the previous missions CoRoT (Alecian et al. 2007), Kepler, and K2 (Rendle et al. 2019), offering advantages for studying the vertical and radial structure of the Milky Way.
The goals of the present work are to compile a catalogue of asteroseismically calibrated ages for stars within the Main Red Sample of the APOGEE DR17 catalogue and to subsequently analyse the distribution of stellar parameters associated with the obtained ages.
The research sample studied here is described in Section 2. Section 3 provides basic concepts in machine learning and justifies the choice of the selected model. Section 4 deals with the choice of features for the model. Section 5 details all the various optimisation processes employed to improve the accuracy of the predictive model. In Section 6, the optimal performances of the model and the associated results are presented, and in Section 7, the results are discussed. The conclusions of this article are outlined in Section 8.

A sequential approach was adopted to achieve the highest accuracy in predicting ages for APOGEE DR17. Initially, one training set was used that is made of stars from the APOKASC-2 catalogue. However, upon observing a decline in model performance when tested on a TESS SCVZ sample, the decision was made to merge the two datasets, creating what is referred to as the MCK-APOKASC sample. The rationale for this combination is discussed in Section 5.3. The resulting merged dataset yields more robust predictions in both regions, contributing to an overall improvement in the prediction accuracy of the model.
The selection of APOKASC-2 as the primary dataset was driven by three key factors. Firstly, this catalogue contains the highest number of red giants in the Kepler field, that is, 6676 evolved stars. Secondly, it offers high-quality stellar ages due to asteroseismic parameters obtained using five independent techniques from continuous monitoring by Kepler over a four-year period. Notably, the resulting asteroseismic constraints allowed the authors to reach fractional age uncertainties of mainly between 0.6% and 5%, as illustrated in Figure 1b. However, it is important to note that these uncertainties are of random origin and do not reflect systematic errors in inputs or theoretical age inferences (Pinsonneault et al. 2018). Thirdly, APOKASC-2 provides a dynamically sampled representation of a large portion of the Galactic disc, as depicted in Figure 1c.

To ensure the highest accuracy in stellar age data, the APOKASC-2 sample was refined. Only stars with evolutionary states determined through the asteroseismology method described in Elsworth et al. (2017) were retained, excluding those identified spectroscopically. The resulting APOKASC-2 sample spans a galactocentric radius exceeding 5 kpc, as calculated using the astropy Python package (Astropy Collaboration et al. 2013, 2018, 2022). The closest stars to the Sun are located just beyond the Local Bubble (Zucker et al. 2022); that is, with a heliocentric spherical radius (R$_{Helio}$) surpassing 300 pc.
2003), Reid & Brunthaler (2004) and Bland-Hawthorn & Gerhard (2016). Additionally, Galpy was used to confirm the consistency between the two techniques used to compute velocity. The resulting stellar velocities are illustrated in a Toomre diagram, as presented in Figure 1e, revealing the identification of 32 kinematically distinct halo stars. The expected blurred metallicity gradient with age (Nissen et al. 2020) is displayed in Figure 1f, and the chemical dichotomy in the sample is illustrated in Figure 1g.
Overall, the MCK sample displays similar properties to the APOKASC-2 sample, but there are a few differences.
SearchCV method. Finally, we determined the optimal set of random seeds.

The GridSearchCV method, a scikit-learn (Pedregosa et al.
The random seed is used to initialise the random number generator used by machine learning algorithms. The random number generator is used in many different ways during the training process; for example, it is used to initialise the weights of the model and to select samples for each batch. The reason for testing different sets of values for the random seed is that even small differences in the sequence of random numbers can have a large impact on the final accuracy of the model.
The optimisation process was initiated by the selection of a set of features for a default grid of hyperparameter values. After computation of the predicted target values, a graphical check was conducted to ensure that the spread in predictions was well
tions, and the final predicted label value (y$_{pred}$) is determined from Equation 1: (1)
Gradient-boosting algorithms, such as XGBoost (Chen & Guestrin 2016) and CatBoost (Prokhorenkova et al. 2018; Dorogush et al. 2018), share common characteristics, including efficient handling of large datasets and support for parallel processing. Notably, they are recognised as state-of-the-art performers.
Comparing their performances on the MCK-APOKASC sample using XGBoostRegressor and CatBoostRegressor models reveals better results with CatBoost in terms of the variance between the validation and training sets.Specifically, the variance with CatBoost (see Table 2) is, on average, two times smaller than with XGBoost.Consequently, we decided to continue our analysis with CatBoostRegressor as our machine learning model.
The performance of CatBoost can be attributed to its robust decision-tree algorithm, leveraging Oblivious Trees (Ferov & Modry 2016) for outlier handling. Its L2 regularisation approach, applied to both leaves and nodes, is more effective at preventing overfitting compared to other algorithms. Additionally, CatBoost benefits from a distinct hyperparameter, random_strength, controlling randomness in the tree construction process to prevent overfitting and improve generalisation. For details on the tuned hyperparameters, we refer to Table D.1 in Appendix D.
The final optimised results derived from the MCK-APOKASC sample are summarised in Table 2. These results match the case displaying the highest accuracy in predictions among 1000 different random configurations of the three different random seeds. These configurations imply the random splitting of the training-test sets (90%-10%), and the random instantiation of a CatBoostRegressor and the RandomSampler method (refer to Section 5.2). The variance is sufficiently small to confidently consider that the model is not overfitting the data set. Also, the baseline test reveals that the model is not underfitting the data set.

Feature importance metric
The importance of the features in the model predictions was evaluated using the Shapley value technique. A Shapley value quantifies the average magnitude of the impact of a feature on the model output. This technique satisfies a set of axioms that make it more reliable than other feature-importance calculation techniques (Young 1985).
Tree-based models in scikit-learn have a built-in feature-importance calculation method based on the Gini impurity index. However, Gini impurity-based feature importances may lead to inaccurate results, as a large number of distinct values tends to give the associated feature a higher importance score with Gini impurity, even if it might not be as informative as suggested.
In our analysis, we frequently noticed that the rank of the most important features was permuted between the results based on the Gini index and the Shapley values. Therefore, the importance of the features is based on the Shapley values. The Shapley values were computed with the SHAP Python package (Lundberg & Lee 2017). The plot of the Shapley feature importances obtained on the MCK-APOKASC test set is displayed in Section 6.1 (refer to Figure 4e).
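To make the definition concrete, the following is a minimal sketch of an exact, brute-force Shapley computation for a three-feature toy function. It is not the SHAP package and not the model of this work: the function toy_age_model, its coefficients, the feature names, and the zero baseline are all hypothetical choices made for illustration only.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at point `x` relative to `baseline`.

    Features outside a coalition are set to their baseline value.
    The cost grows as 2**n, so this is only practical for a few features.
    """
    n = len(x)
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of coalitions of this size in the Shapley average
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                without_i = [x[j] if j in coalition else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                values[i] += weight * (model(with_i) - model(without_i))
    return values


def toy_age_model(features):
    # Hypothetical stand-in for a trained regressor (illustration only)
    mg_ce, alpha_fe, teff_scaled = features
    return 5.0 + 4.0 * mg_ce + 10.0 * alpha_fe - 2.0 * teff_scaled * mg_ce


x = [0.3, 0.1, 0.5]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_age_model, x, baseline)
```

By construction, the values in phi sum to the difference between the model output at x and at the baseline, which is one of the axioms referred to above. The interaction term is shared equally between the two features involved.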

Feature selection
In this section, the rationale for selecting each age-correlated feature is described. The model was trained with a progressively expanding set of features, each known or expected to be age dependent. The addition of each new feature depended on its capacity to improve the capture of age dispersion, depict overall trends, reduce the variance of the model, and improve accuracy in age determination. There is a separate and dedicated section (refer to Section 5.4) dealing with an extra feature with no direct correlation to age. Notably, [Fe/H] was not a selected feature in the model for two reasons. First, there is an expected blurring of the age-metallicity relationship for large data sets with wide dispersion in age (Nissen et al. 2020), rendering the correlation between [Fe/H] and age statistically insignificant. Second, there is a negligible Shapley value for [Fe/H] compared to other features when included in the model. Eventually, the impact of [Fe/H] was accounted for through the [α/Fe] ratio, which serves as a proxy for metallicity.

[Mg/Ce]
The Galactic chemical clock, chosen as the primary feature, showed the best performance in age prediction. To identify this clock, abundance correlations with age were computed using the
2020), which continue to be observed in APOGEE DR17.
[CI/Fe] and [N/Fe] were not considered as Galactic chemical clocks, as they track stellar evolution rather than chemical Galactic evolution (Hasselquist et al. 2019).
The Spearman coefficients are detailed in Table 1, calculated using the scipy.stats package (Jones et al. 2001). Chromium
(Osorio et al. 2020; Abdurro'uf et al. 2022), on its reliability in chemical enrichment studies (Gonzalez et al. 2011; Kobayashi et al. 2020), and on the recommendation of its use as a reference element (Weinberg et al. 2019, 2022)

The second selected feature was the α-dichotomy ratio used to separate the Galactic disc into two distinct chemical populations (Adibekyan et al. 2012). When using regression trees, as long as [α/Fe] is a parameter of the model, there is no need to separate the analysis into the α-rich and α-poor components of the Galactic disc, as done in Delgado Mena et al. (2019)
(Rodrigues et al. 2017), employed for calculating ages in APOKASC-2 and MCK, respectively, do not take into account the stellar Galactic population membership (bulge, disc, halo). Consequently, this information is not incorporated into the final CatBoost model. In other words, the model assigns similar age confidence to bulge or halo stars if their parameters match the training set, regardless of population.

The final age catalogue (refer to Section 6.3) reveals that 2% of the stars are in the halo, with the rest in the disc. Probabilities of the Galactic population membership of each star were calculated using the method described in Bensby et al. (2003, 2005), updated with priors from Anguiano et al. (2020).

T$_{eff}$, Z, and L
The remaining features selected were the effective temperature, the vertical distance from the disc, and the luminosity. Given that the total evolutionary lifetime of a star on the main sequence scales with its mass (Serenelli et al. 2017), incorporating features related to stellar mass significantly improves model performance. This improvement is particularly evident in better fitting age dispersion. As the effective temperature is linked to stellar mass through asteroseismic scaling relations for red giants (Gaulme et al. 2016), it was selected as a relevant feature. The vertical distance from the Galactic disc (Z) is crucial for model accuracy, considering the known vertical gradient in the stellar-mass distribution across the Galactic disc (Miglio et al. 2012; Casagrande et al. 2015; Hon et al. 2021). Z ranked among the most impactful features, leading to an improvement in accuracy (refer to Figure 4e).
To address the generation of fractional residuals in age exceeding values of 100%, luminosity (L) was chosen over log(g).Log(g) results in outliers reaching up to 150%, while L successfully mitigates this issue.Including both L and log(g) does not improve accuracy but increases the variance of the model, likely because of the lower Shapley feature importance of log(g) compared to luminosity (SHAP = 0.053 vs. SHAP = 0.063).
As luminosities are not provided in the second APOKASC catalogue, we computed them using the same method as in Mackereth et al. (2021) for consistency reasons. This method relies on bolometric corrections and the use of a 3D dust map library. The bolometric corrections in the K$_s$ band were computed using the bolometric-corrections code (Casagrande & VandenBerg 2014, 2018a,b), with a preference for the K$_s$ band given its lower sensitivity to extinction.
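As an illustration of how a luminosity follows from these ingredients, the sketch below applies the standard relations between apparent magnitude, extinction, bolometric correction, and distance. It is a simplified sketch, not the pipeline of Mackereth et al. (2021): the function name and arguments are hypothetical, the distance is taken as the inverse parallax, the extinction A(K$_s$) is assumed to be supplied by the user (for instance from a dust map), and the solar bolometric magnitude of 4.74 is the IAU 2015 nominal value.

```python
import numpy as np

M_BOL_SUN = 4.74  # IAU 2015 nominal solar absolute bolometric magnitude


def luminosity_lsun(ks_mag, bc_ks, a_ks, parallax_mas):
    """Luminosity in solar units from K_s photometry and parallax.

    Assumes BC_Ks = M_bol - M_Ks and a naive distance d = 1 / parallax.
    """
    distance_pc = 1000.0 / np.asarray(parallax_mas, dtype=float)
    # Absolute K_s magnitude, corrected for extinction
    abs_ks = np.asarray(ks_mag) - np.asarray(a_ks) - 5.0 * np.log10(distance_pc / 10.0)
    m_bol = abs_ks + np.asarray(bc_ks)
    return 10.0 ** (-0.4 * (m_bol - M_BOL_SUN))
```

Because the luminosity scales with the square of the distance, a fractional parallax error propagates to roughly twice that fractional error in luminosity, which is why the parallax quality cuts matter for this feature.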
As bolometric corrections depend on reliable distances, stars with negative parallaxes were removed, and a fractional parallax uncertainty criterion (f$_{pu}$ = σ$_p$/p, where σ$_p$ is the uncertainty on parallax and p is the parallax) was applied to filter out stars with f$_{pu}$ > 0.2 (Bailer-Jones et al. 2021). After applying these filters, 6466 APOKASC-2 stars remain. Finally, the reddening (E(B-V)) was computed using the MWDust code (Bovy et al. 2016) and the 3D dust map library from Green et al. (2019).
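The parallax selection described above amounts to a simple boolean mask. A minimal sketch, assuming the parallaxes and their uncertainties are available as arrays in the same units (the function and argument names are hypothetical):

```python
import numpy as np


def parallax_quality_mask(parallax, parallax_error, max_fpu=0.2):
    """Boolean mask keeping stars with positive parallax and f_pu <= max_fpu."""
    parallax = np.asarray(parallax, dtype=float)
    parallax_error = np.asarray(parallax_error, dtype=float)
    positive = parallax > 0
    # Evaluate the ratio only where the parallax is positive to avoid
    # division by zero; other entries stay at infinity and are rejected.
    fpu = np.full(parallax.shape, np.inf)
    np.divide(parallax_error, parallax, out=fpu, where=positive)
    return positive & (fpu <= max_fpu)
```

For example, `sample[parallax_quality_mask(p, sigma_p)]` would keep only the stars passing both criteria, where `sample`, `p`, and `sigma_p` are placeholder names.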
Except for Z, which shows only a very weak Spearman correlation with age (ρ = -0.05, p = 4.7×10$^{-6}$), all features exhibit visible correlations with age (refer to Figure 3). However, it is important to note that the correlation with age for [α/Fe] is significant only for the α-poor component, aligning with the findings of Delgado Mena et al. (2019). Additionally, T$_{eff}$ exhibits noticeable scatter, increasing as the temperature diminishes towards 4000 K.
Regression trees, which are designed to minimise mean squared error (MSE), may exhibit bias in the presence of skewed target variables, such as age (refer to Figure 3). Higher values disproportionately influence the optimisation process, leading to imbalanced splits in a tree's nodes and potential isolation of data tails (Hastie et al. 2009, refer to their section 10.7).

To address this, log-transforming the data reduces the impact of high values, improving accuracy for the majority. Tests with age as the target variable demonstrate a 33% mean error in residuals and a fractional residual maximum outlier of 359%. Logarithmic transformation reduces these values to 28% and 281%, respectively, improving prediction accuracy. It is important to note here that residuals were computed by converting ages back to the linear scale.
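The order of operations matters here: the regressor is trained on log-ages, but the residuals are evaluated after converting back to Gyr. The following sketch illustrates this under two assumptions that are not stated in the text: a base-10 logarithm, and fractional residuals defined as |predicted − true| / true. The numerical values are hypothetical.

```python
import numpy as np


def to_log_age(age_gyr):
    """Transform ages in Gyr to the log10 scale used as training target."""
    return np.log10(np.asarray(age_gyr, dtype=float))


def from_log_age(log_age):
    """Convert predicted log10 ages back to Gyr."""
    return 10.0 ** np.asarray(log_age, dtype=float)


def fractional_residuals(age_true_gyr, age_pred_gyr):
    """Absolute fractional residuals |pred - true| / true on the linear scale."""
    age_true_gyr = np.asarray(age_true_gyr, dtype=float)
    age_pred_gyr = np.asarray(age_pred_gyr, dtype=float)
    return np.abs(age_pred_gyr - age_true_gyr) / age_true_gyr


# Hypothetical example: a regressor trained on log10(age) returns log-ages
age_true = np.array([1.0, 4.0, 10.0])
log_age_pred = np.array([0.1, 0.55, 1.05])
age_pred = from_log_age(log_age_pred)
residuals = fractional_residuals(age_true, age_pred)
```

A constant error of 0.1 dex in log-age corresponds to the same fractional error of about 26% at every age, which is why the transformation prevents the oldest stars from dominating a squared-error loss.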

Ages older than the age of the Universe were not included in the model training (13.77 < Age(Gyr) < 20). There are several reasons for this decision. Firstly, the model has been observed to predict such ages to a noticeable extent. This is expected because these ages were included as inputs for the model. Secondly, these ages are known to be mainly due to unconstrained systematic errors in the stellar modelling (Pinsonneault et al. 2018). Thirdly, setting these ages arbitrarily to the age of the Universe would result in the generation of fabricated data with age-stellar parameter inconsistencies. This last point underscores a fundamental limitation of machine learning models in general; namely their lack of built-in methods to explicitly incorporate uncertainties in the input parameters. Models assume that the input data are accurate and the training data are representative of the underlying distribution. Therefore, to mitigate the generation of potential machine learning artefacts, the machine learning model was eventually applied solely to APOGEE stars with a fractional luminosity uncertainty below 30% (refer to Section 6.3).

A noticeable data imbalance exists between ages older and younger than 10 Gyr, where 'imbalance' in machine learning refers to a skewed or unequal distribution of data classes. This imbalance was suspected to contribute to increased mean fractional residuals at the oldest ages. To address this issue, the 'oversampling' technique was applied using the Imbalanced-Learn package (Lemaître et al. 2017). Random oversampling was chosen as it involves duplicating existing data without the need to synthesise any. Importantly, oversampling was applied only to the training set. The approach sets a threshold at 10 Gyr, classifying data beyond this age as the minority class and everything below as the majority class. Among the thresholds tested, the one at 10 Gyr yielded the best age accuracy. Overall, this oversampling significantly improved accuracy.
- The maximum fractional error in predicting age is 131%, indicating instances where the model predictions deviate significantly.
- The overall median fractional age error is 21%, decreasing as the stellar age increases, except for a specific range between 11 and 13.77 Gyr.
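The random oversampling step described above (10 Gyr threshold, training set only) can be sketched with plain NumPy as follows. This is an illustrative stand-in for the Imbalanced-Learn implementation, and it assumes that the minority class is duplicated until both classes have equal size, which the text does not specify. The function name and defaults are hypothetical.

```python
import numpy as np


def random_oversample(X, age_gyr, threshold_gyr=10.0, seed=0):
    """Duplicate randomly drawn stars older than the threshold until both
    age groups have the same size. Intended for the training set only."""
    X = np.asarray(X)
    age_gyr = np.asarray(age_gyr, dtype=float)
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(age_gyr > threshold_gyr)
    majority = np.flatnonzero(age_gyr <= threshold_gyr)
    n_extra = len(majority) - len(minority)
    if len(minority) == 0 or n_extra <= 0:
        return X, age_gyr
    # Draw existing minority rows with replacement; no data are synthesised
    extra = rng.choice(minority, size=n_extra, replace=True)
    keep = np.concatenate([np.arange(len(age_gyr)), extra])
    return X[keep], age_gyr[keep]
```

Applying the function after the train-test split, and to the training part only, keeps duplicated stars out of the test set, so that test scores are not inflated by copies of training examples.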

Additional feature
To address the deterioration of the performance of our model, the APOKASC-2 sample was merged with the MCK sample, and an additional feature was added to the model. This approach effectively resolves the issue. Indeed, it results in improved reliability, robustness, and accuracy of the model predictions for both APOKASC-2 and MCK data.
The most pronounced difference between the two samples is the sign shift in radial velocity U(LSR). In the merged sample, stars below the Galactic plane mostly display negative velocities and stars above the Galactic plane mostly display positive velocities. The associated mean and standard deviations in U(LSR) are (µ = -19.5, σ = 42.5) for the MCK sample and (µ = +47.5, σ = 51.3) for the APOKASC-2 sample. This asymmetry in U(LSR) in the region (8 < R(kpc) < 9, -1 < Z(kpc) < 1) is expected from the robust measurements of the three-dimensional velocity moments presented in the detailed Galactic disc kinematics study with LAMOST K giants in Ding et al. (2021) (refer to their figure 6). Symmetries and asymmetries in the (U, Z) and (W, Z) planes are considered indicators of breathing and bending velocity motions in the Milky Way (Ding et al. 2021) but also in other disc galaxies (Kumar et al. 2022). Ding et al. (2021) showed that the local asymmetry discussed earlier does not have a consistent shape and extent throughout the entire Galactic disc. In fact, it does not exist in some regions of the Galactic disc. Therefore, the U(LSR) feature was added to the model in order for it to be able to generalise its predictions across the disc.
Finally, although U(LSR) shows only a very weak Spearman correlation with age (ρ = 0.059, p = 8.99×10$^{-7}$), incorporating U(LSR) further improves the performance of the model by reducing the fractional error, particularly for the youngest and oldest age groups.
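For reference, the Spearman coefficient quoted above is the Pearson correlation computed on ranks. A minimal NumPy sketch of the coefficient alone follows (the function name is hypothetical). In practice, scipy.stats.spearmanr returns both ρ and the associated p-value.

```python
import numpy as np


def spearman_rho(x, y):
    """Spearman rank correlation as the Pearson correlation of average ranks."""
    def average_ranks(values):
        values = np.asarray(values, dtype=float)
        order = np.argsort(values, kind="mergesort")
        ranks = np.empty(len(values), dtype=float)
        ranks[order] = np.arange(1, len(values) + 1)
        # Tied values share the mean of the ranks they span
        for v in np.unique(values):
            tied = values == v
            ranks[tied] = ranks[tied].mean()
        return ranks

    rx, ry = average_ranks(x), average_ranks(y)
    return float(np.corrcoef(rx, ry)[0, 1])
```

With samples of several thousand stars, a coefficient as small as |ρ| ≈ 0.05 can still come with a very small p-value. The p-value then indicates that the correlation is detectable, while ρ itself shows that it is weak.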

Unreliable [Ce/Fe] abundances
The research by Casali et al. (2023) using APOGEE DR17-TESS-Kepler-K2 data suggests that combining cerium with α-elements is a promising proxy for understanding star formation. However, it also suggests that uncertainties, especially in
Based on these studies, criteria were implemented: excluding [Ce/Fe] with uncertainties greater than 0.2 dex, removing flagged abundances, and establishing a threshold at [Ce/Fe] = -0.46 dex based on the median minus 1.5 times the interquartile range (M − 1.5 × IQR). The unreliable values of [Ce/Fe] account for 2.6% of the dataset.
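The three criteria can be expressed as a single mask. The sketch below follows the text literally, using the median (not the first quartile, as in the usual box-plot fence) minus 1.5 times the interquartile range. The function and argument names are hypothetical, and the threshold is recomputed from the array passed in, which is an assumption about how the -0.46 dex value was derived.

```python
import numpy as np


def lower_threshold(values, k=1.5):
    """Median minus k times the interquartile range, ignoring NaNs."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.nanpercentile(values, [25, 50, 75])
    return median - k * (q3 - q1)


def reliable_ce_mask(ce_fe, ce_fe_err, flagged, max_err=0.2):
    """Keep unflagged [Ce/Fe] values above the threshold with small errors."""
    ce_fe = np.asarray(ce_fe, dtype=float)
    ce_fe_err = np.asarray(ce_fe_err, dtype=float)
    flagged = np.asarray(flagged, dtype=bool)
    return (ce_fe > lower_threshold(ce_fe)) & (ce_fe_err <= max_err) & ~flagged
```

Only the low-abundance tail is cut by this mask, consistent with the decision not to apply an upper threshold.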
No threshold was applied to the highest values of cerium (0.5 < [Ce/Fe] (dex) < 1.4) for two reasons. Firstly, they are expected from the study of Contursi et al. (2023), as these values have been observed in barium stars, which are known to have higher levels of barium, cerium, zirconium, ytterbium, and lanthanum
Their ages were computed using asteroseismic masses, GALAH DR3 temperatures (Buder et al. 2021), and stellar modelling with the code BSTEP (Sharma et al. 2018). GALAH data were used for age-abundance analysis.
To ensure a fair evaluation of the model on the K2-GALAH catalogue, only stars with fractional age uncertainties lower than or equal to 30% were sampled. Also, stars with fractional luminosity uncertainties greater than 30% were excluded. Consequently, the K2-GALAH testing sample comprises 371 stars (refer to Figure C.1).
Figure 5 displays residual distribution plots for the K2-GALAH test set. Lower and upper thresholds of absolute fractional residuals correspond to the third quartile and the upper edge in the box plot of Figure 5b. Error bars represent mean errors in ages for each age bin, except the last one, where bars represent mean errors for stars with predicted ages of between 6 and 13 Gyr.
Given the smaller size of the K2-GALAH test set (371 stars vs. 653 stars in MCK-APOKASC), the Wilcoxon-Mann-Whitney (WMW) test was chosen over the KS test. The reason behind this choice is that the KS test is recognised as being potentially unreliable when sample sizes are significantly different, as it relies upon the comparison of the empirical cumulative distribution function of the two samples. The WMW test yielded the following test statistic and p-value: (U = 68820.5, p = 1.000). Therefore, insufficient evidence exists to conclude differences in the distribution of fractional age errors between the two samples. Consequently, contrary to results in Section 5.3, no significant difference in model performance is observed when applied to independent data. This consistency can be explained by the training data effectively capturing key stellar parameters of the distribution in the reference Main Red Star sample from APOGEE DR17 (for more details refer to Section 7).
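The WMW statistic counts, over all pairs formed from the two samples, how often a value from the first sample exceeds a value from the second. A minimal sketch of the statistic alone is given below (the function name is hypothetical). The p-value, and the choice between a one-sided and a two-sided alternative, can be obtained with scipy.stats.mannwhitneyu; the alternative used in this work is not stated in the text.

```python
import numpy as np


def mann_whitney_u(sample_a, sample_b):
    """U statistic of sample_a versus sample_b (ties count as one half)."""
    a = np.asarray(sample_a, dtype=float)[:, None]
    b = np.asarray(sample_b, dtype=float)[None, :]
    # Pairwise comparison of every element of a with every element of b
    return float(np.sum(a > b) + 0.5 * np.sum(a == b))
```

The two statistics U(a, b) and U(b, a) always sum to the product of the sample sizes, so for samples of 371 and 653 stars a value close to half of 371 × 653 would indicate that neither sample tends to have larger errors than the other.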

The age map
The APOGEE Main Red Star Sample (MRS) comprises 372,000 stars randomly selected from the Two Micron All Sky Survey (2MASS) photometric catalogue (Skrutskie et al. 2006).
The MRS was designed to select red giants based on colour-magnitude criteria made to provide a clear set of rules for a robust selection function reconstruction.
To ensure the relevance of the MRS in Galactic archaeology, kinematic data are crucial.Obtaining such information involves a sequential process: first, cross-referencing the MRS with the Gaia DR3 catalogue, followed by excluding stars with negative parallax or fractional parallax error of greater than 20%, as described in Section 4.4.The refined sample, named MRS-Gaia, contains 283,196 stars, ensuring reliable astrometric information for deriving kinematic parameters.
Given the potential contamination from undesired targets due to colour-magnitude criteria, we inspected the MRS log(g) histogram (refer to Figure G.6 in Appendix G). The analysis reveals a bimodal distribution, indicating contamination with main sequence stars, which make up 42%.
To handle this issue, the stars with log(g) values surpassing 3.7 dex are excluded.This threshold is selected because it marks the shift from the declining trend of the initial component to the rising trend of the second component.Considering that a log(g) of around 3.5 dex is the theoretical upper boundary for red giants, the selection of the 3.7 dex threshold matches the usual 0.1 dex uncertainty associated with log(g) determination using spectroscopic methods.As a result of this refinement, the MRS-Gaia sample size is subsequently reduced to 176,516 stars.
When applying a CatBoostRegressor, the reliability of the model is generally higher when used on data with values similar to those in the initial training set. Therefore, the MRS-Gaia sample is restricted to values seen during the training phase, except for the variable 'Z', which captures the vertical age gradient of the Galactic disc. Without the need for restrictions on Z, the model has effectively captured the Z trend with age previously found in Ness (2018) and Anders et al. (2023), as demonstrated in Appendix F.
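Restricting the application sample to the parameter ranges seen in training can be sketched as a per-feature min-max mask. This is an illustrative sketch only: the dictionary layout and the feature names are hypothetical, and the actual ranges used are those listed in Table 4.

```python
import numpy as np


def within_training_range(train, target, free_features=("Z",)):
    """Mask of target stars whose features lie inside the training min-max
    range; features listed in `free_features` are left unrestricted."""
    n_target = len(next(iter(target.values())))
    mask = np.ones(n_target, dtype=bool)
    for name, train_values in train.items():
        if name in free_features:
            continue
        values = np.asarray(target[name], dtype=float)
        low, high = np.nanmin(train_values), np.nanmax(train_values)
        # NaN feature values compare as False and are therefore rejected
        mask &= (values >= low) & (values <= high)
    return mask
```

Tree-based regressors return piecewise-constant predictions, so outside the training range they simply repeat the value of the nearest leaf rather than extrapolating, which is the motivation for a mask of this kind.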

Table 4 illustrates each range of values for which a restriction in the application of the model was necessary. Given that CatBoost assumes the accuracy of input data and the representativeness of training data for the underlying distribution, age calculations were confined to stars with a luminosity uncertainty of lower than 30% to mitigate the potential introduction of machine-learning artefacts. Additionally, stars displaying flagged abundances in the features [CI/N], [Mg/Ce], and [α/Fe] were excluded from the age-determination process. Moreover, in accordance with the observations detailed in Section 5.5, only stars meeting the criteria of [Ce/Fe] surpassing -0.46 dex and having errors below 0.2 dex were considered for age determination.

Ultimately, out of the 176,516 stars in the MRS-Gaia sample, 125,445 stars were selected for age determination. In summary, 51,071 could not be dated mainly because of their unreliable abundances and luminosities. Additionally, but to a lesser extent, this is also due to their parameter values not being encountered during the training phase. The age map associated with these data is displayed in Figure 6.
In Section 6.2, we demonstrate that the CatBoost model trained on MCK-APOKASC is able to make predictions that extend effectively to stars with reliable ages in the K2 Galactic program, without a decrease in performance. Nevertheless, to assess the potential limitation of the model when predicting stellar ages in wider and future surveys, it is crucial to discuss whether the
Nevertheless, the APOGEE documentation online has already provided an analysis of previous selection functions relying on the Python code apogee, which are described in Bovy (2016) and in full detail in the associated documentation online. These analyses revealed that APOGEE has covered an increasingly large portion of the sky, with a far higher selection fraction in many fields of the Main Red Star sample between DR12 and DR16. Notably, it is already known from APOGEE documentation that the number of stars rose from 357,167 in DR16 to 372,458 in DR17. Finally, APOGEE has observed the largest number of red giants over a large fraction of the sky in both the Northern and Southern Hemispheres (Abdurro'uf et al. 2022).

Comparison with other age maps
In the pioneering work of Ness (2018), age labels were provided for 73,180 red giant stars. The associated mean fractional error on age was reported to be 40%. The age map obtained with the CatBoost model is presented similarly to Figure 14 in Ness (2018). Hereafter, we refer to this age map as the Ness map.
Similarly, Anders et al. (2023) contributed a catalogue of 178,825 red giants from APOGEE DR17. Their training sample consists of stars exclusively from the Kepler field (3,060 stars) with ages sourced from Miglio et al. (2021). Achieving a median statistical uncertainty of 17% with an XGBoostRegressor, Anders et al. (2023) conducted validation plots that reproduced expected trends in chemistry, position, and kinematics with age. However, evaluating potential overfitting and underfitting, or assessing the bias and the variance of the model, is challenging due to the absence of learning curves in their study.
As discussed in Section 3, using the APOKASC-MCK training dataset for a CatBoostRegressor model results in more stable predictions (i.e. lower variance of the model) compared to training with an XGBoostRegressor model. Additionally, it is important to highlight that the dataset used in this study offers relevant advantages over the datasets used in the studies by Ness (2018) and Anders et al. (2023). For example, it includes more stars from the Kepler field (APOKASC-2 and APOGEE-Kepler) and also incorporates stars from the MCK catalogue. The inclusion of MCK stars is crucial, as explained in Section 5.3, because relying solely on data from the APOKASC catalogue for training leads to a significant drop in model performance when applied to stars from the MCK catalogue.
The enhancement in machine learning model performance is inherently linked to the amount of available data.This improvement can be attributed to several factors: a larger dataset provides more examples for learning, reducing the risk of overfitting.It better represents the diversity of cases, mitigating potential bias and allowing for the use of more complex models without overfitting concerns.Additionally, a more extensive dataset reduces variance, resulting in more stable predictions while minimising statistical uncertainty.Consequently, the model can adjust its parameters more robustly and reliably.
Comparing the age map presented in this article (refer to Figure 6) with those of Ness and Anders reveals several differences.
-The map presented here spans a more comprehensive age range than the Ness map but a similar age range to the Anders map.
-The stars in our map reach a greater vertical extension (-6 < Z(kpc) < 7) than in the maps of Ness and Anders, but a smaller extension along the Sun-Galactic centre axis (1.2 < X(kpc) < 14.7).
-Our age map fills the gap of data found in the Ness map in the region (Z < -3 kpc, 0 < X(kpc) < 6) and that in the region (Z ≈ -2 kpc, 4 < X(kpc) < 6) for the map of Anders.
However, some similarities are also apparent.

-Our map shows the flaring of the young Galactic disc (Age < 6 Gyr), as already outlined in the maps of Ness and Anders (refer to Figure 6).

(Reddy et al. 2006). We confirmed this possibility using the probabilistic kinematic stellar component technique described in Bensby et al. (2003, 2005). The results, summarised in Appendix E, namely in Table E.1 and Table E.3, reveal that every star within these groups exhibits a probability of belonging to the halo of greater than 97%.
Subsequently, the dynamics of these groups was examined through the distribution of the vertical angular momentum, denoted $L_z$, in relation to age (refer to Figure 9). This distribution unveils bulk $L_z$ values, from which these stars emerge as notable outliers.
In each group, stars show vertical angular momentum close to the group mean $\langle L_z \rangle$. Both group means deviate from the bulk mean by more than 3σ, indicating the emergence of two distinct kinematic groups. This notable difference in kinematical properties is likely attributed to the distinct gravitational potentials experienced by the stars in these widely separated regions (approximately 6 kpc apart). Indeed, the first group is situated in the range (6.04 < X(kpc) < 6.34, -0.33 < Y(kpc) < -0.28, 0.53 < Z(kpc) < 0.62), while the second one occupies the range (7.24 < X(kpc) < 7.42, -6.89 < Y(kpc) < -5.61, 0.88 < Z(kpc) < 1.07).
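The outlier test on the vertical angular momentum can be illustrated with a short sketch. It assumes Galactocentric Cartesian positions (kpc) and velocities (km/s) in a right-handed frame, with the deviation measured in units of the standard deviation of the bulk sample. The sign convention and the function names are assumptions, and the numerical values are invented for illustration.

```python
import statistics


def vertical_angular_momentum(x, y, vx, vy):
    """L_z = X * V_Y - Y * V_X, in kpc km/s for inputs in kpc and km/s."""
    return x * vy - y * vx


def deviation_in_sigma(group_lz, bulk_lz):
    """Offset of the group mean from the bulk mean, in units of the
    sample standard deviation of the bulk distribution."""
    bulk_mean = statistics.fmean(bulk_lz)
    bulk_std = statistics.stdev(bulk_lz)
    return abs(statistics.fmean(group_lz) - bulk_mean) / bulk_std


# Illustrative (invented) values: a disc-like bulk and a strongly offset group.
lz_disc_star = vertical_angular_momentum(8.0, 0.0, 0.0, 220.0)
bulk = [1700.0, 1750.0, 1800.0, 1850.0, 1900.0]
group = [-200.0, -100.0]
n_sigma = deviation_in_sigma(group, bulk)
is_outlier_group = n_sigma > 3.0
```

A group is treated as kinematically distinct here when its mean lies more than 3σ from the bulk mean, mirroring the criterion quoted in the text.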
The existence of these young metal-poor stars is consistent with recent evidence pointing to a metal-poor gas infall, as reported by Spitoni et al. (2023).This gas infall event is estimated to have taken place around 2.7 Gyr ago.
We performed several tests to ensure the robust age derivation of these young metal-poor stars and to eliminate the possibility of machine learning artefacts. As these young red giants are absent from the comprehensive asteroseismic parameter catalogue of Hon et al. (2021), and corresponding light curves are unavailable in the TASOC database (Handberg et al. 2021; Lund et al. 2021), their ages could not be reliably calculated using state-of-the-art stellar modelling codes. However, one advantage of having Gaia luminosities is that the use of stellar tracks for mass determination is no longer required. Therefore, derivations of the stellar mass for red giants can be independent of the systematic errors originating from the effective temperature scale and the stellar tracks.
Consequently, the stellar mass is determined by combining the Stefan-Boltzmann law with the definition of surface gravity, which leads to the algebraic formula

$$M = \frac{L\,g}{4\pi\,\sigma\,G\,T^{4}}, \qquad (2)$$

where L is the luminosity, g is the surface gravity, σ is the Stefan-Boltzmann constant, G is the universal gravitational constant, and T is the effective temperature. This equation makes the mass determination dependent only on the quality of the data, and not on stellar models. The associated uncertainties were propagated linearly.

The above tests provide evidence that at least two of these stars were recently made from the most recent metal-poor gas infall, as described in Spitoni et al. (2023). As the classical two-infall model cannot predict this type of young low-metallicity population, the discovery of these stars advocates for

CatBoostRegressor model by listing the advantages of tree-based models. In order to build the model, a feature-selection phase was first undertaken. We justify the selection of each feature. We then explain the several strategies used to optimise the performance of the model. We describe how we address the issue of the age-skewed target distribution by rescaling it with a logarithmic transformation. The imbalance in the data is managed by applying the random oversampling technique, which increases the representation of the minority class (ages older than 10 Gyr). We identified a data shift between the APOKASC-2 and MCK samples. To address this shift, the two datasets were combined.
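Returning to the mass determination from luminosity, surface gravity, and effective temperature (Equation 2), the calculation can be sketched as follows. The physical constants are standard values hard-coded here, and the function names are hypothetical. The uncertainty is a first-order propagation that assumes independent errors combined in quadrature, which is one reading of "propagated linearly" and not necessarily the authors' exact procedure. The inputs are those listed in this paper for Gaia DR3 5413575344812320640.

```python
import math

# Standard physical constants (SI units).
G = 6.67430e-11            # gravitational constant, m^3 kg^-1 s^-2
SIGMA_SB = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4
L_SUN = 3.828e26           # nominal solar luminosity, W
M_SUN = 1.98841e30         # solar mass, kg


def mass_from_l_g_t(lum_lsun, logg_cgs, teff_k):
    """Mass in solar units from M = L g / (4 pi sigma G T^4)."""
    g_si = 10.0 ** logg_cgs * 1.0e-2  # cm s^-2 -> m s^-2
    mass_kg = lum_lsun * L_SUN * g_si / (4.0 * math.pi * SIGMA_SB * G * teff_k ** 4)
    return mass_kg / M_SUN


def mass_uncertainty(mass, lum, lum_err, logg_err, teff, teff_err):
    """First-order propagation, assuming independent errors (assumption)."""
    rel = math.sqrt(
        (lum_err / lum) ** 2
        + (math.log(10.0) * logg_err) ** 2   # d ln g = ln(10) d log10 g
        + (4.0 * teff_err / teff) ** 2       # M scales as T^-4
    )
    return mass * rel


mass = mass_from_l_g_t(362.56, 1.874, 4936.0)
mass_err = mass_uncertainty(mass, 362.56, 103.13, 0.038, 4936.0, 100.0)
```

With these inputs the sketch gives a mass close to the tabulated 1.850 M$_\odot$ and an uncertainty close to the tabulated 0.57 M$_\odot$, the luminosity error being the dominant term.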
We discuss the unreliable nature of the lowest values of the [Ce/Fe] abundances and constrain the criteria to mitigate their impact. By removing the unreliable [Ce/Fe] values, the performance of the model is improved, particularly in terms of standard devia-

instead of an XGBoostRegressor. Secondly, a different set of stellar parameters is used to train the model, including the [Mg/Ce] chemical clock. Finally, the training set is not only made of APOGEE red giants from the Kepler (Koch et al. 2010) field but also incorporates red giants observed with the Transiting Exoplanet Survey Satellite (TESS; Ricker et al. 2014) in its Southern Continuous Viewing

Figure 1 summarises the key characteristics of the sample. The age histogram (refer to Figure 1a) reveals some stars with ages exceeding that of the Universe (13.77 Gyr; Planck Collaboration et al. 2020). The guiding radius computed using the Galpy code (Bovy 2015) spans a wider range than current stellar positions (refer to Figure 1c). Evolutionary states illustrated in the HR diagram (refer to Figure 1d) clearly identify red clump stars. The velocities were computed using the method outlined in Johnson & Soderblom (1987), incorporating the latest data from Bensby et al. (2003), Reid & Brunthaler (2004) and Bland-Hawthorn & Gerhard (2016). Additionally, Galpy was used to confirm the consistency between the two techniques used to compute velocity. The resulting stellar velocities are illustrated in a Toomre diagram, as presented in Figure 1e, revealing the identification of 32 kinematically distinct halo stars. The expected blurred metallicity gradient with age (Nissen et al. 2020) is displayed in Figure 1f, and the chemical dichotomy in the sample is illustrated in Figure 1g.

Fig. 1: Ensemble of plots summarising the information on the APOKASC-2 component of the research sample. Panel 1a: Age histogram. A box plot is systematically added to the histograms. Panel 1b: Histogram of the random fractional uncertainties on age. Panel 1c: Histogram comparison of the current Galactocentric distances and the guiding radii. Panel 1d: HR diagram of the sample. Panel 1e: Toomre diagram of the velocities in the Galactocentric reference frame. Panel 1f: Age vs metallicity. Panel 1g: [α/Fe] vs [Fe/H] plane. The black point with error bars depicts the mean uncertainty for both parameters.

Fig. 2: Ensemble of plots summarising the information on the MCK component of the research sample. We refer to Figure 1 for the description of the panels.
, was employed to construct a grid of models with all combinations of selected hyperparameters. This process allows the identification of the model with the best hyperparameter values through cross-validation, employing a 10-fold K-fold split of the training set.
Cross-validation involves dividing the training set into multiple subsets known as folds. The model is trained on all folds but one and is validated on the remaining fold (the validation set), which is not used during training. This process is repeated so that each fold serves once as the validation set, and the performance metrics are averaged across folds to provide a more robust assessment of the model's generalisation performance. The validation set helps tune hyperparameters and prevents overfitting by simulating how the model might perform on the testing set or any unseen data.
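The grid search with K-fold cross-validation described above can be sketched in plain Python. To keep the example self-contained, a toy one-dimensional ridge regression stands in for the CatBoostRegressor; the hyperparameter names (`alpha`, `fit_intercept`), the scoring metric, and the synthetic data are assumptions made for illustration only.

```python
import itertools
import random
import statistics


def k_fold_indices(n_samples, n_folds, seed=0):
    """Shuffle the sample indices and split them into disjoint folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


def fit_ridge_1d(x, y, alpha, fit_intercept):
    """Toy stand-in model: 1-D ridge regression, returns (slope, intercept)."""
    mx = statistics.fmean(x) if fit_intercept else 0.0
    my = statistics.fmean(y) if fit_intercept else 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / (sxx + alpha)
    return slope, my - slope * mx


def grid_search_cv(x, y, grid, n_folds=10):
    """Try every hyperparameter combination; score each by the mean over
    folds of the median absolute validation error; return the best."""
    folds = k_fold_indices(len(x), n_folds)
    names = sorted(grid)
    results = []
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        fold_scores = []
        for k, val_idx in enumerate(folds):
            # Train on every fold except fold k, validate on fold k.
            train_idx = [i for j, f in enumerate(folds) if j != k for i in f]
            slope, icpt = fit_ridge_1d(
                [x[i] for i in train_idx], [y[i] for i in train_idx], **params
            )
            errors = [abs(y[i] - (slope * x[i] + icpt)) for i in val_idx]
            fold_scores.append(statistics.median(errors))
        results.append((statistics.fmean(fold_scores), params))
    return min(results, key=lambda r: r[0])


# Synthetic data: y = 2x + 5 plus small Gaussian noise.
rng = random.Random(1)
x = [i / 10.0 for i in range(100)]
y = [2.0 * xi + 5.0 + rng.gauss(0.0, 0.1) for xi in x]
grid = {"alpha": [0.0, 1000.0], "fit_intercept": [True, False]}
best_score, best_params = grid_search_cv(x, y, grid, n_folds=10)
```

Averaging the validation score over the ten folds, instead of relying on a single split, is what makes the selected hyperparameters less sensitive to the particular partition of the training set.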
20 calibrated abundances from APOGEE DR17. The APOGEE calibrated abundances, denoted as [X/Fe], are obtained by aligning solar-metallicity stars to [X/M]=0. These abundances exclude stars with suspect or known incorrect values based on various criteria set by the APOGEE consortium. The Spearman coefficient (ρ) was used for correlation calculations because it does not require the assumption of a linear relationship. Stars with masses of greater than 1.8 M$_\odot$ were excluded for [Na/Fe] and [Al/Fe] (refer to Section 1). APOGEE DR17 did not provide [S/Fe] abundances for the stars in the training sample because of the unreliability of the associated spectra. Titanium (Ti) was excluded from the study due to persistent discrepancies between the APOGEE DR16 [Ti/Fe]-[Fe/H] trend and the optical trend as identified and discussed in Jönsson et al. (

(Figure A.1, in Appendix A). These robust correlations align with the findings of Casali et al. (2020) and Casamiquela et al. (2021), who demonstrate that combinations of α and s-process elements make the most effective chemical clocks. [Mg/Ce] was chosen as the Galactic chemical clock rather than [O/Ce] because it displays the smallest intrinsic dispersion (σ$_{[Mg/Ce]}$ = 0.22) in the data. Extensive studies on the non-local thermodynamic equilibrium (NLTE) effects of magnesium in the H-band (Osorio et al. 2020; Abdurro'uf et al. 2022), on its re-

selected was the carbon-to-nitrogen ratio. It is important to know that the APOGEE catalogue comprises two types of carbon abundance: [C/Fe], derived from carbon molecule lines, and [CI/Fe], computed from neutral carbon lines. To emphasise the use of atomic carbon abundance, the carbon-to-nitrogen ratio is written as [CI/N].

Studies on [CI/N] as an age indicator in red giants have confirmed the correlation with age, albeit with some dispersion (Hasselquist et al. 2019, and references therein). According to Karakas (2010), given the mass range (0.64 < M(M$_\odot$) < 3.48) of the MCK-APOKASC sample, stars likely underwent only the first dredge-up, as their mass remains below the critical threshold of 5 M$_\odot$. This suggests a significant role for [CI/N] in indicating the evolutionary state within MCK-APOKASC, particularly impacting the surface composition of low- to intermediate-mass stars (0.8 < M(M$_\odot$) < 8) (Karakas 2010). The scatter in the [CI/N] versus age relationship arises from various factors, including mixing processes, nucleosynthesis, and chemical evolution. In the model, the influence of chemical evolution was included by incorporating the Galactic chemical clock [Mg/Ce] as well as the [α/Fe] ratio. While [CI/N] is valuable for age predictions in red giants, it alone has limitations in providing accurate ages for individual stars (Salaris & Cassisi 2005). Additional factors, such as metallicity and effective temperature, influence stellar evolution and must be considered for reliable age estimates. Therefore, for robust age estimates, it is essential to combine the [CI/N] ratio with other stellar parameters.

Article number, page 6 of 26 Thibault Boulet: A catalogue of asteroseismically calibrated ages for APOGEE DR17

Fig. 3: Corner plot of the retained stellar features, with a significant correlation with age, in the full training sample (MCK-APOKASC). The diagonal depicts the histogram associated with each feature and the off-diagonal panels display the correlation trends between each pair of features. Each plot displays the associated Spearman correlation coefficient (ρ) with the p-value of the test.
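As an illustration of why the Spearman coefficient does not require a linear relationship, the sketch below computes ρ as the Pearson correlation of the ranks, with tied values receiving their average rank. It is a minimal stand-alone version with hypothetical function names; the p-value reported in the figures is not computed here and would typically come from a statistics library.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# A monotonic but strongly non-linear relation (invented values).
age = [1.0, 2.0, 3.0, 4.0, 5.0]
abundance = [math.exp(-a) for a in age]
rho = spearman_rho(age, abundance)
```

Because only the ordering of the values matters, the exponential relation above is still a perfect anti-correlation in rank, whereas its Pearson coefficient would be weaker in magnitude.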

Fig. 4: Ensemble of plots summarising the final performances on the MCK-APOKASC training-test sample.Panel 4a: Evolution of the median of the absolute value for the fractional error on age per bin.Panel 4b: Comparison of the ages of reference and those predicted by the model.The black line is the identity function.|∆A/A| is the absolute fractional error on age.The black error bars represent the means of the errors in age.Panel 4c: Evolution of the standard deviation of the fractional error on age per bin.The vertical-horizontal bar depicts the mean value for the whole age range.Panel 4d: Histogram of the absolute fractional error on age.Panel 4e: Bar plot of the feature importance for all the retained features.Panel 4f: Plot of the learning curves.
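The per-bin quantities shown in panels 4a and 4c can be sketched as follows: the median of the absolute fractional error on age, and the standard deviation of the fractional error, both computed in bins of reference age. The bin edges, the ages, and the function names are invented for illustration and do not reproduce the paper's binning.

```python
import statistics


def fractional_errors(ref_ages, pred_ages):
    """Signed fractional error (predicted - reference) / reference."""
    return [(p - r) / r for r, p in zip(ref_ages, pred_ages)]


def per_bin_metrics(ref_ages, pred_ages, bin_edges):
    """Median |fractional error| and std of the fractional error per age bin."""
    frac = fractional_errors(ref_ages, pred_ages)
    metrics = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = [f for r, f in zip(ref_ages, frac) if lo <= r < hi]
        if len(in_bin) < 2:
            continue  # a standard deviation needs at least two stars
        metrics.append({
            "bin": (lo, hi),
            "median_abs_frac_err": statistics.median([abs(f) for f in in_bin]),
            "std_frac_err": statistics.stdev(in_bin),
        })
    return metrics


# Invented reference and predicted ages (Gyr).
ref = [0.5, 0.8, 2.0, 2.5, 5.0, 8.0]
pred = [0.75, 0.4, 2.2, 2.0, 5.5, 7.2]
metrics = per_bin_metrics(ref, pred, bin_edges=[0.0, 1.0, 3.0, 14.0])
```

Binning by reference age, as done here, shows how the relative accuracy changes between young and old stars instead of hiding it in a single global number.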

Fig. 5: Ensemble of plots summarising the performance of the model on the K2-GALAH sample. Panel 5a: Plot of the scatter between the predicted ages and reference ages. The black line serves to visualise the residuals. Panel 5b: Histogram of the absolute residuals between the predicted ages and the reference ages.

Fig. 6: APOGEE age map for the sample of red giants computed with CatBoost.

-The youngest stars (Age ≤ 2 Gyr) are mostly found close to the Galactic plane (Z = 0 kpc) (refer to Figure 7), as also revealed by the Ness and Anders maps.
-The expected gap of data within the Galactic plane towards the bright Galactic centre for X < 6 kpc is also seen in all three maps. This gap prevents us from gaining insight into the age distribution close to the Galactic plane for this inner part of the Galaxy.

Figure 7). In order to unveil this gradient, the youngest stars

Fig. 7: Plot of the youngest stars in the (X, Z) plane within the APOGEE map.

Fig. 9: Plot of the vertical angular momentum against the stellar age for the stars younger than 2 Gyr.

Gaia DR3 | T$_{eff}$ (K) | [Fe/H] (dex) | log(g) (dex) | L (L$_\odot$) | M (M$_\odot$) | Age$_{Tracks}$ (Gyr) | Age$_{Model}$ (Gyr)
5413575344812320640 | 4936 ± 100 | −1.27 ± 0.1 | 1.874 ± 0.038 | 362.56 ± 103.13 | 1.850 ± 0.57 | |

machine learning model to provide a list of asteroseismically calibrated ages for the APOGEE DR17 catalogue, working with a sample of 6539 stars. One component of this sample comes from the TESS SCVZ catalogue (Mackereth et al. 2021) cross-matched with the APOGEE DR17 catalogue. The other component comes from the second APOKASC catalogue (Pinsonneault et al. 2018) updated with data from APOGEE DR17.
We introduce the main concepts underlying the construction and evaluation of a machine learning model. We justify the use of a

Fig. G.2: Box plots of the T$_{eff}$ distributions for the MCK-APOKASC and MRS-Gaia samples.
The guiding radius spans a narrower range of distances. There are fewer halo stars, which is expected given that the MCK sample is smaller. Finally, the fractional age uncertainty distribution is wider.

Table 1: Spearman correlation coefficients with their p-values for the relations involving [X/Fe] vs. age. The null hypothesis assumes no correlation between a given [X/Fe] and age.
further justify the choice of magnesium over oxygen to obtain the chemical clock with the most reliable performance.
Moreover, limiting the training set to the α-poor disc diminishes model performance due to fewer training data.
[α/Fe] has proved to be efficient in separating each of the Galactic components in the [α/Fe] versus [Fe/H] diagram (Spitoni et al. 2016; Rojas-Arriagada et al. 2017; Hawkins & Wyse 2018). This chemical tagging property is exploitable by regression trees as they can identify distinct hidden trends in the data, matching regions with different chemical-enrichment histories.
Stellar age modelling codes, specifically BeSPP (Serenelli et al. 2013) and PARAM

Table 2: Summary of the best performances obtained on the MCK-APOKASC sample. The median residual error is the mean difference between the ages of reference and those predicted by the model, divided by the reference ages. The two other metrics are defined in Section 3.1.
The standard deviation of the fractional error in a given age bin (σ) varies between σ=10 and σ=25, showing moderate fluctuations in model accuracy across different age ranges.

cate-

ative difference of 4.77%, also known as the variance of the model (Figure 4f). This suggests high performance on the MCK-APOKASC sample. Consequently, the CatBoostRegressor model exhibits sufficient quality for a reliable application to the APOGEE Main Red Star Sample (refer to Section 6.3).

6.2. Performance on an independent set
Given the inclusion of the MCK sample in the training set to address the data shift (refer to Section 5.3), we performed an extra evaluation to gauge the model's capability to generalise to new, independent data, testing on a stellar age sample from Zinn et al. (2022) (K2-GALAH, hereafter). This catalogue is made of red giants enriched with asteroseismic parameters derived from various sources, including earlier K2 Galactic Archaeology Program data (Stello et al. 2017; Zinn et al. 2019) and APOGEE DR16 (Jönsson et al. 2020) spectroscopic data for calibration.

Table 4: Range of values used for age computation within the MRS-Gaia sample.
APOGEE stars in the training set fairly sample the underlying
tions of the Main Red Star sample in fairly underlying the population of red giants in the Galactic disc.
These stars cluster in two distinct groups of abundance and stand out at a significance level exceeding 5σ from the mean [Fe/H] in the [α/Fe] versus [Fe/H] plane (refer to Figure 8). Interestingly, they display orbital eccentricities of greater than 0.79. Such high values, for associated low [Fe/H], are known indicators of stars with halo kinematics

Table 6: Partial content of the age catalogue. Only the most used stellar parameters in stellar physics are displayed here. The [Fe/H], T$_{eff}$ and [α/Fe] come from APOGEE DR17 and are known to be underestimated. The errors in log(g) are expected given the results of the neural network used by the authors of APOGEE DR17. The full age table with the complete set of stellar parameters can be found online.
the three-infall chemical evolution model described in Spitoni et al. (2023).
tion per age bin, which leads to a higher robustness in the result-

The fully optimised model demonstrates performance characterised by a decreasing trend in the median fractional error and in the standard deviation per age bin as the age increases. The median fractional error reaches its lowest point at approximately 7% for ages between 10 and 11 Gyr, and its highest point at 43% for stars younger than 1 Gyr. The overall median fractional error

The model yields an age map made of 125,445 red giants from the Main Red Star Sample within the APOGEE DR17 catalogue. The associated age catalogue is available online and Table 6 depicts some of its columns.