Issue 
A&A
Volume 636, April 2020



Article Number  A9  
Number of page(s)  15  
Section  Stellar atmospheres  
DOI  https://doi.org/10.1051/0004-6361/201937194  
Published online  06 April 2020 
ODUSSEAS: a machine learning tool to derive effective temperature and metallicity for M dwarf stars^{★}
^{1}
Instituto de Astrofísica e Ciências do Espaço, Universidade do Porto, CAUP,
Rua das Estrelas,
4150-762
Porto,
Portugal
email: alexandros.antoniadis@astro.up.pt
^{2}
Departamento de Física e Astronomia, Faculdade de Ciências, Universidade do Porto,
Rua do Campo Alegre,
4169-007
Porto,
Portugal
^{3}
Instituto Federal do Paraná,
Campus Foz do Iguaçu,
85860-000
Foz do Iguaçu-PR,
Brazil
^{4}
Casimiro Montenegro Filho Astronomy Center,
Itaipu Technological Park,
85867-900
Foz do Iguaçu-PR,
Brazil
Received: 26 November 2019
Accepted: 19 February 2020
Aims. The derivation of spectroscopic parameters for M dwarf stars is very important in the fields of stellar and exoplanet characterization. The goal of this work is the creation of an automatic computational tool able to quickly and reliably derive the T_{eff} and [Fe/H] of M dwarfs using optical spectra obtained by different spectrographs with different resolutions.
Methods. ODUSSEAS (Observing Dwarfs Using Stellar Spectroscopic Energy-Absorption Shapes) is based on the measurement of the pseudo equivalent widths for more than 4000 stellar absorption lines and on the use of the machine learning Python package “scikit-learn” for predicting the stellar parameters.
Results. We show that our tool is able to derive parameters accurately and with high precision, having precision errors of ~30 K for T_{eff} and ~0.04 dex for [Fe/H]. The results are consistent for spectra with resolutions of between 48 000 and 115 000 and a signal-to-noise ratio above 20.
Key words: stars: fundamental parameters / stars: atmospheres / stars: late-type / methods: data analysis / techniques: spectroscopic
ODUSSEAS can be tested by downloading the files from https://github.com/AlexandrosAntoniadis/ODUSSEAS, after reading the README instructions for clarifying the technical details.
© ESO 2020
1 Introduction
Spectra can be used to reveal the chemical composition of the stars, as well as important stellar atmospheric parameters, such as effective temperature (T_{eff}) and [Fe/H]. These parameters are crucial for the characterization of the stars and are therefore fundamental to understanding their formation and evolution. Furthermore, these stars influence the properties of the planets forming and orbiting around them (Everett et al. 2013). However, the spectroscopic analysis needed to derive these parameters faces several difficulties. One of the main challenges is the correct determination of the spectral continuum, which is more problematic in cool and faint stars, such as M dwarfs. Their study is considerably more complicated than that of FGK stars, because in M dwarfs molecules are the dominant sources of opacity. These molecules create thousands of spectral lines that are poorly known, and many of them blend with each other. Therefore, the position of the continuum can hardly be identified in their spectra.
Methods that rely on the correct determination of the continuum only work well for the metal-poor and earliest types of M dwarfs (Woolf & Wallerstein 2005). Methods using spectral synthesis have not achieved results as precise as in FGK cases because of the poor knowledge of many molecular line strengths. Recently, spectral synthesis in the near-infrared has led to advances, as shown by several studies (Önehag et al. 2012; Lindgren et al. 2016; Rajpurohit et al. 2018; Passegger et al. 2019).
Given these limitations, most attempts at determining effective temperature and metallicity use photometric calibrations (Bonfils et al. 2005; Johnson & Apps 2009; Neves et al. 2012) or spectroscopic indices (Rojas-Ayala et al. 2010, 2012; Mann et al. 2013a). Uncertainties in metallicity range from 0.20 dex using photometric calibrations to 0.10 dex when using spectroscopic scales in the infrared (Rojas-Ayala et al. 2012). For T_{eff}, precisions of 100 K are reported, but significant uncertainties and systematic errors are still present, ranging from 150 to 300 K (Casagrande et al. 2008; Rojas-Ayala et al. 2012).
One of the most popular methods to derive atmospheric stellar parameters for FGK stars is by measuring the equivalent widths (EWs) of many metal lines of the spectrum. Neves et al. (2014) used the MCAL code to measure pseudo EWs in the optical part of the spectrum for 110 M dwarfs observed in the HARPS GTO M dwarf program by setting a pseudo continuum for each line. These latter authors proceeded to the derivation of T_{eff} and [Fe/H] of these stars applying a calibration based on reference photometric T_{eff} and [Fe/H] scales that exist for 65 of them from Casagrande et al. (2008) and Neves et al. (2012), respectively. In the first case, the reference T_{eff} is the average value of the V − J, V − H, and V − K photometric scales as seen in Casagrande et al. (2008), while for [Fe/H] the calculation of its reference values was done using stellar parallaxes, and V and K_{s} magnitudes as described in Neves et al. (2012).
Machine learning is an increasingly popular tool and is used in several fields of science. It can be accurate in predicting outcomes without the need for the user to explicitly create a specific model for the problem at hand. The algorithms in machine learning receive input data and, by applying statistical analysis, predict an output value within a reasonable range. Interest in machine learning algorithms and automatic processes in astronomy is growing with the increasing volume of survey data (Howard 2017). These tools can be applied to a wide range of studies, with the input attributes being, for example, the photometric properties of the sources (Das & Sanders 2019; Akras et al. 2019; Rau et al. 2019; Ucci et al. 2019).
In our work, we follow the pseudo-EW approach. We present our tool ODUSSEAS (Observing Dwarfs Using Stellar Spectroscopic Energy-Absorption Shapes), which makes use of the machine learning “scikit-learn” package of Python. It offers a quick automatic derivation of T_{eff} and [Fe/H] for M dwarf stars using their 1D spectra and resolutions as input. The main advantage of this tool when compared to other ones that derive stellar parameters, such as the MCAL code (Neves et al. 2014; which is limited to the range of the HARPS spectrograph and needs manual adjustment of results for different resolutions), is that it can operate simultaneously in an automatic fashion for spectra of different resolutions and different wavelength ranges in the optical. It is based on a supervised machine learning algorithm, meaning that it is provided with both input and expected output and uses these to create a model. The inputs to the machine learning function are the values of the pseudo EWs for 65 HARPS spectra and the expected outputs are the values of their reference T_{eff} and [Fe/H] from Casagrande et al. (2008) and Neves et al. (2012), respectively. After training with a part of these HARPS data, the algorithm produces a model and tests it on the rest of the data; it predicts their values and compares them with the reference ones given as expected output. Thus, it examines the accuracy and the precision of the model using several regression metrics described below. Finally, it applies the model to unknown spectra and estimates their stellar parameters.
In Sect. 2 we describe how ODUSSEAS computes the pseudo EWs. In Sect. 3 we provide a detailed description of ODUSSEAS and the flow of its process. We explain the characteristics of the machine learning function and its efficiency regarding different regression types, resolutions, and wavelength areas. In Sect. 4 we apply our tool to spectra obtained by several spectrographs of various resolutions and we examine the results. Finally, Sect. 5 summarizes the work presented in this paper.
2 Pseudo-EW measurements
Since the identification of the continuum is very difficult in the spectra of M dwarfs, we follow the approach of setting a pseudo continuum in each absorption line. The method is based on measurements of the pseudo EWs of absorption lines and blended lines in the range between 530 and 690 nm. We have excluded the parts where the activity-sensitive Na doublet and Hα lines and strong telluric lines reside. The line list consists of 4104 features, which are given in the form of left and right boundaries, between which each absorption feature is expected to lie. This method, based on pseudo EWs and the specific line list, was used by Neves et al. (2014).
We created our own Python version of the method to compute the pseudo EWs. Our code reads the line list and the 1D FITS files of the stellar spectra. We set an option for radial-velocity correction of the input spectra by our code for cases where they are shifted. For each line, the code identifies the position of the minimum flux of the feature, which is the central absorption wavelength. Starting from this position, the code identifies the maximum on each side of this absorption feature, after having cut this spectral area at the range defined by the respective boundaries provided in the line list. Eventually, the code fits the pseudo continuum along the edges of the absorption feature with a straight line and obtains the pseudo EW by calculating the area between the pseudo continuum and the flux. Mathematically, the pseudo EW is defined as follows, where F_{pp} is the value of the flux between the peaks of the feature (i.e., the pseudo continuum) and F_{λ} is the flux of the line at each integration step:
\mathrm{pseudo\ EW} = \sum \frac{F_{pp} - F_{\lambda}}{F_{pp}} \, \Delta\lambda \quad (1)
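The measurement steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code distributed with ODUSSEAS: the function name `pseudo_ew`, the trapezoidal integration, and the synthetic Gaussian line are all chosen for the example.

```python
import numpy as np


def pseudo_ew(wave, flux, left, right):
    """Pseudo EW of the feature between the line-list boundaries.

    The result has the unit of `wave` (e.g. Angstrom).
    """
    wave = np.asarray(wave, dtype=float)
    flux = np.asarray(flux, dtype=float)

    # Cut the spectrum at the boundaries given in the line list.
    inside = (wave >= left) & (wave <= right)
    w, f = wave[inside], flux[inside]
    if w.size < 3:
        return 0.0  # window too small to hold a feature

    # Central absorption wavelength: position of the minimum flux.
    i_min = int(np.argmin(f))

    # Highest flux point on each side of the minimum.
    i_left = int(np.argmax(f[: i_min + 1]))
    i_right = i_min + int(np.argmax(f[i_min:]))
    if i_right == i_left:
        return 0.0  # no resolved feature inside the window

    # Straight-line pseudo continuum (F_pp) through the two peaks.
    w_seg = w[i_left : i_right + 1]
    f_seg = f[i_left : i_right + 1]
    slope = (f_seg[-1] - f_seg[0]) / (w_seg[-1] - w_seg[0])
    f_pp = f_seg[0] + slope * (w_seg - w_seg[0])

    # Integrate (F_pp - F_lambda) / F_pp over wavelength (trapezoidal rule).
    depth = (f_pp - f_seg) / f_pp
    return float(np.sum(0.5 * (depth[1:] + depth[:-1]) * np.diff(w_seg)))


# Synthetic Gaussian absorption line on a flat continuum (illustrative only).
wave = np.arange(6530.0, 6533.0, 0.01)
flux = 1.0 - 0.5 * np.exp(-0.5 * ((wave - 6531.4) / 0.05) ** 2)
ew = pseudo_ew(wave, flux, 6531.0, 6531.8)
# Analytic area of this Gaussian for comparison: 0.5 * 0.05 * sqrt(2 * pi).
```

For a real M dwarf spectrum the two peaks generally sit below the true continuum, which is why the quantity is a pseudo EW and depends on resolution.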
We present an example in Fig. 1 where we use the star Gl176 and an absorption line at the region around 6530 Å.
An evaluation of our pseudo-EW measurements, by comparing them with the ones obtained from the MCAL code, is presented in Appendix B.
Fig. 1 Area fitting for the calculation of pseudo EW for a line with central λ = 6531.4 Å of the star Gl176. The position of the pseudo continuum is adjusted accordingly. This pseudo EW is equal to 87 mÅ. 

3 Machine learning on M dwarfs
We base our tool for the derivation of T_{eff} and [Fe/H] on the machine learning concept. The user needs to run two codes. The “HARPS_dataset.py” creates the databases that contain pseudo-EW measurements in different resolutions and the reference stellar parameters. The “ODUSSEAS.py” measures the pseudo EWs of new stellar spectra and derives their unknown T_{eff} and [Fe/H] via machine learning. Below, we explain the details of their structure, describing the input parameters and how to use the codes.
3.1 The HARPS dataset
Each time the code “HARPS_dataset.py” runs, the outcome is a file which is used later as input to the machine learning algorithm when running “ODUSSEAS.py” for training the machine and testing the generated model. It contains the names of 65 stars of the HARPS M dwarf sample, the central wavelengths of the 4104 absorption features from 530 to 690 nm, their pseudo-EW values according to the resolution at which we convolve the spectra, and their reference values of T_{eff} and [Fe/H] from Casagrande et al. (2008) and Neves et al. (2012) respectively. All of these 65 spectra, presented in Table A.1, have a signal-to-noise ratio (S/N) of more than 100, as reported by Neves et al. (2014). The range of the reference stellar parameters is presented in Fig. 2. Their photometric derivations have uncertainties of 100 K for T_{eff} and 0.17 dex for [Fe/H], as reported by Casagrande et al. (2008) and Neves et al. (2012) respectively.
The convolution function we use is the “instrBroadGaussFast” of “PyAstronomy”^{1}, which applies Gaussian instrumental broadening. The width of the kernel is determined by the resolution^{2}.
Since the HARPS spectra have a specific finite resolution, our code calculates the actual resolution to which they need to be convolved by the function in order to obtain spectra at the final resolution required. This calculation is done considering the following relation:
\sigma_{conv} = \sqrt{\sigma_{final}^{2} - \sigma_{orig}^{2}} \quad (2)
where σ_{conv} corresponds to the resolution to which we need to convolve a spectrum with an original resolution of σ_{orig} in order to obtain a final resolution of σ_{final}.
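As a numerical illustration of this relation, the sketch below assumes that each σ is the Gaussian width of an instrumental profile, which scales as 1/R for a resolving power R, so that the widths add in quadrature. The helper name `convolution_resolution` is hypothetical; its result is the kind of value one would pass as the resolution argument of a Gaussian-broadening routine.

```python
import math


def convolution_resolution(r_orig, r_final):
    """Resolving power of the Gaussian kernel that degrades r_orig to r_final."""
    if r_final >= r_orig:
        raise ValueError("the final resolution must be lower than the original one")
    # Gaussian widths scale as 1/R and add in quadrature (assumption):
    # (1 / R_final)^2 = (1 / R_orig)^2 + (1 / R_conv)^2
    return 1.0 / math.sqrt(1.0 / r_final**2 - 1.0 / r_orig**2)


# Degrading HARPS (115 000) to the FEROS resolution (48 000).
r_conv = convolution_resolution(115000, 48000)

# Consistency check: broadening an r_orig profile with r_conv gives r_final.
r_check = 1.0 / math.sqrt(1.0 / r_conv**2 + 1.0 / 115000**2)
```

The kernel resolution is always somewhat higher than the target one, because the original instrumental profile already contributes part of the final broadening.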
The user provides two settings. First, one chooses whether or not to convolve the reference HARPS spectra to the spectral resolution of the new data. We provide precomputed pseudo EWs for a range of spectral resolutions of widely used spectrographs; in those cases there is no need to convolve the spectra again and recalculate the pseudo EWs. Second, the user must define the resolution of the data being analyzed. The “HARPS_dataset.py” is presented schematically in Fig. 3.
Fig. 2 Distribution of reference T_{eff} and [Fe/H] of the 65 stars used to train and test the machine learning models. The cross symbol represents the uncertainties of their photometric derivations, which are 100 K and 0.17 dex respectively. 

3.2 ODUSSEAS tool
ODUSSEAS.py makes use of two algorithms that we developed: the New_data.py algorithm for measuring the pseudo EWs of new spectra to analyze, and the MachineLearning.py algorithm for the derivation of their T_{eff} and [Fe/H]. The innovative aspect of this tool is the simultaneous predictions for spectra of different resolutions and wavelength ranges.
The user has the option to activate the automatic radial velocity correction for the spectra if they are shifted. In addition, the user can set the regression type to be used by the machine learning process. The “ridge” is recommended, but “ridgeCV” and “linear” work at a similar level of efficiency. Section 3.4 presents the efficiency of all the regression types used.
The workflow of “New_data.py” is similar to that of “HARPS_dataset.py”; it reads the files and resolutions of new spectra and, if needed, it calculates and corrects their radial velocity shift. In addition, if the original wavelength step of a spectrum is not 0.010 Å, i.e., equal to that of the HARPS dataset, it is changed to this value with linear interpolation. Thus, the pseudo EWs are measured in a consistent way. The files containing the pseudo-EW measurements of each spectrum are then used during the operation of “MachineLearning.py”, which returns the values of T_{eff} and [Fe/H] along with the regression metrics of the models that predicted them. The diagram of “ODUSSEAS.py” is presented in Fig. 4 showing its inputs, operations, and output in a concise manner.
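The resampling step mentioned above can be illustrated as follows. This is a minimal sketch assuming wavelengths in Å and NumPy's `np.interp` for the linear interpolation; the function name `resample_to_step` and the toy spectrum are hypothetical and not taken from “New_data.py”.

```python
import numpy as np


def resample_to_step(wave, flux, step=0.010):
    """Linearly interpolate a spectrum onto a uniform wavelength grid."""
    wave = np.asarray(wave, dtype=float)
    flux = np.asarray(flux, dtype=float)
    # Number of grid points that fit inside the original wavelength range
    # (the small epsilon guards against floating-point round-off).
    n_points = int(np.floor((wave[-1] - wave[0]) / step + 1e-9)) + 1
    new_wave = wave[0] + step * np.arange(n_points)
    new_flux = np.interp(new_wave, wave, flux)
    return new_wave, new_flux


# Toy spectrum sampled every 0.03 Angstrom with a linear flux trend.
wave = 5300.0 + 0.03 * np.arange(101)
flux = 2.0 * wave
new_wave, new_flux = resample_to_step(wave, flux)
```

Linear interpolation does not add information; it only puts every spectrum on the same grid so that the pseudo-EW integration uses identical steps.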
3.3 Machine learning function
Here we present the machine learning function in more detail. The machine learning algorithm operates in a loop for each star separately, as each star may have a different wavelength range and resolution. For each star in the file list, the algorithm automatically loads two files: the HARPS dataset of the respective resolution, for training and testing the model, and the pseudo EWs of the star for which we want to derive T_{eff} and [Fe/H], in order to apply the model and return the stellar parameters. Based on the wavelength range of each spectrum, a mask is applied to the HARPS dataset in order to consider the absorption lines in common between spectra.
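The masking step can be pictured with the short sketch below. It is an illustration only: the text does not state whether the selection uses the feature boundaries or the central wavelengths, so keeping the features whose boundaries lie fully inside the new spectrum is an assumption, and the function name `common_line_mask` and all numbers are made up for the example.

```python
import numpy as np


def common_line_mask(left_edges, right_edges, wave_min, wave_max):
    """True for line-list features that lie fully inside [wave_min, wave_max]."""
    left_edges = np.asarray(left_edges, dtype=float)
    right_edges = np.asarray(right_edges, dtype=float)
    return (left_edges >= wave_min) & (right_edges <= wave_max)


# Toy line list (boundaries in Angstrom) and toy reference pseudo EWs:
# rows are reference stars, columns are features.
left_edges = np.array([5300.10, 5900.20, 6531.00, 6899.50])
right_edges = np.array([5300.60, 5900.90, 6531.80, 6899.95])
reference_ews = np.array([[10.0, 20.0, 30.0, 40.0],
                          [11.0, 21.0, 31.0, 41.0]])

# A new spectrum that only covers 5500-6800 Angstrom.
mask = common_line_mask(left_edges, right_edges, 5500.0, 6800.0)
training_ews = reference_ews[:, mask]  # keep only the features in common
```

Training on the masked columns means the model only ever sees features that can also be measured in the new spectrum.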
The 65 HARPS stars are randomly split into training groups consisting of about 70% of the sample (45 stars) and testing groups consisting of the remaining 30% of the population (20 stars). With these numbers selected, the machine learning model can be both trained accurately and tested on a sufficient number of stars. This is done 100 times for each star for which we want to determine the parameters.
We provide the algorithm with different regression types to be used: the “linear”, the “ridge”, the “ridgeCV”, the “multi-task Lasso”, and the “multi-task Elastic Net”. All these kinds of models provide an output value by fitting a linear regression to the input values. The relation between the predicted value y (the stellar parameter), the input variables x (the pseudo EWs), and the coefficients w is expressed as
y(w, x) = w_{0} + w_{1} x_{1} + \dots + w_{p} x_{p} \quad (3)
The mathematical details of each regression type are described in the official online documentation^{3}.
The performance of machine learning is indicated by the following three kinds of returned regression metrics. The mean absolute error is computed when the model is applied to the test dataset. This corresponds to the expected value of the absolute error loss in the predictions. In addition, the “explained variance score” is calculated. The best possible value of this score is 1.0. Variance is the expectation of the squared deviation of a random variable from its mean. It measures how far a set of numbers spreads out from its average value. Furthermore, the “r2 score” computes the coefficient of determination, defined as R^{2}. The coefficient of determination is the proportion of the variance in the dependent variable that is predictable from the independent variables. This score provides a measure of how well future samples are likely to be predicted by the model. The best possible score is 1.0. A constant model that always predicts the expected value, disregarding the input features, would get a score of 0.0. In our multi-output case, the resulting explained variance and r2 scores are by default the averages with uniform weight of the respective scores for T_{eff} and [Fe/H]. The mathematical definitions of those regression metrics are described in the official online documentation^{4}.
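To make the three metrics concrete, the sketch below writes them out with NumPy for a single output, following their standard definitions. The function names and the toy temperatures are chosen for illustration; in practice the equivalent scikit-learn functions would be used.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))


def explained_variance(y_true, y_pred):
    """1 - Var(residuals) / Var(reference values)."""
    return float(1.0 - np.var(y_true - y_pred) / np.var(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Toy reference temperatures (K) and two toy sets of predictions.
teff_ref = np.array([3000.0, 3200.0, 3400.0, 3600.0])
teff_biased = teff_ref + 50.0                            # constant +50 K offset
teff_constant = np.full_like(teff_ref, teff_ref.mean())  # always the mean

mae_biased = mae(teff_ref, teff_biased)
ev_biased = explained_variance(teff_ref, teff_biased)
r2_biased = r2(teff_ref, teff_biased)
r2_constant = r2(teff_ref, teff_constant)
```

The biased predictions show why both scores are reported: a constant offset leaves the explained variance at 1.0 but lowers the r2 score, so a gap between the two points to a systematic shift in the predictions.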
For each star, the tool makes 100 determinations by randomly splitting the training and testing groups each time. After these determinations, it returns the average values of T_{eff} and [Fe/H], the average values of the mean absolute errors of the models, the average scores of machine learning, and the dispersion of T_{eff} and [Fe/H] (measured as the standard deviation). This iterative process minimizes the possible dependence of the resulting parameters on how the stars from the HARPS dataset are split for training and testing in one single measurement. As there are only 65 reference stars, the results of the measurements could be affected by the selection of stars that end up in the training set. This is why multiple runs are performed, shuffling and splitting the reference stars into different training and testing groups each time; we then calculate the average values and the dispersion. The final results are automatically saved in the file named “Parameter_Results.dat”. Moreover, the tool saves a group of plots with the reference and the predicted parameters of model testing, as well as their differences, as a visualization of the model accuracy. An example is presented in Fig. 5.
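The repeated split, train, test, and predict cycle can be sketched with scikit-learn as below. This is an illustration on synthetic data, not the contents of “MachineLearning.py”: the random feature matrix stands in for the pseudo EWs, the two output columns stand in for T_{eff} and [Fe/H], and the regularization strength `alpha=1.0` and the reduced number of features are assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_stars, n_lines = 65, 10  # 65 reference stars; few features to keep it small

# Synthetic stand-in for the reference dataset (illustrative values only).
X = rng.normal(size=(n_stars, n_lines))            # "pseudo EWs"
true_w = rng.normal(size=(n_lines, 2))
y = X @ true_w + rng.normal(scale=0.1, size=(n_stars, 2))  # two "parameters"
x_new = rng.normal(size=(1, n_lines))               # star to be characterized

predictions, maes, r2_scores = [], [], []
for run in range(100):
    # A different random 70/30 split of the reference stars on every run.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.30, random_state=run
    )
    model = Ridge(alpha=1.0).fit(X_tr, y_tr)
    y_hat = model.predict(X_te)
    maes.append(mean_absolute_error(y_te, y_hat, multioutput="raw_values"))
    r2_scores.append(r2_score(y_te, y_hat))  # uniform average over outputs
    predictions.append(model.predict(x_new)[0])

predictions = np.array(predictions)
mean_params = predictions.mean(axis=0)   # averaged parameters
dispersion = predictions.std(axis=0)     # spread over the 100 runs
mean_mae = np.mean(maes, axis=0)         # averaged test errors per parameter
mean_r2 = float(np.mean(r2_scores))
```

With 65 stars, a 30% test fraction gives the 45/20 split quoted in the text, and the spread of the 100 predictions gives the dispersion reported for each parameter.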
Fig. 3 Workflow of HARPS_dataset.py. 

Fig. 4 Workflow of ODUSSEAS.py. 

Fig. 5 Demonstration of predictions applying ridge regression. Upper panel: T_{eff} values expected (Ref.) and predicted (M.L.) on the test dataset, along with their differences. Lower panel: [Fe/H] values expected (Ref.) and predicted (M.L.) on the test dataset, along with their differences. 

3.4 Machine learning efficiency
First, we test the regression models mentioned above to find the best one. We use the original spectra of the HARPS dataset at their real resolution of 115 000. For 100 runs with each regression type, we measure the scores and the mean absolute errors of the stellar parameters on the test set. Table 1 reports the average values obtained with each model. The linear, ridge, and ridgeCV models work very well in general, giving r2 and explained variance scores with average values of around 0.93 and 0.94, respectively. The range of these scores from the 100 runs is usually from 0.87 to 0.99. The average uncertainties of those regression types are ~27 K for T_{eff} and ~0.04 dex for [Fe/H]. The ridge model leads to slightly higher scores than the linear one. The ridgeCV model, which has a built-in cross-validation function that applies “leave-one-out” or “k-fold” strategies, does not seem to work better than the classic ridge one, at least for this sample of M dwarf measurements. Furthermore, multi-task Elastic Net and multi-task Lasso give considerably lower scores and higher mean absolute errors. Therefore, we advocate the use of ridge regression, as it operates best on the spectral values of the M dwarfs.
Second, we evaluate the explained variance and r2 scores and the mean absolute errors of the algorithm for different resolutions of the spectra. We do this for the HARPS dataset at its actual resolution of 115 000 and we repeat this test for convolved datasets at resolutions of other broadly used spectrographs: 110 000 (UVES), 94 600 (CARMENES), 75 000 (SOPHIE), and 48 000 (FEROS). This is done to examine the level of machine learning precision towards lower resolutions. We present the average values from 100 measurements of each case in Table 2.
To further test the reliability of the method, we examine the efficiency of the machine learning in different wavelength ranges of the spectrum. We divide the line list, which is from 530 to 690 nm, into four spectral regions and we calculate the respective scores and mean absolute errors. We do this test to analyze whether machine learning works better when using the full range or a specific part of the wavelengths. For this test, we use the case of the convolved data at the resolution of 110 000. The machine learning operates at its best while using the full range of the initial line list. In addition, regarding the divided areas, we notice that using the bluer part of the wavelength range leads to higher scores and lower mean absolute errors. In general, the results show that we can obtain highly precise predictions for stars observed at any part of the 530–690 nm spectrum. These results are presented in Table 3.
Table 1 Average values of the scores and the mean absolute errors (M.A.E.) for T_{eff} and [Fe/H] of the test dataset after 100 runs with each regression type.
Table 2 Average values of the scores and the mean absolute errors (M.A.E.) for T_{eff} and [Fe/H] of the test dataset after 100 runs at each resolution (using the “ridge” regression).
Table 3 Average values of the scores and the mean absolute errors (M.A.E.) for T_{eff} and [Fe/H] of the convolved-to-110 000 dataset after 100 runs with different wavelength ranges of the line list.
4 Derivation of stellar parameters
We apply our tool to spectra obtained using five widely used instruments of different resolutions: HARPS of 115 000, UVES of 110 000, CARMENES of 94 600, SOPHIE of 75 000, and FEROS of 48 000. The spectra were taken from the respective public data archives. To test the efficiency of our tool on spectra obtained using instruments other than HARPS, we use spectra from stars in common with the HARPS dataset, so that we can compare their results with the reference parameters of the respective HARPS spectra. To further validate the accuracy of our tool, we proceed to determinations and comparisons for a different selection of stars. Finally, we discuss possible future improvements of our determinations.
4.1 Resolution and spectral shape
We examine the spectral change of M dwarfs according to convolution at different resolutions. The shapes of M dwarf spectra are different when obtained at lower resolutions. In general, the lower the resolution, the shallower the absorption lines. This is illustrated in Fig. 6 where three lines of Gl176 are shown in detail, both for the original HARPS spectrum and the ones convolved to several resolutions. We also measure these lines and report their pseudo-EW values in Table 4 to show their differences. The relative differences can vary, as not only is the depth different but so is the location of the pseudo continuum in each case. All the differences confirm that the lower resolution always leads to lower pseudo-EW values. This is why we need to convolve the HARPS spectra to the respective resolutions of the new spectra. Consequently, machine learning compares the pseudo EWs of the same resolution and accurately predicts the stellar parameters. Figure 7 shows the spectral shapes of Gl674 for three different cases: the original HARPS spectrum with resolution 115 000, the HARPS spectrum convolved to the resolution of FEROS (48 000), and the original FEROS spectrum, which is the lowest resolution we examine. We notice that the convolved HARPS spectrum follows the shape of the FEROS one in a consistent way. Figure 8 shows the comparison of the pseudo EWs of the SOPHIE spectrum for Gl908 and the spectrum of the same star by HARPS before and after its convolution. The SOPHIE spectrum, which is of a lower resolution, has consistently lower pseudo-EW values than the HARPS one, as expected. After the convolution of the HARPS spectrum to the resolution of SOPHIE, the overall trends of their pseudo-EW values become highly compatible.
4.2 Measurements on different spectrographs
We examine the performance of our tool on new spectra. We show the accuracy of the stellar parameters predicted and the precision for each resolution by presenting the mean absolute errors of the models and the dispersion of the results, as calculated after 100 determinations for each spectrum.
For the case of HARPS, we use a HARPS spectrum of Gl643 with S/N = 83, which is not part of the HARPS dataset used in the machine learning. As reference values for this star, we consider its parameters reported by Neves et al. (2014). For the cases of the other instruments, we use a UVES spectrum of Gl846 with S/N = 149, a CARMENES spectrum of Gl514 with S/N = 191, a SOPHIE spectrum of Gl908 with S/N = 90 and a FEROS spectrum of Gl674 with S/N = 61. As reference values for those spectra, we consider the values of the respective HARPS ones in the dataset.
The results of T_{eff} and [Fe/H] are presented in Table 5. We notice that the parameters of the new spectra are very close to the respective reference values. The differences in T_{eff} vary up to ~50 K and the differences in [Fe/H] vary up to 0.03 dex. The mean absolute errors of models and the dispersions of values increase slightly with decreasing resolution.
Fig. 6 Shape of the original HARPS spectrum for Gl176 and of the same spectrum convolved to different resolutions. The lower the resolution, the shallower the absorption lines. 

4.3 Measurements at different S/N
Here we examine the possible variation of the results regarding different S/N for a given spectrum. We take the spectrum Gl514 of CARMENES, which has the highest S/N of those we examine (equal to 191, as reported in the CARMENES data archive) and we inject amounts of noise which correspond to lower S/N values that we set. Since the final noise is obtained by the quadratic sum of the initial noise and the injected noise, the final S/N values are calculated using the relation below.
\mathrm{S/N}_{final} = \left[ \left( \frac{1}{\mathrm{S/N}_{initial}} \right)^{2} + \left( \frac{1}{\mathrm{S/N}_{injected}} \right)^{2} \right]^{-1/2} \quad (4)
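The quadratic sum of the noise terms can be checked numerically with the sketch below. It assumes a normalized spectrum, so that each noise level is simply 1/(S/N); the function names `final_snr` and `injected_snr_for` are hypothetical.

```python
import math


def final_snr(snr_initial, snr_injected):
    """S/N after adding independent noise to a normalized spectrum."""
    # Noise levels (1 / S/N) add in quadrature.
    return 1.0 / math.sqrt(1.0 / snr_initial**2 + 1.0 / snr_injected**2)


def injected_snr_for(snr_initial, snr_target):
    """S/N of the noise to inject so that the final S/N equals snr_target."""
    if snr_target >= snr_initial:
        raise ValueError("the target S/N must be lower than the initial one")
    return 1.0 / math.sqrt(1.0 / snr_target**2 - 1.0 / snr_initial**2)


# Degrading the CARMENES spectrum (S/N = 191) to a final S/N of 20.
snr_injected = injected_snr_for(191.0, 20.0)
snr_final = final_snr(191.0, snr_injected)
```

Because the initial noise is small compared to the injected one at low S/N, the injected level is only slightly above the target value.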
We create new spectra with final S/N values ranging from 100 to 9. For each spectrum, we measure the stellar parameters and their dispersion. Figure 9 illustrates the measurements of the CARMENES spectrum while degrading its S/N. Overall, the results are similar to those for the original spectrum and the differences are kept roughly constant with respect to the reference values. For S/N values down to 20, we notice that the dispersions are between 17 and 27 K for T_{eff} and between 0.03 and 0.04 dex for [Fe/H], i.e., at similar levels to those for the original spectrum. For S/N values below 20, the dispersions start to increase up to ~50 K and up to ~0.07 dex respectively. Moreover, it seems that there is a slight decrease of the order of 20 K in T_{eff} and a slight increase of the order of 0.02 dex in [Fe/H] for the spectra with S/N below 20. However, these results are within the uncertainties of the tool. Therefore, we conclude that our tool works consistently for spectra with S/N above 20. Below this S/N, the errors increase significantly.
Table 4 Pseudo EWs of three absorption lines for HARPS spectrum Gl176 at different resolutions.
Fig. 7 Shape of spectra for Gl674 at the original HARPS resolution (blue), at the FEROS resolution (green), and at the HARPS convolved to FEROS resolution (orange). 

4.4 Comparison of results between our tool and Neves et al. (2014)
Now, we make an overall comparison of our results on a group of HARPS spectra with the ones presented by Neves et al. (2014). For this purpose, we measure 30 HARPS spectra from the initial GTO sample, for which we do not know their parameters from photometry. These HARPS spectra are not part of the machine learning dataset we use. Based on the information from Neves et al. (2014), we excluded very active stars and stars with S/N lower than 25, below which the method of these latter authors does not apply. Both methods were tested and shown to not work properly for very active or young stars, since the pseudo EWs of such spectra are affected and their parameters cannot be determined accurately with the pseudo-EW approach we follow. Subsequently, we compare the results we obtain using our tool with the results presented by Neves et al. (2014).
The errors of the stellar parameters derived using our tool are 27 K for T_{eff} and 0.04 dex for [Fe/H], as the mean absolute errors are measured when the machine learning model is applied on the test dataset. The errors of the calibration by Neves et al. (2014), which are quantified from the root mean squared error (RMSE) in that work, are equal to 91 K and 0.08 dex respectively. We highlight the fact that both methods are tied to the same initial systematic uncertainties of the reference parameters used, which are 100 K for T_{eff} and 0.17 dex for [Fe/H]. The results and their differences are presented in Table 6 and Fig. 10. The mean and median differences in T_{eff} are 11 and 22 K respectively, with a standard deviation of 101 K. Regarding [Fe/H], both the mean and the median difference are −0.04 dex, with a standard deviation of 0.06 dex.
The work by Neves et al. (2014) follows a traditional approach, using a least-squares weighted fit to determine parameters. The regression of our tool reduces those errors of T_{eff} and [Fe/H] from 91 to 27 K and from 0.08 to 0.04 dex respectively. Therefore, our machine learning approach significantly increases the precision of parameter determinations. In terms of speed, the determination of T_{eff} and [Fe/H] for a star using machine learning, including multiple runs, shuffling and splitting the training and test samples each time, requires only a few seconds.
Fig. 8 Upper panel: pseudo-EW values of the original Gl908 spectra for HARPS and SOPHIE. Lower panel: pseudo-EW values of Gl908 after the convolution of HARPS to the resolution of SOPHIE. The units of pseudo EWs are mÅ. After the convolution, there is agreement between the identity line (solid green) and the slope (dashed red), with the intersection being close to zero. 

Fig. 9 Average differences and the dispersion of T_{eff} (upper panel) and [Fe/H] (lower panel) for CARMENES Gl514 spectrum with different values of S/N. 

4.5 Estimating total uncertainties
Intrinsic uncertainties exist in the T_{eff} and [Fe/H] reference values of the HARPS dataset, because their initial photometric derivations have average uncertainties of 100 K and 0.17 dex respectively. Since these parameters are used as the training values for the machine learning process, we decided to inject these uncertainties by perturbing their values accordingly in order to see how the final results of the predictions vary.
To this end, we create Gaussian distributions on the parameters for each HARPS training dataset, increasing the dispersion of the distribution of the reference parameters each time in steps of 10 K and 0.02 dex until reaching uncertainties of 100 K and 0.17 dex. This adds different training values to the machine learning algorithm each time. For each step, we create 100 Gaussian-distributed training datasets. After these runs of machine learning, we calculate the average values of the predicted parameters and their dispersion.
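The perturbation of the reference values can be sketched as follows for a single step, here the largest one (100 K and 0.17 dex). This is a minimal illustration: the function name `perturbed_references` is hypothetical and the evenly spaced reference values are placeholders, not the actual parameters of the 65 HARPS stars.

```python
import numpy as np


def perturbed_references(teff, feh, sigma_teff, sigma_feh, n_sets=100, seed=0):
    """Draw n_sets Gaussian-perturbed copies of the reference parameters."""
    rng = np.random.default_rng(seed)
    teff = np.asarray(teff, dtype=float)
    feh = np.asarray(feh, dtype=float)
    # Each row is one perturbed training set; each column is one star.
    teff_sets = rng.normal(loc=teff, scale=sigma_teff, size=(n_sets, teff.size))
    feh_sets = rng.normal(loc=feh, scale=sigma_feh, size=(n_sets, feh.size))
    return teff_sets, feh_sets


# Placeholder reference values for 65 stars (illustrative only).
teff_ref = np.linspace(2800.0, 4100.0, 65)
feh_ref = np.linspace(-0.8, 0.3, 65)

teff_sets, feh_sets = perturbed_references(teff_ref, feh_ref, 100.0, 0.17)

# Spread of the injected perturbations around the reference values.
teff_spread = float(np.std(teff_sets - teff_ref))
feh_spread = float(np.std(feh_sets - feh_ref))
```

Each of the 100 rows would then replace the reference values in one training run, and the scatter of the resulting predictions gives the dispersion discussed in this section.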
In Fig. 11, we present the variations for spectra from the highest resolution (HARPS), the lowest resolution (FEROS), and an intermediate resolution (CARMENES). The datapoints represent the average difference between the reference values and the parameters obtained with the 100 different datasets. The error bars show the dispersion of the difference. We notice that the average differences from the reference values are almost the same among the different Gaussian distributions, regardless of the amount of uncertainty injected.
The average results of T_{eff} and [Fe/H] for the spectra from all the instruments are presented in Table 7. We report their maximum errors after considering the maximum Gaussian distribution with 100 K and 0.17 dex. Overall, the average values of the parameters remain roughly the same as those calculated with no Gaussian distribution at all. The mean absolute errors of the machine learning models have grown to values of between 65 and 80 K for T_{eff} and between 0.10 and 0.13 dex for [Fe/H], depending on the resolution of the HARPS dataset. The dispersion of the derived parameters grows as the resolution of the spectra decreases. Specifically, the dispersion is smaller than the injected uncertainties for the HARPS spectrum (~60 K and ~0.10 dex), while for the spectra from other instruments, it is slightly higher than the uncertainties injected (~110 to ~130 K and ~0.18 to ~0.22 dex respectively).
However, in all cases the resulting average values of stellar parameters are very close to their expected values. Differences in T_{eff} are up to ~40 K and differences in [Fe/H] are up to 0.03 dex, with respect to the expected values.
Fig. 10 T_{eff} comparison (upper panel) and [Fe/H] comparison (lower panel) between this work and Neves et al. (2014). 

Machine learning (M.L.) results of T_{eff} and [Fe/H], their dispersion (Disp.), the mean absolute errors (M.A.E.) of the models, and the reference values (Ref.) for comparison.
Stellar parameters of 30 HARPS spectra as calculated by Neves et al. (2014) and using our tool (AA) and their difference.
4.6 Validation of [Fe/H] determinations by measuring binary systems
Here, we measure [Fe/H] in binary systems containing M dwarfs which are not part of the reference sample used for machine learning. Thus, we validate our method of [Fe/H] prediction in an independent way. We determine [Fe/H] both in FGK+M and in M+M systems for an even more intrinsic test of [Fe/H] agreement.
The [Fe/H] determinations of eight FGK+M binary systems from spectra obtained by the UVES and FEROS spectrographs are presented in Table 8. For the FGK stars, [Fe/H] and the respective uncertainties were derived using the methodology described in Sousa et al. (2008) and Santos et al. (2013). The method measures the equivalent widths of FeI and FeII lines and assumes ionization and excitation equilibrium. It makes use of the radiative transfer code MOOG (Sneden 1973) and a grid of Kurucz model atmospheres (Kurucz 1993). The [Fe/H] values of the respective M dwarf secondaries, derived by ODUSSEAS, are presented along with the total uncertainties of our tool at the resolutions of UVES (0.10 dex) and FEROS (0.13 dex). All binaries have differences that are within the uncertainties of the methods.
Furthermore, we proceed to [Fe/H] determinations of stars in five M+M binary systems, measuring their available spectra from the CARMENES public archive. In Table 9, we present these results along with their own dispersions, since both are estimated by our tool based on the same reference values with the same initial uncertainties. We notice agreement between the respective members of all the M+M binaries, within the dispersions of their [Fe/H] determinations. This is a validation that our tool predicts [Fe/H] in a consistent and accurate way.
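A consistency check of this kind can be written in a few lines. The text only says that the differences lie within the uncertainties of the methods, so combining the two uncertainties in quadrature is an assumption here, as are the function name and the example [Fe/H] values; only the 0.10 dex total uncertainty at the UVES resolution comes from the text.

```python
import math


def within_uncertainty(feh_a, feh_b, unc_a, unc_b):
    """Return the [Fe/H] difference and whether it lies within the combined
    uncertainty (quadrature sum: an assumed convention, not from the paper)."""
    diff = feh_a - feh_b
    combined = math.hypot(unc_a, unc_b)
    return diff, abs(diff) <= combined


# Hypothetical FGK primary (0.05 +/- 0.03 dex) and M-dwarf secondary
# (-0.02 dex, with the 0.10 dex total uncertainty quoted for UVES).
diff, consistent = within_uncertainty(0.05, -0.02, 0.03, 0.10)
print(round(diff, 2), consistent)
```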
Fig. 11 Average differences and the dispersions of the results for several amounts of Gaussian distribution injected to the reference parameters of the training HARPS datasets. The result of each step is the average outcome from 100 different distributed datasets. 

Machine learning (M.L.) results of T_{eff} and [Fe/H] after injecting uncertainties with Gaussian distributions of 100 K and 0.17 dex in the parameters of the training HARPS datasets, their dispersion (Disp.), the mean absolute errors (M.A.E.) of the models, and the reference values (Ref.) for comparison.
[Fe/H] difference between members of FGK+M binary systems.
[Fe/H] difference between members of M+M binary systems.
4.7 Discussion on the reference parameter scales
Since supervised machine learning determines the parameters based on reference values given to it, their systematic errors will also apply to the results of new stars. In this work, we used the reference T_{eff} and [Fe/H] photometric scales of Casagrande et al. (2008) and Neves et al. (2012) respectively, as they are derived in a homogeneous way for a sufficiently large number of spectra available to us. It is important to make a comparison between the reference values we use and values for the same stars derived in other recent studies that may be subject to different systematic errors, such as Mann et al. (2015), with which we share 26 of the 65 stars that we use as our reference dataset. In Table 10, we compare our reference parameters with determinations by Mann et al. (2015) and report the differences. These differences are illustrated in Fig. 12.
Regarding T_{eff}, we notice that our reference values have an average underestimation of 178 K with a standard deviation of 73 K. This systematic difference originates from the different methods of derivation used. Work by Casagrande et al. (2008) is based on the multiple optical-infrared technique (MOITE) for M dwarfs, which is an extension of the infrared flux method (IRFM) as described in Casagrande et al. (2006). On the other hand, determinations by Mann et al. (2015) were made by comparing the optical spectra with the CFIST suite of the BT-SETTL version of the PHOENIX atmosphere models (Allard et al. 2013). A detailed description of this method can be found in Mann et al. (2013b).
Regarding [Fe/H], we notice no significant systematic difference in parameter values resulting from the methods of calibration by Neves et al. (2012) and Mann et al. (2015). The average difference is 0.06 dex with a standard deviation of 0.11 dex for the sample of stars in common.
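The average offset and its standard deviation between two parameter scales can be computed as in the short sketch below. The sign convention (our value minus the comparison value, so that an underestimation gives a negative mean), the use of the sample standard deviation, the function name, and the example temperatures are assumptions for illustration and are not taken from Table 10.

```python
import statistics


def offset_statistics(ours, theirs):
    """Mean and sample standard deviation of (ours - theirs) for stars in common."""
    diffs = [o - t for o, t in zip(ours, theirs)]
    return statistics.fmean(diffs), statistics.stdev(diffs)


# Hypothetical Teff values (K) for three stars on two different scales.
teff_reference = [3400.0, 3500.0, 3600.0]
teff_comparison = [3600.0, 3650.0, 3800.0]
mean_diff, std_diff = offset_statistics(teff_reference, teff_comparison)
print(round(mean_diff, 1), round(std_diff, 1))
```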
As a potential future improvement of our determinations, we consider the possibility of replacing our reference dataset. As new techniques for parameter determination become more accurate and precise and as more spectra become available to us, their homogeneously derived parameters can be correlated with their pseudo EWs. We therefore envisage creating an improved reference dataset for our machine learning tool, one that could contain more reference stars and reference stellar parameters derived with greater accuracy and precision.
Stellar parameters of 26 stars in common with Mann et al. (2015) and their difference.
Fig. 12 T_{eff} comparison (upper panel) and [Fe/H] comparison (lower panel) between the reference values we use and Mann et al. (2015) for 26 common stars. 

5 Summary
We present ODUSSEAS, our machine learning tool for deriving T_{eff} and [Fe/H] in M-dwarf stars from spectra of varying resolution at wavelengths between 530 and 690 nm. We provide a detailed explanation of how ODUSSEAS was built and how it works. We present the results of the tests we performed and examine their accuracy and precision on spectra with resolutions from 115 000 down to 48 000. Our tool appears to be reliable: it operates with high machine learning scores of around 0.94 and achieves high-precision predictions, with mean absolute errors of ~30 K for T_{eff} and ~0.04 dex for [Fe/H]. Taking into consideration the intrinsic uncertainties of the reference parameters and perturbing them accordingly, our models have maximum uncertainties of ~80 K for T_{eff} and ~0.13 dex for [Fe/H], which are within the typical uncertainties for M dwarfs. Our parameters for spectra from different spectrographs, obtained from the average of 100 determinations, have consistent values with differences within ~50 K and ~0.03 dex from the expected ones. Spectra are required to have a S/N above 20 for optimal predictions. Our tool is valid for M dwarfs in the intervals 2800–4000 K for T_{eff} and −0.83 to 0.26 dex for [Fe/H], except for very active or young stars.
Acknowledgements
This work was supported by FCT/MCTES through national funds and by FEDER – Fundo Europeu de Desenvolvimento Regional through COMPETE2020 – Programa Operacional Competitividade e Internacionalização by these grants: UID/FIS/04434/2019, UIDB/04434/2020 and UIDP/04434/2020; PTDC/FISAST/32113/2017 and POCI010145FEDER032113; PTDC/FISAST/28953/2017 and POCI 010145FEDER028953. A.A.K., S.G.S., E.D.M. and G.D.C.T. acknowledge the support from FCT in the form of the exploratory projects with references IF/00028/2014/CP1215/CT0002, IF/00849/2015/CP1273/CT0003 and IF/00956/2015/CP1273/CT0002. S.G.S. and E.D.M. further acknowledge the support from FCT through the Investigador FCT contracts IF/00028/2014/CP1215/CT0002, IF/00849/2015/CP1273/CT0003 and POCH/FSE (EC). G.D.C.T. further acknowledges the support from an FCT/Portugal PhD grant with reference PD/BD/113478/2015.
Appendix A Additional table
Reference values of the HARPS spectra used for the machine learning.
Appendix B Evaluation of our pseudoEW measurements
We calculated the pseudo EWs of 4104 lines for the 110 stars of the total HARPS sample. Here we compare our values with the ones obtained by the MCAL code. In the upper panel of Fig. B.1, we present the comparison of all the pseudo-EW values for the star Gl176 as an example. The units of pseudo EWs are mÅ. Inside the plots, AA stands for our measurements and VN stands for the measurements by Neves et al. (2014). The slope in the diagrams of most stars is almost identical to the identity line, with only a few pseudo EWs having considerably different values. In the lower panel of Fig. B.1 we show the relative difference (as a percentage) of the values against our values. A scatter appears for pseudo-EW values smaller than 30 mÅ, which is expected, as the relative difference for these narrow lines is greater. In contrast, nearly all lines broader than 50 mÅ are measured with close agreement. The quality test for measuring the pseudo EWs comes from the following comparison. We plot the mean differences and mean relative differences between our method and the code by Neves et al. (2014) for each line, averaged over all the stars, in order to see how all the lines are measured. The agreement is very good, as can be seen in Fig. B.2, with only 184 lines out of 4104 showing a mean relative difference greater than ±15%.
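The per-line comparison can be sketched as below. The data layout (a dictionary mapping each line to its pseudo EWs over all stars), the function name, the use of our values as the denominator of the relative difference, and the example numbers are assumptions for illustration; only the 15% threshold comes from the text.

```python
import statistics


def flag_discrepant_lines(ours, theirs, threshold=15.0):
    """Return the lines whose mean relative difference (in percent, averaged
    over all stars) exceeds the threshold in absolute value.

    ours, theirs: dict mapping a line identifier to a list of pseudo EWs (mA),
    one value per star, in the same star order.
    """
    flagged = []
    for line_id, our_vals in ours.items():
        rel = [100.0 * (t - o) / o for o, t in zip(our_vals, theirs[line_id])]
        if abs(statistics.fmean(rel)) > threshold:
            flagged.append(line_id)
    return flagged


# Hypothetical pseudo EWs for two lines measured in two stars by both codes.
ours = {"line_A": [100.0, 50.0], "line_B": [20.0, 10.0]}
theirs = {"line_A": [102.0, 51.0], "line_B": [26.0, 13.0]}
print(flag_discrepant_lines(ours, theirs))
```

Averaging over stars before applying the threshold means that a line is flagged only when the two codes disagree on it systematically, not because of one outlying spectrum.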
Fig. B.1 Upper panel: comparison between the pseudo-EW values measured by our tool and the MCAL code by Neves et al. (2014). Lower panel: percentage difference of the pseudo-EW values plotted against our values.

Fig. B.2 Upper panel: difference of each line as averaged over the measurements of all stars. Lower panel: percentage difference of each line as averaged over the measurements of all stars.

References
Akras, S., Leal-Ferreira, M. L., Guzman-Ramirez, L., & Ramos-Larios, G. 2019, MNRAS, 483, 5077
Allard, F., Homeier, D., Freytag, B., et al. 2013, Mem. Soc. Astron. It. Suppl., 24, 128
Bonfils, X., Delfosse, X., Udry, S., et al. 2005, A&A, 442, 635
Bonfils, X., Delfosse, X., Udry, S., et al. 2013, A&A, 549, A109
Casagrande, L., Portinari, L., & Flynn, C. 2006, MNRAS, 373, 13
Casagrande, L., Flynn, C., & Bessell, M. 2008, MNRAS, 389, 585
Das, P., & Sanders, J. L. 2019, MNRAS, 484, 294
Everett, M. E., Howell, S. B., Silva, D. R., et al. 2013, ApJ, 771, 107
Howard, E. M. 2017, Astron. Data Anal. Softw. Syst. XXV, 512, 245
Johnson, J. A., & Apps, K. 2009, ApJ, 699, 933
Kurucz, R. 1993, ATLAS9 Stellar Atmosphere Programs and 2 km/s grid. Kurucz CD-ROM No. 13. (Smithsonian Astrophysical Observatory: Cambridge), 13
Lindgren, S., Heiter, U., & Seifahrt, A. 2016, A&A, 586, A100
Mann, A. W., Brewer, J. M., Gaidos, E., Lépine, S., & Hilton, E. J. 2013a, AJ, 145, 52
Mann, A. W., Gaidos, E., & Ansdell, M. 2013b, ApJ, 779, 188
Mann, A. W., Feiden, G. A., Gaidos, E., et al. 2015, ApJ, 804, 64
Neves, V., Bonfils, X., Santos, N. C., et al. 2012, A&A, 538, A25
Neves, V., Bonfils, X., Santos, N. C., et al. 2014, A&A, 568, A121
Önehag, A., Heiter, U., Gustafsson, B., et al. 2012, A&A, 542, A33
Passegger, V. M., Schweitzer, A., Shulyak, D., et al. 2019, A&A, 627, A161
Rajpurohit, A. S., Allard, F., Rajpurohit, S., et al. 2018, A&A, 620, A180
Rau, M. M., Koposov, S. E., Trac, H., & Mandelbaum, R. 2019, MNRAS, 484, 409
Rojas-Ayala, B., Covey, K. R., Muirhead, P. S., & Lloyd, J. P. 2010, ApJ, 720, L113
Rojas-Ayala, B., Covey, K. R., Muirhead, P. S., & Lloyd, J. P. 2012, ApJ, 748, 93
Santos, N. C., Sousa, S. G., Mortier, A., et al. 2013, A&A, 556, A150
Sneden, C. A. 1973, PhD thesis, The University of Texas at Austin, Texas, USA
Sousa, S. G., Santos, N. C., Mayor, M., et al. 2008, A&A, 487, 373
Ucci, G., Ferrara, A., Gallerani, S., et al. 2019, MNRAS, 483, 1295
Woolf, V. M., & Wallerstein, G. 2005, MNRAS, 356, 963
A description of it can be found at https://www.hs.unihamburg.de/DE/Ins/Per/Czesla/PyA/PyA/pyaslDoc/aslDoc/broad.html