Issue
A&A
Volume 657, January 2022
Article Number A35
Number of page(s) 13
Section Stellar atmospheres
DOI https://doi.org/10.1051/0004-6361/202141717
Published online 24 December 2021

© ESO 2021

1 Introduction

The electromagnetic radiation emitted by a stellar source and collected by a telescope allows for the determination of multiple physical quantities, such as the effective temperature (Teff), surface gravity (log g), and metallicity ([Fe/H]). Thus, the analysis of stellar spectra permits the determination of stellar parameters and abundances for many chemical species. However, for stars that do not have their spectra available, narrow-band photometry offers an alternative approach to determining abundances for selected elements such as carbon (Whitten et al. 2021).

One of the main questions that astronomical observations can help to answer is how the formation of our Galaxy took place and how it evolved to its current state. For that purpose, the identification of very metal-poor (VMP, [Fe/H] < −2.0) and extremely metal-poor (EMP, [Fe/H] < −3.0) stars, as defined by Beers & Christlieb (2005), represents a critical step in understanding the origins of the Milky Way. However, according to the SAGA database (as of the update in April 2021), the lower end of the halo metallicity distribution function contains ~550 EMPs, including a few dozen stars with [Fe/H] < −4 confirmed spectroscopically. The identification of new objects of this type is attainable by using different approaches based on photometry and spectroscopy. In contrast to spectroscopy, photometry offers the advantage of higher signal-to-noise ratios (S/N) for a given exposure time and the simultaneous collection of data from a large number of sources but, in turn, photometric data cannot provide a characterization of individual spectral features.

Over the last two decades, wide-field broad-band photometric surveys from ground-based telescopes, such as the Sloan Digital Sky Survey (SDSS; York et al. 2000) and the Panoramic Survey Telescope and Rapid Response System (Pan-STARRS; Chambers et al. 2016), have proven to be an extremely successful alternative to challenging spectroscopic surveys when it comes to studying large swathes of astronomical objects.

On the other hand, good examples of spectroscopic surveys employed to search for very metal-poor stars are: Apache Point Observatory Galactic Evolution Experiment (APOGEE; Majewski et al. 2016); Sloan Extension for Galactic Understanding and Exploration (SEGUE; Yanny et al. 2009); Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST; Cui et al. 2012); and the upcoming WHT Enhanced Area Velocity Explorer (WEAVE; Dalton et al. 2012) and 4-meter Multi-Object Spectroscopic Telescope (4MOST; De Jong et al. 2012).

More recently, many exciting projects have entered the scene, for example the Javalambre Photometric Local Universe Survey (J-PLUS; Cenarro et al. 2019) and the Javalambre Physics of the Accelerating Universe Astrophysical Survey (J-PAS; Benitez et al. 2014), covering several thousand deg² of the northern sky. The unique systems of 12 and 60 filters for J-PLUS and J-PAS, respectively, allow for accurate estimations of stellar parameters and provide information on some critical spectral features framed by some of the narrow-band filters. In the southern sky, the Southern Photometric Local Universe Survey (S-PLUS; Mendes de Oliveira et al. 2019; Almeida-Fernandes et al. 2021) and the SkyMapper Southern Sky Survey (SkyMapper; Keller et al. 2007) also rely on a combination of narrow- and broad-band photometry for obtaining data. Another interesting ongoing project focused on the northern hemisphere is the Pristine survey (Starkenburg et al. 2017), which uses an exclusive narrow-band filter centered on the Ca II H&K lines combined with the Sloan broad-band filters.

Photometric and spectroscopic data complement each other, meaning that the best possible approach to astronomical research is to combine both types of data analyses. Different photometric calibrations present reliable results to estimate stellar parameters but with a restricted range of applicability. For instance, Ivezić et al. (2008) found that the (g − r) color from SDSS provides an accurate estimation of Teff, while [Fe/H] can be obtained through a polynomial fitting using (u − g) with some restrictions on (g − r), due to a rapid saturation of the blue band, along with a loss in precision for stars with [Fe/H] < −2.0. Another independent methodology to infer Teff is the infrared flux method (IRFM; Casagrande et al. 2006), which can be used to build relationships between the flux measured in the infrared and BVJHK photometry. These calibrations are valid for Teff = 4000–8000 K, corresponding to stars with spectral types F, G, and K, with [Fe/H] varying from −5 to 0.4. In terms of spectroscopic observations, Lee et al. (2008a,b) presented the SEGUE Stellar Parameter Pipeline (SSPP), which estimates stellar parameters using both theoretical and empirical calibrations of medium-resolution stellar spectra (R ~ 1800) in combination with the implementation of neural networks and validations based on spectral libraries such as ELODIE (Prugniel & Soubiran 2001; Moultaka et al. 2004) and MILES (Sánchez-Blázquez et al. 2006), and high-resolution spectra for additional validation (Prieto et al. 2008). Wu et al. (2014) and Xiang et al. (2015) also presented similar pipelines (LASP and LSP3) to estimate parameters from spectra produced by the LAMOST survey.

In the coming years, the data sets produced by extensive surveys such as the Large Synoptic Survey Telescope (LSST) and the progressive data releases of the Gaia mission (Prusti et al. 2016; Gaia Collaboration 2018) will be so massive that new effective strategies to deal with the data will be needed. In that sense, machine-learning algorithms or other statistical tools may be useful for analyzing the data and making reasonably swift predictions. Machine-learning methods such as artificial neural networks (ANNs) are part of the modern approach for various astronomical applications. For instance, Singh et al. (1998) presented a model to classify stellar spectra; more recently, Whitten et al. (2019) used ANNs to estimate Teff and [Fe/H] for J-PLUS data in order to search for low-metallicity stars. Models based on other algorithms such as random forest (hereafter RF; Breiman 2001) and extreme gradient boosting (hereafter XGB; Chen et al. 2015; Chen & Guestrin 2016) show promising results for morphological classification and the estimation of physical parameters. Miller et al. (2015) presented an RF model capable of inferring Teff, log g, and [Fe/H] based on SDSS de-reddened colors. More recently, Bai et al. (2018) built an RF model that performed a star-galaxy-QSO classification and calculated the Teff for stars using data from SDSS and LAMOST; in addition, Chao et al. (2019) applied a model based on the XGB algorithm that is able to classify stars and galaxies in the SDSS, improving the results for faint sources in particular. The search for VMP candidates from wide-angle photometric surveys such as J-PLUS, complemented with spectroscopic follow-ups, can benefit enormously from machine-learning methods.

This paper presents the Stellar Parameters Estimation based on the Ensemble Methods pipeline (hereafter SPEEM), which explores the benefits of the RF and XGB algorithms and their potential as an excellent alternative for analyzing J-PLUS data by estimating three main physical parameters of stellar sources: Teff, log g, and [Fe/H]. Section 2 presents the databases employed to produce the training and validation samples required in the development of SPEEM. Section 3 covers the details regarding the pipeline architecture and the cleaning process centered on removing contaminants, such as extragalactic point-like sources (QSOs) and white dwarfs (WDs). In Sect. 4 we explain the training and validation process, pointing out the features we selected as the most reliable indicators of stellar parameters. Section 5 describes how SPEEM helped in selecting very metal-poor stars suitable for spectroscopic follow-up observations that resulted in 11 metal-poor star candidates, with 10 newly discovered stars among them along with one extremely metal-poor star. Finally, Sect. 6 summarizes the overall results obtained as well as future applications and improvements to extend the estimation of metallicities to the [Fe/H] < −3.0 regime.

2 Data sets

This section describes the main datasets used to study the correlations between J-PLUS colors and the parameters estimated independently by other surveys.

2.1 J-PLUS photometric data

The J-PLUS survey is underway at the Observatorio Astrofísico de Javalambre (OAJ, Teruel, Spain; Cenarro et al. 2014) using the 83 cm Javalambre Auxiliary Survey Telescope (JAST80) and T80Cam, a panoramic camera of 9.2k × 9.2k pixels that provides a 2 deg² field of view with a pixel scale of 0.55 arcsec pix⁻¹ (Marín-Franch et al. 2015). The J-PLUS filter system is composed of 12 passbands (Table 1). The J-PLUS observational strategy, image reduction, and main scientific goals are presented in Cenarro et al. (2019).

The J-PLUS second data release (DR2) comprises 1088 pointings (2176 deg²) observed and reduced in all survey bands (Varela & J-PLUS collaboration, in prep.). The photometric calibration was performed using both the metallicity-dependent stellar locus and the white dwarf locus (López-Sanjuan et al. 2019; López-Sanjuan et al. 2021). The limiting magnitudes (5σ, 3 arcsec aperture, AB system) of the DR2 are ~21 mag in the g and r passbands, and ~20 mag in the other ten bands. The median point spread function (PSF) full width at half maximum (FWHM) in the DR2 r-band images is 1.1 arcsec. Source detection was done in the r band using SExtractor (Bertin & Arnouts 1996) and the flux measurement in the 12 J-PLUS bands was performed at the position of the detected sources using the aperture defined in the r-band image. The DR2 is publicly available at the J-PLUS website.

In order to select the sample for our study and ensure high-quality measurements in each of the 12 filters, we chose 6 arcsec aperture magnitudes plus aperture correction and a proper configuration of the flag parameters, such as MASKS_FLAGS = 0 (indicating the target is not inside a mask), FLAGS = 0 (no SExtractor flags detected), and NORM_WMAP_VAL > 0.8 (indicating adequate exposure). We corrected for interstellar reddening using the extinction correction vector Ax reported in the J-PLUS DR2 database. Considering the star/galaxy morphological classification based on the probability density function (PDF) analysis proposed by López-Sanjuan et al. (2019), we selected all targets with a probability of being a star of pstar > 0.9. In addition, we selected stars brighter than g = 18, resulting in a sample of 746 531 objects (hereafter referred to as the “gold” sample).
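The selection cuts quoted above can be sketched as a simple row filter. This is a minimal illustration only: the flag names MASKS_FLAGS, FLAGS, and NORM_WMAP_VAL come from the text, whereas the keys `p_star` and `g_mag`, the toy rows, and the treatment of each flag as a single scalar (rather than one value per filter) are assumptions made for the example.

```python
def passes_gold_cuts(row):
    """Return True if a catalogue row satisfies the quality cuts quoted in the text."""
    return (
        row["MASKS_FLAGS"] == 0          # target is not inside a mask
        and row["FLAGS"] == 0            # no SExtractor flags raised
        and row["NORM_WMAP_VAL"] > 0.8   # adequate exposure
        and row["p_star"] > 0.9          # hypothetical key: probability of being a star
        and row["g_mag"] < 18.0          # hypothetical key: dereddened g magnitude
    )


# Toy rows, invented purely for illustration (not J-PLUS data).
catalogue = [
    {"id": "a", "MASKS_FLAGS": 0, "FLAGS": 0, "NORM_WMAP_VAL": 0.95, "p_star": 0.99, "g_mag": 16.2},
    {"id": "b", "MASKS_FLAGS": 0, "FLAGS": 2, "NORM_WMAP_VAL": 0.95, "p_star": 0.99, "g_mag": 16.2},
    {"id": "c", "MASKS_FLAGS": 0, "FLAGS": 0, "NORM_WMAP_VAL": 0.95, "p_star": 0.99, "g_mag": 19.1},
    {"id": "d", "MASKS_FLAGS": 0, "FLAGS": 0, "NORM_WMAP_VAL": 0.95, "p_star": 0.40, "g_mag": 16.2},
]

gold_ids = [row["id"] for row in catalogue if passes_gold_cuts(row)]
print(gold_ids)
```

In the real catalogue the flags are presumably evaluated per filter, in which case each condition would have to hold in all 12 bands.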

Table 1

J-PLUS filter system.

2.2 Sloan extension for galactic understanding and exploration: SEGUE

SEGUE (Yanny et al. 2009) is part of the second phase of the Sloan survey (SDSS-II), and consists of a set of 250 000 medium-resolution spectra of stars observed within 3500 deg² of the northern sky, excluding regions at low Galactic latitude (|b| < 30°). The wavelength coverage is between 3900 Å and 9000 Å at R ~ 1800, with most of the observed stars presenting spectral types from A to M. SEGUE has produced several important results, including the identification of stars at [Fe/H] < −3.0 (Aoki et al. 2012; Placco et al. 2015), the study of the structure of the Galactic thick disk and halo (de Jong et al. 2010; Lee et al. 2017, 2019; Kim et al. 2019, 2021), the analysis of the [α/Fe] ratio of G dwarfs from the Galactic disk (Lee et al. 2011), and the study of the carbon-to-iron ratio of dwarfs, main-sequence turnoff stars, and giants in the Milky Way (Lee et al. 2013).

The SEGUE Stellar Parameter Pipeline (SSPP) estimates the stellar parameters and abundances of selected elements from SEGUE spectra. It employs a combination of multiple techniques such as spectral fitting (Allende Prieto et al. 2006) and minimization of χ² within grids of model atmospheres, analysis of Ca II K lines, autocorrelation functions (Beers et al. 1999), calibrations of the Ca II triplet (Cenarro et al. 2001a,b), implementation of artificial neural networks trained both on observed (Fiorentin et al. 2007) and synthetic spectra, and semiempirical predictions using the g − r color (valid only for Teff).

2.3 LAMOST experiment for galactic understanding and exploration: LEGUE

LEGUE is a survey of the Galactic structure within the LAMOST project (Deng et al. 2012). The multi-object spectrograph allows us to obtain spectra with R ~ 1800 for 4000 targets simultaneously. Ultimately, LEGUE will provide a sample of 5 million stars with limited sky coverage. One of the LEGUE primary science goals is the search for EMPs. The LAMOST Stellar Parameter Pipeline (LASP; Wu et al. 2014) has been used to estimate stellar parameters from LAMOST data, using the Université de Lyon Spectroscopic Analysis Software (ULYSS; Koleva et al. 2009; Wu et al. 2011) jointly with the ELODIE library. This pipeline has been successfully tested and applied to estimate radial velocities and stellar parameters from LAMOST data. We note that value-added catalogs are available with the official data releases (Luo et al. 2015). The LAMOST Stellar Parameter Pipeline at Peking University (LSP3; Xiang et al. 2015) is an alternative pipeline for calculating stellar parameters, which has also been tested with LAMOST data. It uses the MILES library in addition to weighted average parameters that best match the spectral templates and values yielded by χ² minimization.

2.4 Sample cross-matches

The J-PLUS gold sample was cross-matched with the SEGUE and LAMOST DR5 surveys to retrieve information about physical parameters and morphological classification for the targets in common. That process produced the following datasets: (i) J-PLUS × SEGUE, with 6794 targets in common, contains information on stellar parameters (Teff, log g, [Fe/H]) estimated by the SSPP pipeline; (ii) J-PLUS × LAMOST, with 99 184 targets with S/N > 10. The analysis presented in this paper is anchored on estimates of atmospheric parameters obtained with the n-SSPP pipeline (Beers et al. 2014, 2017), an extension of the SEGUE pipeline, using a grid of spectra with χ² minimization to choose the best fit to the observed spectrum, providing Teff, log g, and [Fe/H].

3 Model training

SPEEM is a set of machine-learning models developed with a supervised training approach, starting with a subsample of objects with previously known parameters such as star/QSO classification, spectral type, Teff, log g, and [Fe/H]. The training process explores statistical relationships between the input parameters and measured features, such as magnitudes and color indices. An extended discussion of supervised machine-learning techniques is available in Kotsiantis et al. (2007). These relationships allow us to predict the stellar parameters of the stars in the gold sample.

3.1 Decision trees, random forest and extreme gradient boosting

Decision trees are among the most useful and intuitive tools used to deploy machine-learning models for classification and regression tasks, usually referred to as classification and regression trees (CARTs; Breiman et al. 1984), which are easy to train and interpret. However, they tend to overfit in some cases, for instance, when applied to datasets with unbalanced classes, producing less precise estimations (Strobl et al. 2009).

A CART structure is hierarchical (top-down nodes), with each node holding a subset of the training sample derived recursively. This splitting process continues until the subset at a node is no longer statistically meaningful with regard to the relationship between the input and the target variables. In order to build a classification tree, it is necessary to define an information gain parameter to be optimized by the algorithm to get accurate estimations. This parameter usually comes from the Gini impurity or Shannon entropy coefficients, expressed by Eqs. (1) and (2), respectively,

$$G(X_n) = \sum_k p_{nk}\,(1 - p_{nk}), \qquad (1)$$

$$H(X_n) = -\sum_k p_{nk}\,\log p_{nk}, \qquad (2)$$

where $X_n$ is the training data in node n and $p_{nk}$ is the proportion of class k observed values in node n. For regression purposes, the criterion to be minimized is usually the mean squared error (MSE), expressed by

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad (3)$$

where $(y_i - \hat{y}_i)$ is the difference between the reference value $y_i$ and the value $\hat{y}_i$ estimated by machine learning, and n is the sample size.
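As an illustration of the three criteria named above (Gini impurity, Shannon entropy, and MSE), the following minimal sketch evaluates them in their standard textbook form. The function names and toy inputs are hypothetical, and the natural logarithm is an assumption, since the text does not specify the base.

```python
import math


def gini_impurity(proportions):
    """Gini impurity of a node, given the class proportions p_nk (summing to 1)."""
    return sum(p * (1.0 - p) for p in proportions)


def shannon_entropy(proportions):
    """Shannon entropy of a node (natural logarithm assumed); empty classes are skipped."""
    return -sum(p * math.log(p) for p in proportions if p > 0.0)


def mean_squared_error(reference, predicted):
    """Mean of the squared differences between reference and predicted values."""
    n = len(reference)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(reference, predicted)) / n


pure_node = [1.0, 0.0]    # all objects belong to one class
mixed_node = [0.5, 0.5]   # two equally populated classes

print(gini_impurity(pure_node), gini_impurity(mixed_node))
print(shannon_entropy(pure_node), shannon_entropy(mixed_node))
print(mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

Both impurity measures vanish for a pure node and are largest for an evenly mixed one, which is why a split that lowers them is treated as a gain in information.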

The random forest (RF) algorithm represents an evolution of decision trees since it combines an arbitrary number of trees, each one of them fed with a random subsample of the input data and fitted with random subsets of features to avoid selection bias and prevent overfitting, offering more robust estimations. On the other hand, extreme gradient boosting (XGB) can outperform RF models by steering successive trees towards learning the more difficult data, using a combination of parallel and sequential computing (i.e., later trees or forests learn from the errors made by previous runs).

A critical concern in the use of decision trees is the distribution of parameters in the studied sample. For example, the metallicity distribution obtained in the Pristine Survey for the inner Galactic halo peaks at [Fe/H] = −1.6 (Youakim et al. 2020) and the number of stars decreases towards lower [Fe/H] values with a slope of Δ(log N)/Δ[Fe/H] = 1.0 ± 0.1. Since we are primarily interested in stars with [Fe/H] < −2.2, which are scarcer than the bulk of the distribution, the training datasets for different metallicity bins are unbalanced. Adopting decision trees on such unbalanced datasets may be a disadvantage, but RF is likely to prove a reasonable alternative. Although RFs perform well in estimating the parameters, we decided to apply the XGB algorithm to the regression process of the photometric parameters. That resulted in a significant improvement in dealing with unbalanced datasets, as is the case for the metallicity distribution of the Galactic halo. The following sections discuss the precision and the accuracy obtained by SPEEM.
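To give a sense of how unbalanced the metallicity bins become, the short sketch below converts the quoted slope into relative star counts. It assumes that the slope is a base-10 logarithmic slope and that a single power law holds everywhere below the peak at [Fe/H] = −1.6; both are simplifications made for illustration, and the function name is hypothetical.

```python
def relative_star_count(feh, feh_peak=-1.6, slope=1.0):
    """Star count at metallicity `feh` relative to the count at the peak.

    Assumes log10(N) falls linearly with [Fe/H] below the peak,
    with the slope quoted in the text (illustrative simplification).
    """
    return 10.0 ** (slope * (feh - feh_peak))


for feh in (-1.6, -2.6, -3.6):
    print(f"[Fe/H] = {feh:+.1f}: N/N_peak = {relative_star_count(feh):.3f}")
```

Under these assumptions, each additional dex below the peak reduces the number of available training stars by a factor of ten.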

3.2 SPEEM architecture

All the algorithms used in the development of SPEEM are available in the scikit-learn package for the Python programming language (Pedregosa et al. 2011).

The SPEEM architecture, as shown in Fig. 1, consists of three different parts: the feature selection, the rescaling of the features (normalization), and the multi-output regressor, which gives the final output.

The first part of the pipeline, the feature selection, is activated by applying the “feature union” module, which creates a customized set of features composed of photometric magnitudes, colors, and combinations of colors. These features are used by the pipeline as input to make the spectral type classification and the estimation of stellar parameters. Then, the “standard scaler” function rescales each feature to a distribution centered on 0 with a standard deviation of 1, so that the model weighs the contribution of each variable without any bias arising from differences in their scales. The scaled features are then passed to the third part of the pipeline, the multi-output regressor, which consists of three parallel random forest regressors, each of them trained to calculate one of the stellar parameters considered. This way, SPEEM only requires one run to receive the inputs (the 12 J-PLUS magnitudes) and to return Teff, log g, and [Fe/H] as outputs.
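A three-stage pipeline of this kind can be sketched with standard scikit-learn components. This is not the SPEEM code: the feature functions (`magnitudes`, `adjacent_colors`), the hyperparameters, and the random placeholder data are assumptions chosen only to show how a feature union, a standard scaler, and a multi-output regressor fit together.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.multioutput import MultiOutputRegressor
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler


def magnitudes(X):
    """Pass the 12 input magnitudes through unchanged."""
    return X


def adjacent_colors(X):
    """Hypothetical color set: differences between neighbouring filter columns."""
    return X[:, :-1] - X[:, 1:]


features = FeatureUnion([
    ("magnitudes", FunctionTransformer(magnitudes)),
    ("colors", FunctionTransformer(adjacent_colors)),
])

pipeline = Pipeline([
    ("features", features),        # part 1: build the feature set
    ("scaler", StandardScaler()),  # part 2: zero mean, unit standard deviation
    ("regressor", MultiOutputRegressor(  # part 3: one forest per output
        RandomForestRegressor(n_estimators=20, random_state=0))),
])

# Random placeholder data (not J-PLUS photometry): 200 objects, 12 magnitudes,
# and 3 targets standing in for Teff, log g, and [Fe/H].
rng = np.random.default_rng(0)
X_train = rng.normal(16.0, 1.0, size=(200, 12))
y_train = rng.normal(size=(200, 3))

pipeline.fit(X_train, y_train)
predictions = pipeline.predict(X_train[:5])
print(predictions.shape)
```

Wrapping the regressor in `MultiOutputRegressor` fits an independent forest for each target column, so a single `predict` call returns all three parameters at once.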

Fig. 1

SPEEM structure showing the three parts of the pipeline: selection of features, normalization and the multi-output regressor.

3.3 Cleaning the sample from extragalactic sources and white dwarfs

The estimation of stellar parameters requires a training sample that is free from contaminants such as quasi-stellar objects (QSOs) or white dwarfs (WDs). These objects can increase the rate of false positives when it comes to searching for new VMP candidates. These unresolved contaminants might mimic VMP stars in photometric estimations, as shown in previous works using data from SEGUE (Dawson et al. 2012) or the Baryon Oscillation Spectroscopic Survey (BOSS; Bolton et al. 2012).

For this purpose, we built a data classifier using the SPECTYPE_CLASS flag provided by SEGUE as the target. We used the J-PLUS × SEGUE data set as the training sample to make a preliminary spectral type classification of the gold sample, allowing us to identify and remove objects that may resemble VMP candidates in the J-PLUS color space.

The separation between QSO and stars came from a machine-learning model trained with photometric features with higher relative importance, selected by RF in a two-step process from a list of 12 J-PLUS magnitudes and 66 J-PLUS colors.
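The count of 66 colors follows from taking every unordered pair of the 12 filters. A minimal sketch, in which the filter ordering, the helper name `all_colors`, and the toy magnitudes are assumptions made for illustration:

```python
from itertools import combinations

# The 12 J-PLUS passbands named in the text, ordered here by wavelength.
JPLUS_FILTERS = ["u", "J0378", "J0395", "J0410", "J0430", "g",
                 "J0515", "r", "J0660", "i", "J0861", "z"]


def all_colors(mags):
    """Build every pairwise color (bluer minus redder filter) from a magnitude dict."""
    return {f"{a}-{b}": mags[a] - mags[b] for a, b in combinations(JPLUS_FILTERS, 2)}


# Toy magnitudes, invented for illustration only.
toy_mags = {name: 16.0 + 0.1 * k for k, name in enumerate(JPLUS_FILTERS)}
colors = all_colors(toy_mags)
print(len(JPLUS_FILTERS), len(colors))
```

The two colors used for the star-QSO diagram, u−J0410 and J0378−g, are both members of this set.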

Although the WISE magnitudes lie in the infrared regime while J-PLUS magnitudes correspond to the optical, the QSO-star separation of our sample was inspired by the (W1−W2) × (W2−W3) diagram, presented by Wright et al. (2010) and Scaringi et al. (2013), as a tool to separate QSOs (with W1−W2 > 0.5) and stars (with W1−W2 < 0.5). In the J-PLUS photometric system, we built a diagram based on the two essential and non-correlated colors, (u−J0410) × (J0378−g), from the RF feature importance list to separate QSOs from stars. An example of this star-QSO separation applied to the J-PLUS × SEGUE sample is shown in the left panel of Fig. 2, suggesting there is a threshold at u−J0410 = 0.5 as a preliminary boundary to separate QSOs (represented as pink filled circles) from stars, color-coded according to the spectral type assigned by SEGUE in the inset. The right panel of Fig. 2 shows the distribution of classes used in the training data ingested into the model.

The J-PLUS × SEGUE sample was then randomly split into two subsamples with a 0.75/0.25 ratio, giving a total of 5004 objects for training and 1668 objects for testing the predictions. Figure 3 presents the confusion matrix of the SPEEM pipeline for the classification process, color-coded by the number of objects in each class. The accuracy of the classifier, or the overall rate of correct classification, is 0.91. As shown in the confusion matrix presented in Fig. 3, the identification rate of QSOs and M stars was 98% and 75%, respectively. Stars of spectral types A, F, and K (196 out of 217, 1107 out of 1165, and 55 out of 73, respectively) presented a satisfactory recovery rate. On the other hand, the lower recovery rate for G stars (88 out of 157 stars correctly classified) is probably due to confusion with F- and K-type stars. Finally, the identification rate of WDs was around 67%.
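The per-class recovery rates implied by the counts quoted above can be checked with a few lines of arithmetic. Only the counts stated in the text are used; the variable and function names are hypothetical.

```python
# (correctly classified, total) per spectral type, as quoted in the text.
recovered = {"A": (196, 217), "F": (1107, 1165), "G": (88, 157), "K": (55, 73)}


def recovery_rate(correct, total):
    """Fraction of objects of a class that the classifier recovers (recall)."""
    return correct / total


rates = {cls: recovery_rate(c, t) for cls, (c, t) in recovered.items()}

# Training and test sizes quoted in the text, and the implied test fraction.
n_train, n_test = 5004, 1668
test_fraction = n_test / (n_train + n_test)

for cls, rate in rates.items():
    print(f"{cls}: {rate:.3f}")
print(f"test fraction: {test_fraction:.2f}")
```

The G-type rate stands out as the lowest of the four, in line with the discussion above.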

At this stage of our analysis, we are interested in excluding the possible QSOs and WDs from our sample, resulting in an almost pure sample, ideal for training SPEEM to estimate the stellar parameters.

4 Estimation of parameters

4.1 Training for machine-learning regressions

The estimation of Teff, log g, and [Fe/H] was based on a training sample of 4299 stars with spectral types F and G that were randomly selected from J-PLUS × SEGUE (after removing QSOs, WDs, and spectra with bad flags), as described in Sect. 3.3. The training sample presents a distribution of effective temperatures between 4800 K and 9000 K along with surface gravities between 1.0 and 4.5 and metallicities between −3.0 and +0.5 estimated by SEGUE, as shown in the left panels of Figs. 4–6. The distribution of Teff (Fig. 4) is consistent with the temperatures corresponding to the selected spectral types F and G, with possible contamination by some misclassified A stars, which extends the Teff distribution up to 9000 K. Figure 5 indicates that main-sequence stars dominate the training sample, although it also contains a few evolved stars. The metallicity distribution in Fig. 6 suggests the training sample contains stars of the thick disk, corresponding to the peak at [Fe/H] ~ −0.6, as well as inner halo stars, corresponding to [Fe/H] ~ −1.6 (Carollo et al. 2010).

In this work, we are mainly interested in deriving the atmospheric parameters and searching for new candidates for VMP stars, so the photometric estimation of [Fe/H] is crucial. However, the determination of stellar metallicity based on photometric measurements is especially challenging at the resolution of J-PLUS data: for a given temperature, the contrast between the metal absorption features and the continuum decreases significantly at lower metallicity values. Thus, it is crucial to define a training sample covering a wide range of [Fe/H] values. For this reason, the J-PLUS × SEGUE data set, which contains stars with [Fe/H] between −3.17 and 0.5 in a bimodal distribution, as shown in Fig. 6, is the most adequate for training the model.

The statistical correlation analysis of the complete set of colors and all possible combinations of J-PLUS filters with the stellar parameters Teff, log g, and [Fe/H] allows us to select the best set of features (i.e., those features with the highest Pearson’s correlation coefficient) to feed the RF and obtain the most accurate predictions. The RF provides the relative importance assigned to the input features for each estimated parameter (as shown in the right panels of Figs. 4–6) as feedback to test the model performance and the precision of the estimations. The feature importance score corresponds to the average over the decision trees within the trained model (Hastie et al. 2009). For example, in the case of Teff, the essential features include the colors g − i and g − z and the color J0515−J0861, based on the narrow-band filters centered on the Mg I triplet and the Ca II triplet, respectively. For log g, the best indicator is the color (J0378−J0410)−(J0430−J0861), which contains the spectral features Hδ, G-band, and the Ca II triplet. Finally, for [Fe/H], the index (J0395−J0430)−(J0660−z), based on filters containing the spectral features of Ca II H & K, G-band, and Hα, excels as a good metallicity indicator.
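The feature pre-selection described above can be illustrated with a small sketch that computes the metallicity index quoted in the text and ranks candidate features by Pearson's correlation coefficient. Ranking by the absolute value of the coefficient is an assumption made here, and the function names and toy numbers are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def metallicity_index(mags):
    """The [Fe/H] indicator quoted in the text: (J0395-J0430)-(J0660-z)."""
    return (mags["J0395"] - mags["J0430"]) - (mags["J0660"] - mags["z"])


def rank_features(features, target):
    """Order feature names by decreasing |r| with the target parameter."""
    scores = {name: abs(pearson_r(values, target)) for name, values in features.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy values, invented for illustration only.
target = [1.0, 2.0, 3.0, 4.0]
candidate_features = {
    "weak": [1.0, -1.0, 1.0, -1.0],
    "strong": [2.0, 4.0, 6.0, 8.0],
}
ranking = rank_features(candidate_features, target)
index_value = metallicity_index({"J0395": 17.0, "J0430": 16.5, "J0660": 15.75, "z": 15.5})
print(ranking, index_value)
```

A correlation-based ranking only captures linear trends, which is one reason the RF feature importances serve as a complementary check.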

Figure 7 shows the J-PLUS filters J0395, J0410, J0430, and g overplotted on the observed spectrum of a G star (we refer to Sect. 5 for more details on the observations and reduction of the spectroscopic data), with the identification of some essential spectral features such as the Ca II H & K lines, the Balmer lines Hβ, Hγ, Hδ, and the G-band. These spectral features, however, may be sensitive to more than one atmospheric parameter simultaneously, introducing a degeneracy in determining the parameters. For example, Fig. 4 of Whitten et al. (2019) exhibits the sensitivity of the Ca II H & K lines to Teff and [Fe/H]. On the other hand, H lines are mainly sensitive to effective temperature for stars with spectral types later than F. Thus, the analysis based on a combination of filters containing spectral features with different sensitivity to the atmospheric parameters may resolve this degeneracy.

Fig. 2

Target classification with SPEEM. Left: QSO-Star separation based on the J-PLUS colors u−J0410 and J0378−g for a sample of 6794 stars. A threshold line at u−J0410 = 0.5 can be used to separate QSOs (pink filled circles) and stars (color-coded according to the spectral type). Right: distribution of classes (stellar spectral types and QSO) assigned by SEGUE for the studied sample.

Fig. 3

Confusion matrix of the morphological and spectral type classification applied to a test sample of 1668 objects from J-PLUS × SEGUE that were not used in the training process. The blue bar indicates the number of objects in each class.

4.2 Validation of stellar parameters

Once the different SPEEM models completed the learning process, we tested the pipeline on a subsample of 1668 stars (corresponding to 25% of the J-PLUS × SEGUE sample not used in the training process) to compare the values estimated by SEGUE (medium-resolution spectroscopy) with the ones predicted by SPEEM, as shown in Fig. 8. The accuracy of the predictions is quantified by the median absolute deviation (MAD) and the normalized median absolute deviation (NMAD).
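The two statistics can be sketched as follows, assuming their conventional definitions: the MAD is the median of the absolute deviations of the residuals from their median, and the NMAD rescales it by 1.4826 so that it matches the standard deviation for Gaussian residuals. The text does not spell out the exact definitions used, so both are assumptions here, and the residuals are toy values.

```python
from statistics import median

NMAD_FACTOR = 1.4826  # conventional scaling of the MAD to a Gaussian sigma


def mad(residuals):
    """Median absolute deviation of the residuals about their median."""
    center = median(residuals)
    return median([abs(r - center) for r in residuals])


def nmad(residuals):
    """Normalized median absolute deviation (assumed to be 1.4826 x MAD)."""
    return NMAD_FACTOR * mad(residuals)


# Toy residuals (reference minus predicted), invented for illustration.
toy_residuals = [-2.0, -1.0, 0.0, 1.0, 10.0]
print(mad(toy_residuals), nmad(toy_residuals))
```

Both statistics are robust to outliers: the single large residual in the toy list barely affects them, whereas it would dominate a standard deviation.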

The Teff values estimated by SPEEM are in good agreement with the ones estimated by the SSPP pipeline of SEGUE (upper-left panel of Fig. 8), with an average error of 41 ± 61 K. Most of the stars in the test sample have Teff < 7000 K since it is highly dominated by F-type stars (Sect. 3.3, right panel of Fig. 2). The histogram in the lower-left panel of Fig. 8 represents the distribution of the differences between the Teff obtained with SSPP and SPEEM, and it shows no systematic difference between the two temperature scales.

The middle panel of Fig. 8 shows the correlation between SSPP values and SPEEM predictions for log g with MAD = 0.11 ± 0.17 dex. The log g values estimated by SPEEM agree with those derived by SSPP within 0.05 dex for main-sequence stars. On the other hand, the mean difference is 0.26 dex for more evolved stars, with log g ≤ 3.5. Even the use of different colors as surface gravity indicators (Fig. 5) does not provide an overall precision that is high enough to make more accurate estimations of log g. The study of the photometric determination of surface gravity anchored on asteroseismology is ongoing and will be the subject of a future paper.

Finally, the upper-right panel of Fig. 8 presents a reasonably good correlation between the [Fe/H] values derived by SSPP and SPEEM for [Fe/H] > −1.0. However, there is a more significant dispersion in the lower metallicity regime. The MAD and NMAD values are 0.09 and 0.14 dex, respectively, and the differences show a slight slope for [Fe/H] < −2.0, suggesting that SPEEM may overestimate the [Fe/H] value in the lower range of the metallicity distribution. This trend is probably due to the metallicity distribution and the lack of data required for a proper learning process in this metallicity range.

An additional test of the SPEEM capabilities has been performed with the J-PLUS × LAMOST database and using the same training sample of 4299 stars described in the previous section. The upper panels of Fig. 9 show the comparisons between stellar parameters estimated by SPEEM and those obtained with n-SSPP applied to LAMOST spectra.

The typical scatter between the Teff values from the two datasets is lower than ~150 K. The dispersion is higher than in the comparison with SEGUE shown in Fig. 8, although the range of Teff is extended towards lower temperatures, reaching 4500 K. The estimation of surface gravity shows a larger dispersion, with an average difference of 0.22 ± 0.33 dex, while the comparison for metallicity has MAD = 0.09 ± 0.13. The distributions of the differences between the databases are shown in the respective lower panels. The difference [Fe/H]SPEEM−[Fe/H]LAMOST shows a trend with Teff, such that a change of 100 K in Teff produces a variation of ~0.2 dex in [Fe/H].

An extra test using a larger sample, obtained by merging the SEGUE and LAMOST databases to train over the entire spectroscopic parameter space, shows that such a combination adds many stars with [Fe/H] > −2.5 but makes no significant contribution at the lower end of the metallicity distribution. As a result, a combined training sample reinforces the imbalance in the metallicity distribution and introduces a larger uncertainty in the metallicity estimation. For this reason, we decided to keep J-PLUS × SEGUE as the training sample.

Fig. 4

Parameter distribution and photometric indicators for effective temperatures. Left: distribution of Teff for the training sample composed of 4299 stars from J-PLUS × SEGUE. Right: relative importance of the features for the estimation of effective temperature. The most important features as Teff indicators are the colors g − i, g − z, and the color J0515−J0861 based on the narrow-band filters that contain the Mg I triplet and the Ca II triplet.

Fig. 5

Parameter distribution and photometric indicators for surface gravities. Left: distribution of log g for the training sample composed of 4299 stars from J-PLUS × SEGUE. Right: relative importance of the features for the estimation of surface gravity. The most important feature as a log g indicator is the combination (J0378−J0410)−(J0430−J0861).

Fig. 6

Parameter distribution and photometric indicators for metallicities. Left: distribution of [Fe/H] for the training sample composed of 4299 stars from J-PLUS × SEGUE. Right: relative importance of the features for the estimation of metallicity. The most important feature as an [Fe/H] indicator is the combination (J0395−J0430)−(J0660−z), based on narrow-band filters containing the spectral features Ca II H & K, G-band, and Hα, and on the broad-band filter z.

Fig. 7

Some J-PLUS filters overplotted on the spectrum of the G-type star J-PLUS 75 091-15 989 with [Fe/H] = −3.09. The filters J0395 and J0410 contain the Ca II H & K lines and the hydrogen line Hδ, respectively. The Hγ line and the G-band are evident in the filter J0430.

5 Searching for VMP candidates

After completing the validation process, we applied SPEEM to the entire gold sample and obtained effective temperatures, surface gravities, and metallicities for 746 531 stars. The left panel of Fig. 10 shows the Hertzsprung-Russell diagram for the gold sample, with the stellar parameters Teff and log g obtained with SPEEM. The points are color-coded according to the number of stars in the parameter space, and the observed distribution suggests that main-sequence stars of spectral types F and G dominate the gold sample.

The right panel of Fig. 10 shows the metallicity distribution obtained for the gold sample. The histogram has two peaks related to the stellar populations of the thick disk, at [Fe/H] ~ −0.4, and the halo, at [Fe/H] ~ −1.6. This distribution follows the general metallicity distribution of the halo widely discussed in the literature (e.g., Carollo et al. 2007), which is associated with different components of the Galactic halo.

A subsample of 575 600 objects was used for our search for new VMP candidates. In order to define a list for spectroscopic follow-up, we imposed the following selection criteria: Teff < 5500 K (corresponding to spectral types later than ~G) and [Fe/H] < −2.5. The limiting magnitude was constrained even further (from g < 18 to g < 17) to suit observations with 4-m class telescopes. The application of these selection criteria produced a list of 177 low-metallicity candidates.
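The selection described above amounts to three simultaneous cuts on the SPEEM output and the g magnitude. A minimal sketch is given below, assuming the parameters are available as plain arrays; the function name, argument names, and the four example stars are hypothetical and are not taken from the SPEEM code or the J-PLUS catalogue.

```python
import numpy as np

def select_vmp_candidates(teff, feh, g_mag,
                          teff_max=5500.0, feh_max=-2.5, g_max=17.0):
    """Boolean mask of stars passing the three cuts quoted in the text:
    Teff < 5500 K, [Fe/H] < -2.5, and g < 17."""
    teff = np.asarray(teff, dtype=float)
    feh = np.asarray(feh, dtype=float)
    g_mag = np.asarray(g_mag, dtype=float)
    return (teff < teff_max) & (feh < feh_max) & (g_mag < g_max)

# Hypothetical photometric estimates for four stars
teff = [5100, 5600, 5300, 4900]   # K
feh = [-2.8, -3.0, -2.1, -2.6]    # dex
g = [16.2, 15.0, 16.9, 17.4]      # mag

mask = select_vmp_candidates(teff, feh, g)
print(mask)  # only the first star passes all three cuts
```

Keeping the thresholds as keyword arguments makes it easy to relax the magnitude limit back to g < 18 for larger telescopes.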

From this list, we selected 11 stars with coordinates optimal for observations with the 4.2-m William Herschel Telescope (WHT) at the Roque de los Muchachos Observatory, La Palma. The WHT observations, made with the ISIS spectrograph and a 1′′ slit, provide blue-arm spectra in the range 3900–5100 Å, with a nominal resolution λ∕Δλ ~ 2000 at 4000 Å. In Table 2, we list the observational data for the sample of candidates: the equatorial coordinates, observation dates, exposure times, g magnitudes, and the measured S/N ratios at ~4500 Å.
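For orientation, the nominal resolving power quoted above can be converted into a resolution element using the definition R = λ/Δλ:

```latex
\Delta\lambda = \frac{\lambda}{R} \approx \frac{4000\,\text{\AA}}{2000} = 2\,\text{\AA}
```

This is a nominal value only; the effective resolution of the observed spectra may differ with seeing and instrument setup.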

The data reduction followed the standard procedure of bias subtraction, flat-field correction, extraction of the one-dimensional spectra, and wavelength calibration using the IRAF5 software.

The stellar atmospheric parameters were determined using the n-SSPP (Beers et al. 2014, 2017), providing the Teff, log g, and [Fe/H] listed in Table 3. We also list the estimates obtained with SPEEM for these parameters and the effective temperatures from the Gaia DR2 catalog. The n-SSPP takes as input the photometric information and the observed spectrum of each star. The parameters are determined based on photometric calibrations, line indices, and matching with a database of synthetic spectra. Further details on the procedures can be found in Lee et al. (2008a,b).

The left panels of Figs. 11 and 12 show the observed spectra obtained for the metal-poor candidates. The values of Teff and [Fe/H] obtained with the n-SSPP are indicated in each panel. The right panels of Figs. 11 and 12 show the J-PLUS photo-spectra for the sample stars. The spectral energy distributions (SEDs) of the selected stars are consistent with the SED of a typical G-type star. The points are color-coded according to the J-PLUS filters, as in Cenarro et al. (2019). Squares represent the broadband filters and circles represent the narrowband filters. The medium-resolution spectrum obtained by SEGUE is available for only one star of our sample, J-PLUS ID 66 723-1757, and is represented by the gray line in the lower right panel of Fig. 12. Estimates of Teff and [Fe/H] obtained with SPEEM are indicated in each panel.

All the selected candidates have been observed with Gaia and have Teff estimates in Gaia DR2 as listed in Table 3. The different Teff estimates from Gaia, SPEEM and n-SSPP are all consistent within 2σ and the mean differences relative to Gaia results are K and K. The differences between the effective temperatures estimated with SPEEM and those derived from the spectroscopic analysis vary between − 211 K and + 228 K, with an average difference of 49 ± 128 K.

We repeated our search for stars in common with SEGUE, with the result that one of the candidates observed with ISIS/WHT, J-PLUS ID 66 723-1757, had already been analysed by SEGUE, which provides Teff,SEGUE = 5115 K and [Fe/H]SEGUE = −2.96, while the effective temperature listed in the Gaia DR2 catalogue is 5058 K. The parameters estimated with SPEEM are Teff,SPEEM = 5112 K and [Fe/H]SPEEM = −2.92, while the spectroscopic analysis indicates a slightly lower temperature and a higher metallicity (Teff,nSSPP = 5032 K and [Fe/H]nSSPP = −2.57).

The differences between the metallicities derived from SPEEM and n-SSPP are, on average, +0.11 ± 0.33 dex, and most of the differences are within ±0.4 dex. All eleven selected candidates have [Fe/H] < −2.0 and seven stars have spectroscopic [Fe/H] < −2.5, representing a success rate of ~64% (seven of eleven stars; see footnote 6). This result supports the application of SPEEM as a tool to obtain photometric estimates of the fundamental stellar parameters.
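The uncertainty on a success rate based on seven stars out of eleven can be estimated with the Wilson score interval cited in the footnote (Wilson 1927). The sketch below is a generic implementation of that formula; the choice z = 1 (roughly a 68% confidence level) is an assumption, since the confidence level adopted in the paper is not stated here, and the function name is hypothetical.

```python
import math

def wilson_interval(successes, trials, z=1.0):
    """Wilson score interval for a binomial proportion.

    z is the normal quantile; z = 1.0 (about 68% confidence) is an
    assumption made for this illustration.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1.0 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return centre - half, centre + half

# Seven of the eleven observed candidates have spectroscopic [Fe/H] < -2.5
low, high = wilson_interval(7, 11)
print(f"success rate = {7 / 11:.0%}, interval = [{low:.0%}, {high:.0%}]")
```

Unlike the simple normal approximation, the Wilson interval is asymmetric about the observed fraction and stays within [0, 1], which matters for a sample as small as eleven stars.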

Fig. 8

Comparison between the parameters Teff (upper left panel), log g (upper middle panel), and [Fe/H] (upper right panel) obtained with the SSPP and SPEEM pipelines for 25% of the J-PLUS × SEGUE sample, acting as a test sample. The median absolute deviation (MAD) and the corresponding normalized median absolute deviation (NMAD) are shown in each panel. The lower panels show the distributions of the differences between the respective parameters obtained with SSPP and SPEEM.

Fig. 9

Comparison between the parameters Teff (upper left panel), log g (upper middle panel), and [Fe/H] (upper right panel) obtained with the n-SSPP and the SPEEM pipeline for stars in the J-PLUS × LAMOST sample, acting as a validation sample. The mean absolute error (MAD) and the corresponding standard deviation are shown in each panel. The lower panels show the distributions of the differences between the respective parameters obtained with n-SSPP and SPEEM.

Fig. 10

Distribution of stellar parameters obtained with SPEEM for the gold sample. Left: distribution of the stellar parameters Teff and log g. The points are color-coded according to the number of stars, indicating that the sample is mostly composed of main-sequence stars with spectral types F and G. Right: the distribution of [Fe/H] has two peaks related to the stellar populations of the thick disk, at [Fe/H] ~ −0.4, and the halo, at [Fe/H] ~ −1.6.

Table 2

Very metal-poor star candidates: spectroscopic data.

Table 3

Very metal-poor star candidates: stellar parameters.

Fig. 11

Spectra and photo-spectra of the targets. Left panels: WHT/ISIS spectra of the very metal-poor candidates, identified according to Table 2. The Ca II H & K lines and the hydrogen lines Hβ, Hγ, and Hδ, as well as the G-band, are clearly seen. Right panels: J-PLUS photo-spectra of the very metal-poor candidates. The points are color-coded according to the J-PLUS filters; squares represent the broadband filters and circles represent the narrowband filters.

6 Summary and conclusions

We present the SPEEM pipeline as a useful tool for establishing the star × QSO classification and obtaining the three basic stellar parameters Teff, log g, and [Fe/H] based on photometric measurements in the J-PLUS system. Specifically, SPEEM employs RF and XGB machine-learning algorithms, trained with samples selected from data cross-matched with SEGUE. The pipeline can recover the parameters estimated by SEGUE within a deviation of 41 ± 61 K for Teff, 0.11 ± 0.17 dex for log g, and 0.09 ± 0.14 dex for [Fe/H]. The SPEEM performance test includes a sample of stars in common with LAMOST. Even though the J-PLUS × LAMOST sample is larger than J-PLUS × SEGUE, the metallicity distribution of the former is more unbalanced. As a result, we note a trend in the metallicity difference as a function of the temperature difference, representing a variation of ~0.2 dex in [Fe/H] for ~100 K in Teff, although the obtained [Fe/H] shows no significant differences relative to the LAMOST [Fe/H] estimates.

The “gold sample” comprises a subsample of the J-PLUS DR2, selected according to criteria based on the quality of the photometric measurements and star/galaxy classification from the PDF analysis. The application of SPEEM to the “gold sample” produced stellar parameters for 746 531 stars. A list of 177 stars with [Fe/H] < −2.5, Teff < 5500 K, and magnitude g < 17.0 has been selected as potential candidate very metal-poor stars that are bright enough for a spectroscopic follow-up.

Eleven candidates were observed with the WHT+ISIS and the resulting spectra have been analysed using the n-SSPP, yielding spectroscopic values of Teff, log g, and [Fe/H]. The comparison between the parameters estimated by SPEEM and those derived spectroscopically shows that all the studied stars are confirmed very metal-poor stars with [Fe/H] < −2.0, including seven stars with [Fe/H] < −2.5 and one new extremely metal-poor star, with [Fe/H] = −3.09.

In conclusion, SPEEM has achieved a success rate of ~64% (seven of eleven stars) in the search for stars with [Fe/H] < −2.5. For comparison, the spectroscopic follow-up conducted by the Pristine collaboration has a success rate of 70% for [Fe/H] < −2.5 (Youakim et al. 2017) in a sample of 149 stars. In addition, the Best & Brightest (B&B) Survey (Schlaufman & Casey 2014) found ~32% of stars with −3.0 < [Fe/H] < −2.0 in their spectroscopic follow-up study, and, recently, Limberg et al. (2021) reported a success rate of 28% for [Fe/H] < −2.5 when combining spectroscopic follow-up of B&B targets with Gaia EDR3 data. Future spectroscopic samples of low-metallicity stars in the [Fe/H] < −3.0 regime will be used to extend the capabilities of SPEEM in application to future data releases of the J-PLUS survey.

Fig. 12

Spectra and photo-spectra of the targets. Left panels: WHT/ISIS spectra of the very metal-poor candidates, identified according to Table 2. The Ca II H & K lines and the hydrogen lines Hβ, Hγ, and Hδ, as well as the G-band, are clearly seen. Right panels: J-PLUS photo-spectra of the very metal-poor candidates. The points are color-coded according to the J-PLUS filters; squares represent the broadband filters and circles represent the narrowband filters. The gray line represents the SEGUE spectrum of the star 66 723-1757 superposed on its photo-spectrum.

Acknowledgements

We thank the referee for suggestions and comments that contributed to the improvement of this paper. Based on observations made with the JAST80 telescope at the Observatorio Astrofísico de Javalambre (OAJ), in Teruel, owned, managed, and operated by the Centro de Estudios de Física del Cosmos de Aragón. We acknowledge the OAJ Data Processing and Archiving Unit (UPAD) for reducing the OAJ data used in this work. Funding for the J-PLUS Project has been provided by the Governments of Spain and Aragón through the Fondo de Inversiones de Teruel; the Aragón Government through the Research Groups E96, E103, and E16_17R; the Spanish Ministry of Science, Innovation and Universities (MCIU/AEI/FEDER, UE) with grants PGC2018-097585-B-C21 and PGC2018-097585-B-C22; the Spanish Ministry of Economy and Competitiveness (MINECO) under AYA2015-66211-C2-1-P, AYA2015-66211-C2-2, AYA2012-30789, and ICTS-2009-14; and European FEDER funding (FCDD10-4E-867, FCDD13-4E-2685). The Brazilian agencies FINEP, FAPESP, and the National Observatory of Brazil have also contributed to this project. C.A.G. acknowledges financial support from the CAPES through a scholarship for developing his PhD project, and extends a special mention to Nathaniel Tucker for introducing him to the amazing world of machine learning. The work of V.M.P. is supported by NOIRLab, which is managed by the Association of Universities for Research in Astronomy (AURA) under a cooperative agreement with the National Science Foundation. Guoshoujing Telescope (the Large Sky Area Multi-Object Fiber Spectroscopic Telescope LAMOST) is a National Major Scientific Project built by the Chinese Academy of Sciences. Funding for the project has been provided by the National Development and Reform Commission. LAMOST is operated and managed by the National Astronomical Observatories, Chinese Academy of Sciences. Y.S.L. acknowledges support from the National Research Foundation (NRF) of Korea grant funded by the Ministry of Science and ICT (NRF-2021R1A2C1008679). F.J.E. acknowledges financial support from the Spanish MINECO/FEDER through the grant AYA2017-84089 and MDM-2017-0737 at Centro de Astrobiología (CSIC-INTA), Unidad de Excelencia María de Maeztu, and from the European Union’s Horizon 2020 research and innovation programme under Grant Agreement no. 824064 through the ESCAPE – The European Science Cluster of Astronomy & Particle Physics ESFRI Research Infrastructures project. R.A.D. acknowledges support from the CNPq through BP grant 308105/2018-4. J.V. acknowledges the technical members of the UPAD for their invaluable work: Juan Castillo, Tamara Civera, Javier Hernández, Ángel López, Alberto Moreno, and David Muniesa. This research has made use of the Spanish Virtual Observatory (http://svo.cab.inta-csic.es) supported from the Spanish MICINN/FEDER through grant AyA2017-84089. This research made use of Matplotlib, a 2D graphics package used for Python for publication-quality image generation across user interfaces and operating systems (Hunter 2007). This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 898633.

References

  1. Allende Prieto, C., Beers, T. C., Wilhelm, R., et al. 2006, ApJ, 636, 804
  2. Almeida-Fernandes, F., Sampedro, L., Herpich, F. R., et al. 2021, ArXiv e-prints [arXiv:2104.00020]
  3. Aoki, W., Beers, T. C., Lee, Y. S., et al. 2012, AJ, 145, 13
  4. Bai, Y., Liu, J., Wang, S., & Yang, F. 2018, AJ, 157, 9
  5. Beers, T. C., & Christlieb, N. 2005, ARA&A, 43, 531
  6. Beers, T. C., Rossi, S., Norris, J. E., Ryan, S. G., & Shefler, T. 1999, AJ, 117, 981
  7. Beers, T. C., Norris, J. E., Placco, V. M., et al. 2014, ApJ, 794, 58
  8. Beers, T. C., Placco, V. M., Carollo, D., et al. 2017, ApJ, 835, 81
  9. Benitez, N., Dupke, R., Moles, M., et al. 2014, ArXiv e-prints [arXiv:1403.5237]
  10. Bertin, E., & Arnouts, S. 1996, A&AS, 117, 393
  11. Bolton, A. S., Schlegel, D. J., Aubourg, É., et al. 2012, AJ, 144, 144
  12. Breiman, L. 2001, Mach. Learn., 45, 5
  13. Breiman, L., Friedman, J., Stone, C. J., & Olshen, R. A. 1984, Classification and Regression Trees (CRC press)
  14. Carollo, D., Beers, T. C., Lee, Y. S., et al. 2007, Nature, 450, 1020
  15. Carollo, D., Beers, T. C., Chiba, M., et al. 2010, ApJ, 712, 692
  16. Casagrande, L., Portinari, L., & Flynn, C. 2006, MNRAS, 373, 13
  17. Cenarro, A., Cardiel, N., Gorgas, J., et al. 2001a, MNRAS, 326, 959
  18. Cenarro, A., Gorgas, J., Cardiel, N., et al. 2001b, MNRAS, 326, 981
  19. Cenarro, A. J., Moles, M., Marín-Franch, A., et al. 2014, in Proc. SPIE, 9149, Observatory Operations: Strategies, Processes, and Systems V, 91491I
  20. Cenarro, A. J., Moles, M., Cristóbal-Hornillos, D., et al. 2019, A&A, 622, A176
  21. Chambers, K. C., Magnier, E. A., Metcalfe, N., et al. 2016, ArXiv e-prints [arXiv:1612.05560]
  22. Chao, L., Wen-hui, Z., & Ji-ming, L. 2019, Chinese Astron. Astrophys., 43, 539
  23. Chen, T., & Guestrin, C. 2016, in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785
  24. Chen, T., He, T., Benesty, M., et al. 2015, R package version 0.4-2, 1
  25. Cui, X.-Q., Zhao, Y.-H., Chu, Y.-Q., et al. 2012, Res. Astron. Astrophys., 12, 1197
  26. Dalton, G., Trager, S. C., Abrams, D. C., et al. 2012, in Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, 8446, Ground-based and Airborne Instrumentation for Astronomy IV, eds. I. S. McLean, S. K. Ramsay, & H. Takami, 84460P
  27. Dawson, K. S., Schlegel, D. J., Ahn, C. P., et al. 2012, AJ, 145, 10
  28. de Jong, J. T. A., Yanny, B., Rix, H.-W., et al. 2010, ApJ, 714, 663
  29. De Jong, R. S., Bellido-Tirado, O., Chiappini, C., et al. 2012, in Ground-based and Airborne Instrumentation for Astronomy IV, 8446, International Society for Optics and Photonics, 84460T
  30. Deng, L.-C., Newberg, H. J., Liu, C., et al. 2012, Res. Astron. Astrophys., 12, 735
  31. Fiorentin, P. R., Bailer-Jones, C., Lee, Y. S., et al. 2007, A&A, 467, 1373
  32. Gaia Collaboration (Brown, A. G. A., et al.) 2018, A&A, 616, A1
  33. Hastie, T., Tibshirani, R., & Friedman, J. 2009, The elements of statistical learning: data mining, inference, and prediction (Springer Science & Business Media)
  34. Hunter, J. D. 2007, Comput. Sci. Eng., 9, 90
  35. Ivezić, Ž., Sesar, B., Jurić, M., et al. 2008, ApJ, 684, 287
  36. Keller, S. C., Schmidt, B. P., Bessell, M. S., et al. 2007, PASA, 24, 1
  37. Kim, Y. K., Lee, Y. S., & Beers, T. C. 2019, ApJ, 882, 176
  38. Kim, Y. K., Lee, Y. S., Beers, T. C., & Koo, J.-R. 2021, ApJ, 911, L21
  39. Koleva, M., Prugniel, P., Bouchard, A., & Wu, Y. 2009, A&A, 501, 1269
  40. Kotsiantis, S. B., Zaharakis, I., & Pintelas, P. 2007, Emerg. Artif. Intell. Applic. Comput. Eng., 160, 3
  41. Lee, Y. S., Beers, T. C., Sivarani, T., et al. 2008a, AJ, 136, 2022
  42. Lee, Y. S., Beers, T. C., Sivarani, T., et al. 2008b, AJ, 136, 2050
  43. Lee, Y. S., Beers, T. C., An, D., et al. 2011, ApJ, 738, 187
  44. Lee, Y. S., Beers, T. C., Masseron, T., et al. 2013, AJ, 146, 132
  45. Lee, Y. S., Beers, T. C., Kim, Y. K., et al. 2017, ApJ, 836, 91
  46. Lee, Y. S., Beers, T. C., & Kim, Y. K. 2019, ApJ, 885, 102
  47. Limberg, G., Santucci, R. M., Rossi, S., et al. 2021, ApJ, 913, 11
  48. López-Sanjuan, C., Vázquez Ramió, H., Varela, J., et al. 2019, A&A, 622, A177
  49. López-Sanjuan, C., Yuan, H., Vázquez Ramió, H., et al. 2021, A&A, 654, A61
  50. Luo, A.-L., Zhao, Y.-H., Zhao, G., et al. 2015, Res. Astron. Astrophys., 15, 1095
  51. Majewski, S. R., APOGEE Team, & APOGEE-2 Team 2016, Astron. Nachr., 337, 863
  52. Marín-Franch, A., Taylor, K., Cenarro, J., Cristobal-Hornillos, D., & Moles, M. 2015, in IAU General Assembly, 29, 2257381
  53. Mendes de Oliveira, C., Ribeiro, T., Schoenell, W., et al. 2019, MNRAS, 489, 241
  54. Miller, A. A., Bloom, J. S., Richards, J. W., et al. 2015, ApJ, 798, 122
  55. Moultaka, J., Ilovaisky, S., Prugniel, P., & Soubiran, C. 2004, PASP, 116, 693
  56. Pedregosa, F., Varoquaux, G., Gramfort, A., et al. 2011, J. Mach. Learn. Res., 12, 2825
  57. Placco, V. M., Frebel, A., Lee, Y. S., et al. 2015, ApJ, 809, 136
  58. Prieto, C. A., Sivarani, T., Beers, T. C., et al. 2008, AJ, 136, 2070
  59. Prugniel, P., & Soubiran, C. 2001, A&A, 369, 1048
  60. Prusti, T., De Bruijne, J., Brown, A. G., et al. 2016, A&A, 595, A1
  61. Sánchez-Blázquez, P., Peletier, R. F., Jiménez-Vicente, J., et al. 2006, MNRAS, 371, 703
  62. Scaringi, S., Groot, P., Verbeek, K., et al. 2013, MNRAS, 428, 2207
  63. Schlaufman, K. C., & Casey, A. R. 2014, ApJ, 797, 13
  64. Singh, H. P., Gulati, R. K., & Gupta, R. 1998, MNRAS, 295, 312
  65. Starkenburg, E., Martin, N., Youakim, K., et al. 2017, MNRAS, 471, 2587
  66. Strobl, C., Malley, J., & Tutz, G. 2009, Psychol. Methods, 14, 323
  67. Whitten, D., Placco, V., Beers, T., et al. 2019, A&A, 622, A182
  68. Whitten, D. D., Placco, V. M., Beers, T. C., et al. 2021, ApJ, 912, 147
  69. Wilson, E. B. 1927, J. Am. Stat. Assoc., 22, 209
  70. Wright, E. L., Eisenhardt, P. R., Mainzer, A. K., et al. 2010, AJ, 140, 1868
  71. Wu, Y., Luo, A.-L., Li, H.-N., et al. 2011, Res. Astron. Astrophys., 11, 924
  72. Wu, Y., Du, B., Luo, A., Zhao, Y., & Yuan, H. 2014, Proc. Int. Astron. Union, 10, 340
  73. Xiang, M., Liu, X., Yuan, H., et al. 2015, MNRAS, 448, 822
  74. Yanny, B., Rockosi, C., Newberg, H. J., et al. 2009, AJ, 137, 4377
  75. York, D. G., Adelman, J., Anderson, Jr. J. E., et al. 2000, AJ, 120, 1579
  76. Youakim, K., Starkenburg, E., Aguado, D. S., et al. 2017, MNRAS, 472, 2963
  77. Youakim, K., Starkenburg, E., Martin, N. F., et al. 2020, MNRAS, 492, 4986

1

[A/B] = $\log(N_{\rm A}/N_{\rm B})_{\star} - \log(N_{\rm A}/N_{\rm B})_{\odot}$, where N is the number density of chemical elements A and B in the star (⋆) and the Sun (⊙).

5

IRAF was distributed by the National Optical Astronomy Observatories, which are operated by the Association of Universities for Research in Astronomy, Inc., under cooperative agreement with the National Science Foundation.

6

Fractional uncertainties represented by the Wilson score (Wilson 1927).

All Tables

Table 1

J-PLUS filter system.

All Figures

Fig. 1

SPEEM structure showing the three parts of the pipeline: selection of features, normalization and the multi-output regressor.

Fig. 2

Target classification with SPEEM. Left: QSO-Star separation based on the J-PLUS colors u−J0410 and J0378−g for a sample of 6794 stars. A threshold line at u−J0410 = 0.5 can be used to separate QSOs (pink filled circles) and stars (color-coded according to the spectral type). Right: distribution of classes (stellar spectral types and QSO) assigned by SEGUE for the studied sample.

Fig. 3

Confusion matrix of the morphological and spectral type classification applied to a test sample of 1668 objects from J-PLUS × SEGUE that were not used in the training process. The blue bar indicates the number of objects in each class.