White dwarf Random Forest classification through Gaia spectral coefficients

Enrique Miguel García-Zamora; Santiago Torres; Alberto Rebassa-Mansergas

doi:10.1051/0004-6361/202347601

Volume 679 (November 2023)

A&A, 679 (2023) A127

Open Access

Issue		A&A Volume 679, November 2023


Article Number		A127
Number of page(s)		15
Section		Catalogs and data
DOI		https://doi.org/10.1051/0004-6361/202347601
Published online		29 November 2023

A&A, 679, A127 (2023)

White dwarf Random Forest classification through Gaia spectral coefficients^★

Enrique Miguel García-Zamora¹, Santiago Torres¹^,2 and Alberto Rebassa-Mansergas¹^,2

¹ Departament de Física, Universitat Politécnica de Catalunya, c/Esteve Terrades 5, 08860 Castelldefels, Spain
e-mail: santiago.torres@upc.edu
² Institute for Space Studies of Catalonia, c/Gran Capità 2–4, Edif. Nexus 104, 08034 Barcelona, Spain

Received: 28 July 2023
Accepted: 27 September 2023

Abstract

Context. The third data release of Gaia has provided approximately 220 million low resolution spectra. Among these, about 100 000 correspond to white dwarfs. The magnitude of this quantity of data precludes the possibility of performing spectral analysis and type determination by human inspection. In order to tackle this issue, we explore the possibility of utilising a machine learning approach, based on a Random Forest algorithm.

Aims. Our goal is to analyse the viability of the Random Forest algorithm for the spectral classification of the white dwarf population within 100 pc from the Sun, based on the Hermite coefficients of Gaia spectra.

Methods. We utilised the assigned spectral type from the Montreal White Dwarf Database for training and testing our Random Forest algorithm. Once validated, our algorithm model was applied to the rest of the unclassified white dwarfs within 100 pc. First, we started by classifying the two major spectral type groups of white dwarfs: hydrogen-rich (DA) and hydrogen-deficient (non-DA). Next, we explored the possibility of classifying the various spectral subtypes, including the secondary spectral types in some cases.

Results. Our Random Forest classification presented a very high recall (>80%) for DA and DB white dwarfs, and a very high precision (>90%) for DB, DQ, and DZ white dwarfs. As a result we have assigned a spectral type to 9446 previously unclassified white dwarfs: 4739 DAs, 76 DBs (60 of them DBAs), 4437 DCs, 132 DZs, and 62 DQs (nine of them DQpec).

Conclusions. Despite the low resolution of Gaia spectra, the Random Forest algorithm applied to the Gaia spectral coefficients proves to be a highly valuable tool for spectral classification.

Key words: white dwarfs / stars: atmospheres / Hertzsprung-Russell and C–M diagrams / catalogs

^★ Full Table 3 is available at the CDS via anonymous ftp to cdsarc.cds.unistra.fr (130.79.128.5) or via https://cdsarc.cds.unistra.fr/viz-bin/cat/J/A+A/679/A127

© The Authors 2023

Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This article is published in open access under the Subscribe to Open model. Subscribe to A&A to support open access publication.

1 Introduction

White dwarfs are the remnants of stars with initial masses ≲8–10 M_⊙ (e.g. Althaus et al. 2010). They are basically composed of a degenerate core of typically half a solar mass that is surrounded by a thin partially degenerate atmospheric layer. Since nuclear reactions have practically ceased, the energy source in the deep interior of white dwarfs is primarily derived from gravothermal energy released by the ions and eventually provided by core crystallisation, phase separation, and other processes such as sedimentation of minor species (see Isern et al. 2022 for a recent review). The heat generated in the core of the white dwarf is radiated through the atmospheric envelope. Thus, this thin layer plays a key role in the cooling of the white dwarf. In the canonical model, the outermost layer of a white dwarf is primarily composed of helium with a mass around 10⁻² M_⊙, accounting for less than 2% of the total white dwarf mass. However, in the majority of cases (approximately 80%), there is an additional thinner layer of hydrogen with a mass between 10⁻¹⁵ and 10⁻⁴ M_⊙, which overlays the helium layer.

From an observational point of view, spectroscopic analysis of white dwarf atmospheres enables the identification of atomic and molecular lines and bands. This fact has allowed a spectral classification of white dwarfs according to the presence of certain lines (Sion et al. 1983). Basically, white dwarfs are divided into those that present Balmer lines (i.e. hydrogen-rich white dwarfs, or DAs), and those that do not (generically called non-DAs). Among this last group, we may also find white dwarf spectra that exhibit absorption helium lines, He I or He II, called DB and DO, respectively; carbon features, either atomic or molecular, named DQ; metallic lines such as Ca II or Fe II, named DZ; or very weak lines or no features at all, thus showing a continuous spectrum, named DC. This general spectral classification relates to what is referred to as the primary spectral type (see Table 2 from Sion et al. 1983). However, it is common to identify lines from different elements in white dwarf spectra. For instance, we may find a DA with weaker helium lines or metallic lines additionally present, in which case these objects would be labelled as DAB or DAZ, respectively. The presence of a magnetic field or variability in the white dwarf spectrum, respectively, would add a secondary H or V to the primary spectral class.

Spectral classification of white dwarfs is of paramount importance for the determination of their stellar parameters such as temperature, surface gravity, mass, or luminosity. Moreover, our understanding of the physical evolution of the white dwarf population depends on the proper identification of their atmospheres. For instance, processes such as convective mixing or convective dilution in spectral evolution (e.g. Blouin et al. 2019; Cunningham et al. 2020), the presence of carbon in hydrogen-deficient atmospheres as a possible explanation of the Gaia colour-magnitude bifurcation (Camisassa et al. 2023; Blouin et al. 2023), the high ratio of DQ white dwarfs in the so-called Q branch (Tremblay et al. 2019; Cheng et al. 2019), or the origin of accreted material in white dwarfs (e.g. Zuckerman et al. 2007; Farihi et al. 2010) are a few examples where a detailed identification of white dwarf spectra is required for a proper understanding of these issues. However, spectroscopic follow-up of white dwarfs is a time-consuming task. A volume-complete spectroscopic sample has been achieved up to 40 pc from the Sun (Tremblay et al. 2020; McCleery et al. 2020; O’Brien et al. 2023), where observations mostly performed with the William Herschel Telescope and the Gran Telescopio Canarias in the Northern Hemisphere and the Very Large Telescope and Southern Astrophysical Research Telescope in the Southern Hemisphere provide resolving powers R ≈ 2000–3900. However, this is not the case up to 100 pc, where the percentage of spectrally labelled white dwarfs is roughly 20% (e.g. Kilic et al. 2020).

Nevertheless, the third Gaia mission Data Release (Gaia Collaboration 2023) has provided astrometric data for nearly two billion objects and mean low resolution (30 ≲ R ≲ 100; Carrasco et al. 2021) BP and RP spectra of approximately 220 million sources (De Angeli et al. 2023). Of these, almost 100 000 correspond to candidates for white dwarf objects (Gentile Fusillo et al. 2021).

This enormous quantity of data prevents spectral classification by human inspection. With the recent increasing growth of large astronomical databases, other approaches based on machine learning artificial intelligence algorithms are absolutely necessary. These techniques are widely used nowadays in astrophysics, and particularly in the field of white dwarfs. Since the pioneering work of Torres et al. (1998) on the use of self-organising maps for the identification of halo stars, up to the most recent ones using the Random Forest algorithm in Galactic component identification (Torres et al. 2019), or their spectral identification (Echeverry et al. 2022; Montegriffo et al. 2023), or through deep learning techniques (Kong et al. 2018; Vincent et al. 2023), all of these approaches have been proven to be reliable methods in the automatic analysis of large white dwarf databases. Additional statistical classification methods have been performed, in particular in the spectral classification of white dwarfs. For instance, in Jiménez-Esteban et al. (2023) and Torres et al. (2023), the Virtual Observatory Spectral energy distribution Analyser tool (Bayo et al. 2008) was used to conduct an automated spectral energy distribution (SED) fitting of the 100 pc and 500 pc Gaia white dwarf samples, respectively, to different atmospheric models. These works allowed the authors to classify the samples into DA and non-DA white dwarfs with an accuracy of over 90 per cent.

In this work, we apply a Random Forest algorithm specifically developed to classify, for the first time, the whole 100-pc white dwarf Gaia sample into their spectral types. Focusing the analysis on objects identified within this distance limit is essential, since it represents a nearly complete volume-limited sample, which potentially allows us to derive accurate percentages of the different spectral type classes among white dwarfs. This approach based on artificial intelligence techniques represents a clear advantage over the approaches we followed in Jiménez-Esteban et al. (2023) and Torres et al. (2023), since it does not require the use of theoretical atmospheric models. The models are subject to substantial uncertainties for temperatures below 5500 K, which implies an unreliable classification for such cool white dwarfs. Instead, the Random Forest algorithm presented here relies on white dwarfs with previously assigned spectral types covering all possible values of effective temperature. Thus, we aim to obtain as much spectroscopic information as possible from the Gaia spectral coefficients. This includes not only a classification of the white dwarfs into their primary spectral types, but also an attempt to classify them into different subcategories.

In Sect. 2, we explain the methodology applied. In Sect. 3, the validation tests performed on a subset of white dwarfs with spectral data assigned and their results are detailed. In Sect. 4, we apply the algorithm to the classification of white dwarfs in a 100-pc radius around the Sun. In Sect. 5, we identify the most relevant spectral coefficients used in the classification process. The performance of our Random Forest algorithm is compared in Sect. 7 to other recent classification methods. Finally, in Sect. 8, we present our conclusions.

2 The method: Random Forest classification of Gaia spectral coefficients

The Random Forest (Breiman 2001) is a widely used machine learning algorithm. From a set of labelled data, which is used to train the algorithm, an ensemble of decision trees, which is called a Random Forest, is created. Once this ensemble has been obtained, it can be used to classify new data in the given categories.

This algorithm has been widely used for the classification of stellar objects (see, for instance, Li et al. 2019; Plewa 2018; or Dubath et al. 2011) and, in particular, to the study of the white dwarf population: some examples already show the feasibility of using Random Forest for the identification of different Galactic white dwarf populations using Gaia data as input parameters (Torres et al. 2019), or distinguishing between spectra of isolated white dwarfs, main sequence objects and white dwarf-main sequence binaries (Echeverry et al. 2022). Moreover, a Random Forest algorithm has also been used for the selection of white dwarfs in the Gaia sample (Gaia Collaboration 2021). Besides, a first attempt to classify white dwarfs into DAs and non-DAs using the spectral coefficients was performed by Montegriffo et al. (2023).

Here, following the line of the previous works, we aimed to apply the Random Forest algorithm to the spectral classification of the Gaia white dwarf population in a 100-pc radius around the Sun. In particular, our effort is directed at classifying the different sub-populations of the non-DA sample identified by Jiménez-Esteban et al. (2023), as well as at extending the classification to cool white dwarfs (≲5500 K) that the previous work did not consider due to the lack of accurate atmospheric models.

The Gaia spectra have low resolution (λ/Δλ ≈ 100) and cover the 3300–10 500 Å wavelength range (3300–6800 Å by the Blue Photometer (BP) and 6400–10 500 Å by the Red Photometer (RP); Carrasco et al. 2021). One particularity of the Gaia spectra is that they are not provided as a typical series of flux values for certain wavelengths but rather as a set of 55 coefficients for each of the BP and RP spectrographs (i.e. 110 coefficients in total). These coefficients refer to the Hermite functions that act as the basis for the spectral representation (Carrasco et al. 2021). The spectra are internally calibrated in a pseudo-pixel scale, and they can also be transformed to an external calibration (i.e. flux versus wavelength representation) by using the specifically designed Python package GaiaXPy¹.

As input data for the Random Forest algorithm, we use the 110 Hermite coefficients. It was demonstrated in Montegriffo et al. (2023) that the use of the coefficients provides better performance of the classification algorithm than when other input passbands were used. Moreover, the coefficient procedure can be considered totally appropriate, as the different white dwarf spectral types are defined by their specific spectral features, and all this information is contained in the coefficients (see, for instance, Weiler et al. 2023 for a mathematical description applied to hydrogen lines). As a consequence, no external calibration was applied, since this process may introduce what is known as ‘wiggles’, or oscillatory behaviour. In our data, this effect would be more prominent at both ends of the externally calibrated spectra. These ‘wiggles’ are produced by the mathematical process used to obtain the spectra (De Angeli et al. 2023).
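As an illustration of the input format described above, the following minimal sketch stacks the 55 BP and 55 RP coefficients of each source into a single 110-element feature vector. The function name `build_feature_matrix` and the synthetic arrays are hypothetical and are not taken from the original work; whether any scaling or normalisation was applied to the coefficients is not stated in the text, so none is applied here.

```python
import numpy as np

N_COEFF_PER_BAND = 55  # from the text: 55 Hermite coefficients for each of BP and RP


def build_feature_matrix(bp_coeffs, rp_coeffs):
    """Stack BP and RP coefficients into one (n_sources, 110) feature matrix.

    Hypothetical helper: no normalisation is applied, because the text does
    not say whether the coefficients were rescaled before classification.
    """
    bp = np.asarray(bp_coeffs, dtype=float)
    rp = np.asarray(rp_coeffs, dtype=float)
    if bp.ndim != 2 or bp.shape != rp.shape or bp.shape[1] != N_COEFF_PER_BAND:
        raise ValueError("expected two arrays of shape (n_sources, 55)")
    # Column order: BP coefficients first, then RP coefficients.
    return np.hstack([bp, rp])


# Synthetic stand-in data (random numbers, NOT real Gaia coefficients).
rng = np.random.default_rng(0)
bp_demo = rng.normal(size=(4, N_COEFF_PER_BAND))
rp_demo = rng.normal(size=(4, N_COEFF_PER_BAND))
X_demo = build_feature_matrix(bp_demo, rp_demo)
print(X_demo.shape)
```

Keeping the coefficients in their internal calibration, as the text describes, means no flux-versus-wavelength conversion step appears in this sketch.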

3 Training and validating the algorithm: The Montreal White Dwarf Database

The Montreal White Dwarf Database is a virtual² database containing astrometric, photometric, and spectroscopic data including a spectral type classification for tens of thousands of white dwarfs (Dufour et al. 2017). At the beginning of 2023, a total of 41 570 white dwarfs are classified into their different spectral types, 2905 of them within 100 pc from the Sun. In this work, the MWDD spectroscopic white dwarf classification within 100 pc is used as the input labelled sample for the training and validation tests of our Random Forest algorithm.

For the cross-validation, we adopt the stratified k-fold method. It consists of dividing the whole set into k folds, where k is a free parameter (in this work we chose k = 10). Each fold has approximately the same number of objects, and the category ratio (the proportion of objects assigned to different spectral types) is kept as close as possible to that of the original set. For each fold, a Random Forest is trained on the nine remaining folds and tested on the held-out fold. The advantages over a single random training-test split are that the result does not depend on one arbitrary subset division, that the class proportions are constant across subsets, and that the whole set is used for both training and testing.
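The stratified 10-fold procedure can be sketched with scikit-learn as follows. This is only an illustrative outline on synthetic data: the data set from `make_classification`, the number of trees, and the random seeds are assumptions made for the example, not the hyperparameters of Table 1.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold

# Synthetic two-class data with 110 features and a roughly 70/30 class ratio
# (an assumption that loosely mimics a DA / non-DA split; NOT real data).
X, y = make_classification(
    n_samples=600, n_features=110, n_informative=20,
    weights=[0.7, 0.3], random_state=0,
)

# k = 10 folds, each preserving the class proportions of the full set.
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

y_pred = np.empty_like(y)
for train_idx, test_idx in skf.split(X, y):
    # Train on the nine remaining folds, test on the held-out fold.
    clf = RandomForestClassifier(n_estimators=50, random_state=0)
    clf.fit(X[train_idx], y[train_idx])
    y_pred[test_idx] = clf.predict(X[test_idx])

# Every object has been predicted exactly once, by a forest that never saw it.
accuracy = float(np.mean(y_pred == y))
print(f"cross-validated accuracy: {accuracy:.2f}")
```

Because each object ends up in exactly one test fold, the out-of-fold predictions in `y_pred` can be used to build a single confusion matrix for the whole labelled sample.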

As is well established, the Random Forest performance tends to be optimal for balanced data sets (e.g. Breiman 2001). Consequently, the validation strategy consisted of keeping the classification samples as close as possible to a balanced sample. Thus, our first validation test consisted of classifying white dwarfs into DA and non-DA spectral types. The second one focused on those labelled as non-DA, and classified them into DB, DC, DQ, and DZ types. Finally, the third one consisted of classifying the white dwarfs of a specific type into its different subtypes. For instance, DA white dwarfs are divided into DA, DAB, DAH, and DAZ, and similarly for the other spectral types.

In order to create the random forests and obtain the confusion matrices and classification metrics, the Python package scikit-learn (Pedregosa et al. 2011) was used.

3.1 First validation test: classifying the Gaia population with a MWDD type into DA and non-DA types

In the first place, we classify the whole sample of white dwarfs with labels in the MWDD within 100 pc into DA and non-DA types (2905 objects; 1993 as DAs and 912 as non-DAs). Although the ratio of DA and non-DA, 68.6% and 31.4%, respectively, is not strictly balanced, the proportion of the two groups is large enough to ensure an optimal performance of the Random Forest algorithm and to avoid extreme imbalance effects.

The resulting confusion matrix of our Random Forest model is shown in the top panel of Fig. 1. True labels (rows) correspond to the MWDD classification, while the labels predicted by our algorithm are shown as columns. The total number of objects are indicated in each element of the matrix, and in the line below, we indicate the percentage of objects relative to a group. An ideal classification case would correspond to a diagonal matrix. The hyperparameters used for this validation test are shown in the centre column of Table 1 and the resulting metrics are collected in Table 2. For a description of the metrics, we refer the reader to Appendix A in Echeverry et al. (2022).

The analysis of the results indicates that the performance of the Random Forest algorithm presents an excellent recall³ for DA white dwarfs (95%), and a very good recall for non-DAs (79%). A global accuracy of 0.90 is achieved. Similar values are obtained in Jiménez-Esteban et al. (2023) and Torres et al. (2023).
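For readers unfamiliar with these metrics, the sketch below shows how per-class recall, per-class precision, and global accuracy follow from a confusion matrix laid out as in Fig. 1 (rows are true labels, columns are predicted labels). The matrix `cm_toy` is an invented example and does not reproduce the counts of Fig. 1; the helper name `recall_precision_accuracy` is likewise hypothetical.

```python
import numpy as np


def recall_precision_accuracy(cm):
    """Return (recall, precision, accuracy) from a confusion matrix.

    cm[i, j] is the number of objects of true class i predicted as class j.
    """
    cm = np.asarray(cm, dtype=float)
    true_pos = np.diag(cm)
    with np.errstate(divide="ignore", invalid="ignore"):
        recall = true_pos / cm.sum(axis=1)     # fraction of each true class recovered
        precision = true_pos / cm.sum(axis=0)  # fraction of each predicted class that is correct
    accuracy = true_pos.sum() / cm.sum()
    return recall, precision, accuracy


# Invented 2x2 example (rows: true class A, B; columns: predicted A, B).
cm_toy = [[90, 10],
          [20, 80]]
recall, precision, accuracy = recall_precision_accuracy(cm_toy)
print(np.round(recall, 3), np.round(precision, 3), round(accuracy, 3))
```

A class that is never predicted has an undefined precision (a zero column sum), which this sketch returns as NaN rather than raising an error.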

Fig. 1

Confusion matrices for our validation tests: DA vs non-DA (top panel), non-DA types (middle panel), and DB subtypes (bottom panel). As true label (rows) we adopted the MWDD classification, while the predicted label (columns) is the one resulting from our Random Forest algorithm.

Table 1

Hyperparameters and optimal values adopted in the first two validation tests.

3.2 Second validation test: Classifying the Gaia non-DA white dwarf population with a MWDD type into their subtypes

In our second validation test, we classify the non-DA white dwarfs (912 objects in total) into their spectral types (DB, DC, DQ, and DZ). The resulting confusion matrix is shown in the middle panel of Fig. 1. The hyperparameters used and the metrics obtained can be found in Tables 1 and 2, respectively. The results obtained reveal a very good performance with an accuracy of 0.81. In particular, the algorithm presents a very good recall for DB white dwarfs (82%), an excellent recall for DCs (98%), and low recalls for both DQs and DZs (<50%). Two main reasons can be identified to account for these facts. First, the low recall may be caused by the low resolution inherent to Gaia spectra. That is, weak spectral lines might go unnoticed in the Gaia low resolution spectra, which would result in the algorithm treating them as the featureless, continuous spectra characteristic of DC white dwarfs. Second, the DQ and DZ classes represent 12.8% and 13.7%, respectively, of the non-DA population used for training. Thus, imbalance effects, which worsen the performance of the algorithm, are likely to start to manifest.

However, it must be noted that, in spite of the low recall for DQ and DZ white dwarfs, their precision⁴ (as well as for DB white dwarfs) is excellent, that is 92% for the three types. False positives are almost absent. This implies that, while the algorithm does not find all DZ and DQ white dwarfs, the probability of a white dwarf belonging to the type it has been classified into is very high. This makes our algorithm highly useful for efficiently identifying white dwarfs of these spectral types within an unclassified population.

Finally, as a verification exercise, we have attempted to classify the entire sample directly into the primary types DA, DB, DC, DQ, and DZ. An excellent recall is achieved for DAs (97%), DBs show a very good recall (80%), and the DQ and DZ recalls leave clear room for improvement (25% and 34%, respectively). Once more, DBs, DQs, and DZs show an excellent precision (89% for DBs and 91% for DQs and DZs). However, the scoring values are lower than the values obtained in the first two validation tests. The conclusion we extract from this result is that better results are obtained when the workflow includes a first DA/non-DA classification and a second, specific non-DA classification.
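The two-step workflow favoured by this comparison can be outlined as follows. The sketch uses synthetic data and arbitrary settings; the class labels are attached to random clusters purely for illustration, and `classify_two_stage` is a hypothetical helper, not the code used in this work.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

LABELS = np.array(["DA", "DB", "DC", "DQ", "DZ"])

# Synthetic five-class data with 110 features (NOT real spectral coefficients).
X, y_idx = make_classification(
    n_samples=1000, n_features=110, n_informative=20,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)
y = LABELS[y_idx]
X_train, y_train = X[:800], y[:800]
X_new = X[800:]

# Stage 1: binary DA / non-DA forest trained on the full labelled set.
binary_labels = np.where(y_train == "DA", "DA", "non-DA")
stage1 = RandomForestClassifier(n_estimators=100, random_state=0)
stage1.fit(X_train, binary_labels)

# Stage 2: forest trained only on the labelled non-DA objects.
non_da = y_train != "DA"
stage2 = RandomForestClassifier(n_estimators=100, random_state=0)
stage2.fit(X_train[non_da], y_train[non_da])


def classify_two_stage(X_in):
    """Assign DA or non-DA first, then refine non-DAs into DB, DC, DQ, DZ."""
    first = stage1.predict(X_in)
    out = first.astype(object)
    mask = first == "non-DA"
    if mask.any():
        out[mask] = stage2.predict(X_in[mask])
    return out


pred = classify_two_stage(X_new)
print(dict(zip(*np.unique(pred.astype(str), return_counts=True))))
```

Training the second forest on non-DAs alone keeps the dominant DA class from swamping the rarer DQ and DZ classes, which is the motivation given in the text for splitting the task.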

3.3 Third validation test: Classifying the Gaia white dwarf population with a MWDD type into their secondary types

So far, we have demonstrated that the Random Forest algorithm based on the coefficients of Gaia spectra is a feasible tool for classifying white dwarfs into their primary spectral types. Now, we explore the possibility to classify the secondary types. Several factors prevent us from being optimistic about such performance. First, Gaia low resolution spectra appear to have limited capability to discern detailed features, such as distorted Balmer lines in DAH white dwarfs or weak lines in other atmospheres. Second, subtype classes represent in many cases clearly imbalanced samples with respect to the predominant subtype, thus worsening the performance of the algorithm.

Nevertheless, even if we expect a very low recall in the classification of the majority of spectral subtypes, if a high precision is achieved, it would imply the identification of valuable objects. Consequently, the four considered spectral types (DA, DB, DZ, and DQ) were divided into different subtypes and analysed separately. In the following sections we provide the technical details and a summary is given in Sect. 3.3.5.

3.3.1 DA subtype classification

The DA type was divided into pure DA, DAB, DAH, and DAZ. The classification test reveals the disappointing, but not unexpected, result that only one DAH white dwarf is correctly classified as such (1% recall). This shows that, in almost all cases, the fine magnetic splitting of the spectral lines produced by the intense magnetic field is too small to be discerned in the low resolution Gaia spectra. Additionally, one DA is misclassified as a DAH, resulting in a precision of 50% for DAHs.

The rest of the subtypes are not recognised in the classification and are all misclassified as DAs. The subtlety of their lines, the low spectral resolution, and their small number compared to the initial sample would explain these results.

3.3.2 DB subtype classification

In this test, the DB type was divided into pure DB, DBA, DBAH, DBQA, DBZ, and DBZA. Except for DBAs, which comprise 47% of the DB sample, the other subtypes are only a residual fraction. Imbalance effects are therefore expected.

The confusion matrix (bottom panel of Fig. 1) shows that, except for the DBA type, all subtypes are misclassified. The DBAH, DBZ, and DBQA subtypes are incorrectly classified as DBA, and DBZAs are misclassified as DB (33%) or DBA (67%). In the DB/DBA classification, 50% of the DBs and 74% of the DBAs are correctly classified.

3.3.3 DQ subtype classification

The results of the Random Forest applied to the DQ subtypes reveal that the algorithm is only able to distinguish DQpec white dwarfs, and even then only two out of seventeen (11% recall). The other subtypes (DQA, DQZ, and DQZA), which comprise at most two objects each, are indistinguishable from DQs.

It is also worth noting, however, that the precision is also perfect for the DQpec subtype (100%), with no false positives from other subtypes. This implies that the few DQpec stars the algorithm may find have a very high probability of belonging to this subtype.

Table 2

Classification metrics for the first validation test in which we classify white dwarfs of the MWDD into DA and non-DA classes, and the second validation test in which non-DAs are classified into DB, DC, DQ, and DZ.

3.3.4 DZ subtype classification

Regarding DZs, the Random Forest algorithm is not able to distinguish the DZH subtype from DZ white dwarfs. On the other hand, a single DZA (8% recall) is properly classified. However, three DZs are mislabelled as DZA, negatively impacting the precision for this subtype (25%).

3.3.5 Spectral subtype classification summary

From the analysis of the results of the Random Forest algorithm applied to the different spectral subtypes, we can conclude that the algorithm is mostly unable to classify secondary spectral subtypes, whether due to the imbalance of the dataset or the inherent low resolution of the Gaia spectra that prevents their spectral lines from being recognised by the algorithm.

A possible exception is the subtype DQpec, which, although it shows a low recall, also presents a perfect precision. Its situation among DQ subtypes is similar to the situation of DQs among non-DA white dwarfs: low recall, but very high precision that might allow us to find candidates with a very high probability of actually belonging to the group they have been classified into.

Furthermore, although the DB/DBA classification may seem possible due to the good recovery of DBA white dwarfs, we must take this result cautiously. While the recall is reasonably good for DBAs (74%), it is just 50% for DBs. Additionally, the precision is only slightly superior to 50% for both subtypes. Consequently, we cannot assume that a white dwarf identified as a DB or DBA has a high probability of really being one.

4 Classifying the Gaia 100-pc white dwarf population

Once our Random Forest algorithm has been tested and validated, it can be applied to the unclassified Gaia 100-pc white dwarf population. A subgroup of these white dwarfs, namely those with BP – RP < 0.86 (equivalent to white dwarfs hotter than ≈5500 K), has already been classified into DAs and non-DAs by Jiménez-Esteban et al. (2023). To that end, synthetic photometry of all white dwarfs was generated using their spectra and the J-PAS (Benitez et al. 2014) filter system (Marín-Franch et al. 2012). These spectra were fitted using a collection of DA and DB atmospheric models, and a probability for each of them belonging to the DA type was computed from the χ² arising from the best fits. In this exercise, Jiménez-Esteban et al. (2023) adopted two approaches: model fits using all Gaia spectral coefficients and model fits using the truncated coefficients. The former case, defined as the VOSA-GJP estimator, provided better results, with an overall accuracy of 91%. This value is slightly higher than the accuracy we have obtained here using our Random Forest algorithm (90%). Although both classification performances can be considered practically equivalent, we hereafter adopt the DA and non-DA VOSA-GJP classification of Jiménez-Esteban et al. (2023) for all white dwarfs with BP – RP < 0.86.

Thus, in this section we first apply our Random Forest model to those white dwarfs classified into DA and non-DA by Jiménez-Esteban et al. (2023) with the aim of obtaining their spectral subtypes. Then, we expand the classification to those unclassified objects with colour BP – RP > 0.86, i.e. the cooler white dwarfs that we could not classify in Jiménez-Esteban et al. (2023) due to the lack of accurate atmospheric models.

4.1 White dwarfs identified by VOSA-GJP

In this section, we analyse the objects classified in Jiménez-Esteban et al. (2023) with the aim of obtaining their spectral subtypes. In that classification, as mentioned before, white dwarfs were assigned a probability, P_DA, of being DAs. Those with P_DA > 0.5 were classified as DAs, while those with P_DA < 0.5 were classified as non-DAs. A total of 5823 white dwarfs with BP – RP < 0.86 are considered in this section; 4157 of them classified as DAs and 1666 classified as non-DAs.
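The selection described here amounts to a colour cut followed by a probability threshold, which can be written as a short sketch. It assumes the BP – RP < 0.86 cut stated at the start of Sect. 4 for objects with a VOSA-GJP classification; the function name `split_sample` and the demonstration values are hypothetical, and the treatment of P_DA exactly equal to 0.5 is not specified in the text, so such objects are left unlabelled here.

```python
import numpy as np

COLOUR_CUT = 0.86     # BP - RP limit of the VOSA-GJP classified sample (from the text)
P_DA_THRESHOLD = 0.5  # probability threshold separating DA from non-DA (from the text)


def split_sample(bp_rp, p_da):
    """Label objects as 'DA', 'non-DA', or 'unclassified'.

    Objects redder than the colour cut, or with no usable P_DA, stay
    'unclassified'; these are the ones left for the Random Forest.
    """
    bp_rp = np.asarray(bp_rp, dtype=float)
    p_da = np.asarray(p_da, dtype=float)
    labels = np.full(bp_rp.shape, "unclassified", dtype=object)
    hot = bp_rp < COLOUR_CUT
    labels[hot & (p_da > P_DA_THRESHOLD)] = "DA"
    labels[hot & (p_da < P_DA_THRESHOLD)] = "non-DA"
    return labels


# Invented demonstration values: two hot objects and one cool object without P_DA.
demo = split_sample([0.10, 0.50, 1.10], [0.93, 0.12, np.nan])
print(demo.tolist())
```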

4.1.1 DA white dwarfs identified by VOSA-GJP

Despite the poor performance of the algorithm in classifying secondary spectral types found in Sect. 3.3.1, we attempted to find possible DAH or DAZ candidates among the group of 4157 white dwarfs classified as DA in Jiménez-Esteban et al. (2023).

The result reveals that only two DAH candidates are found. Nonetheless, despite the very low number of DAHs, we consider it a success for our algorithm, especially since the effect of magnetic fields on spectral lines is a fine magnetic splitting, which is not easily noticeable in low resolution spectra. In Fig. 2, we show the location of these two DAH candidates in the Gaia Hertzsprung-Russell (HR) diagram.

Fig. 2

HR diagram of the classified Gaia DA 100-pc white dwarf population in Jiménez-Esteban et al. (2023). Two DAH (blue dots) are identified by our algorithm.

Fig. 3

HR diagram of the classified Gaia non-DA white dwarf population within 100 pc in Jiménez-Esteban et al. (2023), divided into their different subtypes found in this work.

4.1.2 Non-DA white dwarfs identified by VOSA-GJP

We now analyse the white dwarfs that were classified as non-DAs in Jiménez-Esteban et al. (2023) by adopting the VOSA-GJP estimator. In order to build the training set, we once more resort to the MWDD. From the whole set, we derive a subset that mimics the conditions of the objects that will be classified (i.e. non-DA white dwarfs with BP – RP < 0.86). This leaves us with only 509 objects in the training set for the Random Forest algorithm, in contrast with the 912 non-DA white dwarfs in the whole training set.

The classifying algorithm is then applied to the remaining 1666 objects in the test subset. The classification yields the following results: 76 objects are identified as DBs, 1429 as DCs, 40 as DQs, and 121 as DZs. The corresponding HR diagram of these classified objects is shown in Fig. 3.

The HR diagram not only serves to illustrate the composition of the classified population, but it also allows us to check for consistency with expected white dwarf characteristics. For instance, no DB white dwarfs should be found below a certain temperature (≈10 000 K). In Fig. 3, DBs appear restricted to the top left, hotter region of the white dwarf sequence, while some DQ white dwarfs appear in the Q branch. All these factors reinforce the idea that our classification is essentially correct, and no spectral types appear outside of their expected locations.

Furthermore, as seen in our third validation test (see Sect. 3.3), the Random Forest algorithm is able to identify (although with low recall but with high precision) secondary subtypes of DBs and DQs. As we do not expect to find more DBs in the cooler region that remains to be analysed, we apply our classification algorithm to the set of 76 DB white dwarfs identified so far. The Random Forest identified 16 pure DB and 60 DBA objects. No other secondary subtypes (DBAH, DBAZ, DBQA, DBZ, DBZA, and DBe) were identified. In Fig. 4, we depict the HR diagram location for the identified pure DBs and DBAs. We can check that no pure DBs are found with colours redder than BP – RP ≳ −0.1 (i.e. effective temperatures cooler than ≈ 12 000 K).

The sample of identified DQs is analysed into its secondary types in Sect. 4.3.

Fig. 4

HR diagram of the classified pure DB and DBA white dwarfs found in this work.

4.2 White dwarfs not identified by VOSA-GJP

Once the white dwarfs classified in Jiménez-Esteban et al. (2023) as DA and non-DA had been further classified into their different subtypes, we decided to explore the cold region of the HR diagram, i.e. BP – RP > 0.86, which had not been analysed in the aforementioned work.

As described in Sect. 3, our strategy consists in first classifying the cold white dwarf sample into DAs and non-DAs; then, the non-DAs are classified into DBs, DCs, DQs, and DZs. Finally, we look for possible secondary spectral type candidates (although this last step will probably be impracticable, as the spectra in this region have a very low signal-to-noise ratio).
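The two-stage strategy described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the data are synthetic stand-ins for the 110 spectral coefficients, the class offsets and the helper names `make_sample` and `classify` are hypothetical, and the hyperparameters are not those adopted in this work.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for the 110 Gaia spectral coefficients (NOT real data).
N_COEFF = 110
CLASS_OFFSETS = {"DA": 0.0, "DB": 3.0, "DC": -3.0, "DQ": 6.0, "DZ": -6.0}

def make_sample(n_per_class):
    """Draw toy coefficient vectors; each class is shifted in the first 5 features."""
    X, y = [], []
    for label, offset in CLASS_OFFSETS.items():
        block = rng.normal(size=(n_per_class, N_COEFF))
        block[:, :5] += offset
        X.append(block)
        y.extend([label] * n_per_class)
    return np.vstack(X), np.array(y)

X_train, y_train = make_sample(100)   # plays the role of the labelled sample
X_new, y_new = make_sample(20)        # plays the role of the objects to classify

# Stage 1: DA versus non-DA.
stage1 = RandomForestClassifier(n_estimators=100, random_state=0)
stage1.fit(X_train, np.where(y_train == "DA", "DA", "non-DA"))

# Stage 2: non-DA subtypes, trained only on the labelled non-DAs.
non_da = y_train != "DA"
stage2 = RandomForestClassifier(n_estimators=100, random_state=0)
stage2.fit(X_train[non_da], y_train[non_da])

def classify(X):
    """Apply stage 1 to every object, then stage 2 to the predicted non-DAs."""
    labels = stage1.predict(X).astype(object)
    mask = labels == "non-DA"
    if mask.any():
        labels[mask] = stage2.predict(X[mask])
    return labels

predicted = classify(X_new)
print("fraction matching the toy labels:", (predicted == y_new).mean())
```

Training the second stage only on non-DAs mirrors the idea that each step solves a narrower problem; a third stage for secondary types could be chained in the same way.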

4.2.1 DA vs. non-DA classification

The number of white dwarfs present in the 100 pc sample from Jiménez-Esteban et al. (2023) with colours BP – RP > 0.86 (that is, with no VOSA-GJP classification) is 3623. To that sample we apply our Random Forest algorithm, once trained with those objects labelled in the MWDD (2905 objects). It is worth noting that the MWDD sample contains 192 DA and 293 non-DA white dwarfs with colours BP – RP > 0.86 and within 100 pc. This set of white dwarfs supports the reliability of our method, as there are enough labelled objects to train the algorithm in the HR region of interest.

The results of applying our Random Forest to the cold sample of Jiménez-Esteban et al. (2023) can be seen in the HR diagram presented in Fig. 5. The vast majority of white dwarfs (3041; 84%) are classified as non-DAs (blue dots), while DAs (magenta dots) comprise only a small fraction (582; 16%) and none of them has BP – RP > 1.25.

This classification is consistent with the expected behaviour at temperatures lower than ≃5000 K, since in this range the hydrogen in the white dwarf atmosphere remains mostly in the ground state. Thus, Balmer spectral lines would become too weak (or simply disappear) to be detected in Gaia low resolution spectra and, consequently, the object would be classified as a featureless DC.

Fig. 5

HR diagram of the classified Gaia DA and non-DA 100-pc white dwarf population with colour BP – RP > 0.86.

4.2.2 DA secondary type classification

As we have seen in Sect. 4.1.1, our Random Forest algorithm was able to find two DAHs. Thus, we applied the algorithm to the classified cold DA white dwarfs. Of the 582 objects, none was classified as a DAH or DAZ; all of them were classified as DAs. This result is not entirely unexpected, as it was already known that the Gaia resolution was, in almost all cases, insufficient for this purpose.

4.2.3 Non-DAs subtype classification

Once the separation between DAs and non-DAs was completed, the 3041 non-DAs found were classified into DC, DQ, and DZ categories. DBs were discarded, as none can be found at these low temperatures. As in Sect. 4.2.1, the whole MWDD classified set was used as the training data. In particular, we used 248 DCs, 19 DQs, and 26 DZs with colours BP – RP > 0.86.

The results shown in Fig. 6 reveal that, as expected, the most prominent group is that of DCs: 3008 objects representing 98.9% of the sample. Despite the low Gaia resolution, and possibly low signal-to-noise ratio, which impair the algorithm’s ability to correctly identify any spectral feature in this low temperature regime, the Random Forest algorithm was able to identify 22 DQs and 11 DZs. Taking into account that only 19 DQ and 26 DZ white dwarfs with colours BP – RP > 0.86 form the training sample, these newly found objects represent a 115.8% and 42.3% increment, respectively.

4.3 DQ secondary type classification

In our analysis, we have found 62 DQs so far. Of these, 40 have colours bluer than BP – RP = 0.86 (see Sect. 4.1.2) and 22 are redder than that value (see Sect. 4.2.3). As demonstrated in our third validation test (see Sect. 3.3), the Random Forest is capable of identifying certain secondary DQ spectral types, although at the expense of low recall.

We apply our Random Forest algorithm to the 62 DQ-identified white dwarfs, with the aim of classifying them into the secondary spectral types (i.e. DQ, DQA, DQZ, DQZA, and DQpec). The objects are classified into only two groups: 53 DQ and 9 DQpec white dwarfs. Figure 7 shows the corresponding HR diagram. Those objects classified as DQpec are typically cold, with BP – RP ≳ 0.6, indicating that Swan bands are more easily distorted at low temperatures (e.g. Blouin & Dufour 2019).

Fig. 6

As Fig. 5, but showing the classification of non-DAs into their different spectral subtypes.

Fig. 7

HR diagram of the classified DQ and DQpec white dwarfs found in this work.

5 Feature importance

As an ensemble learning method, the Random Forest algorithm constructs multiple decision trees and combines their predictions to achieve a more accurate and stable result. In this construction, some features (variables or parameters of the sample) play a more important role than others. Moreover, some features can be removed without significantly altering the result. In our case, the features are the 110 Gaia spectral coefficients. We aim to analyse which of them have the highest importance for each classification. The method used to compute the feature importance was the mean decrease in impurity (MDI), which is based on the decrease of node impurity averaged over the whole Random Forest. This can be understood as follows. When a decision tree is generated, decision nodes are created. Node impurity is a measure of how mixed the classes are in a given decision node; a node is said to be pure if it comprises only one class. Therefore, the most important features in our analysis are the ones that reduce the node impurity the most across the forest. These will, of course, depend on the set that is being classified. For instance, the coefficients that govern the Balmer lines are crucial in a DA versus non-DA classification, but of no importance in a classification among non-DA subtypes.
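The notion of node impurity and its decrease can be made concrete with a short sketch. It assumes the Gini impurity as the impurity measure, which is a common default but is not stated in the text; the function names `gini` and `impurity_decrease` and the toy label lists are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0 for a pure node, larger for more mixed nodes."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def impurity_decrease(parent, left, right):
    """Weighted impurity reduction obtained by splitting parent into left and right."""
    n = len(parent)
    return gini(parent) - (len(left) / n) * gini(left) - (len(right) / n) * gini(right)

# Toy node with 6 DA and 4 non-DA objects (invented for illustration).
parent = ["DA"] * 6 + ["non-DA"] * 4

# A split that separates the classes perfectly removes all the impurity ...
good_split = impurity_decrease(parent, ["DA"] * 6, ["non-DA"] * 4)

# ... while a split that leaves both children as mixed as the parent removes none.
mixed_child = ["DA"] * 3 + ["non-DA"] * 2
poor_split = impurity_decrease(parent, mixed_child, mixed_child)

print(f"good split: {good_split:.2f}, poor split: {poor_split:.2f}")
```

In an MDI-type measure, a feature accumulates such decreases over every node where it is used, and the totals are averaged over the trees of the forest.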

In Fig. 8, we show the feature importance obtained by the MDI method as a function of the Gaia spectral coefficients for the DA versus non-DA (top panel) and the non-DA subtype (bottom panel) classification. Regarding the DA versus non-DA classification, the most important coefficients are approximately the first 15 red coefficients and the first 20 blue coefficients. Moreover, if we apply a 0.8% threshold (marked as a black line in Fig. 8), we can eliminate most of the low-significance spectral coefficients, while the remaining ones represent 73.6% of the information.
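The thresholding step can be sketched with scikit-learn, whose `feature_importances_` attribute returns impurity-based importances normalised to sum to one. This is an illustration on synthetic data, not the analysis performed here: the toy feature matrix, the number of informative features, and the hyperparameters are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

# Toy data: 110 features, of which only the first 3 carry class information
# (an assumption for illustration; these are not Gaia coefficients).
n_objects, n_features = 400, 110
y = rng.integers(0, 2, size=n_objects)
X = rng.normal(size=(n_objects, n_features))
X[:, :3] += 2.0 * y[:, None]

forest = RandomForestClassifier(n_estimators=200, random_state=1).fit(X, y)

importance = forest.feature_importances_   # impurity-based, sums to 1
threshold = 0.008                           # the 0.8% threshold quoted in the text
kept = np.flatnonzero(importance > threshold)
retained_fraction = importance[kept].sum()

print(f"{kept.size} of {n_features} features kept, "
      f"carrying {100 * retained_fraction:.1f}% of the total importance")
```

The retained fraction plays the role of the "information" percentage quoted for each classification.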

This result implies that greater importance is placed on the BP information; indeed, all Balmer lines except Hα, as well as He I lines, Swan bands, and most metallic lines, fall in the BP wavelength range. The most important feature, however, corresponds to the RP. We identify it with the Balmer Hα line. Since DAs show H features, it is to be expected that the algorithm considers this spectral line as the most important to distinguish between DAs and non-DAs.

With respect to the non-DA classification into its spectral subtypes, the feature importance distribution (bottom panel) reveals that blue coefficients are the most relevant. Applying the same 0.8% threshold as in the previous case, approximately the first 30 coefficients contain 52% of the information. As most of the type-characteristic spectral lines (for instance, most He I lines, the Swan bands, or Ca II lines) appear in the wavelength range covered by the BP, rather than in the RP range, this result is both expected and consistent with our previous knowledge.

Fig. 8

Feature importance as a function of the Gaia spectral coefficients for DA vs non-DA classification (top panel) and the non-DA classification into the different spectral subtypes (bottom panel). An importance threshold of 0.8% is represented by a black horizontal line.

6 The Gaia 100-pc sample classification summary

In this section, we present a summary of our white dwarf spectral classification. Of the 9446 classified white dwarfs, 4737 have been classified as DA, 2 as DAH, 76 as DB, 4437 as DC, 62 as DQ, and 132 as DZ. The original, labelled MWDD sample used as training comprises 2905 objects, including 1845 DA, 90 DAH, 97 DB, 573 DC, 117 DQ, and 125 DZ.

Consequently, the number of classified objects within 100 pc from the Sun has been increased by 257% for DAs, 2.2% for DAHs, 78.4% for DBs, 774% for DCs, 53% for DQs, and 105.6% for DZs.
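The percentages above follow from dividing the number of newly classified objects of each type by the number of objects of that type in the labelled MWDD sample. A short check, using only the counts quoted in this section (the dictionary names are illustrative):

```python
# Counts quoted in the text.
newly_classified = {"DA": 4737, "DAH": 2, "DB": 76, "DC": 4437, "DQ": 62, "DZ": 132}
mwdd_training = {"DA": 1845, "DAH": 90, "DB": 97, "DC": 573, "DQ": 117, "DZ": 125}

# Increase relative to the labelled sample, in per cent.
increase_percent = {
    spectral_type: 100.0 * newly_classified[spectral_type] / mwdd_training[spectral_type]
    for spectral_type in newly_classified
}

for spectral_type, value in increase_percent.items():
    print(f"{spectral_type:>3}: {value:6.1f}%")
```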

Figure 9 shows the Gaia HR diagrams for the white dwarfs classified in this work as DA and its secondary types (left panels), while Fig. 10 shows the corresponding diagrams for objects classified as DB, DC, DQ, and DZ by our algorithm (left panels). For completeness, we also show the HR diagrams including those white dwarfs previously classified in the MWDD (right panels).

Additionally, all the objects studied here are collected in a list, where we provide their corresponding spectral classification among other Gaia parameters. A representative excerpt of this catalogue is presented in Table 3. The whole catalogue can be found in electronic form at the CDS. Moreover, for illustrative purposes, in Appendix A we show some examples of Gaia spectra corresponding to white dwarfs of different spectral types classified by our algorithm. These spectra are compared to the Gaia spectra of white dwarfs labelled in the MWDD.

7 Comparison to other automatic classification methods

To assess the quality of the performance of our Random Forest algorithm, in this section, we compare it with other similar automated classification methods described in the literature. In particular, we analyse the results obtained by Vincent et al. (2023) in their white dwarf spectral classification using neural networks.

Although the methodology is not the same, and neither the classification sample, the training sample, nor the input data are identical, we can establish a certain comparative analysis of the results. For instance, our work is focused on the Gaia 100 pc white dwarf sample for mainly primary spectral types (DA, DB, DC, DQ, and DZ) classification, while the work by Vincent et al. (2023) consists in a more general approach for white dwarf candidate selection and spectroscopic classification. This includes primary spectral types, and also other subtypes, such as DO, hot DQ, DAH, PG 1159 objects and various types of subdwarfs, as well as white dwarfs plus main sequence binaries. Moreover, the input data used in Vincent et al. (2023) comes mainly from both the Gaia parameter database and Sloan Digital Sky Survey spectra, while in our study, we only focused on the Hermite coefficients from Gaia spectra.

Nevertheless, for comparative purposes, we have constructed a confusion matrix with the spectral classification of the objects that appear in both the present work and Vincent et al. (2023). Both catalogues were cross-matched and 1103 objects were found in both tables. Of them, six were classified in Vincent et al. (2023) as cataclysmic variables and two as hot subdwarf stars; these objects were disregarded in the construction of the confusion matrix. As such, the resulting confusion matrix, shown in Fig. 11, contains 1095 objects.

The resulting confusion matrix is nearly diagonal, which indicates good general agreement (86% accuracy considering the Vincent et al. (2023) classification as true labels). The only notable exception is that of the magnetic DAH white dwarfs. Of the 22 white dwarfs classified as DAH in Vincent et al. (2023), 19 are classified by our algorithm as DAs and 3 as DCs. In Sect. 3.3.1, we showed that our algorithm is practically unable to distinguish DAs from DAHs. This is due to the lack of the Gaia spectral resolution needed to resolve the fine magnetic splitting. Therefore, this result for DAHs is entirely understandable.

Similarly, but to a lesser extent, the 104 DZ candidates in Vincent et al. (2023) are broadly in agreement with our classification (61 objects, 59%), while our algorithm classifies the remaining objects as DC, DA, or DQ. Once again, the low resolution of Gaia spectra prevents a more accurate classification.
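The construction of such a cross-match confusion matrix, and of the agreement figure derived from it, can be sketched as follows. The label lists are invented for illustration and do not reproduce the actual cross-match; the function name `confusion_matrix` is a local helper defined here, with rows taken as the reference classification and columns as the predicted one.

```python
import numpy as np

CLASSES = ["DA", "DAH", "DC", "DZ"]

def confusion_matrix(reference, predicted, classes):
    """Rows: reference label; columns: predicted label."""
    index = {name: k for k, name in enumerate(classes)}
    matrix = np.zeros((len(classes), len(classes)), dtype=int)
    for ref, pred in zip(reference, predicted):
        matrix[index[ref], index[pred]] += 1
    return matrix

# Hypothetical cross-matched labels (invented for illustration only).
reference = ["DA", "DA", "DA", "DAH", "DAH", "DC", "DC", "DZ", "DZ", "DZ"]
predicted = ["DA", "DA", "DA", "DA", "DA", "DC", "DC", "DZ", "DC", "DZ"]

matrix = confusion_matrix(reference, predicted, CLASSES)

# Overall agreement: objects on the diagonal divided by all objects.
agreement = np.trace(matrix) / matrix.sum()
print(matrix)
print(f"agreement: {100 * agreement:.0f}%")
```

In this toy case the DAH row is entirely off-diagonal, which is the same qualitative pattern described above for the magnetic white dwarfs.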

We can conclude that, despite the limited quality of the input data (naturally, the better the data quality, the better the performance), our Random Forest algorithm is a feasible tool with very low computing time and model-independent tuning parameters, allowing a reliable and robust classification of white dwarf spectra.

Fig. 9

Gaia HR diagrams showing DA white dwarfs. Left panels: DA white dwarfs classified in this work. Right panels: entire population of DA white dwarfs (i.e. those classified in this work and those labelled in MWDD).

8 Conclusions

By using Artificial Intelligence techniques based on a Random Forest algorithm, we have analysed the information contained in the coefficients of Gaia spectra. Even though these spectra are of low resolution, we have verified their usefulness in classifying the population of white dwarfs into their different spectral types. In particular, we have classified the full 100 pc Gaia white dwarf population into their primary spectral types (i.e. DA, DB, DC, DZ, and DQ), and we have also identified some secondary types (DAH, DBA, and DQpec).

A summary of the main findings is as follows:

The Random Forest algorithm is able to classify DA and DC white dwarfs with excellent recall (>97%), DBs with very good recall (>80%), and DQs and DZs with improvable recall (<50%);
In spite of the low recall, DQ and DZ white dwarfs are classified with an excellent precision (>90%);
While the algorithm performance is certainly improvable for the correct identification of DQ and DZ white dwarfs, its high precision for these spectral types, as well as DB, allows us to use the classifying algorithm as a white dwarf finder for these specific types;
With the possible exception of the DBA and DQpec subtypes, spectral subtypes do not seem to be recognised by the algorithm. Low resolution inherent to Gaia mean spectra seems to be the limiting factor for classification, as non-prominent spectral lines are not expected to be detected in them;
Our algorithm has identified 76 DB (most of them, 60, DBA), 62 DQ (9 of them DQpec), 132 DZ, and 2 DAH candidates in a 100-pc radius around the Sun. For comparison, the MWDD classified sample used in validation tests and as training material contained 117 DQ and 125 DZ white dwarfs.

In conclusion, this initial classification of the entire white dwarf population within 100 pc opens the door to more precise studies of mass distribution and luminosity function, among others, based on the spectral classification of these objects. In parallel, we have initiated a spectroscopic follow-up of a large sample of candidate objects to confirm their classification.

Fig. 10

As Fig. 9 but for the different spectral subtypes of non-DAs.

Table 3

Gaia 100 pc white dwarf sample catalogue classified by our Random Forest algorithm into spectral types.

Fig. 11

Confusion matrix of objects that appear both in our classification and in Vincent et al. (2023).

Acknowledgements

We acknowledge support from MINECO under the PID2020-117252GB-I00 grant and by the AGAUR/Generalitat de Catalunya grant SGR-386/2021. E.M.G.Z. also acknowledges financial support from Banco de Santander, under a Becas Santander Investigación/Ajuts de Formació de Professorat Universitari (2022_FPU-UPC_16) grant.

Appendix A White dwarf Gaia spectra

In this Appendix, we show examples of Gaia white dwarf spectra classified by our Random Forest algorithm. In all of them, the expected locations of the Balmer spectral lines are shown, as well as some characteristic spectral lines for every spectral type. These include He I spectral lines for DB white dwarfs, Swan bands for DQs and a selection of metallic lines for DZs, detailed in subsection A.6. For comparative purposes, we accompany each spectrum classified by our algorithm with one spectrum corresponding to a white dwarf classified with the same type in the MWDD.

All spectra shown have been obtained from Gaia internally calibrated spectra, using the GaiaXPy Python package to transform them into externally calibrated wavelength-flux spectra. The oscillatory behaviour at the blue and red extremes of the spectra, as explained in Section 2, is characteristic of Gaia externally calibrated spectra and is caused by the behaviour of the Hermite polynomials used to represent them near the extremes.
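The idea of representing a spectrum as a combination of Hermite basis functions can be illustrated with a minimal sketch. It assumes orthonormal Hermite functions in the physicists' convention on a dimensionless axis `x`; the helper names `hermite_function` and `reconstruct`, the axis range, and the toy coefficients are hypothetical, and the sketch does not reproduce the actual Gaia basis configuration or the internals of GaiaXPy.

```python
import math
import numpy as np
from numpy.polynomial.hermite import hermval

def hermite_function(n, x):
    """Orthonormal Hermite function of order n (physicists' convention)."""
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0                      # select H_n in the Hermite series
    norm = math.sqrt(2.0**n * math.factorial(n) * math.sqrt(math.pi))
    return hermval(x, coeffs) * np.exp(-x**2 / 2.0) / norm

def reconstruct(coefficients, x):
    """Evaluate sum_n c_n * phi_n(x) for a vector of basis coefficients."""
    return sum(c * hermite_function(n, x) for n, c in enumerate(coefficients))

x = np.linspace(-12.0, 12.0, 4801)
dx = x[1] - x[0]

# Toy coefficient vector (hypothetical values, not taken from any Gaia source).
coefficients = np.array([1.0, 0.3, -0.2, 0.05, 0.1])
spectrum = reconstruct(coefficients, x)

# Orthonormality allows each coefficient to be recovered by projecting the
# reconstructed curve onto the corresponding basis function.
recovered = np.array([np.sum(spectrum * hermite_function(n, x)) * dx
                      for n in range(coefficients.size)])
print(np.round(recovered, 6))
```

Higher-order basis functions oscillate more rapidly and extend further from the centre of the axis, which gives some intuition for why a truncated expansion can show ripples towards the edges of the sampled range.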

Appendix A.1 DA spectra

In this section, we show a DA spectrum of a white dwarf registered in the MWDD and a spectrum classified as such in this work (Figure A.1). Balmer lines (Hα, Hβ, Hγ, Hδ, and Hϵ) are marked in both spectra. These lines are very prominent, which explains the excellent recall for DAs (see the confusion matrix in the top panel of Figure 1).

Fig. A.1

Examples of Gaia spectra. Left panel: of a white dwarf classified as DA by MWDD. Right panel: of a white dwarf classified as DA by our algorithm.

Appendix A.2 DAH spectra

Gaia spectra for those objects classified as DAH are shown in Figure A.2. As in hydrogen-dominated DAs, Balmer lines are prominent; however, Zeeman magnetic splitting is noticeable neither in the objects labelled as such in the MWDD nor in those identified by our Random Forest algorithm. This serves to illustrate the role of the low Gaia spectral resolution in explaining the low recall for DAH white dwarfs by our algorithm.

Fig. A.2

As Fig. A.1 but for a DAH classified in MWDD (left panel) and a DAH classified by our algorithm (right panel).

Appendix A.3 DB spectra

The Gaia spectra of a white dwarf classified as DB by the MWDD and by our algorithm are presented in Figure A.3. Neutral helium lines are present and noticeable in the spectra, and the He I lines at 4 471, 5 015, 5 875, and 6 678 Å have been marked.

Fig. A.3

As Fig. A.1 but for a DB classified in MWDD (left panel) and a DB classified by our algorithm (right panel).

Appendix A.4 DC spectra

In Figure A.4 we show the Gaia spectra of a DC white dwarf from the MWDD and of one classified as such in this work. The characteristic featureless spectrum is clearly visible (except for oscillations caused by the behaviour of the Hermite polynomials). No spectral lines could be found; the wavelengths corresponding to the H I Balmer lines have nonetheless been marked to stress the featurelessness of these spectra.

Fig. A.4

As Fig. A.1 but for a DC classified in MWDD (left panel) and a DC classified by our algorithm (right panel).

Appendix A.5 DQ spectra

The defining characteristic of DQ white dwarfs is the presence of carbon spectral lines. Most atomic carbon lines are outside of the Gaia BP and RP spectral coverage. However, Swan bands, which are vibrational bands characteristic of diatomic carbon (C₂), are in the visible range. These are marked in the spectra shown (Figure A.5). Even though Swan bands comprise a high number of vibrational transitions, for clarity they have been marked at 4 600 and 5 050 Å.

Fig. A.5

As Fig. A.1 but for a DQ classified in MWDD (left panel) and a DQ classified by our algorithm (right panel).

Appendix A.6 DZ spectra

Metallic lines are present in DZ spectra. In this example (Figure A.6) we have marked spectral lines of elements that frequently form planetary matter: calcium (Ca II at 3 933, 3 968, 8 542, and 8 662 Å), magnesium (Mg II at 4 481 Å), oxygen (O I at 7 772, 7 774, and 7 775 Å), iron (Fe I at 4 578, 5 167, 5 227, and 5 269 Å) and silicon (Si II at 6 347 and 6 371 Å).

Fig. A.6

As Fig. A.1 but for a DZ classified in MWDD (left panel) and a DZ classified by our algorithm (right panel).

References

Althaus, L. G., Córsico, A. H., Isern, J., & García-Berro, E. 2010, A&ARv, 18, 471
Bayo, A., Rodrigo, C., Barrado y Navascués, D., et al. 2008, A&A, 492, 277
Benitez, N., Dupke, R., Moles, M., et al. 2014, arXiv e-prints [arXiv:1403.5237]
Blouin, S., & Dufour, P. 2019, MNRAS, 490, 4166
Blouin, S., Dufour, P., Thibeault, C., & Allard, N. F. 2019, ApJ, 878, 63
Blouin, S., Bédard, A., & Tremblay, P.-E. 2023, MNRAS, 523, 3363
Breiman, L. 2001, Mach. Learn., 45, 5
Camisassa, M., Torres, S., Hollands, M., et al. 2023, A&A, 674, A213
Carrasco, J. M., Weiler, M., Jordi, C., et al. 2021, A&A, 652, A86
Cheng, S., Cummings, J. D., & Ménard, B. 2019, ApJ, 886, 100
Cunningham, T., Tremblay, P.-E., Gentile Fusillo, N. P., Hollands, M., & Cukanovaite, E. 2020, MNRAS, 492, 3540
De Angeli, F., Weiler, M., Montegriffo, P., et al. 2023, A&A, 674, A2
Dubath, P., Rimoldini, L., Süveges, M., et al. 2011, MNRAS, 414, 2602
Dufour, P., Blouin, S., Coutu, S., et al. 2017, ASP Conf. Ser., 509, 3
Echeverry, D., Torres, S., Rebassa-Mansergas, A., & Ferrer-Burjachs, A. 2022, A&A, 667, A144
Farihi, J., Barstow, M. A., Redfield, S., Dufour, P., & Hambly, N. C. 2010, MNRAS, 404, 2123
Gaia Collaboration (Smart, R. L., et al.) 2021, A&A, 649, A6
Gaia Collaboration (Vallenari, A., et al.) 2023, A&A, 674, A1
Gentile Fusillo, N. P., Tremblay, P. E., Cukanovaite, E., et al. 2021, MNRAS, 508, 3877
Isern, J., Torres, S., & Rebassa-Mansergas, A. 2022, Front. Astron. Space Sci., 9, 6
Jiménez-Esteban, F. M., Torres, S., Rebassa-Mansergas, A., et al. 2023, MNRAS, 518, 5106
Kilic, M., Bergeron, P., Kosakowski, A., et al. 2020, ApJ, 898, 84
Kong, X., Luo, A.-L., Li, X.-R., et al. 2018, PASP, 130, 084203
Li, X.-R., Lin, Y.-T., & Qiu, K.-B. 2019, Res. Astron. Astrophys., 19, 111
Marín-Franch, A., Chueca, S., Moles, M., et al. 2012, SPIE Conf. Ser., 8450, 84503S
McCleery, J., Tremblay, P.-E., Gentile Fusillo, N. P., et al. 2020, MNRAS, 499, 1890
Montegriffo, P., De Angeli, F., Andrae, R., et al. 2023, A&A, 674, A3
O’Brien, M. W., Tremblay, P. E., Gentile Fusillo, N. P., et al. 2023, MNRAS, 518, 3055
Pedregosa, F., Varoquaux, G., Gramfort, A., et al. 2011, J. Mach. Learn. Res., 12, 2825
Plewa, P. M. 2018, MNRAS, 476, 3974
Sion, E. M., Greenstein, J. L., Landstreet, J. D., et al. 1983, ApJ, 269, 253
Torres, S., García-Berro, E., & Isern, J. 1998, ApJ, 508, L71
Torres, S., Cantero, C., Rebassa-Mansergas, A., et al. 2019, MNRAS, 485, 5573
Torres, S., Cruz, P., Murillo-Ojeda, R., et al. 2023, A&A, 677, A159
Tremblay, P.-E., Fontaine, G., Gentile Fusillo, N. P., et al. 2019, Nature, 565, 202
Tremblay, P. E., Hollands, M. A., Gentile Fusillo, N. P., et al. 2020, MNRAS, 497, 130
Vincent, O., Bergeron, P., & Dufour, P. 2023, MNRAS, 521, 760
Weiler, M., Carrasco, J. M., Fabricius, C., & Jordi, C. 2023, A&A, 671, A52
Zuckerman, B., Koester, D., Melis, C., Hansen, B. M., & Jura, M. 2007, ApJ, 671, 872

¹

https://gaia-dpci.github.io/GaiaXPy-website/

²

https://www.montrealwhitedwarfdatabase.org/

³

The recall of sub-class i is defined as $r_{i} = \frac{a_{ii}}{\sum_{j=1}^{n} a_{ij}}$, where $a_{ij}$ indicates the number of objects of true class i classified as class j.

⁴

Analogously to the recall, the precision of sub-class i is defined as $p_{i} = \frac{a_{ii}}{\sum_{j=1}^{n} a_{ji}}$, that is, the number of correctly classified objects of class i divided by the total number of objects classified as class i.
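The definitions of recall and precision in terms of a confusion matrix can be sketched numerically. The matrix below is invented for illustration (it is not one of the matrices of Fig. 1); following the convention of the footnotes, rows are true classes and columns are predicted classes, so recall divides by row sums and precision by column sums.

```python
import numpy as np

# Toy confusion matrix a[i, j]: objects of true class i classified as class j.
# Values are invented for illustration.
classes = ["DC", "DQ", "DZ"]
a = np.array([[95, 1, 4],
              [12, 8, 0],
              [10, 0, 10]])

recall = np.diag(a) / a.sum(axis=1)      # divide by row sums (true-class totals)
precision = np.diag(a) / a.sum(axis=0)   # divide by column sums (predicted-class totals)

for name, r, p in zip(classes, recall, precision):
    print(f"{name}: recall = {r:.2f}, precision = {p:.2f}")
```

In this toy case the DQ class has low recall but high precision: few true DQs are recovered, yet most objects labelled DQ are genuine, which is the situation that makes a classifier usable as a finder for that type.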
