A random forest spectral classification of the Gaia 500 pc white dwarf population

Enrique Miguel García-Zamora; Santiago Torres; Alberto Rebassa-Mansergas; Aina Ferrer-Burjachs

doi:10.1051/0004-6361/202554414

Volume 699 (July 2025)

A&A, 699 (2025) A3

Open Access

Issue		A&A Volume 699, July 2025


Article Number		A3
Number of page(s)		21
Section		Catalogs and data
DOI		https://doi.org/10.1051/0004-6361/202554414
Published online		27 June 2025

A&A, 699, A3 (2025)

Enrique Miguel García-Zamora¹, Santiago Torres¹,²,★, Alberto Rebassa-Mansergas¹,² and Aina Ferrer-Burjachs¹

¹ Departament de Física, Universitat Politècnica de Catalunya, c/Esteve Terrades 5, 08860 Castelldefels, Spain
² Institut d’Estudis Espacials de Catalunya, Esteve Terradas, 1, Edifici RDIT, Campus PMT-UPC, 08860 Castelldefels, Spain

★ Corresponding author: santiago.torres@upc.edu

Received: 7 March 2025
Accepted: 5 May 2025

Abstract

Context. The third Gaia Data Release (Gaia DR3) has provided the astronomical community with astrometric data on more than 1.8 billion sources, along with low-resolution spectra for 220 million of them. Such a large amount of data is difficult to handle by means of visual inspection. In recent years, artificial intelligence and machine learning algorithms have started to be applied in astronomy for data analysis and automatic classification, with excellent results.

Aims. In this work, we present a spectral analysis of the Gaia white dwarf population up to 500 pc from the Sun based on artificial intelligence algorithms to classify the sample into their main spectral types and subtypes.

Methods. In order to classify the sample, which consists of 78 920 white dwarfs with available Gaia spectra, we have applied a random forest (RF) algorithm to the Gaia spectral coefficients. We used the Montreal White Dwarf Database of previously labeled objects as our training sample. We compared this classified sample with other already published catalogs and with our own higher resolution Gran Telescopio Canarias (GTC) spectra. This allowed us to construct a golden sample of well-classified objects.

Results. The RF spectral classification of the 500 pc white dwarf population achieved an excellent global accuracy of 0.91 and an F1-score of 0.88 for the DA classification (i.e., white dwarfs that show Balmer spectral lines) versus the non-DA classification. In addition, we obtained a very high accuracy of 0.76 and a global F1-score of 0.62 for the non-DA subtype classification. In particular, our classification shows an excellent recall for DAs, as well as DBs and DCs (>90%), along with a very good precision (≥80%) for DQs, DZs, and DOs. Unfortunately, our algorithm does not perform as well with respect to correctly classifying subtypes due to the low resolution of the Gaia spectra.

Conclusions. The use of machine learning techniques, in particular, the RF algorithm, has enabled us to spectrally classify 78 920 white dwarfs – an increase of 543.6% over those previously labeled – with reasonable accuracy. Having an estimate of the spectral type for the vast majority of white dwarfs up to 500 pc provides the possibility of making better estimates of cooling ages, star formation rates, and stellar evolution processes, among other fundamental aspects necessary for studying the white dwarf population.

Key words: catalogs / stars: atmospheres / white dwarfs

© The Authors 2025

Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


1 Introduction

White dwarfs are the most common stellar remnants in the Galaxy. They are the end products of stars with initial masses of M≲8-10 M_ʘ (see, e.g., Althaus et al. 2010). They are composed of a degenerate core with a typical mass of M≈0.6 M_ʘ, generally surrounded by a thin, partially degenerate atmospheric hydrogen layer. Nuclear reactions in their cores have practically ceased and the energy source in their deep interiors is instead primarily derived from gravothermal energy.

Spectroscopic observations of white dwarfs enable the detection of atomic and molecular spectral lines and bands. This, in turn, makes it possible to create a white dwarf spectral classification, taking into consideration the presence of certain spectral lines (Sion et al. 1983). A first distinction can be established between white dwarfs that show Balmer spectral lines (known as DAs or hydrogen-rich white dwarfs) and those that do not, which are generally grouped under the generic non-DA class. Under this term, we may find DB white dwarfs, which show HeI spectral lines; DOs, which exhibit HeII lines; DQs, with prominent carbon features that are either atomic or molecular in their spectra; DZs, which display metallic spectral lines, such as CaII; or DCs, which display no spectral lines and a featureless continuum.

The previously described types are referred to as the primary spectral type of the white dwarf (see Table 2 in Sion et al. 1983). Secondary spectral features, such as spectral lines arising from atomic species other than the dominant one or the presence of a magnetic field, are frequently denoted by adding the relevant characteristic as a spectral subtype. In this way, for instance, a DA white dwarf with weaker metallic features will be labeled as a DAZ, and a magnetic DB white dwarf will be labeled as a DBH.

Not only are spectral classifications of capital importance for accurate stellar parameter derivation (Bergeron et al. 2019; Tremblay et al. 2019a), but they are also essential for understanding the process of spectral evolution in white dwarfs (see Bédard 2024, and references therein). Processes such as convective mixing or convective dilution (e.g., Blouin et al. 2019; Cunningham et al. 2020), the presence of carbon in hydrogen-deficient atmospheres as a possible explanation of the Gaia color-magnitude bifurcation (Camisassa et al. 2023; Blouin et al. 2023), the high ratio of DQ white dwarfs in the so-called Q branch (Tremblay et al. 2019b), or the origin of accreted material in white dwarfs (e.g., Zuckerman et al. 2007; Farihi et al. 2010), require the correct identification of the spectral type in order to be understood.

The spectroscopic follow-up of white dwarfs, however, is a time-consuming process. A complete survey of white dwarfs within 40 pc was performed by Tremblay et al. (2020), McCleery et al. (2020), and O’Brien et al. (2023), and a full 100 pc spectroscopic classification will eventually be achieved thanks to current and forthcoming surveys such as WEAVE (Jin et al. 2024), DESI (Levi et al. 2019), and 4MOST (de Jong et al. 2019). However, it has to be emphasized that by far the fastest way to provide a spectral type for a large number of white dwarfs is to make use of already available spectroscopic samples, such as the one provided by Gaia.

The third Gaia Data Release (Gaia Collaboration 2023) has delivered the astrometric data of 1.8 billion objects to the astronomical community. More importantly with respect to the context of this paper, Gaia has also provided low-resolution (30≲R≲100, see Carrasco et al. 2021) spectra for 220 million sources, including more than 100 000 white dwarfs. This is the largest spectroscopic sample to date; however, the vast quantity of data makes its analysis through human visual inspection alone impossible, thereby making artificial intelligence algorithms and machine learning approaches a necessity. These approaches are hardly recent and have proved their reliability for the analysis of large astronomical databases. For instance, we highlight the pioneering use of self-organizing maps for the identification of halo stars in Torres et al. (1998), along with the first and more recent efforts using the RF algorithm to perform white dwarf identifications in Galactic components (Torres et al. 2019) or spectral identification (Echeverry et al. 2022; Montegriffo et al. 2023). In addition, deep learning techniques have recently been used by Kong et al. (2018) and Vincent et al. (2023) to classify white dwarf types. Statistical classification methods have also been applied to white dwarfs, namely, the use of the Virtual Observatory Spectral energy distribution Analyzer tool (Bayo et al. 2008) in Jiménez-Esteban et al. (2023) and Torres et al. (2023), whereby the spectral energy distributions (SEDs) of the 100 pc and 500 pc white dwarf samples, respectively, were automatically fitted to different model atmospheres. In this way, white dwarfs were classified into DA and non-DA with an accuracy over 90%.

In this work, we applied an RF algorithm to classify the 500 pc Gaia white dwarf sample into their spectral types and subtypes, and we carried out an analysis dedicated to detecting possible binary objects such as white dwarf-main sequence binaries or unresolved double-degenerates. This paper offers an update of a recent study we performed in García-Zamora et al. (2023), where we used the same technique to classify the 100 pc white dwarf sample. This study also complements the work by Vincent et al. (2024), who also classified white dwarfs in a 500 pc radius using gradient boosting classifiers. The 500 pc distance limit imposed is key to our research, since it allows us to obtain a nearly complete classification of all known white dwarfs with available Gaia spectra. This, in turn, allows us to derive more accurate percentages of the different spectral types.

The 500 pc white dwarf sample used in this work was selected following the same conditions established in Torres et al. (2023), except for the −0.5 ≤ G_BP – G_RP ≤ 0.86 color restriction. Unlike Torres et al. (2023), we did not use atmospheric models for our spectral classification. Additionally, we selected only objects lying below the 0.45 M_ʘ white dwarf cooling track, further reducing potential contamination by non-white dwarfs to an almost negligible level. Of the 93 439 objects with Gaia spectra, almost all (93 323) also appear in the recent classification by Vincent et al. (2024).

In Sect. 2, we describe the applied methodology in detail. In Sect. 3, the algorithm validation tests performed on the classified white dwarf sample, both for the spectral and binarity classification, are shown. The classification of the unclassified sample is presented in Sect. 4. The results of this work are analyzed in Sect. 5 and compared in Sect. 6 with those obtained by various recent automated classifications. Finally, we outline our conclusions in Sect. 7.

2 Method: Random Forest classification of Gaia spectral coefficients

Random Forest (RF, Breiman 2001) is a widely used machine learning algorithm. A set of labeled data comprises the training set, which is used to create an ensemble of decision trees that make up the RF. This process is known as training the algorithm and once it has taken place, new data can be classified into the training dataset categories.

As we are using the same methodology described in García-Zamora et al. (2023), we only outline the most important details in this section. For a more in-depth description, we refer to García-Zamora et al. (2023) and Torres et al. (2019).

The Gaia spectra cover the wavelength range 3300-10 500 Å (the Blue Photometer, BP, covers the 3300-6800 Å wavelength range, while the Red Photometer, RP, covers the 6400-10 500 Å range; see Carrasco et al. 2021) at a resolution of λ/Δλ ≈ 100. Additionally, they are not provided in the classical flux versus wavelength representation but, rather, as the coefficients of a linear combination of basis functions (more concretely, Hermite functions that act as the basis for spectral representation; see Carrasco et al. 2021) internally calibrated in a pseudo-pixel scale. Each spectrum, both BP and RP, is provided through 55 coefficients, resulting in a total of 110 coefficients for the mean XP spectrum.

Following the same approach as García-Zamora et al. (2023) and Vincent et al. (2024), these 110 coefficients were used as input data for our algorithm. The rationale behind this decision is that the use of the Gaia spectral coefficients provides better performance than other inputs (Montegriffo et al. 2023). This treatment is appropriate, as all the relevant spectral information (e.g., the continuum shape or the spectral lines that define the different spectral types) is contained within the 110 coefficients (see, e.g., Weiler et al. 2023 for a mathematical description of the spectral coefficients applied to hydrogen lines). No external calibration to wavelength-flux form was applied, as one consequence of the external calibration is the introduction of oscillatory behaviour in the spectra, known as “wiggles,” which are introduced by the mathematical process used for external calibration (De Angeli et al. 2023). While these wiggles affect the whole wavelength range, they are more prominent in its extremes.

Furthermore, our analysis relies on a sample of externally classified white dwarfs (see Sect. 3), the temperature range of which spans from the hotter objects to the cooler end. This poses an advantage over the classification approaches undertaken in Jiménez-Esteban et al. (2023) and Torres et al. (2023), since they rely on fitting SEDs to white dwarf theoretical atmosphere models. These models are known to harbor substantial uncertainties at temperatures below T ≈ 5500 K; therefore, the obtained classification might be inaccurate for the colder objects. On the contrary, our approach forgoes this model fitting, making it possible to obtain more accurate classifications for colder objects. To create the RFs, obtain the confusion matrices and classification metrics, and produce the classifications of all unclassified white dwarfs, the Python package scikit-learn (Pedregosa et al. 2011) was used.
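The way the 110 coefficients enter an RF classifier can be illustrated with a minimal scikit-learn sketch. It is not the pipeline used in this work: the coefficient arrays are synthetic random numbers, the labels are invented so that there is something to learn, and the hyperparameters (`n_estimators`, `random_state`) are placeholder assumptions, since the text does not quote them.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-ins for the Gaia XP coefficients (NOT real data):
# 55 BP and 55 RP coefficients per object.
n_objects = 300
bp_coeffs = rng.normal(size=(n_objects, 55))
rp_coeffs = rng.normal(size=(n_objects, 55))

# Concatenate BP and RP into the 110-coefficient feature vector.
X = np.hstack([bp_coeffs, rp_coeffs])

# Hypothetical labels, tied to one coefficient purely for illustration.
y = np.where(X[:, 0] > 0.0, "DA", "non-DA")

# Placeholder hyperparameters (assumed, not taken from the paper).
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X[:200], y[:200])          # "training set"

predicted = clf.predict(X[200:])   # labels for the remaining objects
```

With real data, the two synthetic arrays would be replaced by the BP and RP coefficient vectors of each source, and `y` by the externally assigned spectral labels.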

3 Algorithm training and validation

Before the sample classification can be performed, the RF algorithm must first be trained and then validated using already classified white dwarfs. For this purpose, we resorted to the Montreal White Dwarf Database¹ (MWDD), which contains astrometric and photometric data, as well as spectral and binarity classification, for tens of thousands of white dwarfs sampling the entire effective temperature range (Dufour et al. 2017).

The white dwarfs in the MWDD with an assigned spectral type and a binary classification in a 500 pc radius around the Sun were collected. From these, those possessing a Gaia spectrum formed the training set for the spectral classification and subclassification validation tests.

For the validation tests, we applied the cross-validation method called Stratified k-Fold. It consists of dividing the whole training set into k subsets, all of which keep a very similar category ratio (i.e., the proportion of objects belonging to each class in the subset is as close as possible to the category ratio of the whole set). In this work, following the method outlined in García-Zamora et al. (2023), the value k = 10 was chosen and as many iterations as subsets were completed. In each iteration, k – 1 folds were used for training the algorithm, while the remaining fold was used for testing. The chosen test fold was different for each iteration. Thus, once the cross-validation was completed, the whole sample had been used for both training and testing the algorithm.
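The stratified 10-fold procedure described above can be sketched with scikit-learn as follows. This is an illustration on synthetic data only: the feature matrix, the labels, the forest size, and the use of `shuffle=True` with a fixed seed are assumptions that the text does not specify.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_predict

rng = np.random.default_rng(1)

# Synthetic, imbalanced two-class sample with 110 features (NOT real data).
n_objects = 400
X = rng.normal(size=(n_objects, 110))
y = np.where(X[:, 0] + 0.3 * rng.normal(size=n_objects) > 0.7, "non-DA", "DA")

# k = 10 folds, each preserving the class proportions of the full sample.
# Shuffling and the seed are assumptions made for reproducibility.
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Every object lands in exactly one test fold.
test_indices = np.sort(np.concatenate([test for _, test in skf.split(X, y)]))

# Out-of-fold predictions: each object is predicted by a forest
# trained on the other k - 1 folds.
clf = RandomForestClassifier(n_estimators=50, random_state=0)
y_pred = cross_val_predict(clf, X, y, cv=skf)
```

The array `y_pred` can then be compared with `y` to build a confusion matrix covering the whole sample.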

Two different sets of validation tests were carried out. The first set validated the spectral type classification algorithm; the second set tested the binarity classification algorithm. In all cases, the 110 Gaia spectral coefficients, as well as the spectral and binarity information contained in the MWDD, were used as input data for our algorithm.

3.1 Spectral classification validation tests

In the MWDD, 51 319 objects have been classified into their spectral types, with 34 250 of them within a 500 pc radius around the Sun. From this subset, we derived a training set comprised of 14 519 white dwarfs with Gaia spectra (10 916 DAs, 1189 DBs, 1596 DCs, 26 DOs, 372 DQs, and 420 DZs). As input data for the validation tests, their spectral labels and their 110 spectral coefficients were used to train the algorithm.

Following the pipeline described for validation tests in García-Zamora et al. (2023), three different spectral type classification tests were performed. The first test classified the whole sample as DA or non-DA; the second test classified non-DAs into their spectral types (DB, DC, DO, DQ, and DZ). Finally, the third validation test focused on spectral subtype classification for the previously listed spectral types.

3.1.1 First validation test: DA vs non-DA

The 14 519 objects in our training sample were classified into DA (10 916 objects, 75.18%) and non-DAs (3603 objects, 24.82%). The results are shown in the form of a confusion matrix in the upper panel of Fig. 1.

Confusion matrices show a visual representation of the classification model results. The rows contain the true values (i.e., the spectral types assigned to every object in the MWDD) and the columns contain the predicted values (i.e., the spectral classification assigned by our algorithm). For every matrix element, the number of objects as well as its ratio to the number of objects contained in each category in the MWDD are given. A perfect classification would produce a diagonal matrix, with non-zero elements exclusively along the confusion matrix main diagonal. The classification metrics are also shown in Table 1.
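A toy example of this layout, using scikit-learn's `confusion_matrix`, is sketched below. The ten labels are invented for illustration and do not correspond to any object in the MWDD.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

labels = ["DA", "non-DA"]

# Invented true and predicted labels for ten objects (illustration only).
y_true = np.array(["DA"] * 6 + ["non-DA"] * 4)
y_pred = np.array(["DA"] * 5 + ["non-DA"] * 1 + ["non-DA"] * 3 + ["DA"] * 1)

# Rows are true classes, columns are predicted classes.
counts = confusion_matrix(y_true, y_pred, labels=labels)

# normalize="true" divides each row by the number of objects in that
# true class, so the diagonal holds the per-class recall.
ratios = confusion_matrix(y_true, y_pred, labels=labels, normalize="true")
```

Here `counts` is `[[5, 1], [1, 3]]`, and the diagonal of `ratios` gives recalls of 5/6 for DA and 3/4 for non-DA.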

The confusion matrix (Fig. 1, top panel) shows an excellent recall for DAs (97%), as well as a very good recall for non-DAs (75%). We also took into account the global metrics. For this particular classification, the algorithm achieved an accuracy² of 0.91 and a mean F1-score³ of 0.88. To take into account the class imbalance in the test sample, two other metrics were additionally considered: balanced accuracy⁴ (0.86) and G-mean⁵ (0.85). In all cases, the results indicate that the RF algorithm is able to classify DA and non-DA white dwarfs satisfactorily.
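As a rough consistency check, the global metrics can be recovered from the two quoted recalls and the class sizes. The sketch below assumes the usual definitions (balanced accuracy as the arithmetic mean of the per-class recalls, G-mean as their geometric mean); these definitions are an assumption here, and because the recalls are rounded to two digits, the accuracy is only reproduced to within that rounding.

```python
import math

# Class sizes in the training sample and per-class recalls quoted in the text.
n_true = {"DA": 10916, "non-DA": 3603}
recall = {"DA": 0.97, "non-DA": 0.75}

# Accuracy: correctly classified objects over all objects.
n_correct = sum(recall[c] * n_true[c] for c in n_true)
accuracy = n_correct / sum(n_true.values())

# Balanced accuracy: arithmetic mean of the per-class recalls (assumed definition).
balanced_accuracy = sum(recall.values()) / len(recall)

# G-mean: geometric mean of the per-class recalls (assumed definition).
g_mean = math.prod(recall.values()) ** (1 / len(recall))

print(f"accuracy ~ {accuracy:.3f}")
print(f"balanced accuracy = {balanced_accuracy:.2f}")
print(f"G-mean = {g_mean:.2f}")
```

The balanced accuracy and G-mean come out at 0.86 and 0.85, matching the quoted values, while the accuracy lands within rounding of the quoted 0.91.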

Fig. 1

Confusion matrices for our validation tests: DA vs non-DA (top panel) and non-DA types (bottom panel). As true label (rows) we adopted the MWDD classification, while the predicted label (columns) is the one resulting from our RF algorithm.

Table 1

Validation tests classification metrics.

3.1.2 Non-DAs validation test

For this test, only the 3603 objects classified as non-DA in the MWDD (1189 DBs, 1596 DCs, 26 DOs, 372 DQs, and 420 DZs) were considered. As can be seen in the corresponding confusion matrix (see Fig. 1, bottom panel), our algorithm has an excellent recall for DBs and DCs (≥90%), as well as a relatively good recall for DOs (~60%). While the recalls for DQs and DZs leave room for improvement (14% and 34%, respectively), it must be stressed that these categories, as well as DOs, also have a very good precision (≥80%). This implies that false positives are almost nonexistent for these spectral types, so our algorithm is highly useful for identifying members of these classes in the population, despite the fact that not all of them are found.

Globally, our algorithm achieves a mean F1-score of 0.62, an accuracy of 0.76, a balanced accuracy of 0.57, and a G-mean score of 0.47. These values show the impact of the low resolution Gaia spectra in our non-DA spectral classification.

3.1.3 Spectral subtype validation test

Having demonstrated the capability of our RF algorithm to correctly classify white dwarfs into their primary spectral types, we tested its ability to identify subtypes inside a given class. Only the DA, DB, DO, DQ, and DZ subtypes were taken into account, since DCs, which show featureless spectra, are not divided into subtypes (by design).

Separate validation tests were conducted, one for each primary spectral type. As the primary type validation sample contains (in some instances) spectral information that is useful for spectral type classification, but not useful for subtype classification (e.g., spectral types being registered as DA/DAZ? or DB/DBAZ?), these elements were taken out of the validation sample. This process left us with a subtype validation sample comprised of 14 319 objects. The excluded objects were not discarded but were moved to the subtype classification sample, keeping their MWDD primary spectral type.

The DA subtype training sample comprises the subtypes DA, DAB, DAH, DAO, and DAZ (10 417, 30, 259, 6, and 107 elements, respectively). None of the DAB, DAH, DAO, or DAZ objects were correctly identified, while five DAs were misidentified as magnetic DAHs. Extreme numerical imbalance is believed to severely impact our classification.

For DB subtypes, the pure DB, DBA, DBH and DBZ subtypes (791, 291, 14, and 63 elements, respectively) were considered. Results show that the DBA subtype, despite reaching over 40% precision, achieves only a 5% recall. The DBH and DBZ subtypes, on the other hand, show no cases of correct identification; moreover, one DBH and two DBZs are misclassified as DBAs.

Among the DO subtypes, only DO, DOA, and DOZ (with 15, 3, and 8 elements, respectively) were considered. Although some DOZs are correctly classified (25% recall, albeit with a low precision of only 33%), we opted for a more cautious interpretation of these results. Due to the small size of the whole DO set, which includes only 26 white dwarfs, the model training is far from optimal.

DQs are divided into DQ, DQA, DQH, DQZ and DQpec (346, 11, 2, 7, and 6 elements, respectively). Only DQs have been correctly classified and no other subtypes were found. Numerical imbalance is thought to be at the root of this result.

Finally, DZ subtypes are considered: DZ, DZA, and DZH (360, 50, and 10 elements, respectively). While two DZAs were correctly identified as such (4% recall), three DZs were mislabeled as DZA (40% precision). Therefore, we conclude that our algorithm cannot correctly classify DZ subtypes.
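The DZA figures quoted above follow directly from the standard definitions of precision and recall. The short sketch below reproduces them, under the assumption that the three mislabeled DZs are the only false positives for the DZA class.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) from raw counts."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall


# DZA numbers from the text: 50 DZAs in the sample, 2 recovered,
# and 3 pure DZs wrongly labeled as DZA (assumed to be the only false positives).
n_dza = 50
tp = 2
fp = 3
fn = n_dza - tp  # DZAs that were not recovered

precision, recall = precision_recall(tp, fp, fn)
print(f"precision = {precision:.0%}, recall = {recall:.0%}")
```

The result is a precision of 40% and a recall of 4%, matching the quoted values.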

Based on the above analysis, it seems that a reliable subspectral type classification based on Gaia white dwarf spectra is likely unfeasible. This result is not entirely surprising given the low spectral resolution of Gaia. Moreover, due to the fact that the numerical imbalance is now more extreme than in the 100 pc sample (see Sect. 6.1), which favors pure spectral types, the classification is biased towards them.

3.2 Binarity validation test

The last test aimed at discerning the possibility of using the RF algorithm to find white dwarf candidates that are members of binaries, either unresolved white dwarf-main sequence (WDMS) binaries or unresolved double-degenerate (DD) objects. While we do not expect to find many such objects (especially WDMS candidates) since our considered objects lie in the white dwarf locus of the color-magnitude diagram, we nevertheless undertook this classification to fully assess the capabilities of our RF algorithm.

For this test, we added to our training sample the white dwarfs in a 500 pc radius registered as WDMS or DD in the MWDD, as well as the WDMS from the Sloan Digital Sky Survey catalog of Rebassa-Mansergas et al. (2013). The final training set comprises 14 651 objects (14 309 WDs, 20 DDs, and 322 WDMS binaries). As already mentioned, all objects considered for the training set are inside the color-magnitude diagram white dwarf locus. Objects in the intermediate zone between white dwarfs and main sequence stars were not considered here and will be analyzed elsewhere in the future.

Out of the 322 WDMS considered, only three were correctly identified (a recall slightly lower than 1%). Moreover, three white dwarfs were misidentified as WDMS objects (50% precision). All DDs were classified as single WDs (0% recall), while no white dwarf or WDMS binary was wrongly classified as a DD. The parameters are summarized in Table 2.

While disheartening, these results are not completely unexpected. Due to the location of these objects in the color-magnitude diagram, it is expected that the white dwarf component is dominant in the spectrum if the system is a WDMS. On visual inspection, most of them do indeed show very little contribution from the main sequence companion, confirming our hypothesis.

Regarding the identification of DDs, the task itself is extremely challenging even when using high-resolution spectra. As a consequence, classification of double white dwarfs is generally based on the detection of radial velocity shifts (Breedt et al. 2017; Napiwotzki et al. 2020), rather than on direct analysis of the possible combined spectra.

We conclude this section by confirming the robustness of our RF algorithm in classifying DA and non-DA white dwarfs, its great utility in precisely assigning spectral types to non-DA white dwarfs (despite the fact that not all individual objects are expected to be identified), and its lack of usefulness in differentiating between spectral subclasses and in identifying white dwarf binaries in the white dwarf locus.

Table 2

Binarity validation test classification metrics.

4 Classification of the Gaia 500 pc white dwarf population

Once the algorithm had been validated, it was applied to the unclassified 500 pc white dwarf sample, which comprises 78 920 objects. We recall that our analysis is restricted to the white dwarf color-magnitude locus, namely, the region below the 0.45 M_ʘ white dwarf cooling track (see Sect. 2 of Torres et al. 2023). As outlined in Sect. 3, three different classifications were performed. The first spectroscopically classifies the whole sample into their primary spectral types. These are further classified into their subtypes in the second classification. Finally, the binarity classification algorithm was applied to the sample. Even if the results of both the spectral subtype and binarity validation tests do not give us much reason for optimism, we nevertheless wanted to test the algorithm on the unclassified sample.

4.1 Primary spectral types

Following the procedure described in Sect. 3.1.1, the first step classifies the white dwarfs into the DA and non-DA categories. The sample used for validation tests was adopted in order to train the classification algorithm. Of the 78 920 objects, 64 976 were classified as DA, while the remaining 13 944 were categorized as non-DA. The objects classified as DAs are illustrated in the Gaia color-magnitude diagram in the top panel of Fig. 2. They extend from the hottest region, G_BP – G_RP ≈ -0.5, to G_BP – G_RP ≈ 1.2-1.3. This corresponds to a white dwarf effective temperature of ≈ 5000 K, where almost all the hydrogen atoms are in the ground state and therefore the spectrum evolves into that of a featureless DC.

The remaining 13 944 non-DA objects were further classified into the DB, DC, DO, DQ, and DZ primary spectral types, resulting in 4957 DBs, 8496 DCs, 21 DOs, 105 DQs, and 365 DZs. These sources are shown in the Gaia color-magnitude diagram in the bottom panel of Fig. 2. The white dwarfs of different spectral types appear in their expected locations across the diagram: DOs appear only at the brightest, hottest part (G_BP – G_RP ≈ -0.5); DBs appear in the hotter region (G_BP – G_RP ≈ -0.5-0), while DZs and DQs appear spread through a wider range of temperatures.
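The two-step scheme (DA versus non-DA first, then non-DA types) can be sketched as two chained forests. The example below is only an illustration under stated assumptions: the data are synthetic, the helper `make_sample` is hypothetical, and the hyperparameters are placeholders rather than the values used in this work.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
types = np.array(["DA", "DB", "DC", "DO", "DQ", "DZ"])


def make_sample(n):
    """Hypothetical helper: synthetic 110-feature sample, one shifted feature per type."""
    labels = rng.choice(types, size=n)
    X = rng.normal(size=(n, 110))
    for k, t in enumerate(types):
        X[labels == t, k] += 3.0
    return X, labels


X_train, y_train = make_sample(600)   # stands in for the labeled sample
X_new, _ = make_sample(50)            # stands in for the unclassified sample

# Step 1: DA versus non-DA, trained on the whole labeled sample.
is_da = np.where(y_train == "DA", "DA", "non-DA")
stage1 = RandomForestClassifier(n_estimators=100, random_state=0)
stage1.fit(X_train, is_da)

# Step 2: non-DA primary types, trained on the labeled non-DAs only.
non_da = y_train != "DA"
stage2 = RandomForestClassifier(n_estimators=100, random_state=0)
stage2.fit(X_train[non_da], y_train[non_da])

# Apply both steps to the new objects.
final = stage1.predict(X_new).astype(object)
mask = final == "non-DA"
if mask.any():
    final[mask] = stage2.predict(X_new[mask])
```

After the second step, every object carries one of the six primary types, and no object retains the intermediate "non-DA" label.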

Fig. 2

Gaia color-magnitude diagram for white dwarfs classified as DA (top panel) and for those classified into various non-DA spectral types (bottom panel) by our algorithm.

4.2 Spectral subtypes

We attempted to carry out a further subclassification, despite the challenges indicated by the validation test of our classification set (described in Sect. 3.1.3). Among the 65 073 DA white dwarfs, only six were identified as DAH, with no DAB, DAO, or DAZ candidates. For the 4985 DB white dwarfs, only 67 were classified as DBA, while no DBH, DBO, DBQ, or DBZ were found. In the DO category, 12 of 21 objects remained DO, while nine were classified as DOZ. The small training sample limits its reliability, however. No subtypes were identified within the 105 DQ white dwarfs. Lastly, among 365 DZ white dwarfs, no DZA or DZH candidates were found, which is consistent with the validation results.

Table 3

RF algorithm classification of Gaia 500 pc white dwarfs.

4.3 The Gaia 500 pc sample classification summary

We spectroscopically classified 78 920 objects into their primary spectral types. Of these, 64 976 objects were classified as DA (six of which were subclassified as magnetic DAH), 4957 as DB (67 as DBA), 8496 as DC, 21 as DO (9 as DOZ), 105 as DQ, and 365 as DZ. As a consequence, the number of DA, DB, DC, DO, DQ, and DZ white dwarfs in a 500 pc radius around the Sun has increased by 595%, 417%, 532%, 80.8%, 28.2%, and 86.9%, respectively, in relation to the training sample. While the primary spectral type classification can be considered reliable, caution should be taken with the secondary spectral type assigned, mainly due to the low resolution of the Gaia spectra. Additionally, 78 791 objects were classified as single or binary. Of them, only two have been classified as WDMS candidates. All the data have been compiled into a single catalog, a representative excerpt of which is shown in Table 3. The complete table can be accessed at the CDS.
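The quoted increases are the ratios of newly classified objects to training-sample objects for each type. A short script using only the counts given in Sects. 3.1 and 4.1 reproduces them:

```python
# Training-sample counts (Sect. 3.1) and newly classified counts (Sect. 4.1).
training = {"DA": 10916, "DB": 1189, "DC": 1596, "DO": 26, "DQ": 372, "DZ": 420}
classified = {"DA": 64976, "DB": 4957, "DC": 8496, "DO": 21, "DQ": 105, "DZ": 365}

# Increase relative to the training sample, in percent.
increase = {t: 100 * classified[t] / training[t] for t in training}

# Same ratio for the whole sample.
total_increase = 100 * sum(classified.values()) / sum(training.values())

for t, value in increase.items():
    print(f"{t}: {value:.1f}%")
print(f"all types: {total_increase:.1f}%")
```

The overall figure, 78 920 / 14 519, corresponds to the 543.6% quoted in the abstract.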

5 Analysis of the classified 500 pc white dwarf population

In this section, we describe our detailed inspection and analysis of the white dwarf population within 500 pc in terms of their spectral type classification.

5.1 Spectral content of the white dwarf population

To begin with, we provide and compare the percentages of white dwarfs with different spectral types within 100 pc (which was the sample of study in our previous work; García-Zamora et al. 2023) and 500 pc. We do this for the training sample (i.e., the MWDD sample used), the classified sample, and the complete sample. The results are shown in Table 4, and can be visualized in Fig. 3.

The most notable difference between the proportions in the two populations is that while DAs comprise barely more than half of all objects within a 100 pc radius, and only slightly more than 1% are DBs, these percentages rise to over 80% for DAs and 6% for DBs in the 500 pc sample. This increase is a clear observational bias that can be understood as follows. As we move toward greater distances, the colder and fainter objects are more likely to fall below the observability threshold. This is clearly revealed by the dramatic drop of DC (intrinsically cool) white dwarfs from around 40% at 100 pc to nearly 10% at 500 pc. On the other hand, the observable white dwarfs will increasingly be the most luminous ones. These will be the hottest objects, which belong preferentially to the DA, DB, and DO spectral types. It is also worth noting that the percentages of DQ and DZ white dwarfs also decrease considerably from 100 to 500 pc, indicating that these objects are generally cool and, hence, more affected by observational biases when observed at greater distances.

Table 4

Spectral types percentages.

Table 5

Spectral type percentages according to the Gaia color.

Fig. 3

Top panel: fraction of white dwarfs with different spectral type with respect to the total number of classified objects as a function of distance. Bottom panel: same, but only for DOs, DQs and DZs.

5.2 Spectral types as a function of the G_BP–G_RP color

In this section, we analyze the percentages of white dwarfs with different spectral types of the classified 500 pc sample for five intervals of G_BP – G_RP color: G_BP – G_RP ≤ 0; 0 ≤ G_BP – G_RP ≤ 0.5; 0.5 ≤ G_BP – G_RP ≤ 1; 1 ≤ G_BP – G_RP ≤ 1.5; and 1.5 ≤ G_BP – G_RP. Given that the considered color is a good indicator of effective temperature, this exercise is a proxy of the spectral evolution of white dwarfs as they cool down. The proportion of objects for each interval is given in Table 5 and illustrated in Fig. 4.
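A minimal sketch of how such per-interval percentages can be computed is given below. The colors and spectral types are random synthetic values rather than the catalog data, and the treatment of objects lying exactly on an interval edge (assigned here to the redder bin) is an assumption, since the quoted intervals share their end points.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic colors and spectral types (NOT the catalog values).
n_objects = 1000
color = rng.uniform(-0.6, 1.8, size=n_objects)  # stands in for G_BP - G_RP
spectral_type = rng.choice(["DA", "DB", "DC", "DO", "DQ", "DZ"], size=n_objects)

# Interior edges of the five color intervals used in the text.
edges = np.array([0.0, 0.5, 1.0, 1.5])

# Bin 0: color < 0; bin 1: 0 <= color < 0.5; ...; bin 4: color >= 1.5.
bin_index = np.digitize(color, edges)

# Percentage of each spectral type within each color bin.
percentages = {}
for b in range(len(edges) + 1):
    in_bin = bin_index == b
    types_in_bin, counts = np.unique(spectral_type[in_bin], return_counts=True)
    percentages[b] = dict(zip(types_in_bin, 100 * counts / counts.sum()))
```

Each entry of `percentages` plays the role of one column of a table such as Table 5, with the values in a given bin summing to 100%.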

As expected, the fraction of DAs is nearly constant and over 80% until G_BP – G_RP ~ 1. At this point, it starts to decrease to just over 11% in the 1 ≤ G_BP – G_RP ≤ 1.5 interval, becoming null for 1.5 ≤ G_BP – G_RP. This mirrors the spectral evolution of DA white dwarfs; namely, at G_BP – G_RP ~ 1, the temperatures are too cool for the white dwarfs to display Balmer lines in their spectra and they become DCs. Indeed, the dramatic drop of DAs at this limit is mirrored by the large increase of DCs, the fraction of which rises from just barely over 10% to more than 80% for G_BP – G_RP > 1.

The same tendency observed for DAs is also true for DBs, which constitute over 12% of the objects when G_BP – G_RP ≤ 0, but just above 4% when 0 ≤ G_BP – G_RP ≤ 0.5. An increase of similar size is observed among DCs, which offers a clue that this interval contains the DB-DC transition region. Finally, the few DOs appear only at G_BP – G_RP ≤ 0.

The temperature distribution of DZs is also coherent, being over 0.3% in the 0 ≤ G_BP – G_RP ≤ 1.5 interval, negligible outside it, and reaching a peak abundance over 1% in the 0.5 ≤ G_BP – G_RP ≤ 1 interval. This is consistent with the spectral behavior of metals: at higher temperatures, they enter higher ionization states, with spectral lines located in the ultraviolet region.

Lastly, DQs only show a noticeable fraction in the 0.5 ≤G_BP – G_RP ≤ 1.5 interval (0.45% in the hotter half and 0.56% in the cooler half, respectively). It is also worth noting the continuous increase of DQ white dwarfs for redder G_BP – G_RP colors. This behavior is expected, since these white dwarfs are generally cool objects; furthermore, convection, which dredges carbon up into the white dwarf atmosphere, is more efficient at lower temperatures.

Fig. 4

Top panel: fraction of white dwarfs with different spectral type with respect to the total number of classified objects as a function of G_BP – G_RP color. Bottom panel: same, but only for DOs, DQs and DZs.

5.3 The spectral content of the A, B and Q-branches

One of the unexpected features revealed by the Gaia space mission was the presence of remarkably distinct branches in the color-magnitude diagram of white dwarfs (see Fig. 13, Gaia Collaboration 2018). While the so-called A branch follows the classical white dwarf cooling sequence for hydrogen-pure atmospheres, additional branches, such as the B and Q branches, deviate from it, indicating the influence of other physical processes, such as atmospheric composition changes, crystallization, and distillation effects, among others (see, e.g., Camisassa et al. 2021; Camisassa et al. 2023; Tremblay et al. 2019b). We should recall that the branches are not composed of a single spectral type (i.e., the A-branch also includes non-DAs, and vice versa). Hence, a detailed and thorough characterization of the spectral types of these branches is of crucial importance for constraining theoretical models. The results are summarized in Table 6.

5.3.1 A branch

We define the A branch as limited to the color region 0.1 ≤ G_BP – G_RP ≤ 0.5 and bounded by the parallel lines M_G = 3.5 · (G_BP – G_RP) + 11.9 and M_G = 3.5 · (G_BP – G_RP) + 11.5 (see red box in Fig. 5). In the classification using our RF algorithm, we found that for the 500 pc sample, 8904 out of the 78 920 objects are classified as belonging to the A branch. Of them, 8088 (90.84%) are DAs, 149 (1.67%) are DBs, 626 (7.03%) are DCs, 5 (0.06%) are DQs, and 36 (0.40%) are DZs. Finally, as expected, no DOs appear in the A branch. When these objects classified by our algorithm are added to the already labeled sample from MWDD, we find that 10 599 objects belong to this branch: 9569 DAs (90.28%), 162 DBs (1.53%), 747 DCs (7.05%), 45 DQs (0.42%), and 76 DZs (0.72%).

The results reported here are in perfect agreement with those presented in the literature, which indicate that the A branch is predominantly composed of hydrogen-pure atmosphere white dwarfs (e.g. Jiménez-Esteban et al. 2023). Moreover, the DA versus non-DA proportion, both in the 100 pc and 500 pc samples, remains nearly identical, with the non-DA types constituting only a very small fraction of the total (less than 10%). Among these few non-DAs, the majority are DBs and DCs, consistent with the expected spectral evolution of helium-atmosphere white dwarfs (see Fig. 13 in Bédard 2024).

5.3.2 B branch

Similarly to the A branch, the B branch is defined by the color region 0.1 ≤ G_BP – G_RP ≤ 0.5, but now limited by the lines M_G = 3.5 · (G_BP – G_RP) + 11.9 and M_G = 3.5 · (G_BP – G_RP) + 12.2 (see blue box in Fig. 5). We found 4400 objects in the B branch, which were classified by our algorithm as follows: 2877 DAs (65.39%), 224 DBs (5.09%), 1171 DCs (26.61%), 3 DQs (0.07%), and 125 DZs (2.84%). When all classified objects in 500 pc are considered (those of our algorithm and those previously labeled), out of the 5319 objects in the B branch, 3190 (59.97%) are DAs, 229 (4.31%) are DBs, 1470 (27.64%) are DCs, 160 (3.01%) are DQs, and 270 (5.08%) are DZs.

It is important to note that in the 100 pc classification, the ratio of classified DAs to non-DAs in the B branch is close to the 35%-65% ratio, respectively, reported by Jiménez-Esteban et al. (2023). For the 500 pc sample, however, the classified DA population dominates, making up nearly 60% of the B branch. It is possible that distance-related observational bias plays a role in this result.
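The A and B branch regions defined in Sects. 5.3.1 and 5.3.2 share the same color range and slope, so membership reduces to a test on the vertical offset M_G − 3.5 · (G_BP – G_RP). The Python sketch below illustrates this; the function name is hypothetical, and assigning the shared boundary at an offset of 11.9 to the B branch is an assumption made here only to keep the two regions disjoint.

```python
def cmd_branch(bp_rp, abs_g):
    """Return 'A', 'B', or None for a point in the color-magnitude diagram.

    bp_rp: G_BP - G_RP color; abs_g: absolute magnitude M_G.
    """
    # Both branches are restricted to the same color range.
    if not (0.1 <= bp_rp <= 0.5):
        return None
    # Offset from a line of slope 3.5 through the origin.
    offset = abs_g - 3.5 * bp_rp
    if 11.5 <= offset < 11.9:
        return "A"
    # Assumption: the shared boundary (offset = 11.9) is assigned to B.
    if 11.9 <= offset <= 12.2:
        return "B"
    return None
```

For example, an object with G_BP – G_RP = 0.3 and M_G = 12.75 has an offset of about 11.7 and falls in the A branch, while the same color with M_G = 13.1 gives an offset of about 12.05 and falls in the B branch.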

Fig. 5

Top panel: adopted definition for the A (red), B (blue) and Q (black) branches in this analysis. Bottom panel: same, but zoomed in the A and B branches. For clarity, the regions have been represented over the 100 pc white dwarf sample of García-Zamora et al. (2023).

5.3.3 Q branch

In this section, we analyze the fraction of white dwarfs with different spectral types in the Q-branch. Following the analysis in Cheng et al. (2019) and Camisassa et al. (2021), we must first define the parameter ζ = M_G – 1.2 · (G_BP – G_RP). The Q-branch is then defined as the region of the color-magnitude diagram between 13.0 ≤ ζ ≤ 13.4.
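The ζ selection above can be written in a few lines. The following Python sketch uses hypothetical function names and only implements the 13.0 ≤ ζ ≤ 13.4 cut; the further split into massive and ultramassive regions requires the cooling tracks of Camisassa et al. (2019), which are not reproduced here.

```python
def zeta(bp_rp, abs_g):
    """Return zeta = M_G - 1.2 * (G_BP - G_RP)."""
    return abs_g - 1.2 * bp_rp


def in_q_branch(bp_rp, abs_g, zeta_min=13.0, zeta_max=13.4):
    """Return True if the object lies within the Q-branch zeta limits."""
    return zeta_min <= zeta(bp_rp, abs_g) <= zeta_max
```

As a quick check, an object with G_BP – G_RP = 0.5 and M_G = 13.8 has ζ ≈ 13.2 and is selected, whereas an object with G_BP – G_RP = 0 and M_G = 12.5 is not.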

Furthermore, within the Q-branch, we defined two distinct regions to differentiate between massive and ultra-massive white dwarfs. The selection is based on the cooling tracks of Camisassa et al. (2019) for white dwarf masses: one region for massive white dwarfs with masses in the range from 0.83 to 1.05 M_⊙ and another for ultra-massive white dwarfs with masses between 1.05 and 1.29 M_⊙. A color-magnitude diagram displaying the cooling tracks, the constant ζ straight lines, and the identified massive and ultra-massive white dwarfs is shown in Fig. 6.

The massive Q branch comprises a total of 965 objects, of which 813 (84.25%) are DAs, 9 (0.93%) are DBs, 125 (12.95%) are DCs, 12 (1.24%) are DQs, and 6 (0.62%) are DZs. No DOs are found in the massive Q branch region. Meanwhile, the ultramassive Q branch is formed by a total of 338 objects, of which 259 (76.63%) are DAs, 57 (16.86%) are DBs, 21 (6.21%) are DCs, and 1 (0.30%) is a DQ.

Compared to the entire population distribution at 500 pc, we observe a slight decrease in the proportion of DAs, along with a 2.5-fold increase in the proportion of DBs for the ultramassive Q-branch (see Sect. 5.4.2 for a more detailed analysis of this result). Similarly, when taking into account the whole (massive plus ultramassive) 500 pc Q branch, the proportion of DQs in it (2.2%) is four times as large as the DQ ratio in the total 500 pc population (0.5%).

While the fact that a higher percentage of DQs appears in the massive Q-branch than in the ultramassive Q-branch may suggest that DQs in the Q-branch are more commonly massive rather than ultramassive, we advise against drawing such a conclusion. The low number of objects found (12 and 1, respectively), along with the low recall our RF algorithm has shown for DQs, prevents us from reaching any conclusions regarding the mass distribution of DQs.

Finally, we compared our results with those of Manser et al. (2024), who provided spectral classifications for 288 objects in the Q branch (specifically corresponding to our definition of the ultramassive Q branch) using spectra from the DESI survey. We find a similar fraction of DAs (~70%), but our analysis yields approximately half the number of DQs and nearly ten times the proportion of DBs reported in their study. A detailed discussion of this discrepancy can be found in Sect. 5.4.2.

Fig. 6

Massive and ultramassive Q-branch objects and selected DA cooling tracks used in Camisassa et al. (2019). Objects found between the ζ = 13.0 and ζ = 13.4 lines and the 0.83 and 1.05 M_⊙ cooling tracks are considered to belong to the massive Q branch; objects found between the ζ = 13.0 and ζ = 13.4 lines and the 1.05 and 1.29 M_⊙ cooling tracks are considered to belong to the ultramassive Q branch.

Fig. 7

White dwarfs classified as DBs (red dots) and selected DB cooling tracks from Camisassa et al. (2019). Objects found below the 0.95 M_⊙ cooling track (blue line) are considered to belong to a massive DB subpopulation found by our algorithm.

5.4 Peculiar features of non-DA white dwarfs

Finally, we discuss two peculiar results that arise from our classification concerning non-DA white dwarfs. First, we center our attention on the apparent deficit of DC white dwarfs at around 0.5 ≤G_BP – G_RP ≤ 0.8. Second, we discuss a seeming subpopulation (N ≈ 250) of massive objects classified as DB that are located at the top of the Q branch.

5.4.1 A DC deficit

An interesting feature found in this work is the emergence of an apparent deficit in the number of DCs in the 0.5 ≤ G_BP – G_RP ≤ 0.85 region (see bottom-left panel of Fig. A.1 in Appendix A). This deficit, which corresponds to temperatures between approximately 6500 K and 5000 K, has also been detected by Vincent et al. (2024). Moreover, Blouin et al. (2019) also report an increase in the number of hydrogen-rich objects in this temperature range, in agreement with the DC deficit found. On the other hand, such a deficit is not observed in the 40-pc volume-limited sample of O’Brien et al. (2024), raising the possibility of this deficit being an observational bias rather than a real feature.

5.4.2 Massive DBs

A second interesting finding from our RF classification is the presence of a subsample (N ≈ 250) of massive white dwarfs classified as DBs. This subsample represents approximately 5% of the total DB population and appears to be located over the Q branch, near its hottest tip. This can be seen in Fig. 7, along with selected DB cooling tracks from Camisassa et al. (2019).

This apparently massive DB white dwarf subpopulation has not been found by other spectroscopic works (e.g. Genest-Beaulieu & Bergeron 2019; Bergeron et al. 2011) nor in any Gaia volume-limited sample of white dwarfs (Hollands et al. 2018; García-Zamora et al. 2023; O’Brien et al. 2023). However, the recent spectral classification of Gaia spectra by Vincent et al. (2024) also finds this subpopulation. Closer inspection suggests that our RF algorithm assigns these objects a slightly higher probability of being magnetic DAH white dwarfs than it does for non-massive DBs. Visual inspection of the externally calibrated Gaia spectra, however, does not allow us to confirm or discard the possibility of these objects harboring a magnetic field, due to their low resolution.

Additionally, a search for these objects in the MWDD reveals that (among the few preclassified ones) only ~20% of them are indeed DB white dwarfs, while the rest are mainly magnetic DAH white dwarfs. Other exotic objects, such as DAQ or DQA white dwarfs, are also included. To discern the true nature of this subpopulation, we decided to perform a higher resolution follow-up spectroscopy study of these objects.

Table 6

Spectral types percentages of Gaia white dwarfs in the A, B, and Q branches.

6 Comparison with other works

In this section, we compare the classification results obtained in this work with those obtained through automatic classification algorithms, both supervised and unsupervised, in other studies. We also compare our RF classification with higher resolution spectra obtained by our team.

The interest of comparing these classifications lies in the fact that they use the same input material (i.e., the 110 Gaia spectral coefficients), but different classification algorithms and training sets in the case of supervised machine learning algorithms. In this way, by comparing the results and including the coincidences and discrepancies, we can gain an understanding of how these classifications work, as well as their strengths and weaknesses.

6.1 Comparison of 100 pc and 500 pc classifications

In this subsection, we compare the 100 pc classification we obtained in García-Zamora et al. (2023) with the type assigned to those same objects in the 500 pc classification. The rationale behind this comparison is that the spectral type assigned to common objects should coincide. However, since the training sets are different and contain not only different object proportions, but also different numbers of objects in each category, a small fraction of white dwarfs with different classifications is expected.

It is important to emphasize that the 100 pc spectral classification by García-Zamora et al. (2023) used the DA classification to include all white dwarfs with G_BP – G_RP ≤ 0.86 that had been classified as such in Jiménez-Esteban et al. (2023). Since the present work does not rely on any previous classification for such objects, a second 100 pc classification was derived for the purpose of this comparison. It was performed in the same way described in Sect. 4: a first DA versus non-DA classification, followed by a spectral type classification of the non-DAs.

The comparison result is shown in the confusion matrix in Fig. 8. It is clear that both classifications are largely equivalent, supported by the following metrics: an accuracy of 0.87, a balanced accuracy of 0.87, a mean F1-score of 0.80, and a G-means score of 0.85. The biggest discrepancy corresponds to the 709 objects classified as DC in the 100 pc classification, but as DA in the 500 pc classification. These are mostly cold objects, with G_BP – G_RP > 0.8. On the other hand, the 212 objects classified as DA in the 100 pc classification, but as DC in the 500 pc classification, are mostly warm, with a color of 0.3 < G_BP – G_RP < 0.8. Numerical imbalances in the training set might be the cause behind these discrepancies, as the DA-DC ratio in the training set is higher in the 500 pc sample than in the 100 pc sample.
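The global metrics quoted in these comparisons can all be derived from a confusion matrix. The Python sketch below assumes the common macro-averaged definitions (balanced accuracy as the mean of the per-class recalls, G-means as their geometric mean, and the mean F1-score as the unweighted average over classes); the exact definitions adopted in this work are given in the earlier sections and may differ in detail. The function name and the toy matrix are hypothetical.

```python
import math


def classification_metrics(cm):
    """Compute global metrics from a square confusion matrix.

    cm[i][j]: number of objects of reference class i assigned to class j
    (rows are the reference classification, columns the compared one).
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    recalls, f1_scores = [], []
    for k in range(n):
        true_pos = cm[k][k]
        row_sum = sum(cm[k])                       # objects of reference class k
        col_sum = sum(cm[i][k] for i in range(n))  # objects assigned to class k
        recall = true_pos / row_sum if row_sum else 0.0
        precision = true_pos / col_sum if col_sum else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
        recalls.append(recall)
    return {
        "accuracy": sum(cm[k][k] for k in range(n)) / total,
        "balanced_accuracy": sum(recalls) / n,
        "mean_f1": sum(f1_scores) / n,
        "g_mean": math.prod(recalls) ** (1.0 / n),
    }


# Toy two-class matrix for illustration only.
toy_cm = [[90, 10],
          [5, 15]]
metrics = classification_metrics(toy_cm)
```

Because accuracy weights every object equally while the other three metrics weight every class equally, a comparison dominated by a well-recovered majority class (such as DAs) can show a high accuracy together with much lower balanced accuracy, F1, and G-means values.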

6.2 Comparison with Vincent et al. (2024)

In this section, we compare the result of our RF classification algorithm with the recent classification provided by Vincent et al. (2024), who used a gradient-boosted decision trees (GBDT) algorithm to spectroscopically classify 100 886 white dwarfs with Gaia spectra. As a training set, the SDSS-Gaia catalog described in Gentile Fusillo et al. (2021) was used. A total of 69 618 objects was found across both catalogs. The results of the comparison are collected in a confusion matrix, shown in Fig. 9.

From the confusion matrix, we find an excellent agreement for DAs and a very good agreement for DBs (0.94 recall for DAs and 0.81 for DBs). On the other hand, the agreement is rather poor for DOs, DQs, and DZs (recalls of 0.25, 0.2, and 0.27, respectively). Many white dwarfs classified as such in Vincent et al. (2024) were classified as DC by our algorithm. One possible explanation could be the difference in classification algorithms, as well as the different training sets. Even though both works use the 110 Gaia XP spectral coefficients as input for the automatic classifiers, the classification by Vincent et al. (2024) was trained on the spectral classification of white dwarfs from the SDSS-Gaia catalog described in Gentile Fusillo et al. (2021), whereas we used the spectral classification from the MWDD. Since the output of any automatic classifier depends on the dataset used as the training sample, different results are expected.

It should also be noted that the low agreement for DOs, DQs, and DZs is not indicative of our classification being less accurate. Rather, it is more conservative, as the objects assigned the DQ and DZ spectral types have a very high probability of belonging to them. This also implies a more realistic classification in some instances, such as the DA-DC classification at the cooler end of the spectrum.

The biggest discrepancy appears in the DA/DC classification, with more than 4000 objects classified as DA in Vincent et al. (2024) being classified as DC by our algorithm. This discrepancy arises mainly for the coldest white dwarfs: of the 4038 discrepant objects, 2254 have G_BP – G_RP > 1 and 3050 have G_BP – G_RP > 0.86. In this temperature range, hydrogen atoms are found mostly in their ground state, so the optical Balmer lines are not expected to be very prominent, if present at all. We therefore believe our DC classification to be more realistic, especially since the training sample used in Vincent et al. (2024) does not contain many cold objects to train the algorithm.

On the other hand, the 1912 objects classified as DC by Vincent et al. (2024) and as DA in this work are found in the middle temperature region, with approximate boundaries of G_BP – G_RP = 0 and G_BP – G_RP = 0.7. A clear interpretation of this discrepancy is more difficult to find, although it is possible that (at least at the coldest end) the DC deficit is playing a role. As for global metrics, this comparison attains an accuracy of 0.88, a balanced accuracy of 0.50, a mean F1-score of 0.56, and a G-means score of 0.42. These results show that both supervised classifications are compatible.

Fig. 8

Confusion matrix comparing 100 pc versus 500 pc classifications. The rows represent the classification from García-Zamora et al. (2023), while the columns show the classification assigned in this work.

Fig. 9

Confusion matrix for the comparison between this work and the classification in Vincent et al. (2024). The rows represent the classification from Vincent et al. (2024), while the columns show the classification assigned in this work.

6.3 Comparison with Kao et al. (2024)

Our next comparison is with the unsupervised classification presented in the recent work of Kao et al. (2024), who applied the Uniform Manifold Approximation and Projection for Dimension Reduction (UMAP; see McInnes et al. 2018) algorithm to 96 134 white dwarfs. Their 110 Gaia coefficients were reduced to two coordinates using the UMAP algorithm and later visualized in a graph. DA, DB, DO, and DZ white dwarfs with a spectral classification in the MWDD were used to identify clustered regions on the map. However, it is worth noting that no spectral classification was attempted for the unclassified white dwarfs. As such, the present comparison will focus on the location of our classified white dwarfs in their UMAP coordinates and the clustered regions, rather than on ascertaining how many white dwarfs share the same classification.

To make this comparison, we used the UMAP coordinates of Kao et al. (2024) of the 77 417 shared white dwarfs in both classifications and represented them using the spectral classification obtained in this work as a color code. The result can be seen in Fig. 10. Comparing this with Fig. 3 of Kao et al. (2024), we confirm that our classified white dwarfs can be found in the same regions where the MWDD classified objects by Kao et al. (2024) fall. As such, we also find the cool DZ bow in the 0 < UMAP1 ≤ 3 and 1 ≤ UMAP2 ≤ 3 region; as well as some DZs in the lower half of the 3 ≤ UMAP1 ≤ 7.5 and 1 < UMAP2 < 5 region.

Moreover, we also find DB white dwarfs in the 7.5 < UMAP1 < 10 and 4 < UMAP2 < 7 region, as well as in the top right tip of the diagram, where Kao et al. (2024) identified the MWDD DB white dwarfs. Finally, DOs, both in the MWDD and found in this work, are also limited to this top right tip. In conclusion, the results of our primary spectral type classification are fully compatible with the spectral type distribution found in the UMAP of Kao et al. (2024).

Fig. 10

UMAP using the coordinates from Kao et al. (2024) and color-coded according to the spectral classification obtained in this work.

Fig. 11

Confusion matrix for the comparison between this work and the classification in Pérez-Couto et al. (2024). The rows represent the classification from Pérez-Couto et al. (2024), while the columns show the classification assigned in this work.

6.4 Comparison with Pérez-Couto et al. (2024)

The last comparison contrasts this work with the unsupervised classification presented in Pérez-Couto et al. (2024), who used self-organizing maps (Kohonen 1982), an unsupervised machine learning technique, to identify polluted objects in a white dwarf sample.

In their work, they used self-organizing maps to identify 66 337 white dwarf candidates, of which 61 817 were then assigned primary spectral types. A cross-match with our catalog shows that 51 581 objects have been classified in both works. To compare them, we present the corresponding confusion matrix in Fig. 11.

As the confusion matrix shows, the best agreements are found for DAs and DBs (97% and 76%, respectively), while the agreement is lower for DQs and DZs (29% and 42%, respectively). A possible cause is the low recall for both types in both works. While it is true that both works show high precision (>80%) for them, this does not guarantee that the few identified objects will actually be the same in both classifications, and coincidences between them are unlikely.

This fact is supported by the global metrics (0.94 accuracy, 0.59 balanced accuracy, 0.61 mean F1-score, 0.54 G-means score). Along with the confusion matrix, these results allow us to draw conclusions on the compatibility between both spectral classifications.

6.5 A golden sample of classified white dwarfs

From the different automatic classifications presented in this section, we can derive a golden sample; that is, a set of white dwarfs whose primary spectral classification coincides among the different automatic classifications. The importance of such a set is that if different algorithms with different training sets assign an object the same primary spectral type, the classification can be considered to be highly reliable.

The work from Kao et al. (2024) does not classify unclassified white dwarfs. While the authors of Pérez-Couto et al. (2024) kindly provided a confusion matrix, no catalog with their spectral classification was published or accessible to us. Therefore, we are limited to deriving a golden sample from the work of Vincent et al. (2024).

A golden sample is thus derived from our work and that of Vincent et al. (2024) using the 78 821 common white dwarfs. The resulting populations are: 61 992 DAs, 3853 DBs, 3318 DCs, 16 DOs, 93 DQs, and 346 DZs. The color-magnitude diagrams for the golden sample, by spectral type, can be found in Appendix B.
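The construction of a golden sample amounts to cross-matching two catalogs and keeping the objects whose primary types agree. A minimal Python sketch is given below, assuming each catalog is available as a mapping from a source identifier to a primary spectral type; the function name, identifiers, and labels are hypothetical and serve only as an illustration.

```python
from collections import Counter


def golden_sample(catalog_a, catalog_b):
    """Return {source_id: type} for objects with matching primary types.

    catalog_a, catalog_b: dicts mapping a source identifier to a primary
    spectral type. Only objects present in both catalogs are considered.
    """
    common = catalog_a.keys() & catalog_b.keys()
    return {sid: catalog_a[sid]
            for sid in common
            if catalog_a[sid] == catalog_b[sid]}


# Toy identifiers and labels, for illustration only.
this_work = {1: "DA", 2: "DB", 3: "DC", 4: "DZ", 5: "DA"}
other_work = {1: "DA", 2: "DB", 3: "DA", 4: "DZ", 6: "DQ"}
golden = golden_sample(this_work, other_work)
per_type = Counter(golden.values())
```

In this toy case, objects 5 and 6 are dropped because they appear in only one catalog, and object 3 is dropped because the two classifications disagree.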

It is also worth noting that certain features observed in our 500 pc classification, such as the DC deficit or the existence of the massive DB white dwarf subpopulation, also appear in the golden sample. This reinforces our conclusion that they are actual physical features, rather than artifacts introduced by the machine learning algorithms.

6.6 Spectroscopic follow-up comparison

Finally, in this section, we compare the predictions of the RF algorithm with the spectra of selected targets for which higher-resolution spectroscopic follow-up has been conducted at the 10 m GTC (Gran Telescopio Canarias), located on the island of La Palma. The OSIRIS (Optical System for Imaging and low-Intermediate-Resolution Integrated Spectroscopy; Cepa et al. 2013) instrument was used in the observations, together with the R1000B grism and the 0.6″ slit width, which resulted in spectra covering the 3600-7800 Å wavelength range at a resolving power of ≃1000. The complete sample was observed between September 2024 and January 2025.

The observed spectra were reduced and calibrated using the pamela (Marsh 1989) and MOLLY packages⁶, respectively, and visually inspected to assign a spectral classification. For a total of 65 objects (2 DAH, 7 DB, 7 DBA, 6 DO, 5 DQ, and 38 DZ), a reliable spectral type could be assigned. The observational types of these 65 objects were then compared with the spectral type predicted by our RF model, and the results are shown in the confusion matrix in Fig. 12. Finally, in Appendix C, we list the objects with the visual and the predicted spectral type.

The results reflect the excellent precision found by our RF algorithm with an accuracy of 0.94. There are minimal discrepancies between the predicted and observational types, with the only exceptions being objects classified as DB-DBA, which fall into the more ambiguous subtype classification. Additionally, some of the DO spectra present early features reminiscent of the PG 1159 type. In Appendix D, we present examples of GTC spectra for correctly classified targets, along with the original Gaia spectra used by our algorithm in the classification.

Fig. 12

Confusion matrix for the comparison between the predicted spectral type of selected candidates and the observational types derived from medium resolution (R~1000) spectra obtained at GTC OSIRIS.

7 Conclusions

The information contained in the coefficients of 78 920 Gaia white dwarf spectra has been analyzed by means of an artificial intelligence technique, namely, an RF algorithm. In the validation tests, we have shown their usefulness for primary spectral type classification into DA, DB, DC, DO, DQ, and DZ types. However, further categorization into spectral subtypes or binarity is found to be rather limited, due to the low Gaia spectral resolution in the subtype classification and the white dwarf component spectral dominance in the binarity case. Despite this, 6 DAH, 67 DBA, and 9 DOZ (2.31%, 23%, and 111% increase with respect to the training set, respectively) candidates have been found. A summary of our main findings is as follows:

A total of 78 920 objects have been classified into their primary types. Of these, 64 976 have been identified as DA (6 of which as DAH), 4957 as DB (66 of which as DBA), 8496 as DC, 21 as DO (9 of which as DOZ), 105 as DQ, and 365 as DZ;
Our algorithm shows an excellent recall for DA, DB and DC WDs (≥90%), good recall for DOs (~60%), and improvable recall for DQs and DZs (<35%);
Despite their low recall, DQs and DZs display a very good precision (≥80%), which is also true for DOs;
The high precision achieved for DO, DQ and DZ types implies that even though not all white dwarfs belonging to these spectral types are found, those that are will belong to those types with a very high level of probability;
With the possible exception of the DAH, DBA, and DOZ subtypes, our algorithm does not seem to be able to recognize any spectral subtypes. This result is not entirely unexpected, as non-prominent spectral lines that define the spectral subtypes are not expected to be detectable in Gaia low resolution spectra;
Our algorithm does not seem to be able to recognize binary candidates in the white dwarf region of the color-magnitude diagram. Of the 78 791 classified objects, only two are classified as WDMS candidates;
Our RF algorithm achieved an accuracy of 0.94 based on the comparison with 65 selected objects for which high-resolution GTC spectroscopic follow-up was performed, with minimal discrepancies mainly in the DB-DBA subtype classification.

We also outline the physical implications of our analysis in the following points below:

We built a golden sample of 69 618 objects, equally and independently classified by both our algorithm and the work of Vincent et al. (2024), comprising 88.21% of the classified set in the Gaia white dwarf catalog. Since two different automatic classifications assign the golden sample objects the same primary spectral type, we can conclude that this classification is highly reliable;
DAs are found to constitute more than 80% of all white dwarfs up to G_BP – G_RP = 1.0 which corresponds to an effective temperature of approximately 5000 K; this proportion is higher than in volume-limited samples and may result from the inherent observational bias of a magnitude-limited sample. For redder colors, DCs are the majority in line with the known spectral evolution of white dwarfs;
In the A branch, DAs represent more than 90% of the objects, in agreement with the findings of Jiménez-Esteban et al. (2023); whereas in the B branch, they account for approximately 60% of the objects, contrary to the 35% ratio found in the 100 pc sample reported by Jiménez-Esteban et al. (2023). Observational bias may also be the cause of the B-branch abundance discrepancies between volume-limited and magnitude-limited samples;
The DA proportion in the Q branch is found to be ~70%, in line with the findings in Manser et al. (2024). Additionally, the DQ ratio in the Q-branch is found to be approximately four times the DQ ratio in the total 500 pc population;
We have identified a subpopulation (N ≈ 250) of massive objects classified as DB white dwarfs, 164 of which also appear in Vincent et al. (2024). A search for these objects in the MWDD reveals that, among the few classified objects of this subpopulation, 20% at most are classified as DB. As a means of probing the true nature of these objects, we are performing follow-up observations at a higher resolution;
In both our 500 pc classification and the golden sample, we found a decrease in the number of DCs in the 0.5 < G_BP – G_RP ≤ 1 region.

In this work, we have shown that the use of machine learning techniques, in particular the Random Forest algorithm, has allowed us to spectrally classify 78 920 objects – those with Gaia spectra, which represent the vast majority of white dwarfs within 500 pc. Although spectroscopic follow-up observations with higher resolution are essential to gain a deeper understanding of these results, as well as the true nature of the white dwarfs, this initial classification enables more accurate estimates of cooling ages, spectral evolution, and star formation rates, among other fundamental characteristics of the white dwarf population.

Data availability

The data underlying this article are available in the article. Supplementary material will be shared on reasonable request to the corresponding author.

Full Table 3 is available at the CDS via anonymous ftp to cdsarc.cds.unistra.fr (130.79.128.5) or via https://cdsarc.cds.unistra.fr/viz-bin/cat/J/A+A/699/A3

Acknowledgements

We acknowledge support from MINECO under the PID2023-148661NB-I00 grant and by the AGAUR/Generalitat de Catalunya grant SGR-386/2021. Enrique Miguel García Zamora also acknowledges financial support from Banco de Santander, under a Becas Santander Investigación/Ajuts de Formació de Professorat Universitari (2022_FPU-UPC_16) grant. Based on observations made with the Gran Telescopio Canarias (programme GTC6-21B), installed in the Spanish Observatorio del Roque de los Muchachos of the Instituto de Astrofísica de Canarias, on the island of La Palma.

Appendix A Color-magnitude diagrams of the Gaia 500 pc classification sample

Fig. A.1

Left panels: White dwarfs classified as DA, DB and DC in this work. Right panels: Entire population (those classified in this work and those labeled in MWDD) of DA, DB, and DC white dwarfs.

Fig. A.2

Left panels: White dwarfs classified as DO, DQ and DZ in this work. Right panels: Entire population (those classified in this work and those labeled in MWDD) of DO, DQ, and DZ white dwarfs.

Appendix B Golden sample color-magnitude diagrams

Fig. B.1

White dwarfs of the Golden Sample classified as DA, DB, DC, DO, DQ, and DZ (left to right, top to bottom, respectively).

Appendix C Observational versus prediction comparison table

Table C.1

Observational versus RF spectral prediction of a selected sample of white dwarfs

Table C.2

RF algorithm classification of Gaia 500 pc white dwarfs (continuation)

Appendix D Gaia and GTC-OSIRIS sample spectra

Fig. D.1

Gaia spectra (left column) and GTC spectra (right column) of objects classified as DAH, DB, and DBA.

Fig. D.2

Gaia spectra (left column) and GTC spectra (right column) of objects classified as DO, DQ, and DZ.

References

Althaus, L. G., Córsico, A. H., Isern, J., & García-Berro, E. 2010, A&A Rev., 18, 471
Bayo, A., Rodrigo, C., Barrado Y Navascués, D., et al. 2008, A&A, 492, 277
Bédard, A. 2024, Ap&SS, 369, 43
Bergeron, P., Wesemael, F., Dufour, P., et al. 2011, ApJ, 737, 28
Bergeron, P., Dufour, P., Fontaine, G., et al. 2019, ApJ, 876, 67
Blouin, S., Dufour, P., Thibeault, C., & Allard, N. F. 2019, ApJ, 878, 63
Blouin, S., Bédard, A., & Tremblay, P.-E. 2023, MNRAS, 523, 3363
Breedt, E., Steeghs, D., Marsh, T. R., et al. 2017, MNRAS, 468, 2910
Breiman, L. 2001, Mach. Learn., 45, 5
Camisassa, M. E., Althaus, L. G., Córsico, A. H., et al. 2019, A&A, 625, A87
Camisassa, M. E., Althaus, L. G., Torres, S., et al. 2021, A&A, 649, L7
Camisassa, M., Torres, S., Hollands, M., et al. 2023, A&A, 674, A213
Carrasco, J. M., Weiler, M., Jordi, C., et al. 2021, A&A, 652, A86
Cepa, J., Bongiovanni, A., Pérez García, A. M., et al. 2013, in Highlights of Spanish Astrophysics VII, eds. J. C. Guirado, L. M. Lara, V. Quilis, & J. Gorgas, 868
Cheng, S., Cummings, J. D., & Ménard, B. 2019, ApJ, 886, 100
Cunningham, T., Tremblay, P.-E., Gentile Fusillo, N. P., Hollands, M., & Cukanovaite, E. 2020, MNRAS, 492, 3540
De Angeli, F., Weiler, M., Montegriffo, P., et al. 2023, A&A, 674, A2
de Jong, R. S., Agertz, O., Berbel, A. A., et al. 2019, The Messenger, 175, 3
Dufour, P., Blouin, S., Coutu, S., et al. 2017, in Astronomical Society of the Pacific Conference Series, 509, 20th European White Dwarf Workshop, eds. P. E. Tremblay, B. Gaensicke, & T. Marsh, 3
Echeverry, D., Torres, S., Rebassa-Mansergas, A., & Ferrer-Burjachs, A. 2022, A&A, 667, A144
Farihi, J., Barstow, M. A., Redfield, S., Dufour, P., & Hambly, N. C. 2010, MNRAS, 404, 2123
Gaia Collaboration (Babusiaux, C., et al.) 2018, A&A, 616, A10
Gaia Collaboration (Vallenari, A., et al.) 2023, A&A, 674, A1
García-Zamora, E. M., Torres, S., & Rebassa-Mansergas, A. 2023, A&A, 679, A127
Genest-Beaulieu, C., & Bergeron, P. 2019, ApJ, 871, 169
Gentile Fusillo, N. P., Tremblay, P. E., Cukanovaite, E., et al. 2021, MNRAS, 508, 3877
Hollands, M. A., Tremblay, P. E., Gänsicke, B. T., Gentile-Fusillo, N. P., & Toonen, S. 2018, MNRAS, 480, 3942
Jiménez-Esteban, F. M., Torres, S., Rebassa-Mansergas, A., et al. 2023, MNRAS, 518, 5106
Jin, S., Trager, S. C., Dalton, G. B., et al. 2024, MNRAS, 530, 2688
Kao, M. L., Hawkins, K., Rogers, L. K., et al. 2024, ApJ, 970, 181
Kohonen, T. 1982, Biol. Cybern., 43, 59
Kong, X., Luo, A.-L., Li, X.-R., et al. 2018, PASP, 130, 084203
Levi, M., Allen, L. E., Raichoor, A., et al. 2019, in Bulletin of the American Astronomical Society, 51, 57
Manser, C. J., Izquierdo, P., Gänsicke, B. T., et al. 2024, MNRAS, 535, 254
Marsh, T. R. 1989, PASP, 101, 1032
McCleery, J., Tremblay, P.-E., Gentile Fusillo, N. P., et al. 2020, MNRAS, 499, 1890
McInnes, L., Healy, J., & Melville, J. 2018, arXiv e-prints [arXiv:1802.03426]
Montegriffo, P., De Angeli, F., Andrae, R., et al. 2023, A&A, 674, A3
Napiwotzki, R., Karl, C. A., Lisker, T., et al. 2020, A&A, 638, A131
O’Brien, M. W., Tremblay, P. E., Gentile Fusillo, N. P., et al. 2023, MNRAS, 518, 3055
O’Brien, M. W., Tremblay, P. E., Klein, B. L., et al. 2024, MNRAS, 527, 8687
Pedregosa, F., Varoquaux, G., Gramfort, A., et al. 2011, J. Mach. Learn. Res., 12, 2825
Pérez-Couto, X., Pallas-Quintela, L., Manteiga, M., Villaver, E., & Dafonte, C. 2024, ApJ, 977, 31
Rebassa-Mansergas, A., Agurto-Gangas, C., Schreiber, M. R., Gänsicke, B. T., & Koester, D. 2013, MNRAS, 433, 3398
Sion, E. M., Greenstein, J. L., Landstreet, J. D., et al. 1983, ApJ, 269, 253
Torres, S., García-Berro, E., & Isern, J. 1998, ApJ, 508, L71
Torres, S., Cantero, C., Rebassa-Mansergas, A., et al. 2019, MNRAS, 485, 5573
Torres, S., Cruz, P., Murillo-Ojeda, R., et al. 2023, A&A, 677, A159
Tremblay, P. E., Cukanovaite, E., Gentile Fusillo, N. P., Cunningham, T., & Hollands, M. A. 2019a, MNRAS, 482, 5222
Tremblay, P.-E., Fontaine, G., Gentile Fusillo, N. P., et al. 2019b, Nature, 565, 202
Tremblay, P. E., Hollands, M. A., Gentile Fusillo, N. P., et al. 2020, MNRAS, 497, 130
Vincent, O., Bergeron, P., & Dufour, P. 2023, MNRAS, 521, 760
Vincent, O., Barstow, M. A., Jordan, S., et al. 2024, A&A, 682, A5
Weiler, M., Carrasco, J. M., Fabricius, C., & Jordi, C. 2023, A&A, 671, A52
Zuckerman, B., Koester, D., Melis, C., Hansen, B. M., & Jura, M. 2007, ApJ, 671, 872

¹ https://www.montrealwhitedwarfdatabase.org/

² Accuracy is defined as the proportion of correct predictions.

³ F1-score is defined as 2 × Recall × Precision/(Recall + Precision).

⁴ The balanced accuracy is defined as the average of the recalls for each class.

⁵ The G-mean score is defined as the geometric mean of the recalls for every class.

⁶ Developed by Tom Marsh and available at https://cygnus.astro.warwick.ac.uk/phsaap/software/molly/html/INDEX.html
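
The metric definitions given in the footnotes above (accuracy, F1-score, balanced accuracy, and G-mean) can be illustrated with a minimal Python sketch that derives them from a confusion matrix. This is not the code used in this work: the function name `classification_metrics` and the two-class counts are hypothetical, chosen only for illustration, and the sketch assumes every class has at least one true object and at least one prediction so that no division by zero occurs.

```python
import math


def classification_metrics(cm):
    """Metrics from a square confusion matrix (rows: true label, columns: predicted label)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = [cm[i][i] for i in range(n)]
    true_counts = [sum(cm[i]) for i in range(n)]                       # objects per true class
    pred_counts = [sum(cm[i][j] for i in range(n)) for j in range(n)]  # objects per predicted class

    recall = [correct[i] / true_counts[i] for i in range(n)]
    precision = [correct[i] / pred_counts[i] for i in range(n)]
    # F1-score per class: 2 x Recall x Precision / (Recall + Precision)
    f1 = [2 * r * p / (r + p) for r, p in zip(recall, precision)]

    return {
        "accuracy": sum(correct) / total,               # proportion of correct predictions
        "f1": f1,
        "balanced_accuracy": sum(recall) / n,           # average of the per-class recalls
        "g_mean": math.prod(recall) ** (1.0 / n),       # geometric mean of the per-class recalls
    }


# Hypothetical two-class confusion matrix (made-up counts, not taken from this work).
example = [[90, 10],
           [20, 80]]
metrics = classification_metrics(example)
print(metrics)
```

For unbalanced samples, the balanced accuracy and the G-mean weight every class equally regardless of its size, so a poorly recovered minority class lowers them even when the overall accuracy remains high.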




