Open Access
Issue
A&A
Volume 683, March 2024
Article Number A199
Number of page(s) 22
Section Interstellar and circumstellar matter
DOI https://doi.org/10.1051/0004-6361/202348671
Published online 20 March 2024

© The Authors 2024

Licence Creative CommonsOpen Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This article is published in open access under the Subscribe to Open model. Subscribe to A&A to support open access publication.

1 Introduction

Diffuse interstellar bands (DIBs) are a set of absorption features in the spectra of stars, galaxies, and quasars that are observed in the optical and near-infrared bands (about 0.4–2.4 µm, see DIB surveys: Fan et al. 2019; Hamano et al. 2022; Ebenbichler et al. 2022). High-quality astronomical observations, experimental measurements, and theoretical analyses support that the DIBs originate from complex carbon-bearing molecules (e.g. Campbell et al. 2015; Omont et al. 2019; MacIsaac et al. 2022), so that DIBs can serve as chemical and kinematic tracers of the Galactic interstellar medium (ISM), even though the exact species of most DIB carriers are still unknown.

Because DIBs are weak features and might be blended with stellar lines, early studies preferred to observe a handful of hot stars as background sources because their spectra are clean. This favored the survey of DIB signals (e.g. Jenniskens & Desert 1994; Galazutdinov et al. 2000; Hobbs et al. 2008, 2009; Fan et al. 2019) and the exploration of elemental properties of DIB carriers (e.g. Friedman et al. 2011; Vos et al. 2011; Fan et al. 2017). On the other hand, large spectroscopic surveys during the past decade, such as the Radial Velocity Experiment (RAVE; Steinmetz et al. 2006), the APO Galactic Evolution Experiment (APOGEE; Majewski et al. 2017), the Galactic Archaeology with HERMES survey (GALAH; Buder et al. 2021), and the Gaia Radial Velocity Spectrometer (RVS; Cropper et al. 2018; Sartoretti et al. 2018), have observed the spectra of hundreds of thousands to tens of millions of stars, which enables statistical studies of DIB properties. For example, Lan et al. (2015) and Baron et al. (2015) mapped the DIB strength projected on the celestial sphere at high latitudes. Kos et al. (2014) and Zasowski et al. (2015) built the three-dimensional (3D) distribution of DIB strength for DIBs at 8621 Å (λ8621) and 15 273 Å, respectively. Zhao et al. (2021) and Zasowski et al. (2015) further investigated the kinematics of these two DIBs.

Because late-type stars dominate the observations in spectroscopic surveys, synthetic spectra derived from the stellar atmospheric model and atomic line lists are needed to isolate the DIB signal from the stellar components. However, an incorrect modeling of stellar lines close to the DIB signal could introduce additional uncertainties in the DIB measurements. Moreover, when the stellar residuals are comparable to the DIB features in terms of strength, a pseudo-fitting is hard to distinguish and could lead to a bias in the measurements of DIB parameters. To overcome this limitation, Kos et al. (2013) developed a data-driven method, called the “best-neighbor method” (BNM), to build artificial stellar templates for the observed spectra in the vicinity of the DIB feature (DIB window). Specifically, BNM first separated the whole spectroscopic sample into a target sample (spectra containing DIB signals) and a reference sample (spectra without DIB signals). The reference sample typically constituted sources at high latitudes and with low dust extinctions according to the assumption that low extinction represents a low abundance of ISM species. Then, for a given target spectrum, BNM determined the best-matched reference spectra based on a pixel-by-pixel comparison for the spectral region outside the DIB window. Finally, a number of best-matched reference spectra (up to 25 in Kos et al. 2013) were averaged to create a stellar template for the DIB window. The ISM spectrum within the DIB window in which the DIB signal is detected and measured was defined as the target spectrum divided by the generated stellar template. The BNM has been applied to measure DIBs in the spectra from RAVE (Kos et al. 2013), Gaia RVS (Zhao et al. 2022; Gaia Collaboration 2023a, hereafter GFPR), and GALAH (Vogrinčič et al. 2023). Other types of data-driven methods have also been applied to detect DIBs. Saydjari et al. (2023, hereafter AS23) decomposed and recognized stellar components and DIBs in public Gaia RVS spectra with a data-driven prior consisting of ~40 000 RVS low-extinction spectra. McKinnon et al. (2023) built second-order polynomial models of the normalized flux as a function of stellar parameters for ~ 17 000 red clump stars observed in APOGEE and found 84 possible DIBs (25 identified with a confidence level of 95%, 10 of which were previously known) in the residuals between observed and modeled APOGEE spectra.

The third data release of Gaia (Gaia Collaboration 2023c) contains a large number of measurements for DIB λ8621 in about 500 000 RVS spectra of individual stars (Gaia Collaboration 2023b, hereafter GDR3). DIB λ8621 was fit in the ISM spectra derived by the synthetic spectra from the general stellar parameterizer from spectroscopy (GSP-Spec) module (Recio-Blanco et al. 2023). After removing the cases with poor stellar modeling and poor DIB parameters, we defined a high-quality (HQ) DIB sample containing ~ 140 000 sightlines (see Sect. 3 in GDR3 for details). However, AS23 reported a dependence on the fit central wavelength and the Gaussian width of DIB λ8621 (see their Fig. 1) and attributed these biases to the residuals of stellar lines in the vicinity of DIB λ8621 (e.g., the Fe I lines at 8620.51 Å and 8623.97 Å in vacuum wavelength determined by Contursi et al. 2021). In this work, we improve the BNM to a machine-learning (ML) approach that replaces the pixel-by-pixel comparison of the spectral flux in finding the best-matched reference spectra by ML training. The ML approach can directly predict stellar components in the DIB window for the target spectra instead of comparing them one by one with the reference spectra. Thus, the ML approach can speed up the process and ignore irrelevant features. Furthermore, Kos et al. (2013) down-weighted the Ca II regions in the comparison between target and reference spectra because the Ca II lines are too strong to overwhelm other stellar features, but the ML model does not need to adjust the weights of stellar lines. We applied the improved BNM to process 780 000 RVS spectra published in Gaia DR3 and measured DIB λ8621 as well as the broad DIB around 8648 Å (λ8648; Zhao et al. 2022). We performed some statistical analysis of the properties of these two DIBs based on a selected reliable sample. We compared our new measurements of DIB λ8621 to those in GDR3 and AS23 to estimate the degree of biases of DIB parameters in GDR3. We note that the BNM has been applied to RVS spectra in the new focused product release (FPR) of Gaia to measure DIBs λ8621 and λ8648 (see GFPR for detailed results), but the FPR only contained DIB measurements in stacked ISM spectra, and this work measured DIBs λ8621 and λ8648 in the RVS spectra of individual stars for the first time.

The paper is organized as follows: The data processing is described in Sect. 2. Section 3 introduces our ML model and the fitting of DIBs λ8621 and λ8648. In Sect. 4, we investigate the intensity and kinematic properties of DIB λ8621, analyze the detection of DIB λ8648 in individual RVS spectra, and reassess the results of λ8621 in GDR3. The correlation between the strength and abundance of DIB λ8621, as well as the completeness of the DIB catalog, are discussed in Sect. 5. The main conclusions are summarized in Sect. 6.

thumbnail Fig. 1

Galactic spatial density distribution of 780 513 target spectra (left panel) and 36 622 reference spectra (right panel). This HEALPix (Górski et al. 2005) map has a spatial resolution of 0.92° (Nside = 64).

thumbnail Fig. 2

Two-dimensional distributions of stellar atmospheric parameters (Teff, log g [M/H]) from GSP-Spec (Recio-Blanco et al. 2023) for the training set, the validation set, the testing set, and the target set. The color represents the number of stars counted in each bin with a size of ΔTeff = 40 K, Δ log g = 0.05, and Δ[M/H] = 0.04 dex.

2 Data processing

There are 999 645 RVS spectra (R ~ 11 500, mean spectra of epoch observations) published in Gaia DR3 (Gaia Collaboration 2023c) that can be accessed through the datalink interface of the Gaia Archive1. The published RVS spectra were processed by Gaia DPAC Coordination Unit 6 (CU6) and were equally resampled between 864 and 870 nm with a spacing of 0.01 nm (2400 wavelength bins; Sartoretti et al. 2018, 2023). The spectra were normalized and shifted to the rest frame as well. Following the process in Recio-Blanco et al. (2023) and GFPR, we rebinned RVS spectra from 2400 to 800 wavelength pixels, sampled every 0.03 nm, to increase the signal-to-noise ratio (S/N). The total wavelength range of RVS spectra used in this work is between 8471.2 and 8687.5 Å to ensure that no reference spectra contained nan-value fluxes. The DIB window is defined as 8600–8680 Å (267 wavelength pixels).

After combining with the calibrated distance catalog of Bailer-Jones et al. (2021), where we made use of their geometric distances, 996900 sources were left. We calculated E(BV) for these stars based on the Planck dust map (Planck Collaboration Int. XLVIII 2016) using the Python package dustmaps (Green 2018). The target sample contains 780 513 spectra with E(BV) > 0.02 mag and an S/N> 20. The reference sample contains 36 622 spectra with E(BV) ⩽ 0.02 mag, |b| ⩾ 30°, and an S/N⩾ 50. The higher threshold of the S/N for the reference sample was chosed to achieve a better training set. The density distribution in Galactic coordinates for the target and reference samples is shown in Fig. 1. Baron et al. (2015) reported the detection of DIB signals in dust-free regions at high latitudes. These DIB signals in the reference spectra might introduce an offset or a bias when the DIB profiles are modeled in target spectra, but when we assume that DIBs like this only exist in a very small part of the reference sample, the ML model would treat them as irrelevant features and minimize their effect. Nevertheless, this problem cannot be quantified because we lack DIB maps that were built with hot stars without reference spectra.

3 Method

3.1 Stellar templates built with a random forest model

Various supervised-learning algorithms can be applied to model the stellar lines in the DIB window. In this work, we built a model based on the random forest (RF) regression, which is an ensemble bagging method that combines a large number of decision trees (Breiman 2001). The RF model predicts the stellar template within the DIB window (8600–8680 Å) for a given target spectrum using the part outside the DIB window (i.e., 8471.2–8600 Å and 8680–8687.5 Å). The model construction and prediction of the RF algorithm were completed by the Python package scikit-learn (Pedregosa et al. 2011).

The reference sample was separated into three data sets: the training set containing 21974 spectra (60%), the validation set containing 7324 spectra (20%), and the testing set containing 7324 spectra (20%). The distributions of the stellar atmospheric parameters (Teff, log g, [M/H]) from GSP-Spec (Recio-Blanco et al. 2023) of these sets, as well as the target set, are presented in Fig. 2. The reference sample (training, validation, and testing sets) has a similar coverage of stellar parameter space as the target sample, mainly covering the main sequence, subgiant and red giant branches, and an [M/H] region between −1 and 0.5 dex.

Metal-poor and extremely hot or cool stars are notably missing, but they only form a small fraction of the target sample. The stellar parameters are only used to present the space coverage, but were not used in our RF model. They are not necessary for the BNM either, but were always used to speed up the BNM process (e.g., Kos et al. 2013; Zhao et al. 2022; GFPR).

Two of the most important parameters for the RF model are the number of trees in the forest (n_estimators; NE) and the maximum depth of the tree (max_depth; MD). Therefore, we trained RF models using various NE ({20, 40, 60, 80, 100}) and MD ({2, 5, 10, 30, 50, 100}), keeping other parameters as default values in scikit-learn, and completed the model selection by the performance of the validation set. For each pair of NE and MD, we applied the trained RF model to predict the stellar components in the DIB window for each RVS spectrum in the validation set and calculated the residuals between the observed and modeled normalized fluxes at each wavelength pixel. The mean residuals of these RVS spectra as a function of the spectral wavelength for different pairs of NE and MD are shown in Fig. A.1 together with their standard deviations. For small NE and MD, structural residuals can be seen around the Ca IJ line at 8664.5 Å (Contursi et al. 2021), which means a poor modeling of this strong line. With the increase in NE and MD, the Ca II is better modeled and the standard deviation also becomes smaller. Furthermore, the degree of dispersion of the residuals becomes similar for NE ⩾ 60 and MD ⩾ 30. Their mean residuals are slightly smaller than zero, however, which might be caused by the imperfect normalization of the observed spectra. Some mean residuals with small NE and MD are very close to zero, but their dispersion is apparently stronger. Some weak structural residuals exist even for the maximum NE and MD, but they are only about 10−4. On the other hand, Fig. A.2 shows the mean of the absolute residuals (MAR), taken along the wavelength within the DIB window, of each RVS spectra in the validation set as a function of the spectral S/N. MAR decreases with the increase in S/N, presenting a strong dependence. The dependence breaks for S/N ≳ 300, where the dispersion of MAR represents the robustness of the RF model for different types of spectra. MAR would also dramatically increase for stars with extreme parameters. For NE ⩾ 20 and MD ⩾ 30, the distribution of MAR becomes similar and MAR is smaller than 0.01 for S/N ≳ 100. Because of the similar performance of the validation set for large NE and MD, we selected a final parameter pair of NE = 100 and MD = 50. We also trained RF models with larger NE and MD and found that the performance improvement was not significant. For NE = 200 and MD = 300, the mean and the standard deviation of the residuals is −0.46 and 117.58, which is only slightly better than those for NE = 100 and MD = 50.

With the selected NE = 100 and MD = 50, we applied the RF model to the testing set to evaluate its performance on the target sample. Because the S/N of 62% of the target spectra is lower than 50, we randomly selected 10000 reference spectra with 20 ⩽ S /N < 50 and added them to the testing set (a total of 17 324 reference spectra). Figure 3 presents the residuals as a function of wavelength for each spectrum in the testing set, sorted by the spectral S/N. For S /N > 50, the distribution of residuals is generally uniform along the wavelength, which indicates a good modeling of the stellar lines. Only the residuals near the Ca II line (indicated by a dashed green line in Fig. 3) are more significant than the residuals in its vicinity. The performance of the RF model becomes worse for spectra with lower S/N, showing systematic differences between the observed spectra and the modeled stellar template and structural residuals near the stellar lines. The systematic difference could be caused by the imperfect normalization for low-S/N RVS spectra. We applied a linear continuum in the DIB model (see Sect. 3.2) that can reduce this effect.

Figure 4 shows the stellar templates predicted by the RF model for four RVS spectra within the DIB window. Strong stellar lines, such as Fe I and Si I, are well modeled, and the feature of λ8621 is clearly visible in the derived ISM spectra. The residuals near the center of Ca II increase slightly. In the last example (bottom), the RVS spectrum was not perfectly normalized, but the RF model can still predict the stellar components properly. Furthermore, the ISM spectrum is well fit by our DIB model with a linear continuum (see the bottom panel in Fig. 5).

thumbnail Fig. 3

Residuals between the observed and modeled normalized fluxes as a function of the wavelength for the 17 324 RVS spectra in the testing set. The color represents the residuals, and the spectra are sorted by spectral S/N, increasing from bottom to top. S/N = 22, 50, and 100 is indicated by the dashed black, blue, and red lines, respectively. The Ca II line within the DIB window is marked as a dashed green line.

thumbnail Fig. 4

Four examples of the RF prediction within the DIB window. The black and red lines are the observed RVS spectra and the predicted stellar parameters, respectively. The blue line is the derived ISM spectrum with an offset of 0.3 for the normalized flux. Orange marks the masked region during the fittings. The Gaίa source ID of these targets is indicated. Some typical stellar lines within the DIB window determined by Contursi et al. (2021) are marked as well. The DIB fitting to these ISM spectra is shown in Fig. 5.

thumbnail Fig. 5

Examples of the fits to DIBs 18621 and λ8648 in four ISM spectra. The black and red lines are the ISM spectra and fit DIB pro-flies, respectively, normalized by the fit linear continuum. The error bars indicate the observational flux uncertainties of the RVS spectra. Orange marks the masked region during the fittings. The Gaίa source ID of these targets, E(BPRP) from Andrae et al. (2023), the EWs of the two DIBs (EW8621 and EW8648), and the S/N of the ISM spectra are indicated as well.

3.2 Fitting DIBs in ISM spectra

With the trained RF model, ISM spectra can be obtained by the modeled stellar templates in the DIB window divided by the observed spectra for each RVS source in the target sample. The S/N of the ISM spectra was calculated between 8602 and 8612 Å as mean(flux)/std(flux). Following our previous works (Zhao et al. 2022; GFPR), we modeled the profiles of the two DIBs in the ISM spectra with a Gaussian function (Eq. (1)) for 28621, a Lorentzian function (Eq. (2)) for λ8648, and a linear function for the continuum (Eq. (3)), G(λ;D,λDIB,σDIB)=D×exp((λλDIB)22σDIB2),$G\left( {\lambda ;{\cal D},{\lambda _{{\rm{DIB}}}},{\sigma _{{\rm{DIB}}}}} \right) = - {\cal D} \times \exp \left( { - {{{{\left( {\lambda - {\lambda _{{\rm{DIB}}}}} \right)}^2}} \over {2\sigma _{{\rm{DIB}}}^2}}} \right)$(1) L(λ;D,λDIB,σDIB)=(DσDIB2)(λλDIB)2+σDIB2,$L\left( {\lambda ;{\cal D},{\lambda _{{\rm{DIB}}}},{\sigma _{{\rm{DIB}}}}} \right) = {{ - \left( {{\cal D}\sigma _{{\rm{DIB}}}^2} \right)} \over {{{\left( {\lambda - {\lambda _{{\rm{DIB}}}}} \right)}^2} + \sigma _{{\rm{DIB}}}^2}},$(2) C(λ;a0,a1)=a0×λ+a1,$C\left( {\lambda ;{a_0},{a_1}} \right) = {a_0} \times \lambda + {a_1},$(3)

where 𝒟 and σDIB are the depth and width of the DIB profile, λDIB is the measured central wavelength, a0 and a1 describe the linear continuum, and λ is the wavelength. Subscripts 8621 and 8648 are used below to distinguish the profile parameters of the two DIBs. The full parameters of the DIB model are Θ = {𝒟8621,λ8621,σ8621,𝒟8648,λ8648,σ8648,a0,a1}. A Markov chain Monte Carlo (MCMC) procedure (Foreman-Mackey et al. 2013) was performed to implement the parameter optimization with flat and independent priors for the DIB parameters. The best estimates of the DIB parameters and their statistical uncertainties were taken in terms of the 50th, 16th, and 84th percentiles of the posterior distribution drawn by the MCMC procedure. We refer to Sect. 3.3 in GFPR for a very detailed description of the DIB model, priors, and the MCMC fitting procedure. We continued to use the masked region between 8660 and 8668 Å during the fitting, although the RF algorithm modeled the Ca II line much better than BNM. We note that the uncertainties of the ISM spectra used in the MCMC fitting only include the observational flux errors of the RVS spectra because the RF model cannot estimate the uncertainty of their predictions, so that the total uncertainties would be underestimated. According to Eqs. (1) and (2), the equivalent width (EW), representing the DIB strength, for λ8621 was calculated as EW8621=2π×D8621×σ8621${\rm{E}}{{\rm{W}}_{8621}} = \sqrt {2\pi } \times {{\cal D}_{8621}} \times {\sigma _{8621}}$ and for λ8648 as EW8648 = π × 𝒟8648 × σ8648. The lower (16%) and upper (84%) confidence levels of EW were estimated by the distributions of 𝒟 and σDIB drawn from the MCMC posterior samplings. The full width at half maximum (FWHM) of the two DIBs was calculated as FWHM8621=8ln(2)×σ8621$FWH{M_{8621}} = \sqrt {8\ln (2)} \times {\sigma _{8621}}$ for the Gaussian profile of λ2862.1 and FWHM8648 = 2 × σ8648 for the Lorentzian profile of λ8648.

Four examples of the DIB fits are shown in Fig. 5, whose ISM spectra are sorted by E(BPRP) calculated by Andrae et al. (2023). EW8621 and EW8648 both increase with E(BPRP). The profile of DIB λ8621 is prominent in all the ISM spectra, while the profile of DIB λ8648 is much more shallow and broader than that of λ8621. Because of the very small 𝒟8648, it is much more difficult to measure λ8648 than λ8621 in the ISM spectra derived from the individual RVS spectra. Additionally, the masked region, in which the residual of the Ca II line is clear, also affects the fit to the red wing of the λ8648 profile.

We performed an injection test following the principles in AS23. The details and discussions are presented in the Appendix B. In summary, the distribution of the Z-scores as a function of the injected DIB parameters and the S/N of the ISM spectra is perfectly uniform, which is highly consistent with the findings in AS23, primarily validating our RF model and the DIB fittings.

4 Results

4.1 Selecting a reliable DIB catalog

The DIB fitting was performed for 780 513 ISM spectra derived from the target sample. To select reliable measurements, we calculated the χdof 2=χtot 2/$\chi _{{\rm{dof }}}^2 = \chi _{{\rm{tot }}}^2/$d.o.f, where χtot 2$\chi _{{\rm{tot }}}^2$ is the total χ2 between the fit DIB profile and the ISM spectrum, and dof = 259 is the degree of freedom of the DIB model (267 wavelength pixels and eight DIB parameters). We applied a cut on 0.71<χdof 2<1.41$0.71 < \chi _{{\rm{dof }}}^2 < 1.41$. The borders were applied by AS23 based on their injection test. Generally, stricter borders provide more accurate measurements to some extent, but also lose more cases. We tried other borders and found that 0.71 and 1.41 were proper borders to exclude many of the noisy cases with pseudo-fittings. We further required 𝒟8621 > 3Rc, where Rc is defined as the standard deviation of the residuals between the flux of the ISM spectrum and the fit DIB profile in a range from λDIB − 3σDIB to λDIB + 3σDIB. Rc represented the noise level close to the center of the DIB profile, and 3RC was a strict cut for the strong DIB λ8621, while for 28648, we only required 𝒟8648 > Rc. This criterion cannot ensure a good measurement of λ8648, but can exclude very noisy cases. On the other hand, it introduces a selection bias to DIB λ8621 as the two DIBs are not required to exist in the spectra together. Finally, we required the S/N of ISM spectra to be greater than 20. All these cuts left us with 8388 DIB measurements.

Figure 6 shows the σDIBλDIB distribution of DIB λ8621 for the full target sample (left panel), the selected 8388 DIB measurements (middle panel), and the reliable measurements in AS232. The full target sample has a scattered distribution with a high-density region (red region in the figure) around the rest-frame wavelength (λ0 = 8623.141 Å; see Sect. 4.3) and the mean Gaussian width (σ8621 ≈2Å; Herbig & Leka 1991; Jenniskens & Desert 1994; AS23; GFPR) of DIB 28621. The impact of the stellar line residuals is apparent for small σ8621 that λ8621 concentrated around the Fe I lines around 8624 and 8625 Å. The selected DIB measurements in this work and AS23 both presented a quasi-Gaussian distribution in the σDIBλDIB panel, with some deviations that behave in different ways in the two catalogs. The number density of each point in the σDIBλDIB distribution was estimated by a Gaussian kernel density estimation (KDE) using the Python package scipy (Virtanen et al. 2020). Our catalog contains a tail toward small σ8621 (≲ 1Å), and λ8621 of these cases seems to be affected by the Fe I line. This is because we set an initial guess of σ8621 = 1.2Å in the MCMC fitting for all the cases so that noisy ISM spectra with weak DIB features would obtain a fit σ8621 around this initial guess. On the other hand, AS23 contains more cases with large σ8621 (≳3 Å) than our catalog, and their λ8621 there is more scattered as well. The median σ8621 in our catalog is 2.13 Å which is slightly larger than that in AS23 (1.92 Å). This may be due to the different pixel sizes of RVS spectra used in this work (0.3 Å pixel−1)3 and in AS23 (0.1 Å pixel−1). We further applied cuts to constrain λ8621 between 8620 and 8626 Å and σ8621 within 1–4 Å. Hence, the final selected DIB catalog contains 7619 measurements. This DIB catalog can be accessed via the CDS database in Strasbourg, France (Ochsenbein et al. 2000).

The reliable DIB catalog in this work was constructed only by simple cuts on χ2, noise level (S/N and Rc), and the DIB parameters (λ8621 and σ8621). It presents a Gaussian-like σDIBλDIB distribution without any significant impacts of stellar lines and shows a good correlation between EW8621 and the dust reddening (see Sect. 4.2). The catalog certainly contains pseudo-fittings, such as the outliers seen in the σDIBλDIB diagram, but they should only take a very small part of the catalog after the quality control and only have little impact on the statistical analysis of the DIB properties. Furthermore, as DIBs are weak features, investigation of specific fittings would need a visual inspection of their ISM spectra. Figure 7 shows the Galactic distribution of the number of DIB measurements and the mean EW8621 in the DIB catalog. The detected DIBs have an uneven distribution and are concentrated in the Galactic midplane and some prominent molecular regions, with a remarkable extension to high latitudes in the directions of the Galactic center (GC) and anticenter (GAC). Like dust reddening, the large mean EW8621 focuses on the Galactic plane with |b| ≲ 5° and decreases on average with the increase in latitude.

To validate the DIB catalog, we compared the fit and integrated EW of DIB λ8621 for all the 7619 measurements. Because the profiles of λ8621 and λ8648 might overlap, the fit profile of λ8648 was first subtracted from the ISM spectra normalized by the fit linear continuum, and then the remaining part within λ8621 ± 3σ8621 was integrated. Figure 8 shows the comparison between fit and integrated EW8621, as well as the EW difference (ΔEW = EWfit − EWint) as a function of the fit EW8621. The fit and integrated EW8621 are highly consistent with each other, with a mean difference of only 0.001 Å and a standard deviation of 0.016 Å. ΔEW is smaller than the uncertainty of EW8621 (0.031 Å on average) for over 96% of the measurements, and 90% of ΔEW is smaller than 0.023 Å. The difference between fit and integrated EW8621 tends to increase for measurements with large EW8621. These measurements were made in ISM spectra with generally lower S/N, where the residuals of the stellar lines would dramatically increase (the RF model performs worse with low S/N; see Fig. A.2) and consequently lead to an increase in the ΔEW. We checked some ISM spectra with large EW8621 and ΔEW and found structural features in addition to the DIB signal. These features are more likely from the stellar residuals than the possible Doppler splitting caused by multiple ISM clouds along the sightlines because they can be far away from the center of the DIB profile and usually have a much smaller depth than λ8621. Despite the heavier influence of the stellar residuals and noise, the relative EW uncertainty does not increase for large EW8621 . For EW8621 > 0.5 Å (626 measurements), the fractional error of the fit EW8621 is mainly (99.4%) within 20%, with a mean of 11.9%, and ΔEW is mainly (99.4%) smaller than 10% of the fit EW8621. The Doppler broadening caused by the unresolved multiple DIB components and the probable intrinsic asymmetry of the DIB profile may contribute to ΔEW as well. However, the S/N of the ISM spectra in this work is not high enough to distinguish these effects from the others.

thumbnail Fig. 6

Number density of the measured DIB 18621 as a function of the Gaussian width (σ8621) and the central wavelength (λ8621) of the fit DIB profile for the full target sample (left), 8388 selected DIB measurements (middle), and the reliable measurements in AS23 (right). The color of the left panel represents the number of DIBs calculated in 0.025 A × 0.1 A bins. In the middle and right panels, the color represents the number density estimated by a Gaussian KDE. The dashed red line indicates the rest-frame wavelength of DIB λ8621 determined in this work (8623.141 Å, see Sect. 4.3).

thumbnail Fig. 7

Distribution of the number of DIB measurements (NDIB, upper panel) and the mean EW8621 (lower panel) in a Galactic projection for the selected DIB catalog (7619 measurements). The NDIB and mean EW8621 were calculated in each HEALPixel with a resolution of 1.83° (Nside = 32).

thumbnail Fig. 8

Comparison between the fit and integrated EW8621 for 7619 measurements in the DIB catalog. Upper panel: the color represents the number density estimated by a Gaussian KDE. The gray bars show the uncertainty of the fit EW8621. The dashed red line traces the one-to-one correspondence. A zoom-in panel shows the distribution of the EW difference (ΔEW = EWfit − EWint). The mean (Δ) and standard deviation (σ) of ΔEW are indicated. Lower panel: distribution of the ΔEW as a function of the fit EW8621.

4.2 DIB Λ8621 and dust reddening

Both DIB EW and dust reddening can be used to map the spatial distribution of ISM species and the Galactic large-scale structures, but the EW generally has a much larger relative uncertainty than reddening at present. A tight linear correlation between DIB EW and dust reddening has been discovered for a set of strong DIBs with early-type stars as background sources (e.g. Munari et al. 2008; Friedman et al. 2011; Lan et al. 2015) despite the inevitable scatters and outliers (see the review of Krełowski 2018). On the other hand, the degree of dispersion between DIB EW and dust reddening usually increases by an order of magnitude for a spectroscopic survey data set that is dominated by late-type stars (see, e.g., Kos et al. 2013; Zasowski et al. 2015; GDR3). The relatively lower S/N of the survey spectra (compared to the specifically designed DIB observations) and the difficulty in modeling the atmospheric components of late-type stars certainly contribute to increase the dispersion in the DIB-dust correlation. However, the numerous observations in the survey should contain some sightlines in which the DIB carriers and dust grains are not spatially associated with each other because the dust grain is currently not considered a candidate DIB carrier (Cox et al. 2007, 2011).

We reviewed the correlation between DIB λ8621 and dust reddening with our new DIB measurements and two sources of reddening. Our DIB catalog contains 2957 cases with E(BPRP) from Andrae et al. (2023) and 4656 cases with Av from Green et al. (2019). The best-estimated E(BPRP) and its lower and upper confidence levels for the target stars were accessed via the Gaia archive. AV and its uncertainty were obtained with the bayestar module in dustmaps using the per-centile mode. AV equals 2.742 times the reddening unit given by bayestar (Green et al. 2019). The scatter plot between EW8621 and dust reddening is shown in Fig. 9 for E(BPRP) (upper panel) and AV (lower panel), with the median values and standard deviations taken in each EW8621 bin with a step of 0.05 Å (red dots). The median dots present a good linear relation between EW8621 and dust reddening for both E(BPRP) and AV for EW8621 ≲ 0.5 Å, with a deviation at larger EW8621. AS23 reported larger AV than expected when the median dots deviated from their linear fit to EW8621 and AV, but conversely, the median AV becomes smaller than expected in our work in these regions. This is only because the median dots were taken in EW8621 bins in this work ,but in AV bins in AS23. We checked a part of the outliers, for example, with high reddening but small EW8621, and found that many of the DIB measurements have proper DIB parameters, and their ISM spectra clearly contain DIB signals by a visual inspection. This verifies that the DIB and dust are not required to appear together. They could statistically present a linear relation only due to the accumulation of different ISM species along the sightline. Therefore, DIB EW may not be a good proxy for dust reddening in specific directions, and their ratio also varies with the investigated samples.

The linear fits to EW8621 and dust reddening performed in previous works are also plotted as dashed lines in Fig. 9. For E(BPRP), the tendency of the median dots is consistent with the fit line of GDR3 (black) and of GFPR (magenta). Furthermore, the standard deviation of the individual measurements in each EW8621 bin is much larger than the difference between GDR3 and GFPR. For Av, the median dots are closer to the line of GDR3 (red) than that of AS23 (blue). AS23 obtained a fit EW8621/Av = 0.106 ± 0.017 Å mag−1, corresponding to 3.448 mag Å−1 of E(BV)/EW8621, which is 57% larger than the value fit in GDR3 (2.198 mag Å−1). This difference is mainly caused by the systematic difference in EW8621 (see Fig. 14), but is not the control of the bias and uncertainties argued by AS23. The E(BV)/EW8621 ratio derived in different works (e.g. Wallerstein et al. 2007; Munari et al. 2008; Kos et al. 2013; Puspitarini et al. 2015) has a 20% difference on average (see Table 3 in GDR3). The result of AS23 is similar to that in Zhao et al. (2021), who used the Gaia–ESO (Gilmore et al. 2012) data set and E(BV) from Schlegel et al. (1998). We emphasize that the mean correlations between EW8621 and dust reddening derived in our series using the Gaia RVS data (GDR3, Zhao et al. 2022, GFPR) are consistent with each other within 10% (see also the discussions in Sect. 5.2 in GFPR).

thumbnail Fig. 9

Correlation between EW8621 and dust reddening for 2957 cases with E(BPRP) from Andrae et al. (2023) shown in the upper panel and for 4656 cases with Av from Green et al. (2019) shown in the lower panel. The color of the scattered points represents their number density estimated by the Gaussian KDE. The red dots and their color bars are the median values and the standard deviations calculated in each EW8621 bin with a step of 0.05 Å. The linear fits to EW8621 and dust reddening from previous works are overplotted as dashed lines: magenta for GFPR, black and red for GDR3, and blue for AS 23.

4.3 Kinematics of DIB λ8621

To study the kinematics of the carrier of DIB λ8621, it is most fundamental and important to determine its rest-frame wavelength (λ0), which is also required to identify the nature of λ8621 through the comparison to the laboratory measurements. In this work, we followed the statistical method, which assumes that the DIB radial velocity is null toward the GAC in a circular rotational obit, and thus the intercept at = 180° would indicate λ0 (see Zasowski et al. 2015; GDR3; AS23).

As the published RVS spectra have been shifted in the stellar frame where stellar lines are at their rest positions but the DIB features are additionally shifted, we converted λ8621 into the heliocentric frame (λhelio) using the radial velocity of stars (Vstar) determined in Gaia DR3 (Katz et al. 2023). We selected 77 DIB measurements in our DIB catalog with |b| < 5°, 170° < ℓ < 190°, d⩽ 3 kpc, err(λ8621)< 0.5 Å, and err(Vstar) < 5 km s−1. The information of their background stars, including the Gaia-DR3 source ID, Galactic coordinates, apparent G magnitudes, and stellar atmospheric parameters from GSP-Spec, are listed in Table C.l, as well as their λhelio We note that some cases seem to be early-type stars without GSP-Spec estimates of their stellar parameters. Nevertheless, by visual inspection, our RF model, which does not rely on stellar parameters, also works well for these spectra, even though the RVS sample is dominated by late-type stars. Figure 10 presents the slightly linear trend of λhelio around the GAC, reflecting the projection of the galactocentric rotation of the DIB carrier (Zasowski et al. 2015). Some tiny deviations of λhelio from the linear trend can be seen. In addition to the fitting uncertainty of λ8621, the turbulent motion in the DIB cloud and the possible physical changes of DIB shapes and positions (see e.g. Galazutdinov et al. 2008; Krełowski et al. 2021) probbably also contribute. The applied statistical method might reduce these effects if no strong systematic deviations existed. A least-square linear fit to λhelio and the angular departure from the GAC (ℓ) obtained an intercept of 8623.446 ± 0.030 Å. The uncertainty was estimated by a 2000 times Monte Carlo simulation according to the fit uncertainty of λ8621 We note that the mean error of λ8621 from the MCMC fitting is 0.256 Å for the 77 selected DIB measurements, which is larger than the statistical uncertainty by an order of magnitude. A factor of c/(c + U) was used to correct for the effect of solar motion, where c is the speed of light and U = 10.6 km s−1 (Reid et al. 2019) is the radial solar motion toward the GC. Finally, we obtained a λ0 = 8623.141 ± 0.030 Å for DIB λ8621, which is perfectly consistent with the result of AS23 (8623.14 ± 0.087 Å), although we did not consider the distance calibration proposed in AS23. This value is nevertheless lower than that of GDR3 (8623.23 ± 0.019 Å) by 3.0σ using our uncertainty for λ0 (0.030 Å). Our derived λ0 corresponds to 8620.766 Å in air wavelength, which is consistent with most of the literature results within 2σ, such as 8620.7 Å of Sanner et al. (1978), 8620.75 Å of Herbig & Leka (1991), 8620.79 Å of Galazutdinov et al. (2000), 8620.7 Å of Munari et al. (2008, after the correction of the solar motion), and 8620.83 Å of Zhao et al. (2021).

We applied the same selection criteria to 9763 cases in the HQ DIB catalog in GDR3 measured in the RVS spectra that are published in Gaia DR3 and obtained 67 DIB measurements. The derived λ0 is 8623.368 ± 0.037 Å , which is even larger than the result of GDR3 with 0.14 Å (3.8σ). This result suggests that compared to the full Gaia RVS data set, the use of the public sample in DR3 would lead to a redshift of λ0. Therefore, the smaller λ0 determined in this work and in AS23 is not caused by the selection bias of the sample, but by the systematic difference of λ8621. As pointed out by AS23, λ8621 in GDR3 would be affected by the improperly modeled stellar lines. On the other hand, it is not clear whether the λ0 derived in this work and AS23 with the public RVS sample is also redder than the true value. We expect that this problem could be answered by the following analysis of GFPR or the DIB measurements in Gaia DR4.

With the derived λ0, we calculated the radial velocity of the carrier of DIB λ8621 (VDIB) in the local standard of rest (LSR)4. We selected 3592 DIB measurements at low latitudes (|b| < 5°) and with accurate λ8621 (err(λ8621) < 0.5 Å) and Vstar (err(Vstar) < 5 km s−1) to present the variation in VDIB along with the Galactic longitude. The rotation of the DIB λ8621 carrier is clearly seen in the upper panel of Fig. 11, overlaid with the theoretical rotation curves with different distances from the Sun. Specifically, for a distance from the Sun (d), the galactocentric distance is calculated as RGC=(d2+R022dR0cos())1/2${R_{{\rm{GC}}}} = {\left( {{d^2} + R_0^2 - 2d \cdot {R_0} \cdot \cos (\ell )} \right)^{1/2}}$, where R0 = 8.15 kpc is the galactocentric distance of the Sun. Then the circular velocity (Θ) is predicted by model A5 in Reid et al. (2019) with RGC and , assuming b = 0. Finally, the radial velocity for a given d and is Vr = (Θ/R − Θ0/R0) · R0 · sin(), where Θ0 = 236 km s−1 is the circular velocity of the Sun.

When we consider the median VDIB in each Δ = 10° bin (red dots in Fig. 11), the carrier of λ8621 in the selected sample is mainly located within a kinematic distance of 2 kpc from the Sun, although the velocities from individual DIB measurements are much more scattered. This is a reasonable interpretation based on inspecting the mean distances to the background stars of these DIB signals, which are all larger than 2 kpc with a minimum of 2.3 kpc. Moreover, the DIB carriers toward the GAC have a larger distance on average than those toward the GC. In the lower panel in Fig. 11, the median VDIB is compared to the longitude-velocity map of 12CO J = (1−0) emission from Dame et al. (2001). We made use of the momentum-masked cube restricted to a latitude range of ±5°5. The median VDIB follows the 12CO velocity curve in the local region, especially from ≈ −90° to 1 ≈ −180°. The average VDIB deviation in each longitude bin is 13.8 km s−1, which prevents the exploration of a finer relation between DIB λ8621 and 12CO in velocity structures. Nevertheless, more DIBs with large VDIB in general can be found in the regions with high-velocity 12CO emission. For instance, the 12CO velocities between 90° and 180° concentrate in two main branches that can be interpreted as the Local and Perseus spiral arms (e.g. Reid et al. 2019). Although AS23 suggested that some DIB measurements coincide with the 12CO emission in the Perseus arm, the density distribution of VDIB there did not bifurcate. These cases consequently only represent a very small percentage of the total. For ≈−60° to 0°, VDIB coincides well with some discrete 12CO emission at −20 to −10 km s−1, suggesting that these DIB signals would originate in the Local arm.

thumbnail Fig. 10

Observed central wavelengths in vacuum of DIB λ8621 in the heliocentric frame (λhelio) as a function of the angular distance in longitude from the Galactic anticenter (∆) for 77 DIB measurements in this work. The black dots are the individual measurements with the fit uncertainties. The red line is the linear fit to the black dots.

thumbnail Fig. 11

Longitude-velocity diagram for DIB λ8621 and 12CO. Upper panel: variation of the radial velocity of DIB λ8621 carrier (VDIB) along with the Galactic longitude for 3592 selected DIB measurements. The points are colored by their number density estimated by the Gaussian KDE. The red dots with error bars are the median VDIB calculated in each longitude bin with a step of 10°. The colored lines are theoretical rotation curves calculated with the rotation model in Reid et al. (2019) with different distances from the Sun. Lower panel: median VDIB superimposed on the longitude-velocity map of 12CO J = (1−0) emission from Dame et al. (2001). The color scale displays the 12CO latitude-integrated intensity on a logarithmic scale.

4.4 Detection of DIB λ8648 in individual RVS spectra

The DIB signal around 8648 Å was first reported and measured by Sanner et al. (1978), with positive support from Herbig & Leka (1991), Jenniskens & Desert (1994), Wallerstein et al. (2007), and Munari et al. (2008), but it was missed in the DIB survey of Galazutdinov et al. (2000) and Fan et al. (2019). These inconsistent results could be due to the difficulties in measuring a weak and broad DIB feature. The high-resolution spectra of early-type stars used in these previous studies would introduce uncertainties in the continuum placement when measuring broad DIBs like this (Sonnentrucker et al. 2018). Furthermore, stellar lines would cause contamination even for early-type stars, such as the very strong Paschen 13 line (see Fig. 1 in Munari et al. 2008) and the He I line at 8648.3 Å reported by Krełowski et al. (2019) in a B-type star (HD 169454). Under these effects, any conclusions about this DIB signal could be distorted in case studies with early-type stars.

Compared to early studies, the large number of Gaia RVS spectra allows us to systematically investigate this signal with a much larger spatial coverage. Based on the BNM, we successfully modeled the stellar components of late-type stars and detected the DIB signal near 8648 Å in stacked RVS spectra in Zhao et al. 2022 and GFPR. We cite this DIB as λ8648 following the suggestion in Jenniskens & Desert (1994), but we obtained a smaller λ0 as 8645.3 ± 1.4 Å in Zhao et al. (2022). The profile of λ8648 was found to be very shallow and broad. In this work, we further verified that DIB λ8648 can be detected in individual RVS spectra, although their S/N are much lower than those after stacking. Figure 5 already shows the clear profile of λ8648 in four ISM spectra. In this section, we selected 179 measurements from the DIB catalog for a statistical study of λ8648 and its correlation with λ8621, with strict criteria: 𝒟8648 > 3RC, 8640 < λ8648 < 8655 Å, 8 < σ8648 < 12 Å, and S /N > 50.

The high consistency between the fit and integrated EW8648 (Fig. 12a) demonstrates that the DIB profile of these cases was properly fit. The general decrease of S/N for large EW8648 caused the slight deviation. It should be noted that the fit and integrated EW8648 shown in Fig. 12a were simply calculated outside the masked region between 8660 and 8668 Å where Ca II residuals would exist (see, e.g., Fig. 5). This means that they are smaller than the EW8648 shown in Fig. 12b that was directly calculated based on the fit DIB parameters.

EW8621 and EW8648 present a linear correlation with a Pearson coefficient (rp) of 0.82 (Fig. 12b). However, the EW8621/EW8648 ratio in this work is systematically lower than that in GFPR, especially for small EW8648. The cause could be a detection bias due to the limited S/N of individual RVS spectra. For 𝒟8648 is only about one-third of 𝒟8621, weak λ8648 is harder to be detected at a given magnitude of S/N than weak λ8621, resulting in a lack of measurements with large EW8621/EW8648. This effect should be less pronounced for GFPR because the S/N of the stacked ISM spectra was far higher. On the other hand, after visually checking the extreme measurements with EW8648 ≳ 0.8 Å but EW8621 ≲ 0.2 Å, we found that the jagged noise within the very broad profile of λ8648 could lead to an overestimation of EW8648. There is another possibility. If the carriers of λ8621 and λ8648 have different spatial distributions and the λ8648 carrier is more compact, the stacking of ISM spectra in a 3D volume will obtain a smaller EW8648 (lower mean abundance) than EW8621. Nevertheless, without a map of their distribution, we cannot analyze the extent of this effect. Additionally, EW8621 / EW8648 would also vary from one sightline to the next and present different relations with different samples.

The measured central wavelength of the two DIBs also presents a moderate linear relation with rp = 0.68 (Fig. 12c), but their FWHM is noise dominated, especially for λ8648. With a linear fit to λ8621 and λ8648, we made a rough estimate of λ0 for λ8648 as 8646.31 Å in vacuum, assuming that the carriers of λ8621 and λ8648 are comoving. This value corresponds to 8643.93 Å in air, which is much smaller than previous suggestions, such as 8650 Å by Sanner et al. (1978), 8648.28 Å by Jenniskens & Desert (1994), and 8649 Å by Herbig & Leka (1991). The difference between this result and Zhao et al. (2022) is 1.01 Å, slightly larger than the mean uncertainty of λ8648 of the cases we used of 0.69 Å. Figure 13 shows the correlation between EW8648 and AV from Green et al. (2019) for 93 cases (others are beyond the sky coverage of Green et al. 2019). A moderate linear correlation can be found with rp = 0.70. Compared to λ8621, DIB λ8648 presents a worse correlation with dust reddening, which has been noted in Zhao et al. (2022) and GFPR. Similar to EW8621 /EW8648, the AV/EW8648 ratio in this work is systematically lower than that in GFPR.

thumbnail Fig. 12

Correlations between DIBs λ8621 and λ8648 for 179 selected measurements for (a) fit and integrated EW8648 outside the masked region between 8660 and 8668 Å; (b) fit EW; (c) measured central wavelength; and (d) FWHM. The dashed green line in panel a traces the one-to-one correspondence. The colored points in panel b are the results from GFPR. The color in panel c represents the number density estimated by the Gaussian KDE, and the red line is a linear fit to all the data points. The Pearson coefficient (rp) of the correlation between the parameters of λ8621 and λ8648 for the 179 selected measurements is indicated in panels b and c.

4.5 Reassessing the DIB measurements in Gaia DR3

When we consider the small distances to the background stars (the median distance is 1.31 kpc) and the moderate S/N of the RVS spectra (the median S/N is 115.3) for the HQ sample of GDR3, its measured λ8621 and σ8621 should present a quasi-Gaussian distribution centered on λ0 and the mean Gaussian width with a dispersion due to the uncertainties and the Galactic rotation (like the distribution seen in Fig. 6 for this work and AS23). However, a strong dependence between λ8621 and σ8621 is clearly seen for the HQ sample of GDR3 (see Fig. 1 in AS23), which would be attributed to the improperly modeled stellar lines. On one hand, for small σ8621 (≲1 Å) and large λ8621 (around 8624–8626 Å), the fittings might be purely pseudofittings because the residuals of Fe I lines there would be stronger than the DIB features. On the other hand, the increase in σ8621 with λ8621 shifting shortward (8622–8624 Å) implies a broadening of the DIB profile caused by the noise and stellar residuals as more stellar lines at shorter wavelength in the vicinity of the DIB signal.

Compared to GDR3, the ISM spectra derived by the data-driven methods in this work and AS23 are less strongly influenced by the stellar residuals. Therefore, to estimate the magnitude of the biases caused by the stellar residuals, we compared the DIB parameters from GDR3 to those from this work and AS23 fit in the same RVS spectra, specifically, 1518 cases between GDR3 and this work and 3167 cases between GDR3 and AS23, as well as 2000 cases between this work and AS23 as a control group. We only considered the highest level of the DIB quality flag in GDR3 (i.e., QF = 0; see Sect. 2 in GDR3 and Sect. 8.9 in Recio-Blanco et al. 2023 for details). The differences in DIB parameters as a function of the fit values are shown in Fig. 14, and their statistics, including the median difference (MED), the root-mean-square difference (RMSD), and the absolute difference not exceeded by 90% of the sources (AD90), are presented in Table 1. The 𝒟8621 for AS23 (not given in their catalog) was calculated by their λ8621 and EW8621 with a Gaussian function.

The impact of the stellar residuals causes a systematic shift of λ8621 in GDR3, which is clearly seen as the systematic variation of Δλ8621 with λ8621 in Fig. 14. This phenomenon coincides with the λ8621σ8621 dependence discussed above. The MED of Δλ8621 is much larger for this work (0.073 Å) than for AS23 (0.018 Å), but the RMSD is similar (0.35 Å) and close to the pixel size, corresponding to 12 km s−1. This value is comparable to the mean uncertainty of λ8621 (0.376 Å) in GDR3 for the joint samples as well. When we consider AD90, the maximum shift for most λ8621 is about 0.56 Å (~19 km s−1), 1.5 times larger than the mean uncertainty. As a comparison, λ8621 in this work and in AS23 are highly consistent with each other, with a median difference of only 0.008 Å and with a halved RMSD and AD90. Nevertheless, Δλ8621 between AS23 and this work also presents a weak dependence on λ8621, which could be due to a weak stellar impact, even though most Δλ8621 are smaller than the RVS wavelength pixel size used in this work.

The overestimated σ8621 in GDR3 has a MED of 0.164 Å compared to this work and tends to become larger with increasing σ8621. The RMSD (0.469 Å) is slightly larger than the mean uncertainty of σ8621 in GDR3 (0.405 Å) and the AD90 reaches over 0.7 Å. As a comparison, σ8621 measured in this work is larger than that in AS23, with a nearly constant difference (a MED of –0.293 Å). In addition to the overestimated σ8621, 𝒟8621 in GDR3 is smaller than that in this work, which is different, and Δ𝒟8621 presents an increasing trend with 𝒟8621 as well. Additionally, EW8621 are highly consistent with each other between GDR3 and this work, with a MED of −0.002 Å and a RMSD of 0.030 Å, comparable to the mean EW uncertainty (0.024 Å) in GDR3. Overall, we propose that the impact of stellar residuals led to a distortion of the DIB profile in GDR3, which became slightly shallower and broader. The center of the profile was also shifted to one or two pixels at most for the joint sample, but the area of the profile (EW8621) remained unchanged.

The EW8621 in this work is systematically larger than that in AS23, and the difference increases further with the fit values and can reach around 0.1 Å for EW8621 ~ 0.3 Å. The mean ΔEW8621 is 27.1% relative to our measurements, much larger than the mean uncertainty of EW8621 (12.9%). Because AS23 and this work have consistent λ8621 and nearly constant Δσ8621, the rise in ΔEW8621 might be caused by different ML algorithms that model the DIB depth in different ways. Moreover, AS23 modeled the profile of λ8621 with a Gaussian function, but we added a Lorentzian function for λ8648 and a linear continuum accounting for RVS spectra with a poor normalization. Nevertheless, we note that GDR3 made use of synthetic spectra from stellar models and a simple Gaussian fitting, but their EW8621 is highly consistent with that in this work. Thus, the influence of the DIB model is probably not so significant. The last factor is the fitting technique. Specifically, the ISM spectrum was first derived in this work, and then the DIB profile was modeled, while in AS23, the DIB profile was implemented as a pixel-by-pixel covariance matrix, together with the stellar components and the noise, and was optimized in a set of grids. Despite the systematic difference in EW8621, the span of the 16th to the 84th percentiles of ΔEW8621 (a measure of the magnitude of the dispersion deducting the tendency) is similar to that between GDR3 and this work and that between AS23 and this work for EW8621 < 0.3 Å.

thumbnail Fig. 13

Correlation between EW8648 and AV from Green et al. (2019) for 93 selected measurements. The underlying points are the results from GFPR, colored by the number density. The Pearson coefficient (rp) for the red dots is indicated as well.

Table 1

Statistics of the differences in DIB parameters between GDR3, AS23, and this work.

thumbnail Fig. 14

Difference in DIB parameters (𝒟8621, λ8621, σ8621, and EW8621) between GDR3, AS23, and this work as a function of the fit values for the joint samples. The gray scale indicates the number density of the data points estimated by the Gaussian KDE. The data are binned with a step of 0.005 for 𝒟8621, 0.2 Å for λ8621, 0.1 Å for σ8621, and 0.01 Å for EW8621. The solid red lines in each panel represent the median differences in each bin, and the two dashed red lines show the 16th and 84th percentiles.

5 Discussion

5.1 DIB λ8621 correlates better with neutral than with molecular hydrogen

Motivated by the good consistency and the multimodality found between DIB λ8621 and 12CO in velocity structures, AS23 directly compared VDIB and VCO by a peak-finding method. Specifically, signals of λ8621 and 12CO were simply matched by the position of the background stars and the space grid of 12CO map (a resolution of 0.125° for Dame et al. 2001). For any detected 12CO emission within 1σ of VDIB, VDIB was then compared to the intensity-weighted VCO calculated within nine velocity channels around VDIB (see Sect. 3.4.2 and Appendix E in AS23 for details).+ With a linear fit restricted to |VCO| < 25km s–1, they obtained a slope of 0.95, suggesting that the DIB λ8621 carrier and 12CO are comoving, and a small intercept of –0.52 km s–1 that validated their λ0 estimation.

We followed their peak-finding method, but only considered the VCO at a peak temperature of 12CO toward each sightline and made use of 7267 DIB measurements with |b| < 30°, err(λ8621) < 1Å, and err(Vstar) < 5 km s–1. We further compared VDIB to VH I using H I data from HI4PI (HI4PI Collaboration 2016). Finally, we found 537 cases with matched λ8621 and 12CO in velocity and 1929 for λ8621 and H I. With a cross match in velocity, it is not surprising to find a strong one-to-one relation between λ8621 and 12CO as in AS23, as well as between λ8621 and H I, even with a slope closer to 1 (see the upper and middle panels in Fig. 15). The lower panel in Fig. 15 shows an example to illustrate the peak-finding method. The 12CO emission is narrow and compact mainly within ±5 km s–1, while the uncertainty of VDIB (9.78 km s–1) is much larger than the velocity span of12 CO. On the other hand, the H I emission covers a much wider velocity range and contains multiple components that cannot be resolved in the RVS spectra. With the limited accuracy of VDIB and the strong bias of the peak-finding method, it is hard to conclude that the perfect association between 12CO and the carrier of DIB λ862l implies a dumpiness of the DIB carrier. The associated velocity between DIB λ862l and 12CO, as well as H I, is more likely a result of the general Galactic rotation of these gaseous ISM species at a similar distance.

Based on the velocity-matched DIB – 12CO and DIB – H I pairs, we made a coarse investigation of the correlation between the DIB strength and the hydrogen abundance for both neutral hydrogen (NH I and molecular hydrogen (NH2${N_{{{\rm{H}}_2}}}$). We calculated their abundance as NH I = 1.823 × 1018 × IH I (HI4PI Collaboration 2016) and NH2=2×1020×WCO${N_{{{\rm{H}}_2}}} = 2 \times {10^{20}} \times {W_{{\rm{CO}}}}$ (Bolatto et al. 2013), where IH I and WCO are the velocity-integrated intensity calculated with nine velocity channels around their matched VDIB· This analysis was based on the assumption that ISM species with similar radial velocities are mainly located at a similar distance, so that the DIB features can be compared to the corresponding H I and 12CO emission, with a narrow-range integration to deduct the foreground and background contamination. This is certainly an ideal assumption. As shown in Fig. 16 on a logarithmic scale, a moderate linear correlation is EW8621 and NH I (rp = 0.74), while EW8621 is not sensitive to NH2(rp=0.21)${N_{{{\rm{H}}_2}}}\left( {{r_p} = 0.21} \right)$. Therefore, the carrier of λ862l would correlate much better with neutral hydrogen than with molecular hydrogen. Although EW8621 is proportional to the column density of the carrier only between the background star and us and the H I and 12CO observations may trace the hydrogen abundance in a much wider distance range, the narrow integration range around VDIB seems to alleviate this influence.

A set of strong optical DIBs was reported to tightly correlate with NH I, but to only present a loose correlation with NH2${N_{{{\rm{H}}_2}}}$ when NH2>1020 cm2${N_{{{\rm{H}}_2}}} > {10^{20}}{\rm{c}}{{\rm{m}}^{ - 2}}$ (e.g. Herbig 1993; Friedman et al. 2011; Lan et al. 2015). The relation between EW8621 and NH I revealed by our coarse analysis corresponds to this inference. In particular, Friedman et al. (2011) derived a tight correlation between DIB λ5780 and NH I, and the range of NH I, where they found that the correlation (20 ~ 21.5 cm–2) is similar to ours (see the dashed black line in the upper panel of Fig. 16). According to Fan et al. (2019), the E(BV)-normalized EW of λ5780 is twice as high as that of λ862l. Hence, the larger EW8621 than EW5780 at a given NH I seen in Fig. 16 would be caused by an underestimation of NH I in our analysis due to the narrow integration range. Nevertheless, we did not find a loose correlation between EW8621 and NH2${N_{{{\rm{H}}_2}}}$ even for NH2>1020 cm2${N_{{{\rm{H}}_2}}} > {10^{20}}{\rm{c}}{{\rm{m}}^{ - 2}}$, although the fit line of the EW5780NH2${\rm{E}}{{\rm{W}}_{5780}} - {N_{{{\rm{H}}_2}}}$ relation in Friedman et al. (2011) crosses the highest-density region of our sample for log [EW8621] and log[ NH2 ]$\log \left[ {{N_{{{\rm{H}}_2}}}} \right]$ (see the dashed black line in the lower panel of Fig. 16). The possible variation in the XCO factor and the saturation problem of 12CO would further hamper the investigation of the correlation between EW8621 and NH2${N_{{{\rm{H}}_2}}}$.

thumbnail Fig. 15

Correlation in velocity between DIB 18621, 12CO and H I. Upper and middle panels: correlation between the DIB λ8621 velocity (VDIB) and the in tensity-weighted velocity of the nearest12 CO (Dame et al. 2001) and H I (HI4PI Collaboration 2016) component. The color scale indicates the number density of the data points estimated by the Gaussian KDE. The error of VDIB is simply estimated by the uncertainty of λ8621. The red line is the linear fit to the LSR velocities. The fit slope (α) and intercept (t), as well as the number of the points, are indicated. The dashed green line traces the one-to-one correspondence. Lower panel: example of the peak-finding method (see Sect. 5.1). The red and blue lines are the spectra of 12CO and H I, respectively, toward the same sightline. The red star indicates the LSR velocity of the DIB signal detected on this sightline. The Galactic coordinate (,b) and the Gaia source ID of the background star are marked.

thumbnail Fig. 16

Correlation between EW8621 and NH I (upper panel) and between EW8621 and NH2${N_{{{\rm{H}}_2}}}$ (lower panel) on a logarithmic scale. The Pearson coefficient (rp) is indicated. The color scale indicates the number density. The dashed lines are the linear fit results from Friedman et al. (2011), but for DIB λ5780.

thumbnail Fig. 17

Distribution of the estimated completeness for DIBs λ8621 (upper panel) and λ8648 (lower panel) as a function of EW and σDIB based on the results of the injection tests (see Appendix B).

5.2 Completeness of the DIB catalog

With he injection test, AS23 estimated the completeness of their DIB catalog by selecting good measurements with Δ𝒟 < 20%, ΔσDIB < 20%, and ΔλDIB < 2.5 Å (differences between the fit and injected values; see Appendix F in AS23). For our selected DIB catalog, the mean uncertainty of 𝒟8621 and σ8621 is about 10% and the Δλ8621 is mainly within 0.5 between this work and GDR3 (see Table 1 and Sect. 4.5). Hence, we estimated the completeness of our DIB catalog with more rigorous criteria, that is, Δ𝒟 < 10%, ΔσDIB < 10%, and ΔλDIB < 0.5 Å, based on the results of the injection test (see Appendix B). The distribution of the estimated completeness for DIB λ8621 as a function of the injected EW8621 and σ8621 (upper panel in Fig. 17) is similar to that in AS23. The completeness generally decreases with EW8621, but its variation with σ8621 is not as clear as in AS23 because the sample we used in the injection test is smaller. In our DIB catalog, the median EW8621 is 0.2 Å and its 16th and 84th percentiles are 0.1 and 0.4 Å. The mean completeness at EW8621 ~ 0.2 Å is about 65% and between 0.1 and 0.4 Å is 68%. This is an optimistic estimate because the injection test was simple and ideal. Nevertheless, this percentile is still much higher than the fraction of the joint sample to the total between this work and AS23 (~25%), which could be a result of the selection bias on the DIB catalog (see Appendix D for a detailed discussion).

The estimated completeness for DIB λ8648 depends much less on EW8648 and σ8648, indicating a stronger influence of the correlated noise and stellar residuals on the completeness of DIB λ8648 measurements. At EW8648 ~ 0.5 Å, the mean completeness is only 25%. If we were to use loose criteria, the same as those in AS23, the completeness would increase to 53%. Although the total number of the spectra in the target sample that truly contain DIB λ8648 signals is unknown, we apparently did not obtain enough DIB λ8648 measurements (only 179 after quality filtering) even in the selected DIB catalog. One possible reason is that the selection criteria are too rigorous for DIB λ8648. Another reason is that DIB λ8648 does not exist in every spectrum with DIB λ8621 signals, that is, fewer DIB λ8648 sightlines can be detected than that for DIB λ8621. The completeness for DIB λ8648 would be overestimated as well.

Based on the estimated completeness and the comparison to AS23 (Appendix D), the DIB catalog in this work would be very pure, but only little complete. It is a critical problem to increase both the purity and the completeness of the DIB catalog. However, our RF model cannot estimate the uncertainty of its predictions meaningfully, and this is a common failure for the ML approach. We will seek more models and more intelligent injection tests or simulations in following works.

6 Summary and conclusions

We developed a random forest model for building the stellar templates within the DIB window (8600–8680 Å) using the part of the spectra outside the window. This method can be treated as an improvement of the best-neighbor method developed in Kos et al. (2013) and was applied to the RVS spectra published in Gaia DR3. The training set constituted 21 974 spectra with E(B – V) ⩽ 0.02 mag, |b| ⩾ 30°, and S/N > 50. After subtracting the stellar components by the generated templates, we fit DIB λ8621 by a Gaussian function and DIB λ8648 by a Lorentzian function as well as a linear continuum for 780 513 target spectra. These target spectra had a mean S/N of 58, and 90% of the spectral S/N was below 116. The mean distance of the background stars is 1.58 kpc, and 90% of them are located within 3.61 kpc.

Considering χdof2$\chi _{{\rm{dof}}}^2$, noise level (S/N and RC), and the constraints on λ8621 and σ8621, we selected 7619 reliable measurements for DIB λ8621 (the DIB catalog can be accessed via the CDS database). Their EW8621 presented a moderate linear correlation with dust reddening from both Andrae et al. (2023) and Green et al. (2019), and the mean EW8621/AV ratio was consistent with our previous results in GDR3 and GFPR. Using 77 DIB measurements toward the GAC and an assumption of a circular orbit, we determined an updated rest-frame wavelength of DIB λ8621 as λ0 = 8623.141 ± 0.030 Å in vacuum, corresponding to 8620.766 Å in air, which is perfectly consistent with the result in AS23, but bluer than that in GDR3. Calculated by λ0, VDIB in LSR showed a wave pattern with the Galactic longitude, revealing the projected Galactic rotation of the carrier of DIB λ8621. The median VDIB also correlated with the 12CO velocity structures in the local region, especially for the outer Galactic disk. With the peak-finding method used in AS23 and a narrow-range integration, we compared EW8621 to the neutral (NH I, from HI4PI Collaboration 2016) and molecular (NH2${N_{{{\rm{H}}_2}}}$, represented by 12CO) hydrogen column densities. This was a coarse analysis, but we found that EW8621 correlated much better with NH I (rp = 0.74) than NH2(rp=0.21)${N_{{{\rm{H}}_2}}}\left( {{r_p} = 0.21} \right)$, which was consistent with the conclusions for strong optical DIBs in previous studies.

With rigorous quality control, we obtained 179 reliable measurements of DIB λ8648 in individual RVS spectra, which further confirmed this very broad DIB feature. Its EW and central wavelength both presented a moderate linear relation with those of DIB λ8621. The λ0 of DIB λ8648 was estimated as 8646.31 Å in vacuum, corresponding to 8643.93 Å in air, assuming that the carriers of λ8621 and λ8648 are comoving.

By comparing the DIB parameters in GDR3, in AS23, and in this work, we confirmed the impact of the stellar residuals on the DIB measurements in Gaia DR3 argued by AS23. The stellar impact leads to a distortion of the DIB profile, resulting in an underestimation of 𝒟8621 and in an overestimation of σ8621. The center of the DIB profile might also be shifted (≲0.5Å), but EW8621 was consistent with our new measurements in this work with a median difference of only –0.002 Å and an RMSD of 0.030 Å.

EW8621 in this work is systematically larger than that in AS23 and the difference further increases with the fit EW. The reason for the difference might be the different ML algorithms and fitting techniques used in the two works. The selection bias of the DIB catalog was clearly revealed by the crossed groups between AS23 and this work. The DIB catalog is very pure, but has a low completeness. In the following works, we will apply more ML algorithms to different survey data and investigate their consistency and/or systematic differences.

Acknowledgements

We thank the anonymous referee for very helpful suggestions and constructive comments. This work has made use of data from the European Space Agency (ESA) mission Gaia (https://www.cosmos.esa.int/gaia), processed by the Gaia Data Processing and Analysis Consortium (DPAC, https://www.cosmos.esa.int/web/gaia/dpac/consortium). Funding for the DPAC has been provided by national institutions, in particular, the institutions participating in the Gaia Multilateral Agreement. This work is supported by the National Natural Science Foundation of China (grant No. 12203099). H.Z. is funded by the China Postdoctoral Science Foundation (No. 2022M723373) and the Jiangsu Funding Program for Excellent Postdoctoral Talent. H.Z. acknowledges the helpful discussions with Dr. Jianjun Chen, Dr. Biwei Jiang, and Xiaoxiao Ma. T.Z. acknowledges financial support of the Slovenian Research Agency (research core funding No. P1-0188) and the European Space Agency (Prodex Experiment Arrangement No. 4000142234).

Appendix A Performance of the validation set

Figures A.1 and A.2 show the distribution of residuals between the observed and modeled normalized fluxes of 7324 RVS spectra in the validation set. The performance of the validation set and the selection of the best RF model are discussed in detail in Sect. 3.1.

thumbnail Fig. A.1

Mean residuals between the observed and modeled normalized fluxes taken for 7324 RVS spectra in the validation set at each wavelength pixel in the DIB window as a function of the spectral wavelength for different parameter pairs of n_estimators (NE) and max_depth (MD) in the RF model (scikit-learn). The error bar represents the standard deviation of the residuals of individual spectra at each wavelength pixel. The mean value (Δ) and standard deviation (σ) for all the pixels are marked in each panel. All the values are enlarged by a factor of 104.

thumbnail Fig. A.2

Mean of the absolute residuals (MAR) between observed and modeled normalized fluxes, taken along the wavelength in the DIB window, for each RVS spectrum in the validation set (7324) as a function of the spectral S/N. The color represents the number density of the spectra. The dashed green lines indicate MAR = 0.01 and S/N = 100 in each panel.

Appendix B Injection test

To validate our RF model and the DIB fittings, we performed an injection test following the principles in AS23. For each observed RVS spectrum with S/N > 50 in the testing set (a total of 7324), we added a pair of synthetic DIB signals for both λ8621 and λ8648. The DIB parameters were uniformly sampled: 𝒟8621 ∈ [0.01, 0.15], λ8621 ∈ [8618, 8628] Å, σ8621 ∈ [0.6, 3.0] Å, λ8648 ∈ [8640, 8650] Å, and σ8648 ∈ [8, 12] Å, except for 𝒟8648, which was fixed as half of 𝒟8621. The ISM spectra were then derived from the observed spectra (plus synthetic DIBs) and the stellar templates predicted by the RF model. Finally, we fit the two DIBs with the same model and methods for real target spectra (Sect. 3.2). We also computed Z-scores, defined as (fit-injected values)/reported uncertainty, to quantify the bias of the fit values and the uncertainties.

The distribution of EW8621 Z-score is almost perfectly uniform with respect to the injected parameters (EW8621, λ8621, and σ8621) and the S/N of ISM spectra (see Fig. B.1), with a slight bias at the end of the injected EW8621 and S/N. Distributions like this are also found for the Z-score of λ8621 and σ8621, and they are highly consistent with the results in AS23. The larger variation of the mean differences that varies with injected DIB parameters and S/N (solid red lines in Fig. B.1) compared to AS23 are probably due to the smaller sample used in this work for the injection test. Because of the higher S/N cut (S/N ⩾ 50) for the testing set, we do not capture the bias of Z-score at low S/N reported in AS23. Furthermore , the positive bias of σ8621 at large injected EW8621 is not as strong as in AS23, but instead, there are tiny negative biases of λ8621 and σ8621 with respect to EW8621 and S/N.

With the increase in the injected EW8621 , the mean absolute difference (MAD) of EW8621 Z-score (solid white line in Fig. B.1) becomes larger than 1, indicating that the uncertainty of EW8621 is underestimated. The increasing ΔEW8621 is also found in the comparison between fit and integrated EW8621, which is caused by the increasing noise and stellar residuals in low-S/N ISM spectra (see Sect. 4.1 and Fig. 8). In the injection test, DIB features were randomly added to the observed spectra, while in fact, strong DIB signals are usually found in the spectra with generally lower S/N. Thus, the underestimation of the EW8621 uncertainty would be stronger for the target spectra. In contrast, the uncertainties of λ8621 and σ8621 seem to be overestimated, considering their Z-score variations with injected parameters.

The injection test was also applied to λ8648 (Fig. B.2). Compared to λ8621, stronger negative biases are found for EW8648, as well as positive biases for λ8648. Moreover, the uncertainties of EW8648, λ8648, and σ8648 are clearly underestimated, and the magnitude of the overestimation tends to increase with S/N. The reason probably is that the fitting of λ8648 is more easily affected by the correlated noise in its very broad profile.

Overall, we obtained similar distributions of the Z-scores with respect to the injected DIB parameters and S/N to those in AS23, which primarily verifies the reliability and accuracy in our DIB fitting. The biases in the fitting of DIB λ8621 are tiny for EW8621, λ8621, and σ8621. The reported errors drawn from the MCMC samplings can successfully describe the uncertainties of λ8621 and σ8621 (a slight overestimation), while the reported uncertainty for EW8621 probably is slightly underestimated (5%–10%). The fitting to λ8648 contains stronger biases and larger underestimated uncertainties because this shallow and broad DIB is more difficult to fit than λ8621. The injection test is still simple and ideal because we made use of a set of spectra with high S/N (⩾50) and fixed the ratio of 𝒟8621/𝒟8648 = 2.

thumbnail Fig. B.1

Distributions of the Z-scores computed from the injection tests of EW8621, λ8621, and σ8621 as a function of the injected DIB parameters and the S/N of the ISM spectra. The color scale indicates the number density estimated by the Gaussian KDE. In each panel, the solid red line indicates the variation in the mean differences of Z-scores with the injected DIB parameters and the S/N. The solid white line indicates the mean absolute differences of the Z-scores. The dashed red and white lines provide a reference for Z-score equals 0 and ±1.

thumbnail Fig. B.2

Same as Fig. B.1, but for DIB λ8648.

Appendix C 77 DIB measurements to determine λ0

Table C.1

Information of the 77 selected DIB measurements for determining λ0 for DIB λ8621 (see Sect. 4.3 for details).

Appendix D Selection effect on the DIB catalog

In addition to the systematic differences in DIB parameters (see Sect. 4.5), different selection criteria between this work and AS23 also result in very different selected DIB catalogs, that is, their joint group (2000 cases) only accounts for ~25% of the total for each. It is not plausible that so many DIB signals are detected in one work but not in another. Except for the χ2 used in both studies, our cuts are based on the 𝒟/RC and S/N of the ISM spectra to control the noise level, as well as on additional constraints on λ8621 and σ8621, while the cuts in AS23 were based on the DIB S/N defined by Δχ2 and the eigenvector in their model. We investigated the selection effect on the two DIB catalogs by comparing different crossed groups. Specifically, group 𝓐 contained 2000 DIB measurements detected in both AS23 and this work, group ℬ contained 5463 DIB measurements detected in AS23, but not in this work, and group C contained 5619 DIB measurements detected not in AS23 , but in this work. We only considered the 7463 DIB measurements in AS23 whose background stars are contained in our target sample. The σ8621λ8621 distribution and the EW8621AV correlation for different groups are shown in Figs. D.1 and D.2, respectively.

The σ8621 and λ8621 in this work presents a compact Gaussian distribution for group 𝓐, but a clear contamination for small σ8621 can be seen for group C. Although we cannot access the dropped cases in AS23 (no data for group C), for their full sample shown in Fig. 21 in AS23, there seems to be a contamination for σ8621 ≲ 1 Å as well, but these cases lead to a smaller λ8621 than in this work. Group B, with dropped measurements in this work, also presents a good σ8621λ8621 distribution, which clearly shows the selection bias of our DIB catalog. For AS23, group B shows a better σ8621 – λ8621 distribution than group A, which contains a weak distortion, while group ℬ contains measurements with large σ8621 ≳ 3 Å. The weak distortion could be the reason for the variation of Δλ8621 seen in Fig. 14 between AS23 and this work. When we compare groups ℬ and C, the cuts applied in AS23 can exclude the noisy cases with small σ8621, caused by the observational noise and/or the stellar residuals, more efficiently than ours.

For the EW8621AV correlation, the high-density regions (from red to yellow in Fig. D.2) present a consistent tendency among different groups for this work and AS23, with a similar magnitude of dispersion, despite of a systematic difference in mean EW8621 / AV ratio of AS23 and this work. The fewer scatter in groups Si and C demonstrates that our cuts can exclude more extreme cases with small AV but large EW8621 than AS23.

All groups with some selection criteria can present the proper properties for the DIB measurements (the σ8621λ8621 distribution and the EW8621AV correlation considered in this analysis), and they are self-consistent for one work with its own measurements. Hence, the DIB catalogs built in different studies are highly pure, but little complete. A catalog with more cuts (e.g., the joint sample between AS23 and this work) will obtain more accurate DIB measurements, but at the expense of the sample size.

thumbnail Fig. D.1

σ8621λ8621 distributions for the different groups defined in Appendix D. The DIB parameters in the first row come from this work, and those in the second row come from AS23. The color of the scattered points represents their number density estimated by the Gaussian KDE.

thumbnail Fig. D.2

Correlation between EW8621 and Av for the different groups defined in Appendix D. The DIB parameters in the first row come from this work, and those in the second row come from AS23. The color of the scattered points represents their number density estimated by the Gaussian KDE. The dashed red and blue lines show the linear fits to EW8621 and Av from GDR3 and AS23, respectively.

References

  1. Andrae, R., Fouesneau, M., Sordo, R., et al. 2023, A&A, 674, A27 [CrossRef] [EDP Sciences] [Google Scholar]
  2. Bailer-Jones, C. A. L., Rybizki, J., Fouesneau, M., Demleitner, M., & Andrae, R. 2021, AJ, 161, 147 [Google Scholar]
  3. Baron, D., Poznanski, D., Watson, D., Yao, Y., & Prochaska, J. X. 2015, MNRAS, 447, 545 [NASA ADS] [CrossRef] [Google Scholar]
  4. Bolatto, A. D., Wolfire, M., & Leroy, A. K. 2013, ARA&A, 51, 207 [CrossRef] [Google Scholar]
  5. Breiman, L. 2001, Mach. Learn., 45, 45 [NASA ADS] [CrossRef] [Google Scholar]
  6. Buder, S., Sharma, S., Kos, J., et al. 2021, MNRAS, 506, 150 [NASA ADS] [CrossRef] [Google Scholar]
  7. Campbell, E. K., Holz, M., Gerlich, D., & Maier, J. P. 2015, Nature, 523, 322 [NASA ADS] [CrossRef] [Google Scholar]
  8. Contursi, G., de Laverny, P., Recio-Blanco, A., & Palicio, P. A. 2021, A&A, 654, A130 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  9. Cox, N. L. J., Boudin, N., Foing, B. H., et al. 2007, A&A, 465, 899 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  10. Cox, N. L. J., Ehrenfreund, P., Foing, B. H., et al. 2011, A&A, 531, A25 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  11. Cropper, M., Katz, D., Sartoretti, P., et al. 2018, A&A, 616, A5 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  12. Dame, T. M., Hartmann, D., & Thaddeus, P. 2001, ApJ, 547, 792 [Google Scholar]
  13. Ebenbichler, A., Postel, A., Przybilla, N., et al. 2022, A&A, 662, A81 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  14. Fan, H., Welty, D. E., York, D. G., et al. 2017, ApJ, 850, 194 [NASA ADS] [CrossRef] [Google Scholar]
  15. Fan, H., Hobbs, L. M., Dahlstrom, J. A., et al. 2019, ApJ, 878, 151 [NASA ADS] [CrossRef] [Google Scholar]
  16. Foreman-Mackey, D., Hogg, D. W., Lang, D., & Goodman, J. 2013, PASP, 125, 306 [Google Scholar]
  17. Friedman, S. D., York, D. G., McCall, B. J., et al. 2011, ApJ, 727, 33 [NASA ADS] [CrossRef] [Google Scholar]
  18. Gaia Collaboration (Schultheis, M., et al.) 2023a, A&A, 680, A38 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  19. Gaia Collaboration (Schultheis, M., et al.) 2023b, A&A, 674, A40 [CrossRef] [EDP Sciences] [Google Scholar]
  20. Gaia Collaboration (Vallenari, A., et al.) 2023c, A&A, 674, A1 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  21. Galazutdinov, G. A., Musaev, F. A., Krelowski, J., & Walker, G. A. H. 2000, PASP, 112, 648 [NASA ADS] [CrossRef] [Google Scholar]
  22. Galazutdinov, G. A., LoCurto, G., & Krelowski, J. 2008, ApJ, 682, 1076 [Google Scholar]
  23. Gilmore, G., Randich, S., Asplund, M., et al. 2012, The Messenger, 147, 25 [NASA ADS] [Google Scholar]
  24. Górski, K. M., Hivon, E., Banday, A. J., et al. 2005, ApJ, 622, 759 [Google Scholar]
  25. Green, G. M. 2018, J. Open Source Softw., 3, 3 [Google Scholar]
  26. Green, G. M., Schlafly, E., Zucker, C., Speagle, J. S., & Finkbeiner, D. 2019, ApJ, 887, 93 [NASA ADS] [CrossRef] [Google Scholar]
  27. Hamano, S., Kobayashi, N., Kawakita, H., et al. 2022, ApJS, 262, 2 [NASA ADS] [CrossRef] [Google Scholar]
  28. Herbig, G. H. 1993, ApJ, 407, 142 [CrossRef] [Google Scholar]
  29. Herbig, G. H., & Leka, K. D. 1991, ApJ, 382, 193 [NASA ADS] [CrossRef] [Google Scholar]
  30. HI4PI Collaboration (Ben Bekhti, et al.) 2016, A&A, 594, A116 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  31. Hobbs, L. M., York, D. G., Snow, T. P., et al. 2008, ApJ, 680, 1256 [NASA ADS] [CrossRef] [Google Scholar]
  32. Hobbs, L. M., York, D. G., Thorburn, J. A., et al. 2009, ApJ, 705, 32 [Google Scholar]
  33. Jenniskens, P., & Desert, F. X. 1994, A&AS, 106, 39 [NASA ADS] [Google Scholar]
  34. Katz, D., Sartoretti, P., Guerrier, A., et al. 2023, A&A, 674, A5 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  35. Kos, J., Zwitter, T., Grebel, E. K., et al. 2013, ApJ, 778, 86 [NASA ADS] [CrossRef] [Google Scholar]
  36. Kos, J., Zwitter, T., Wyse, R., et al. 2014, Science, 345, 791 [NASA ADS] [CrossRef] [Google Scholar]
  37. Krełowski, J. 2018, PASP, 130, 071001 [CrossRef] [Google Scholar]
  38. Krełowski, J., Galazutdinov, G., Godunova, V., & Bondar, A. 2019, Acta Astron., 69, 159 [Google Scholar]
  39. Krełowski, J., Galazutdinov, G. A., Gnacinski, P., et al. 2021, MNRAS, 508, 4241 [CrossRef] [Google Scholar]
  40. Lan, T.-W., Ménard, B., & Zhu, G. 2015, MNRAS, 452, 3629 [Google Scholar]
  41. MacIsaac, H., Cami, J., Cox, N. L. J., et al. 2022, A&A, 662, A24 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  42. Majewski, S. R., Schiavon, R. P., Frinchaboy, P. M., et al. 2017, AJ, 154, 94 [Google Scholar]
  43. McKinnon, K. A., Ness, M. K., Rockosi, C. M., & Guhathakurta, P. 2023, arXiv e-prints [arXiv:2307.05706] [Google Scholar]
  44. Munari, U., Tomasella, L., Fiorucci, M., et al. 2008, A&A, 488, 969 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  45. Ochsenbein, F., Bauer, P., & Marcout, J. 2000, A&AS, 143, 23 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  46. Omont, A., Bettinger, H. F., & Tönshoff, C. 2019, A&A, 625, A41 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  47. Pedregosa, F., Varoquaux, G., Gramfort, A., et al. 2011, J. Mach. Learn. Res., 12, 12 [Google Scholar]
  48. Planck Collaboration Int. XLVIII. 2016, A&A, 596, A109 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  49. Puspitarini, L., Lallement, R., Babusiaux, C., et al. 2015, A&A, 573, A35 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  50. Recio-Blanco, A., de Laverny, P., Palicio, P. A., et al. 2023, A&A, 674, A29 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  51. Reid, M. J., Menten, K. M., Brunthaler, A., et al. 2019, ApJ, 885, 131 [Google Scholar]
  52. Sanner, F., Snell, R., & vanden Bout, P. 1978, ApJ, 226, 460 [NASA ADS] [CrossRef] [Google Scholar]
  53. Sartoretti, P., Katz, D., Cropper, M., et al. 2018, A&A, 616, A6 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  54. Sartoretti, P., Marchal, O., Babusiaux, C., et al. 2023, A&A, 674, A6 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  55. Saydjari, A. K., M. Uzsoy, A. S., Zucker, C., Peek, J. E. G., & Finkbeiner, D. P. 2023, ApJ, 954, 141 [NASA ADS] [CrossRef] [Google Scholar]
  56. Schlegel, D. J., Finkbeiner, D. P., & Davis, M. 1998, ApJ, 500, 525 [Google Scholar]
  57. Sonnentrucker, P., York, B., Hobbs, L. M., et al. 2018, ApJS, 237, 40 [NASA ADS] [CrossRef] [Google Scholar]
  58. Steinmetz, M., Zwitter, T., Siebert, A., et al. 2006, AJ, 132, 1645 [Google Scholar]
  59. Virtanen, P., Gommers, R., Oliphant, T. E., et al. 2020, Nat. Methods, 17, 261 [Google Scholar]
  60. Vogrinčič, R., Kos, J., Zwitter, T., et al. 2023, MNRAS, 521, 3727 [NASA ADS] [CrossRef] [Google Scholar]
  61. Vos, D. A. I., Cox, N. L. J., Kaper, L., Spaans, M., & Ehrenfreund, P. 2011, A&A, 533, A129 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  62. Wallerstein, G., Sandstrom, K., & Gredel, R. 2007, PASP, 119, 1268 [NASA ADS] [CrossRef] [Google Scholar]
  63. Zasowski, G., Ménard, B., Bizyaev, D., et al. 2015, ApJ, 798, 35 [Google Scholar]
  64. Zhao, H., Schultheis, M., Rojas-Arriagada, A., et al. 2021, A&A, 654, A116 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  65. Zhao, H., Schultheis, M., Zwitter, T., et al. 2022, A&A, 666, L12 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

2

The catalog can be accessed via https://zenodo.org/record/8303423

3

We rebinned the RVS spectra following the process in Gaia DR3, see Recio-Blanco et al. (2023) and GDR3.

4

The convention between the heliocentric frame and LSR is made with (U, V, W) = (10.6, 10.7, 7.6) km s−1 from Reid et al. (2019).

5

“GOGAL_deep_mom.fits.gz” in https://lweb.cfa.harvard.edu/rtdc/CO/CompositeSurveys/

All Tables

Table 1

Statistics of the differences in DIB parameters between GDR3, AS23, and this work.

Table C.1

Information of the 77 selected DIB measurements for determining λ0 for DIB λ8621 (see Sect. 4.3 for details).

All Figures

thumbnail Fig. 1

Galactic spatial density distribution of 780 513 target spectra (left panel) and 36 622 reference spectra (right panel). This HEALPix (Górski et al. 2005) map has a spatial resolution of 0.92° (Nside = 64).

In the text
thumbnail Fig. 2

Two-dimensional distributions of stellar atmospheric parameters (Teff, log g [M/H]) from GSP-Spec (Recio-Blanco et al. 2023) for the training set, the validation set, the testing set, and the target set. The color represents the number of stars counted in each bin with a size of ΔTeff = 40 K, Δ log g = 0.05, and Δ[M/H] = 0.04 dex.

In the text
thumbnail Fig. 3

Residuals between the observed and modeled normalized fluxes as a function of the wavelength for the 17 324 RVS spectra in the testing set. The color represents the residuals, and the spectra are sorted by spectral S/N, increasing from bottom to top. S/N = 22, 50, and 100 is indicated by the dashed black, blue, and red lines, respectively. The Ca II line within the DIB window is marked as a dashed green line.

In the text
thumbnail Fig. 4

Four examples of the RF prediction within the DIB window. The black and red lines are the observed RVS spectra and the predicted stellar parameters, respectively. The blue line is the derived ISM spectrum with an offset of 0.3 for the normalized flux. Orange marks the masked region during the fittings. The Gaίa source ID of these targets is indicated. Some typical stellar lines within the DIB window determined by Contursi et al. (2021) are marked as well. The DIB fitting to these ISM spectra is shown in Fig. 5.

In the text
thumbnail Fig. 5

Examples of the fits to DIBs 18621 and λ8648 in four ISM spectra. The black and red lines are the ISM spectra and fit DIB pro-flies, respectively, normalized by the fit linear continuum. The error bars indicate the observational flux uncertainties of the RVS spectra. Orange marks the masked region during the fittings. The Gaίa source ID of these targets, E(BPRP) from Andrae et al. (2023), the EWs of the two DIBs (EW8621 and EW8648), and the S/N of the ISM spectra are indicated as well.

In the text
thumbnail Fig. 6

Number density of the measured DIB 18621 as a function of the Gaussian width (σ8621) and the central wavelength (λ8621) of the fit DIB profile for the full target sample (left), 8388 selected DIB measurements (middle), and the reliable measurements in AS23 (right). The color of the left panel represents the number of DIBs calculated in 0.025 A × 0.1 A bins. In the middle and right panels, the color represents the number density estimated by a Gaussian KDE. The dashed red line indicates the rest-frame wavelength of DIB λ8621 determined in this work (8623.141 Å, see Sect. 4.3).

In the text
thumbnail Fig. 7

Distribution of the number of DIB measurements (NDIB, upper panel) and the mean EW8621 (lower panel) in a Galactic projection for the selected DIB catalog (7619 measurements). The NDIB and mean EW8621 were calculated in each HEALPixel with a resolution of 1.83° (Nside = 32).

In the text
thumbnail Fig. 8

Comparison between the fit and integrated EW8621 for 7619 measurements in the DIB catalog. Upper panel: the color represents the number density estimated by a Gaussian KDE. The gray bars show the uncertainty of the fit EW8621. The dashed red line traces the one-to-one correspondence. A zoom-in panel shows the distribution of the EW difference (ΔEW = EWfit − EWint). The mean (Δ) and standard deviation (σ) of ΔEW are indicated. Lower panel: distribution of the ΔEW as a function of the fit EW8621.

In the text
thumbnail Fig. 9

Correlation between EW8621 and dust reddening for 2957 cases with E(BPRP) from Andrae et al. (2023) shown in the upper panel and for 4656 cases with Av from Green et al. (2019) shown in the lower panel. The color of the scattered points represents their number density estimated by the Gaussian KDE. The red dots and their color bars are the median values and the standard deviations calculated in each EW8621 bin with a step of 0.05 Å. The linear fits to EW8621 and dust reddening from previous works are overplotted as dashed lines: magenta for GFPR, black and red for GDR3, and blue for AS 23.

In the text
thumbnail Fig. 10

Observed central wavelengths in vacuum of DIB λ8621 in the heliocentric frame (λhelio) as a function of the angular distance in longitude from the Galactic anticenter (∆) for 77 DIB measurements in this work. The black dots are the individual measurements with the fit uncertainties. The red line is the linear fit to the black dots.

In the text
thumbnail Fig. 11

Longitude-velocity diagram for DIB λ8621 and 12CO. Upper panel: variation of the radial velocity of DIB λ8621 carrier (VDIB) along with the Galactic longitude for 3592 selected DIB measurements. The points are colored by their number density estimated by the Gaussian KDE. The red dots with error bars are the median VDIB calculated in each longitude bin with a step of 10°. The colored lines are theoretical rotation curves calculated with the rotation model in Reid et al. (2019) with different distances from the Sun. Lower panel: median VDIB superimposed on the longitude-velocity map of 12CO J = (1−0) emission from Dame et al. (2001). The color scale displays the 12CO latitude-integrated intensity on a logarithmic scale.

In the text
thumbnail Fig. 12

Correlations between DIBs λ8621 and λ8648 for 179 selected measurements for (a) fit and integrated EW8648 outside the masked region between 8660 and 8668 Å; (b) fit EW; (c) measured central wavelength; and (d) FWHM. The dashed green line in panel a traces the one-to-one correspondence. The colored points in panel b are the results from GFPR. The color in panel c represents the number density estimated by the Gaussian KDE, and the red line is a linear fit to all the data points. The Pearson coefficient (rp) of the correlation between the parameters of λ8621 and λ8648 for the 179 selected measurements is indicated in panels b and c.

In the text
thumbnail Fig. 13

Correlation between EW8648 and AV from Green et al. (2019) for 93 selected measurements. The underlying points are the results from GFPR, colored by the number density. The Pearson coefficient (rp) for the red dots is indicated as well.

In the text
thumbnail Fig. 14

Difference in DIB parameters (𝒟8621, λ8621, σ8621, and EW8621) between GDR3, AS23, and this work as a function of the fit values for the joint samples. The gray scale indicates the number density of the data points estimated by the Gaussian KDE. The data are binned with a step of 0.005 for 𝒟8621, 0.2 Å for λ8621, 0.1 Å for σ8621, and 0.01 Å for EW8621. The solid red lines in each panel represent the median differences in each bin, and the two dashed red lines show the 16th and 84th percentiles.

In the text
thumbnail Fig. 15

Correlation in velocity between DIB 18621, 12CO and H I. Upper and middle panels: correlation between the DIB λ8621 velocity (VDIB) and the in tensity-weighted velocity of the nearest12 CO (Dame et al. 2001) and H I (HI4PI Collaboration 2016) component. The color scale indicates the number density of the data points estimated by the Gaussian KDE. The error of VDIB is simply estimated by the uncertainty of λ8621. The red line is the linear fit to the LSR velocities. The fit slope (α) and intercept (t), as well as the number of the points, are indicated. The dashed green line traces the one-to-one correspondence. Lower panel: example of the peak-finding method (see Sect. 5.1). The red and blue lines are the spectra of 12CO and H I, respectively, toward the same sightline. The red star indicates the LSR velocity of the DIB signal detected on this sightline. The Galactic coordinate (,b) and the Gaia source ID of the background star are marked.

In the text
thumbnail Fig. 16

Correlation between EW8621 and NH I (upper panel) and between EW8621 and NH2${N_{{{\rm{H}}_2}}}$ (lower panel) on a logarithmic scale. The Pearson coefficient (rp) is indicated. The color scale indicates the number density. The dashed lines are the linear fit results from Friedman et al. (2011), but for DIB λ5780.

In the text
thumbnail Fig. 17

Distribution of the estimated completeness for DIBs λ8621 (upper panel) and λ8648 (lower panel) as a function of EW and σDIB based on the results of the injection tests (see Appendix B).

In the text
thumbnail Fig. A.1

Mean residuals between the observed and modeled normalized fluxes taken for 7324 RVS spectra in the validation set at each wavelength pixel in the DIB window as a function of the spectral wavelength for different parameter pairs of n_estimators (NE) and max_depth (MD) in the RF model (scikit-learn). The error bar represents the standard deviation of the residuals of individual spectra at each wavelength pixel. The mean value (Δ) and standard deviation (σ) for all the pixels are marked in each panel. All the values are enlarged by a factor of 104.

In the text
thumbnail Fig. A.2

Mean of the absolute residuals (MAR) between observed and modeled normalized fluxes, taken along the wavelength in the DIB window, for each RVS spectrum in the validation set (7324) as a function of the spectral S/N. The color represents the number density of the spectra. The dashed green lines indicate MAR = 0.01 and S/N = 100 in each panel.

In the text
thumbnail Fig. B.1

Distributions of the Z-scores computed from the injection tests of EW8621, λ8621, and σ8621 as a function of the injected DIB parameters and the S/N of the ISM spectra. The color scale indicates the number density estimated by the Gaussian KDE. In each panel, the solid red line indicates the variation in the mean differences of Z-scores with the injected DIB parameters and the S/N. The solid white line indicates the mean absolute differences of the Z-scores. The dashed red and white lines provide a reference for Z-score equals 0 and ±1.

In the text
thumbnail Fig. B.2

Same as Fig. B.1, but for DIB λ8648.

In the text
thumbnail Fig. D.1

σ8621λ8621 distributions for the different groups defined in Appendix D. The DIB parameters in the first row come from this work, and those in the second row come from AS23. The color of the scattered points represents their number density estimated by the Gaussian KDE.

In the text
thumbnail Fig. D.2

Correlation between EW8621 and Av for the different groups defined in Appendix D. The DIB parameters in the first row come from this work, and those in the second row come from AS23. The color of the scattered points represents their number density estimated by the Gaussian KDE. The dashed red and blue lines show the linear fits to EW8621 and Av from GDR3 and AS23, respectively.

In the text

Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.

Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.

Initial download of the metrics may take a while.