Diffuse interstellar bands in Gaia DR3 RVS spectra New measurements based on machine learning ⋆

Diffuse interstellar bands (DIBs) are weak and broad interstellar absorption features in astronomical spectra that originate from unknown molecules. To measure DIBs in spectra of late-type stars more accurately and more efficiently, we developed a random forest model to isolate the DIB features from the stellar components. We applied this method to 780 thousand spectra collected by the Gaia Radial Velocity Spectrometer (RVS) that were published in the third data release (DR3). After subtracting the stellar components, we modeled the DIB at 8621Å ( λ 8621) with a Gaussian function and the DIB around 8648Å ( λ 8648) with a Lorentzian function. After quality control, we selected 7619 reliable measurements for DIB λ 8621. The equivalent width (EW) of DIB λ 8621 presented a moderate linear correlation with dust reddening, which was consistent with our previous measurements in Gaia DR3 and the newly focused product release. The rest-frame wavelength of DIB λ 8621 was updated as λ 0 = 8623 . 141 ± 0 . 030Å in vacuum, corresponding to 8620.766Å in air, which was determined by 77 DIB measurements toward the Galactic anticenter. The mean uncertainty of the fit central wave-length of these 77 measurements is 0.256Å. With the peak-finding method and a coarse analysis, DIB λ 8621 was found to correlate better with the neutral hydrogen than with the molecular hydrogen (represented by 12 CO J = (1 − 0) emission). We also obtained 179 reliable measurements of DIB λ 8648 in the RVS spectra of individual stars for the first time, further confirming this very broad DIB feature. Its EW and central wavelength presented a linear relation with those of DIB λ 8621. A rough estimation of λ 0 for DIB λ 8648 was 8646.31Å in vacuum, corresponding to 8643.93Å in air, assuming that the carriers of λ 8621 and λ 8648 are comoving. Finally, we confirmed the impact of stellar residuals on the DIB measurements in Gaia DR3, which led to a distortion of the DIB profile and a shift of the center ( ≲ 0.5Å), but the EW was consistent with our new measurements. With our measurements and analyses, we propose that the approach based on machine learning can be widely applied to measure DIBs in numerous spectra from spectroscopic surveys.


Introduction
Diffuse interstellar bands (DIBs) are a set of absorption features in the spectra of stars, galaxies, and quasars that are observed in the optical and near-infrared bands (about 0.4-2.4µm, see DIB surveys: Fan et al. 2019;Hamano et al. 2022;Ebenbichler et al. 2022).High-quality astronomical observations, experimental measurements, and theoretical analyses support that the DIBs originate from complex carbon-bearing molecules (e.g.Campbell et al. 2015;Omont et al. 2019;MacIsaac et al. 2022), so that DIBs can serve as chemical and kinematic tracers of the Galactic interstellar medium (ISM), even though the exact species of most DIB carriers are still unknown.
Because DIBs are weak features and might be blended with stellar lines, early studies preferred to observe a handful of hot stars as background sources because their spectra are clean.This  Because late-type stars dominate the observations in spectroscopic surveys, synthetic spectra derived from the stellar atmospheric model and atomic line lists are needed to isolate the DIB signal from the stellar components.However, an incorrect modeling of stellar lines close to the DIB signal could introduce additional uncertainties in the DIB measurements.Moreover, when the stellar residuals are comparable to the DIB features in terms of strength, a pseudo-fitting is hard to distinguish and could lead to a bias in the measurements of DIB parameters.To overcome this limitation, Kos et al. (2013) developed a datadriven method, called the "best-neighbor method" (BNM), to build artificial stellar templates for the observed spectra in the vicinity of the DIB feature (DIB window).Specifically, BNM first separated the whole spectroscopic sample into a target sample (spectra containing DIB signals) and a reference sample (spectra without DIB signals).The reference sample typically constituted sources at high latitudes and with low dust extinctions according to the assumption that low extinction represents a low abundance of ISM species.Then, for a given target spectrum, BNM determined the best-matched reference spectra based on a pixel-by-pixel comparison for the spectral region outside the DIB window.Finally, a number of best-matched reference spectra (up to 25 in Kos et al. 2013) were averaged to create a stellar template for the DIB window.The ISM spectrum within the DIB window in which the DIB signal is detected and measured was defined as the target spectrum divided by the generated stellar template.The BNM has been applied to measure DIBs in the spectra from RAVE (Kos et al. 2013), Gaia RVS (Zhao et al. 2022; Gaia Collaboration 2023a, hereafter GFPR), and GALAH (Vogrinčič et al. 2023).Other types of data-driven methods have also been applied to detect DIBs.Saydjari et al. (2023, hereafter AS23) decomposed and recognized stellar components and DIBs in public Gaia RVS spectra with a data-driven prior consisting of ∼40 000 RVS low-extinction spectra.McKinnon et al. (2023) built second-order polynomial models of the normalized flux as a function of stellar parameters for ∼17 000 red clump stars observed in APOGEE and found 84 possible DIBs (25 identified with a confidence level of 95%, 10 of which were previously known) in the residuals between observed and modeled APOGEE spectra.
The third data release of Gaia (Gaia Collaboration 2023c) contains a large number of measurements for DIB λ8621 in about 500 000 RVS spectra of individual stars (Gaia Collaboration 2023b, hereafter GDR3).DIB λ8621 was fit in the ISM spectra derived by the synthetic spectra from the general stellar parameterizer from spectroscopy (GSP-Spec) module (Recio-Blanco et al. 2023).After removing the cases with poor stellar modeling and poor DIB parameters, we defined a high-quality (HQ) DIB sample containing ∼140 000 sightlines (see Sect. 3 in GDR3 for details).However, AS23 reported a dependence on the fit central wavelength and the Gaussian width of DIB λ8621 (see their Fig. 1) and attributed these biases to the residuals of stellar lines in the vicinity of DIB λ8621 (e.g., the Fe I lines at 8620.51 Å and 8623.97Å in vacuum wavelength determined by Contursi et al. 2021).In this work, we improve the BNM to a machine-learning (ML) approach that replaces the pixel-by-pixel comparison of the spectral flux in finding the best-matched reference spectra by ML training.The ML approach can directly predict stellar components in the DIB window for the target spectra instead of comparing them one by one with the reference spectra.Thus, the ML approach can speed up the process and ignore irrelevant features.Furthermore, Kos et al. (2013) down-weighted the Ca II regions in the comparison between target and reference spectra because the Ca II lines are too strong to overwhelm other stellar features, but the ML model does not need to adjust the weights of stellar lines.We applied the improved BNM to process 780 000 RVS spectra published in Gaia DR3 and measured DIB λ8621 as well as the broad DIB around 8648 Å (λ8648; Zhao et al. 2022).We performed some statistical analysis of the properties of these two DIBs based on a selected reliable sample.We compared our new measurements of DIB λ8621 to those in GDR3 and AS23 to estimate the degree of biases of DIB parameters in GDR3.We note that the BNM has been applied to RVS spectra in the new focused product release (FPR) of Gaia to measure DIBs λ8621 and λ8648 (see GFPR for detailed results), but the FPR only contained DIB measurements in stacked ISM spectra, and this work measured DIBs λ8621 and λ8648 in the RVS spectra of individual stars for the first time.
The paper is organized as follows: The data processing is described in Sect.

Data processing
There are 999 645 RVS spectra (R ∼ 11 500, mean spectra of epoch observations) published in Gaia DR3 (Gaia Collaboration 2023c) that can be accessed through the datalink interface of the Gaia Archive 1 .The published RVS spectra were processed by Gaia DPAC Coordination Unit 6 (CU6) and were equally resampled between 864 and 870 nm with a spacing of 0.01 nm (2400 wavelength bins; Sartoretti et al. 2018Sartoretti et al. , 2023)).The spectra were normalized and shifted to the rest frame as well.Following the process in Recio-Blanco et al. (2023) and GFPR, we rebinned RVS spectra from 2400 to 800 wavelength pixels, sampled every 0.03 nm, to increase the signal-to-noise ratio (S/N).The total wavelength range of RVS spectra used in this work is between 8471.2 and 8687.5 Å to ensure that no reference spectra contained nan-value fluxes.The DIB window is defined as 8600-8680 Å (267 wavelength pixels).
After combining with the calibrated distance catalog of Bailer-Jones et al. ( 2021), where we made use of their geometric distances, 996 900 sources were left.We calculated E(B − V) for these stars based on the Planck dust map (Planck Collaboration Int.XLVIII 2016) using the Python package dustmaps (Green 2018).The target sample contains 780 513 spectra with E(B − V) > 0.02 mag and an S /N > 20.The reference sample contains 36 622 spectra with E(B − V) ⩽ 0.02 mag, |b| ⩾ 30 • , and an S /N ⩾ 50.The higher threshold of the S/N for the reference sample was chosed to achieve a better training set.The density distribution in Galactic coordinates for the target and reference samples is shown in Fig. 1.Baron et al. (2015) reported the detection of DIB signals in dust-free regions at high 1 http://cdn.gea.esac.esa.int/Gaia/gdr3/Spectroscopy/rvs_mean_spectrum/ latitudes.These DIB signals in the reference spectra might introduce an offset or a bias when the DIB profiles are modeled in target spectra, but when we assume that DIBs like this only exist in a very small part of the reference sample, the ML model would treat them as irrelevant features and minimize their effect.Nevertheless, this problem cannot be quantified because we lack DIB maps that were built with hot stars without reference spectra.

Stellar templates built with a random forest model
Various supervised-learning algorithms can be applied to model the stellar lines in the DIB window.In this work, we built a model based on the random forest (RF) regression, which is an ensemble bagging method that combines a large number of decision trees (Breiman 2001).The RF model predicts the stellar template within the DIB window (8600-8680 Å) for a given target spectrum using the part outside the DIB window (i.e., 8471.2-8600 Å and 8680-8687.5 Å).The model construction and prediction of the RF algorithm were completed by the Python package scikit-learn (Pedregosa et al. 2011).
The reference sample was separated into three data sets: the training set containing 21 974 spectra (60%), the validation set containing 7324 spectra (20%), and the testing set containing 7324 spectra (20%).The distributions of the stellar atmospheric parameters (T eff , log g, [M/H]) from GSP-Spec (Recio-Blanco et al. 2023) of these sets, as well as the target set, are presented in Fig. 2. The reference sample (training, validation, and testing sets) has a similar coverage of stellar parameter space as the target sample, mainly covering the main sequence, subgiant and red giant branches, and an [M/H] region between -1 and 0.5 dex.
A199, page 3 of 22 Zhao, H., et al.: A&A, 683, A199 (2024) Metal-poor and extremely hot or cool stars are notably missing, but they only form a small fraction of the target sample.The stellar parameters are only used to present the space coverage, but were not used in our RF model.They are not necessary for the BNM either, but were always used to speed up the BNM process (e.g., Kos et al. 2013;Zhao et al. 2022;GFPR).
Two of the most important parameters for the RF model are the number of trees in the forest (n_estimators; NE) and the maximum depth of the tree (max_depth; MD).Therefore, we trained RF models using various NE ({20, 40, 60, 80, 100}) and MD ({2, 5, 10, 30, 50, 100}), keeping other parameters as default values in scikit-learn, and completed the model selection by the performance of the validation set.For each pair of NE and MD, we applied the trained RF model to predict the stellar components in the DIB window for each RVS spectrum in the validation set and calculated the residuals between the observed and modeled normalized fluxes at each wavelength pixel.The mean residuals of these RVS spectra as a function of the spectral wavelength for different pairs of NE and MD are shown in Fig. A.1 together with their standard deviations.For small NE and MD, structural residuals can be seen around the Ca II line at 8664.5 Å (Contursi et al. 2021), which means a poor modeling of this strong line.With the increase in NE and MD, the Ca II is better modeled and the standard deviation also becomes smaller.Furthermore, the degree of dispersion of the residuals becomes similar for NE ⩾ 60 and MD ⩾ 30.Their mean residuals are slightly smaller than zero, however, which might be caused by the imperfect normalization of the observed spectra.Some mean residuals with small NE and MD are very close to zero, but their dispersion is apparently stronger.Some weak structural residuals exist even for the maximum NE and MD, but they are only about 10 −4 .On the other hand, Fig. A.2 shows the mean of the absolute residuals (MAR), taken along the wavelength within the DIB window, of each RVS spectra in the validation set as a function of the spectral S/N.MAR decreases with the increase in S/N, presenting a strong dependence.The dependence breaks for S /N ≳ 300, where the dispersion of MAR represents the robustness of the RF model for different types of spectra.MAR would also dramatically increase for stars with extreme parameters.For NE ⩾ 20 and MD ⩾ 30, the distribution of MAR becomes similar and MAR is smaller than 0.01 for S /N ≳ 100.Because of the similar performance of the validation set for large NE and MD, we selected a final parameter pair of NE = 100 and MD = 50.We also trained RF models with larger NE and MD and found that the performance improvement was not significant.For NE = 200 and MD = 300, the mean and the standard deviation of the residuals is −0.46 and 117.58, which is only slightly better than those for NE = 100 and MD = 50.
With the selected NE = 100 and MD = 50, we applied the RF model to the testing set to evaluate its performance on the target sample.Because the S/N of 62% of the target spectra is lower than 50, we randomly selected 10 000 reference spectra with 20 ⩽ S /N < 50 and added them to the testing set (a total of 17 324 reference spectra).Figure 3 presents the residuals as a function of wavelength for each spectrum in the testing set, sorted by the spectral S/N.For S /N > 50, the distribution of residuals is generally uniform along the wavelength, which indicates a good modeling of the stellar lines.Only the residuals near the Ca II line (indicated by a dashed green line in Fig. 3) are more significant than the residuals in its vicinity.The performance of the RF model becomes worse for spectra with lower S/N, showing systematic differences between the observed spectra and the modeled stellar template and structural residuals near the stellar

Normalized Flux
Fig. 4. Four examples of the RF prediction within the DIB window.The black and red lines are the observed RVS spectra and the predicted stellar parameters, respectively.The blue line is the derived ISM spectrum with an offset of 0.3 for the normalized flux.Orange marks the masked region during the fittings.The Gaia source ID of these targets is indicated.Some typical stellar lines within the DIB window determined by Contursi et al. (2021) are marked as well.The DIB fitting to these ISM spectra is shown in Fig. 5.
lines.The systematic difference could be caused by the imperfect normalization for low-S/N RVS spectra.We applied a linear continuum in the DIB model (see Sect. 3.2) that can reduce this effect.
Figure 4 shows the stellar templates predicted by the RF model for four RVS spectra within the DIB window.Strong A199, page 4 of 22 Zhao, H., et al.: A&A, 683, A199 (2024)  stellar lines, such as Fe I and Si I, are well modeled, and the feature of λ8621 is clearly visible in the derived ISM spectra.
The residuals near the center of Ca II increase slightly.In the last example (bottom), the RVS spectrum was not perfectly normalized, but the RF model can still predict the stellar components properly.Furthermore, the ISM spectrum is well fit by our DIB model with a linear continuum (see the bottom panel in Fig. 5).

Fitting DIBs in ISM spectra
With the trained RF model, ISM spectra can be obtained by the modeled stellar templates in the DIB window divided by the observed spectra for each RVS source in the target sample.
The S/N of the ISM spectra was calculated between 8602 and 8612 Å as mean(flux)/std(flux).Following our previous works (Zhao et al. 2022; GFPR), we modeled the profiles of the two DIBs in the ISM spectra with a Gaussian function (Eq.( 1)) for λ8621, a Lorentzian function (Eq.( 2)) for λ8648, and a linear function for the continuum (Eq.( 3)), where D and σ DIB are the depth and width of the DIB profile, λ DIB is the measured central wavelength, a 0 and a 1 describe the linear continuum, and λ is the wavelength.Subscripts 8621 and 8648 are used below to distinguish the profile parameters of the two DIBs.The full parameters of the DIB model are Θ = {D 8621 , λ 8621 , σ 8621 , D 8648 , λ 8648 , σ 8648 , a 0 , a 1 }.A Markov chain Monte Carlo (MCMC) procedure (Foreman-Mackey et al. 2013) was performed to implement the parameter optimization with flat and independent priors for the DIB parameters.The best estimates of the DIB parameters and their statistical uncertainties were taken in terms of the 50th, 16th, and 84th percentiles of the posterior distribution drawn by the MCMC procedure.We refer to Sect.3.3 in GFPR for a very detailed description of the DIB model, priors, and the MCMC fitting procedure.We continued to use the masked region between 8660 and 8668 Å during the fitting, although the RF algorithm modeled the Ca II line much better than BNM.We note that the uncertainties of the ISM spectra used in the MCMC fitting only include the observational flux errors of the RVS spectra because the RF model cannot estimate the uncertainty of their predictions, so that the total uncertainties would be underestimated.According to Eqs. ( 1) and ( 2 Four examples of the DIB fits are shown in Fig. 5, whose ISM spectra are sorted by E(BP − RP) calculated by Andrae et al. (2023).EW 8621 and EW 8648 both increase with E(BP − RP).The profile of DIB λ8621 is prominent in all the ISM spectra, while the profile of DIB λ8648 is much more shallow and broader than that of λ8621.Because of the very small D 8648 , it is much more difficult to measure λ8648 than λ8621 in the ISM spectra derived from the individual RVS spectra.Additionally, the masked region, in which the residual of the Ca II line is clear, also affects the fit to the red wing of the λ8648 profile.
We performed an injection test following the principles in AS23.The details and discussions are presented in the Appendix B. In summary, the distribution of the Z-scores as a function of the injected DIB parameters and the S/N of the ISM spectra is perfectly uniform, which is highly consistent with the findings in AS23, primarily validating our RF model and the DIB fittings.

Selecting a reliable DIB catalog
The DIB fitting was performed for 780 513 ISM spectra derived from the target sample.To select reliable measurements, we calculated the χ 2 dof = χ 2 tot /d.o.f,where χ 2 tot is the total χ 2 between the fit DIB profile and the ISM spectrum, and dof = 259 is the degree of freedom of the DIB model (267 wavelength pixels and eight DIB parameters).We applied a cut on 0.71 < χ 2 dof < 1.41.The borders were applied by AS23 based on their injection test.Generally, stricter borders provide more accurate measurements to some extent, but also lose more cases.We tried other borders and found that 0.71 and 1.41 were proper borders to exclude many of the noisy cases with pseudo-fittings.We further required A199, page 5 of 22   (Virtanen et al. 2020).Our catalog contains a tail toward small σ 8621 (≲1 Å), and λ 8621 of these cases seems to be affected by the Fe I line.This is because we set an initial guess of σ 8621 = 1.2 Å in the MCMC fitting for all the cases so that noisy ISM spectra with weak DIB features would obtain a fit σ 8621 around this initial guess.On the other hand, AS23 contains more cases with large σ 8621 (≳3 Å) than our catalog, and their λ 8621 there is more scattered as well.The median σ 8621 in our catalog is 2.13 Å which is slightly larger than that in AS23 (1.92 Å).This may be due to the different pixel sizes of RVS spectra used in this work (0.3 Å pixel −1 )3   and in AS23 (0.1 Å pixel −1 ).We further applied cuts to constrain λ 8621 between 8620 and 8626 Å and σ 8621 within 1-4 Å.Hence, the final selected DIB catalog contains 7619 measurements.This DIB catalog can be accessed via the CDS database in Strasbourg, France (Ochsenbein et al. 2000).
The reliable DIB catalog in this work was constructed only by simple cuts on χ2 , noise level (S/N and R C ), and the DIB parameters (λ 8621 and σ 8621 ).It presents a Gaussian-like σ DIB − λ DIB distribution without any significant impacts of stellar lines and shows a good correlation between EW 8621 and the dust reddening (see Sect. 4.2).The catalog certainly contains pseudofittings, such as the outliers seen in the σ DIB − λ DIB diagram, but they should only take a very small part of the catalog after the quality control and only have little impact on the statistical analysis of the DIB properties.Furthermore, as DIBs are weak features, investigation of specific fittings would need a visual inspection of their ISM spectra.Figure 7  distribution and are concentrated in the Galactic midplane and some prominent molecular regions, with a remarkable extension to high latitudes in the directions of the Galactic center (GC) and anticenter (GAC).Like dust reddening, the large mean EW 8621 focuses on the Galactic plane with |b| ≲ 5 • and decreases on average with the increase in latitude.
To validate the DIB catalog, we compared the fit and integrated EW of DIB λ8621 for all the 7619 measurements.Because the profiles of λ8621 and λ8648 might overlap, the fit profile of λ8648 was first subtracted from the ISM spectra normalized by the fit linear continuum, and then the remaining part within λ 8621 ± 3σ 8621 was integrated.Figure 8 shows the comparison between fit and integrated EW 8621 , as well as the EW difference (∆EW = EW fit − EW int ) as a function of the fit EW 8621 .The fit and integrated EW 8621 are highly consistent with each other, with a mean difference of only 0.001 Å and a standard deviation of 0.016 Å. ∆EW is smaller than the uncertainty of EW 8621 (0.031 Å on average) for over 96% of the measurements, and 90% of ∆EW is smaller than 0.023 Å.The difference between fit and integrated EW 8621 tends to increase for measurements with large EW 8621 .These measurements were made in ISM spectra with generally lower S/N, where the residuals of the stellar lines would dramatically increase (the RF model performs worse with low S/N; see Fig . A.2) and consequently lead to an increase in the ∆EW.We checked some ISM spectra with large EW 8621 and ∆EW and found structural features in addition to the DIB signal.These features are more likely from the stellar residuals than the possible Doppler splitting caused by multiple ISM clouds along the sightlines because they can be far away from the center of the DIB profile and usually have a much smaller depth than λ8621.Despite the heavier influence of the stellar residuals and noise, the relative EW uncertainty does not increase for large EW 8621 .For EW 8621 > 0.5 Å (626 measurements), the fractional error of the fit EW 8621 is mainly (99.4%) within 20%, with a mean of 11.9%, and ∆EW is mainly (99.4%) smaller than 10% of the fit EW 8621 .The Doppler broadening caused by the unresolved multiple DIB components and the probable intrinsic asymmetry of the DIB profile may contribute to ∆EW as well.However, the S/N of the ISM spectra in this work is not high enough to distinguish these effects from the others.

DIB λ8621 and dust reddening
Both DIB EW and dust reddening can be used to map the spatial distribution of ISM species and the Galactic large-scale structures, but the EW generally has a much larger relative uncertainty than reddening at present.A tight linear correlation between DIB EW and dust reddening has been discovered for a set of strong DIBs with early-type stars as background sources (e.g.Munari et al. 2008;Friedman et al. 2011;Lan et al. 2015) despite the inevitable scatters and outliers (see the review of Krełowski 2018).On the other hand, the degree of dispersion between DIB EW and dust reddening usually increases by an order of magnitude for a spectroscopic survey data set that is dominated by late-type stars (see, e.g., Kos et al. 2013;Zasowski et al. 2015;GDR3).The relatively lower S/N of the survey spectra (compared to the specifically designed DIB observations) and the difficulty in modeling the atmospheric components of latetype stars certainly contribute to increase the dispersion in the DIB-dust correlation.However, the numerous observations in the survey should contain some sightlines in which the DIB carriers and dust grains are not spatially associated with each other because the dust grain is currently not considered a candidate DIB carrier (Cox et al. 2007(Cox et al. , 2011)).
We reviewed the correlation between DIB λ8621 and dust reddening with our new DIB measurements and two sources of reddening.Our DIB catalog contains 2957 cases with E(BP − RP) from Andrae et al. ( 2023) and 4656 cases with A V from Green et al. (2019).The best-estimated E(BP − RP) and its lower and upper confidence levels for the target stars were accessed via the Gaia archive.A V and its uncertainty were obtained with the bayestar module in dustmaps using the percentile mode.A V equals 2.742 times the reddening unit given by bayestar (Green et al. 2019).The scatter plot between EW 8621 and dust reddening is shown in Fig. 9 for E(BP − RP) (upper panel) and A V (lower panel), with the median values and standard deviations taken in each EW 8621 bin with a step of 0.05 Å (red dots).The median dots present a good linear relation between EW 8621 and dust reddening for both E(BP − RP) and A V for EW 8621 ≲ 0.5 Å, with a deviation at larger EW 8621 .AS23 reported larger A V than expected when the median dots deviated from their linear fit to EW 8621 and A V , but conversely, the median A V becomes smaller than expected in our work in these regions.This is only because the median dots were taken in EW 8621 bins in this work ,but in A V bins in AS23.We checked a part of the outliers, for example, with high reddening but small EW 8621 , and found that many of the DIB measurements have proper DIB parameters, and their ISM spectra clearly contain DIB signals by a visual inspection.This verifies that the DIB and dust are not required to appear together.They could statistically present a linear relation only due to the accumulation of different ISM species along the sightline.Therefore, DIB EW may not be a good proxy for dust reddening in specific directions, and their ratio also varies with the investigated samples.
The linear fits to EW 8621 and dust reddening performed in previous works are also plotted as dashed lines in Fig. 9.For E(BP − RP), the tendency of the median dots is consistent A199, page 7 of 22  with the fit line of GDR3 (black) and of GFPR (magenta).Furthermore, the standard deviation of the individual measurements in each EW 8621 bin is much larger than the difference between GDR3 and GFPR.For A V , the median dots are closer to the line of GDR3 (red) than that of AS23 (blue).AS23 obtained a fit EW 8621 /A V = 0.106 ± 0.017 Å mag −1 , corresponding to 3.448 mag Å −1 of E(B − V)/EW 8621 , which is 57% larger than the value fit in GDR3 (2.198 mag Å −1 ).This difference is mainly caused by the systematic difference in EW 8621 (see Fig. 14), but is not the control of the bias and uncertainties argued by AS23.The E(B − V)/EW 8621 ratio derived in different works (e.g.Wallerstein et al. 2007;Munari et al. 2008;Kos et al. 2013;Puspitarini et al. 2015) has a 20% difference on average (see Table 3 in GDR3).The result of AS23 is similar to that in Zhao et al. (2021), who used the Gaia-ESO (Gilmore et al. 2012) data set and E(B − V) from Schlegel et al. (1998).We emphasize that the mean correlations between EW 8621 and dust reddening derived in our series using the Gaia RVS data (GDR3, Zhao et al. 2022, GFPR) are consistent with each other within 10% (see also the discussions in Sect.5.2 in GFPR).

Kinematics of DIB λ8621
To study the kinematics of the carrier of DIB λ8621, it is most fundamental and important to determine its rest-frame wavelength (λ 0 ), which is also required to identify the nature of λ8621 through the comparison to the laboratory measurements.In this  work, we followed the statistical method, which assumes that the DIB radial velocity is null toward the GAC in a circular rotational obit, and thus the intercept at ℓ = 180 • would indicate λ 0 (see Zasowski et al. 2015;GDR3;AS23).
As the published RVS spectra have been shifted in the stellar frame where stellar lines are at their rest positions but the DIB features are additionally shifted, we converted λ 8621 into the heliocentric frame (λ helio ) using the radial velocity of stars (V star ) determined in Gaia DR3 (Katz et al. 2023).We selected 77 DIB measurements in our DIB catalog with |b| < 5 • , 170 • < ℓ < 190 • , d ⩽ 3 kpc, err(λ 8621 ) < 0.5 Å, and err(V star ) < 5 km s −1 .The information of their background stars, including the Gaia-DR3 source ID, Galactic coordinates, apparent G magnitudes, and stellar atmospheric parameters from GSP-Spec, are listed in Table C.1, as well as their λ helio .We note that some cases seem to be early-type stars without GSP-Spec estimates of their stellar parameters.Nevertheless, by visual inspection, our RF model, which does not rely on stellar parameters, also works well for these spectra, even though the RVS sample is dominated by late-type stars.Figure 10 presents the slightly linear trend of λ helio around the GAC, reflecting the projection of the galactocentric rotation of the DIB carrier (Zasowski et al. 2015).Some tiny deviations of λ helio from the linear trend can be seen.In addition to the fitting uncertainty of λ 8621 , the turbulent motion in the DIB cloud and the possible physical changes of DIB shapes and positions (see e.g.Galazutdinov et al. 2008;Krełowski et al. 2021) probbably also contribute.The applied statistical method might reduce these effects if no strong systematic deviations existed.A least-square linear fit to λ helio and the angular departure from the GAC (∆ℓ) obtained an intercept of 8623.446 ± 0.030 Å.The uncertainty was estimated by a 2000 times Monte Carlo simulation according to the fit uncertainty of λ 8621 .We note that the mean error of λ 8621 from the MCMC fitting is 0.256 Å for the 77 selected DIB measurements, which is larger than the statistical uncertainty by an order of magnitude.A factor of c/(c + U ⊙ ) was used to correct for the effect of solar motion, where c is the speed of light and U ⊙ = 10.6 km s −1 (Reid et al. 2019) is the radial solar motion toward the GC.Finally, we obtained a λ 0 = 8623.141± 0.030 Å for DIB λ8621, which is perfectly consistent with the result of AS23 (8623.14 ± 0.087 Å), although we did not consider the distance calibration proposed in AS23.This value is nevertheless lower than that of GDR3 (8623.23 ± 0.019 Å) by 3.0σ using A199, page 8 of 22 our uncertainty for λ 0 (0.030 Å).Our derived λ 0 corresponds to 8620.766Å in air wavelength, which is consistent with most of the literature results within 2σ, such as 8620.7 Å of Sanner et al. (1978), 8620.75Å of Herbig & Leka (1991), 8620.79Å of Galazutdinov et al. (2000), 8620.7 Å of Munari et al. (2008, after the correction of the solar motion), and 8620.83Å of Zhao et al. (2021).
We applied the same selection criteria to 9763 cases in the HQ DIB catalog in GDR3 measured in the RVS spectra that are published in Gaia DR3 and obtained 67 DIB measurements.The derived λ 0 is 8623.368± 0.037 Å , which is even larger than the result of GDR3 with 0.14 Å (3.8σ).This result suggests that compared to the full Gaia RVS data set, the use of the public sample in DR3 would lead to a redshift of λ 0 .Therefore, the smaller λ 0 determined in this work and in AS23 is not caused by the selection bias of the sample, but by the systematic difference of λ 8621 .As pointed out by AS23, λ 8621 in GDR3 would be affected by the improperly modeled stellar lines.On the other hand, it is not clear whether the λ 0 derived in this work and AS23 with the public RVS sample is also redder than the true value.We expect that this problem could be answered by the following analysis of GFPR or the DIB measurements in Gaia DR4.
With the derived λ 0 , we calculated the radial velocity of the carrier of DIB λ8621 (V DIB ) in the local standard of rest (LSR) 4 .We selected 3592 DIB measurements at low latitudes (|b| < 5 • ) and with accurate λ 8621 (err(λ 8621 ) < 0.5 Å) and V star (err(V star ) < 5 km s −1 ) to present the variation in V DIB along with the Galactic longitude.The rotation of the DIB λ8621 carrier is clearly seen in the upper panel of Fig. 11, overlaid with the theoretical rotation curves with different distances from the Sun.Specifically, for a distance from the Sun (d), the galactocentric distance is calculated as , where R 0 = 8.15 kpc is the galactocentric distance of the Sun.Then the circular velocity (Θ) is predicted by model A5 in Reid et al. (2019) with R GC and ℓ, assuming b = 0. Finally, the radial velocity for a given d and where Θ 0 = 236 km s −1 is the circular velocity of the Sun.
When we consider the median V DIB in each ∆ℓ = 10 • bin (red dots in Fig. 11), the carrier of λ8621 in the selected sample is mainly located within a kinematic distance of 2 kpc from the Sun, although the velocities from individual DIB measurements are much more scattered.This is a reasonable interpretation based on inspecting the mean distances to the background stars of these DIB signals, which are all larger than 2 kpc with a minimum of 2.3 kpc.Moreover, the DIB carriers toward the GAC have a larger distance on average than those toward the GC.In the lower panel in Fig. 11, the median V DIB is compared to the longitude-velocity map of 12 CO J = (1−0) emission from Dame et al. (2001).We made use of the momentum-masked cube restricted to a latitude range of ±5 •5 .The median V DIB follows the 12 CO velocity curve in the local region, especially from ℓ ≈ −90 • to ℓ ≈ −180 • .The average V DIB deviation in each longitude bin is 13.8 km s −1 , which prevents the exploration of a finer relation between DIB λ8621 and 12 CO in velocity structures.Nevertheless, more DIBs with large V DIB in general can be found in the regions with high-velocity 12 CO emission.For instance, the 12 CO velocities between 90 • and 180 • concentrate in two main branches that can be interpreted as the Local and   4 The convention between the heliocentric frame and LSR is made with (U ⊙ , V ⊙ , W ⊙ ) = ( 10.6, 10.7, 7.6)  Perseus spiral arms (e.g.Reid et al. 2019).Although AS23 suggested that some DIB measurements coincide with the 12 CO emission in the Perseus arm, the density distribution of V DIB there did not bifurcate.These cases consequently only represent a very small percentage of the total.For ℓ ≈ −60 • to 0 • , V DIB coincides well with some discrete 12 CO emission at −20 to −10 km s −1 , suggesting that these DIB signals would originate in the Local arm.

Detection of DIB λ8648 in individual RVS spectra
The DIB signal around 8648 Å was first reported and measured by Sanner et al. (1978), with positive support from Herbig & Leka (1991), Jenniskens & Desert (1994), Wallerstein et al. (2007), and Munari et al. (2008), but it was missed in the DIB survey of Galazutdinov et al. (2000) and Fan et al. (2019).These inconsistent results could be due to the difficulties in measuring a weak and broad DIB feature.The high-resolution spectra of early-type stars used in these previous studies would introduce uncertainties in the continuum placement when measuring broad DIBs like this (Sonnentrucker et al. 2018).Furthermore, stellar lines would cause contamination even for early-type stars, such as the very strong Paschen 13 line (see Fig. 1 in Munari et al. 2008) and the He I line at 8648.3 Å reported by Krełowski et al. (2019) in a B-type star (HD 169454).Under these effects, any conclusions about this DIB signal could be distorted in case studies with early-type stars.Compared to early studies, the large number of Gaia RVS spectra allows us to systematically investigate this signal with A199, page 9 of 22 a much larger spatial coverage.Based on the BNM, we successfully modeled the stellar components of late-type stars and detected the DIB signal near 8648 Å in stacked RVS spectra in Zhao et al. 2022 and GFPR.We cite this DIB as λ8648 following the suggestion in Jenniskens & Desert (1994), but we obtained a smaller λ 0 as 8645.3 ± 1.4 Å in Zhao et al. (2022).The profile of λ8648 was found to be very shallow and broad.In this work, we further verified that DIB λ8648 can be detected in individual RVS spectra, although their S/N are much lower than those after stacking.Figure 5 already shows the clear profile of λ8648 in four ISM spectra.In this section, we selected 179 measurements from the DIB catalog for a statistical study of λ8648 and its correlation with λ8621, with strict criteria: D 8648 > 3R C , 8640 < λ 8648 < 8655 Å, 8 < σ 8648 < 12 Å, and S /N > 50.
The high consistency between the fit and integrated EW 8648 (Fig. 12a) demonstrates that the DIB profile of these cases was properly fit.The general decrease of S/N for large EW 8648 caused the slight deviation.It should be noted that the fit and integrated EW 8648 shown in Fig. 12a were simply calculated outside the masked region between 8660 and 8668 Å where Ca II residuals would exist (see, e.g., Fig. 5).This means that they are smaller than the EW 8648 shown in Fig. 12b that was directly calculated based on the fit DIB parameters.
EW 8621 and EW 8648 present a linear correlation with a Pearson coefficient (r p ) of 0.82 (Fig. 12b).However, the EW 8621 /EW 8648 ratio in this work is systematically lower than that in GFPR, especially for small EW 8648 .The cause could be a detection bias due to the limited S/N of individual RVS spectra.For D 8648 is only about one-third of D 8621 , weak λ8648 is harder to be detected at a given magnitude of S/N than weak λ8621, resulting in a lack of measurements with large EW 8621 /EW 8648 .This effect should be less pronounced for GFPR because the S/N of the stacked ISM spectra was far higher.On the other hand, after visually checking the extreme measurements with EW 8648 ≳ 0.8 Å but EW 8621 ≲ 0.2 Å, we found that the jagged noise within the very broad profile of λ8648 could lead to an overestimation of EW 8648 .There is another possibility.If the carriers of λ8621 and λ8648 have different spatial distributions and the λ8648 carrier is more compact, the stacking of ISM spectra in a 3D volume will obtain a smaller EW 8648 (lower mean abundance) than EW 8621 .Nevertheless, without a map of their distribution, we cannot analyze the extent of this effect.Additionally, EW 8621 /EW 8648 would also vary from one sightline to the next and present different relations with different samples.
The measured central wavelength of the two DIBs also presents a moderate linear relation with r p = 0.68 (Fig. 12c), but their FWHM is noise dominated, especially for λ8648.With a linear fit to λ 8621 and λ 8648 , we made a rough estimate of λ 0 for λ8648 as 8646.31Å in vacuum, assuming that the carriers of λ8621 and λ8648 are comoving.This value corresponds to 8643.93 Å in air, which is much smaller than previous suggestions, such as 8650 Å by Sanner et al. (1978), 8648.28Å by Jenniskens & Desert (1994), and 8649 Å by Herbig & Leka (1991).The difference between this result and Zhao et al. ( 2022) is 1.01 Å, slightly larger than the mean uncertainty of λ 8648 of the cases we used of 0.69 Å. Figure 13 shows the correlation between EW 8648 and A V from Green et al. (2019) for 93 cases (others are beyond the sky coverage of Green et al. 2019).A moderate linear correlation can be found with r p = 0.70.Compared to λ8621, DIB λ8648 presents a worse correlation with dust reddening, which has been noted in Zhao et al. (2022) and GFPR.Similar to EW 8621 /EW 8648 , the A V /EW 8648 ratio in this work is systematically lower than that in GFPR.

Reassessing the DIB measurements in Gaia DR3
When we consider the small distances to the background stars (the median distance is 1.31 kpc) and the moderate S/N of the RVS spectra (the median S/N is 115.3) for the HQ sample of GDR3, its measured λ 8621 and σ 8621 should present a quasi-Gaussian distribution centered on λ 0 and the mean Gaussian width with a dispersion due to the uncertainties and the Galactic rotation (like the distribution seen in Fig. 6 for this work and AS23).However, a strong dependence between λ 8621 and σ 8621 A199, page 10 of 22 is clearly seen for the HQ sample of GDR3 (see Fig. 1 in AS23), which would be attributed to the improperly modeled stellar lines.On one hand, for small σ 8621 (≲1 Å) and large λ 8621 (around 8624-8626 Å), the fittings might be purely pseudofittings because the residuals of Fe I lines there would be stronger than the DIB features.On the other hand, the increase in σ 8621 with λ 8621 shifting shortward (8622-8624 Å) implies a broadening of the DIB profile caused by the noise and stellar residuals as more stellar lines at shorter wavelength in the vicinity of the DIB signal.
Compared to GDR3, the ISM spectra derived by the datadriven methods in this work and AS23 are less strongly influenced by the stellar residuals.Therefore, to estimate the magnitude of the biases caused by the stellar residuals, we compared the DIB parameters from GDR3 to those from this work and AS23 fit in the same RVS spectra, specifically, 1518 cases between GDR3 and this work and 3167 cases between GDR3 and AS23, as well as 2000 cases between this work and AS23 as a control group.We only considered the highest level of the DIB quality flag in GDR3 (i.e., QF = 0; see Sect. 2 in GDR3 and Sect.8.9 in Recio-Blanco et al. 2023 for details).The differences in DIB parameters as a function of the fit values are shown in Fig. 14, and their statistics, including the median difference (MED), the root-mean-square difference (RMSD), and the absolute difference not exceeded by 90% of the sources (AD90), are presented in Table 1.The D 8621 for AS23 (not given in their catalog) was calculated by their λ 8621 and EW 8621 with a Gaussian function.
The impact of the stellar residuals causes a systematic shift of λ 8621 in GDR3, which is clearly seen as the systematic variation of ∆λ 8621 with λ 8621 in Fig. 14.This phenomenon coincides with the λ 8621 − σ 8621 dependence discussed above.The MED of ∆λ 8621 is much larger for this work (0.073 Å) than for AS23 (0.018 Å), but the RMSD is similar (0.35 Å) and close to the pixel size, corresponding to 12 km s −1 .This value is comparable to the mean uncertainty of λ 8621 (0.376 Å) in GDR3 for the joint samples as well.When we consider AD90, the maximum shift for most λ 8621 is about 0.56 Å (∼19 km s −1 ), 1.5 times larger than the mean uncertainty.As a comparison, λ 8621 in this work and in AS23 are highly consistent with each other, with a median difference of only 0.008 Å and with a halved RMSD and AD90.Nevertheless, ∆λ 8621 between AS23 and this work also presents a weak dependence on λ 8621 , which could be due to a weak stellar impact, even though most ∆λ 8621 are smaller than the RVS wavelength pixel size used in this work.
The overestimated σ 8621 in GDR3 has a MED of 0.164 Å compared to this work and tends to become larger with increasing σ 8621 .The RMSD (0.469 Å) is slightly larger than the mean uncertainty of σ 8621 in GDR3 (0.405 Å) and the AD90 reaches over 0.7 Å.As a comparison, σ 8621 measured in this work is larger than that in AS23, with a nearly constant difference (a MED of −0.293 Å).In addition to the overestimated σ 8621 , D 8621 in GDR3 is smaller than that in this work, which is different, and ∆D 8621 presents an increasing trend with D 8621 as well.Additionally, EW 8621 are highly consistent with each other between GDR3 and this work, with a MED of −0.002 Å and a RMSD of 0.030 Å, comparable to the mean EW uncertainty (0.024 Å) in GDR3.Overall, we propose that the impact of stellar residuals led to a distortion of the DIB profile in GDR3, which became slightly shallower and broader.The center of the profile was also shifted to one or two pixels at most for the joint sample, but the area of the profile (EW 8621 ) remained unchanged.
The EW 8621 in this work is systematically larger than that in AS23, and the difference increases further with the fit values and can reach around 0.1 Å for EW 8621 ∼ 0.3 Å.The mean ∆EW 8621 is 27.1% relative to our measurements, much larger than the mean uncertainty of EW 8621 (12.9%).Because AS23 and this work have consistent λ 8621 and nearly constant ∆σ 8621 , the rise in ∆EW 8621 might be caused by different ML algorithms that model the DIB depth in different ways.Moreover, AS23 modeled the profile of λ8621 with a Gaussian function, but we added a Lorentzian function for λ8648 and a linear continuum accounting for RVS spectra with a poor normalization.Nevertheless, we note that GDR3 made use of synthetic spectra from stellar models and a simple Gaussian fitting, but their EW 8621 is highly consistent with that in this work.Thus, the influence of the DIB model is probably not so significant.The last factor is the fitting technique.Specifically, the ISM spectrum was first derived in A199, page 11 of 22 this work, and then the DIB profile was modeled, while in AS23, the DIB profile was implemented as a pixel-by-pixel covariance matrix, together with the stellar components and the noise, and was optimized in a set of grids.Despite the systematic difference in EW 8621 , the span of the 16th to the 84th percentiles of ∆EW 8621 (a measure of the magnitude of the dispersion deducting the tendency) is similar to that between GDR3 and this work and that between AS23 and this work for EW 8621 ≲ 0.3 Å.

DIB λ8621 correlates better with neutral than with molecular hydrogen
Motivated by the good consistency and the multimodality found between DIB λ8621 and 12 CO in velocity structures, AS23 directly compared V DIB and V CO by a peak-finding method.Specifically, signals of λ8621 and 12 CO were simply matched by the position of the background stars and the space grid of 12 CO map (a resolution of 0.125 • for Dame et al. 2001).
For any detected 12 CO emission within 1σ of V DIB , V DIB was then compared to the intensity-weighted V CO calculated within nine velocity channels around V DIB (see Sect. 3.4.2and Appendix E in AS23 for details).With a linear fit restricted velocity range and contains multiple components that cannot be resolved in the RVS spectra.With the limited accuracy of V DIB and the strong bias of the peak-finding method, it is hard to conclude that the perfect association between 12 CO and the carrier of DIB λ8621 implies a clumpiness of the DIB carrier.The associated velocity between DIB λ8621 and 12 CO, as well as H I, is more likely a result of the general Galactic rotation of these gaseous ISM species at a similar distance.
Based on the velocity-matched DIB − 12 CO and DIB -H I pairs, we made a coarse investigation of the correlation between the DIB strength and the hydrogen abundance for both neutral hydrogen (N H I ) and molecular hydrogen (N H 2 ).We calculated their abundance as N H I = 1.823 × 10 18 × I H I (HI4PI Collaboration 2016) and N H 2 = 2 × 10 20 × W CO (Bolatto et al. 2013), where I H I and W CO are the velocity-integrated intensity calculated with nine velocity channels around their matched V DIB .This analysis was based on the assumption that ISM species with similar radial velocities are mainly located at a similar distance, so that the DIB features can be compared to the corresponding H I and 12 CO emission, with a narrow-range integration to deduct the foreground and background contamination.This is certainly an ideal assumption.As shown in Fig. 16 on a logarithmic scale, a moderate linear correlation is EW 8621 and N H I (r p = 0.74), while EW 8621 is not sensitive to N H 2 (r p = 0.21).Therefore, the carrier of λ8621 would correlate much better with neutral hydrogen than with molecular hydrogen.Although EW 8621 is proportional to the column density of the carrier only between the background star and us and the H I and 12 CO observations may trace the hydrogen abundance in a much wider distance range, the narrow integration range around V DIB seems to alleviate this influence.
A set of strong optical DIBs was reported to tightly correlate with N H I , but to only present a loose correlation with N H 2 when N H 2 > 10 20 cm −2 (e.g.Herbig 1993;Friedman et al. 2011;Lan et al. 2015).The relation between EW 8621 and N H I revealed by our coarse analysis corresponds to this inference.In particular, Friedman et al. (2011) derived a tight correlation between DIB λ5780 and N H I , and the range of N H I , where they found that the correlation (20 ∼ 21.5 cm −2 ) is similar to ours (see the dashed black line in the upper panel of Fig. 16).According to Fan et al. (2019), the E(B − V)-normalized EW of λ5780 is twice as high as that of λ8621.Hence, the larger EW 8621 than EW 5780 at a given N H I seen in Fig. 16 would be caused by an underestimation of N H I in our analysis due to the narrow integration range.Nevertheless, we did not find a loose correlation between EW 8621 and N H 2 even for N H 2 > 10 20 cm −2 , although the fit line of the EW 5780 − N H 2 relation in Friedman et al. (2011) crosses the highest-density region of our sample for log[EW 8621 ] A199, page 13 of 22 and log[N H 2 ] (see the dashed black line in the lower panel of Fig. 16).The possible variation in the X CO factor and the saturation problem of 12 CO would further hamper the investigation of the correlation between EW 8621 and N H 2 .

Completeness of the DIB catalog
With he injection test, AS23 estimated the completeness of their DIB catalog by selecting good measurements with ∆D < 20%, ∆σ DIB < 20%, and ∆λ DIB < 2.5 Å (differences between the fit and injected values; see Appendix F in AS23).For our selected DIB catalog, the mean uncertainty of D 8621 and σ 8621 is about 10% and the ∆λ 8621 is mainly within 0.5 between this work and GDR3 (see Table 1 and Sect.4.5).Hence, we estimated the completeness of our DIB catalog with more rigorous criteria, that is, ∆D < 10%, ∆σ DIB < 10%, and ∆λ DIB < 0.5 Å, based on the results of the injection test (see Appendix B).The distribution of the estimated completeness for DIB λ8621 as a function of the injected EW 8621 and σ 8621 (upper panel in Fig. 17) is similar to that in AS23.The completeness generally decreases with EW 8621 , but its variation with σ 8621 is not as clear as in AS23 because the sample we used in the injection test is smaller.In our DIB catalog, the median EW 8621 is 0.2 Å and its 16th and 84th percentiles are 0.1 and 0.4 Å.The mean completeness at EW 8621 ∼ 0.2 Å is about 65% and between 0.1 and 0.4 Å is 68%.This is an optimistic estimate because the injection test was simple and ideal.Nevertheless, this percentile is still much higher than the fraction of the joint sample to the total between this work and AS23 (∼25%), which could be a result of the selection bias on the DIB catalog (see Appendix D for a detailed discussion).
The estimated completeness for DIB λ8648 depends much less on EW 8648 and σ 8648 , indicating a stronger influence of the correlated noise and stellar residuals on the completeness of DIB λ8648 measurements.At EW 8648 ∼ 0.5 Å, the mean completeness is only 25%.If we were to use loose criteria, the same as those in AS23, the completeness would increase to 53%.Although the total number of the spectra in the target sample that truly contain DIB λ8648 signals is unknown, we apparently did not obtain enough DIB λ8648 measurements (only 179 after quality filtering) even in the selected DIB catalog.One possible reason is that the selection criteria are too rigorous for DIB λ8648.Another reason is that DIB λ8648 does not exist in every spectrum with DIB λ8621 signals, that is, fewer DIB λ8648 sightlines can be detected than that for DIB λ8621.The completeness for DIB λ8648 would be overestimated as well.
Based on the estimated completeness and the comparison to AS23 (Appendix D), the DIB catalog in this work would be very pure, but only little complete.It is a critical problem to increase both the purity and the completeness of the DIB catalog.However, our RF model cannot estimate the uncertainty of its predictions meaningfully, and this is a common failure for the ML approach.We will seek more models and more intelligent injection tests or simulations in following works.

Summary and conclusions
We developed a random forest model for building the stellar templates within the DIB window (8600-8680 Å) using the part of the spectra outside the window.This method can be treated as an improvement of the best-neighbor method developed in Kos et al. (2013) and was applied to the RVS spectra published in Gaia DR3.The training set constituted 21 974 spectra with E(B − V) ⩽ 0.02 mag, |b| ⩾ 30 • , and S /N > 50.After subtracting the stellar components by the generated templates, we fit DIB λ8621 by a Gaussian function and DIB λ8648 by a Lorentzian function as well as a linear continuum for 780 513 target spectra.These target spectra had a mean S/N of 58, and 90% of the spectral S/N was below 116.The mean distance of the background stars is 1.58 kpc, and 90% of them are located within 3.61 kpc.
Considering χ 2 dof , noise level (S/N and R C ), and the constraints on λ 8621 and σ 8621 , we selected 7619 reliable measurements for DIB λ8621 (the DIB catalog can be accessed via the CDS database).Their EW 8621 presented a moderate linear correlation with dust reddening from both Andrae et al. (2023) and Green et al. (2019), and the mean EW 8621 /A V ratio was consistent with our previous results in GDR3 and GFPR.Using 77 DIB measurements toward the GAC and an assumption of a circular orbit, we determined an updated rest-frame wavelength of DIB λ8621 as λ 0 = 8623.141± 0.030 Å in vacuum, corresponding to 8620.766Å in air, which is perfectly consistent with the result in AS23, but bluer than that in GDR3.Calculated by λ 0 , V DIB in LSR showed a wave pattern with the Galactic longitude, revealing the projected Galactic rotation of the carrier of DIB λ8621.The median V DIB also correlated with the 12 CO velocity structures in the local region, especially for the outer Galactic disk.With the peak-finding method used in AS23 and a narrow-range integration, we compared EW 8621 to the neutral (N H I , from HI4PI Collaboration 2016) and molecular (N H 2 , A199, page 14 of 22 Zhao, H., et al.: A&A, 683, A199 (2024) represented by 12 CO) hydrogen column densities.This was a coarse analysis, but we found that EW 8621 correlated much better with N H I (r p = 0.74) than N H 2 (r p = 0.21), which was consistent with the conclusions for strong optical DIBs in previous studies.
With rigorous quality control, we obtained 179 reliable measurements of DIB λ8648 in individual RVS spectra, which further confirmed this very broad DIB feature.Its EW and central wavelength both presented a moderate linear relation with those of DIB λ8621.The λ 0 of DIB λ8648 was estimated as 8646.31Å in vacuum, corresponding to 8643.93 Å in air, assuming that the carriers of λ8621 and λ8648 are comoving.
By comparing the DIB parameters in GDR3, in AS23, and in this work, we confirmed the impact of the stellar residuals on the DIB measurements in Gaia DR3 argued by AS23.The stellar impact leads to a distortion of the DIB profile, resulting in an underestimation of D 8621 and in an overestimation of σ 8621 .The center of the DIB profile might also be shifted (≲0.5 Å), but EW 8621 was consistent with our new measurements in this work with a median difference of only −0.002 Å and an RMSD of 0.030 Å.
EW 8621 in this work is systematically larger than that in AS23 and the difference further increases with the fit EW.The reason for the difference might be the different ML algorithms and fitting techniques used in the two works.The selection bias of the DIB catalog was clearly revealed by the crossed groups between AS23 and this work.The DIB catalog is very pure, but has a low completeness.In the following works, we will apply more ML algorithms to different survey data and investigate their consistency and/or systematic differences.
2. Section 3 introduces our ML model and the fitting of DIBs λ8621 and λ8648.In Sect.4, we investigate the intensity and kinematic properties of DIB λ8621, analyze the detection of DIB λ8648 in individual RVS spectra, and reassess the results of λ8621 in GDR3.The correlation between the strength and abundance of DIB λ8621, as well as the completeness of the DIB catalog, are discussed in Sect. 5.The main conclusions are summarized in Sect.6. A199, page 2 of 22Zhao, H., et al.: A&A, 683, A199 (2024)

Fig. 2 .
Fig. 2. Two-dimensional distributions of stellar atmospheric parameters (T eff , log g, [M/H]) from GSP-Spec (Recio-Blanco et al. 2023) for the training set, the validation set, the testing set, and the target set.The color represents the number of stars counted in each bin with a size of ∆T eff = 40 K, ∆ log g = 0.05, and ∆[M/H] = 0.04 dex.

Fig. 3 .
Fig. 3. Residuals between the observed and modeled normalized fluxes as a function of the wavelength for the 17 324 RVS spectra in the testing set.The color represents the residuals, and the spectra are sorted by spectral S/N, increasing from bottom to top.S /N = 22, 50, and 100 is indicated by the dashed black, blue, and red lines, respectively.The Ca II line within the DIB window is marked as a dashed green line.

Fig. 5 .
Fig. 5. Examples of the fits to DIBs λ8621 and λ8648 in four ISM spectra.The black and red lines are the ISM spectra and fit DIB profiles, respectively, normalized by the fit linear continuum.The error bars indicate the observational flux uncertainties of the RVS spectra.Orange marks the masked region during the fittings.The Gaia source ID of these targets, E(BP − RP) from Andrae et al. (2023), the EWs of the two DIBs (EW 8621 and EW 8648 ), and the S/N of the ISM spectra are indicated as well.

Fig. 6 .
Fig. 6.Number density of the measured DIB λ8621 as a function of the Gaussian width (σ 8621 ) and the central wavelength (λ 8621 ) of the fit DIB profile for the full target sample (left), 8388 selected DIB measurements (middle), and the reliable measurements in AS23 (right).The color of the left panel represents the number of DIBs calculated in 0.025 Å × 0.1 Å bins.In the middle and right panels, the color represents the number density estimated by a Gaussian KDE.The dashed red line indicates the rest-frame wavelength of DIB λ8621 determined in this work (8623.141Å, see Sect.4.3).

Fig. 7 .
Fig. 7. Distribution of the number of DIB measurements (N DIB , upper panel) and the mean EW 8621 (lower panel) in a Galactic projection for the selected DIB catalog (7619 measurements).The N DIB and mean EW 8621 were calculated in each HEALPixel with a resolution of 1.83 • (N side = 32).

Fig. 8 .
Fig. 8.Comparison between the fit and integrated EW 8621 for 7619 measurements in the DIB catalog.Upper panel: the color represents the number density estimated by a Gaussian KDE.The gray bars show the uncertainty of the fit EW 8621 .The dashed red line traces the one-to-one correspondence.A zoom-in panel shows the distribution of the EW difference (∆EW = EW fit − EW int ).The mean (∆) and standard deviation (σ) of ∆EW are indicated.Lower panel: distribution of the ∆EW as a function of the fit EW 8621 .

Fig. 9 .
Fig. 9. Correlation between EW 8621 and dust reddening for 2957 cases with E(BP − RP) from Andrae et al. (2023) shown in the upper panel and for 4656 cases with A V from Green et al. (2019) shown in the lower panel.The color of the scattered points represents their number density estimated by the Gaussian KDE.The red dots and their color bars are the median values and the standard deviations calculated in each EW 8621 bin with a step of 0.05 Å.The linear fits to EW 8621 and dust reddening from previous works are overplotted as dashed lines: magenta for GFPR, black and red for GDR3, and blue for AS23.

Fig. 10 .
Fig. 10.Observed central wavelengths in vacuum of DIB λ8621 in the heliocentric frame (λ helio ) as a function of the angular distance in longitude from the Galactic anticenter (∆ℓ) for 77 DIB measurements in this work.The black dots are the individual measurements with the fit uncertainties.The red line is the linear fit to the black dots.

log
Fig. 11.Longitude-velocity diagram for DIB λ8621 and 12 CO.Upper panel: variation of the radial velocity of DIB λ8621 carrier (V DIB ) along with the Galactic longitude for 3592 selected DIB measurements.The points are colored by their number density estimated by the Gaussian KDE.The red dots with error bars are the median V DIB calculated in each longitude bin with a step of 10 • .The colored lines are theoretical rotation curves calculated with the rotation model in Reid et al. (2019) with different distances from the Sun.Lower panel: median V DIB superimposed on the longitude-velocity map of 12 CO J = (1−0) emission from Dame et al. (2001).The color scale displays the 12 CO latitudeintegrated intensity on a logarithmic scale.

Fig. 12 .
Fig. 12. Correlations between DIBs λ8621 and λ8648 for 179 selected measurements for (a) fit and integrated EW 8648 outside the masked region between 8660 and 8668 Å; (b) fit EW; (c) measured central wavelength; and (d) FWHM.The dashed green line in panel a traces the one-to-one correspondence.The colored points in panel b are the results from GFPR.The color in panel c represents the number density estimated by the Gaussian KDE, and the red line is a linear fit to all the data points.The Pearson coefficient (r p ) of the correlation between the parameters of λ8621 and λ8648 for the 179 selected measurements is indicated in panels b and c.

Fig. 13 .
Fig. 13.Correlation between EW 8648 and A V from Green et al. (2019) for 93 selected measurements.The underlying points are the results from GFPR, colored by the number density.The Pearson coefficient (r p ) for the red dots is indicated as well.

Fig. 14 .
Fig. 14.Difference in DIB parameters (D 8621 , λ 8621 , σ 8621 , and EW 8621 ) between GDR3, AS23, and this work as a function of the fit values for the joint samples.The gray scale indicates the number density of the data points estimated by the Gaussian KDE.The data are binned with a step of 0.005 for D 8621 , 0.2 Å for λ 8621 , 0.1 Å for σ 8621 , and 0.01 Å for EW 8621 .The solid red lines in each panel represent the median differences in each bin, and the two dashed red lines show the 16th and 84th percentiles.

Fig. 16 .
Fig. 16.Correlation between EW 8621 and N H I (upper panel) and between EW 8621 and N H 2 (lower panel) on a logarithmic scale.The Pearson coefficient (r p ) is indicated.The color scale indicates the number density.The dashed lines are the linear fit results from Friedman et al. (2011), but for DIB λ5780.

ZhaoFig. 17 .
Fig. 17.Distribution of the estimated completeness for DIBs λ8621 (upper panel) and λ8648 (lower panel) as a function of EW and σ DIB based on the results of the injection tests (see Appendix B).

Table 1 .
Statistics of the differences in DIB parameters between GDR3, AS23, and this work.