Constraining the H 2 column densities in the diffuse interstellar medium using dust extinction and H i data

Context. Carbon monoxide (CO) is a poor tracer of H 2 in the di ff use interstellar medium (ISM), where most of the carbon is not incorporated into CO molecules unlike the situation at higher extinctions. Aims. We present a novel, indirect method to constrain H 2 column densities ( N H 2 ) without employing CO observations. We show that previously–recognized nonlinearities in the relation between the extinction, A V ( H 2 ), derived from dust emission and the H i column density ( N HI ) are due to the presence of molecular gas. Methods. We employ archival N H 2 data, obtained from the UV spectra of stars, and calculate A V ( H 2 ) towards these sight lines using 3D extinction maps. The following relation fits the data: log N H 2 = 1 . 38742 (cid:0) log A V ( H 2 ) (cid:1) 3 − 0 . 05359 (cid:0) log A V ( H 2 ) (cid:1) 2 + 0 . 25722 log A V ( H 2 ) − 20 . 67191. This relation is useful for constraining N H 2 in the di ff use ISM as it requires only N HI and dust extinction data, which are both easily accessible. In 95% of the cases, the estimates produced by the fitted equation have deviations under a factor of 3 . 5. We construct a N H 2 map of our Galaxy and compare it to the CO integrated intensity ( W CO ) distribution. Results. We find that the average ratio ( X CO ) between N H 2 and W CO is approximately equal to 2 × 10 20 cm − 2 (K km s − 1 ) − 1 , consistent with previous estimates. However, we find that the X CO factor varies by orders of magnitude on arcminute scales between the outer and the central portions of molecular clouds. For regions with N H 2 ≳ 10 20 cm − 2 , we estimate that the average H 2 fractional abundance, f H 2 = 2 N H 2 / (2 N H 2 + N HI ), is 0 . 25. Multiple (distinct) largely atomic clouds are likely found along high–extinction sightlines ( A V ≥ 1 mag), hence limiting f H 2 in these directions. Conclusions. More than 50% of the lines of sight with N H 2 ≥ 10 20 cm − 2 are untraceable by CO with a J = 1–0 sensitivity limit W CO = 1 K km s − 1 .


Introduction
Molecular hydrogen is the most abundant molecule in the Universe, rendering it one of the most, if not the most, important chemical constituents.Molecular hydrogen's lack of a permanent electric dipole moment results in its rotational levels being connected only by quadrupole transitions.Their weakness, together with the large spacing of its rotational levels, results in rotational emission from H 2 being extremely weak at the temperatures typically encountered in molecular regions and is only rarely observed in infrared emission.
The use of carbon monoxide (CO) emission lines is another powerful method for probing N H 2 .CO is an abundant molecule in the ISM.It is abundant in regions where N H 2 ≥ 10 20 cm −2 , as self-shielding there is sufficient to prevent rapid photodissociation of CO by the background ISM radiation field.Thus, when there is CO, there is also H 2 , although the opposite is not always true (e.g., Grenier et al. 2005; Planck Collaboration XIX 2011; Barriault et al. 2010;Langer et al. 2010Langer et al. , 2014Langer et al. , 2015;;Velusamy et al. 2010;Pineda et al. 2013;Goldsmith 2013;Skalidis et al. 2022;Madden et al. 2020;Kalberla et al. 2020;Murray et al. 2018;Lebouteiller et al. 2019;Glover & Smith 2016;Seifried et al. 2020;Li et al. 2015).
The CO integrated intensity (W CO ) can be used as a proxy for N H 2 by employing the X CO factor, defined as (1) In our Galaxy, the average value of X CO is ⟨X CO ⟩ ≈ 2 × 10 20 cm −2 (K km s −1 ) −1 (Bolatto et al. 2013).When X CO is measured, factor of 2 deviations are expected from the average value due to the assumptions employed to estimate N H 2 by various methods (Dame et al. 2001;Bolatto et al. 2013).
The conversion of W CO to N H 2 using the Galactic average X CO is meaningful only in a statistical sense, and when it is applied on Galactic scales.Within individual clouds in the interstellar medium (ISM), X CO can vary by orders of magnitude (e.g., Pineda et al. 2008Pineda et al. , 2013;;Lee et al. 2014;Ripple et al. 2013) due to the sensitivity of X CO to local ISM conditions, including gas density, turbulent line width, and the interstellar radiation field (Visser et al. 2009;Glover & Clark 2016;Gong et al. 2017).
Diffuse regions of the ISM (defined as having A V ≲ 1 mag, and density n ≲ 100 cm −3 ) may contain a significant fraction of molecular hydrogen but relatively little CO compared to the abundance of carbon (Grenier et al. 2005).The reason is that the formation of H 2 formation takes place at smaller extinctions than does that of CO due to the rapid onset of self-shielding at lower values of the extinction.
In the transition phase where the H 2 density has started rising but that of CO is still small, oxygen and carbon are largely in atomic form.In this region, then, H 2 is primarily associated with ionized (C + ) or neutral (C I) carbon, and not with CO.Molecular hydrogen in regions that are undetectable in CO are referred to as CO-dark H 2 .The mass of CO-dark H 2 is estimated to be ∼30% of the total (primarily H 2 ) mass (Pineda et al. 2010;Kalberla et al. 2020).
Finally, N H 2 can be constrained using dust extinction.Dust in the ISM is well mixed with gas (Boulanger et al. 1996;Rachford et al. 2009).Thus, the extinction and reddening produced by dust is proportional to the total (atomic plus molecular hydrogen) gas column along the LOS (Bohlin et al. 1978).All established methodologies (Sect.6) concur that in regions where the gas is atomic, the dust reddening, E(B − V), is linearly correlated with N H I (Liszt & Gerin 2023;Lenz et al. 2017;Nguyen et al. 2018;Shull & Panopoulou 2023); changes in the dust or hydrogen content cause N H I /E(B − V) to vary (Liszt 2014b;Shull & Panopoulou 2023).At high Galactic latitudes, Lenz et al. (2017) found that the linear correlation between E(B − V) and N H I holds for N H I ≲4 × 10 20 cm −2 .At higher H I column densities, E(B − V) increases nonlinearly with N H I , behavior plausibly attributed to the presence of molecular gas.This implies that the residuals between the total dust extinction (or reddening) and the extinction induced by dust mixed with H I gas should probe N H 2 (Planck Collaboration XIX 2011;Paradis et al. 2012;Lenz et al. 2015;Kalberla et al. 2020).
This strategy for estimating N H 2 can only be applied if two conditions are satisfied: (1) the N H I /E(B − V) ratio must be constrained, and (2) a mapping function that converts the extinction residuals to N H 2 exists.N H I /E(B − V) has been constrained in the past by various authors (e.g., Liszt 2014b; Lenz et al. 2017;Nguyen et al. 2018;Shull et al. 2021).The major challenge is to find an accurate way to convert the extinction residuals to N H 2 .
In this paper, we present a novel method for converting the dust extinction residuals to N H 2 and then construct a full-sky N H 2 map of our Galaxy without using CO observations.We use our N H 2 estimates to constrain some of the molecular gas properties of our Galaxy.
We calculated the residuals between the total extinction and the extinction expected from dust mixed with H I gas toward lines of sight (LOSs) with N H 2 measurements; these N H 2 measurements have been obtained from absorption lines (Sect.2).We find that the extinction residuals are well correlated with N H 2 and fit them with a polynomial function.We then calculated the extinction residuals using full-sky N H I and E(B − V) maps (Sect.3).Finally, we converted the extinction residuals to N H 2 using our fitted function in order to construct a full-sky N H 2 map at 16 ′ resolution.
We compared our constructed N H 2 map with the CO intensity map of Dame et al. (2001) (Sect. 4), reproducing the previously reported global value of X CO .Our N H 2 estimates are consistent with previous work that combines dust extinction residuals and the X CO factor (Sect. 4.2).We constructed a full-sky map of the molecular fractional abundance and show that molecular hydrogen is less abundant than atomic hydrogen (Sect.5).In Sect.5.2, we determine the proportion of CO-dark molecular hydrogen in relation to the total amount of molecular hydrogen.We explored the N H /A V ratio and compared our findings against previous results (Sect.6).In Sects.7 and 8, we discuss our results in the context of the existing literature and present our conclusions.

Probing molecular gas using dust extinction
For atomic gas, Lenz et al. (2017) found that dust reddening correlates with the column density of the atomic hydrogen as N H I /E(B − V) = 8.8 × 10 21 cm −2 mag −1 .This is our reference value, and we show that our results remain statistically the same when we consider variations about this ratio (Sect.2.2.2).When the gas becomes molecular, E(B − V) increases nonlinearly with respect to N H I .
We assumed a constant total-to-selective extinction (Cardelli et al. 1989;Schlafly & Finkbeiner 2011) and calculated the visual extinction of dust mixed with atomic gas, (2) R V fluctuates throughout our Galaxy (e.g., Peek & Schiminovich 2013;Zhang et al. 2023).However, the uniformity of R V is implicit in our method, because we employed dust reddening Fig. 1.Extinction of dust mixed with molecular gas as a function of N H 2 .The H 2 column densities were determined from UV observations toward stars, and the A V (H 2 ) were determined from the nonlinearities of the E(B − V) versus N H I relation (Eq.( 3)). Green stars identify outlying points that were not considered in the fitting.At high column densities, A V (H 2 ) is strongly correlated with N H 2 , indicating that the nonlinear increase in dust reddening with respect to N H I is induced by the presence of molecular gas.The orange curves were randomly sampled from the posterior distribution of the Markov chain Monte Carlo fitting, representing the fitting uncertainties.The green curve corresponds to the mean posterior profile that we employ to convert A V (H 2 ) to N H 2 (Eq.( 6)).maps that were constructed using far-infrared multiwavelength dust intensity data (Sect.3, Appendix A).These maps directly constrain the dust emission properties, such as dust temperature and opacity, when some dust model is fitted to the data, and indirectly E(B − V), by assuming some extinction law and R V = 3.1 (e.g., Schlegel et al. 1998).In addition, most of the existing constraints of N H I /E(B − V) also rely on these dust reddening maps.Therefore, the uniformity of R V inevitably propagates in our methods and accounts for some of our uncertainties, which are calculated in Sect.2.4.
We defined the residuals between the total dust extinction and A V (H I) as where A V (H) denotes the total extinction from dust associated with both atomic and molecular hydrogen.On the left hand side, we use H 2 in the parentheses because we hypothesize, and verify below (Fig. 1), that these residuals trace the visual extinction of dust mixed with molecular hydrogen.Equation ( 3) is the cornerstone of our methodology for estimating N H 2 .A V (H I) can be estimated using N H I measurements (Eq.( 2)) from surveys such as HI4PI (HI4PI Collaboration 2016), while A V (H) can be extracted from publicly available full-sky dust extinction maps (e.g., Schlegel et al. 1998;Planck Collaboration Int. XXXV 2016).The next step in our strategy is to find an empirical relation for N H 2 as a function of A V (H 2 ).Before proceeding, we discuss some pathologies in the definition of A V (H 2 ).

Pathologies in the definition
Dust extinction is a positive definite quantity, but A V (H 2 ) can become negative if A V (H I) > A V (H).This situation can be encountered when either the signal-to-noise ratio (S/N) of the measurements is low or the value of N H I /E(B − V) deviates from its global value; variations of this ratio propagate to A V (H 2 ), due to Eq. (3) (Sect.2.2.2).We present a thought experiment to demonstrate how the use of a global N H I /E(B − V) (Eq.( 2)) can affect the estimation of A V (H 2 ).
Consider an ISM cloud with a local N H I /E(B − V) ratio smaller than the Galactic value.We considered the imaginary cloud to have a given A V (H) and that the gas is atomic.Thus, the total extinction of the cloud will be produced by dust mixed with H I gas.If we used the local N H I /E(B − V) of the cloud, we would derive that A V (H) = A V (H I), and hence A V (H 2 ) = 0.However, in Eq. ( 3) we use the global value from Lenz et al. (2017), which would yield for the cloud of our thought experiment that A V (H) < A V (H I), and hence A V (H 2 ) < 0.
Thus, A V (H 2 ) can become negative due to the use of a global N H I /E(B − V), which limits the detectability of our method.However, this is something that we cannot overcome because local measurements are limited.For this reason, for every measurement with A V (H 2 ) < 0, we considered gas to be fully atomic, and it is not considered in the analysis below.

Mapping
We calculated A V (H 2 ) using Eq. ( 3) toward several LOSs with reliable direct N H 2 determinations.The direct N H 2 measurements were obtained by observing the Lyman-Werner absorption lines toward background sources in our Galaxy.We extracted the N H 2 measurements, with their corresponding observational uncertainties, from the catalogues of Sheffer et al. (2008), Gillmon et al. (2006), Gudennavar et al. (2012), andShull et al. (2021); these catalogues also provide estimates for N H I , which allow us to obtain A V (H I) (Eq.( 2)).For measurements with asymmetric observational uncertainties, we calculated their average value.
The majority of the sources used for estimating N H 2 are nearby -at distances of less than 1 kpc -and close to the Galactic plane.Therefore, the column densities obtained from the absorption lines of these stars trace the integrated gas abundances up to the distance of each object, and not the total gas column.Therefore, we also needed to calculate the dust extinction up to the distance of each star (d ⋆ ).Below, we explain how we obtained fractional extinctions up to the distance of each star using the publicly available 3D extinction map of Green et al. (2019).
For each object with N H 2 measurement, we extracted d ⋆ from the catalogue of Bailer-Jones et al. (2021), which uses parallaxes from Gaia Data Release 3 (Gaia Collaboration 2021).We then calculated the dust extinction integral of each star as where z is the LOS distance between the observer and the star at distance d ⋆ , and E(B − V)(z) denotes the dust reddening at a given distance z.We extracted the E(B − V)(z) values from the 3D map (Bayestar19) of Green et al. (2019) 1 .This map provides a sample of E(B − V)(z) profiles drawn from a posterior distribution; for the calculation of E(B − V) ⋆ we used the most probable extinction profile.Dust reddening values from the map of Green et al. (2019) are calibrated for the Sloan Digital Sky Survey (SDSS) bands.We converted to dust reddening by multiplying the output of their map by 0.884, which is the standard color correction recommended by those authors.Then, we converted E(B − V) ⋆ to A V (H) by multiplying by R V .
Both A V (H) and A V (H I) are constrained for the LOSs where N H 2 measurements exist from spectroscopic observations.This allows us to calculate A V (H 2 ) for each LOS and compare with N H 2 .Several objects with spectroscopically constrained N H 2 are located in the anticenter of our Galaxy, which are not covered by the extinction map of Green et al. (2019).These measurements were not considered in the analysis below.
Figure 1 displays A V (H 2 ) versus N H 2 .We do not show measurements with log A V (H 2 ) (mag) ≲ −1.2 because they are nondetections.We also note that some measurements, mostly with low S/N, yielded negative A V (H 2 ).These were also discarded.
For log N H 2 (cm −2 ) ≲ 19, A V (H 2 ) remains constant with respect to N H 2 .Most of the measurements there have S /N < 3 2 , and hence we cannot confidently determine whether the flatness of the profile is physical or induced by noise in the data.We observe a strong correlation between A V (H 2 ) and N H 2 for log N H 2 ≥ 20.This correlation strongly supports the initial hypothesis that the extinction residuals, found in the total dust extinction when the extinction induced by dust mixed with H I is subtracted, yield the extinction of dust mixed with H 2 (as given by Eq. ( 3)).
The various quantities involved in our analysis (E(B − V), N H 2 , N H I ) are measured with beams of differing size: a) N H 2 is measured from UV absorption lines with a "pencil beam" defined by the star, b) several N H I measurements have been constrained by fitting the Lyα profiles (e.g., as in Shull et al. 2021), which are also characterized by pencil beams, while some of the N H I measurements were obtained from H I emission line data, where the resolution is 16 ′ (HI4PI Collaboration 2016), and c) the resolution of the 3D extinction map of Green et al. (2019) varies from 3.4 ′ to 13.7 ′ .Some of the scatter of the points in Fig. 1 is a result of this effect, but we note that the beam mismatch seems to be less important for diffuse and extended ISM clouds (e.g., Pineda et al. 2017;Murray et al. 2018).Although we cannot make a correction for this issue, we include it when we calculate the confidence levels of our method (Sect.2.4).

Observational uncertainties in A V (H 2 )
There are two different sources of uncertainty that affect the estimates of A V (H 2 ): (1) Uncertainties coming from the 3D extinction map of Green et al. (2019), and (2) distance uncertainties of stars.
To evaluate the dust extinction uncertainties, we sampled 300 random values from the posterior distribution of the Green et al. (2019) map at the most probable distance of each star (d ⋆ ).We calculated the extinction spread (σ ext ) at d ⋆ .
Parallax uncertainties affect the distance estimates of a star.We consider the distance upper and lower limits denoted as d ⋆,+ and d ⋆,− , respectively.From the map of Green et al. (2019), we calculated the most probable extinction at d ⋆,+ , and at d ⋆,− , denoted as E(B − V) (d ⋆,+ ) and E(B − V) (d ⋆,− ), respectively.We then calculated the differences ).The average (σ d ) between σ d+ and σ d− measures the extinction uncertainties due to parallax uncertainties.We take the total dust extinction uncertainty to be the quadratic sum of σ ext and σ d .These uncertainties are shown as the error bars in Fig. 1 and discussed in the following.
Fig. 2. Same as Fig. 1 but for different N H I /E(B − V) ratios.The black points with error bars correspond to our formal measurements using the N H I /E(B − V) constraint of Lenz et al. (2017).Blue and cyan points correspond to measurements obtained when we assume that N H I /E(B − V) ×10 −21 is equal to 10 and 8 cm −2 mag −1 , respectively.Approximately all data points show statistically consistent results.

Uncertainties from N H I /E(B -V) variations
At high Galactic latitudes (|b| ≥ 60 • ), Lenz et al. (2017) found that the correlation between E(B − V) and N H I is very tight, with intrinsic variations only 0.015 mag.However, as shown recently by Shull & Panopoulou (2023), the uncertainties in the E(B − V) versus N H I relation are larger than what is inferred by Lenz et al. (2017), with variations being dominated by systematic uncertainties in the employed datasets.Lenz et al. (2017) used the E(B − V) map of Schlegel et al. (1998), and N H I estimated using only emission line data from HI4PI Collaboration (2016), assuming that the line is optically thin.However, the E(B − V) values of Planck Collaboration XI (2014) are systematically larger than the values of Schlegel et al. (1998) at high Galactic latitudes (Shull & Panopoulou 2023).Thus, if the Planck map were employed, the ratio between N H I and E(B − V) would be biased toward smaller values than what would have been inferred from the Schlegel et al. (1998) map.In addition, beam effects in the H I emission data can induce significant scatter in the obtained N H I constraints.Several of these uncertainties are summarized in Shull & Panopoulou (2023), who found that N H I /E(B − V) ×10 −21 ranges from 8 to 10 cm −2 mag −1 ; this range is orders of magnitude larger than what is claimed by Lenz et al. (2017), and represents the formal uncertainty in N H I /E(B − V).We explored how N H I /E(B − V) variations, which propagate to our calculations through Eq. ( 2), affect our results.
Figure 2 shows A V (H 2 ) versus N H 2 for every measurement, except for the outliers (green stars), shown in Fig. 1.Black points correspond to A V (H 2 ) when we use the Lenz et al. (2017) relation, N H I /E(B − V) ×10 −21 = 8.8 cm −2 mag −1 ; the error bars are calculated as explained in Sect.2.2.1.The blue and cyan points correspond to the obtained A V (H 2 ) when we assume that N H I /E(B − V) ×10 −21 is equal to 10 and 8 cm −2 mag −1 , respectively.
For log N H 2 > 20.56, the colored (black and cyan) points are very close to the black points.In this regime, gas is mostly molecular, and thus the contribution of the dust mixed with atomic gas to the total dust extinction is minor.Thus, the uncertainty of N H I /E(B − V) has a weak effect in high-extinction A161, page 4 of 20 regions.When N H 2 is low, deviations between black and colored points become more prominent, because the relative abundance of atomic gas is higher there.However, even there, the offset between the black and the colored points is statistically insignificant for the vast majority of the points (Fig. 2).There are only six exceptions, but this is only a minor fraction of the sample.
We conclude that our analysis remains robust in relation to variations of N H I /E(B − V), which implies that uncertainties coming from the 3D extinction maps (Green et al. 2019) or Gaia distances (Sect.2.2.1) are more important.However, variations in N H I /E(B − V) likely contribute to the scatter of the observed A V (H 2 )-N H 2 relation (Fig. 1).Our estimated confidence intervals include all the aforementioned uncertainties (Sect.2.4).

Fitting the data
We fitted a polynomial to the A V (H 2 )-N H 2 relation using Bayesian analysis.We assumed uniform priors, and calculated the following likelihood: where σ is the quadratic sum of the observational uncertainties of log A V (H 2 ), and log N H 2 , y i corresponds to the measured A V (H 2 ), ỹ is the intrinsic model, which we assume to be a polynomial, and i is the measurement index.
We sampled the parameter space with the "emcee" Markov chain Monte Carlo sampler (Foreman-Mackey et al. 2013).Measurements shown as green stars in Fig. 1 were not included in the fit because they are variable sources (Sect.2.2.1).However, the fit changes only slightly even if they are included.
The best-fit polynomial equation is with uncertainties for the fitted coefficients (from highest to smallest power of the logarithm) being 0.00769, 0.42823, 7.88171, and 49.91439.The spread of the fit, shown by the orange lines in Fig. 1, grows for log N H 2 (cm −2 ) < 19, making our fit unreliable in that range.This determines the sensitivity threshold of our method.
In practice, N H 2 is usually the unknown, while A V (H 2 ) can be measured from dust extinction and N H I observational data.N H 2 can be estimated from A V (H 2 ) using the following analytical relation, which also fits our data well, The fitting uncertainties of the coefficients, from highest to smallest power of the logarithm, are 0.17822, 0.17277, 0.05248, and 0.01555.The above equation maps A V (H 2 ) to N H 2 , and we used it to construct a full-sky N H 2 map of our Galaxy (Sect.3).

Fitting different polynomials
We experimented with the degree of the polynomial used to fit Eq. ( 7).For even polynomials, we find that N H 2 is reduced at high A V (H 2 ) because the coefficient of the highest power of the fitted polynomial is always negative, hence making the curve concave downward.When we forced the coefficient of the highest-power term to be positive (by employing this constraint in the prior distribution), the reduction of N H 2 disappeared.However, we wish to keep a flat distribution of priors over the entire domain range (uninformative priors).We thus decided to work with polynomials whose highest-power term is odd because they did not show any reduction of N H 2 at high A V (H 2 ).
A linear polynomial would not reproduce the nonlinear behavior of the data at log N H 2 (cm −2 ) ≲ 20, and log A V (H 2 ) (mag) ≲ −1 (Fig. 1).As a result, linear models would underestimate N H 2 in low-extinction regions.We conclude that a cubic polynomial (as shown in Eq. ( 7)) is the simplest function that accurately represents the data, while higher-degree polynomials overfit the data.

Outliers in the
In the fitting process, we excluded the four measurements shown as green stars (Fig. 1).According to the SIMBAD database, one of the outliers (identified as HD 200120) is a Be star -this object also appears in the Be star catalogue of the International Astronomical Union (Jaschek & Egret 1982), hence further supporting the validity of the spectral classification of this object -while there is a T Tauri (HD 40111) and a β Cephei variable (HD 172140) star.Both Be and T Tauri stars can show some variability in both photometric and spectroscopic observations: Be stars eject material, due to their rapid rotation, while the brightness of T Tauri objects can vary significantly, due to their high accretion rate, even within months.β Cephei are B-type stars whose surface pulsates owing to their rapid rotation.The last object that was considered an outlier (HD 164816) is a Be star according to the SIMBAD database.However, accurate spectroscopic observations suggest that it is a binary system (Sota et al. 2014).In either case, some degree of variability is expected, which explains why this measurement is an outlier.We consider the N H 2 measurements toward these stars to be untrustworthy and did not include them in the fit.But even if we include these measurements, the fit changes insignificantly.

Confidence intervals in the estimated N H2
Our fitted model (Eq.( 7)) allows us to estimate N H 2 using H I and E(B − V) data.But the estimates do not come without uncertainties.
The uncertainties of the fit (shown by the orange lines in Fig. 1) are much smaller than the variance of the points about the fitted curve.These measured variations could be induced by several factors, such as variations in metallicity gradients, H I emission line optical depth effects (Sect.7), or due to the beam mismatch of the employed data, unaccounted for in our model.Therefore, the fitting uncertainties alone overestimate our confidence on the N H 2 estimates derived by Eq. ( 7).
To place reasonable confidence levels, we assumed that the depicted A V (H 2 ) spread (Fig. 1) represents the intrinsic spread in our Galaxy.We calculated the linear scale ratio (r) between the observed A V (H 2 ) with the value obtained from Eq. ( 7) for every data point with S /N ≥ 3. From the cumulative distribution function of r -the distribution of r is asymmetric with skewness equal to 1.38 -we obtained the following probabilities: for 68% of the measurements r ϵ [0.75, 2.48], for 95% r ϵ [0.46, 3.54], and for 99% r ϵ [0.30, 4.78].This implies that in 95% of the cases, the "true" N H 2 should not deviate by a factor greater than 3.5 from the values obtained from the fit (Eq.( 7)).

Full-sky N H 2 map of our Galaxy
We use the N H I data from the HI4PI survey (HI4PI Collaboration 2016) and the dust extinction map of Schlegel et al. (1998) -the values of this map are multiplied by 0.884 to account for systematic uncertainties (Schlafly et al. 2010;Schlafly & Finkbeiner 2011) -to construct a full-sky N H 2 map of our Galaxy.For comparison, we use the Planck extinction map.Overall, the Planck map shows enhanced extinction at high Galactic latitudes; this is also noted by Shull & Panopoulou (2023).The various estimated quantities for the molecular hydrogen do not change by more than a factor of 2 when the Planck map is used (Appendix A).
For the construction of the full-sky N H 2 map, we calculated A V (H 2 ) using the N H I and the E(B − V) maps (Eq.( 3)), and then converted to N H 2 (Eq.( 7)).The resolution of the extinction map (6.1 ′ ) is higher than that of the N H I data (16.2 ′ ).We thus smoothed the extinction map to the resolution of the H I data.Figure 3 (top panel) shows a Mollweide projection of our constructed N H 2 map for the Milky Way.We only visualize regions with log N H 2 ≥ 19 because below this limit our calibration relation is dominated by noise (Fig. 1).
We identify several molecular clouds of the Gould Belt, including the Polaris Flare (ℓ, b ∼ 125 We observe many small-angular-scale diffuse molecular clouds in the southern Galactic hemisphere, while in the northern hemisphere molecular structures seem to be coherent over scales of many degrees. We hardly see any molecular gas structures with log N H 2 (cm −2 ) ≥ 19 above the Galactic plane, for |b| ≥ 60 • .This, however, seems to be the case only when we use the Schlegel et al. (1998) extinction map.When we construct the N H 2 map using extinctions from Planck Collaboration XII (2020), we see some molecular structures extending up to the northern and southern poles, and weaker emission from both extended and small-scale clouds at higher Galactic latitudes (Fig. A.1).In both N H 2 maps (Figs. 3, and A.1), the majority of the molecular gas seems to lie within |b| ≲ 45 • .
In the Galactic plane, extinctions are larger than the extinction range used for the fitting of the N H 2 -A V (H 2 ) relation (Eq.( 7)).Therefore, the estimated H 2 column densities in the Galactic plane were derived by extrapolating the fitted relation to larger values of A V .

Comparing N H 2 and W CO
Figure 3 shows our N H 2 map (top panel) and the CO (J = 1-0) integrated intensity, W CO , map (bottom panel) from Dame et al. (2001).The two maps are well correlated, but the observed structures in the N H 2 map are more extended than in W CO .In this section our analysis is focused on CO-bright H 2 regions, while in Sect.5.2 we study the properties of CO-dark H 2 , which extends to higher Galactic latitudes than CO-bright H 2 .
Figure 4 shows the full-sky X CO (Eq.( 1)) map obtained with our N H 2 measurements and W CO from Dame et al. (2001).We calculated X CO toward LOSs with W CO ≥ 1 K km s −1 , which is the noise level in the CO data.We find that the average value of X CO is approximately equal to 2 × 10 20 cm −2 (K km s −1 ) −1 , which matches exactly with the previously reported value (Bolatto et al. 2013;Liszt & Gerin 2016).
We observe that in many regions X CO varies by orders of magnitude within a few arcminutes.The small-scale variations X CO are more prominent in regions above the Galactic plane and in the Galactic plane but only for 270 • > ℓ > 90 • .X CO tends to be high (log X CO ≳ 20.5) in the surroundings of molecular regions, while it decreases significantly (log X CO ≈ 19.5) in the inner parts of molecular clouds.
The X CO is enhanced in the surroundings of clouds because CO molecules are more effectively photo-dissociated there.The low CO abundance implies that W CO is small, and hence X CO becomes high.On the other hand, in the inner parts of the molecular clouds, the total column density is greater, reducing the photo-destruction rate of molecules due to both shielding by dust and by self-shielding, and the CO abundance increases.The enhanced CO abundance implies that W CO increases and thus X CO decreases.In the Galactic plane, X CO becomes large because N H 2 is maximum there and W CO saturates, due to the high optical depth of the J = 1-0 transition usually employed to determine W CO .
The observed behavior is consistent with past observations, and numerical simulations (Bolatto et al. 2013).Our results demonstrate that CO is a good tracer of N H 2 only for LOSs where CO is the dominant form of carbon, and the CO rotational line is not saturated.
We explore quantitatively the correlation of the CO observables (X CO , W CO ) with A V and N H 2 .Figure 5 shows the 2D probability density functions of all possible combinations of A V , X CO , W CO , and N H 2 .Measurements with N H 2 > 10 21 cm −2 and A V > 5.5 mag were derived by extrapolating our fitted equation, and hence no strong conclusions are made for this regime.The following paragraphs summarize the results of the comparison.
A v -X CO .The majority of points are clustered around X CO ≈ 10 20 cm −2 (K km s −1 ) −1 .The relation looks relatively flat in this regime, although there is a slight anticorrelation; this is more evident in the low-probability contours, which are concave upward.The anticorrelation is due to the presence of CO-dark H 2 gas in low-extinction regions (Seifried et al. 2020;Borchert et al. 2022).For log A V ≳ 1, X CO increases with A V because the CO line saturates.The critical extinction that marks the CO line saturation is log A V ≈ 0.6 ⇒ A V ≈ 4 mag, which is consistent with previous estimates toward the Perseus molecular cloud (Pineda et al. 2008;Lee et al. 2014).The local ISM conditions, such as the density and intensity of the background radiation field, can change the value of A V where CO saturates.For this reason, there is an extinction range where a nonlinear increase in X CO is observed (e.g., Lombardi et al. 2006).Similar behavior between X CO and A V is also observed in numerical simulations (Borchert et al. 2022).
A v -W CO .Most of the points lie in the range 0.3 ≲ log W CO (K km s −1 ) ≲ 1. W CO varies by orders of magnitude for log A V ≲ 1, while the dispersion in W CO decreases significantly for log A V ≥ 1.The scaling between A V and W CO transitions at log A V ≈ 0.6, and W CO converges to a maximum value W CO ≈ 100 K km s −1 .This convergence is due to the increase in the optical depth of the CO line, which becomes optically thick, and is consistent with previous observations (Pineda et al. 2008;Lee et al. 2014).
A v -N H2 .N H 2 increases nonlinearly with respect to A V for log A V ≲ 0, because H I is transformed to H 2 ; the transition takes place for log N H (cm −2 ) in the range ≈ 20.1-20.8(Savage et al. 1977;Gillmon et al. 2006;Bellomi et al. 2020).If we assume A161, page 6 of 20  Fig. 4. Mollweide projection of our constructed X CO map.Our global value for X CO is 2 × 10 20 K km s −1 cm −2 , which is consistent with previous estimates.X CO varies by orders of magnitude within a few tens of arcminutes.In the surroundings of molecular regions, X CO ∼10 21 cm −2 (K km s −1 ) −1 , while in the inner parts of clouds it decreases to X CO ∼5 × 10 19 cm −2 (K km s −1 ) −1 .In the Galactic center, X CO increases significantly due to the high optical depths in that region.The angular resolution is 16 ′ , and NSIDE=1024.A161, page 8 of 20 Skalidis, R., et al.: A&A, 682, A161 (2024) that A V /N H = 5.34 × 10 −22 cm 2 mag (Bohlin et al. 1978), then we obtain that −1.2 ≲ log A V (mag) ≲ −0.5, which is consistent with the extinction range where we observe the nonlinear increase in N H 2 .For log A V ≳ 0, the correlation between N H 2 and A V becomes quasi-linear.The positive correlation between A V and N H 2 reflects the fact that we see more gas when there is more dust and that the hydrogen is almost entirely molecular.We explore the dust-to-gas ratio in more detail in Sect.6.In agreement with previous studies (Liszt & Gerin 2023), we find that only a few sightlines have appreciable molecular gas content, log N H 2 (cm −2 ) ≳ 19 when log A V (mag) ≲ −0.5.N H2 -W CO .The observed behavior in this figure is similar to that of A V −W CO .For log N H 2 (cm −2 ) > 22, W CO converges to a maximum value because the line saturates.For log N H 2 (cm −2 ) < 21.5, the two quantities are weakly correlated: W CO varies by four orders of magnitude for 20 ≲ log N H 2 ≲ 21.The rapid increase in W CO as a function of N H 2 reflects the conversion of atomic carbon to CO and the resulting rise in fractional CO abundance.The most commonly found sightlines have W CO ≈ 10 K km s −1 and N H 2 ≈ 10 21 cm −2 , which explains why the global X CO is close to 10 20 cm −2 (K km s −1 ) −1 .
W CO -X CO .For log W CO ≲ 0.6, X CO is anticorrelated with W CO due to Eq. ( 1).For log W CO ≈ 0.6, the CO line saturates and X CO becomes independent of W CO , which converges to 100 K km s −1 .

Comparison to the N H 2 map of Kalberla et al. (2020)
Kalberla et al. (2020) used the nonlinear deviations in the E(B − V)−N H I relation to estimate N H 2 .Our method is similar to that of Kalberla et al. (2020), although with some key differences that we discuss below.

Summary of the Kalberla et al. (2020) method
H I emits at various Doppler-shifted frequencies (velocities).Each velocity component is considered to be a distinct cloud along the LOS.Kalberla et al. (2020) decomposed the H I emission spectra into distinct Gaussian components 3 .Each Gaussian is characterized by an amplitude and a dispersion: the amplitude is proportional to the total H I emitting gas, while the spread (σ u ) represents the internal velocity dispersion of the H I gas, consisting of a thermal and turbulent component.Kalberla et al. (2020) assigned an effective temperature (T D ) -in their paper this is referred to as the Doppler temperatureto each decomposed Gaussian component, where T D represents the total width of the Gaussian component.They calculated the dust extinction expected from each H I Gaussian component, by assuming a constant dust-to-gas ratio (Lenz et al. 2017).Then, they added the E(B − V) contribution from all Gaussian components to estimate the total dust extinction induced by dust mixed 3 The decomposition of the emission spectrum into Gaussians has been widely applied in the past (e.g., Miville-Deschênes et al. 2002, 2017;Heiles & Troland 2003a;Murray et al. 2014;Kalberla & Haud 2015, 2018).There are some uncertainties that can affect the decomposition of the emission spectra: (1) Contributions from non-Gaussian intensity peaks cannot be adequately captured.(2) Decomposing a spectrum into Gaussians is an ill-defined problem because different numbers of Gaussians can fit a spectrum equally well.(3) Distinct clouds that emit at frequencies with small Doppler shifts can appear as a single component in emission.Some of the aforementioned problems have been treated with the development of sophisticated decomposition techniques that require minimum user input (Lindner et al. 2015;Marchal et al. 2019;Riener et al. 2019).
with H I gas.They compared their estimated H I based extinction to the total extinction, using the map of Schlegel et al. (1998), and attributed the residuals to the presence of molecular gas.

Similarities and differences with our method
The similarities between our method and that of Kalberla et al. (2020) are the following: (1) both methods rely on the assumption of a universal dust-to-gas ratio and on the hypothesis that nonlinear deviations in the E(B − V)N H I relation trace the molecular gas content.(2) Both methods employ a mapping function that converts the E(B − V)N H I nonlinearities to N H 2 .
The major difference between our method and that of Kalberla et al. (2020) lies on how the mapping function was obtained.Kalberla et al. (2020) derived their mapping function by minimizing the extinction nonlinearities from the linear relation through bootstrapping.We derived our mapping function from independent data by using N H 2 measurements from UV spectra of background objects (Sect.2).In addition, Kalberla et al. (2020) fitted their calibration function in regions where CO is undetectable.In CO-emitting regions, they estimated N H 2 by assuming a constant X CO .On the other hand, our method does not require any assumption concerning X CO .Finally, we note that Kalberla et al. (2020) decomposed the H I components based on their spread (T D ), while we treated all the H I components uniformly.

Comparison of the maps
Figure 6 shows the N H 2 map of Kalberla et al. (2020) together with our map at 2 • resolution.We only include regions with log N H 2 (cm −2 ) ≥ 20, because below that threshold both methods might be susceptible to noise artifacts.
The N H 2 structures are more extended in the map of Kalberla et al. (2020) than ours.This difference could be due to degeneracies in the definition of T D , which consists of a nonthermal (turbulent) and thermal component.The method of Kalberla et al. (2020) employs T D as a proxy for the gas kinetic temperature, which is correlated with the molecular abundance.This approximation is accurate because the distribution of sonic Mach numbers in the diffuse ISM has a well-defined peak 4 (Heiles & Troland 2003b).Although T D represents the average gas temperature in the diffuse ISM, deviations toward individual regions are inevitably present.
We define the difference in the column densities of our map and the map of Kalberla et al. (2020) as Figure 7 shows a full-sky map of ∆ logN H 2 , calculated only in regions where both maps yield N H 2 ≥ 10 20 cm −2 .The majority of the pixels have ∆ log N H 2 < 0, while in the Galactic plane we find that ∆ log N H 2 > 0.
4 The sonic Mach number (M s ) is defined as M s = δu turb /c s , where δu turb , and c s correspond to the turbulent and thermal sound speed, respectively.The Doppler temperature is T D = 21.86 δu 2 (Payne et al. 1980), where δu represents the broadening of the H I emission line, which consists of a thermal and turbulent component, hence T D ∝ δu 2 turb + δu 2 thermal .In the diffuse ISM, the average sonic Mach number is ⟨M s ⟩ = 3.1 (Heiles & Troland 2003b).The above equations suggest that ⟨T D ⟩ ∝ 3.1 ⟨c 2 s ⟩ + ⟨δu 2 thermal ⟩.Both c s , and u thermal vary with the square root of the gas temperature, and thus ⟨T D ⟩ ∝ ⟨T ⟩.This implies that the average T D probes the average gas temperature in the diffuse ISM.In the absence of a characteristic sonic Mach number, the turbulent and thermal components of T D become inseparable.
A161, page 9 of 20  We focus on Galactic latitudes |b| ≥ 10 • , because typical extinctions in the Galactic plane exceed the A V range used for the calibration of the two methods.Figure 8 shows the distribution of ∆ log N H 2 of all pixels at |b| ≥ 10 • .The distribution peaks at -0.5, which implies that the N H 2 estimates in the map of Kalberla et al. (2020) are, on average, ∼3 times larger than ours.Their estimates are consistent within the uncertainties of our method (Sect.2.4).
The method of Kalberla et al. (2020) was calibrated in COdark H 2 regions, while in CO-bright regions they estimated N H 2 by assuming that X CO = 4 × 10 20 cm −2 (K km s −1 ) −1 .Although this value is consistent with the range of local X CO variations, 1.7-4.1 × 10 20 cm −2 (K km s −1 ) −1 , it is a factor of 2 larger than the Galactic average (Bolatto et al. 2013).If Kalberla et al. ( 2020) had adopted the global value of X CO , then our estimates would differ, on average, by less than a factor of 2.  8)) of our N H 2 estimates and those of Kalberla et al. (2020) for regions with |b| ≥ 10 • .Our N H 2 estimates are, on average, three times smaller than those inferred by Kalberla et al. (2020).This difference likely results in large part from the X CO value employed in the method of Kalberla et al. (2020).
comparison of the ∆ log N H 2 (Fig. 7) and the W CO map (Fig. 3, bottom panel) shows that the N H 2 estimates of the two methods agree well, ∆ log N H 2 ≈ 0, in regions where CO is undetected.

Characterization of the total gas properties of our Galaxy
Characterizing the total gas properties of our Galaxy is an important but challenging task.It is hard to constrain N H 2 , which is usually estimated using CO observations by assuming some global (and constant) X CO value.However, X CO varies by orders of magnitude between the various regions (Sect.4.1), and in addition a significant portion of the molecular gas lies in diffuse clouds that are largely devoid of CO (CO-dark H 2 ).Dust traces the total hydrogen column density, including atomic and molecular in either CO-dark or CO-bright form.For the N H 2 estimate, we employed the nonlinearities in the dust extinction (Sect.2).Thus, our N H 2 map (Fig. 3) probes the total H 2 column (CO-dark and CO-bright), which enables us to explore the various gas properties irrespective of the gas phase (atomic or molecular, CO-dark or CO-bright H 2 ).In the following sections, we use our constructed N H 2 map to explore the relative abundances between H I and H 2 gas, and we constrain the sky distribution of CO-dark and CO-bright H 2 .

Relative abundance of atomic and molecular gas
The molecular fractional abundance ( f H 2 ) is defined as When f H 2 → 0, then the total gas column is mostly in atomic form, while when f H 2 → 1 gas is 100% molecular.Some fraction of the dust is expected to be mixed with ionized hydrogen, which is omitted in Eq. ( 9).The contribution of H + in dust extinction, if any, is only expected in low-extinction regions (A V ≲ 0.093 mag, Liszt 2014a).However, this is below the extinction range where our method is applicable, which is log N H 2 ≥ 19 (Fig. 1) or log A V ≥ −0.5 (Fig. 2).Thus, we expect that our estimated f H 2 remains statistically unchanged even if some ionized hydrogen is present.We used the N H 2 values from the map we constructed (Fig. 3, upper panel) and the N H I data from the HI4PI survey to construct a full-sky f H 2 map (Fig. 9).We observe that close to the Galactic center (ℓ ≲ 60  2020) to derive the distances of Taurus and Orion, which are close to 150 and 400 pc, respectively.The distance estimates to the Polaris Flare are more uncertain, ranging from 100 pc up to 400 pc (Heithausen & Thaddeus 1990;Zagury et al. 1999;Brunt 2003;Schlafly et al. 2014;Panopoulou et al. 2016).The North Polar Spur is a gigantic radio loop and there is a debate regarding its distance.Some Galactic models suggest that this structure is a few kiloparsecs away from the Sun.However, recently Panopoulou et al. (2021) compared submillimeter and radio polarization data and found that the maximum distance of the North Polar Spur is approximately equal to 400 pc from the Sun.Altogether, all the aforementioned large-scale H 2 structures above the Galactic plane are relatively close to the Sun.
To be conservative, we assume that the maximum distance of the aforementioned molecular clouds is 500 pc.Assuming that b = 45 • , which is the maximum Galactic Latitude where molecular clouds are observed in Fig. 3, we obtain a vertical distance from the Galactic plane equal to 350 pc.This distance barely exceeds the outer edge of the Local Bubble (Pelgrims et al. 2020).Therefore, most of the H 2 gas of our Galaxy lies within a vertical distance of 350 pc of the Galactic plane.This is consistent with previous constraints of the H 2 scale height of our Galaxy (Heyer & Dame 2015;Marasco et al. 2017).
Figure 10 shows the distribution of the logarithm of f H 2 .The distribution is nearly symmetric with an average equal to ⟨ f H 2 ⟩ ≈ 25%, which is consistent with previous constraints derived with UV spectra of sparsely located point sources (Shull et al. 2021).From the cumulative distribution function of f shown in Fig. 11, we calculate that a large fraction (80%) of our Galaxy with N H 2 ≥ 10 20 cm −2 has f H 2 ≲ 40%.This indicates that either the majority of the molecular gas of our Galaxy lies in diffuse molecular clouds or that atomic clouds are more abundant than molecular clouds.Below we argue that the latter is more probable.
We display f H 2 versus A V in Fig. 12.For −0.5 ≲ log A V ≲ −0.2, log f H 2 increases from -2.0 (1% ) to -0.5 (30%), because the atomic gas transitions to molecular in this range of extinction, given an ambient radiation field of ∼1 Habing.The observed nonlinear increase in f H 2 , is similar to the predicted behavior of semi-analytical models of the H I → H 2 transition (Draine & Bertoldi 1996;Krumholz et al. 2008Krumholz et al. , 2009;;McKee & Krumholz 2010;Sternberg et al. 2014Sternberg et al. , 2024;;Bialy & Sternberg 2016).For −0.2 ≲ log A V ≲ 0.7, we observe that log f H 2 slightly decreases from -0.5 (30%) to -0.7 (20%).Although A V increases by a factor of 5, f H 2 decreases by a factor of 1.5.This suppression of f H 2 could be due to young stars that are illuminating the surrounding material and hence responsible for photo-destruction of the molecular gas (Madden et al. 2020).We consider this scenario unlikely because the decline of f H 2 starts at A V ≈ 1 mag, which is typical of translucent (non-star-forming) clouds, such as the Polaris Flare.
We conclude simply that there are more atomic than molecular clouds in most of the LOSs of our Galaxy, even where we observe molecular gas.The concatenation of atomic clouds increases N H I and consequently A V , but not N H 2 ; hence, the reduction in f H 2 .
This suppression of f H 2 is more prominent close to the Galactic plane: the majority of LOSs with log f H 2 ≈ −1 and log A V ≈ 0.7 are observed at |b| ≈ 5 • .However, even in regions at high Galactic latitudes (|b| > 40 • ), which cover a large portion of the sky, we observed a similar trend.For |b| > 40 • , there are several LOSs with log f H 2 ≈ −0.5 and log A V ≈ 0.
This suppression is expected from the following approximate calculation.The average number of atomic clouds per LOS, as indicated by H I emission line data, is close to three for |b| > 40 • (Panopoulou & Lenz 2020).On the other hand, the number of molecular clouds per LOS is a maximum two because: (1) molecular gas is associated with the H I cloud having an exceptionally large column, usually observed in H I emission data at low velocities measured with respect to the local standard of rest, and (2) molecular gas associated with H I clouds at intermediate local-standard-of-rest velocities tend to have lower column densities than clouds with lower velocities and only occasionally show any sign of H 2 (e.g., Lenz et al. 2015;Röhser et al. 2016a,b).Thus, we expect the ratio of the number of molecular clouds to atomic clouds to be a maximum of ≈0.67, implying that the high-Galactic-latitude sky has more atomic than molecular clouds.The majority of LOSs with log f H 2 ≈ 0 and log A V ≳ 0.7 are toward the Galactic midplane (ℓ ≲ 30 • and ℓ ≳ 330 Our conclusion about the decrease in f H 2 at high A V is also evident in Fig. 20 of Planck Collaboration XXIV (2011).These authors compare f H 2 and N H , where the N H 2 measurements used for the computation of f H 2 have been obtained from UV spectra (Rachford et al. 2002(Rachford et al. , 2009;;Gillmon et al. 2006;Wakker 2006).This suggests that despite the significant beam difference (16 ′ in our study, and pencil beams for the UV data), the observed trend is robust.The concatenation of multiple clouds along the LOS can significantly complicate observational studies of the H I → H 2 transition (Browning et al. 2003).

The properties of CO-dark and CO-bright H 2
There is a significant fraction of molecular gas that is not traced by CO.This CO-dark H 2 is observed in diffuse ISM clouds where the H I → H 2 transition is ongoing.In regions where the H I → H 2 transition is well advanced, carbon monoxide formation can also occur.Between the extinctions that characterize the predominance of H 2 and CO, carbon transitions from C + to C I. For typical ISM conditions, the onset of the H 2 formation takes place at A V ≲ 0.5 mag, the C + → C I transition at 1 ≲ A V ≲ 3 mag, while CO becomes the dominant carrier of carbon at A V ≥ 3 mag.During the transition from C + to C I, H 2 , whose electronic transitions due to photon absorption (Lyman-Werner lines) start at 11.18 eV, has already started forming but carbon is still in atomic form.Sufficient shielding is required to reduce the energy of incident photons below the CO photo-dissociation threshold (11.09 eV), hence allowing the buildup of the abundance of CO.Until this energy threshold is reached, the H 2 absorption lines are detectable but without any corresponding CO emission line.In this case, molecular hydrogen is primarily mixed with atomic carbon.
A161, page 13 of 20 Skalidis, R., et al.: A&A, 682, A161 (2024) The integrated intensity of the [C II] line (at 158 µm) is proportional to the gas density squared (Goldsmith et al. 2018) at the densities characteristic of diffuse and translucent clouds.Toward several clouds with untraceable CO, the observed [C II] intensity, even assuming all carbon is in the form of C + , is larger than that expected solely from C-H I collisions (Langer et al. 2010).The enhancement of the [C II] line indicates the presence of H 2 gas.
The above discussion makes it clear that our ability to trace CO-dark H 2 relies on the observational uncertainties in the CO line.Even in diffuse regions, there should be some CO emission, although very weak.Strictly speaking CO-dark H 2 is located in regions where the atomic carbon is more abundant than CO, or in terms of observables, when the [C II] or [C I] lines are stronger than CO.In this work, we adopt a detectability threshold of the CO line in order to define CO-dark H 2 .The detectability limit is determined by the noise in the CO data of Dame et al. (2001), which is 1 K km s −1 .
We adopt the definition that molecular gas is in CO-dark form when the following conditions are satisfied: (1) log N H 2 ≥ 19 cm −2 , and (2) W CO ≤ 1 K km s −1 .On the other hand, we define molecular gas that is traceable by CO, CO-bright H 2 , as: (1) log N H 2 ≥ 19 cm −2 and (2) Figure 13 shows our inferred CO-dark (upper panel), and CO-bright (lower panel) N H 2 maps.We observe that CO-dark H 2 extends to high Galactic latitudes (|b| ≈ 60 • ), and has generally lower column densities than CO-bright H 2 , because CO-dark H 2 is characteristic of regions having lower densities than CO-bright H 2 .CO-dark H 2 increases toward lower Galactic Latitudes, but in the Galactic plane almost all of the molecular gas is in CO-bright form due to the enhanced densities, and column densities there.CO-bright H 2 lies close to the Galactic plane and only a few CO-emitting clouds can be seen at higher latitudes (|b| ≈ 30 • ).
We estimate that CO-dark H 2 covers ∼17% of the total sky area, while CO-bright H 2 ∼9% of the sky.These values, however, should be considered with some caution because the W CO map of Dame et al. (2001) is considered to be complete only for |b| ≤ 32 • , as indicated by the comparison between the CO intensities with far-infrared and H I data5 .Thus, the W CO map may miss some CO emission at high Galactic latitudes (|b| > 32 • ), although we do not expect the emission at such high b to be significant.Our argument is supported by the CO survey of Dame & Thaddeus (2022), which is complete for the entire northern sky, and shows that very few LOSs have detectable CO emission -assuming the sensitivity of the Dame et al. (2001) survey -at |b| > 40 • .Therefore, our inferred fractional sky coverage of the CO-dark, and CO-bright H 2 components should be considered as upper and lower limits, respectively.Finally, we find that 66% of the sightlines with log N H 2 (cm −2 ) ≥ 19 are in CO-dark form, while the remaining 34% of sightlines are in CO-bright form.If in the definition of CO-dark H 2 we increase the threshold to log N H 2 (cm −2 ) ≥ 20, we obtain that 57% of the LOSs are in CO-dark and 43% in CO-bright form.

Dust-to-gas ratio
N H /E(B − V) is a key ratio for ISM studies.Existing constraints on the Galactic average N H /E(B − V) vary significantly (Liszt & Gerin 2023).The constraint of Bohlin et al. (1978), N H /E(B − V) = 5.8 × 10 21 cm −2 mag −1 , has been the gold standard for several decades.But, several recent works (Lenz et al. 2017;Planck Collaboration XI 2014;Shull et al. 2021;Liszt & Gerin 2023;Liszt 2014a;Gillmon et al. 2006;Rachford et al. 2009) have found larger values, although Kalberla et al. (2020) quote a smaller value for the Galactic average of N H /E(B − V).However, when Kalberla et al. (2020) excluded regions close to the Galactic plane, they found a larger value than Bohlin et al. (1978).
The only direct way to constrain N H , and hence the dustto-gas ratio, is by the use of spectra of background objects (e.g., Bohlin et al. 1978;Sheffer et al. 2008;Gillmon et al. 2006;Rachford et al. 2009;Shull et al. 2021).However, obtaining spectra is time consuming and only a limited number of LOSs can be targeted, usually toward relatively bright stars and quasars.The sky is sparsely sampled, which implies that the N H /E(B − V) estimates could be biased as spectroscopic surveys usually focus on the Galactic plane.Indirect methods for constraining the dust-to-gas ratio require some assumptions and approximations, but they have the advantage of uniformly sampling the Galaxy (Paradis et al. 2012;Planck Collaboration XIX 2011;Planck Collaboration XI 2014;Kalberla et al. 2020).
In Fig. 14, we present our N H /E(B − V) constraint and compare it to previous work.The figure shows the total hydrogen column density, calculated as N H = N H I + 2N H 2 , including data covering the full celestial sphere; the data employed are the N H I full-sky map from HI4PI Collaboration (2016), and our constructed N H 2 map (Fig. 3).The colored dashed lines correspond to previous constraints.
All the depicted lines are consistent, within variations, with our data.However, the green and red lines, which correspond to 7.0 × 10 21 cm −2 mag −1 (Planck Collaboration XI 2014) and 5.8 × 10 21 cm −2 mag −1 (Bohlin et al. 1978), respectively, agree more closely with the mean behavior of our data.
We can approximate, to first order, the N H versus E(B − V) relation with a line.But, a closer inspection to Fig. 14 suggests that some nonlinearities arise.For E(B − V) ≤ 0.1 mag, the average behavior of our data agrees well with the green line describing the results of Planck Collaboration XI (2014), while for E(B − V) > 0.1 mag the red profile of Bohlin et al. (1978) seems to be a better fit.This indicates that at larger E(B − V) the dust-to-gas ratio decreases, which could happen due to dust coagulation.
W H I ≤ 300 K km s −1 and nonlinear for W H I > 300 K km s −1 .Assuming that the H I line is optically thin the H I column density is given by N H I = 1.823 ×10 18 W H I (Draine 2011), yielding a transition at N H I ≈ 5 × 10 20 cm −2 .Boulanger et al. (1996) argued that the nonlinear excess between dust intensity and W H I is due to the presence of molecular gas that is only probed by dust intensity, which traces both the atomic and molecular gas phase, while the 21 cm emission line probes only the atomic phase.Their argument was based on the results of Savage et al. (1977), who found, using the H 2 absorption lines, that the molecular gas fraction rapidly increases when N H I ≈ 5 × 10 20 cm −2 .An alternative explanation of the nonlinear relationship between dust intensity and W H I for W H I > 300 K km s −1 is that at some point the emission line becomes optical thick and W H I misses a significant faction of the H I due to self-absorption, and reduction of the observed intensity (Burton et al. 1992).
The same debate is found in more recent papers.Fukui et al. (2014Fukui et al. ( , 2015) ) explored the correlation between W H I and the dust opacity (τ 353 ), but they also added one more free parameter in the analysis: the dust temperature (T d ).The dust temperature is anticorrelated with the total gas column.Warm dust traces the outer regions of ISM clouds, which are primarily atomic, while cold dust traces the inner, primarily molecular, regions of the clouds.Fukui et al. (2015) showed that the correlation between W H I and τ 353 is linear when T d > 20 K.When dust is cold (T d < 20 K), τ 353 increases nonlinearly with respect to W H I .Fukui et al. (2015) argued that this nonlinear excess of τ 353 with respect to W H I is consistent with the presence of optically thick H I gas, and although some H 2 gas might be present, it is less abundant than H I. At sufficiently high column densities, the increase in τ 353 would be dominated by the formation of the molecular gas, which is traceable by the CO emission line.However, Fukui et al. (2015) suggest that the excess of τ 353 at low W H I is not due to CO-dark H 2 , but due to optically thick H I. Murray et al. (2018) addressed the question of whether the H I optical depth effects can explain the excess of τ 353 when W H I is high.They compiled a large catalogue of τ H I absorption data, which provide the only direct way to estimate τ H I .Their sample consisted of archival data toward 151 LOSs that are covered within the Galactic Arecibo L-band Feed Array (GALFA) H I survey footprint.Following the analysis of Fukui et al. (2015), Murray et al. (2018) found that as τ H I increases, the H I gas that is missed by the 21 cm emission line is less than a factor of 1.3; similar results have been found by Liszt (2019).This means that the increase in τ H I cannot adequately explain the nonlinear relation between W H I and τ 353 .
We found that nonlinear deviations in the extinction with respect to N H I correlate well with N H 2 .Our findings (Fig. 1) support the claims of Boulanger et al. (1996), andMurray et al. (2018) -that extinction residuals are induced by molecular hydrogen.However, given the spread in our obtained A V versus N H 2 relation (Fig. 1), some, but not dominant, contribution of optically thick H I gas is also plausible.

Variations in X CO
The X CO factor is an important observable for estimating the molecular gas content of molecular clouds and entire galaxies (Lee et al. 2014;Sun et al. 2018Sun et al. , 2020)).X CO is expected to vary as a function of the local ISM properties, such as metallicity, density, and intensity of the UV background field (Gong et al. 2017(Gong et al. , 2018)).Observations of local ISM clouds suggest that the X CO factor is relatively constant, with variations not exceeding a factor of 2 (Bolatto et al. 2013;Liszt & Gerin 2016).
The constancy of X CO indicates that ISM clouds in our Galaxy have generally similar properties, such as temperature and surface densities.This uniformity has important consequences regarding the nature of ISM clouds.For example, it explains the relation between nonthermal velocity linewidths and cloud size -known as Larson's relation (Larson 1981) -as the outcome of clouds with magnetically supported boundaries (Mouschovias & Psaltis 1995).A plethora of theoretical works address the question of why ISM clouds in the Milky Way are so similar (e.g., Narayanan & Hopkins 2013;Clark & Glover 2015).
We have found that CO-emitting regions are most probably observed when 20.5 ≲ log N H 2 (cm −2 ) ≲ 21, and 3 ≲ W CO ≲ 8 K km s −1 (Fig. 5).Our results thus suggest, in agreement with previous studies, that our Galaxy favors clouds with relatively uniform CO and surface density properties.This is also supported by our inferred f H 2 distribution (Fig. 10), which is nearly symmetric and narrow; the 1 σ of the distribution is 0.15-0.43.However, in individual molecular clouds (on angular scales of a few arcminutes), X CO can vary by an order of magnitude from the outer to the inner parts, as atomic gas transitions to fully molecular form.

Molecular hydrogen column densities as a function of the galactocentric distance
The Galactic plane contains most of the molecular gas of our Galaxy.Several authors employed data from the CO survey of Dame et al. (2001) to model the H 2 surface as a function of distance from the Galactic center, but omitted any contribution from CO-dark H 2 (Bronfman et al. 1988;Wouterloot et al. 1990;Nakanishi & Sofue 2006, 2016;Marasco et al. 2017;Miville-Deschênes et al. 2017).
A common conclusion from these works is that the H 2 surface density decreases nonlinearly at large galactocentric distances, although the different studies differ on the exact cutoff distance.It would be interesting to see how the effect of the COdark H 2 could affect the relative abundance between the H I and H 2 surface densities as a function of galactocentric distance.The A161, page 16 of 20 Skalidis, R., et al.: A&A, 682, A161 (2024) presence of CO-dark H 2 could yield more extended H 2 surface density profiles, as shown by Pineda et al. (2013) who estimated that the H 2 gas in CO-dark form corresponds to ∼20% of the total molecular abundance at 4 kpc and ∼80% at 10 kpc.
Our f H 2 map (Fig. 9) shows that in the Galactic plane gas is mostly molecular for ℓ ≲ 60 • and ℓ ≳ 300 • , while it becomes fully atomic for 90 • ≲ ℓ ≲ 270 • .Despite the presence of COdark H 2 , the overall contribution of molecular gas at large Galactic latitudes is minimal; the abundance of CO-dark H 2 becomes maximum at 150 • ≲ ℓ ≲ 210 • , but there f H 2 is only 10%.Thus, the CO-dark H 2 abundance seems to increase with galactocentric distance, but this only affects the exact distance where H I starts becoming more abundant than H 2 .

CO-dark H 2 estimates in the solar neighborhood
Existing constraints on the relative abundance of CO-dark H 2 in relation to total gas imply that a significant amount of molecular gas exists in CO-dark form (Planck Collaboration XIX 2011;Paradis et al. 2012;Kalberla et al. 2020).As stated by Li et al. (2018), the derived abundances of CO-dark H 2 depend on the detectability threshold of the CO line.This was demonstrated by Donate & Magnani (2017) who performed a high-sensitivity CO survey toward a region with existing, lower sensitivity CO data.The (new) high-sensitivity survey showed that the relative abundance of CO-dark with respect to the total H 2 gas was approximately two times smaller than what had been inferred by the (previous) low-sensitivity CO survey.That said, the definition of CO-dark H 2 is vague, making the comparison of the various results in the literature challenging.
Our definition of CO-dark and CO-bright H 2 components differs from several past works (e.g., Paradis et al. 2012;Kalberla et al. 2020).In these works, the decomposition is based on the models of photo-dissociation regions (Wolfire et al. 2010;Velusamy et al. 2010).In photo-dissociation-region models, molecular gas phases are organized into layers with CO-dark corresponding to a diffuse, and hot layer that surrounds the dense, and cold CO-bright H 2 layer.
We define CO-dark H 2 , molecular hydrogen in LOSs devoid of CO.Using this definition, CO-dark and CO-bright H 2 never coexist (Fig. 13).However, our definition allows us to emphasize that a significant fraction of the molecular sky (∼60%) is untraceable in CO maps, when a detectability threshold equal to 1 K km s −1 is employed (Sect.5.2).Our estimates should be treated as upper limits, given the incompleteness of the Dame et al. (2001) CO survey at high Galactic latitudes.However, we do not expect significant variations from these estimates because CO emission is minimal in that portion of the Milky Way.

Conclusions
Molecular hydrogen is the most abundant molecule in the ISM, but it is hard to directly observe it and infer its column density.Several indirect methods have been used to probe N H 2 , usually by observing the CO emission lines and employing a CO-H 2 conversion factor (X CO ; Eq. ( 1)).However, CO misses the diffuse molecular hydrogen that lies in translucent clouds, where densities are not sufficient to allow a rapid formation of CO and column densities are insufficient to provide effective shielding against photo-dissociation.The result is CO-dark H 2 .We present a new indirect method for estimating N H 2 using N H I and dust extinction data.
We show that the nonlinear increments of dust extinction with respect to N H I , A V (H 2 ), observed for N H I ≥ 3 × 10 20 cm −2 (Lenz et al. 2017), are due to the presence of molecular hydrogen.We measure the implied extinction due to molecular hydrogen, A V (H 2 ), using publicly available extinction maps (Green et al. 2019), toward LOSs with direct N H 2 measurements that have been obtained with spectroscopic data.We show that A V (H 2 ) correlates with N H 2 (Fig. 1).We obtain a best-fit model for the A V (H 2 )-N H 2 relation (Eq.( 7)).The fitted relation is the basis of our methodology for estimating N H 2 .
We employed the following assumptions, which are motivated by the literature: (1) N H I /E(B − V) = 8.8 × 10 21 cm −2 mag −1 (Lenz et al. 2017) and (2) R V = 3.1 (Cardelli et al. 1989;Schlafly et al. 2014).We used the fitted relation (Eq.( 7)) to construct a full-sky N H 2 map at 16 ′ resolution (top panel in Fig. 3); our N H 2 map does not employ CO observations and hence traces both the CO-dark (diffuse) and CO-bright H 2 .The accuracy of the obtained N H 2 estimates is better than a factor of 2, 3, and 5, with a probability of 68%, 95%, and 98%, respectively.
We compared our inferred N H 2 estimates with those of Kalberla et al. (2020) and find good agreement (Sect.4.2); there is some discrepancy, which can be explained by the assumptions of the employed methods.We also compared our N H 2 map with the CO total intensity obtained from the composite survey of Dame et al. (2001) (our Sect. 4.1).We constructed a map of X CO , which is the ratio between N H 2 and W CO (Fig. 4).We estimate that the Galactic value of X CO is approximately equal to 2 × 10 20 cm −2 (K km s −1 ) −1 , which is consistent with previous constraints (Bolatto et al. 2013;Lada & Dame 2020).However, we find that X CO can vary by orders of magnitude on angular scales of a few arcminutes.
In the diffuse (peripheral) parts of ISM clouds, X CO > 10 21 cm −2 (K km s −1 ) −1 because a large portion of the molecular hydrogen there is not traced by CO, hence the large conversion factor.On the other hand, in the inner (denser) parts of the cloud, where CO is well mixed with molecular hydrogen, X CO decreases to ∼5 × 10 19 cm −2 (K km s −1 ) −1 .Toward LOSs with A V ≳ 10 mag -mostly encountered in the Galactic plane -the CO line saturates and X CO > 10 21 cm −2 (K km s −1 ) −1 (Fig. 5).
We used our N H 2 map and archival N H I data to construct a map (Fig. 9) of the molecular fractional abundance ( f H 2 ), which is the ratio between N H 2 and the total gas column (Eq.9).For our Galaxy, we find an average f H 2 equal to 25% (Fig. 11), which is consistent with previous estimates (Bellomi et al. 2020;Shull et al. 2021).In the Galactic plane (b = 0 • ), for 90 • ≲ ℓ ≲ 270 • , f H 2 decreases abruptly from unity to 0.1.This agrees with previous constraints of the molecular gas distribution of our Galaxy (Miville-Deschênes et al. 2017;Marasco et al. 2017).We find that f H 2 approaches unity, implying that gas is fully molecular, only toward 3% of the LOSs with N H 2 ≥ 10 20 cm −2 .A large fraction of the sky (∼66%) with N H 2 ≥ 10 19 cm −2 is undetected in CO maps when the sensitivity of the CO data is W CO = 1 K km s −1 (Sect.5.2).
We explored the correlation between N H and E(B − V) (Sect.6).Our data favor a N H /E(B − V) ratio close to 7 × 10 21 cm −2 mag −1 for E(B − V) ≲ 0.1 mag, consistent with the constraint of Planck Collaboration XI (2014).For E(B − V) ≳ 0.1 mag, our data fit better with the constraint of Bohlin et al. (1978), N H /E(B − V) ≈ 5.8 × 10 21 cm −2 mag −1 .All the constructed maps are made publicly available through the Strasbourg astronomical Data Center.

Fig. 3 .
Fig. 3. Comparison between N H 2 and W CO maps.Top panel: Mollweide projection of our N H 2 map.The dark blue color indicates regions with 19 ≲ log N H 2 (cm −2 ) ≲ 20.Bottom panel: W CO map fromDame et al. (2001).Our N H 2 map was constructed by calculating A V (H 2 ), using E(B − V) and N H I data, and then converting to N H 2 using the relation given in Eq. (7).The N H 2 map shows several relatively large angular scale structures.As these are CO-dark H 2 regions, they are undetectable in the W CO map.(Sect.5.2).In both maps, longitude increases to the left from the central (vertical) line at ℓ = 0 • , the grid spacing, shown by the white solid lines, is 30 • , the angular resolution is 16 ′ , and the HEALPix parameter NSIDE = 1024.

Fig. 6 .
Fig.6.N H 2 map constructed using the method presented here (upper panel) and using that inKalberla et al. (2020; lower panel).The resolution of both maps is 2 • and NSIDE = 1024.

Fig. 9 .
Fig.9.All-sky image of the logarithm of the fractional H 2 abundance, f H 2 .The mean value of log f H 2 is equal to -0.6, with standard deviation equal to 0.23.
• and ℓ ≳ 300 • , b = 0 • ), gas tends to be fully molecular.On the other hand, f H 2 dramatically decreases in the outer parts of the Galactic plane (300 • ≳ ℓ ≳ 60 • , b = 0 • ), meaning that the relative abundance of H I increases.The f H 2 full-sky map also shows several large-scale structures above the Galactic plane with enhanced molecular abundances (log f H 2 ≳ 0.5 ): (1) the Taurus molecular cloud at ℓ, b ∼ 170 • − 15 • , (2) the Polaris Flare at ℓ, b ∼ 125 • , 30 • , (3) the North Polar Spur at ℓ, b ∼ 27 • , 10 • , and (4) Orion at ℓ, b ∼ 170 • , −15 • .Below we argue that the proximity of these molecular structures imply that the scale height of N H 2 of our Galaxy should be less than 300 pc.Zucker et al. (2021) used the 3D extinction map of Leike et al. (

Fig. 13 .
Fig. 13.Comparison of the CO-dark and CO-bright N H 2 maps.Upper panel: full-sky CO-dark N H 2 map, showing sightlines with N H 2 ≥ 10 19 cm −2but devoid of CO (W CO ≤ 1 K km s −1 ).Bottom panel: CO-bright N H 2 map, corresponding to N H 2 ≥ 10 19 cm −2 and W CO > 1 K km s −1 .The CO-dark H 2 spans an area approximately equal to 17% of the total sky.This is a factor of 2 larger than that of the CO-bright H 2 , and extends up to high Galactic Latitude.On the other hand, CO-bright H 2 is mostly concentrated toward the Galactic plane.Moving away from the Galactic plane, we observe that H 2 transitions from CO-bright to CO-dark form at |b| ≈ 5 deg.Although we still observe a few CO-bright H 2 regions at higher latitudes above the Galactic plane, most of the molecular gas is in the CO-dark form there.