Atomic and molecular gas properties during cloud formation

Molecular clouds, which harbor the birthplaces of stars, form out of the atomic phase of the interstellar medium (ISM). We aim to characterize the atomic and molecular phases of the ISM and set their physical properties into the context of cloud formation processes. We studied the cold neutral medium (CNM) by means of $\rm HI$ self-absorption (HISA) toward the giant molecular filament GMF20.0-17.9 and compared our results with molecular gas traced by $^{13}\rm CO$ emission. We fitted baselines of HISA features to $\rm HI$ emission spectra using first and second order polynomial functions. The CNM identified by this method spatially correlates with the morphology of the molecular gas toward the western region. However, no spatial correlation between HISA and $^{13}\rm CO$ is evident toward the eastern part of the filament. The distribution of HISA peak velocities and line widths agrees well with $^{13}\rm CO$ within the whole filament. The column density probability density functions (N-PDFs) of HISA (CNM) and $\rm HI$ emission (tracing both the CNM and the warm neutral medium, WNM) have a log-normal shape for all parts of the filament, indicative of turbulent motions as the main driver for these structures. The $\rm H_2$ N-PDFs show a broad log-normal distribution with a power-law tail suggesting the onset of gravitational contraction. The saturation of $\rm HI$ column density is observed at $\sim$25$\rm\,M_{\odot}\,pc^{-2}$. We conjecture that different evolutionary stages are evident within the filament. In the eastern region, we witness the onset of molecular cloud formation out of the atomic gas reservoir while the western part is more evolved, as it reveals pronounced $\rm H_2$ column density peaks and signs of active star formation.


Introduction
Molecular clouds play a key role in star formation processes. Stars are born in the dense interiors of molecular clouds that form out of the atomic phase of the highly turbulent interstellar medium (ISM) (Larson 1981; Clark et al. 2012; Dobbs et al. 2014; Sternberg et al. 2014; Klessen & Glover 2016). Molecular clouds consist mainly of molecular hydrogen (Larson 2003; Mac Low & Klessen 2004; McKee & Ostriker 2007; Dobbs et al. 2014). However, the cloud formation process out of the diffuse atomic phase is still not well constrained. According to the standard photodissociation region (PDR) model, layers of cold atomic hydrogen can effectively shield the cloud from photodissociating UV radiation at sufficiently high densities, allowing a more complete conversion of H i to its molecular form. The cold neutral medium (CNM), with temperatures of $\leq 300$ K and volume densities of $10-100\,\rm cm^{-3}$ (McKee & Ostriker 1977; Heiles & Troland 2003; Wolfire et al. 2003; Kalberla & Kerp 2009), is thought, due to its relatively high density, to be a key component in the conversion process from diffuse atomic hydrogen to its molecular phase. Constraining the physical and dynamical properties of the CNM is therefore crucial to understanding early cloud formation processes.
A&A proofs: manuscript no. main. arXiv:2008.13502v1 [astro-ph.GA] 31 Aug 2020
The CNM is a major constituent of the ISM (see e.g., Ferrière 2001; Heiles & Troland 2003). Even though the observation of the H i 21 cm line allows one to study the properties of atomic hydrogen in general, it is difficult to attribute certain properties to different components of H i. In pressure equilibrium, atomic hydrogen can exist in different phases (e.g., McKee & Ostriker 1977; Wolfire et al. 2003). Observations of H i 21 cm line emission are generally attributed to both the warm neutral medium (WNM) and the CNM. To separate the WNM from the CNM, we make use of the presence of H i self-absorption (HISA; see e.g., Riegel & Crutcher 1972; Knapp 1974; van der Werf et al. 1988; Feldt 1993; Gibson et al. 2000, 2005a; Kavars et al. 2003; Dénes et al. 2018; Wang et al. 2020b) to trace the cold atomic phase. H i self-absorption is found throughout the Milky Way in various environments. Many studies have focused on the detection of HISA, first detected in 1954 (Heeschen 1954, 1955), in known sources, but statistical treatments of the kinematic properties and densities of the CNM in large-scale high-resolution maps are still rare. For HISA to be detected, sufficient background emission of warmer gas along the line of sight is required. Since the warm component of atomic hydrogen is more diffuse, it fills up a larger volume than the cold component (McKee & Ostriker 1977; Stahler & Palla 2005; Kalberla & Kerp 2009). H i self-absorption occurs when a cold H i cloud is located in front of a warmer H i emitting cloud. Self-absorption can occur within the same cloud but can also be induced by an emitting cloud in the far background that has the same velocity as the absorbing medium with respect to the local standard of rest, $v_{\rm LSR}$. Therefore, the clouds do not have to be spatially associated for HISA to be observable.
While absorption against strong continuum sources does yield a direct measurement of the optical depth, the discreteness of the sources only delivers an incomplete grid of optical depth measurements (e.g., Wang et al. 2020a). The interpolation of optical depths across an entire H i cloud is challenging. Therefore, the great advantage of HISA is that larger areas of cold atomic hydrogen can be mapped.
Large filamentary gas structures, also known as giant molecular filaments (GMFs), are suitable for studying the CNM on large scales. These objects are the largest coherent structures found in the Milky Way and are the subject of many studies probing the physical properties of the Galactic ISM (Jackson et al. 2010; Goodman et al. 2014; Ragan et al. 2014; Zucker et al. 2015, 2018; Abreu-Vicente et al. 2016). We study the hydrogen content by means of HISA and atomic and molecular line emission toward the giant molecular filament GMF20.0-17.9 (Ragan et al. 2014). We address the physical processes driving the kinematics of the CNM and the properties that lead to molecular cloud formation. GMF20.0-17.9 was already identified in part by Tackenberg et al. (2013). Furthermore, Zucker et al. (2015, 2018) define a subsection of this filament as a "bone" of the Scutum-Centaurus (SC) spiral arm. GMF20.0-17.9 is characterized by grouping several infrared dark clouds (IRDCs) into a single structure that is velocity-coherent as traced by $^{13}$CO emission. Figure 1 shows an overview of GMF20.0-17.9. Prominent IRDC features along the $^{13}$CO emission are visible in the Spitzer 8 µm image, in particular toward the western part of the filament. It furthermore shows features of stellar activity. GMF20.0-17.9 extends from $20.2^\circ$ to $17.6^\circ$ in Galactic longitude and from $+0.3^\circ$ to $-0.7^\circ$ in Galactic latitude. At the computed kinematic near distance of 3.3-3.7 kpc, this corresponds to a projected length of $\sim$170 pc. Ragan et al. (2014) associate the velocity range of $37-50$ km s$^{-1}$ with GMF20.0-17.9. The filament is near the midplane of the Galaxy, and the velocity of the lower longitude part at $\sim 18^\circ$ agrees fairly well with that of the near SC spiral arm (Vallée 2008; Reid et al. 2014, 2019). However, the sense of the velocity gradient of GMF20.0-17.9 as defined by Ragan et al. (2014) goes against the trend of the spiral arm structure. Zucker et al. (2015) argue that the bone at $19.2^\circ \gtrsim \ell \gtrsim 18.6^\circ$, $b \approx -0.1^\circ$ traces the spine of the SC spiral arm well. This discrepancy is attributed to the different methodology for defining filaments and can be brought into agreement if only the lower longitude section of GMF20.0-17.9 is considered. The ATLASGAL survey (Schuller et al. 2009) reveals several high-density clumps within GMF20.0-17.9, particularly in the western part of the filament. Zhang et al. (2019) identified young stellar object (YSO) populations within all currently known GMFs and derive a star formation rate (SFR) of $1.2 \times 10^{3}\,M_{\odot}\,\rm Myr^{-1}$ and a star formation efficiency (SFE) of 0.01 for GMF20.0-17.9, which is consistent with SFEs of nearby star-forming regions (see Zhang et al. 2019, and references therein).

H i 21 cm line and continuum
The following analysis employed the H i and 1.4 GHz continuum data from the THOR survey (The H i/OH Recombination line survey of the inner Milky Way; Beuther et al. 2016; Wang et al. 2020a). The H i and 1.4 GHz continuum data include observations from the Karl G. Jansky Very Large Array (VLA) in both C- and D-configuration as well as single-dish observations from the Green Bank Telescope (GBT) and Effelsberg, respectively, to recover missing flux on short uv spacings. Depending on the purpose of the analysis, different data products were utilized. For the analysis of H i emission and the subsequent identification of HISA features, the combined THOR H i data (VLA C+D + GBT) without continuum were used. The final data have been smoothed to an angular resolution of $\Delta\Theta = 40''$ for better brightness sensitivity, which is required especially for studying HISA. The rms noise in emission-free channels is $\sim$5 K. The spectral resolution is $\Delta v = 1.5$ km s$^{-1}$. The final THOR 1.4 GHz continuum emission data (VLA C+D + Effelsberg) have an angular resolution of $\Delta\Theta = 25''$.
Additionally, optical depths were derived from H i absorption against strong continuum sources. For that purpose, THOR-only data that comprise H i emission with continuum were used. THOR C-array-only data have a higher angular resolution, making them suitable for studying absorption against discrete continuum sources. Since these data consist of observations from the VLA in C-array configuration only, large-scale H i emission is effectively filtered out. The THOR-only data have an angular resolution of $\Delta\Theta \sim 20''$, depending slightly on Galactic longitude. For more details about the THOR data, we refer to the two data release papers by Beuther et al. (2016) and Wang et al. (2020a).
We used the Galactic Ring Survey $^{13}$CO(1-0) data (GRS; Jackson et al. 2006) to investigate the kinematic properties of the molecular gas and estimate the $^{13}$CO and H$_2$ column density. The GRS $^{13}$CO data have an angular and spectral resolution of $\Delta\Theta = 46''$ and $\Delta v = 0.21$ km s$^{-1}$, respectively.

H i self-absorption (HISA) extraction
In the following section, different methods to identify and extract HISA spectra from the H i emission are discussed. Several approaches have been tested as the accurate extraction of HISA spectra poses a challenging task.
The random motion of individual H i clouds, superposed on the Galactic rotation, contributes significantly to the broadening of the observed 21 cm emission and creates multiple emission peaks as seen in Fig. 2. We take local $^{13}$CO emission peaks as a reference point. We thus identify HISA by constraining these features kinematically. The $^{13}$CO emission peaks at different velocities are not associated with GMF20.0-17.9 as their velocities are attributed to neighboring spiral arm structures (e.g., Vallée 2008; Reid et al. 2014, 2019). For the analysis of the physical properties of HISA, we followed the derivation by Gibson et al. (2000) and Wang et al. (2020b). A comprehensive discussion of the radiative transfer of HISA clouds is given in Gibson et al. (2000), Kavars et al. (2003), and Li & Goldsmith (2003). Adopting the geometric model from Gibson et al. (2000), we identify four different cloud components when looking toward a HISA cloud, which we describe below.
According to this model (see Fig. 2 in Wang et al. 2020b), we observe emitting foreground and background clouds that have spin temperatures of $T_{\rm fg}$ and $T_{\rm bg}$, respectively. Between these clouds a cold absorbing HISA cloud can be located, with a spin temperature of $T_{\rm HISA}$. Diffuse continuum emission, $T_{\rm cont}$, is assumed to be in the background. Strong continuum point sources are neglected as they contaminate the absorption features that are caused by HISA.
By comparing an "on" spectrum, where a HISA cloud is located along the line of sight, with the "off" spectrum that we would observe in the absence of the HISA cloud, we can derive the optical depth of the HISA component (see e.g., Eq. (6) in Gibson et al. 2000), defined as
$$\tau_{\rm HISA} = -\ln\left(1 - \frac{T_{\rm on} - T_{\rm off}}{T_{\rm HISA} - p\,T_{\rm off} - T_{\rm cont}}\right), \tag{1}$$
with the dimensionless parameter $p \equiv T_{\rm bg}\,(1 - e^{-\tau_{\rm bg}})/T_{\rm off}$ describing the fraction of background emission in the optically thin limit (Feldt 1993). Assuming a HISA spin temperature $T_s$ ($= T_{\rm HISA}$), we can then calculate the H i column density of the cold H i gas using the general form (Wilson et al. 2013)
$$\frac{N_{\rm H}}{\rm cm^{-2}} = 1.8224 \times 10^{18}\,\frac{T_s}{\rm K} \int \tau(T_s, v)\,\frac{{\rm d}v}{\rm km\,s^{-1}}, \tag{2}$$
where $T_s$ is the spin temperature of atomic hydrogen and $\tau(T_s, v)$ describes the optical depth.
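As a minimal sketch of how the optical depth and column density of the HISA component could be evaluated per velocity channel, the following Python snippet assumes the Gibson et al. (2000) relation in the form written in the docstring and the standard 21 cm coefficient of $1.8224 \times 10^{18}\,\rm cm^{-2}\,K^{-1}\,(km\,s^{-1})^{-1}$. The function names, the default values ($T_{\rm HISA} = 40$ K, $p = 0.9$), and the three-channel demo spectrum are illustrative assumptions, not the pipeline used for the maps.

```python
import math

# Standard 21 cm coefficient: N_HI [cm^-2] = C_HI * T_s [K] * integral(tau dv [km/s])
C_HI = 1.8224e18


def tau_hisa(t_on, t_off, t_hisa=40.0, p=0.9, t_cont=0.0):
    """HISA optical depth in one velocity channel (all temperatures in K).

    Assumed form: tau = -ln(1 - (T_on - T_off) / (T_HISA - p*T_off - T_cont)).
    Returns None where the relation has no physical solution.
    """
    denom = t_hisa - p * t_off - t_cont
    if denom >= 0.0:
        return None  # background is not brighter than the absorbing gas
    ratio = (t_on - t_off) / denom
    if ratio < 0.0 or ratio >= 1.0:
        return None  # no absorption, or absorption deeper than the model allows
    return -math.log(1.0 - ratio)


def hisa_column_density(t_on, t_off, dv, t_hisa=40.0, p=0.9, t_cont=0.0):
    """Column density in cm^-2, summing tau over channels of width dv [km/s]."""
    taus = [tau_hisa(a, b, t_hisa, p, t_cont) for a, b in zip(t_on, t_off)]
    tau_sum = sum(t for t in taus if t is not None)
    return C_HI * t_hisa * tau_sum * dv


# Hypothetical three-channel absorption dip against a flat 100 K baseline
t_off_demo = [100.0, 100.0, 100.0]
t_on_demo = [90.0, 60.0, 90.0]
tau_peak = tau_hisa(60.0, 100.0)
n_demo = hisa_column_density(t_on_demo, t_off_demo, dv=1.5)
```

Channels for which the relation has no solution are skipped rather than clipped, which is one possible choice; a production analysis would need to decide how such channels enter the integral.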
The dashed red curve is a second-order polynomial fit ($T_{\rm off}$) to the absorption-free channels of the H i spectrum at 33.5-43.0 km s$^{-1}$ and 56.0-65.5 km s$^{-1}$ (see Sect. 2.2). We estimated the HISA spectrum by then subtracting the H i spectra from the fitted background emission. The GRS $^{13}$CO spectrum (Jackson et al. 2006) covering velocities from $-5$ km s$^{-1}$ to 135 km s$^{-1}$ is shown in blue and has been multiplied by a factor of ten for better visibility.
To reliably identify HISA features, it is crucial to know the emission in the absence of a HISA cloud. Many methods have been tested to estimate $T_{\rm off}$ in Eq. (1) (e.g., Gibson et al. 2000; Kavars et al. 2003; Li & Goldsmith 2003; Krčo et al. 2008; Wang et al. 2020b). Wang et al. (2020b) tested estimating the background spectrum $T_{\rm off}$ by measuring several off positions offset from apparent absorption features at slightly shifted lines of sight. Their spectra partly show large variations depending on the line of sight, so the assumption that the H i background emission stays spatially constant does not hold. Therefore, we refrain from selecting actual off positions to estimate $T_{\rm off}$. Instead, we estimate $T_{\rm off}$ for each line of sight by fitting the baselines of absorption features with polynomial functions. The fits reconstruct an off spectrum as if there were no absorption present. Several studies have successfully applied polynomial fitting procedures (Kavars et al. 2003; Li & Goldsmith 2003; Wang et al. 2020b).
We extensively tested various methods to find an independent and systematic fitting procedure. Fitting the baselines with first and second order polynomial functions yielded the most robust results as these functions are not sensitive to small-scale fluctuations along the spectral axis. We therefore rebinned the spectral axis of H i emission by a factor of two, which gave the best results for reconstructing $T_{\rm off}$, independent of the chosen velocities at which the baselines were fitted. Higher-order polynomial functions are prone to either over- or underestimating the background spectrum. As outlined below, we utilized a combination of first and second order polynomials in order to fit the baselines of HISA spectra. For the baseline fitting, we furthermore smoothed the H i emission maps spatially to an angular resolution of $\Delta\Theta = 80''$ to enhance the brightness sensitivity.
Irrespective of the actual presence of $^{13}$CO emission at individual positions, every pixel spectrum is searched for HISA and fitted at the velocities $33.5-43.0$ km s$^{-1}$ and $56.0-65.5$ km s$^{-1}$, omitting the velocity range associated with GMF20.0-17.9. In the first cycle of the fitting procedure, all spectra are fitted with second order polynomial functions ($f(x) = ax^2 + bx + c$). Spectra that are contaminated by continuum emission produce bad second order polynomial fits, with $a > 0$. For those spectra, we used first order fits instead. Figure 3 presents a comparison between first and second order polynomial fits toward a position that is contaminated by diffuse continuum emission, which can contribute to the broadening of the absorption profile. Figure 4 shows our baseline fitting procedure and extracted HISA spectra from example regions of GMF20.0-17.9. The example regions have been selected based on the visual inspection of the H i and $^{13}$CO emission maps. The spectra show the case of HISA with strong, weak, and no molecular counterparts as well as no HISA at all. The final HISA maps were inferred by subtracting the native THOR H i emission, with a spatial and spectral resolution of $40''$ and 1.5 km s$^{-1}$, respectively, from the fitted baselines. The rms noise of the extracted HISA spectra is $\sim$8 K and arises from the noise of the observations and the uncertainty of the fitting procedure. We discuss these uncertainties in Appendix A.
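The two-cycle fitting logic described above (second order first, first order where $a > 0$) can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the code used for the maps: it uses only the Python standard library with a small hand-written least-squares solver (in practice a routine such as `numpy.polyfit` would serve the same purpose), it omits the spectral rebinning and spatial smoothing, and the function names and the synthetic spectrum are hypothetical.

```python
import math

# Velocity windows (km/s) used for the baseline fit, as quoted in the text
WINDOWS = ((33.5, 43.0), (56.0, 65.5))


def polyfit(xs, ys, order):
    """Least-squares polynomial fit; returns coefficients, lowest power first."""
    n = order + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    for col in range(n):  # Gaussian elimination with partial pivoting
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    coef = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(a[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (b[i] - tail) / a[i][i]
    return coef


def fit_baseline(vel, tb, windows=WINDOWS):
    """Fit T_off(v) to the absorption-free windows; returns (t_off list, order)."""
    v0 = 0.5 * (windows[0][0] + windows[-1][1])  # centre velocities for stability
    pts = [(v - v0, t) for v, t in zip(vel, tb)
           if any(lo <= v <= hi for lo, hi in windows)]
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    coef, order = polyfit(xs, ys, 2), 2
    if coef[2] > 0.0:  # upward-curving parabola: fall back to a straight line
        coef, order = polyfit(xs, ys, 1), 1
    t_off = [sum(c * (v - v0) ** k for k, c in enumerate(coef)) for v in vel]
    return t_off, order


def extract_hisa(vel, tb):
    """HISA spectrum defined as fitted baseline minus observed emission."""
    t_off, order = fit_baseline(vel, tb)
    return [b - t for b, t in zip(t_off, tb)], order


# Synthetic demo: concave 100 K emission profile with a 40 K dip at 49.5 km/s
vel = [30.0 + 1.5 * i for i in range(28)]
tb = [100.0 - 0.05 * (v - 49.5) ** 2
      - 40.0 * math.exp(-0.5 * ((v - 49.5) / 1.5) ** 2) for v in vel]
hisa, order = extract_hisa(vel, tb)
# A convex profile (a > 0) should trigger the first-order fallback
_, order_convex = fit_baseline(vel, [50.0 + 0.05 * (v - 49.5) ** 2 for v in vel])
```

Centering the velocities before fitting does not change the sign of the quadratic coefficient, so the $a > 0$ criterion is unaffected by that choice.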
Using this approach, we are biased in our search for HISA since we utilize $^{13}$CO velocities to constrain the velocities of extracted HISA features. We lack a systematic approach to find HISA independent of molecular line emission. At the spectral resolution of 1.5 km s$^{-1}$, we are not able to detect narrow self-absorption features (HINSA; Li & Goldsmith 2003; Krčo et al. 2008) that can be identified through line profile characteristics, such as the line width and the second derivative of the absorption feature. H i self-absorption features with line widths $\geq 1$ km s$^{-1}$ are difficult to differentiate from emission troughs. The kinematic information of molecular line emission is therefore crucial in our analysis.

H i self-absorption
In order to compare the kinematics in a statistical sense, we regridded the HISA data to the same pixel scale as the $^{13}$CO GRS data. The properties and kinematics of the CNM were analyzed by fitting single Gaussian components to the HISA spectra. Due to the limited velocity resolution, it is not feasible to resolve multiple HISA components between 43 and 56 km s$^{-1}$. Fits that have a peak intensity of $> 25$ K ($\sim 3\sigma$) and a line width between 1.5 and 20 km s$^{-1}$ (FWHM) are considered good. The fitted peak values of the extracted HISA spectra are shown in Fig. 5. The derived HISA peaks have intensities between $\sim$30 K and $\sim$70 K. By comparing the inferred HISA features with the molecular gas emission, the filament can be separated into two subregions (see Fig. 5). The western part of the filament ($19.3^\circ \gtrsim \ell \gtrsim 17.9^\circ$) shows good spatial correlation between HISA and $^{13}$CO as the cold atomic gas is expected to be closely associated with its molecular counterpart. We assess the spatial correlation quantitatively in Sect. 4.4 to confirm this finding. However, the eastern part of the filament ($20.5^\circ \gtrsim \ell \gtrsim 19.5^\circ$) shows significant HISA that does not spatially overlap with the $^{13}$CO emission at velocities around $\sim$45 km s$^{-1}$. On the eastern side of the cloud, the CNM as traced by HISA appears to envelop the denser molecular filament. The extracted features indicate the presence of a cold H i cloud as the velocities generally agree with the molecular gas (Fig. 6). Furthermore, optical depth measurements against bright continuum sources reveal high optical depths in the same velocity regime (Sect. 3.3.3). This underlines the robustness of the extraction method. We examined the two subregions separately in the following analysis.

Kinematics
We smoothed the $^{13}$CO spectra to a spectral resolution of 1.5 km s$^{-1}$ and applied single-component Gaussian fitting to be consistent in our analysis. Emission features with a peak intensity of $> 1.25$ K ($\sim 5\sigma$) and a line width of 1.5 km s$^{-1}$ $<$ FWHM $<$ 20 km s$^{-1}$ are considered to be good fits. The peak velocity maps of HISA and $^{13}$CO are presented in Fig. 6. The peak velocities of HISA in the eastern part of the filament are $\sim 44-46$ km s$^{-1}$. The western part reveals slightly higher peak velocities from $\sim$45 to $\sim$49 km s$^{-1}$. The peak velocities of $^{13}$CO show a coherent distribution along the filament (Ragan et al. 2014).
Although there are slight systematic differences in peak velocity at some positions, the medians of the histograms of peak velocities reveal good agreement between H i and $^{13}$CO emission in both the eastern and western regions (Fig. 7). The similar velocities confirm that the extracted HISA structures are trustworthy, even though HISA and $^{13}$CO show a lower degree of line-of-sight correlation in the eastern part of the filament. The HISA structures in the northern part of the eastern region reveal large line widths of $\sim$8-10 km s$^{-1}$ (Fig. 8). The bulk of HISA south of the $^{13}$CO contours shows line widths of 3-6 km s$^{-1}$. Possible implications of this line width enhancement are discussed in Sect. 4.1.
The $^{13}$CO line widths are $\sim$2-3 km s$^{-1}$ in the western part and slightly higher in the eastern part (Fig. 9). Assuming a kinetic temperature, we can estimate the expected thermal line width. In local thermodynamic equilibrium (LTE), the thermal line width (FWHM) is given by $\Delta v_{\rm th} = \sqrt{8\ln 2\,k_B T_k/(\mu m_{\rm H})}$, where $k_B$, $T_k$, and $\mu$ are the Boltzmann constant, kinetic temperature, and the mean molecular weight of H i and the CO molecule in terms of the mass of a hydrogen atom $m_{\rm H}$, respectively. If different line broadening effects are uncorrelated, the total observed line width will be $\Delta v_{\rm obs} = \sqrt{\Delta v_{\rm th}^2 + \Delta v_{\rm nth}^2 + \Delta v_{\rm res}^2}$, where $\Delta v_{\rm nth}$ is the line width due to nonthermal effects and $\Delta v_{\rm res}$ is the line width introduced by our spectral resolution and is equal to 1.5 km s$^{-1}$.
The observed $^{13}$CO line widths, even at the lower end of the distribution at 2-3 km s$^{-1}$, cannot be explained by thermal line broadening. Effects such as turbulent motions are most likely the dominant driver for the broadening of the $^{13}$CO line. More than 70% of the observed HISA line widths are $\geq 3$ km s$^{-1}$.
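To illustrate why thermal broadening cannot account for the observed widths, the short sketch below evaluates the thermal FWHM and removes the thermal and resolution contributions in quadrature. The molecular weights ($\mu = 1$ for H i, $\mu = 29$ for $^{13}$CO) and the temperatures (40 K and 20 K, the values adopted for the Mach number estimates) are assumptions made here for the example, as are the function names.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant [J/K]
M_H = 1.6735575e-27   # mass of a hydrogen atom [kg]


def thermal_fwhm(t_kin, mu):
    """Thermal line width (FWHM) in km/s for kinetic temperature t_kin [K]."""
    return math.sqrt(8.0 * math.log(2.0) * K_B * t_kin / (mu * M_H)) / 1.0e3


def nonthermal_fwhm(dv_obs, dv_th, dv_res=1.5):
    """Nonthermal FWHM [km/s] after removing thermal and resolution broadening."""
    sq = dv_obs ** 2 - dv_th ** 2 - dv_res ** 2
    return math.sqrt(sq) if sq > 0.0 else 0.0


# Assumed tracer masses: mu = 1 for H i, mu = 29 for 13CO
dv_th_hi = thermal_fwhm(40.0, 1.0)
dv_th_co = thermal_fwhm(20.0, 29.0)
dv_nth_co = nonthermal_fwhm(3.0, dv_th_co)
```

With these assumptions the thermal width of $^{13}$CO is roughly 0.2 km s$^{-1}$, an order of magnitude below the observed 2-3 km s$^{-1}$, while that of H i at 40 K is about 1.4 km s$^{-1}$.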
We can investigate the three-dimensional Mach number of the filament by assuming isotropic turbulence, $\mathcal{M} = \sqrt{3}\,\sigma_{\rm nth}/c_s$, where $\sigma_{\rm nth}$ is the nonthermal one-dimensional velocity dispersion that is related to the nonthermal line width via $\Delta v_{\rm nth} = \sqrt{8\ln 2}\,\sigma_{\rm nth}$. The sound speed $c_s$ is estimated using a mean molecular weight $\mu = 2.34$ for the molecular gas and $\mu = 1.27$ for the cold H i phase (Allen 1973; Cox 2000). To calculate the thermal component of the velocity dispersion, we assumed the spin temperature of the cold atomic hydrogen to be close to the kinetic temperature and set $T_k = T_{\rm HISA} = 40$ K. As we find $^{13}$CO excitation temperatures as high as $\sim$25 K where the line is becoming optically thick (see Sect. 3.3.2), we assumed that the actual kinetic temperature of $^{13}$CO is close to the excitation temperature in those regions, meaning these regions are dense and in LTE. We therefore set a uniform kinetic temperature of $T_k = 20$ K for the Mach number estimates of $^{13}$CO. Figure 10 shows that the Mach number of the CNM traced by HISA peaks at $\sim$3, indicating that a significant fraction of the CNM has transonic and supersonic velocities. Furthermore, there is an indication of a shoulder at $\mathcal{M} \sim 6$. The Mach numbers of $^{13}$CO show a broad distribution and are dominated by supersonic motions. The distribution is slightly skewed toward higher Mach numbers as we observe multiple $^{13}$CO components between 43 and 56 km s$^{-1}$ in some regions. Consequently, fitting single Gaussian components results in overly broad line widths where we observe multiple velocity components. Hence, the nonthermal velocity dispersion and therefore the Mach number is systematically overestimated. If we utilize a single Gaussian component fit on the GRS spectra at the full spectral resolution of 0.21 km s$^{-1}$, the $^{13}$CO Mach number distribution does not significantly change.
Therefore, the spectral smoothing has a negligible effect on the derivation of the Mach numbers if we fit single components. To address the uncertainty of possible component blending, we assumed that the $^{13}$CO line width is on average systematically overestimated by 30% due to the single-component fitting. The Mach number distribution is then shifted to lower values with a more pronounced peak (see Fig. 10). Furthermore, if the filament lacks spatial isotropy, we could be overestimating the Mach number by a factor as high as $\sqrt{3}$, which would lead to a distribution with a median of $\mathcal{M} \sim 6$ (Fig. 10).
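The Mach number estimate can be sketched numerically as follows. The isothermal form of the sound speed, $c_s = \sqrt{k_B T_k/(\mu m_{\rm H})}$, is an assumption of this example, as are the function names, the input line widths, and the choice to apply the 30% correction directly to the nonthermal width.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant [J/K]
M_H = 1.6735575e-27   # mass of a hydrogen atom [kg]
FWHM_TO_SIGMA = 1.0 / math.sqrt(8.0 * math.log(2.0))


def sound_speed(t_kin, mu):
    """Isothermal sound speed in km/s (assumed form sqrt(k_B T / (mu m_H)))."""
    return math.sqrt(K_B * t_kin / (mu * M_H)) / 1.0e3


def mach_3d(dv_nth, t_kin, mu, overestimate=1.0):
    """3D Mach number from a nonthermal FWHM [km/s], assuming isotropic turbulence.

    `overestimate` divides the line width first, e.g. 1.3 for the 30% case.
    """
    sigma_nth = dv_nth / overestimate * FWHM_TO_SIGMA
    return math.sqrt(3.0) * sigma_nth / sound_speed(t_kin, mu)


# Hypothetical nonthermal line widths of 2.0 km/s (HISA) and 2.5 km/s (13CO)
mach_hisa = mach_3d(2.0, t_kin=40.0, mu=1.27)
mach_co = mach_3d(2.5, t_kin=20.0, mu=2.34)
mach_co_corrected = mach_3d(2.5, t_kin=20.0, mu=2.34, overestimate=1.3)
```

Because the Mach number scales linearly with the nonthermal width, a 30% overestimate of the width shifts the whole distribution down by the same factor of 1.3.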

CNM column density traced by HISA
We calculated the column density of the CNM traced by HISA following Eqs. (1) and (2). We therefore have to assume either an optical depth or a spin temperature. As we know that HISA traces the coldest component of atomic hydrogen, we assumed a constant spin temperature of $T_s = 40$ K for the whole cloud to calculate the column density. This is a typical spin temperature of cold self-absorbing H i clouds (e.g., Knapp 1974; Gibson et al. 2000; Heiles & Troland 2003). We emphasize that a constant spin temperature is an approximation that might not hold for every region of the cloud. However, the maximum spin temperature is constrained in Appendix A.2 and the actual temperature variation should be moderate. Different spin temperatures (if constant over the whole cloud) will not change the structure of the column density distribution in the cloud but only change the normalization factor. Furthermore, we have to assume the fraction of background emission parameterized by the factor $p$ (Eq. 1). Although we cannot measure this parameter directly, we can constrain $p$ by its effect on the spin temperature and the location of the cloud. Because of the cloud's location toward the inner Galactic plane ($\ell \sim 19^\circ$) and its distance of $\sim$3.5 kpc (Ragan et al. 2014), it is unlikely that most of the H i emission originates in the foreground. The fraction of background emission should therefore be at least $p \gtrsim 0.5$. Since self-absorption can also be induced by H i emission from the far side of the Galaxy due to the kinematic distance ambiguity, the background fraction $p$ should be systematically higher than the foreground emission fraction. Therefore, we assumed a background fraction of $p = 0.9$ and discuss its uncertainties in Appendix A.1. Wang et al. (2020b) [...] Furthermore, Gibson et al. (2000) argue that the HISA detection is biased toward higher $p$ values since a high background fraction is more efficient in producing prominent HISA features.

Molecular gas column density traced by 13 CO
In the optically thin limit, the $^{13}$CO column density is computed by (Wilson et al. 2013)
$$N(^{13}{\rm CO}) = 3.0 \times 10^{14}\,\frac{\int T_B(v)\,{\rm d}v}{1 - \exp(-5.3/T_{\rm ex})}, \tag{4}$$
where $N(^{13}{\rm CO})$ is the column density of $^{13}$CO molecules in cm$^{-2}$, ${\rm d}v$ is in units of km s$^{-1}$, and $T_B$ and $T_{\rm ex}$ are the brightness temperature and excitation temperature of the $^{13}$CO line in units of Kelvin, respectively. By assuming that the excitation temperatures of $^{12}$CO and $^{13}$CO are the same in LTE, we derived the excitation temperature from $^{12}$CO(1-0) emission data of the FOREST Unbiased Galactic plane Imaging survey with the Nobeyama 45m telescope (FUGIN; Umemoto et al. 2017), using (Wilson et al. 2013)
$$T_{\rm ex} = \frac{5.5}{\ln\left(1 + \dfrac{5.5}{T_B^{12} + 0.82}\right)}, \tag{5}$$
where $T_B^{12}$ is the brightness temperature of the $^{12}$CO line in units of Kelvin. The FUGIN $^{12}$CO data have an angular and spectral resolution of $\Delta\Theta = 20''$ and $\Delta v = 1.3$ km s$^{-1}$, respectively. To calculate the excitation temperature, we reprojected the data cube onto the same spatial and spectral grid as the GRS $^{13}$CO data of GMF20.0-17.9.
We find a lower limit to the excitation temperature of 5 K for regions where the $^{12}$CO brightness temperature reaches the $5\sigma$ level (2 K). We can then derive the optical depth of the $^{13}$CO line from the excitation and brightness temperature, using (see e.g., Wilson et al. 2013; Schneider et al. 2016)
$$\tau = -\ln\left[1 - \frac{T_B}{5.3}\left(\left[\exp\left(\frac{5.3}{T_{\rm ex}}\right) - 1\right]^{-1} - 0.16\right)^{-1}\right]. \tag{6}$$
We estimate a lower limit of the optical depth of $\tau \sim 0.06$ for $^{13}$CO brightness temperatures above 1.25 K ($\sim 5\sigma$) and the highest excitation temperatures we find ($\sim$25 K). Hence, we set the optical depth to $\tau = 0.06$ in regions where $\tau < 0.06$. Only a few positions show optical depths as high as $\tau \sim 2$. We employ a correction factor to compensate for high optical depth effects by replacing the integral in Eq. (4) with (Frerking et al. 1982; Goldsmith & Langer 1999)
$$\int T_B(v)\,{\rm d}v \;\rightarrow\; \frac{\tau}{1 - e^{-\tau}} \int T_B(v)\,{\rm d}v. \tag{7}$$
This correction factor is accurate to 15% for $\tau < 2$.
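The chain from brightness temperatures to an opacity-corrected $^{13}$CO column density can be sketched as below, assuming the standard LTE relations of Wilson et al. (2013) in the form written in the docstrings. A single excitation temperature and optical depth per line of sight is a simplification made for this example, and the function names and demo values are hypothetical.

```python
import math


def excitation_temperature(tb12):
    """T_ex [K] from the 12CO(1-0) brightness temperature [K].

    Assumed form: T_ex = 5.5 / ln(1 + 5.5 / (T_B12 + 0.82)).
    """
    return 5.5 / math.log(1.0 + 5.5 / (tb12 + 0.82))


def tau_13co(tb13, t_ex, tau_min=0.06):
    """13CO(1-0) optical depth; values below tau_min are set to tau_min.

    Returns None where the LTE relation has no solution for the given inputs.
    """
    j_term = 1.0 / (math.exp(5.3 / t_ex) - 1.0) - 0.16
    if j_term <= 0.0:
        return None
    ratio = tb13 / (5.3 * j_term)
    if ratio >= 1.0:
        return None
    return max(-math.log(1.0 - ratio), tau_min)


def n_13co(tb13, dv, t_ex, tau=None):
    """N(13CO) [cm^-2] from channel brightness temperatures [K], width dv [km/s].

    If tau is given, the integral is scaled by tau / (1 - exp(-tau)).
    """
    integral = sum(tb13) * dv
    if tau is not None and tau > 0.0:
        integral *= tau / (1.0 - math.exp(-tau))
    return 3.0e14 * integral / (1.0 - math.exp(-5.3 / t_ex))


# Limits quoted in the text: T_B12 = 2 K, and T_B13 = 1.25 K at T_ex = 25 K
t_ex_floor = excitation_temperature(2.0)
tau_floor = tau_13co(1.25, 25.0, tau_min=0.0)
# Hypothetical three-channel line at T_ex = 10 K, with and without correction
n_thin = n_13co([2.0, 3.0, 2.0], dv=1.5, t_ex=10.0)
n_corr = n_13co([2.0, 3.0, 2.0], dv=1.5, t_ex=10.0, tau=1.0)
```

Evaluating the two limiting cases reproduces the quoted lower limits of about 5 K for the excitation temperature and about 0.06 for the optical depth.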
To translate the $^{13}$CO column density into a column density of molecular hydrogen, we first estimated the relative abundance of $^{12}$CO with respect to $^{13}$CO. Milam et al. (2005) and Giannetti et al. (2014) derived relative abundance relations based on different CO isotopologs and metallicities. At the Galactocentric radius of $D_{\rm GC} = 5.0$ kpc (Ragan et al. 2014), these relations give [$^{12}$CO]/[$^{13}$CO] abundances between 40 and 56. Given the large uncertainty of these numbers, we chose a canonical conversion factor of 45. The relative abundance of the main isotopolog $^{12}$CO compared to molecular hydrogen is given in Fontani et al. (2012), who derive an H$_2$ abundance with respect to $^{12}$CO of $X^{-1}_{^{12}\rm CO} = 7500$. Therefore, we adopted a conversion factor of [H$_2$]/[$^{13}$CO] $= 3.4 \times 10^{5}$. The derived H$_2$ column densities have uncertainties of at least a factor of two due to the large uncertainties in these relations. Furthermore, CO might not always be a good tracer of H$_2$ as "CO-dark H$_2$" could account for a significant fraction of the total H$_2$ (Pineda et al. 2008; Goodman et al. 2009; Pineda et al. 2013; Smith et al. 2014).
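As a worked check, the adopted conversion factor is simply the product of the two abundance ratios quoted above; $N({\rm H_2})$ is used here only as shorthand for the H$_2$ column density:
$$\frac{[{\rm H_2}]}{[^{13}{\rm CO}]} = \frac{[{\rm H_2}]}{[^{12}{\rm CO}]} \times \frac{[^{12}{\rm CO}]}{[^{13}{\rm CO}]} = 7500 \times 45 = 3.375 \times 10^{5} \approx 3.4 \times 10^{5}, \qquad N({\rm H_2}) \approx 3.4 \times 10^{5}\,N(^{13}{\rm CO}).$$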
For the derivation of the column densities we integrated over the whole velocity range between 43 and 56 km s$^{-1}$ where we find HISA. The H i (CNM) and H$_2$ column densities derived from HISA and $^{13}$CO, respectively, are presented in the lower panels of Fig. 11. The column densities are partly correlated in the western part of the filament, but the strongest cold H i column density peaks in the eastern part ($\ell \sim 20^\circ$, $b \sim +0.2^\circ$) do not show an H$_2$ column density counterpart. The strongest H$_2$ column density peak in the western part ($\ell \sim 18.1^\circ$, $b \sim -0.3^\circ$) reveals little H i column density but coincides with continuum emission. Continuum emission contaminates self-absorption features and hence makes it difficult to measure HISA. Most locations that are associated with continuum emission do not show HISA counterparts. However, we can measure the optical depth toward strong continuum emission sources and thus constrain the spin temperature of the HISA cloud. This is addressed in the following subsection.

Atomic gas column density seen in H i emission
In addition to HISA, we investigated the properties of atomic hydrogen (WNM+CNM) by measuring the column density from H i emission and correcting for optical depth effects and diffuse continuum. We can utilize the measurement of the optical depth to constrain the spin temperature of the cold atomic hydrogen (see Appendix A.2). Further details of optical depth and column density corrections are given in Bihr et al. (2015) and Wang et al. (2020b).
As the optically thin assumption might not hold for some regions, we can utilize strong continuum emission sources to directly measure the optical depth. H i continuum absorption (HICA) is a classical method to derive the properties of the CNM (e.g., Strasser & Taylor 2004; Heiles & Troland 2003). Toward continuum sources that are brighter than the typical spin temperature of H i clouds ($T_s \sim 100$ K), we observe the H i cloud in absorption. The absorption feature is furthermore dominated by the CNM since the absorption is proportional to $T_s^{-1}$. By measuring on and off positions, we can directly compute the optical depth of H i. The optical depth is given by (see Bihr et al. 2015; Wang et al. 2020b)
$$\tau = -\ln\left(\frac{T_{\rm on} - T_{\rm off}}{T_{\rm cont}}\right), \tag{8}$$
where $T_{\rm on}$ and $T_{\rm off}$ are the H i brightness temperatures toward a strong continuum background source and offset from the source, respectively. The brightness temperature $T_{\rm cont}$ describes the continuum level of the background source that is not affected by H i absorption. The advantage of this method is the direct measurement of the optical depth. However, the HICA method requires strong continuum emission sources. As most strong continuum sources are discrete point sources, this method results in an incomplete census of optical depth measurements (see Wang et al. 2020a, for a compilation of all optical depth measurements in the THOR survey). Consequently, the intrinsic structure of individual H i clouds cannot be determined. Some continuum emission sources also show extended structures. Therefore, finding reliable off positions could pose difficulties. As we exploited THOR-only (VLA C-configuration) data for this measurement, we filter out most large-scale H i emission. The THOR-only data reveal H i emission of less than 30 K, often just within the noise. Therefore, we can neglect the emission of the H i cloud in Eq. (8) and set $T_{\rm off} = 0$. We can then calculate the optical depth without measuring an off position by
$$\tau_{\rm simplified} = -\ln\left(\frac{T_{\rm on}}{T_{\rm cont}}\right). \tag{9}$$
Depending on the brightness of the continuum source and the H i optical depth, the absorption spectrum can approach zero.
Due to the noise in the spectra, the spectra can exhibit brightness temperatures smaller than zero, which is not physically meaningful. We therefore report a lower limit for the optical depth where the absorption T on becomes smaller than 5σ. Besides strong continuum sources we observe weak continuum emission throughout the Galactic plane. This component has brightness temperatures between 10 and 50 K. For the derivation of the H i column densities we employed the combined THOR data as in the case of HISA. The continuum emission has been subtracted during data reduction as described in Sect. 2.1. However, even weak continuum emission can still influence the observed brightness temperature. If we neglect weak continuum emission, the measured H i emission will be underestimated as weak continuum emission can suppress a significant fraction of H i emission (e.g., Bihr et al. 2015). Consequently, the derived H i column densities will be underestimated. We took this effect into account when computing the H i column density (see Bihr et al. 2015, Eq. 9).
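The HICA measurement and its saturation limit can be sketched in a few lines. This is a minimal illustration, assuming the standard on/off relation τ = −ln[(T on − T off)/T cont] with T on not continuum-subtracted, and a lower limit quoted wherever the absorption drops below 5σ. The function name and the example temperatures are hypothetical and are not taken from the THOR pipeline.

```python
import math


def hica_optical_depth(t_on, t_cont, sigma, t_off=0.0, n_sigma=5.0):
    """Return (tau, is_lower_limit) for a single velocity channel.

    t_on, t_off, t_cont: brightness temperatures in K (assumed inputs).
    sigma: rms noise of the spectrum in K.
    With t_off = 0 this reduces to the simplified relation without an
    off position.
    """
    signal = t_on - t_off
    floor = n_sigma * sigma
    if signal < floor:
        # Saturated absorption: the logarithm is evaluated at the noise
        # floor, so the result is only a lower limit on tau.
        return -math.log(floor / t_cont), True
    return -math.log(signal / t_cont), False


# Hypothetical channel: 250 K continuum source, 2 K rms noise.
print(hica_optical_depth(100.0, 250.0, 2.0))  # measured optical depth
print(hica_optical_depth(5.0, 250.0, 2.0))    # saturated channel, lower limit
```

Returning the flag together with the value keeps saturated channels distinguishable when a mean optical depth is computed afterwards.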
In contrast to Wang et al. (2020a), where they used a 6σ threshold to select continuum sources, we measured the optical depth of atomic hydrogen toward the brightest continuum sources with brightness temperatures T cont > 200 K to not suffer from low saturation limits since we expect the optical depth to be high. Four sources have been identified with this threshold. The measured optical depths of these sources vary between 0.5 and 2.5 (lower limit). We selected the continuum source G19.075-0.287 (Wang et al. 2018) as a representative source for the optical depth as it is not in the 5σ saturation at most velocities between 43 and 56 km s −1 , which gives a mean optical depth of τ ∼ 0.9 (Fig. 12). This is a reasonable approximation as the optical depth map derived by Wang et al. (2020a) gives a mean optical depth of ∼1.0 when averaged over the whole filament.
However, this optical depth measurement is a lower limit as G19.075-0.287 is a Galactic H ii region located in the Galactic plane. As no strong extragalactic continuum sources are identified toward the position of GMF20.0-17.9, the optical depth measurement has drawbacks in the current investigation. For the emission data, we have to take into account an opacity contribution from the far side H i beyond the location of the H ii region. To first approximation, we assumed that the optical depth from the background is similar to that of the measured foreground. We therefore adopted 2 × τ(v LSR ) for the whole map and corrected the H i column density for the optical depth per velocity channel. Given the corrected mean optical depth of 2 × τ ∼ 1.8, the opacity correction factor τ/(1 − e −τ ) increases the mean column density by a factor of ∼2.
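The size of this correction is easy to check numerically. The short sketch below evaluates the factor τ/(1 − e^−τ) for the doubled mean optical depth quoted above; the function name is hypothetical, and the per-channel correction applied to the data is reduced here to a single mean value.

```python
import math


def opacity_correction(tau):
    """Factor tau / (1 - exp(-tau)) applied to optically thin column densities."""
    if tau == 0.0:
        return 1.0  # optically thin limit
    return tau / -math.expm1(-tau)


tau_foreground = 0.9                      # mean optical depth toward G19.075-0.287
factor = opacity_correction(2.0 * tau_foreground)
print(round(factor, 2))
```

For 2 × τ = 1.8 the factor is about 2.16, in line with the quoted increase by a factor of ∼2.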
The derived column densities are a result of the H i emission stemming from both the kinematic far (12.0 kpc) and near (3.5 kpc) side of the Milky Way. The kinematic distances have been obtained using the Kinematic Distance Utilities (Wenger et al. 2018) and employing the Galactic rotation model from Reid et al. (2019). Since the distribution of the atomic gas in the Galactic plane is approximately axisymmetric with respect to the Galactic center (Kalberla & Dedes 2008), we can assume that the atomic gas density distribution in the vertical direction is similar for a given Galactocentric radius. Using the average vertical density profile from Dickey & Lockman (1990), we can estimate the gas fraction at the kinematic near and far side for each line of sight. Since most H i emission is observed close to the Galactic midplane, the foreground gas fraction is ∼50%. Therefore, due to the kinematic distance ambiguity half of the H i emission is attributed to the background, which is not associated with GMF20.0-17.9. We derived the H i column density map shown in the top panel of Fig. 11 taking into account only the near side gas at 3.5 kpc.

Fig. 12. H i optical depth toward the continuum source G19.075-0.287 (Wang et al. 2018). The plot shows the optical depth as a function of LSR velocity and was computed using Eq. (9). For some channels, the absorption spectrum saturates and the measured optical depth is a lower limit of τ = 1.6, which is indicated by the horizontal dotted line. The gray shaded area indicates the velocity range between 43 and 56 km s −1 , where HISA features have been extracted.

Masses
As we have determined the column densities and know the distance of GMF20.0-17.9 (∼3.5 kpc, Ragan et al. 2014), we can directly estimate the atomic and molecular mass of each part of the filament (see Table 1). All masses were calculated from the column densities integrated over 43 -56 km s −1 .
The molecular hydrogen mass of the whole filament as marked by both the red polygons in Fig. 11 is 3.5 × 10 5 M⊙. Inside the polygon regions, the mass of the total atomic hydrogen, accounting for WNM and CNM measured from H i emission, corresponds to ∼75% of the H 2 mass (2.6 × 10 5 M⊙) for the whole filament after correcting for optical depth effects, weak continuum emission, and the kinematic distance ambiguity. However, if we take into account all diffuse H i emission beyond the polygon regions, arising from the region between 20.6° > ℓ > 17.6° and −1.25° < b < +0.5°, the mass of the total H i component rises by ∼75% to ∼4.6 × 10 5 M⊙. The molecular filament is therefore embedded in a large gas reservoir of atomic hydrogen.
The CNM mass traced by HISA corresponds to 1-5% of the molecular mass, depending on the region and assumed spin temperature. The uncertainty of the column density directly translates to an uncertainty of mass. If we assume a spin temperature of 20 K, instead of our canonical value of 40 K, the mass traced by HISA decreases by a factor of ∼3. Hence, the largest uncertainty arises from the assumption of a spin temperature. We are able to constrain an upper limit of the spin temperature for the column density derivation by assuming an optically thick (τ → ∞) cloud, as we show in Appendix A.2.
The atomic mass fraction generally increases toward the eastern part of the filament, agreeing with our findings in the column density distributions (see Sect. 4.2 for details). As discussed in Appendix A, we estimated a column density uncertainty of ∼40% to account for systematic differences and noise in our baseline extraction method. Depending on the background fraction p, the column density further varies by a factor of ∼2 between p = 0.9 − 0.7 (see Appendix A.1). It is difficult to exactly quantify the uncertainty of the cold H i column density and mass traced by HISA. Considering the estimated uncertainties due to the extraction method, background fraction, and spin temperature, the HISA-traced column density and mass have an uncertainty of a factor of 2 -4.
As we study the CNM through HISA, we might miss a significant fraction of it. Since we can only trace gas that is cold enough to be observed in absorption against a warmer background, we are limited in our HISA detection by the requirement of sufficient background emission. The CNM has temperatures ≤ 300 K (McKee & Ostriker 1977; Wolfire et al. 2003), so CNM gas that is not sufficiently colder than the background emission escapes detection. For future investigations, simulations could help to quantify the fraction of CNM that is invisible to our HISA method. The computed H 2 column density and mass have an uncertainty of at least a factor of two due to the uncertainties in the value for the CO-to-H 2 conversion (Milam et al. 2005; Fontani et al. 2012; Giannetti et al. 2014). Furthermore, we could miss a significant fraction of CO-dark H 2 column density (Pineda et al. 2008; Goodman et al. 2009; Pineda et al. 2013). Simulations suggest that the fraction of CO-dark H 2 could even be as high as ∼50% at conditions typical of the Milky Way disk (Smith et al. 2014; Duarte-Cabral & Dobbs 2016). However, this fraction should be moderate in low-temperature environments (Glover & Smith 2016).

Kinematics
The histograms of line-of-sight peak velocities derived from HISA and 13 CO generally agree for both parts of the filament. The results of the Gaussian fits to the spectra might not always reflect the actual kinematic structure as 13 CO emission exhibits multiple velocity components in some regions between 37-50 km s −1 . However, the 13 CO velocities generally agree with the HISA features in a statistical sense, even in the eastern part of the filament where we do not observe a spatial correlation along the line of sight. We take this as a confirmation that our extracted HISA features are in fact due to self-absorption.
The line widths of HISA show a broad distribution of 2 - 10 km s −1 . The eastern part reveals enhanced HISA line widths, which are 3 - 4 km s −1 higher toward the north of the molecular filament. Although speculative, this could be a signature of the compression of H i gas passing through the spiral arm potential and triggering H 2 formation (Bergin et al. 2004). As the gas is leaving the spiral arm structure, this could inject turbulence on the downstream side that enhances the line widths. Although observationally difficult to distinguish, simulations of the galactic dynamics of the ISM suggest that there are systematic differences in velocity dispersion between molecular clouds within the spiral arm potential and inter-arm clouds (Dobbs 2015; Duarte-Cabral & Dobbs 2016). The morphological and kinematic differences in each part of the filament could therefore be related to its position with respect to the spiral arm potential. However, in order to differentiate between different scenarios, we need to investigate synthetic H i observations, which is beyond the scope of our current analysis. We note that the broadened HISA lines toward some positions of the cloud might be subject to resolution effects and could be the superposition of multiple lines. Spectrum 5 in Fig. 4 clearly shows multiple 13 CO components where we detect an enhanced HISA line width. The lack of spatial correlation between HISA and 13 CO, particularly in the eastern region, makes it difficult to assess if multiple 13 CO components are preferentially associated with enhanced HISA line widths. Since the velocity dispersion is mostly due to turbulence in both tracers, we conclude that the agreement in velocities is robust.

Column density probability density functions (N-PDFs)
The column density maps shown in Fig. 11 can be further evaluated by determining their probability density function (PDF). Column or volume density PDFs are commonly used as a measure of the density structure and physical processes acting within the cloud (e.g., Kainulainen et al. 2014). A log-normal shape of the N-PDF is usually attributed to turbulent motions dominating the early diffuse phase of a cloud's evolution. Furthermore, the width of the log-normal distribution reflects the amplitude of turbulence and can be associated with the Mach number (e.g., Padoan et al. 1997; Passot & Vázquez-Semadeni 1998; Kritsuk et al. 2007; Federrath et al. 2008). In later evolutionary stages, molecular clouds can develop high-density regions due to the increasing effect of self gravity, producing a power-law tail in their N-PDF. Molecular cloud complexes that show star-forming activity favor this scenario as they reveal such power-law tails (Kainulainen et al. 2009; Schneider et al. 2013, 2016). The shape of the resulting N-PDF is also sensitive to the regions where column densities are taken into account, especially in the low column density regime (Lombardi et al. 2015), and it is sensitive to the treatment of zero spacing information in interferometric data (Ossenkopf-Okada et al. 2016).

Table 1. Masses (in M⊙) and mass fractions with respect to the H 2 mass.

| Region | M(H 2) | M(H i) | M(HISA), T s = 20 K | M(HISA), T s = 40 K | HISA (20 K) / H 2 | HISA (40 K) / H 2 | H i / H 2 |
|---|---|---|---|---|---|---|---|
| Whole filament | 3.5 × 10^5 | 2.6 × 10^5 (4.6 × 10^5) (a) | 4.6 × 10^3 | 1.3 × 10^4 | 1% | 4% | 75% (130%) (a) |
| East | 1.1 × 10^5 | 1.2 × 10^5 | 1.9 × 10^3 | 5.7 × 10^3 | 2% | 5% | 110% |
| West | 2.3 × 10^5 | 1.5 × 10^5 | 2.6 × 10^3 | 7.5 × 10^3 | 1% | 3% | 65% |

Notes. The masses were calculated for each part of the filament as well as the whole filament marked by the red polygons in Fig. 11. The second column gives the molecular hydrogen mass as traced by 13 CO emission. The third column shows the total atomic hydrogen mass inferred from the optical depth and continuum corrected H i emission. The fourth and fifth column present the mass of the cold atomic hydrogen traced by HISA with an assumed spin temperature of 20 and 40 K, respectively. The last three columns give the corresponding mass fractions with respect to the H 2 mass. (a) This mass was calculated using the corrected H i emission between 43 and 56 km s −1 , 20.6° > ℓ > 17.6° and −1.25° < b < +0.5°.

We derived each N-PDF from the regions marked by the red polygons in Fig. 11. However, low column densities extend beyond the regions enclosed by the polygons. If the N-PDFs were truncated at the polygon boundaries, we would miss a significant fraction of these low column densities and the shape of the N-PDF would not recover the structure at the lower end well. We therefore chose to derive the N-PDFs from column densities approximately within the last closed contours that are still within the selected polygon regions. In order to compare the H i column densities with those of molecular hydrogen, we converted N(H 2 ) to N(H) to construct the H 2 N-PDFs. For the N-PDFs, we chose closed contours of 1.9 × 10 20 and 3.2 × 10 21 cm −2 for HISA and H 2 , respectively. The selected closed contours that go beyond the polygon regions do not significantly change the shape of the N-PDFs. We set the column density threshold for H i emission to 2.7 × 10 21 cm −2 . It is difficult to define a last closed contour for H i emission. However, this boundary bias has a negligible impact on the shape of the H i N-PDF as we observe a small range of column densities due to the diffuse nature of H i emission. For turbulence-dominated gas, last closed contours are not essential to sample the N-PDF properly (Körtgen et al. 2019). We furthermore normalized each N-PDF by the mean column density.

Figure 13 presents the N-PDFs of H i emission, HISA, and H 2 column densities for each part of the filament (east/west) as well as the whole filament (east+west). As expected from the column density maps in Fig. 11, the N-PDF of the CNM as traced by HISA peaks at lower column densities than molecular hydrogen. The HISA N-PDFs are well represented by a log-normal function. The results of log-normal fits are shown in Table 2.
The log-normal shape implies that turbulent motions might be dominant and gravitational collapse leading to high column density peaks is not visible in HISA within the whole filament. There is no significant difference between the subregions defined in the eastern and western part of the filament. The widths of the HISA N-PDFs are the same for both regions. The mean column density derived from HISA is ∼3×10 20 cm −2 .
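The normalization by the mean column density and the log-normal description can be sketched as follows. This is a minimal illustration that assumes the common convention η = ln(N/⟨N⟩) and estimates the log-normal parameters from the moments of η rather than from a fit to the binned N-PDF, so it is not the fitting procedure behind Table 2. The function name and the synthetic input are hypothetical.

```python
import math
import random
import statistics


def lognormal_npdf_parameters(column_densities):
    """Return (mean N, mu, sigma) of eta = ln(N / <N>).

    Assumption: the moments of eta stand in for a fit to the binned N-PDF.
    """
    mean_n = statistics.fmean(column_densities)
    eta = [math.log(n / mean_n) for n in column_densities]
    mu = statistics.fmean(eta)
    sigma = statistics.pstdev(eta, mu)
    return mean_n, mu, sigma


# Synthetic column densities in cm^-2, purely illustrative.
rng = random.Random(1)
sample = [3e20 * rng.lognormvariate(0.0, 0.3) for _ in range(20000)]
mean_n, mu, sigma = lognormal_npdf_parameters(sample)
print(f"<N> = {mean_n:.2e} cm^-2, mu = {mu:.3f}, sigma = {sigma:.3f}")
```

For a log-normal distribution normalized by its mean, the peak position is expected near μ = −σ²/2, which offers a quick consistency check on any fitted width.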
The mean column densities and widths of the N-PDFs agree well with those found by Wang et al. (2020b) for GMF38.1-32.4a. To investigate how observational uncertainties affect the width of the N-PDF, Wang et al. (2020b) created model images of H i and continuum emission with similar properties and noise as the real THOR data. They introduced artificial H i absorption features from known column density distributions and added them to the model data. They extracted the HISA features using a similar method and derived column density distributions showing that the widths of the N-PDFs they find do not significantly increase due to observational uncertainties or the HISA extraction method. They conclude that the widths of the derived N-PDFs are robust and not subject to broadening introduced by observational noise and the fitting approach.

Notes to Table 2. The second column shows the mean column density of each component designated in the first column. The third column presents the widths of the log-normal function fitted to the N-PDFs. The last column shows the index of the power-law (PL) function fitted to the tail of the H 2 and All gas N-PDFs.

The mean column densities of molecular hydrogen are about an order of magnitude higher than the column densities of HISA. In contrast to the HISA N-PDFs, the N-PDFs of molecular hydrogen are poorly represented by a log-normal function. Even though the eastern region does not show similarly high column density peaks as the western region, a power-law behavior is evident in both column density distributions. Therefore, power-law functions (p(x) ∝ x −α ) were additionally fitted to the high column density tail of the H 2 N-PDFs. The optimal lower column density bound for the power-law fit is the one that minimizes the Kolmogorov-Smirnov distance between the fit and the N-PDF. The fits were performed using the Python package powerlaw (Alstott et al. 2014). The fitted parameters of the power-law functions are also listed in Table 2.
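The selection of the lower bound by the Kolmogorov-Smirnov distance can be illustrated with a standard-library sketch. It assumes a continuous power law p(x) ∝ x^−α above x_min with the maximum-likelihood estimate α = 1 + n / Σ ln(x/x_min), and it scans candidate lower bounds for the smallest KS distance. This is a stand-in for the idea only: it is not the code of the powerlaw package, and all function names and the synthetic data are hypothetical.

```python
import math
import random


def mle_alpha(tail, xmin):
    """Maximum-likelihood index of a continuous power law above xmin."""
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)


def ks_distance(sorted_tail, xmin, alpha):
    """Largest gap between the empirical CDF and the power-law CDF."""
    n = len(sorted_tail)
    ks = 0.0
    for k, x in enumerate(sorted_tail):
        model = 1.0 - (x / xmin) ** (1.0 - alpha)
        ks = max(ks, abs(model - k / n), abs(model - (k + 1) / n))
    return ks


def fit_power_law_tail(values, min_tail=50):
    """Return (alpha, xmin, ks) for the xmin with the smallest KS distance."""
    data = sorted(v for v in values if v > 0.0)
    best = None
    for i in range(len(data) - min_tail + 1):
        xmin, tail = data[i], data[i:]
        if sum(math.log(x / xmin) for x in tail) <= 0.0:
            continue  # degenerate tail, all values equal to xmin
        alpha = mle_alpha(tail, xmin)
        ks = ks_distance(tail, xmin, alpha)
        if best is None or ks < best[2]:
            best = (alpha, xmin, ks)
    if best is None:
        raise ValueError("not enough data points for a tail fit")
    return best


# Synthetic, normalized column densities with a pure power-law tail.
rng = random.Random(2)
synthetic = [rng.paretovariate(1.5) for _ in range(400)]  # p(x) ~ x^-2.5
print(fit_power_law_tail(synthetic))
```

The scan is quadratic in the number of data points, which is acceptable for a sketch but slow for full maps. Whether the quoted index refers to linear or logarithmic binning of the N-PDF is not specified here, so the exponent convention in this sketch is an assumption.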
Power-law tails can be a sign of gravitational collapse, which creates high column density peaks (Klessen 2000;Federrath et al. 2008;Kainulainen et al. 2009;Schneider et al. 2016). In agreement with observations, simulations of self-gravitating, turbulent molecular clouds show that star-forming activity reveals strong deviations from the log-normal shape in the form of power-law tails toward high column densities (Kritsuk et al. 2011). The slope of the power-law tails can then be associated with the evolutionary stage of the cloud, with shallower slopes indicative of an increasing degree of star formation efficiency (Federrath & Klessen 2013). In general, theoretical studies and simulations of molecular clouds can reproduce N-PDFs of different forms, depending on the degree of turbulence, star-forming activity, and magnetic field support (Vazquez-Semadeni 1994;Federrath et al. 2010;Burkhart et al. 2015a). Both subregions miss a small fraction of low column densities above the closed contour threshold. However, the shape of the H 2 N-PDFs does not change significantly if we take into account all closed contours beyond the polygon regions.
The N-PDFs derived from the H i emission that traces a combination of CNM and WNM show a narrow log-normal shape with widths of σ = 0.09−0.12. Observations toward well-known molecular cloud complexes also show N-PDFs of H i emission with narrow log-normal shapes (Burkhart et al. 2015b; Imara & Burkhart 2016; Rebolledo et al. 2017). We might overestimate the column densities because the optical depth derived from absorption (see Sect. 3.3.3) is mostly due to cold atomic gas acting as the absorbing medium, whereas we applied this optical depth correction to all H i emission, part of which might arise from warm and optically thin gas. Bihr et al. (2015) assess this effect and investigate the overestimation by comparing the corrected total H i column densities with actual column densities of known spin temperatures and optical depths. They show that this overestimate is < 10% for measured CNM optical depths τ < 1.5. This effect is therefore negligible compared to the uncertainty of the optical depth measurement itself. Furthermore, this systematic effect does not significantly affect the shape of the N-PDF.
The mean column densities inferred from H i emission are N HI ∼ 3 × 10 21 cm −2 . The H i column densities show a narrow log-normal distribution driven by turbulent motions whereas the N-PDFs of molecular hydrogen show a broad distribution with a power-law behavior toward high column densities that might be subject to gravitational collapse. The column densities traced by H i emission are an order of magnitude higher than those traced by self-absorption. The narrow width of the N-PDF represents the diffuse nature of H i emission while the broader column density distribution traced by HISA indicates a clumpier structure of the CNM.
We examined the column density distribution of the entire hydrogen content of the filament, that is, both the atomic and molecular phase of GMF20.0-17.9. We derived an "All gas" N-PDF in Fig. 14 by adding together the column densities of all three tracers. We fitted the high column density tail of the distribution with a power-law function. The N-PDF can be very well described by a single power-law function.

Fig. 14. All gas N-PDF of GMF20.0-17.9. The PDF is derived by adding the column densities of H i, HISA, and H 2 . The plot shows the derived N-PDF of the whole filament marked by both the red polygons in Fig. 11. The red vertical dashed and solid lines mark the column density threshold (last closed contour) at 4.5 × 10 21 cm −2 and the mean column density, respectively. The red solid line indicates the power-law fit to the high column density tail of the distribution.

The western part shows higher H 2 column density peaks and a shallower power-law tail in the N-PDF. The H 2 column densities are generally lower in the eastern part of the filament. The ATLASGAL survey (Schuller et al. 2009) reveals several high-density clumps in the western part of the filament and few clumps in the eastern part within the 13 CO velocity range. The N-PDFs traced by HISA do not show significant differences between each part of the filament. However, the mass of HISA relative to its molecular counterpart increases toward the eastern part of the filament. The maximum spin temperature of our extracted HISA features is also lower in the eastern subregion (Fig. A.3). This might be an indication that the eastern subfilament is a young, cold H i cloud while the western region exhibits a more evolved structure and star-forming activity. To further test the validity of this hypothesis, we would need to extend our analysis to a larger sample of GMFs to deduce statistical evidence. This will be addressed in a future analysis. Simulations of cloud formation could also give constraints on signatures of kinematics and column densities in atomic and molecular line tracers. However, this is beyond the scope of this current investigation.

Signatures of phase transition
The conversion of atomic to molecular gas (H i-to-H 2 ) is fundamental for molecular cloud formation processes. Theoretical models predict for a single H i-to-H 2 transition a mass surface density threshold of Σ HI ∼ 5 - 10 M⊙ pc −2 for solar metallicity (Krumholz et al. 2008, 2009; Sternberg et al. 2014). In such models, the H i-to-H 2 transitions are computed assuming a balance between far-UV photodissociation and molecular formation, and accounting for the rapid attenuation of the radiation field due to H 2 self-shielding and dust absorption (see also Klessen & Glover 2016). Figure 15 presents the atomic hydrogen mass surface density as a function of the total hydrogen mass surface density. We take into account all H i column densities traced by the corrected H i emission between 20.6° > ℓ > 17.6° and −1.25° < b < +0.5°. The figure reveals a saturation of atomic hydrogen at a mass surface density of ∼20 - 30 M⊙ pc −2 . A least squares fit to the mean of the distribution yields a mass surface density threshold of ∼25 M⊙ pc −2 (= 3 × 10 21 cm −2 ). When examined individually, both the eastern and western subregion show the same H i saturation level within the uncertainties (24 and 26 M⊙ pc −2 , respectively). This transition exceeds the column density threshold predicted by theoretical models (Krumholz et al. 2008, 2009; Sternberg et al. 2014). Bihr et al. (2015) report a column density threshold of 50 - 80 M⊙ pc −2 toward the star-forming region W43, which is significantly higher than predicted transitions at ∼5 - 10 M⊙ pc −2 . Bialy et al. (2017) argue that such high mass surface density thresholds cannot be explained by typical physical properties of the CNM as it would require an unrealistically high UV radiation field or low dust-to-gas ratio. As the clumpiness of a molecular cloud might regulate how far UV radiation penetrates the medium (e.g., Stutzki et al. 1988; Shibai et al. 1991), Bialy et al. (2017) suggest that the high thresholds can naturally be explained by a superposition of multiple transition layers observed along the line of sight. These authors predict a mass surface density threshold of ∼13 M⊙ pc −2 for the more active star-forming region W43. Wang et al. (2020b) find similar values of 14 - 23 M⊙ pc −2 toward GMF38.1-32.4a, where the atomic gas surface density saturates to an almost flat distribution.
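The equivalence between mass surface density and column density quoted above can be reproduced with a short conversion. The sketch below assumes hydrogen only (no helium contribution), which is the assumption under which 3 × 10^21 cm^−2 comes out close to the quoted ∼25 M⊙ pc^−2. The function names and the `helium_factor` parameter are hypothetical.

```python
M_H_G = 1.6735575e-24    # mass of a hydrogen atom in g
M_SUN_G = 1.98847e33     # solar mass in g
PC_CM = 3.0856776e18     # one parsec in cm


def column_to_surface_density(n_h_cm2, helium_factor=1.0):
    """Convert a hydrogen column density (cm^-2) to M_sun pc^-2.

    helium_factor = 1.0 counts hydrogen only (assumption of this sketch).
    """
    return n_h_cm2 * M_H_G * helium_factor * PC_CM ** 2 / M_SUN_G


def surface_to_column_density(sigma_msun_pc2, helium_factor=1.0):
    """Inverse conversion, from M_sun pc^-2 to cm^-2."""
    return sigma_msun_pc2 * M_SUN_G / (M_H_G * helium_factor * PC_CM ** 2)


print(round(column_to_surface_density(3e21), 1))   # close to the quoted ~25
print(f"{surface_to_column_density(25.0):.2e}")    # close to 3e21 cm^-2
```

With a helium correction (for example a factor of about 1.36) the same column density would correspond to a noticeably larger surface density, so the choice should be stated when comparing with model thresholds.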
While the derived atomic mass surface densities are a result of the combined column densities of WNM and CNM, the shielding from dissociating Lyman-Werner (LW) photons provided by a transition layer between atomic and molecular gas should be dominated by the CNM (Krumholz et al. 2009). The H 2 formation rate per atom scales as the number density n, so the CNM, due to its higher density, is far more effective at shielding than the WNM. The observed transition should therefore be an upper limit and the actual critical surface density is ≤ 25 M⊙ pc −2 , depending, to first approximation, on the ratio Σ CNM /Σ WNM . Taking these considerations into account, we conclude that we observe at most 3 - 5 transition layers of atomic to molecular gas between 43 and 56 km s −1 .
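One way to read the quoted number of layers, assuming that each transition layer contributes the single-transition threshold predicted by the models and that the observed saturation level is an upper limit, is the simple ratio

$$N_{\rm layers} \lesssim \frac{\Sigma_{\rm HI}^{\rm obs}}{\Sigma_{\rm HI}^{\rm model}} \approx \frac{25\,M_\odot\,{\rm pc^{-2}}}{(5\text{-}10)\,M_\odot\,{\rm pc^{-2}}} \approx 2.5\text{-}5 .$$

This is an illustrative estimate under the stated assumption, not a derivation taken from the analysis itself.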

Spatial correlation between atomic and molecular gas
The Histogram of Oriented Gradients (HOG) is a method based on machine vision to study the spatial correlation in the emission by two or more spectral line tracers across velocity channels in an unbiased and systematic way. In Appendix B we briefly outline the basic principles involved in this method. A comprehensive description is given by Soler et al. (2019). We applied the HOG on each part of the filament to investigate the spatial correlation between H i and 13 CO. The output of the HOG analysis is a matrix where the rows and columns correspond to the different velocity channels in each tracer, as shown in Fig. 16. The number in each matrix position corresponds to the projected Rayleigh statistic (V), which is an optimal estimator of the morphological correlation between the velocity channel maps as evaluated by the orientation of its intensity gradients. High values of V correspond to high spatial correlation and values of V ≈ 0 correspond to very low spatial correlation. The intensity gradients are calculated using a Gaussian derivative kernel whose width determines the spatial scales under consideration. To exploit the available spatial resolution, we selected a derivative kernel size that matches the synthesized beam size of the GRS 13 CO data, that is, 46″. The projected Rayleigh statistic is a measure of the significance of the spatial correlation: V ≈ √2 is roughly the equivalent of a 1σ deviation from complete lack of correlation, which corresponds to a flat distribution in the angles between the intensity gradients. However, the significance of the result also has to be evaluated with respect to the chance correlation that may be present between the velocity channel maps. We use the standard deviation of V in the velocity range between 10 and 90 km s −1 as an estimate of the amplitude of the chance correlation against which we can evaluate the significance of the V values.
This assumes that there are enough independent velocity-channel maps in the selected velocity range. Figure 16 presents the correlation distribution between H i and 13 CO for all parts of the filament as a function of velocity. We observe a significant spatial correlation in the velocity channels around v HI ≈ v 13CO ∼ 43 km s −1 and ∼ 47 km s −1 toward the west. However, the eastern part of the filament shows no significant correlation between H i and 13 CO emission at the velocities of GMF20.0-17.9. The observed correlation within the whole filament is therefore dominated by the western region. Having computed the spatial correlation between H i and 13 CO emission, we tested the validity of the correlation by also applying the HOG analysis to the inferred HISA and 13 CO emission maps. The HOG yields similar findings for HISA and 13 CO. As the absence of spatial correlation within the eastern part of the filament is reproduced in both analyses, we are confident that we do not observe any significant spatial correlation between H i and 13 CO within the eastern part of the filament. Small kernel sizes close or equal to the angular resolution of the telescope make features produced by noise and nonideal telescope beams more evident. Since spatial correlation is expected across multiple scales (Green 1993; Lazarian & Pogosyan 2000; Lazarian et al. 2001), we also examine the correlation in each analysis by setting the kernel size to 90″, which is approximately twice the angular resolution of the THOR and GRS data (40″ and 46″, respectively). The differences we find between the eastern and western region with our HISA method are reproduced by the HOG analysis, irrespective of the spatial scale we use. Thus, we consider these findings robust and not an artifact of our HISA extraction method.
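The behavior of the projected Rayleigh statistic can be illustrated with a small sketch. It assumes unit statistical weights, in which case V = Σ cos(2φ) / √(n/2) for n relative angles φ between the intensity gradients of two channel maps. The weighting, the gradient computation with Gaussian derivative kernels, and the function names are simplifications or hypothetical choices, and this is not the implementation described by Soler et al. (2019).

```python
import math


def relative_angle(grad_a, grad_b):
    """Angle between two 2D gradient vectors, from atan2(cross, dot)."""
    ax, ay = grad_a
    bx, by = grad_b
    return math.atan2(ax * by - ay * bx, ax * bx + ay * by)


def projected_rayleigh_statistic(angles):
    """V for unit weights: large and positive for (anti)parallel gradients."""
    n = len(angles)
    return sum(math.cos(2.0 * phi) for phi in angles) / math.sqrt(n / 2.0)


# Eight pixel pairs with parallel gradients, then with perpendicular ones.
parallel = [relative_angle((1.0, 0.0), (2.0, 0.0))] * 8
perpendicular = [relative_angle((1.0, 0.0), (0.0, 3.0))] * 8
print(projected_rayleigh_statistic(parallel))
print(projected_rayleigh_statistic(perpendicular))
```

Because cos(2φ) is unchanged under φ → φ + π, parallel and antiparallel gradients contribute equally, so V measures alignment of structures rather than the sign of the intensity change.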
We conclude that the CNM appears to be associated with molecular gas in the western part whereas the molecular gas seems to be decoupled from its atomic counterpart in the more diffuse cloud envelope toward the east. The systematic differences in spatial correlation between east and west can be interpreted as an indication of different evolutionary stages.

Conclusions
We have studied the atomic and molecular gas within the giant molecular filament GMF20.0-17.9. The molecular component is traced by GRS 13 CO observations whereas the atomic gas is observed via H i emission from the THOR survey. We isolated HISA features to disentangle the CNM from the atomic gas traced by H i emission. We aimed to study the properties of the CNM as traced by HISA and compare our findings with the molecular counterpart. The results are summarized in the following:
1. We extracted HISA features by estimating the H i emission spectrum in the absence of HISA. We employed a combination of first and second order polynomial functions to fit the baselines of HISA spectra at velocities adjacent to HISA features. This method gave the most reliable and robust results among the procedures we tested.
2. The extracted HISA features reveal a spatial correlation with 13 CO emission toward the western region of the filament while the eastern part shows no evidence that HISA traces the molecular gas. This finding is supported by the HOG analysis reporting significant spatial correlation toward the western part and no correlation toward the eastern part of the filament. However, the peak velocities of HISA and 13 CO are in good agreement in both parts of the filament. The observed line widths of 13 CO and HISA suggest that nonthermal effects like turbulent motions are the dominant driver for most regions within the filament.
3. We derived H 2 column densities from 13 CO emission and compared the molecular column density distribution with its atomic counterpart. The HISA column densities show a more diffuse structure compared to those of molecular hydrogen. The H 2 column densities reveal high-density peaks, particularly in the western part of the filament. The mass ratio of H i (traced by HISA) and H 2 is 0.01 − 0.05, depending on the assumed spin temperature and region. This mass ratio increases toward the eastern part of the filament. The total H i mass traced by H i emission is similar to the molecular mass within the defined regions. The mass surface density threshold from the total H i to H 2 is observed to be ∼25 M⊙ pc −2 , in excess of predictions by theoretical models. However, this result can naturally be explained by a superposition of multiple transition layers or an additional WNM fraction that is less effective at shielding.
4. The HISA N-PDFs can be well described by log-normal functions in both parts of the filament, indicative of turbulent motions as the main driver for these structures. While the magnitude of the column densities is dependent on the assumed parameters of spin temperature and background fraction, the shape and width of the N-PDFs are robust. The N-PDFs of H i emission tracing both the WNM and CNM of the atomic gas represent the diffuse structure and show a narrow log-normal shape. The H 2 column densities show a broad log-normal distribution with an indication of a power-law tail, more pronounced in the western part of the filament.
5. We speculate that the two parts of the filament reflect different evolutionary stages. Interestingly, the derived HISA features in the eastern part of the cloud show lower maximum spin temperatures. This favors the scenario of a younger, less evolved cloud that is forming molecular gas out of the atomic gas reservoir. The western region harbors signs of active star formation and shows more pronounced column density peaks of H 2 . Moreover, the mass fraction of H 2 compared to cold atomic hydrogen traced by HISA is larger toward the western part of the filament. While the HISA features correlate well with the molecular gas in the western part of the filament, they lack spatial correlation with the molecular component in the eastern region. Furthermore, we speculate that signatures of spiral arm interaction with atomic gas are visible toward the eastern part of the filament, due to an enhancement of line widths. The spatial structure and kinematics provide useful observables for theoretical models and simulations.
A statistical treatment of the HISA properties in the Galactic plane is still missing. However, this case study toward a known large-scale filament, which is complementary to the analysis by Wang et al. (2020b), serves as a good laboratory to study the properties of the CNM.

Fig. 11. The blue, red, and green distributions correspond to background fractions of 0.9, 0.8, and 0.7, respectively. The vertical lines show the corresponding mean column densities.
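The log-normal description of the N-PDFs referred to in the summary can be illustrated with a short sketch. It assumes the common convention of working with the normalized logarithmic column density $\eta = \ln(N/\langle N\rangle)$ and estimates the mean and width of the distribution from its moments rather than from a fit to a histogram. The function name and the synthetic column densities are hypothetical and are not taken from the analysis in this paper.

```python
import numpy as np

def lognormal_npdf_parameters(column_density):
    """Return mean and width of eta = ln(N / <N>) for a column density map.

    Assumption: the N-PDF is characterized by the first two moments of eta;
    this is an illustration, not the fitting procedure used in the paper.
    """
    n = np.asarray(column_density, dtype=float)
    # Keep only finite, positive column densities (the logarithm is undefined otherwise).
    n = n[np.isfinite(n) & (n > 0)]
    eta = np.log(n / n.mean())
    return eta.mean(), eta.std()

# Synthetic log-normal column densities (hypothetical values in cm^-2).
rng = np.random.default_rng(1)
sigma_true = 0.3
sample = 1e20 * rng.lognormal(mean=0.0, sigma=sigma_true, size=50_000)

mu, sigma = lognormal_npdf_parameters(sample)
print(f"mean of eta = {mu:.3f}, width of eta = {sigma:.3f}")
```

Rescaling all column densities by a constant factor leaves $\eta$ unchanged, so the width recovered in this way does not depend on the overall normalization of the map.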
of ∼2 from p = 0.9 to 0.7. We can furthermore estimate the amount of background emission from the radial H i volume density distribution in the Galactic plane. For Galactocentric radii $7 \lesssim R \lesssim 35$ kpc, Kalberla & Dedes (2008) report an average mid-plane volume density distribution of $n(R) \sim n_0\,e^{-(R-R_{\odot})/R_n}$ with $n_0 = 0.9\rm\,cm^{-3}$, $R_{\odot} = 8.5$ kpc, and $R_n = 3.15$ kpc. Assuming a constant volume density of $n(R < 7\,{\rm kpc}) = n(R = 7\,{\rm kpc})$, we can integrate the densities along the line of sight and estimate the amount of gas up to the distance of the filament (foreground) and beyond (background). We thus obtained a background fraction of ∼0.92 by integrating up to a Galactocentric distance of 35 kpc. This relation gives the averaged distribution of the northern and southern Galactic plane and could hold systematic differences in

[Figure: map with a Galactic Longitude axis running from 18.00° to 20.50°]

distance from the origin after taking unit steps in the direction determined by each angle $\Phi_{ij,lm}$. The expectation value of a random walk in two dimensions is zero, so any significant deviation from the origin would indicate preferential angles. We are particularly interested in whether intensity gradients tend to be preferentially parallel ($\Phi_{ij,lm} = 0$) and how strong that preference is. The sum over a preferred orientation angle of $\Phi_{ij,lm} = 0$ will result in high $V$ values, just as a preferred orientation angle of $\Phi_{ij,lm} = 0$ in a "random" walk would result in a large distance from the starting point. Spatially uncorrelated data would reveal $V$ values of approximately zero. Associated physical structures probed by spectral line emission are not independent across velocities and inherently show a correlation as well. To assess the statistical significance of each velocity-channel map, we used the mean value of $V$ over the velocity range that we analyzed and estimated the standard deviation from the population variance of the distribution.
We therefore assumed that over a broad velocity range the velocity channels are independent. Another uncertainty arises from observational noise in the data. The HOG addresses this by using Monte Carlo sampling to propagate the uncertainties in the observations. For each velocity-channel map, $n$ different realizations are generated within the observational uncertainty and with the same mean intensity. Using this sampling, the uncertainty of the correlation can be determined by estimating the variance of the correlation across the Monte Carlo realizations. Since we expect a contribution from nonuniform noise introduced by the observation, we report only ≥ 5σ confidence levels.
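The random-walk picture and the Monte Carlo noise propagation described above can be sketched in a few lines. This is a minimal illustration and not the HOG implementation itself. It assumes the commonly used unweighted form of the projected Rayleigh statistic, $V = \sum \cos(2\Phi) / \sqrt{N/2}$, plain finite-difference gradients from `numpy.gradient` instead of Gaussian derivative kernels, and Gaussian noise with a single standard deviation per map. All function names and the synthetic maps are hypothetical.

```python
import numpy as np

def relative_gradient_angles(map_a, map_b):
    """Relative angles between the intensity gradients of two maps."""
    # np.gradient returns the derivative along axis 0 (y) first, then axis 1 (x).
    gy_a, gx_a = np.gradient(map_a)
    gy_b, gx_b = np.gradient(map_b)
    cross = gx_a * gy_b - gy_a * gx_b
    dot = gx_a * gx_b + gy_a * gy_b
    # The angle is undefined where either gradient vanishes.
    valid = (np.hypot(gx_a, gy_a) > 0) & (np.hypot(gx_b, gy_b) > 0)
    return np.arctan2(cross[valid], dot[valid])

def projected_rayleigh_statistic(phi):
    """Unweighted V: net displacement along the 'parallel' axis of a walk
    with unit steps in the direction 2*phi, normalized so that random
    angles give values of order unity."""
    return np.sum(np.cos(2.0 * phi)) / np.sqrt(phi.size / 2.0)

def monte_carlo_v(map_a, map_b, sigma_a, sigma_b, n_real=100, seed=0):
    """Mean and standard deviation of V over noisy realizations of both maps.

    Assumption: zero-mean Gaussian noise, which keeps the mean intensity
    of each realization the same on average.
    """
    rng = np.random.default_rng(seed)
    values = []
    for _ in range(n_real):
        noisy_a = map_a + rng.normal(0.0, sigma_a, map_a.shape)
        noisy_b = map_b + rng.normal(0.0, sigma_b, map_b.shape)
        phi = relative_gradient_angles(noisy_a, noisy_b)
        values.append(projected_rayleigh_statistic(phi))
    return np.mean(values), np.std(values)

# Synthetic test maps: one map compared with itself and with an unrelated map.
rng = np.random.default_rng(42)
structure = rng.normal(size=(64, 64))
unrelated = rng.normal(size=(64, 64))

v_same = projected_rayleigh_statistic(relative_gradient_angles(structure, structure))
v_random = projected_rayleigh_statistic(relative_gradient_angles(structure, unrelated))
print(f"V (identical maps) = {v_same:.1f}, V (unrelated maps) = {v_random:.1f}")
```

For identical maps every relative angle is zero, so $V$ reaches its maximum of $\sqrt{2N}$ for $N$ pixels, while unrelated maps scatter around zero. The spread returned by `monte_carlo_v` plays the role of the uncertainty against which a confidence threshold would be applied.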