Cloud formation in the atomic and molecular phase: H I self absorption (HISA) towards a giant molecular filament

Molecular clouds form from the atomic phase of the interstellar medium. However, characterizing the transition between the atomic and the molecular interstellar medium (ISM) is a complex observational task. Here we address cloud formation processes by combining H I self absorption (HISA) with molecular line data. Column density probability density functions (N-PDFs) are a common tool for examining molecular clouds. One scenario proposed by numerical simulations is that the N-PDF evolves from a log-normal shape at early times to a power-law-like shape at later times. To date, investigations of N-PDFs have been mostly limited to the molecular component of the cloud. In this paper, we study the cold atomic component of the giant molecular filament GMF38.1-32.4a (GMF38a, distance = 3.4 kpc, length ∼ 230 pc), calculate its N-PDFs, and study its kinematics. We identify an extended HISA feature, which is partly correlated with the ¹³CO emission. The peak velocities of the HISA and ¹³CO observations agree well on the eastern side of the filament, whereas a velocity offset of approximately 4 km s⁻¹ is found on the western side. The sonic Mach number we derive from the linewidth measurements shows that a large fraction of the HISA, which is ascribed to the cold neutral medium (CNM), is at subsonic and transonic velocities. The column density of the CNM part is on the order of 10²⁰ to 10²¹ cm⁻². The column density of molecular hydrogen, traced by ¹³CO, is an order of magnitude higher. The N-PDFs from HISA (CNM), H I emission (the warm and cold neutral medium), and ¹³CO (molecular component) are well described by log-normal functions, which is in agreement with turbulent motions being the main driver of cloud dynamics. The N-PDF of the molecular component also shows a power law in the high column-density region, indicating self-gravity. We suggest that we are witnessing two different evolutionary stages within the filament. The eastern subregion seems to be forming a molecular cloud out of the atomic gas, whereas the western subregion already shows high column density peaks, active star formation, and evidence of related feedback processes.


Introduction
Stars, one of the key components of our Universe, form in molecular clouds, which are composed mainly of molecular hydrogen (e.g., Larson 2003; Stahler & Palla 2005; McKee & Ostriker 2007; Dobbs et al. 2014; Tan et al. 2014), yet the formation process of molecular clouds is still under debate. Various studies show that molecular clouds form out of relatively diffuse atomic hydrogen gas (e.g., Larson 1981; Blitz et al. 2007; Clark et al. 2012; Hennebelle & Falgarone 2012; Dobbs et al. 2014; Sternberg et al. 2014). Different processes have been proposed (e.g., Hennebelle & Falgarone 2012; Dobbs et al. 2014). Basically, the atomic gas contracts, and the increased column density shields the cloud from the interstellar UV radiation, so that the gas cools down and forms molecular gas. The cold neutral medium (CNM), with a typical temperature in the range of 40−100 K and a volume density >10−100 cm⁻³ (McKee & Ostriker 1977; Wolfire et al. 1995; Wilson et al. 2010), is the key component connecting the diffuse atomic gas with the molecular gas. Thus, observational constraints on the physical properties of the CNM, such as density distribution and kinematics, are crucial to understanding the formation of molecular clouds.
Although 21 cm H I line emission offers a straightforward tool to study atomic hydrogen, it is difficult to determine the properties of the gas from which it arises. The main challenge is the coexistence of the warm neutral medium (WNM) and the CNM, which are assumed to be in pressure equilibrium (McKee & Ostriker 1977; Wolfire et al. 1995, 2003). Studies of H I self absorption (HISA) overcome this problem, as they only trace the CNM. HISA was first detected by Heeschen (1954, 1955) towards the Galactic center. HISA features occur if cold, dense atomic hydrogen is in front of a warmer emission background (e.g., Knapp 1974). Since then, a number of observations have been carried out with single dish telescopes and interferometers, and HISA features were found to be widespread in the Milky Way (e.g., Riegel & Crutcher 1972; Knapp 1974; Heiles & Gordon 1975; McCutcheon et al. 1978; Levinson & Brown 1980; Minn 1981; Shuter et al. 1987; van der Werf et al. 1988; van der Werf & Goss 1989; Montgomery et al. 1995; Gibson et al. 2000, 2005a; Kavars et al. 2003, 2005; Dénes et al. 2018). The spin temperature of the cold H I responsible for HISA ranges from ∼10 to 60 K (e.g., Gibson et al. 2000; Kavars et al. 2005; McClure-Griffiths et al. 2006). A special case of HISA, the so-called H I narrow self absorption (HINSA) features, has been studied towards nearby molecular clouds, revealing small linewidths on the order of 1 km s⁻¹ (Li & Goldsmith 2003; Goldsmith & Li 2005; Krčo et al. 2008; Krčo & Goldsmith 2010; Zuo et al. 2018). However, studies characterizing the column density and the kinematic distribution of the CNM in large maps are still rare.
Recent observations have revealed a group of large (≥100 pc) and massive (≥10⁵ M⊙) filaments, known as giant molecular filaments (GMFs), which may be linked to Galactic dynamics and trace the gravitational mid-plane in the Milky Way (MW; Jackson et al. 2010; Goodman et al. 2014; Wang et al. 2015, 2016; Zucker et al. 2015, 2018; Abreu-Vicente et al. 2016; Li et al. 2016; Zhang et al. 2019). These observations show that GMFs are the largest coherent gas structures in our Milky Way, and that a single filament often contains star-forming regions at different evolutionary stages (Goodman et al. 2014; Zucker et al. 2015), which makes them ideal targets for studying the CNM properties in different environments that lead to molecular cloud formation.
A common tool to study molecular clouds is the probability density function of the column density (N-PDF; see e.g., Ostriker et al. 2001; Lombardi et al. 2008; Kainulainen et al. 2009; Alves de Oliveira et al. 2014; Sadavoy et al. 2014; Abreu-Vicente et al. 2015; Schneider et al. 2015; Lin et al. 2017; Chen et al. 2018). The shape of the N-PDF is predicted to depend on the physical processes acting within the cloud. In the early evolution of a molecular cloud, turbulent motions within the cloud dominate and the N-PDF has a log-normal shape. The width of the log-normal N-PDF is also determined by the turbulent motions (see e.g., Federrath et al. 2010; Ballesteros-Paredes et al. 2011; Kritsuk et al. 2011; Federrath & Klessen 2013; Burkhart et al. 2015a; Bialy et al. 2017a). In this scenario, more evolved clouds develop a high-density power-law tail, indicating that the cloud structure has evolved and gravity dominates. Observations indicate that star-forming clouds show such tails, lending support to this scenario (e.g., Kainulainen et al. 2009; Schneider et al. 2013). The slope of the power-law N-PDF can be related to the evolutionary stage of the cloud, with steeper slopes possibly indicating earlier evolutionary stages (e.g., Kritsuk et al. 2011; Federrath & Klessen 2013; Ward et al. 2014).
High-mass star-forming regions reveal multiple power laws, with a shallower slope for the highest density regions. This indicates a slower collapse for such regions (Schneider et al. 2015). Lombardi et al. (2015) and Alves et al. (2017) present a contrasting argument, reporting that all N-PDFs have a power-law shape and that the log-normal shape could be an observational bias, a viewpoint that has triggered considerable controversy (Ossenkopf-Okada et al. 2016; Chen et al. 2018; Körtgen et al. 2019). Theoretical work and simulations of molecular clouds also reproduce N-PDFs in different forms (e.g., Vazquez-Semadeni 1994; Federrath et al. 2010; Federrath & Klessen 2012; Burkhart et al. 2015a). Burkhart et al. (2015b) and Imara & Burkhart (2016) studied nearby molecular clouds and report H I N-PDFs with a log-normal shape, without any power-law tail. Rebolledo et al. (2017) studied the Carina and Gum 31 molecular complex, where the H I N-PDF also shows a log-normal shape.
To investigate the transition from atomic to molecular hydrogen in more detail, we examine the hydrogen content of GMF38.1-32.4a (GMF38a, Ragan et al. 2014) with HISA measurements. With a velocity range between 50 and 60 km s⁻¹ (Ragan et al. 2014), GMF38a is at a median distance of 3.4 kpc from the Sun (Galactocentric distance ∼5.9 kpc), as estimated with the Bayesian Distance Estimator tool (Reid et al. 2016). The top panel of Fig. 1 shows the integrated ¹³CO emission from the GRS survey (Jackson et al. 2006). Our goal is to study the kinematics of this GMF in the molecular and atomic hydrogen traced by ¹³CO emission and HISA, respectively. Furthermore, we analyze N-PDFs for the atomic and molecular hydrogen, and compare their properties.
Fig. 1. Top panel: ¹³CO (Jackson et al. 2006) integrated intensity contours in the range v_LSR = 50−60 km s⁻¹ overlaid on integrated H I emission in the same velocity range. The black ellipses show the "off-positions" whose spectra are shown in Fig. 5 and the black "×" signs mark positions whose spectra are shown in Fig. 6. Bottom panel: same ¹³CO integrated intensity contours overlaid on 1.4 GHz continuum emission from the THOR survey (Wang et al. 2018). The contours in the top panel indicate integrated ¹³CO emission levels of 5, 10, 20, and 30 K km s⁻¹. The contours in the bottom panel indicate an integrated ¹³CO emission level of 5 K km s⁻¹ for reference. The dashed box in the top panel outlines the region that is discussed in the following sections and shown in Figs. 7, 9, 11, 14, and 19.

Observations and methods
… (Beuther et al. 2016). Each pointing was observed for 4 × 2 min to ensure a uniform uv-coverage. The spectral window for the H I 21 cm line was set to have a bandwidth of 2 MHz (∼400 km s⁻¹) and a spectral resolution of 3.91 kHz (∼0.82 km s⁻¹). The data calibration was done with CASA¹ (McMullin et al. 2007). The flux and bandpass were calibrated with the quasar 3C 286. J1822-0938 was used for the phase and gain calibration (see also Beuther et al. 2016).
To recover the large-scale structure, we combined the C-configuration data with the H I Very Large Array Galactic Plane Survey (VGPS, Stil et al. 2006), which consists of VLA D-configuration data combined with single-dish observations from the Green Bank Telescope (GBT). We subtracted the continuum in the visibility datasets (with the CASA command uvcontsub), and used the multiscale CLEAN in CASA to image the three adjacent tiles of continuum-subtracted C-configuration data together with D-configuration data. A pixel size of 4″, a spectral resolution of 1.5 km s⁻¹, and a robust weighting value of 0.45 were used. The resulting images, which have a synthesized beam between 20″ and 40″ over the entire coverage of the THOR survey, were all smoothed to a common resolution of 40″. The images were further combined with the VGPS images (D+GBT) using the task "feather" in CASA. We compared the flux of the combined H I data with the single dish GBT data from VGPS; the two agree to within 5.7%. Considering that the typical absolute flux calibration uncertainty for the VLA at 1.4 GHz is ∼5% (Beuther et al. 2016), it is reasonable to conclude that our combined H I data fully recover the extended emission. The noise level in the line-free channels is about 4 K per 1.5 km s⁻¹.
¹ http://casa.nrao.edu; version 4.1.0.
Fig. 2. The cold absorbing cloud (HISA) with temperature T_HISA is surrounded by emitting clouds with temperatures T_fg and T_bg. Behind the H I clouds, several continuum sources can be situated, either diffuse or discrete (marked with a star).
Additionally, the THOR C-configuration-only H I line and continuum data (Beuther et al. 2016) are used to measure the H I optical depth towards bright continuum sources in the background. The THOR+VGPS 1.4 GHz continuum data (VLA C+D+GBT, Wang et al. 2018) are employed to estimate the diffuse continuum emission in the background. By comparing the flux densities of known supernova remnants (SNRs; Green 2014), Anderson et al. (2017) showed that the flux retrieved from the combined continuum data is consistent with the literature. Thus, the continuum data also recover the extended emission.

H I self absorption
The integrated H I emission over the velocity range 50−60 km s⁻¹, shown in the top panel of Fig. 1, reveals diffuse emission covering a larger area than the ¹³CO emission. The strongest H I emission does not coincide with the ¹³CO emission; instead, an anti-correlation between the H I and ¹³CO emission is suggested. Our analysis in the following section shows that this anti-correlation is due to HISA: the cold atomic hydrogen absorbs the emission from an emitting atomic hydrogen cloud in the background, that is, H I self absorption. The terminology "H I self absorption" can be misleading. The emission and absorption processes can occur in the same cloud, but it is also possible that the H I emission originates from a distant background cloud that covers a similar or larger range of LSR velocities than the absorbing foreground cloud, as illustrated in Fig. 2. A comprehensive discussion of the radiative transfer of HISA features can be found in Gibson et al. (2000), Kavars et al. (2003), Li & Goldsmith (2003), and Goldsmith & Li (2005). In general, we observe an emitting foreground and an emitting background H I cloud, which have spin temperatures T_fg and T_bg, respectively. The cold, absorbing H I cloud is located between these two emitting clouds and has the spin temperature T_HISA. Furthermore, we observe 1.4 GHz continuum emission, which can be a diffuse Galactic component or arise from discrete strong sources. For simplicity, we assume that the continuum emission is situated in the background. In this section, we exclude the possibility of strong, discrete continuum sources and consider only the weak diffuse continuum background when estimating the HISA properties. In Sect. 4.3, we will use strong continuum sources to determine the optical depth of the atomic hydrogen, which can help us to constrain the spin temperature of the HISA. Following the equation of radiative transfer in Rybicki & Lightman (1979), the measured on and off position brightness temperatures of the line above the continuum at a certain velocity are:
$$
\begin{aligned}
T_{\rm on} &= T_{\rm fg}\left(1-e^{-\tau_{\rm fg}}\right) + T_{\rm HISA}\left(1-e^{-\tau_{\rm HISA}}\right)e^{-\tau_{\rm fg}} + T_{\rm bg}\left(1-e^{-\tau_{\rm bg}}\right)e^{-(\tau_{\rm fg}+\tau_{\rm HISA})} + T_{\rm cont}\,e^{-(\tau_{\rm fg}+\tau_{\rm HISA}+\tau_{\rm bg})} - T_{\rm cont}, \\
T_{\rm off} &= T_{\rm fg}\left(1-e^{-\tau_{\rm fg}}\right) + T_{\rm bg}\left(1-e^{-\tau_{\rm bg}}\right)e^{-\tau_{\rm fg}} + T_{\rm cont}\,e^{-(\tau_{\rm fg}+\tau_{\rm bg})} - T_{\rm cont},
\end{aligned}
\tag{1}
$$
where τ_fg, τ_bg, and τ_HISA are the optical depths of the components shown in Fig. 2 and T_cont is the continuum brightness temperature. During the data reduction, we subtract the continuum emission from the H I visibility data (see Sect. 2), which is indicated by the last term (−T_cont). An example spectrum illustrating T_on and the fitted T_off is shown in Fig. 3. Assuming that the on and off spectra share the same T_cont and calculating the difference, we get:
$$
T_{\rm on-off} = \left[T_{\rm HISA} - T_{\rm bg}\left(1-e^{-\tau_{\rm bg}}\right) - T_{\rm cont}\,e^{-\tau_{\rm bg}}\right] e^{-\tau_{\rm fg}} \left(1-e^{-\tau_{\rm HISA}}\right). \tag{2}
$$
This equation can be further simplified by introducing the dimensionless parameter p (e.g., Feldt 1993; Gibson et al. 2000):
$$
p \equiv \frac{T_{\rm bg}\left(1-e^{-\tau_{\rm bg}}\right)}{T_{\rm off}}. \tag{3}
$$
That means that for p = 1 there is no foreground emission, and for p = 0.5 the foreground and background emission are equal. Measuring p is difficult and it usually has to be assumed. As a last simplification, we assume that the foreground and background clouds are optically thin and therefore that τ_fg and τ_bg are small (Gibson et al. 2000). This results in:
$$
T_{\rm on-off} = \left(T_{\rm HISA} - p\,T_{\rm off} - T_{\rm cont}\right)\left(1-e^{-\tau_{\rm HISA}}\right), \tag{4}
$$
where T_on, T_off, and T_cont are measured from the data (see Sect. 2). With these observable quantities we can estimate the properties of the HISA using Eq. (4). Specifically, we derive the cloud spin temperature T_HISA and the optical depth τ_HISA. From this single relation alone, we cannot disentangle the spin temperature and the optical depth. Figure 4 shows an example of the relation between τ_HISA and T_HISA for the region close to the strong continuum source G34.256+0.146, with T_off = 103 K, T_on = 50 K, and T_cont = 17 K. The different colors represent different values of p from 0.4 to 1. The black vertical line indicates the temperature of the cosmic microwave background radiation (Fixsen et al. 1996; Fixsen 2009; Planck Collaboration XIII 2016), T = 2.7 K. Since the L band continuum background emission in the Galactic plane is larger than 0 (bottom panel of Fig. 1), the spin temperature must be larger than 2.7 K. The general interpretation of the curves is that a higher optical depth is necessary to produce the assumed absorption feature for higher spin temperatures. This dependency becomes very steep at a certain point, depending on p. In Sect. 4 we discuss the relations among τ, spin temperature, and p in detail.
Fig. 3. Sample spectrum showing a prominent HISA feature around v_LSR ∼ 55 km s⁻¹. The actual H I spectrum is shown in black (T_on) and the estimated background emission using a second order polynomial fit (see Sect. 2.3) is shown in blue (T_off).
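The degeneracy between τ_HISA and T_HISA can be illustrated numerically. The following is a minimal sketch, assuming the simplified optically thin relation of Gibson et al. (2000), T_on − T_off = (T_HISA − p T_off − T_cont)(1 − e^(−τ_HISA)), and using the example values quoted above (T_off = 103 K, T_on = 50 K, T_cont = 17 K). The function names are hypothetical and are not taken from the paper's analysis code.

```python
import math


def hisa_optical_depth(t_on, t_off, t_cont, t_hisa, p):
    """Solve T_on - T_off = (T_HISA - p*T_off - T_cont) * (1 - exp(-tau)) for tau.

    Returns None when no physical (finite, non-negative) solution exists.
    All temperatures are brightness/spin temperatures in K.
    """
    depth = t_on - t_off                    # negative for an absorption feature
    contrast = t_hisa - p * t_off - t_cont  # must be negative to absorb
    if contrast >= 0.0:
        return None
    ratio = depth / contrast                # equals 1 - exp(-tau)
    if not 0.0 <= ratio < 1.0:
        return None
    return -math.log(1.0 - ratio)


def max_spin_temperature(t_on, t_off, t_cont, p):
    """Upper limit on T_HISA, reached in the limit tau -> infinity."""
    return t_on - t_off + p * t_off + t_cont


# Example values from the text; p is an assumed parameter.
for p in (0.4, 0.7, 1.0):
    limit = max_spin_temperature(50.0, 103.0, 17.0, p)
    tau = hisa_optical_depth(50.0, 103.0, 17.0, t_hisa=4.0, p=p)
    print(f"p = {p}: T_HISA < {limit:.1f} K, tau(T_HISA = 4 K) = {tau:.2f}")
```

For a fixed absorption depth, the required optical depth grows with the assumed spin temperature and diverges as T_HISA approaches the upper limit, which is the steep rise described in the text.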

Background estimate to measure T off
To extract a reliable HISA feature, it is crucial to know the background H I emission. Different methods can be found in the literature to perform this task. The first is to use absorption-free H I emission spectra located close to the absorption feature (e.g., Gibson et al. 2000), referred to as "off-positions". This method assumes that the H I background emission stays spatially constant over the absorption feature, which might be true for spatially small HISA features. We tested this method by extracting H I spectra from five different regions, which are labeled "Off 1" to "Off 5" in Fig. 1. These off-positions were chosen to be regions without significant 1.4 GHz continuum emission and without ¹³CO emission. Furthermore, these regions do not show significant self absorption features in the velocity range v_LSR = 50−60 km s⁻¹. The corresponding spectra are presented in Fig. 5. These spectra reveal large variations, which makes it difficult to use them as a common off-position. The second method uses a fit to the absorption-free channels of the H I spectra to obtain T_off. This method is applied frequently, using different functions to fit the H I emission, for example, linear fits (e.g., Minn 1981; Montgomery et al. 1995; McClure-Griffiths et al. 2006) or polynomials of different orders (e.g., Myers et al. 1978; Bowers et al. 1980; Shuter et al. 1987; Kavars et al. 2003; Li & Goldsmith 2003). Figure 6 presents five spectra from the positions indicated in Fig. 1. We used second and fourth order polynomials to fit the spectra in the velocity ranges on either side of the HISA (v_LSR = 40−50 and 60−70 km s⁻¹). A third order polynomial gave results very similar to those of the second order polynomial; thus, for clarity, we do not show it here.
Fig. 5. Selected H I emission spectra around GMF38a, which can be used as "off-positions". The regions we used for the extraction are shown in Fig. 1. The black line shows the mean spectrum of all five off-spectra and the gray shaded area indicates the velocity range of the HISA feature (v_LSR = 50−60 km s⁻¹).
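The polynomial background estimate can be sketched as follows. This is a minimal pure-Python illustration, not the authors' pipeline: it fits a polynomial to the channels in the two baseline windows (40−50 and 60−70 km s⁻¹) and evaluates it across the whole spectrum to obtain T_off. The helper names, the velocity scaling (`v_ref`, `v_scale`), and the synthetic test spectrum are assumptions made for the example.

```python
def solve_normal_equations(x, y, degree):
    """Least-squares polynomial coefficients c[0] + c[1]*x + ... (pure Python)."""
    n = degree + 1
    a = [[sum(xi ** (i + j) for xi in x) for j in range(n)] for i in range(n)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def estimate_t_off(velocity, t_on, degree=2,
                   windows=((40.0, 50.0), (60.0, 70.0)),
                   v_ref=55.0, v_scale=15.0):
    """Fit the baseline windows only and evaluate the fit at every channel."""
    def scaled(v):
        # Rescale velocities to roughly [-1, 1] to keep the fit well conditioned.
        return (v - v_ref) / v_scale

    def in_window(v):
        return any(lo <= v <= hi for lo, hi in windows)

    x = [scaled(v) for v in velocity if in_window(v)]
    y = [t for v, t in zip(velocity, t_on) if in_window(v)]
    coeffs = solve_normal_equations(x, y, degree)
    return [sum(c * scaled(v) ** k for k, c in enumerate(coeffs)) for v in velocity]


# Synthetic spectrum: smooth background with a 40 K deep absorption dip.
velocity = [40.0 + 1.5 * i for i in range(21)]  # 1.5 km/s channels
background = [100.0 - 0.05 * (v - 55.0) ** 2 for v in velocity]
t_on = [b - 40.0 if 52.0 <= v <= 58.0 else b for v, b in zip(velocity, background)]
t_off = estimate_t_off(velocity, t_on, degree=2)
hisa_depth = [off - on for off, on in zip(t_off, t_on)]  # T_off - T_on per channel
```

Calling `estimate_t_off` with `degree=4` gives the fourth order variant, so the two background estimates can be compared channel by channel.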
It is difficult to judge which function is more suitable to fit the H I spectra. For regions without absorption, we expect the fitted spectrum to represent the actual spectrum. Spectrum 3 in Fig. 6 shows such a region, and both functions represent the H I spectrum well. Spectra 2 and 4 in Fig. 6 show H I absorption features, and the difference between the second and fourth order polynomials is small. In contrast to this, Spectra 1 and 5 in Fig. 6 reveal a large difference between the fit functions. The fourth order polynomial fit is much higher (∼50 K) than the second order polynomial for Spectrum 1, but much lower than the second order polynomial for Spectrum 5. It is not obvious which function describes the H I spectra more accurately. However, the fourth order polynomial might overestimate the actual spectra, as steep slopes within the fitted velocity range would result in high values for the fitted spectra. In contrast to this, the second order polynomial might underestimate the H I emission for these spectra. Hence, the fourth order polynomial might be an upper limit and the second order polynomial might be a lower limit. We use both functions in the following analysis to estimate the uncertainty of T_off and to extract the HISA.
Another method is to use the second derivative representation of the spectrum, as described in Krčo et al. (2008). Krčo et al. (2008) demonstrated that HINSA features become dominant in the second derivative representation. We also tested this method. For narrow HISA spectra (such as Spectrum 2 in Fig. 6), the second derivative technique can recover the HISA spectra relatively well. However, for broad spectra (e.g., Spectrum 1 and Spectrum 5), the HISA spectra were filtered out by the method. Therefore, we do not use this method in our analysis. McCutcheon et al. (1978), Winnberg et al. (1980), and Andersson et al. (1991) used Gaussian components to fit the spectra and to derive the off spectra. However, as pointed out by McCutcheon et al. (1978), this method only works if the absorption feature is very narrow and the shape of the total spectrum is simple and can be represented by a few Gaussians. On the other hand, Dénes et al. (2018) used a machine learning method, the Autonomous Gaussian Decomposition algorithm (AGD; GAUSSPY) developed by Lindner et al. (2015), to decompose the emission spectra while masking out the HISA features to derive the off spectra. However, they only needed to deal with 47 spectra, and it is not clear how well this method would work for our region with ∼2 million spectra. It is definitely worthwhile to test the machine learning method in the future, but it is beyond the scope of this paper.
Fig. 6. Five spectra from the positions indicated in Fig. 1. The black lines represent the H I spectra, the colors correspond to the HISA. The gray spectra indicate the mean of the five off-positions presented in Fig. 5. The blue and red dashed lines represent the fits of the background to the HISA spectrum for a polynomial of second and fourth order, respectively, using the velocity ranges v_LSR = 40−50 and 60−70 km s⁻¹ for the baseline of the fit. The blue and red solid lines show the difference between the fitted H I spectra and the measured H I spectra (T_on−off in Eq. (4)) for a polynomial of second and fourth order, respectively. We fitted Spectra 1, 2, and 5 using a Gaussian function, shown by the black solid curve on top of the HISA spectra.
The mean spectrum of the five off-positions shown in Fig. 5 is shown in gray in Fig. 6 as well. While the mean off-position represents T_off well in some cases (e.g., Spectrum 2 in Fig. 6), in general it does not (e.g., Spectrum 1 or 5 in Fig. 6). There are apparent variations in the H I spectrum at velocities outside of the HISA feature, so the assumption of a uniform H I emission background does not seem to be adequate. The mean spectrum method was also discussed by Myers et al. (1978) and McCutcheon et al. (1978), who concluded that it is not suitable for HISA studies for the same reason. Hence, we use the polynomial fit method to extract the HISA feature rather than a mean off-spectrum.
The noise of the extracted HISA spectra, measured in the velocity ranges v_LSR = 40−50 and 60−70 km s⁻¹, is ∼8 K per 1.5 km s⁻¹ for the second order polynomial fit and ∼5 K for the fourth order fit. Since a fourth order polynomial can always fit the small bumps in the spectra better than a second order polynomial, it is no surprise that the HISA spectra extracted with the fourth order fit have smaller noise.

HISA extraction
With the methods described in Sect. 2.3 we can estimate T_off in Eq. (4). Using this information, we can measure the depth of the absorption feature (T_off − T_on). To analyze the absorption features, we fit them with a Gaussian curve. This allows us to study the depth of the absorption features and their kinematics. Fits that result in a peak intensity higher than 25 K (∼3 and 5 times the noise level of the HISA spectra extracted with the second and fourth order fits, respectively) and a full width at half maximum (FWHM) linewidth between 1.5 km s⁻¹ (the channel width) and 20 km s⁻¹ are considered good fits. The peak values of the fitted Gaussian curves are shown in Fig. 7 for different ways of estimating T_off. The absorption depth of the HISA shows values between ∼30 and ∼80 K.
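The acceptance criteria for the Gaussian fits can be written as a small filter. This is an illustrative sketch only; the function name and the dictionary layout of the fit results are hypothetical, and the thresholds are exposed as parameters so that other peak limits can be used for other tracers.

```python
def is_good_fit(peak, fwhm, min_peak=25.0, min_fwhm=1.5, max_fwhm=20.0):
    """Return True if a Gaussian fit passes the peak and linewidth cuts.

    peak: fitted peak absorption depth in K
    fwhm: fitted full width at half maximum in km/s
    The default cuts follow the values quoted in the text for HISA.
    """
    return peak > min_peak and min_fwhm < fwhm < max_fwhm


# Hypothetical fit results for three pixels.
fits = [
    {"peak": 45.0, "fwhm": 4.2},   # deep and well resolved
    {"peak": 18.0, "fwhm": 3.0},   # below the 25 K peak threshold
    {"peak": 60.0, "fwhm": 1.5},   # not broader than one channel
]
accepted = [f for f in fits if is_good_fit(f["peak"], f["fwhm"])]
```

With noise levels of about 8 K and 5 K for the two polynomial orders, the 25 K cut corresponds to roughly 3σ and 5σ, respectively.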
As discussed in Sect. 2.3, different methods to estimate T off resulted in differences for the peak value. Those based on the spatially averaged off-positions (bottom panel in Fig. 7) show unrealistically large values at the edges as well as in the presence of continuum sources. In the first case, as the Galactic H I emission drops at the edge, this method falsely identifies the edge regions as absorption features. In the second case, the method picks up the strong H I absorption towards the bright continuum emission in the background, as the structure of the continuum emission is clearly visible (bottom panel in Fig. 7).
Fitting T_off for each H I spectrum with a polynomial function circumvents these problems. Hence, the polynomial fitting method is more appropriate for our analysis, and in the following we focus on the determination of T_off from HISA-free channels in the T_on spectrum.
We found large differences between the T_off estimates made with polynomials of second and fourth order, in particular towards regions around l = 35° and l = 33.8° (Fig. 7). Using a fourth order polynomial for the background estimate, we found significantly more absorption due to a possible overestimate of the background emission (Sect. 2.3). Other regions are not affected significantly by the choice of the fit function, for instance, the regions around Spectrum 1 or 2. The uncertainties of the HISA properties introduced by the different polynomial fits will be discussed in Sect. 4.7.

Kinematics
In this section, we discuss the kinematic properties of the HISA features. In contrast to the absorption depth, the peak velocity is not significantly affected by the choice of the fit function. Therefore, we present here only the velocity structure obtained using the second order polynomial for the determination of T_off. The velocity structure revealed by the fourth order polynomial is similar.
To compare the kinematics of the HISA feature with those of the ¹³CO data, we resampled both the ¹³CO and H I data to the same spatial and velocity resolution (pixel size of 22″ and spectral channel width of 1.5 km s⁻¹) and applied Gaussian fitting to the data sets pixel by pixel to determine the peak velocity and FWHM linewidth. Due to the limited sensitivity and spectral resolution of our H I data, we cannot disentangle multiple velocity components. Therefore, we chose to fit both the ¹³CO and HISA data with a single Gaussian component for simplicity and consistency. The Gaussian fitting of the HISA spectra is described in Sect. 2.4. For the ¹³CO data, fits that result in a peak intensity higher than 2 K (∼10 times the noise level of the ¹³CO datacube at 1.5 km s⁻¹ channel width) and an FWHM linewidth between 1.5 and 20 km s⁻¹ are considered good fits. A few examples of ¹³CO spectra are shown in Fig. 8 to demonstrate the fitting results. The peak velocity maps are presented in Fig. 9. The ¹³CO peak velocity shows that the majority of the filament is at ∼54−58 km s⁻¹. The western part of the ¹³CO filament around l = 34° is slightly red-shifted compared to the rest of the filament, and has a velocity of ∼57−58 km s⁻¹. For this region, the peak velocities revealed by the HISA feature are at ∼54−55 km s⁻¹, which is about 3−4 km s⁻¹ lower than the ¹³CO velocity. This can also be seen in the right panel of Fig. 10, where we present histograms of the HISA and ¹³CO peak velocities for the eastern and western sides of the GMF, respectively. In contrast to this, the eastern side of the filament around l = 36.5° shows a close correlation of the peak velocities, as shown in the left panel of Fig. 10. We will discuss this effect in Sect. 4.1. The terms western and eastern used here both refer to Galactic coordinates.
Fig. 7. Peak values of the Gaussian fits to the HISA features for different ways of estimating T_off. Top and middle panels: method using polynomial fits of second and fourth order, respectively. Bottom panel: method using the spatially averaged off spectrum shown in Fig. 5. The white contours represent the integrated ¹³CO emission at a level of 5 K km s⁻¹ for reference. The circles from left to right mark the same positions as in Fig. 1, where Spectra 1 to 5 shown in Fig. 6 are extracted, respectively.
The linewidths of both ¹³CO and HISA are shown in Fig. 11. The linewidth of the ¹³CO emission shows extremely high values of more than 10 km s⁻¹ for the central region of the filament around l = 35°. However, these values have to be treated cautiously, as the ¹³CO emission exhibits multiple lines in this region and we only use a single Gaussian function to fit them. On the eastern side of the filament around l = 36.5° we find mostly single components for the ¹³CO emission, and the linewidth is ∆v ∼ 2 to 4 km s⁻¹. The linewidth of the HISA feature shows values of ∆v ∼ 3−6 km s⁻¹ for the whole filament. This can also be seen in the left panel of Fig. 12, where a histogram of the linewidths is shown. The linewidth distribution of the ¹³CO emission is systematically higher in the western region of the filament, whereas the linewidth of the HISA feature there is similar to that in the eastern region. This result has to be treated cautiously, as we see multiple components for the ¹³CO line within the western region, which increases the linewidth.
Fig. 8. Examples of ¹³CO spectra. The positions corresponding to each spectrum are shown in Fig. 1. For Spectra 1, 4, and 5, a single Gaussian component can be fitted to the ¹³CO spectrum successfully, and we show the Gaussian fitting result with a black solid curve in each panel. For Spectra 2 and 3, the Gaussian fitting results did not meet the criteria we described in Sect. 3.1 and were ignored.
To estimate the contribution of the non-thermal component in the HISA features and the molecular emission, we assume the relation $\sigma_{\rm nth} = \sqrt{\sigma_{\rm obs}^2 - \sigma_{\rm th}^2 - \sigma_{\rm res}^2}$, where $\sigma_{\rm obs}$ is the measured velocity dispersion, $\sigma_{\rm th}$ is the radial component of the thermal velocity dispersion, and $\sigma_{\rm res}$ is the velocity dispersion introduced by the channel width of our data (1.5 km s −1 ). Assuming a Gaussian line profile with the FWHM linewidth ∆v obtained from the aforementioned fits, we get $\sigma_{\rm obs} = \Delta v/\sqrt{8\ln 2}$ and $\sigma_{\rm res} = 1.5/\sqrt{8\ln 2}$ km s −1 . Assuming a Maxwell-Boltzmann velocity distribution, $\sigma_{\rm th} = \sqrt{k_{\rm B} T_{\rm k}/(\mu m_{\rm H})}$, where $k_{\rm B}$ is the Boltzmann constant, $\mu$ is the molecular weight, $m_{\rm H}$ is the mass of the hydrogen atom, and $T_{\rm k}$ is the kinetic temperature. For HISA, we assume T k = T HISA = 40 K. The peak excitation temperature of 13 CO we derived for the filament is ∼25 K (see Sect. 3.2.1), which agrees with what Roman-Duval et al. (2010) found for the GRS molecular clouds. If we assume that the brightest 13 CO emission is coming from regions where the line is optically thick and thermalized, then this excitation temperature will be comparable to the actual gas kinetic temperature, which must therefore have a value close to 20 K. Furthermore, simulations of molecular clouds in a variety of different radiation fields show that the CO mass-weighted temperature of the gas is typically in the range 10-30 K, with very little dependence on the local environment (Peñaloza et al. 2018). Therefore, we assume a uniform T k of 20 K for 13 CO. Assuming spatial isotropy, the Mach number is estimated to be $\sqrt{3}\,\sigma_{\rm nth}/c_{\rm s}$, where $c_{\rm s}$ is the sound speed estimated using a mean molecular weight µ = 2.34 for the molecular cloud and µ = 1.27 for the H I cloud (Allen 1973; Cox 2000). The distribution of the Mach number for HISA and 13 CO across the whole filament (Fig. 13) shows that the 13 CO emission is dominated by supersonic motions, whereas the HISA features have a much smaller Mach number in general, with a significant fraction of the HISA features (that is, a significant fraction of the CNM) being at subsonic and transonic velocities. If the HISA and 13 CO lack spatial isotropy, we could be overestimating the Mach number by a maximum factor of $\sqrt{3}$.
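The linewidth-to-Mach-number chain described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the isothermal form of the sound speed, and the choice of tracer weights (1 for H I, 29 for 13 CO) are assumptions made here, while the mean molecular weights (1.27 and 2.34), the temperatures, and the 1.5 km s −1 channel width are taken from the text.

```python
import math

K_B = 1.380649e-16    # Boltzmann constant in erg/K
M_H = 1.6735575e-24   # mass of the hydrogen atom in g
FWHM_TO_SIGMA = 1.0 / math.sqrt(8.0 * math.log(2.0))

def mach_number(fwhm_kms, t_kin, mu_tracer, mu_mean, channel_kms=1.5):
    """3D sonic Mach number from a Gaussian FWHM, or None if no
    non-thermal component is left after the subtraction."""
    sigma_obs = fwhm_kms * FWHM_TO_SIGMA
    sigma_res = channel_kms * FWHM_TO_SIGMA
    # Thermal dispersion of the observed species (assumed tracer weight), km/s.
    sigma_th = math.sqrt(K_B * t_kin / (mu_tracer * M_H)) / 1.0e5
    var_nth = sigma_obs**2 - sigma_th**2 - sigma_res**2
    if var_nth <= 0.0:
        return None
    # Isothermal sound speed of the gas (assumed form), km/s.
    c_s = math.sqrt(K_B * t_kin / (mu_mean * M_H)) / 1.0e5
    return math.sqrt(3.0 * var_nth) / c_s

# Illustrative inputs only: the mean eastern-region linewidths quoted in Sect. 4.1.
mach_hisa = mach_number(4.5, 40.0, mu_tracer=1.0, mu_mean=1.27)
mach_co = mach_number(3.6, 20.0, mu_tracer=29.0, mu_mean=2.34)
```

Returning `None` when the quadrature difference is not positive makes explicit that narrow lines near the channel width carry no usable non-thermal estimate.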

H 2 column density
We use the 13 CO(1-0) data from the Galactic Ring Survey (GRS, Jackson et al. 2006) to derive the column density and kinematic properties of the molecular gas of the filament. The data have an angular resolution of Θ = 46″ and a velocity resolution of ∆v = 0.21 km s −1 . We can estimate the column density of the 13 CO molecule, including an opacity correction, with the equation (Wilson et al. 2010):

$$N(^{13}{\rm CO}) = 3.0 \times 10^{14}\, \frac{\tau_{13}}{1 - e^{-\tau_{13}}}\, \frac{\int T_{\rm MB}\, {\rm d}v}{1 - \exp(-5.3/T_{\rm ex})}, \qquad (5)$$

where N( 13 CO) is the column density of 13 CO in units of cm −2 , dv is the velocity interval in km s −1 , τ 13 is the 13 CO opacity, T MB is the main beam brightness temperature, and T ex is the excitation temperature. We do not have a direct measurement of the excitation temperature, and we assume that the excitation temperatures of 12 CO and 13 CO are the same. Assuming that the 12 CO line is optically thick, we used the 12 CO(1-0) data from the FOREST unbiased Galactic plane imaging survey with the Nobeyama 45 m telescope (FUGIN; Umemoto et al. 2017) to estimate T ex following the formula (Wilson et al. 2010):

$$T_{\rm ex} = \frac{5.5}{\ln\left(1 + \dfrac{5.5}{T_{\rm mb}(^{12}{\rm CO}) + 0.82}\right)}, \qquad (6)$$

where T mb ( 12 CO) is the peak main-beam brightness temperature of the 12 CO(1-0) line. We calculated T ex for regions where T mb ( 12 CO) is above the 5σ level (2 K), which results in a T ex between ∼5 and 25 K. For regions where T mb ( 12 CO) is below the 5σ level (2 K), an upper limit of 5 K for T ex is applied. Following Eq. (B.6) in Schneider et al. (2016), we can derive τ 13 from T ex and T mb ( 13 CO). For regions where T mb ( 13 CO) is above the 5σ level (1.05 K), τ 13 is estimated to be between ∼0.1 and 3. For regions where T mb ( 13 CO) is below the 5σ level (1.05 K), an upper limit of 0.1 for τ 13 is applied. The majority of the region along the filament has τ 13 ≲ 1.0; only the region around (l = 34.30, b = 0.18) and a few pixels around (l = 35.55, b = 0.0) have τ 13 > 2.0.
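The three steps in this paragraph (excitation temperature from 12 CO, opacity from 13 CO, opacity-corrected column density) can be sketched for a single pixel as below. The numerical forms are the commonly used local-thermodynamic-equilibrium relations for the CO(1-0) lines and are written out here as assumptions; the function names and the example input values are hypothetical.

```python
import math

def excitation_temperature(t_mb_12co):
    """T_ex in K from the peak 12CO(1-0) main-beam temperature,
    assuming the 12CO line is optically thick."""
    return 5.5 / math.log(1.0 + 5.5 / (t_mb_12co + 0.82))

def tau_13co(t_mb_13co, t_ex):
    """13CO(1-0) opacity from the peak 13CO temperature and T_ex
    (assumed form of the relation cited as Schneider et al. 2016, Eq. B.6)."""
    j = 1.0 / (math.exp(5.3 / t_ex) - 1.0) - 0.16
    return -math.log(1.0 - t_mb_13co / (5.3 * j))

def column_density_13co(integrated_t_mb, tau13, t_ex):
    """Opacity-corrected N(13CO) in cm^-2; integrated_t_mb in K km/s."""
    correction = tau13 / (1.0 - math.exp(-tau13))
    return 3.0e14 * correction * integrated_t_mb / (1.0 - math.exp(-5.3 / t_ex))

# Hypothetical pixel: T_ex = 10 K, peak 13CO of 3 K, integrated intensity 10 K km/s.
tau_example = tau_13co(3.0, 10.0)
n13_example = column_density_13co(10.0, tau_example, 10.0)
```

With the 5σ threshold of 2 K quoted in the text, `excitation_temperature(2.0)` gives roughly 5 K, consistent with the lower end of the stated T ex range.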
For the Galactocentric distance of 5.9 kpc of GMF38a, the fractional abundance of 13 CO relative to H 2 is estimated to be 2.9 × 10 −6 following the relations reported by Giannetti et al. (2014). With this abundance, we converted N( 13 CO) to N(H 2 ). Zhang et al. (2019) employed a similar method (uniform CO abundance, T mb ( 12 CO) → T ex ) to estimate the column density and mass of a sample of GMFs (they estimated the mass of GMF38a to be ∼3.8−11.0 × 10 5 M ⊙ ), and discussed in detail the uncertainties introduced by T ex and the 13 CO abundance. According to their results, the 1σ uncertainty of the column density estimated from 13 CO is ∼50%. Simulations show that the abundance of 13 CO could vary and that we could underestimate the column density in the low column density part by ∼40% (N( 13 CO) < 10 16 cm −2 , or N(H 2 ) < 3.4 × 10 21 cm −2 in this paper; Szűcs et al. 2014, 2016; see Fig. 14).
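As a worked check of the abundance conversion, the 13 CO threshold quoted above maps onto the stated H 2 threshold as follows (a sketch assuming the uniform abundance of 2.9 × 10 −6 given in the text):

```latex
N(\mathrm{H_2}) = \frac{N(^{13}\mathrm{CO})}{2.9\times10^{-6}},
\qquad
N(^{13}\mathrm{CO}) = 10^{16}\ \mathrm{cm^{-2}}
\;\Rightarrow\;
N(\mathrm{H_2}) = \frac{10^{16}}{2.9\times10^{-6}}\ \mathrm{cm^{-2}}
\approx 3.4\times10^{21}\ \mathrm{cm^{-2}}.
```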

CNM column density from HISA measurements
Besides the kinematics, the column density of H I is also a critical cloud parameter. To estimate the column density of the HISA feature we use the equation given by Wilson et al. (2010):

$$N({\rm H\,I}) = 1.8224 \times 10^{18}\, T_{\rm S} \int \tau(v)\, {\rm d}v, \qquad (7)$$

where T S and τ are the spin temperature and optical depth, respectively, N(H I) is in cm −2 , T S is in K, and v is in km s −1 . For the CNM traced by HISA, T S = T HISA and τ = τ HISA (Eq. (4)). However, as mentioned in Sect. 2.2, we measure the spin temperature and the optical depth together, and disentangling them is difficult. Hence, we assume a constant spin temperature over the cloud and calculate the optical depth using Eq. (4). As we have no measurement for p, this value is difficult to estimate. Considering that GMF38a is at the near side of the Milky Way, we assume a value of p = 0.9 in the following and discuss the corresponding uncertainties in Sects. 4.2 and 4.7. We discuss the uncertainty in the column density introduced by the method we choose to estimate the background temperature T off in Sect. 4.7. We integrate between 50 and 60 km s −1 and derive the column density map shown in Fig. 14, assuming for the HISA feature a spin temperature T S = 40 K, p = 0.9, and using a second order polynomial to estimate the background temperature T off . Larger spin temperatures do not change the structure of the column density map significantly, but will increase its value everywhere.

Fig. 13. Histograms of the Mach number of the HISA and 13 CO emission, shown in black and red, respectively.
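The per-channel procedure described here (fix T HISA and p, solve for the optical depth, integrate over velocity) can be sketched as below. The radiative-transfer relation used for Eq. (4) is not printed in this section, so its form here, T on−off = (T HISA − p T off − T cont)(1 − e^−τ), is an assumption, as are the function names and the example spectrum.

```python
import math

def hisa_tau(t_on_off, t_off, t_hisa=40.0, p=0.9, t_cont=0.0):
    """Optical depth of one channel for a fixed spin temperature.
    Returns None where the assumed relation has no physical solution."""
    denom = t_hisa - p * t_off - t_cont
    if denom >= 0.0:
        return None
    ratio = t_on_off / denom
    if not 0.0 <= ratio < 1.0:
        return None
    return -math.log(1.0 - ratio)

def hisa_column_density(t_on_off_spec, t_off_spec, dv_kms, t_hisa=40.0, p=0.9):
    """N(H I) in cm^-2 from N = 1.8224e18 * T_S * sum(tau * dv)."""
    total = 0.0
    for t_on_off, t_off in zip(t_on_off_spec, t_off_spec):
        tau = hisa_tau(t_on_off, t_off, t_hisa, p)
        if tau is not None:      # skip channels without a valid solution
            total += tau * dv_kms
    return 1.8224e18 * t_hisa * total

# Hypothetical spectrum: 7 channels of 1.5 km/s, 20 K deep dip on a 100 K background.
n_hisa = hisa_column_density([-20.0] * 7, [100.0] * 7, 1.5)
```

For these illustrative inputs the result falls in the 10 20 to 10 21 cm −2 range quoted for the CNM.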
The column density peaks in the 13 CO map do not coincide well with the column density peaks of the atomic hydrogen. As shown in Fig. 1, the highest peak in the 13 CO (around l = 34°) coincides with a strong continuum source, which makes the determination of the HISA feature at this position impossible. However, we use this continuum source to constrain the optical depth, which we present in Sect. 4.3. The highest column density peak for the atomic hydrogen can be found in the eastern area of the filament (l = 36.5°). In this region, the H 2 is diffuse and its column density is low. Another CNM column density peak can be found in the center of the filament around l = 35.4°, b = +0.3°. This CNM feature has an almost round shape and only a very weak counterpart in the 13 CO emission.
Assuming a typical CNM thermal pressure of P CNM / k ∼ 4000 K cm −3 (Heiles 1997;Jenkins & Tripp 2011;Goldsmith 2013), and T K = T HISA = 40 K, we can estimate the size of the CNM along the line of sight by dividing the column density by P CNM /k/T K . Depending on the column density, the line of sight size is estimated to be ∼0.5-3 pc, which is much smaller than the width and length of the filament (∼25 and ∼230 pc, respectively). Arguing the other way round, if the line of sight size of the CNM is similar to the width of the filament (∼25 pc), the thermal pressure would be significantly lower, and the CNM could be under-pressured or the pressure may be dominated by some other (magnetic or turbulent) component.
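The size estimate above is a one-line division, sketched here together with the inverse argument. The function names and the parsec conversion constant are introduced for illustration; the pressure, temperature, and column density range are the values quoted in the text.

```python
PC_CM = 3.0857e18  # one parsec in cm

def los_size_pc(n_col, pressure_over_k=4000.0, t_kin=40.0):
    """Line-of-sight depth in pc: column density divided by n = (P/k)/T."""
    n_vol = pressure_over_k / t_kin      # volume density in cm^-3
    return n_col / n_vol / PC_CM

def pressure_for_size(n_col, size_pc, t_kin=40.0):
    """Thermal pressure P/k in K cm^-3 implied by a given depth."""
    return n_col / (size_pc * PC_CM) * t_kin

thin = los_size_pc(1.5e20)                 # low end of the HISA column densities
thick = los_size_pc(1.0e21)                # high end
p_wide = pressure_for_size(1.0e21, 25.0)   # depth equal to the filament width
```

Under these assumptions a 25 pc depth implies a pressure several times below the adopted 4000 K cm −3 , which is the under-pressured case discussed in the text.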

Atomic gas column density from H I emission
Since HISA traces only the CNM, it is likely that this component is surrounded by or even mixed with a warm component. We also derive the column density traced by the H I emission (between 50 and 60 km s −1 , 32.65° < l < 37.27° and |b| ≤ 1.25°), which traces both the CNM and WNM. Following the method described in Bihr et al. (2015), we derived the mean H I optical depth map from the strong continuum sources in the background (G33.498+0.194, G33.810-0.189, G33.915+0.110, G34.133+0.471, G35.053-0.518, G35.574+0.068, G35.947+0.379, G36.056+0.357, and G36.551+0.002; Wang et al. 2018). The optical depth measured towards a continuum source is:

$$\tau = -\ln\left(\frac{T_{\rm on,cont} - T_{\rm off,cont}}{T_{\rm cont}}\right), \qquad (8)$$

where T on, cont is the on-continuum-source brightness temperature, T off, cont is the off-continuum-source temperature, and T cont is the brightness temperature of the continuum source. Since we use the THOR C array data to calculate τ, the smooth, large-scale structure is mostly filtered out (Beuther et al. 2016). We can therefore neglect the off emission T off, cont and simplify Eq. (8) to:

$$\tau = -\ln\left(\frac{T_{\rm on,cont}}{T_{\rm cont}}\right). \qquad (9)$$

For channels with a T on, cont value smaller than 3 times the rms, we use the 3σ value to get a lower limit of τ. The mean optical depth varies between 1.1 and 1.9 from 50 to 60 km s −1 . The optical depth corrected spin temperature is $T_{\rm S} = T_{\rm B}/(1 - e^{-\tau})$, where T B is the brightness temperature of the H I emission. The optical depth corrected atomic hydrogen column density is calculated with Eq. (7). Since the absorption features towards these continuum sources between 50 and 60 km s −1 often saturate, the optical depth we derived is a lower limit, as shown in Fig. 18, and we are underestimating the H I column density. The column density we derived from H I emission is a combined result from the far (10.3 kpc) and near side (3.4 kpc) due to the kinematic distance ambiguity.
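The handling of saturated channels can be made concrete with a short sketch. It assumes the simplified absorption relation τ = −ln(T on,cont / T cont) with the off emission neglected; the function names and the example noise level are hypothetical.

```python
import math

def continuum_tau(t_on_cont, t_cont, rms):
    """Return (tau, is_lower_limit) for one channel towards a continuum source.
    Channels below 3*rms are clipped to the 3-sigma value, so their tau
    is only a lower limit."""
    floor = 3.0 * rms
    if t_on_cont < floor:
        return -math.log(floor / t_cont), True
    return -math.log(t_on_cont / t_cont), False

def corrected_spin_temperature(t_b, tau):
    """Optical depth corrected spin temperature T_S = T_B / (1 - exp(-tau))."""
    return t_b / (1.0 - math.exp(-tau))

# Hypothetical channel: 1300 K continuum, saturated absorption, 13.1 K rms noise.
tau_sat, saturated = continuum_tau(10.0, 1300.0, 13.1)
```

With these illustrative numbers the clipped channel yields a lower limit close to the τ = 3.5 discussed for G34.256+0.146 in Sect. 4.3.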
Assuming the atomic gas in the Galactic plane is approximately axisymmetric with respect to the Galactic center (Kalberla & Dedes 2008), the atomic gas at near side and far side that are at the same Galactocentric distance share the same density distribution in the vertical direction. We used the average vertical density profile described by Lockman (1984;see Eq. (5) and Table 1 in their paper) to estimate how much gas is at near distance for each line of sight and derived the column density map of H I emission at 3.4 kpc shown in Fig. 14.
By comparing the column density maps of CNM and H 2 , we produce the CNM-to-H 2 ratio map and show it for the eastern and western regions in Fig. 15. We masked out regions outside the column density threshold contours shown in Fig. 14. In both subregions, the CNM-to-H 2 ratio varies between ∼0.5 and 25% with a median value of ∼9%. Figure 15 shows that the outer layers of the filament have a high CNM-to-H 2 ratio, while the inner regions show a lower CNM-to-H 2 ratio. This change from the outside to the inside of the cloud can be interpreted as a signature of the conversion of atomic to molecular gas with increasing density. Zuo et al. (2018) studied HINSA towards nearby clouds and found a much lower ratio between 0.2 and 2%. Figure 16 shows the surface density comparison between the atomic hydrogen (CNM+WNM) and the total gas (CNM+WNM+H 2 ). The surface density of the atomic hydrogen rises up to ∼14−23 M ⊙ pc −2 (∼1.8−2.9 × 10 21 cm −2 ) and then saturates to an almost flat distribution. This turnover is at lower values than that found by Bihr et al. (2015) towards W43 (50-80 M ⊙ pc −2 ), but it is still higher than the 10 M ⊙ pc −2 observed towards nearby clouds (Lee et al. 2015) and predicted by models (e.g., Krumholz et al. 2008, 2009; Sternberg et al. 2014). Such higher than predicted H I column densities can be explained by the clumpy nature of the ISM with several H I-to-H 2 transitions along the line of sight (Bialy et al. 2017b).
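The correspondence between the surface densities and column densities quoted above can be checked with a small unit conversion. The sketch assumes pure hydrogen (no helium contribution), which is the convention that reproduces the quoted pairs of values; the constants and function names are introduced here for illustration.

```python
M_H_G = 1.6735575e-24   # mass of the hydrogen atom in g
M_SUN_G = 1.98847e33    # solar mass in g
PC_CM = 3.0857e18       # one parsec in cm

# Hydrogen atoms per cm^2 that correspond to 1 Msun per pc^2 (no helium).
ATOMS_PER_MSUN_PC2 = M_SUN_G / PC_CM**2 / M_H_G

def surface_density_to_column(sigma_msun_pc2):
    """Convert a surface density in Msun pc^-2 to H atoms per cm^2."""
    return sigma_msun_pc2 * ATOMS_PER_MSUN_PC2

def column_to_surface_density(n_h):
    """Convert H atoms per cm^2 to a surface density in Msun pc^-2."""
    return n_h / ATOMS_PER_MSUN_PC2

low = surface_density_to_column(14.0)
high = surface_density_to_column(23.0)
```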

Mass estimate
As we know the column density and the distance to the cloud (∼3.4 kpc), we can directly estimate the mass of the H I and H 2 gas. We do so for three different regions: the "full filament" (red polygon in Fig. 14), the "eastern region" (eastern green dashed polygons in Fig. 14), and the "western region" (western green dashed polygons in Fig. 14). Table 1 summarizes the mass measurements. The molecular hydrogen mass for the entire filament is ∼3.6 × 10 5 M ⊙ and the CNM mass traced by the HISA is significantly less, showing values of 3.4 × 10 3 -9.5 × 10 3 M ⊙ , depending on the assumed spin temperature. Furthermore, we studied the cold atomic to molecular hydrogen mass ratio M(CNM)/M(H 2 ). For the entire filament, this value is between 1 and 3%, again depending on the assumed spin temperature. However, this ratio has to be treated cautiously. The HISA extraction method does not work reliably in the center of the filament due to strong continuum emission and we might miss some H I mass. The mass ratios for the smaller regions show slightly higher values (3 to 4%). The H 2 column density shows significantly higher values for the western region in comparison to the eastern region. In contrast to this, the H I column density reveals a prominent peak on the eastern side and hence the M(CNM)/M(H 2 ) ratio is lower for the western region than for the eastern region. Considering that 13 CO does not trace all the molecular hydrogen gas (Pineda et al. 2008; Goodman et al. 2009; Gong et al. 2018), the M(CNM)/M(H 2 ) ratio we derive could be just an upper limit.

Y. Wang et al.: HISA study towards a GMF

Table 1. Mass measurements (in M ⊙ ).

Region           M(H 2 )      M(CNM), lower T S   M(CNM), T S = 40 K   M(CNM)/M(H 2 )   M(H I emission)
Full filament    3.6 × 10 5   3.4 × 10 3          9.5 × 10 3           3%               2.3 × 10 5 (6.1 × 10 5 ) (a)
Eastern region   7.9 × 10 4   1.1 × 10 3          3.4 × 10 3           4%               5.0 × 10 4
Western region   1.1 × 10 5   1.1 × 10 3          2.9 × 10 3           3%               7.1 × 10 4

Notes. (a) This mass is calculated including all H I emission between 50 and 60 km s −1 , 32.65° < l < 37.27°, |b| ≤ 1.1° (bottom panel in Fig. 14).
We further estimate the mass of the atomic component traced by H I emission. Within the same area (polygons in Fig. 14), the atomic gas traced by H I emission has a lower mass than the molecular gas (Table 1). However, since there is no clear boundary of the GMF in the H I column density map (see bottom panel in Fig. 14), we can assume that all H I emission between 50 and 60 km s −1 within 32.65° < l < 37.27° and |b| ≤ 1.1° (the same latitude range as the GRS coverage) is associated with the filament to obtain the mass. The mass is estimated to be 6.1 × 10 5 M ⊙ , which is ∼60 times larger than the mass estimated for the CNM traced by HISA (40 K). The mass of the atomic hydrogen is about 60% larger than the molecular hydrogen mass (3.6 × 10 5 M ⊙ ), which makes sense since the molecular cloud is surrounded by a large reservoir of atomic gas.
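Converting a column density map into a mass amounts to summing the columns and multiplying by the physical pixel area and the particle mass. The sketch below makes two assumptions that the text does not state: helium is not included, and the map is given as a flat list of per-pixel column densities with square pixels. All names and the example map are hypothetical.

```python
M_H_G = 1.6735575e-24        # mass of the hydrogen atom in g
M_SUN_G = 1.98847e33         # solar mass in g
PC_CM = 3.0857e18            # one parsec in cm
ARCSEC_RAD = 4.848137e-6     # one arcsecond in radians

def mass_msun(columns, pixel_arcsec, distance_kpc, particle_mass_mh=1.0):
    """Mass in Msun of a map of column densities (particles per cm^2).
    Use particle_mass_mh=1 for H I columns and 2 for N(H2) columns."""
    pixel_cm = pixel_arcsec * ARCSEC_RAD * distance_kpc * 1.0e3 * PC_CM
    pixel_area = pixel_cm**2                     # small-angle approximation
    n_particles = sum(columns) * pixel_area
    return n_particles * particle_mass_mh * M_H_G / M_SUN_G

# Hypothetical map: 100 pixels of 60 arcsec with N(H I) = 1e21 cm^-2 at 3.4 kpc.
m_example = mass_msun([1.0e21] * 100, 60.0, 3.4)
```

Because the mass scales with the square of the distance, any uncertainty in the adopted 3.4 kpc propagates directly into all masses in Table 1.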

Kinematics
For nearby galaxies, the ratio of the H I to CO linewidth is around σ HI /σ CO = 1−1.4 (Caldú-Primo et al. 2013; Mogotsi et al. 2016). The linewidth values found in these studies are approximately σ ∼ 6−12 km s −1 , which corresponds to ∆v FWHM ∼ 14−28 km s −1 for both the H I and CO lines. These measurements are made for the CO and H I emission over large regions (∼0.5 kpc), which can increase the linewidth due to the superposition of different velocity components in the supersonically turbulent ISM. Our Galactic HISA measurements of GMF38a show significantly smaller linewidths of ∆v FWHM ∼ 2−8 km s −1 for the CNM. The reason is that we observe a much smaller region and are able to separate multiple components. Furthermore, the extragalactic studies observe H I emission, whose linewidth is dominated by the WNM, whereas we observe cold H I absorption features produced by the CNM. To study the linewidth ratio in detail, we determine this ratio for the eastern region, which is indicated in Fig. 11. We focus on this region as it is not significantly affected by multiple component line spectra. A histogram of the H I/ 13 CO ratio is shown in Fig. 17. The mean values for the linewidths are ∆v FWHM ( 13 CO) = 3.6 km s −1 and ∆v FWHM (H I) = 4.5 km s −1 .

The percentage of background emission -p
The percentage of background emission, parameterized with p, is difficult to estimate. Different assumptions can be found in the literature. For example, McClure-Griffiths et al. (2006) and Dénes et al. (2018) assume that p = 1 and 0.9 for the observed Riegel-Crutcher cloud, respectively, as the corresponding distance is small (∼125 pc). Rebolledo et al. (2017) studied the HISA features in the Gum 31 molecular complex with a simple two component assumption (p = 1). Li & Goldsmith (2003) studied the H I narrow self-absorption towards dark clouds in the Taurus and Perseus region. Since these clouds are located at high Galactic latitude away from the Galactic mid-plane, they can assume a simple Gaussian Galactic H I disk model (Lockman 1984) and estimate the factor p with the complementary error function.
Since the background and foreground H I emission arises from warm and diffuse H I clouds, we do not expect fluctuations of this emission on small scales. Hence, the assumption of a constant p for the entire filament is reasonable. Furthermore, we can obtain a lower limit for p. As shown in Fig. 4, low values of p ≲ 0.4 are not feasible, as the spin temperature would become smaller than the temperature of the CMB. As we discuss further in Sect. 4.4, low values of p ≲ 0.7 would also result in an unrealistically low spin temperature.
For Galactocentric radii 7 ≲ R ≲ 35 kpc, Kalberla & Dedes (2008) found that the average mid-plane volume density of H I follows $n(R) \sim n_0\, e^{-(R - R_\odot)/R_n}$ with n 0 = 0.9 cm −3 , R n = 3.15 kpc, and R ⊙ = 8.5 kpc (IAU recommendations). Assuming n(R < 7 kpc) = n(7 kpc), we integrate the density along the line of sight of GMF38a (l = 35.5°, b = 0°) and obtain the amount of gas in the foreground and background of GMF38a. Assuming optically thin emission, p is estimated to be 0.91, which agrees with our adopted value of 0.9.

Fig. 18. Spectrum of the H I optical depth towards the UCH II region G34.256+0.146 using the THOR data. For some channels, the absorption spectrum saturates and the measured optical depth is a lower limit of τ = 3.5, which is indicated by the dotted line. The gray shaded area indicates the velocity range of the HISA feature (v LSR = 50−60 km s −1 ).

Optical depth measurement toward a strong continuum source
We can use strong continuum sources to estimate the optical depth of the CNM. The UCH II region G34.256+0.146 (see Fig. 1) is an ideal candidate for this task, as it is very bright (T cont (max) ∼ 1300 K) and slightly extended (d ∼ 70″). We use the THOR data to extract the absorption spectrum towards the continuum peak and determine the optical depth and the lower limit of the optical depth using Eq. (9). The optical depth spectrum is shown in Fig. 18. In the velocity range of the HISA feature, the absorption spectrum saturates and the determined optical depth τ = 3.5 is a lower limit. Furthermore, since the UCH II region is at the same distance as the filament (Anderson et al. 2014) and associated with it, there could be CNM that lies behind the UCH II region and is not traced by the absorption spectrum. This is a second reason why the determined optical depth represents a lower limit. As explained in Sect. 2.2, the general HISA extraction method measures the optical depth together with the spin temperature and we are not able to disentangle them. However, the additional information from the absorption spectrum towards the strong continuum source and the corresponding optical depth measurement allows us to overcome this problem. Figure 4 presents the optical depth as a function of the spin temperature for different values of p. The lower limit of the optical depth measurement is shown at τ = 3.5 using a black horizontal line. Assuming p = 0.9 yields a spin temperature of T HISA ∼ 55 K. This is a bit higher than the spin temperature assumed for the column density determination presented in Fig. 14 (40 K). Since we measure the spin temperature close to a UCH II region, we expect rather high values.
A problem for HISA studies is that a low level of background emission can be misinterpreted as an absorption feature. Studying very narrow absorption features, HINSA (e.g., Li & Goldsmith 2003; Goldsmith & Li 2005; Goldsmith et al. 2007; Krčo & Goldsmith 2010), can avoid this problem, since the steep absorption profiles of these HINSA features cannot be produced by two broad emission profiles on each side of the absorption feature. The broad HISA features we identified could, in principle, be caused by two emission components. Fortunately, the optical depth information from the spectrum toward the strong continuum source helps to solve this problem. Since the optical depth is high (τ > 3.5) in the velocity range of the HISA feature (Fig. 18), we are confident that we indeed observe a HISA feature rather than missing H I emission. Furthermore, the correlation of the HISA feature with the 13 CO emission is another indicator of cold, dense H I.

Maximum spin temperature
As explained in Sect. 4.3, we can use strong continuum sources to measure the optical depth and therefore disentangle the spin temperature and the optical depth. However, this is only possible in a selected number of locations, that is, in the vicinity of a strong continuum source. In general we can only give the spin temperature as a function of the optical depth. Figure 4 shows that the function becomes very steep for certain spin temperatures. Hence, the maximum spin temperature will be reached for the case of large optical depth. This can also be seen by solving Eq. (4) for T HISA :

$$T_{\rm HISA} = T_{\rm cont} + p\,T_{\rm off} + \frac{T_{\rm on-off}}{1 - e^{-\tau}}. \qquad (10)$$

Since T on−off is always negative, T HISA reaches an upper limit for a given T off and T cont when $T_{\rm on-off}/(1 - e^{-\tau})$ has its smallest magnitude, which occurs for τ → ∞. This means the maximal T HISA is:

$$T_{\rm HISA}({\rm max}) = T_{\rm cont} + p\,T_{\rm off} + T_{\rm on-off}. \qquad (11)$$

This equation depends on the assumed ratio of foreground and background emission, which is described by the factor p. For p = 1, the upper limit of the spin temperature reaches a maximum. We can use this information to calculate the upper limit of the spin temperature for each pixel in our map. However, we do not assume p = 1, but rather a more realistic value of p = 0.9. The result is given in Fig. 19. We focus the discussion on regions offset from strong continuum sources, since for strong continuum sources the attenuation of the continuum exceeds by far the contribution of self-absorption to the H I absorption spectrum. As expected, we see a clear anti-correlation between the upper limit for the spin temperature and the column density of the HISA feature. A weaker absorption feature (T on−off is always negative) will result in a higher T HISA and a lower column density. The lowest values are found for the compact HISA feature in the center around l = 35.5°, with values around T HISA (max.) ∼ 25 K. Similar values can be found for the eastern region of the filament, whereas the western side of the filament shows in general higher values around T HISA (max.) ∼ 75 K.
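The limiting behaviour for large optical depth can be illustrated numerically. The sketch assumes that Eq. (4) has the form T on−off = (T HISA − p T off − T cont)(1 − e^−τ); that form is not printed in this section and is an assumption here, as are the function names and the example temperatures.

```python
import math

def t_hisa(t_on_off, t_off, tau, p=0.9, t_cont=0.0):
    """Spin temperature for a given optical depth (assumed form of Eq. (4))."""
    return t_cont + p * t_off + t_on_off / (1.0 - math.exp(-tau))

def t_hisa_max(t_on_off, t_off, p=0.9, t_cont=0.0):
    """Upper limit of the spin temperature, reached for tau -> infinity."""
    return t_cont + p * t_off + t_on_off

# Hypothetical pixel: 50 K deep absorption against a 100 K background.
limit_p09 = t_hisa_max(-50.0, 100.0, p=0.9)
limit_p07 = t_hisa_max(-50.0, 100.0, p=0.7)
```

For a background of about 100 K, lowering p from 0.9 to 0.7 reduces the limit by 0.2 × T off , that is, by about 20 K, in line with the shift discussed for T HISA (max.) later in this section.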
As this is only an upper limit for the spin temperature, we cannot directly infer the actual temperature. However, it is plausible that the H I spin temperature is higher on the western side of the cloud due to star formation activity and feedback processes, such as the prominent UCH II region. We will discuss this aspect further in Sect. 4.6.
For the H I column density determination in Sect. 3.2, we assumed a spin temperature of T HISA = 40 K. As seen in Fig. 19, this is higher than the upper limit of the spin temperature for certain cold regions. Hence, for a few channels we cannot determine a column density for these regions, and we exclude these pixels in these channels from our column density calculation.

Fig. 19. Maximum spin temperature (Eq. (11)) of the absorption features assuming p = 0.9. The black contours show the HISA column density map assuming T HISA = 40 K, p = 0.9, and the contour levels are 1.5, 3.5, and 5.5 × 10 20 cm −2 with a smoothing of 3 pixels. Regions with strong continuum emission are not reliable, such as SNR W44 and G34.256+0.146 (see Fig. 1).
Since we observe only small regions with T HISA (max.) < 40 K in only a few velocity channels, the column density calculation is not affected significantly. However, assuming a larger value for the spin temperature increases this effect and larger regions are affected, which would make the determined column density unreliable. Furthermore, as indicated in Eq. (11), the factor p also affects the value of T HISA (max.). If we take p = 0.7, T HISA (max.) for the whole filament would drop by ∼20 K. Regions around l = 35.5° and in the eastern part of the filament would have T HISA (max.) < 10 K, which is highly unlikely, since simulations find that very little H I has a temperature ≲ 20 K (Glover & Smith 2016). Furthermore, previous HISA studies towards nearby molecular clouds reveal a T S of 20-80 K (Gibson et al. 2000, 2005b; Dénes et al. 2018). Therefore, our assumption of T HISA = 40 K and p = 0.9 in the previous sections is reasonable.

Column density probability density functions (N-PDFs)
The column density maps derived in Sect. 3.2 can be utilized to determine the probability density functions of column densities (N-PDFs). We resampled all the column density maps into the same spatial resolution and constructed the N-PDFs. Figure 20 presents the N-PDFs in units of hydrogen atoms per square cm for the entire filament traced by HISA, H I emission and 13 CO. For the HISA feature, we assumed a spin temperature of T HISA = 40 K, p = 0.9, and used a second order polynomial to estimate the background emission, which are the same assumptions used to produce Fig. 14. We derived the H 2 column density from 13 CO (see Sect. 3.2.1), and converted it into the unit of hydrogen atoms per square centimeter for easy comparison with HISA and H I emission. We calculated the column density traced by H I emission with optical depth correction (see Sect. 3.3).
In the following, we first consider the quantification of the shapes of the observed N-PDFs and then the interpretation of the observed shapes. It has been argued that it may be necessary to consider the "completeness" of the N-PDFs when defining the column density range that can be studied (Kainulainen & Tan 2013, see also Brunt 2015; Lombardi et al. 2015; Alves et al. 2017), although it is unclear whether such a requirement is meaningful when studying possibly turbulence-dominated gas (Körtgen et al. 2019). In our analysis, we define the completeness for the HISA and 13 CO data, but do not apply such a requirement to the H I emission data, which we expect to be clearly turbulence-dominated. We define the completeness with the help of the lowest "closed contours" in the column density maps. These lowest closed contours are 1.5 × 10 20 cm −2 for the HISA data and 1.2 × 10 21 cm −2 for the 13 CO data and lead to regions that are marked with red polygons in Fig. 14. For H I emission, we include all the data within 32.65° < l < 37.27°, |b| ≤ 1.1° and use the column density level of 1.7 × 10 21 cm −2 to define the range over which the N-PDF is analyzed. We normalize all N-PDFs by the mean column density of the tracer (shown in Table 2). We first note that the N-PDFs of all components appear curved in log-log representation, even when only taking into account the data above the completeness levels. Therefore, we do not make an effort to quantify the N-PDFs with single power-law functions. Given the appearance of the N-PDFs, we decided to quantify their shapes through fits of log-normal functions. The N-PDF from H I emission shows a clear log-normal shape. We cannot identify a clear peak in the N-PDFs of HISA and 13 CO, but proceed with the log-normal fit nevertheless. The best fit for HISA is not well constrained and shows some excess over the fit at the high column density side.
The best fit for 13 CO agrees well with a log-normal function over a wide range at low column densities, but shows an excess over it at higher column densities; a power law could also describe the high column densities relatively well. We fitted the tail of the 13 CO N-PDF with a power law (p(x) ∝ x −α ) starting from the optimal column density threshold N min , defined as the threshold that results in the minimum Kolmogorov-Smirnov distance between the fit and the N-PDF. The fit was performed with the Python package powerlaw (Alstott et al. 2014). The fitted parameters of the log-normal functions and power laws, when applicable, are listed in Table 2.
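The threshold selection described here (choose N min so that the Kolmogorov-Smirnov distance between the fitted power law and the data is smallest) can be sketched in plain Python. This is an illustration of the idea only, not the powerlaw package and not the authors' pipeline; the function names, the coarse candidate grid, and the synthetic sample are assumptions made for the example.

```python
import math
import random

def fit_alpha(tail, x_min):
    """Maximum-likelihood index for p(x) ~ x**(-alpha), x >= x_min."""
    log_sum = sum(math.log(x / x_min) for x in tail)
    return 1.0 + len(tail) / log_sum if log_sum > 0.0 else None

def ks_distance(sorted_tail, x_min, alpha):
    """Largest gap between the empirical and model cumulative distributions."""
    n = len(sorted_tail)
    gap = 0.0
    for i, x in enumerate(sorted_tail):
        model_cdf = 1.0 - (x / x_min) ** (1.0 - alpha)
        gap = max(gap, abs(model_cdf - i / n), abs(model_cdf - (i + 1) / n))
    return gap

def fit_power_law_tail(values, min_tail=500, n_candidates=50):
    """Scan candidate thresholds and keep the one with the smallest
    KS distance. Returns (x_min, alpha, ks) or None."""
    data = sorted(values)
    last_start = len(data) - min_tail
    if last_start < 0:
        return None
    step = max(1, last_start // n_candidates)
    best = None
    for start in range(0, last_start + 1, step):
        tail = data[start:]
        alpha = fit_alpha(tail, tail[0])
        if alpha is None:
            continue
        ks = ks_distance(tail, tail[0], alpha)
        if best is None or ks < best[2]:
            best = (tail[0], alpha, ks)
    return best

# Synthetic check: pure power law with alpha = 2.5 above x = 1.
random.seed(1)
sample = [(1.0 - random.random()) ** (-1.0 / 1.5) for _ in range(2000)]
x_min_fit, alpha_fit, ks_fit = fit_power_law_tail(sample)
```

Requiring a minimum number of points in the tail keeps the fitted index from being driven by a handful of extreme pixels; the value of 500 is an arbitrary choice for this example.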
The N-PDF from the H I emission peaks at 2.2 × 10 21 cm −2 , corresponding to A V = 1 (Güver & Özel 2009). Studies towards nearby clouds have found that the N-PDFs from H I emission there peak around ∼1−2 × 10 21 cm −2 (Imara & Burkhart 2016), slightly lower than we find here for our target. The widths of the fitted log-normal models vary among the tracers; the H I emission has the smallest width, followed by HISA, and 13 CO has the largest width (see Table 2). The narrowness of the H I emission N-PDF indicates that the H I emission is relatively smoothly distributed without large variations and substructure (Fig. 14). The width of the H I emission N-PDF is similar to those seen towards nearby clouds (Burkhart et al. 2015b; Imara & Burkhart 2016). We note that the width of the HISA N-PDF is much larger than the error introduced by our HISA extraction method (see Appendix B); we therefore consider it to lie robustly between the widths of the H I emission and 13 CO N-PDFs.
We also examine the N-PDF of "all gas" (Fig. A.1), derived by adding together the column density maps from all three tracers (Fig. 21). We note that "all gas" does not, in fact, trace all the gas, since there is some amount of H 2 that 13 CO does not trace (Pineda et al. 2008; Goodman et al. 2009), also known as CO-dark gas. This fraction could be as high as 26 to 79% of the total H 2 gas (Gong et al. 2018). The N-PDF of "all gas" (Fig. 21) cannot be fitted with a single log-normal or power-law function. Similar to what we applied to the 13 CO, we fitted the N-PDF with a log-normal function and the high column density side with a power law, but a single power law clearly cannot account for the peak in the observed distribution.
Next, we discuss the N-PDFs of the two subregions of the filament. The N-PDFs for the eastern and western regions, indicated with dashed polygons in Fig. 14, are shown in Fig. 22. The N-PDFs of the molecular gas show very different shapes for the two subregions. While the N-PDF for the eastern subregion can be described by a log-normal function, the N-PDF for the western subregion shows a clear power-law shape.

[Figure caption fragment] ... Fig. 14), and the vertical solid line marks the mean column density. The red line shows the power-law fit with an index −α to the high column density tail.
Recall that we pointed out in Sect. 1 that theoretical studies predict that the shapes of the N-PDFs depend on the physical processes acting within the cloud: a log-normal N-PDF indicates that turbulence dominates and a power law indicates that gravity dominates (e.g., Federrath et al. 2010; Ballesteros-Paredes et al. 2011; Kritsuk et al. 2011; Federrath & Klessen 2013; Burkhart et al. 2015a). A possible explanation for the different N-PDFs of the two subregions is that the western subregion shows ongoing high-mass star formation activity, namely the UCH II region, indicating that the molecular cloud there is dominated by gravitational collapse.
One possible reason we do not see a power-law tail in the N-PDF of the molecular gas in the eastern subregion is that the 13 CO could be frozen onto the dust grains in the densest and coldest part of the molecular cloud (e.g., Giannetti et al. 2014). In this case we could not recover the high density part, whereas in the western subregion the feedback effects from the UCH II region have released the 13 CO molecules from the dust grains. As the line is not excited at very low density, altogether 13 CO only traces the intermediate density gas between the dense star forming cores, and so these do not show up in the N-PDF (see e.g., Ossenkopf 2002 for a similar effect in the ∆-variance). We estimated the 13 CO depletion factor by comparing the molecular cloud column density we derived from 13 CO with the H 2 column density derived from the ATLASGAL dust continuum. Four ATLASGAL dense clumps (AGAL036.406+00.021, AGAL036.666-00.114, AGAL036.826-00.039, and AGAL036.839-00.022) that have velocities within the respective 13 CO velocity range are located in the eastern subregion (Urquhart et al. 2018). We smoothed the ATLASGAL image to the same pixel size and angular resolution and calculated the column density of molecular hydrogen following Eq. (15) in Giannetti et al. (2014). The same dust opacity κ = 1.8 cm 2 g −1 as that used by Giannetti et al. (2014) is employed, and we adopt the dust temperature from the Herschel Hi-GAL results (Marsh et al. 2017). The 13 CO depletion factors we derived are between 2 and 4, with a mean value of 3, which agrees with the values Giannetti et al. (2014) derived for infrared dark sources. Therefore, the high column density part of the N-PDF in the eastern subregion could be underestimated due to 13 CO depletion.

Evolutionary stages
As mentioned in Sect. 3.2, we observe a significant difference in the distribution of the H 2 column density between the eastern and western subregions of the filament. The eastern subregion shows a more diffuse column density distribution, whereas the western subregion reveals several high column density peaks. Hence, the western subregion reveals a power law in the N-PDFs shown in Fig. 22 and the eastern subregion shows a log-normal shaped PDF. Furthermore, we see a luminous UCH II region within the western subregion, whereas the eastern subregion does not harbor significant continuum emission. The ATLASGAL survey (Schuller et al. 2009) shows strong extended submillimeter emission with a group of star forming clumps in the western subregion, whereas only a few unresolved continuum clumps are found with velocities within the respective 13 CO velocity range in the eastern subregion (Ragan et al. 2014; Urquhart et al. 2018). All these different tracers are indicative of active high mass star formation with strong feedback effects on the western side of the filament, while the eastern side shows no effects of feedback.

A&A 634, A139 (2020)
The kinematics of the HISA do not exhibit significant differences between the eastern and western subregions. However, comparing the HISA and 13 CO kinematics reveals an interesting difference. The eastern subregion shows similar peak velocities for the H I and CO, whereas the western subregion reveals a difference of ∼4 km s −1 . Using the newly developed histogram of oriented gradients (HOG) tool, Soler et al. (2019) confirmed a morphological correlation between the H I and 13 CO emission in velocity channels separated by 3-4 km s −1 towards the western subregion. The H I at ∼54 km s −1 is spatially correlated with 13 CO emission at ∼57.5 km s −1 (Fig. 15 in Soler et al. 2019). By applying the HOG analysis to synthetic observations from MHD simulations, they further demonstrated that this velocity offset between 13 CO and H I could arise from more general molecular cloud formation conditions. Another explanation for this velocity offset could be feedback from the expanding UCH II region G34.256+0.146: a significant amount of the 13 CO gas at ∼54 km s −1 may have been driven away by the radiation from the forming high-mass star.
Studying the cold H I column density, we found higher column density peaks on the eastern side in comparison to a more diffuse H I column density structure on the western side. The maximum spin temperature also shows smaller values on the eastern side. This might be an indication of a younger, colder, and denser H I cloud on the eastern side in comparison to a more evolved cloud on the western side. It is possible that the dense H I cloud on the eastern side of the filament is about to become a dense molecular cloud, forming high density peaks and subsequently forming stars. However, further observations or simulations are needed to support this hypothesis.

Uncertainties for the determined HISA properties
Several factors introduce uncertainties to the determined properties of the HISA features. In the following we discuss three contributions: the ratio of foreground to background emission (the factor p), the different methods to determine the background emission, and the assumed spin temperature T HISA . Figure 4 shows that, for a fixed spin temperature T HISA , a larger value of p gives a lower optical depth τ and hence a lower column density. Depending on the p value we choose (between 0.7 and 0.9), the column density can change by at most a factor of ∼2. This is shown in Fig. 23 for N-PDFs of the entire filament assuming three different values for p. The column density structure stays almost constant, but the actual values are shifted for different p values.
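The dependence on p can be illustrated with a short sketch. It assumes the commonly used HISA radiative transfer relation T on − T off = (T HISA − T cont − p T off)(1 − e^−τ), which is not written out in this section; the function name and all input temperatures are illustrative assumptions, loosely based on the model values in Appendix B.

```python
import math

def hisa_optical_depth(t_on_off, t_off, t_cont, t_hisa, p):
    """Solve T_on - T_off = (T_HISA - T_cont - p*T_off) * (1 - exp(-tau)) for tau."""
    ratio = t_on_off / (t_hisa - t_cont - p * t_off)
    if not 0.0 <= ratio < 1.0:
        raise ValueError("no physical solution for these inputs")
    return -math.log(1.0 - ratio)

# Illustrative inputs in K (assumed values, not measurements from the paper).
T_OFF, T_CONT, T_HISA = 85.0, 13.0, 40.0
DEPTH = -20.0  # T_on - T_off, a hypothetical absorption depth

taus = {p: hisa_optical_depth(DEPTH, T_OFF, T_CONT, T_HISA, p) for p in (0.7, 0.8, 0.9)}
for p, tau in taus.items():
    print(f"p = {p:.1f}  tau = {tau:.2f}")
```

For these assumed inputs, τ decreases monotonically with p and differs by a factor of roughly 1.8 between p = 0.7 and p = 0.9, in line with the "at most a factor of ∼2" quoted above.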
As discussed in Sect. 2.3, the chosen method for the background estimate can influence the absorption depth of the HISA feature and thus the column density. We showed in Sect. 2.3 that the best method is to fit a polynomial to the H I spectra and interpolate across the HISA feature. The difference between a second and a fourth order polynomial is negligible for most regions. As we can see in the bottom panel of Fig. 23, the column density structures obtained with the second and fourth order polynomials are almost identical; the mean column density from the fourth order fit is ∼3% higher than that from the second order fit (Table 3). We therefore chose a second order polynomial fit for T off .
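A minimal sketch of this kind of background estimate is given below. It fits a polynomial to the channels outside a masked absorption range and evaluates it across the mask. The synthetic spectrum, the mask width, and the function name are assumptions for illustration and do not reproduce the actual pipeline.

```python
import numpy as np

def fit_background(velocity, spectrum, absorption_mask, order=2):
    """Fit a polynomial to channels outside the absorption and evaluate it everywhere."""
    coeffs = np.polyfit(velocity[~absorption_mask], spectrum[~absorption_mask], order)
    return np.polyval(coeffs, velocity)

# Synthetic spectrum: smooth parabolic emission plus a Gaussian absorption dip.
v = np.linspace(40.0, 60.0, 81)                      # km/s
emission = 85.0 - 0.1 * (v - 50.0) ** 2              # K
dip = -20.0 * np.exp(-0.5 * ((v - 50.0) / 1.5) ** 2)  # K
t_on = emission + dip

mask = np.abs(v - 50.0) < 5.0  # channels treated as belonging to the HISA feature
t_off = fit_background(v, t_on, mask, order=2)
depth = (t_on - t_off).min()
print(f"recovered absorption depth: {depth:.1f} K")
```

Changing `order` from 2 to 4 in this toy case is a quick way to see how sensitive the interpolated T off is to the polynomial order when the underlying emission is smooth.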
Another important factor is the assumption of a constant spin temperature for the cloud. This is obviously a poor assumption, but additional measurements allow us to constrain the range of the spin temperature. The most important one is the upper limit for the spin temperature introduced in Sect. 4.4. Using this information, we can constrain the spin temperature to values of T HISA < 70 K for the majority of the HISA features. For the CNM mass estimation given in Sect. 3.3, we assumed spin temperatures of T HISA = 20 and 40 K. As seen in Fig. 23, the shape of the N-PDF does not change significantly, but higher spin temperatures result in larger column densities and masses (Table 3). In Sect. 3.3, we showed that the mass is about a factor of three larger for T HISA = 40 K than for T HISA = 20 K. Furthermore, we ran our HISA extraction method on model images (see Appendix B) to estimate how much the T off fitting method contributes to the width of the N-PDF. The test results show that the "instrument broadening" introduced by the fitting method is not very large and that the log-normal width of the CNM we derive in the paper is robust.
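The sensitivity to the spin temperature can be illustrated as follows. The sketch assumes the standard 21 cm relation N(H) = 1.8224 × 10^18 T s ∫τ dv (in cm −2 , with T s in K and v in km s −1 ), the same radiative transfer relation as commonly used for HISA, and a Gaussian opacity profile. All input values and function names are illustrative assumptions.

```python
import math

C_HI = 1.8224e18  # cm^-2 K^-1 (km/s)^-1, standard 21 cm column density constant

def peak_tau(depth, t_off, t_cont, t_hisa, p):
    """Peak optical depth from T_on - T_off = (T_HISA - T_cont - p*T_off)(1 - e^-tau)."""
    return -math.log(1.0 - depth / (t_hisa - t_cont - p * t_off))

def column_density(t_hisa, tau0, fwhm):
    """N(H) in cm^-2 for a Gaussian opacity profile (peak tau0, FWHM in km/s)."""
    integral = tau0 * fwhm * math.sqrt(math.pi / (4.0 * math.log(2.0)))
    return C_HI * t_hisa * integral

# Assumed inputs: -20 K absorption depth, T_off = 85 K, T_cont = 13 K, p = 0.9, FWHM = 5 km/s.
n_h = {}
for t_hisa in (20.0, 40.0):
    tau0 = peak_tau(-20.0, 85.0, 13.0, t_hisa, 0.9)
    n_h[t_hisa] = column_density(t_hisa, tau0, 5.0)
    print(f"T_HISA = {t_hisa:.0f} K  tau0 = {tau0:.2f}  N(H) = {n_h[t_hisa]:.2e} cm^-2")

print(f"ratio N(40 K) / N(20 K) = {n_h[40.0] / n_h[20.0]:.2f}")
```

For these assumed inputs the ratio comes out close to three, because both the explicit T s factor and the inferred optical depth increase with the assumed spin temperature.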
In summary, it is difficult to quantify exactly the uncertainty of the CNM column density and mass. Considering all assumptions, the CNM mass has an uncertainty of a factor of 2-3 (dominated by the uncertainty of T S ), which is similar to the H 2 mass uncertainty based on 13 CO. However, we showed that the shape of the column density PDF is robust.
Table 3. Results of the fits to the N-PDFs of CNM for different parameters (Fig. 23).

Conclusions
We studied the atomic and molecular gas components of the giant molecular filament GMF38a. The molecular component is traced via observations of 13 CO, whereas the cold atomic gas is observed via HISA features and H I 21 cm emission. The main results can be summarized as follows:
1. We extracted HISA by estimating the background emission with different methods. For the observed giant filament, a second order polynomial fit to the neighboring channels of each HISA feature is the most reliable of the methods we tested for estimating the background emission.
2. The HISA features and the 13 CO emission are spatially correlated. While the peak velocities of the two tracers agree well in the eastern subregion, they show an offset of ∼4 km s −1 in the western subregion of the filament.
3. Although the linewidth ratio between HISA and 13 CO is around unity, the Mach number estimation shows that the 13 CO emission is dominated by supersonic turbulent motions, whereas a large fraction of the CNM is at subsonic or transonic velocities.
4. Assuming a spin temperature of T HISA = 40 K for the H I, we determined the column densities of the cold dense H I and compared them to the H 2 column density distribution derived from the 13 CO emission. The column density peaks do not coincide and the H I column density in general shows a diffuse structure. The H 2 column density reveals prominent peaks in the western subregion of the filament, whereas the eastern subregion appears more diffuse. The CNM-to-H 2 column density ratio varies between 0.5 and 25% with a median value of ∼9%. The outer layer of the filament exhibits a higher ratio than the cloud centers. The surface density of atomic hydrogen peaks at ∼14−23 M⊙ pc −2 (corresponding to ∼1.8−2.9 × 10 21 cm −2 ). Furthermore, the mass traced by HISA is only ∼3% of the molecular mass, and ∼1.6% of the mass traced by atomic H I emission.
5. Studying the N-PDFs, we are able to provide constraints on the physical processes within the cloud. The location of the HISA N-PDF is strongly dependent on the assumed parameters, but the width is not. The N-PDFs of CNM, H I emission, and H 2 can be described by a log-normal function, which indicates turbulent motions as the main physical driver. Only the H 2 column density of the western subregion within the filament is characterized by a high column density power-law structure, consistent with the observed star formation activity. Adding up the column density maps of all three tracers (CNM, H I emission, and H 2 ), we generated a column density map of "all gas."
6. We hypothesize that the eastern and western sides of the filament represent different evolutionary stages. The eastern side represents an earlier stage, which is currently forming a dense molecular cloud out of the atomic reservoir. There we do not observe high molecular column density peaks, while the H I shows low spin temperatures and high column densities. In contrast, the western side of the filament shows high H 2 column density peaks, signs of active star formation, such as UCH II regions, and, in general, a warmer and less dense atomic counterpart. These differences provide interesting constraints for theoretical models and simulations of the formation of molecular clouds.
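The correspondence between the column densities and the surface densities quoted in item 4 can be checked with a short conversion. The sketch assumes pure atomic hydrogen with no helium correction; the constants are standard CGS values and the function name is illustrative.

```python
M_H = 1.6735575e-24  # g, mass of a hydrogen atom
M_SUN = 1.98847e33   # g, solar mass
PC = 3.0856776e18    # cm, one parsec

def surface_density_msun_pc2(n_h):
    """Convert an H I column density in cm^-2 to M_sun pc^-2 (hydrogen only)."""
    return n_h * M_H * PC**2 / M_SUN

sigma = {n_h: surface_density_msun_pc2(n_h) for n_h in (1.8e21, 2.9e21)}
for n_h, value in sigma.items():
    print(f"{n_h:.1e} cm^-2 -> {value:.1f} M_sun pc^-2")
```

Under this hydrogen-only assumption, the two column densities correspond to about 14 and 23 M⊙ pc −2 , matching the quoted range.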

Appendix B: CNM N-PDF width test
To test how much our HISA identification and fitting method broadens the N-PDF of the CNM, we ran our HISA extraction method on artificial H I emission maps with absorption features. We made a model H I map with a relatively uniform peak T B ∼ 85 K and a noise of ∼4 K, which is similar to the real data. The H I spectra in the model map have a random linewidth between ∼30 and 45 km s −1 , and a random peak velocity between 47.5 and 52.5 km s −1 . We also made a model continuum map with a relatively uniform T B ∼ 13 K and a noise of 0.7 K; both values are similar to the diffuse continuum emission and the noise level of the real data. Both model images have the same pixel size (10″) and beam size (40″) as the real data.
For the first test, we generated artificial absorption features from a single column density value with T HISA = 40 K and p = 0.9, and added them to the model H I image. This column density equals the mean CNM column density of GMF38a in Table 2 (log 10 (N(H)/cm −2 ) = 20.39). The artificial absorption features peak at 50 km s −1 and have a linewidth of 5 km s −1 . Following the method described in Sects. 2 and 3, we extracted the HISA spectra, estimated the column density, and constructed the N-PDF shown in the top panel of Fig. B.1. The absorption features we put in the model image all have the same column density, so the input N-PDF is a delta function. Due to noise and the uncertainty introduced by our HISA extraction method, the N-PDF we derived has a width of 0.19 and a mean column density about 10% lower than the input one.
For the second test, instead of a single column density value, we generated artificial absorption features from a log-normal column density distribution with T HISA = 40 K and p = 0.9, and added them to the model H I image. The input log-normal distribution has a width of 0.15 and peaks at the mean CNM column density of GMF38a in Table 2 (log 10 (N(H)/cm −2 ) = 20.39). As before, we ran our procedure on the model image; the resulting N-PDF is shown in the bottom panel of Fig. B.1. The input width of 0.15 is broadened to 0.22, and the mean column density is again about 10% lower than the input one.
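As a rough cross-check of the two tests, one can ask what width would result if the extraction scatter were independent of the true column density and Gaussian in the logarithmic column density. This is a simplifying assumption and is not the procedure used here, which runs the full extraction on model images. Under that assumption the widths add in quadrature, as the following toy Monte Carlo sketch shows; it ignores the ∼10% shift of the mean.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pix = 200_000

input_width = 0.15       # width of the input log-normal (second test)
extraction_width = 0.19  # width recovered from a delta-function input (first test)

# Normalized logarithmic column density, centered on zero for simplicity.
eta_true = rng.normal(0.0, input_width, n_pix)
eta_obs = eta_true + rng.normal(0.0, extraction_width, n_pix)

measured = eta_obs.std()
expected = np.hypot(input_width, extraction_width)
print(f"simulated width: {measured:.3f}, quadrature sum: {expected:.3f}")
```

The quadrature estimate of about 0.24 is somewhat larger than the 0.22 recovered in the second test, which suggests that the extraction scatter is not strictly independent of the input distribution. The sketch should be read only as an order-of-magnitude consistency check.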
Both tests demonstrate that the "instrument broadening" contribution to the N-PDF of our method is not very large and the log-normal width of the CNM we derive in the paper is robust.