Distance of Hi-GAL sources

Aims. Distances are key to determining the physical properties of sources. In the Galaxy, large (>10 000), homogeneous samples of sources with available distances that cover the whole Galactic distance range are still missing. Here we present a catalog of velocities and distances for a large sample (>100 000) of Hi-GAL compact sources. Methods. We developed a fully automatic Python package to extract the velocity and determine the distance. To assign a velocity to a Hi-GAL compact source, the code uses all the available spectroscopic data complemented by a morphological analysis. Once the velocity is determined, if no stellar or maser parallax distance is known, the kinematic distance is calculated and the distance ambiguity (for sources located inside the Solar circle) is solved with the H i self-absorption method or from distance–extinction data. Results. Among the 150 223 compact sources of the Hi-GAL catalog, we obtained a distance for 124 069 sources for the 5σ catalog (and 128 351 sources for the 3σ catalog), where σ is the noise level of each molecular spectrum used for the line detections made at 5σ and 3σ to produce the respective catalogs.


Introduction
In the Galaxy, studying the early phases of (high-mass) star formation requires distance information but often faces problems of high extinction and large distances. The study of star formation laws, from small to large scales, is based on distances and this is at the heart of the Galactic plane large-scale surveys and high-resolution pointed observation studies. Among these surveys, the Herschel Galactic plane survey Hi-GAL (Molinari et al. 2010) delivered unprecedented multi-wavelength (70 µm, 160 µm, 250 µm, 350 µm, and 500 µm) observations of star-formation sites at different spatial scales, namely from core to clump scales (Elia et al. 2017). However, the derivation of general conclusions regarding Galactic star-formation laws requires the determination of distances for a large number of sources.
Many studies have presented results for compact sources in the Galaxy using available molecular data to determine the radial velocity of sources with respect to the local standard of rest (LSR), and then deriving their kinematic distance using a model of the Galactic rotation curve (Russeil et al. 2011; Ellsworth-Bowers et al. 2015; Wienen et al. 2015; Whitaker et al. 2017; Zetterlund et al. 2017). Recently, Gaia (Gaia Collaboration 2000), by measuring stellar parallaxes, revolutionized our view of the distribution and distances of sources in our Galaxy (Cantat-Gaudin et al. 2018; Zucker et al. 2019; Yan et al. 2019; Lallement et al. 2019), but this holds only for a limited volume around the Sun (heliocentric distances up to 3 kpc in the Galactic plane) and for main-sequence stars that do not suffer from high extinction (up to A_V of about 3.5). Several methods could in principle be used to determine the distance of Hi-GAL sources with Gaia data. One can determine the distance to the parental molecular cloud from the line-of-sight extinction, but, as underlined by Yan et al. (2019) and Lallement et al. (2019), this is limited to extended clouds and to distances up to 3 kpc. One could determine the distance of any associated cluster, but according to Cantat-Gaudin & Anders (2020) the mean Bayesian Gaia distance of the open clusters with −2° < b < 2° is 2.84 kpc (the maximal distance reached being 9.75 kpc). Finally, one could determine the distance of young stellar objects (YSOs) embedded in

Sample and data used
The Herschel infrared survey of the Galactic plane, Hi-GAL (Molinari et al. 2010), is a complete survey of the Galactic plane performed in five infrared photometric bands centered at 70, 160, 250, 350, and 500 µm. After band merging, 150 223 compact sources have a spectral energy distribution (SED) eligible for a gray-body fit (Elia et al., in prep.) but lack the distance information needed to evaluate their physical properties. A first version of the catalog, for 100 922 sources located in the inner Galaxy (−71° < l < 67°), is presented in Elia et al. (2017). Table 1 lists the spectroscopic surveys used to assign radial velocities to the Hi-GAL compact sources. Some surveys cover a large area of the Galactic plane, such as the SEDIGISM survey (−60° < l < +18°, |b| < 0.5°), while others have a more limited coverage, such as CHIMPS (28° < l < 46°, |b| < 0.5°). There are also some gaps in the coverage of the Galactic plane: sources in the longitude range 195°–205°, sources with |b| > 0.5° and 1° < l < 17°, and sources with b > 1.7° and 66° < l < 101°.

Distance determination method
The first step in determining the distance is to measure the radial velocity of the Hi-GAL sources. Here, because most of the Hi-GAL sources are expected to be associated with dense molecular material, we measure their radial velocities from molecular line observations. Many spectroscopic surveys of the Galactic plane are available and have been used for this work; these are listed in Table 1.
To derive the distance of sources in the Milky Way, we have to assume that the measured radial velocity of the source with respect to the local standard of rest (V_LSR) arises from differential Galactic rotation. Then, using a model of the rotation of the Galaxy, one can obtain the kinematic distance to the source from its V_LSR. To this end, we adopt the revised Galactic rotation curve presented in Russeil et al. (2017). The rotation curve models axially symmetric circular orbits, but the true rotation pattern can be more complicated. Indeed, velocities can depart from circular rotation, mainly because of streaming motions (Burton 1971; Stark & Brand 1989; Russeil 2003), which can cause large uncertainties in the derived distances. Typical velocity departures are about 10 km s⁻¹ (e.g., Roman-Duval et al. 2009; Anderson et al. 2012; Wienen et al. 2015) but can reach up to 40 km s⁻¹ (Brand & Blitz 1993). In addition, while transforming velocities into distances is straightforward in the outer Galaxy, there is the well-known kinematic distance ambiguity (KDA) for sources in the inner Galaxy: each velocity corresponds to two possible distances, the near and far distances, which lie at two points equidistant from the tangent point (the position on the line of sight closest to the Galactic center). Choosing between the near and far distances is only possible using additional information, such as extinction and H i data cubes along the same line of sight (e.g., Russeil et al. 2011).

¹ FP7-SPACE-2013-1 call with Grant Agreement No. 607380.
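For reference, the circular-rotation relations behind this procedure can be written as follows (our notation, not the paper's): Θ(R) is the rotation curve (here, that of Russeil et al. 2017), R the galactocentric radius of the source, and R_0 and Θ_0 the galactocentric distance and circular velocity of the Sun. The distance relation assumes a source close to the plane (b ≈ 0).

```latex
V_{\mathrm{LSR}} = \left[\Theta(R)\,\frac{R_0}{R} - \Theta_0\right]\sin l\,\cos b ,
\qquad
d_{\mathrm{NEAR,FAR}} = R_0\cos l \mp \sqrt{R^2 - R_0^2\sin^2 l} ,
\qquad
d_{\mathrm{T}} = R_0\cos l .
```

Inside the Solar circle (R < R_0, cos l > 0) both roots are positive, which is the KDA. Outside it only the "+" root is positive. When R < R_0|sin l| (a forbidden velocity), no real root exists and the tangent distance d_T is the natural fallback.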

Algorithm outline
We developed the Python program starbird to determine the velocity and distance of Hi-GAL sources. starbird assigns radial velocities and heliocentric distances to sources of the Hi-GAL survey by means of the VIALACTEA Knowledge Base (VLKB, Molinaro et al. 2016). The starbird program has five main steps, as shown in Fig. 1.
starbird takes as input a table of sources in a specific format. This table contains mandatory information on each source: a running number, a source name (here the Hi-GAL name, as given in the catalog of Elia et al. 2017), the Galactic and equatorial coordinates (in degrees), and the elliptical footprint of the source measured at 250 µm (major- and minor-axis full width at half maximum (FWHM) in arcseconds and position angle (PA) in degrees). starbird then executes a series of actions (QUERY, DOWNLOAD, RADIAL, FILTER, MORPHO, SELECT-VLSR, BROWSE, and COMPUTE), which are described below.
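To make the mandatory input explicit, here is a minimal sketch of one input-table row as a Python record. The CSV layout, the column names (`number`, `glon`, `fwhm_major`, and so on), and the validation rules are hypothetical illustrations, not the actual starbird input format.

```python
import csv
import io
from dataclasses import dataclass

# Actions executed by starbird, in the order listed in the text.
STARBIRD_ACTIONS = ("QUERY", "DOWNLOAD", "RADIAL", "FILTER", "MORPHO",
                    "SELECT-VLSR", "BROWSE", "COMPUTE")

# Hypothetical column names for the mandatory numeric fields.
FLOAT_FIELDS = ("glon", "glat", "ra", "dec", "fwhm_major", "fwhm_minor", "pa")


@dataclass(frozen=True)
class InputSource:
    number: int         # source running number
    name: str           # Hi-GAL name, e.g. HIGALBM340.2690-0.0542
    glon: float         # Galactic longitude [deg]
    glat: float         # Galactic latitude [deg]
    ra: float           # right ascension [deg]
    dec: float          # declination [deg]
    fwhm_major: float   # 250 um footprint, major-axis FWHM [arcsec]
    fwhm_minor: float   # 250 um footprint, minor-axis FWHM [arcsec]
    pa: float           # footprint position angle [deg]

    def __post_init__(self):
        # Basic sanity checks on the mandatory fields (illustrative only).
        if not (0.0 <= self.glon < 360.0 and -90.0 <= self.glat <= 90.0):
            raise ValueError(f"{self.name}: Galactic coordinates out of range")
        if not 0.0 < self.fwhm_minor <= self.fwhm_major:
            raise ValueError(f"{self.name}: expected 0 < minor FWHM <= major FWHM")


def read_input_table(text):
    """Parse a CSV version of the input table into InputSource records."""
    rows = csv.DictReader(io.StringIO(text))
    return [
        InputSource(number=int(r["number"]), name=r["name"],
                    **{k: float(r[k]) for k in FLOAT_FIELDS})
        for r in rows
    ]
```

For example, `read_input_table(text)[0].fwhm_major` gives the 250 µm major-axis FWHM of the first row. A frozen dataclass keeps each source record immutable as it passes through the successive actions.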

Velocity extraction
The first step in finding the source velocity is to download the molecular data cubes and fit the spectra extracted at the position of the source. For a given source, the QUERY and DOWNLOAD tasks list, query, and download the portions of the observational data cubes available in the VLKB. Depending on the Galactic longitude of the source, up to 11 different molecular data cubes can be available (see Table 1).
For each downloaded data cube, the RADIAL task extracts a single profile at the central position of the source and performs a multi-Gaussian fit. The fitting method is described in Appendix A. The FILTER task is then applied to the fitted lines in every region of the l−V_LSR plane except the kinematic avoidance zones (±20° around the Galactic center and anticenter directions), where anomalous streaming motions cause V_LSR to depart strongly from ideal circular motion. FILTER first removes all individual velocity components whose post-fit peak intensity has a signal-to-noise ratio (S/N) < 5 for the 5σ catalog and < 3 for the 3σ catalog. It then removes spiky features by rejecting overly narrow lines, based on a minimum fitted FWHM threshold that depends on the survey (this spike-rejection threshold is given in the last column of Table 1; see also Appendix A; no threshold is applied to surveys without such artifacts). FILTER also removes velocities that would lead to a negative rotation frequency or that are inconsistent with the angular rotation model; suspected extragalactic velocities (i.e., those giving a galactocentric distance larger than 17 kpc); and kinematically forbidden V_LSR (i.e., those exceeding the tangent-point velocity by more than 20 km s⁻¹). Finally, it applies a selection process favoring velocities that fall in the l−V_LSR map of Dame et al. (2001). Figure 2 shows an example of fitted spectra for the Hi-GAL source HIGALBM340.2690-0.0542, for which a velocity of −125 km s⁻¹ is adopted. In this example, the dense-gas tracer lines (Figs. 2e and f) allow a reliable and unique velocity determination. Because they better probe the clump velocity and show simpler spectra, dense-gas tracers give a better determination of the source velocity. Indeed, for BGPS (Bolocam Galactic Plane Survey) objects, Ellsworth-Bowers et al. (2015) show that, of the objects with an HCO⁺(3–2) velocity, 95% have a ¹³CO(1–0) velocity in agreement. In addition, we note that the dense-gas tracer line often corresponds to the most intense line of the lower-density-gas tracer. However, in about 40% of the cases, only a low-density-gas tracer, or a single molecular transition spectrum with multiple lines, is available. In such cases, instead of directly adopting the velocity of the most intense line, we perform the morphological analysis described in the following section.
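The FILTER rules can be sketched as a chain of rejections. This minimal version assumes a flat rotation curve (R_0 = 8.34 kpc, Θ_0 = 240 km s⁻¹), not the Russeil et al. (2017) curve used by starbird. Applying the S/N and FWHM cuts everywhere while restricting the kinematic checks to sight lines outside the avoidance zones is our reading of the text. The "angular rotation model" consistency test and the Dame et al. (2001) l−V_LSR map selection are omitted.

```python
import math
from dataclasses import dataclass

R0 = 8.34       # assumed Sun-Galactic centre distance [kpc]
THETA0 = 240.0  # assumed flat rotation speed [km/s]


@dataclass
class Component:
    v: float      # fitted centroid velocity, V_LSR [km/s]
    peak: float   # fitted peak intensity [K]
    fwhm: float   # fitted FWHM [km/s]


def in_avoidance_zone(l_deg, half_width=20.0):
    """True within +/- half_width deg of the Galactic centre or anticentre."""
    l = l_deg % 360.0
    return min(l, 360.0 - l) <= half_width or abs(l - 180.0) <= half_width


def filter_components(comps, l_deg, sigma, nsigma=5.0, min_fwhm=0.0,
                      r_max=17.0, forbidden_margin=20.0):
    """Return the fitted components that survive the FILTER-like checks."""
    sin_l = math.sin(math.radians(l_deg))
    cos_l = math.cos(math.radians(l_deg))
    kinematic_checks = not in_avoidance_zone(l_deg)
    kept = []
    for c in comps:
        # S/N threshold (5 or 3) and spike rejection via a minimum FWHM.
        if c.peak / sigma < nsigma or c.fwhm < min_fwhm:
            continue
        if kinematic_checks:
            # Angular rotation frequency implied by V_LSR (flat curve, b = 0).
            omega = THETA0 / R0 + c.v / (R0 * sin_l)
            if omega <= 0.0:
                continue                  # negative rotation frequency
            if THETA0 / omega > r_max:
                continue                  # suspected extragalactic velocity
            if cos_l > 0.0:               # only inner-Galaxy sight lines have a tangent point
                sign = 1.0 if sin_l > 0.0 else -1.0
                v_tan = sign * THETA0 * (1.0 - abs(sin_l))
                if sign * (c.v - v_tan) > forbidden_margin:
                    continue              # beyond the tangent velocity by > margin
        kept.append(c)
    return kept
```

Components that exceed the tangent velocity by less than the margin are kept on purpose: in the distance step they are the "forbidden" velocities that receive the tangent-point distance.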

Morphological analysis and velocity assignment
Most of the time, the observed spectra show several peaks, so the velocity extraction process returns several velocities. Along the line of sight, the molecular emission can come from a diffuse layer or from a more structured medium such as clumps and filaments. One way to choose the best velocity is to assume that the Hi-GAL sources belong to such a structured molecular medium. The MORPHO task was developed to perform this morphological analysis. For every fitted velocity, the script automatically extracts the plane of the data cube at this velocity and performs a basic source extraction with the SExtractor software (Bertin & Arnouts 1996) to reveal any elliptical emission structures. If the molecular emission is diffuse, no source is found, whereas if the emission is structured, molecular sources are detected and some of them may intersect the Hi-GAL footprint. For the sources detected by SExtractor, we evaluate their roundness, the angular distance between their center and the Hi-GAL source center, and their overlapping area and filling factor with the Hi-GAL source. A score (an integer between 0 and 4) is then assigned, indicating whether there is no overlapping source, a partial overlap, inclusion of the molecular source in the Hi-GAL one (or vice versa), or a complete overlap. The SELECT-VLSR task then scans all the available information on the fitted lines in order to choose the best line. Different criteria are used depending on whether the molecular lines have positive scores. For each line, an additional condition is applied: the line has to be confirmed by the presence of another line, within a velocity difference of less than 7 km s⁻¹, in a different molecular tracer among those available. If no line is confirmed, we choose by default the velocity of the most intense line from the densest available tracer. For confirmed lines, the velocity is chosen as follows.
If confirmed lines have a morphological score (assigned by the MORPHO action), the velocity of the highest-scoring line is selected. If two or more lines share the best score, the smallest angular separation is the next criterion, and if they also have the same angular separation, the highest roundness decides. If none of the confirmed lines has a score, we choose the velocity of the most intense line among the confirmed lines of the densest molecular tracer.
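The SELECT-VLSR rules above can be sketched as follows. The `density_rank` field, which orders tracers from lower-density (small values) to dense-gas tracers (large values), is a hypothetical stand-in for however starbird ranks tracers. The 7 km s⁻¹ confirmation window and the tie-breaking order follow the text.

```python
from dataclasses import dataclass


@dataclass
class Line:
    tracer: str              # e.g. "13CO(2-1)"
    density_rank: int        # hypothetical: larger value = denser-gas tracer
    v: float                 # fitted V_LSR [km/s]
    peak: float              # fitted peak intensity
    score: int = 0           # MORPHO score (0 = no overlapping structure)
    separation: float = float("inf")  # clump-to-source angular distance
    roundness: float = 0.0


def _most_intense_of_densest(cands):
    """Most intense line among those of the densest tracer in cands."""
    top = max(c.density_rank for c in cands)
    return max((c for c in cands if c.density_rank == top), key=lambda c: c.peak)


def select_vlsr(lines, dv_confirm=7.0):
    """Pick one V_LSR following the SELECT-VLSR rules described in the text."""
    if not lines:
        return None
    # A line is confirmed by a line of a *different* tracer within dv_confirm.
    confirmed = [ln for ln in lines
                 if any(o.tracer != ln.tracer and abs(o.v - ln.v) < dv_confirm
                        for o in lines)]
    if not confirmed:
        return _most_intense_of_densest(lines).v
    scored = [ln for ln in confirmed if ln.score > 0]
    if scored:
        # Highest score, then smallest separation, then highest roundness.
        best = max(scored, key=lambda ln: (ln.score, -ln.separation, ln.roundness))
        return best.v
    return _most_intense_of_densest(confirmed).v
```

Encoding the three tie-breaking criteria as one tuple key lets a single `max` call apply them in order.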

[Fig. 1 flowchart residue, output box: source position, V_LSR, molecular tracer, heliocentric and galactocentric distances, labels.]

Figure 3 shows an example of the morphological analysis and velocity assignment performed for the source HIGALBM340.1288-0.1837. For this source, two molecular data cubes are available and both extracted spectra agree, showing three lines at −3.4, −52.5, and −125.1 km s⁻¹, but with different relative intensities. In MOPRA ¹²CO(1–0), the highest peak is at −52.5 km s⁻¹, while in SEDIGISM ¹³CO(2–1) it is at −125.1 km s⁻¹. This could be an optical-depth effect, given that ¹²CO(1–0) emission usually comes from the cloud surface and/or a more diffuse medium. The morphological analysis shows that a structured source is found toward the Hi-GAL source (clear overlap of the footprints) only at −125.1 km s⁻¹ (Fig. 3e), which leads us to assign this velocity to the source. It also confirms that the −52.5 km s⁻¹ velocity component corresponds to more diffuse emission, as illustrated in Fig. 3d.
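As an illustration of the overlap part of the MORPHO score described in this section, the sketch below classifies the overlap between a molecular clump ellipse and the Hi-GAL footprint by sampling both on a regular grid. It is a hypothetical stand-in, not the SExtractor-based implementation. The mapping of integers to cases (0 none, 1 partial, 2 clump inside footprint, 3 footprint inside clump, 4 complete) is our reading of the order in which the text lists them. The 95% "inside" threshold and the orientation convention are assumptions. Roundness and center distance, which starbird also evaluates, are not included.

```python
import math
from dataclasses import dataclass


@dataclass
class Ellipse:
    x: float            # centre offset [arcsec]
    y: float            # centre offset [arcsec]
    a: float            # semi-major axis [arcsec]
    b: float            # semi-minor axis [arcsec]
    theta: float = 0.0  # orientation [deg], counter-clockwise from x (assumed convention)

    def contains(self, px, py):
        t = math.radians(self.theta)
        dx, dy = px - self.x, py - self.y
        u = dx * math.cos(t) + dy * math.sin(t)    # coordinate along the major axis
        w = -dx * math.sin(t) + dy * math.cos(t)   # coordinate along the minor axis
        return (u / self.a) ** 2 + (w / self.b) ** 2 <= 1.0


def overlap_score(clump, footprint, n=150, inside=0.95):
    """Return (score, filling_factor) from a regular n x n sampling grid."""
    r = max(clump.a, footprint.a)   # generous half-size of the bounding box
    x0, x1 = min(clump.x, footprint.x) - r, max(clump.x, footprint.x) + r
    y0, y1 = min(clump.y, footprint.y) - r, max(clump.y, footprint.y) + r
    n_c = n_f = n_both = 0
    for i in range(n):
        px = x0 + (i + 0.5) * (x1 - x0) / n
        for j in range(n):
            py = y0 + (j + 0.5) * (y1 - y0) / n
            in_c, in_f = clump.contains(px, py), footprint.contains(px, py)
            n_c += in_c
            n_f += in_f
            n_both += in_c and in_f
    if n_both == 0:
        return 0, 0.0     # no overlapping structure
    f_c = n_both / n_c    # fraction of the clump inside the footprint
    f_f = n_both / n_f    # fraction of the footprint covered (filling factor)
    if f_c >= inside and f_f >= inside:
        return 4, f_f     # complete overlap
    if f_c >= inside:
        return 2, f_f     # clump included in the Hi-GAL footprint
    if f_f >= inside:
        return 3, f_f     # footprint included in the clump
    return 1, f_f         # partial overlap
```

Grid sampling trades exactness for simplicity. An analytic ellipse-intersection routine or a polygon library would be faster and more precise for large catalogs.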

Distance determination: solving the kinematic distance ambiguity
The distance determination is performed by the COMPUTE task. The distance of a source can be derived either from kinematics or more directly (stellar distance of the associated H ii region or maser parallax distance). For the kinematic distance calculation, we used the Fortran code of Reid et al. (2009), but with the revised rotation curve adopted for the VIALACTEA project (Russeil et al. 2017). Once the V_LSR of a source is assigned, its kinematic distance (d_kin) can be calculated. At this step, three scenarios must be considered: (1) the source is outside the Solar circle, and therefore d_kin is unique; (2) the source lies inside the Solar circle but its velocity is forbidden, hence the tangent-point distance is adopted; (3) the source lies inside the Solar circle with a permitted velocity, and therefore two distances are possible and the KDA must be solved to choose between the far distance, d_FAR, and the near distance, d_NEAR.

[Figure caption residue: In all panels the black, red, and green curves are the extracted profile, the fitted lines, and the residual, respectively. The gray background area shows ±σ_noise, where σ_noise is the root-mean-square value of the noise amplitude (see Appendix A). The horizontal blue line shows the 5σ level.]
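The three scenarios of this section can be sketched with a flat rotation curve. R_0 = 8.34 kpc and Θ_0 = 240 km s⁻¹ are assumed values, not the Russeil et al. (2017) curve or the Reid et al. (2009) code used by starbird. The source is taken to lie in the plane (b = 0), and an extra "invalid" case covers velocities with no physical solution.

```python
import math

R0 = 8.34       # assumed Sun-Galactic centre distance [kpc]
THETA0 = 240.0  # assumed flat rotation speed [km/s]


def kinematic_distance(l_deg, v_lsr, r0=R0, theta0=THETA0):
    """Classify (l, V_LSR) and return candidate heliocentric distances [kpc].

    Returns (case, distances) with case one of "unique" (outside the Solar
    circle), "tangent" (forbidden velocity), "ambiguous" (near, far), or
    "invalid" (no physical solution). Flat rotation curve, b = 0.
    """
    sin_l = math.sin(math.radians(l_deg))
    cos_l = math.cos(math.radians(l_deg))
    if abs(sin_l) < 1e-6:
        raise ValueError("kinematic distances are undefined toward l = 0 or 180 deg")
    omega = theta0 / r0 + v_lsr / (r0 * sin_l)   # angular speed at radius R
    if omega <= 0.0:
        return "invalid", ()
    r = theta0 / omega                           # flat curve: Theta(R) = theta0
    disc = r * r - (r0 * sin_l) ** 2
    if disc < 0.0:
        # Velocity beyond the tangent-point value: adopt the tangent distance.
        return ("tangent", (r0 * cos_l,)) if cos_l > 0.0 else ("invalid", ())
    root = math.sqrt(disc)
    near, far = r0 * cos_l - root, r0 * cos_l + root
    if far <= 0.0:
        return "invalid", ()
    if near <= 0.0:
        return "unique", (far,)                  # scenario (1)
    return "ambiguous", (near, far)              # scenario (3): KDA to solve
```

For example, `kinematic_distance(30.0, 50.0)` returns an ambiguous pair of roughly 3.07 and 11.38 kpc, while `kinematic_distance(120.0, -50.0)` gives a unique distance of about 4.10 kpc.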
To solve the KDA, or to directly adopt a stellar or maser parallax distance, the task first cross-matches the source (spatially and, when possible, in velocity) with available published catalogs (listed in the note of Table 2) in which the KDA is already solved, or which give sufficient information to solve it or to assign a stellar or maser parallax distance.
Because Hi-GAL sources located in quadrants Q2 and Q3 are not affected by the KDA, their distance is assigned directly from their radial velocity. For Hi-GAL sources located in Q1 and Q4, the near/far ambiguity is resolved by going down a list of methods arranged in order of priority (see the priority order in Table 2). The criteria used to decide whether a source affected by the KDA is located at the near or far distance (d_NEAR or d_FAR) are summarized in Table 2. For example, because infrared-dark clouds (IRDCs) are cold, dense molecular clouds seen as extinction features against the bright Galactic background, the near distance is favored if a source can be associated with an IRDC. If a source can be associated with an optical H ii region or with a region with a maser parallax, then the stellar distance of the H ii region (or the near distance) or the parallax distance is adopted.
Following the H i absorption/emission method (e.g., Anderson & Bania 2009), when a source can be associated with a radio H ii region for which H₂CO or H i absorption lines are observed along the line of sight at velocities larger in absolute value than the source V_LSR, the far distance is adopted. However, as many sources lack such information, we developed automatic scripts to perform H i self-absorption (HISA) and extinction-curve analyses.
By default, the source is placed at the far distance, but if a HISA feature is found at the source V_LSR or if an extinction feature is found close to the near distance, then d_NEAR is adopted. The HISA method is illustrated here for the source HIGALBM329.2039+0.7101 (Fig. 4). The script automatically identifies all the significant (S/N larger than 3) and regular dips in the H i spectrum (see Appendix B). If a dip is found at the same velocity as the source V_LSR (within a tolerance of 3 km s⁻¹), as is the case here, then d_NEAR is adopted.
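A minimal sketch of the HISA test follows. It flags local minima of the H i spectrum lying at least 3σ below a running-median baseline and checks for one within 3 km s⁻¹ of the source V_LSR. The running-median baseline, its window size, and the assumption that the noise σ is known in advance are our choices. The "regularity" criteria of Appendix B are not reproduced.

```python
import statistics


def find_dips(velocities, tb, sigma, half_window=10, nsigma=3.0):
    """Velocities of local minima at least nsigma*sigma below a running-median baseline."""
    n = len(tb)
    dips = []
    for i in range(1, n - 1):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        baseline = statistics.median(tb[lo:hi])   # robust local emission level
        is_min = tb[i] < tb[i - 1] and tb[i] <= tb[i + 1]
        if is_min and baseline - tb[i] >= nsigma * sigma:
            dips.append(velocities[i])
    return dips


def has_hisa(v_lsr, velocities, tb, sigma, tol=3.0, **kwargs):
    """True if a significant dip lies within tol km/s of the source V_LSR."""
    return any(abs(v - v_lsr) <= tol
               for v in find_dips(velocities, tb, sigma, **kwargs))
```

A running median follows the broad H i emission while staying insensitive to narrow dips, provided the window is wider than the absorption feature.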
The extinction cubes are from Marshall et al. (2006). They provide A_Ks values in the form of (l, b, distance) data cubes. The extinction profile (A_Ks versus distance along the line of sight) at the central position of the source is extracted, and the extinction layers along the line of sight are then identified from the jumps they induce in the extinction profile. As these jumps are easier to identify in the third-order derivative of the profile, the script automatically identifies them as minima deeper than 3σ below the baseline level. The extinction analysis is illustrated here for the source HIGALBM1.7647-0.4806 (Fig. 5), for which d_FAR = 13.04 kpc and d_NEAR = 3.63 kpc. In this example, one extinction layer is identified at 3.7 kpc, favoring the near distance. We therefore adopt d_NEAR = 3.63 kpc for this source.
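Why the third derivative works: a smooth step in A_Ks(d) produces a single pronounced negative minimum in f''' at the step position. The sketch below uses a central finite-difference stencil on a uniform distance grid, the median as baseline, and the population standard deviation as σ. These choices are assumptions, not the authors' exact implementation, and real Marshall et al. (2006) profiles have coarser, non-ideal sampling.

```python
import statistics


def third_derivative(values, step):
    """Central finite-difference f''' on a uniform grid (two points lost at each end)."""
    return [
        (values[i + 2] - 2.0 * values[i + 1] + 2.0 * values[i - 1] - values[i - 2])
        / (2.0 * step ** 3)
        for i in range(2, len(values) - 2)
    ]


def extinction_layers(distances, a_ks, nsigma=3.0):
    """Distances of jumps in A_Ks, seen as deep minima of the third derivative."""
    step = distances[1] - distances[0]
    d3 = third_derivative(a_ks, step)
    baseline = statistics.median(d3)
    sigma = statistics.pstdev(d3)
    layers = []
    for k in range(1, len(d3) - 1):
        is_min = d3[k] < d3[k - 1] and d3[k] <= d3[k + 1]
        if is_min and baseline - d3[k] > nsigma * sigma:
            layers.append(distances[k + 2])   # d3[k] is centred on distances[k + 2]
    return layers
```

A layer found near d_NEAR (and none near d_FAR) would then favor the near distance, as in the HIGALBM1.7647-0.4806 example.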
A last option used to distinguish between d_FAR and d_NEAR follows the method of Solomon et al. (1987). This method considers the physical distance of the source from the Galactic plane: if, at the far distance, the cloud would be too far from the Galactic plane (i.e., >140 pc, the scale height of the gaseous disk of the Galaxy), then the near distance is favored. Because the sources with a KDA are located within the Solar circle, this criterion is not affected by the Galactic warp, which starts outside the Solar circle. The distance determination script tests the different methods hierarchically, following the priority list given in Table 2. At the beginning, the source is placed at the far distance, and this distance is then changed if one of the listed conditions is fulfilled. Usually the choice of distance is validated by only one method (indicated as "STATUS" in the final table), but because all the methods are tested, the number of validating methods is coded in the final table as the "PROBA" number: the higher this number, the more reliable the distance decision.
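The hierarchical decision and the STATUS/PROBA bookkeeping can be sketched as follows. Each method is a hypothetical `(name, rule)` pair whose rule returns "near", "far", or None. An H i absorption rule, for instance, would return "far". The real methods and their priority order are those of Table 2. PROBA is computed as the fraction of methods agreeing with the choice (nine methods in total, following the PROBA = 0.22 example in the Results). The z-height rule implements the Solomon et al. (1987) criterion with the 140 pc limit.

```python
import math


def z_height_rule(src, z_max_pc=140.0):
    """Favour near if the far distance would put the source more than z_max from the plane."""
    z_far_pc = 1000.0 * src["d_far"] * math.sin(math.radians(src["b"]))
    return "near" if abs(z_far_pc) > z_max_pc else None


def resolve_kda(src, methods, n_methods=None):
    """Apply (name, rule) pairs in priority order; return choice, STATUS and PROBA.

    The source starts at the far distance; the first decisive rule sets the
    choice (STATUS) and PROBA is the fraction of rules agreeing with it.
    """
    decisions = [(name, rule(src)) for name, rule in methods]
    choice, status = "far", "DEFAULT"
    for name, decision in decisions:
        if decision is not None:
            choice, status = decision, name
            break
    n_agree = sum(1 for _, d in decisions if d == choice)
    total = n_methods if n_methods is not None else len(methods)
    distance = src["d_near"] if choice == "near" else src["d_far"]
    return {"choice": choice, "distance": distance,
            "STATUS": status, "PROBA": n_agree / total}
```

For instance, with hypothetical rules `[("IRDC", ...), ("HISA", ...), ("EXTINCTION", ...), ("Z_HEIGHT", z_height_rule)]`, a source at b = 1° with d_FAR = 13.04 kpc and a HISA detection gets STATUS = "HISA" and, with `n_methods=9`, PROBA = 2/9 ≈ 0.22. The HISA and z-height rules both agree on the near distance.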

Results
Results are presented in Table 3 (the complete table is available at the CDS, and the 3σ version of the table is also delivered, as described in Appendix E). The columns are as follows: Col. 1: Hi-GAL source name; Cols. 2-5: equatorial and Galactic coordinates; Cols. 6 and 7: adopted velocity and its uncertainty; Col. 8: molecular tracer used for the velocity determination; Cols. 9 and 10: assigned distance and its uncertainty; Col. 11: method used for the distance determination, as listed in Table 2; Col. 12: probability that the source has the assigned distance (a simple evaluation of the number of different methods assigning the chosen distance; e.g., PROBA = 0.22 means that two of the nine methods agree with the adopted distance assignment); Cols. 13-18: far and near kinematic distances and their upper and lower uncertainties (without any correction applied); Col. 19: galactocentric distance; and Col. 20: detected extinction features, listed as (distance, A_Ks) pairs. One problem that arises when determining kinematic distances in the Galaxy is that spiral-arm perturbations can introduce radial and azimuthal gas streaming motions (e.g., Roberts 1969; Englmaier 2000; Ramón-Fox & Bonnell 2018). Such streaming motions are taken into account in the distance uncertainty by assuming a typical velocity dispersion of 7 km s⁻¹, but when the streaming motion induces a larger, systematic velocity shift, as is the case in the Perseus arm between l = 90° and l = 160° (e.g., Roberts 1972; Georgelin & Georgelin 1976; Brand & Blitz 1993), a velocity correction must be added. We therefore first delineate the Perseus arm by determining the arm velocity range (only between l = 90° and l = 160°) from the ¹²CO longitude-velocity plot of Dame et al. (2001). We then select Hi-GAL sources in these velocity and longitude ranges and apply to these sources (4405 sources with a kinematic distance) a correction of 14.9 km s⁻¹ (Russeil et al.
2007) to the measured V_LSR before recalculating the kinematic distance. This correction is illustrated in Fig. 6. Figure 7 shows the distribution of sources with determined radial velocities. We are able to assign a velocity to 124 069 sources (82.6%). The sources with no velocity assignment can be split into sources that are not covered by any molecular survey (57.4%) and sources for which the extracted spectrum is too noisy to perform any line fitting or to obtain a fitted line above the 5σ threshold (42.6%). For the first group, we clearly identify the two gaps in the longitude coverage (around l = 190° and l = 270°) and the latitude coverage limitation of the surveys in some parts of Q1 and Q2. For the second group, we note that 19% of these sources have a velocity if the fitted-line intensity threshold is lowered to 3σ (instead of 5σ). Figure 8 shows a zoom (−60° < l < 60°) on the longitude-velocity distribution of all Hi-GAL sources for which we obtained a velocity (124 069 sources). The gray-scale image shows the distribution of the molecular gas traced by the integrated ¹²CO emission from Dame et al. (2001). The sources closely follow the large-scale prominent molecular features. Urquhart et al. (2018) derived the physical properties of approximately 8000 ATLASGAL clumps (located in quadrants 4 and 1). To do so, they first assigned radial velocities to the clumps and then used a series of criteria (see the flow chart in their Fig. 5) to determine distances for the clumps. To compare our results with their velocity and distance determinations, we first cross-matched the ATLASGAL and Hi-GAL compact-source catalogs using a cone-search radius of 14″. ATLASGAL and Hi-GAL sources are paired on the basis of their angular proximity. Because of the 14″ maximum radius and the differences in ATLASGAL and Hi-GAL survey coverage, the number of paired sources is 6268. Among them, 5976 sources have a velocity in both catalogs.
Figure 9 shows the comparison of the velocities determined for paired ATLASGAL and Hi-GAL sources, with colors coding the separation between the sources. The scatter of sources around the one-to-one line is not due to mismatches. We find that 89.4% and 91.8% of the sources have velocity differences smaller than 5 and 10 km s⁻¹, respectively.
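The pairing and the velocity-agreement statistics can be sketched as a nearest-neighbour cone search followed by threshold counting. This brute-force version (haversine separation on Galactic coordinates, nearest match within 14″) is an illustration under our own assumptions. For catalogs of this size, a KD-tree or astropy's `SkyCoord.match_to_catalog_sky` would be the practical choice.

```python
import math


def ang_sep_arcsec(l1, b1, l2, b2):
    """Great-circle (haversine) separation in arcsec; inputs in degrees."""
    l1, b1, l2, b2 = map(math.radians, (l1, b1, l2, b2))
    h = (math.sin((b2 - b1) / 2.0) ** 2
         + math.cos(b1) * math.cos(b2) * math.sin((l2 - l1) / 2.0) ** 2)
    return math.degrees(2.0 * math.asin(math.sqrt(h))) * 3600.0


def cross_match(cat_a, cat_b, radius_arcsec=14.0):
    """Pair each (l, b) of cat_a with its nearest cat_b source within the radius."""
    pairs = []
    for i, (la, ba) in enumerate(cat_a):
        best = None
        for j, (lb, bb) in enumerate(cat_b):
            s = ang_sep_arcsec(la, ba, lb, bb)
            if s <= radius_arcsec and (best is None or s < best[1]):
                best = (j, s)
        if best is not None:
            pairs.append((i, best[0], best[1]))   # (index_a, index_b, sep_arcsec)
    return pairs


def agreement_fractions(dv, thresholds=(5.0, 10.0)):
    """Fraction of velocity differences |dv| below each threshold [km/s]."""
    return {t: sum(abs(x) < t for x in dv) / len(dv) for t in thresholds}
```

Applied to the ATLASGAL-Hi-GAL velocity differences, `agreement_fractions` would return the two percentages quoted above (89.4% and 91.8%).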
To compare the distance determinations for paired sources, and in particular the KDA solutions, we must select sources with similar velocities. We therefore select sources with a velocity difference smaller than 5 km s⁻¹. In addition, we keep only sources with longitudes outside ±12° around the Galactic center and anticenter directions (because their distance determination is highly uncertain, owing to noncircular motions in the bar or to circular motion being perpendicular to the line of sight), and we exclude ATLASGAL sources labeled "ambiguous" and Hi-GAL sources labeled "NO_KDA". Figure 10 shows the comparison of distance determinations for this sample (3528 sources). In Fig. 10a, apart from the sources distributed around the equality line, we observe the classical behavior of anticorrelated distances, which is due to differences in the distance disambiguation, that is, a given source being assigned the near distance in one catalog and the far distance in the other (see Sect. 3.4). This is observed for any source catalog, as illustrated in […]. Selecting the ATLASGAL sources for which the KDA distance has been specifically solved (Flags (iv) to (viii)), we find that 29.5% of the sources disagree on the adopted KDA solution. Nevertheless, 2248 sources (64%) have a distance difference of less than 0.7 kpc (0.7 kpc being the typical distance uncertainty for the paired Hi-GAL sources), with a mean value of 0.21 kpc. To better understand the distance differences between ATLASGAL and Hi-GAL sources, we identify in Fig. 10b the sources assigned the tangent distance and the sources with velocities lower than 10 km s⁻¹; the latter are either nearby sources (whose velocities are more affected by local motions) or sources close to the Solar circle (for which any peculiar motion produces inaccurate kinematic distances).

Fig. 6. Longitude-distance plot in the range of the Perseus arm streaming-motion correction (l = 90° to l = 160°). Blue and red symbols show the source distances before and after the correction. Horizontal alignments are sources assigned the same stellar distance.
We performed the same selection and comparison (see Fig. 11) with the BGPS catalog (Ellsworth-Bowers et al. 2015), which mainly probes quadrants 1 and 2. The number of paired sources with velocities is 2242, of which 89.2% (2000 sources) have velocities in agreement (∆v ≤ 5 km s⁻¹). From this subsample, we discard sources with an unconstrained KDA solution in both surveys. This leaves 799 sources, of which 64% (510 sources) have a distance difference of less than 0.5 kpc (the typical distance uncertainty for the paired Hi-GAL sources), with a mean value of 0.16 kpc. In addition, selecting the BGPS sources for which the KDA distance has been specifically solved (Flag N or F), we find that 29.6% of the sources disagree on the adopted KDA solution, a proportion similar to that found in the ATLASGAL comparison. Zucker et al. (2020) present a method that combines stellar photometric data with Gaia DR2 parallax measurements in a Bayesian framework to infer the distances of nearby dust clouds with a typical accuracy of ∼5%. They derived a catalog of distances to the molecular clouds presented in the Handbook of Star Formation, Volumes I (Reipurth 2008a) and II (Reipurth 2008b). To compare our distance results with those of Zucker et al. (2020), we select the regions in their list that are covered by Hi-GAL. Thirteen regions are common to both lists, namely CMa OB1, Carina, Cygnus X, Gem OB1, IC 2944, L1293, M20, M17, North America, RCW38, W3, W4, and W5. We then select the Hi-GAL sources located at the positions of each region as listed by Zucker et al. (2020), within a similar area (0.2°) and within ±5 km s⁻¹ of the systemic velocity of the region.
For most of the regions, the adopted velocity is the H ii region V_LSR, but for CMa OB1, Gem OB1, and IC 2944 the adopted velocity is the V_LSR calculated from the mean radial velocity of the stars (Mel'Nik & Dambis 2009; Sepúlveda et al. 2011), and for L1293 it is the molecular velocity (Conrad et al. 2017). We then split the sample into regions located in quadrants 2 and 3 (hence with no KDA) and in quadrants 1 and 4 (with KDA) and display them in a velocity-distance plot (Fig. 12; see also Fig. D.1). From Fig. 12, we find good agreement for W3, W4, W5, RCW38, Carina, and CMa OB1, agreement within the error bars for Gem OB1, Cygnus X, North America, IC 2944, and L1293, a shift for M17, and a strong disagreement for M20. For M20 and Gem OB1, a longitude close to the Galactic center and anticenter directions, respectively, leads to an unreliable kinematic distance. Similarly, for North America, although a few sources agree, others are at the adopted stellar distance and several are too far; this can be explained by the longitude of the region being close to l = 90°, where the kinematic distance is also uncertain, mainly because of the small velocities in this direction. In W5, many sources have been placed at the stellar distance, which is slightly farther than the Zucker et al. (2020) distance but in agreement with the 2.3 kpc distance found for W3 and W5 by Chen et al. (2020). The most surprising are the disagreements found for M20 and M17. For these two regions, 10 of 13 and 10 of 19 Hi-GAL sources, respectively, have been placed at the far distance (not plotted in Fig. 12). However, for M17 the OB star distance is 2.1 kpc (Hoffmeister et al. 2008), in agreement with our data. This could suggest that the cloud detected by Zucker et al. (2020) is in front of M17 or that its distance is underestimated. We note that Chen et al. (2020) find distances between 1.6 and 1.8 kpc for clouds in the neighborhood of M17.
The same explanation can be invoked for M20: Marshall et al. (2009) and Schultheis et al. (2014) estimated a distance greater than 4 kpc for dark clouds and extinction features in the direction and neighborhood of the H ii region, while Tapia et al. (2018) re-evaluated the M20 distance to 2 kpc.
The comparison with the results of Zucker et al. (2020) illustrates that, when studying a particular region, it is better to first associate the Hi-GAL sources with the region through their velocities and only then adopt for all of them the distance calculated from the mean or representative velocity of the region. Indeed, a small difference in velocity (due, e.g., to internal motions) can lead to a large difference in distance.
We must keep in mind that the comparison presented here is done on small areas and along lines of sight that do not always point directly toward the considered H ii regions. To obtain a broader view, we illustrate the association of Hi-GAL sources with a typical star-forming region, the W3-W4 H ii region complex. We select all the Hi-GAL sources located in the (l, b) coverage of the region and plot them in (l, b, v) and (l, b, D) diagrams, where v and D are the velocity and distance obtained in this work. Figure 13 shows the Herschel PACS 160 µm image of the W3-W4 star-forming regions with the 1544 Hi-GAL sources overlaid as points, color-coded according to the velocity range to which they belong. The associated sources (858 sources, in green) have velocities in the range [−55, −40] km s⁻¹, which corresponds to the velocities observed for the ionized and molecular gas in these regions. Unfitted sources are shown in blue (422 sources) and sources with velocities outside the [−55, −40] km s⁻¹ range in red (264 sources). Of the 1122 sources with velocity information, 858 (76.5%) lie in the range −55 to −40 km s⁻¹.
Figure 13 makes clear that knowledge of the velocity (and distance) is key to studying star formation. We note that the blue unfitted sources are numerous; dedicated studies could obtain molecular data for these sources and ascertain their velocity and distance. Figure 14 shows the (l, b, D) 3D diagram for the regions. Of the 1544 sources (422 of them unfitted, hence 1122 with a distance), 478 (43%) lie in the 1.8-2.3 kpc range, the range obtained by Zucker et al. (2020) from Gaia data for the W3-W4 regions (see also Navarete et al. 2019 for W3). The remaining 644 sources (57%) lie outside this distance range. Figure 15 shows the velocity versus distance plot for the W3-W4 regions, with the 1544 Hi-GAL sources, the selected velocity range (−55 to −40 km s⁻¹) in green, and the selected distance range (1.8-2.3 kpc) in red. Within the velocity ranges [−35, 0] and [−90, −60] km s⁻¹, sources follow the velocity-distance relation given by the kinematic distance derivation. A similar relation is observed between −60 and −35 km s⁻¹, where the velocity correction has been applied to account for the streaming motions observed in this region of the Galaxy. The correspondence between the velocity range and the distance range is clearly visible, and it explains why fewer sources fall in the selected distance range (1.8-2.3 kpc) than in the selected velocity range.
Fig. 12. Velocity-distance plot of Hi-GAL sources in regions from Zucker et al. (2020) for sources in quadrants 2 and 3 (a) and in quadrants 1 and 4 (b). The velocity (±5 km s⁻¹) and the distance (and its uncertainty) of the sight lines listed in Zucker et al. (2020) are plotted in black. In panel a the Hi-GAL sources are plotted in red (for W3 and Gem OB1), blue (for W5 and L1293), and green (for W5, CMa OB1, and RCW38). In panel b the Hi-GAL sources are plotted in red (for Carina, North America, and M20), blue (for Cygnus X), and green (for IC 2944 and M17).
[…] Zucker et al. (2020) for these two star forming regions. The color coding is the same as in Fig. 13, that is, sources in the range 1.8-2.3 kpc are coded in green, sources outside this range in red, and unfitted sources (with no distance information) in blue; the distance layer for these latter sources is placed arbitrarily at 3 kpc for clarity.
In all the cases studied, about 50% of the Hi-GAL sources detected in a given region lie in the distance (D) layer associated with a given molecular star forming cloud. For some unfitted sources (hence with no velocity or distance information), a clear spatial association established with complementary data (e.g., near-infrared images) could also be used to assign the source to a given molecular star forming region. When working on specific regions, we recommend that users carefully inspect an infrared 2D image of the region together with the (l, b, v) and (l, b, D) diagrams.

Conclusions
In an unprecedented effort to deliver reliable velocity and distance information for a large number of Hi-GAL sources, we developed the Python program starbird, which determines and assigns velocities and distances to the Hi-GAL compact sources. We obtained a distance for 124 069 of the 150 223 compact sources of the Hi-GAL catalog at the 5σ level (and for 128 351 sources at the 3σ level). This is the first time that this information has been determined homogeneously for such a large number of compact sources. This Hi-GAL compact source catalog with distances constitutes an invaluable resource for studies of star formation in the Galaxy and of Galactic structure.
A comparison of our results with published velocities and distances from the ATLASGAL and BGPS surveys shows that the velocity assignment is effective (typically 89.3% of the sources have a velocity difference of less than 5 km s⁻¹) and that our resolution of the KDA disagrees with theirs in only 29.6% of the cases. The comparison with nearby regions from Zucker et al. (2020) shows that the cataloged distances can be used statistically for studies at the scale of the Galaxy. When studying a particular region, we recommend comparing the velocity and the spatial distribution of the Hi-GAL sources with, respectively, the systemic velocity and the morphology of the region. Finally, the catalog could be improved by updating the distances (up to 4 kpc) of the nearby optical H II regions and star forming regions used in the process (e.g., Cantat-Gaudin & Anders 2020; Zucker et al. 2020; Maíz Apellániz et al. 2020) and by including recent molecular cloud distance catalogs (such as Chen et al. 2020), both based on the recent Gaia DR2 data.

Appendix B: H I self-absorption dips > 3 rms
To solve the KDA, we developed a code that analyzes the dips in H I spectra (see Sect. 3.4). The code first identifies all the extrema (Fig. B.1, upper panel) and then keeps only those above three times the noise level (σ_noise). A dip is then defined by a falling edge and a rising edge (shown by the arrows in Fig. B.1, middle panel) that link a minimum to its two nearest maxima. In addition, the amplitude of both edges must be larger than 3σ_noise for the dip to be considered significant. As a result, small dips and overly irregular dips (partially refilled, asymmetric, or with one edge below 3σ_noise) are rejected by the code (Fig. B.1, lower panel).
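The dip selection described above can be sketched in Python with NumPy. This is not the authors' code: it reads "extrema above three times the noise level" as extrema whose brightness exceeds 3σ_noise, takes the nearest retained maximum on each side of a minimum as the ends of its two edges, and captures irregular dips only through the 3σ_noise edge criterion. The function name `find_hisa_dips` and its `k` parameter are hypothetical.

```python
import numpy as np


def find_hisa_dips(spectrum, sigma_noise, k=3.0):
    """Return (i_left_max, i_min, i_right_max) index triplets of significant dips.

    1. find the local extrema of the H I spectrum;
    2. keep only extrema brighter than k * sigma_noise;
    3. pair each retained minimum with the nearest retained maximum on each side;
    4. keep the dip if both edges (max - min) exceed k * sigma_noise.
    """
    t = np.asarray(spectrum, dtype=float)
    idx = np.arange(1, t.size - 1)  # interior channels only
    is_max = (t[idx] > t[idx - 1]) & (t[idx] >= t[idx + 1])
    is_min = (t[idx] < t[idx - 1]) & (t[idx] <= t[idx + 1])
    bright = t[idx] > k * sigma_noise
    maxima, minima = idx[is_max & bright], idx[is_min & bright]

    dips = []
    for i_min in minima:
        left, right = maxima[maxima < i_min], maxima[maxima > i_min]
        if left.size == 0 or right.size == 0:
            continue  # a dip needs a maximum on both sides
        i_l, i_r = left[-1], right[0]
        falling = t[i_l] - t[i_min]  # edge from the left maximum down to the minimum
        rising = t[i_r] - t[i_min]   # edge from the minimum up to the right maximum
        if falling > k * sigma_noise and rising > k * sigma_noise:
            dips.append((int(i_l), int(i_min), int(i_r)))
    return dips
```

On a noiseless test spectrum made of a broad emission profile with a narrow absorption of 20σ_noise, the function returns a single dip centered on the absorption; with an absorption of only 2σ_noise the profile has no local minimum and nothing is returned.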

Appendix C: BGPS and ATLASGAL comparison
To better evaluate the reliability of our distance determination, we use the comparison between the ATLASGAL and BGPS catalogs as a reference. These two samples have 697 sources with distances in common. Figure C.1(a) shows that their velocities are in good agreement, with 92% (634 sources) having a velocity difference ≤5 km s⁻¹. Figure C.1(b) shows the typical shape caused by differences in the distance disambiguation (as for Figs. 10 and 11, sources within ±12° of the Galactic center direction were discarded). We find that ∼50% (324) of the sources have a distance difference ≤0.5 kpc (the typical uncertainty of the BGPS distances).
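Agreement fractions of this kind can be computed with a few lines of NumPy, as in the sketch below. The function name and its defaults (5 km s⁻¹, 0.5 kpc, and a ±12° longitude exclusion around the Galactic center, applied here only to the distance comparison as in Fig. C.1(b)) are an illustrative reading of the text; the toy arrays in the check are invented and are not ATLASGAL or BGPS data.

```python
import numpy as np


def agreement_fractions(l_deg, v_a, v_b, d_a, d_b,
                        dv_max=5.0, dd_max=0.5, l_excl=12.0):
    """Fractions of cross-matched sources agreeing in velocity and in distance.

    Velocities (km/s) are compared for all sources; distances (kpc) only for
    sources more than l_excl degrees in longitude from the Galactic center.
    """
    l = (np.asarray(l_deg, dtype=float) + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
    v_a, v_b = np.asarray(v_a, dtype=float), np.asarray(v_b, dtype=float)
    d_a, d_b = np.asarray(d_a, dtype=float), np.asarray(d_b, dtype=float)
    frac_v = np.mean(np.abs(v_a - v_b) <= dv_max)
    outside_gc = np.abs(l) > l_excl
    if not outside_gc.any():
        return float(frac_v), float("nan")
    frac_d = np.mean(np.abs(d_a[outside_gc] - d_b[outside_gc]) <= dd_max)
    return float(frac_v), float(frac_d)
```

Wrapping the longitude to [−180°, 180°) makes the exclusion symmetric, so that, for example, l = 350° is treated as 10° from the Galactic center and dropped from the distance comparison.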

Appendix E: The 3σ catalog
In addition to the 5σ catalog, which is built from velocities extracted from lines with a post-fit peak intensity S/N > 5, we also produce a 3σ catalog (delivered as Table E.1) by selecting lines with a post-fit peak intensity S/N > 3. With this lower selection threshold we can assign a velocity to 4282 additional sources. The comparison of the two catalogs (Fig. E.1) shows that, among the sources with a velocity in both catalogs, 89.5% have similar velocities (ΔV_LSR ≤ 5 km s⁻¹) and 90.8% have a distance difference ≤1 kpc.
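The difference between the two catalogs comes down to the S/N threshold applied to the fitted lines. The minimal Python sketch below assumes that each source carries a list of fitted lines as (V_LSR, post-fit peak, σ) tuples, with σ the noise of the corresponding spectrum; this data layout and the function name are hypothetical, and how starbird chooses among several retained lines is not modelled here.

```python
def sources_with_velocity(fits, threshold):
    """Names of sources with at least one fitted line above `threshold` sigma.

    fits maps a source name to a list of (v_lsr, peak, sigma) tuples, where
    peak is the post-fit peak intensity and sigma the noise of that spectrum.
    """
    selected = set()
    for name, lines in fits.items():
        # A source gets a velocity if any of its lines passes the S/N cut.
        if any(sigma > 0 and peak / sigma > threshold for _v, peak, sigma in lines):
            selected.add(name)
    return selected
```

In this simplified model any line with S/N > 5 also passes S/N > 3, so the 3σ selection always contains the 5σ one; the extra sources play the role of the additional sources gained by the 3σ catalog.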