Toward Cosmicflows-4: The HI data catalog

In this study, we present an update of a compilation of line width measurements of neutral atomic hydrogen (HI) galaxy spectra at 21 cm wavelength. Our All Digital HI (ADHI) catalog consists of the previous release augmented with our new HI observations and an analysis of archival data. This study provides the HI information required to measure the distances of spiral galaxies with the Tully-Fisher (TF) relation. We conducted observations at the Green Bank Telescope (GBT), reprocessed spectra obtained at the Nançay radio telescope by the Nançay Interstellar Baryons Legacy Extragalactic Survey (NIBLES) and Kinematics of the Local Universe (KLUN) collaborations, and analyzed the recently published, fully completed (100%) Arecibo Legacy Fast ALFA (ALFALFA) survey in order to identify galaxies with good quality HI line width measurements. This paper adds HI data adequate for TF use for 385 galaxies observed at the GBT, 889 galaxies from archival Nançay spectra, and 1,515 rescaled Arecibo ALFALFA spectra. In total, this release adds 1,274 new good quality measurements (the GBT and Nançay galaxies) to the ADHI catalog. Today, the ADHI database contains 18,874 galaxies, of which 15,433 have good quality data for TF use. The final goal is to compute accurate distances to spiral galaxies, which will be included in the next-generation catalog of peculiar velocities, Cosmicflows-4.


Introduction
In 2006, we launched a large observational and theoretical program to study the large-scale structures and dynamics of the local Universe up to z ∼ 0.1. Notable outcomes of this project include the discovery of our home supercluster, Laniakea (Tully et al. 2014), the Dipole Repeller (DR; Hoffman et al. 2017), the Cold Spot Repeller, and the cartography of the Local Void.
Recently, we set up a distance-velocity calculator that is publicly accessible through the Extragalactic Distance Database (EDD). It is intended for any astronomer who needs information on the local gravitational velocity field at a specified location in the local Universe (Kourkchi et al. 2020a).
The HI line width data in our Cosmicflows catalogs come from our observations at the Green Bank Telescope (GBT) and from our reprocessing of spectra from the archives of other large radio telescopes, such as Parkes, Nançay, Effelsberg, and Arecibo; all of these were gathered in the All Digital HI catalog (ADHI; Courtois et al. 2009, 2011; Courtois & Tully 2015). Table 1 lists the various data sets used to build the ADHI catalog, with their respective publications in Col. 2. Column 1 gives the code used to refer to each source in the catalog.

Measurement of the line width parameter
In the Cosmicflows program, we measure the width of the 21 cm neutral hydrogen line with the parameter $W_{m50}$ (Courtois et al. 2011). $W_{m50}$ is the line width measured at the flux level equal to 50% of the mean flux, where the mean is taken over the channels within the range enclosing 90% of the total integrated flux. However, $W_{m50}$ is only an empirical measure of the true width of an HI galaxy velocity profile. A correction for redshift and instrumental broadening must be applied:

$$W^c_{m50} = \frac{W_{m50}}{1+z} - 2\,\Delta v\,\lambda,$$

where $z$ is the redshift, $\Delta v$ is the smoothed spectral resolution, and $\lambda = 0.25$ is an empirically determined constant. The observed line width can also be adjusted by separating out the broadening from turbulent motions and offsets, to produce an approximation to $2V_{max}$, where $V_{max}$ characterizes the rotation rate over the main body of a galaxy. Tully & Fouque (1985) defined the parameter $W_{mx}$ as

$$W_{mx}^2 = W_{m50}^2 + W_{t,m50}^2\left[1 - 2e^{-(W_{m50}/W_{c,m50})^2}\right] - 2\,W_{m50}\,W_{t,m50}\left[1 - e^{-(W_{m50}/W_{c,m50})^2}\right]. \qquad (1)$$

The parameters $W_{c,m50} = 100$ km s^-1 and $W_{t,m50} = 9$ km s^-1 were set after tests conducted in Courtois et al. (2009); they characterize, respectively, the transition from boxcar to Gaussian intrinsic profiles and the turbulent broadening for the line width considered. $W_{mx}$ is then related to the rotation rate $V_{max}$ by

$$2V_{max} \simeq \frac{W_{mx}}{\sin i},$$

where $i$ is the inclination of the galaxy from face-on relative to the observer. Inclinations have been evaluated using an online graphical tool, the Galaxy Inclination Zoo (GIZ), in a collaborative science project with citizens; see Sect. 2.3 of Kourkchi et al. (2020b) for further details. Details regarding the $W_{m50}$ and $W_{mx}$ line width parameters and comparisons with alternatives are discussed in Courtois et al. (2009). Finally, the error on the line width, $e_W$ (in km s^-1), is given by (Courtois et al. 2009)

$$e_W = \begin{cases} 8, & S/N \ge 17,\\ 21.6 - 0.8\,S/N, & 2 < S/N < 17,\end{cases}$$

where S/N is the signal-to-noise ratio.
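The correction chain above can be summarized in a short Python sketch. It is a minimal illustration under stated assumptions, not the Cosmicflows pipeline: we assume that Eq. (1) is applied to the already-corrected width $W^c_{m50}$, that $\Delta v$ and all widths are in km s^-1, and that inclinations are in degrees. The function names are hypothetical, and only the two S/N regimes of $e_W$ quoted above are implemented.

```python
import math

LAMBDA = 0.25     # empirical constant of the instrumental correction
W_C_M50 = 100.0   # km/s, boxcar-to-Gaussian transition parameter
W_T_M50 = 9.0     # km/s, turbulent broadening parameter


def corrected_width(w_m50, z, dv):
    """W^c_m50 = W_m50 / (1 + z) - 2 * dv * lambda (all widths in km/s)."""
    return w_m50 / (1.0 + z) - 2.0 * dv * LAMBDA


def w_mx(w):
    """Tully & Fouque (1985) turbulence correction of Eq. (1)."""
    g = math.exp(-((w / W_C_M50) ** 2))
    w2 = (w ** 2
          + W_T_M50 ** 2 * (1.0 - 2.0 * g)
          - 2.0 * w * W_T_M50 * (1.0 - g))
    # The expression is only meaningful for widths well above W_t; clamp at zero.
    return math.sqrt(max(w2, 0.0))


def v_max(wmx, incl_deg):
    """Rotation rate from 2 V_max ~ W_mx / sin(i), with i in degrees."""
    return wmx / (2.0 * math.sin(math.radians(incl_deg)))


def width_error(sn):
    """e_W in km/s for the two S/N regimes quoted in the text."""
    if sn >= 17:
        return 8.0
    if sn > 2:
        return 21.6 - 0.8 * sn
    raise ValueError("e_W is not specified here for S/N <= 2")


if __name__ == "__main__":
    wmx = w_mx(corrected_width(250.0, 0.02, 3.6))
    print(f"W_mx = {wmx:.1f} km/s, V_max = {v_max(wmx, 60.0):.1f} km/s, "
          f"e_W = {width_error(10.0):.1f} km/s")
```

For large widths, Eq. (1) reduces to roughly $W_{mx} \approx W_{m50} - W_{t,m50}$, which gives a quick sanity check on any implementation.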
An HI target is considered adequate for estimating its distance with the Tully-Fisher (TF) relation when the error on its line width satisfies $e_W \le 20$ km s^-1. Although the KLUN, NIBLES, and ALFALFA data have already been processed by their own collaborations, with methodologies described in their respective publications, these data samples have been reprocessed with the analysis described in this section so that they could be included in the ADHI catalog, which gathers HI data treated with the same procedure.

Nançay: NIBLES I and KLUN17
In this section, we consider two HI data releases obtained at the Nançay radio telescope by the KLUN and NIBLES collaborations: the KLUN17 release (Theureau et al. 2017) and the NIBLES I release (see Table 1). Figure 1 shows the sky distribution, in equatorial coordinates, of all HI data considered in this paper. The Nançay data considered in this section (both KLUN17 and NIBLES I) are represented by red squares; the other samples displayed in this figure are described in the following sections. The galaxies of the Nançay samples are distributed homogeneously in the two regions.
The redshift distribution of the two Nançay samples is shown in red in Fig. 2; as in Fig. 1, the other HI samples are detailed later in the paper. The Nançay samples mainly contain nearby galaxies, essentially below 6000 km s^-1. The inset in the top right corner of Fig. 2 shows the redshift distribution of each Nançay subset, with the NIBLES I and KLUN17 subsamples in purple and orange, respectively. The NIBLES I sample primarily contains nearby galaxies below 3000 km s^-1 and very few distant galaxies, up to 8000 km s^-1. In contrast, the KLUN17 data contribute a modest number of distant galaxies to the next Cosmicflows catalog, as this sample mostly contains galaxies above 5000 km s^-1, up to 12 000 km s^-1.
All data have been duly reduced and analyzed by their respective collaborations. However, for consistency, the HI line width must be remeasured with the Cosmicflows method before it can be included in the ADHI database. We reprocessed 1864 HI spectra from the NIBLES I collaboration: 1443 line widths were detected and measured, including 808 new additions to the ADHI database, of which 565 are of adequate quality for TF measurements. For the KLUN17 sample, remeasuring the line widths of 828 HI spectra yielded 500 detections. These provide 393 new additions to ADHI, of which 324 are of sufficiently high quality to be used for TF purposes.
The left panel of Fig. 3 shows the logarithm of the integrated flux as a function of the logarithm of the line width, and the right panel shows the distribution of HI mass as a function of redshift. As in Figs. 1 and 2, the Nançay data are represented by red squares; the other samples in this figure are described later in the paper.
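The right panel of Fig. 3 uses HI masses. The text does not state how they were computed; a common choice, assumed here, is the optically thin relation $M_{HI} = 2.356\times10^5\,D^2\,F_{HI}$ (in solar masses), with $D$ in Mpc and the integrated flux $F_{HI}$ in Jy km s^-1, combined with a Hubble-law distance $D = cz/H_0$. The value $H_0 = 75$ km s^-1 Mpc^-1 and the function name are assumptions for illustration.

```python
import math


def hi_mass(flux_jy_kms, cz_kms, h0=75.0):
    """HI mass in solar masses from the integrated 21 cm flux.

    Assumes an optically thin line, M_HI = 2.356e5 * D^2 * F (D in Mpc,
    F in Jy km/s), and a pure Hubble-law distance D = cz / H0, which is
    only adequate well beyond the Local Group.
    """
    distance_mpc = cz_kms / h0
    return 2.356e5 * distance_mpc ** 2 * flux_jy_kms


if __name__ == "__main__":
    print(f"log10 M_HI = {math.log10(hi_mass(2.0, 6000.0)):.2f}")
```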
As discussed above, line widths were remeasured on the original Nançay spectra in order to be integrated into the ADHI catalog. In Fig. 4, we compare the measurements presented in this paper and added to ADHI, denoted $W^{CF}$, with those published by the NIBLES I and KLUN17 collaborations, denoted $W^{published}$ and represented by purple triangles and orange squares, respectively. The left panel shows the raw line widths $W_{50}$, while the right panel shows the line widths $W_{mx}$ corrected with Eq. (1). In both panels, the remeasured widths of the NIBLES I sample exhibit a significant positive offset of 11 km s^-1 from the originally published $W_{50}$. This offset arises from the different definitions of the width: the published values are measured at 50% of the maximum flux, whereas the Cosmicflows widths are measured at 50% of the mean flux. Since the mean flux is lower than the peak flux, the Cosmicflows level lies lower on the edges of the profile, which yields a larger width. The KLUN17 sample is consistent with the Cosmicflows measurements for both raw and corrected line widths.
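To illustrate why the choice of flux level matters, here is a toy comparison on a synthetic double-horned profile. The profile shape, the 5% and 95% cumulative-flux bounds used to define the 90% range, and the outermost-crossing interpolation are illustrative assumptions, not the Cosmicflows or NIBLES code; the sketch reproduces the sign of the offset, not its 11 km s^-1 value.

```python
import math


def toy_profile(v):
    """Synthetic double-horned HI profile (mJy) at velocity v (km/s); illustrative only."""
    plateau = 10.0 / (1.0 + math.exp((abs(v) - 150.0) / 5.0))  # soft-edged plateau
    horns = 10.0 * math.exp(-((abs(v) - 135.0) ** 2) / (2.0 * 8.0 ** 2))
    return plateau + horns


def width_at_level(v, f, level):
    """Velocity width between the outermost crossings of `level`.

    Assumes the profile drops below `level` before both ends of the spectrum.
    """
    above = [k for k, fk in enumerate(f) if fk >= level]
    lo, hi = above[0], above[-1]

    def crossing(a, b):
        # Linear interpolation between channel a (below level) and b (above it).
        t = (level - f[a]) / (f[b] - f[a])
        return v[a] + t * (v[b] - v[a])

    return crossing(hi + 1, hi) - crossing(lo - 1, lo)


def mean_flux_90(f):
    """Mean flux over the channels holding the central 90% of the integrated flux."""
    total = sum(f)
    cum, lo, hi = 0.0, None, None
    for k, fk in enumerate(f):
        cum += fk
        if lo is None and cum >= 0.05 * total:
            lo = k
        if hi is None and cum >= 0.95 * total:
            hi = k
    return sum(f[lo:hi + 1]) / (hi - lo + 1)


velocities = [float(x) for x in range(-250, 251, 2)]
fluxes = [toy_profile(x) for x in velocities]
w_peak = width_at_level(velocities, fluxes, 0.5 * max(fluxes))
w_mean = width_at_level(velocities, fluxes, 0.5 * mean_flux_90(fluxes))
print(f"width at 50% of peak: {w_peak:.1f} km/s; at 50% of mean: {w_mean:.1f} km/s")
```

Because the mean flux never exceeds the peak, the 50%-of-mean level is always the lower one, and the width measured there can only be equal or larger.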

Arecibo: ALFALFA
The Arecibo Legacy Fast ALFA (ALFALFA) survey is a blind extragalactic HI survey designed to conduct a census of the local HI Universe over a cosmologically significant volume. The complete (100%) ALFALFA extragalactic HI source catalog was published by Haynes et al. (2018). It contains more than 31 500 extragalactic HI line sources detected out to z < 0.06 within the Arecibo declination range, 0 < δ < 38 deg.

Fig. 1. Sky distribution in equatorial coordinates (right ascension and declination, in deg) of all HI data considered in this paper. The red squares, green dots, and blue triangles correspond to the HI data from the Nançay (NIBLES I and KLUN17 HI data releases), Arecibo (ALFALFA), and Green Bank telescopes, respectively. All of the galaxies shown are new additions to the ADHI catalog, and their line width measurements are adequate for use with the TF relation.

Identifying galaxies with adequate measurements
The various HI parameters obtained by the ALFALFA collaboration have been published and are available online for 32 612 galaxies. However, to date, only 40% of the HI spectra themselves are available in a usable digital format (e.g., ASCII, FITS): the α40 release of 15 000 galaxies (Haynes et al. 2011). These spectra were already remeasured by the Cosmicflows collaboration, which added a total of 3898 new good quality line width measurements to the ADHI catalog out of 15 000 reprocessed spectra (Courtois & Tully 2015). The source code hgm2011 listed in Table 1 corresponds to the α40 entry in the ADHI catalog. The spectra of the remaining galaxies of the full α100 catalog are not available, so their HI parameters cannot be directly remeasured and included in ADHI.
In this section, we compare the parameters of the α40 catalog obtained by ALFALFA and by Cosmicflows. After establishing relations between these parameters, we estimate the values of the HI parameters for the whole ALFALFA 100% catalog as if they had been remeasured by the Cosmicflows collaboration. Throughout this section, parameters obtained by the ALFALFA (published in the α40 catalog) and Cosmicflows collaborations are identified by the labels α and CF, respectively.
As our goal is to estimate which ALFALFA 100% line width measurements may be adequate for deriving distances, we first consider the error $e_W$ on the HI line width. The definition of this parameter differs between the two studies, and a direct comparison shows that $e^{\alpha}_W$ and $e^{CF}_W$ differ markedly. The S/Ns, $S/N_{CF}$ and $S/N_{\alpha}$, are also defined differently. In the Cosmicflows collaboration, the S/N is derived as the ratio of the signal at 50% of the mean flux to the noise measured beyond the extremities of the signal (Courtois & Tully 2015). The S/N derived by the ALFALFA collaboration is defined in Eq. (2) [...]. We relate the S/Ns from both collaborations by the following relation [...], which is illustrated in the left panel of Fig. 5 by a solid red line. We can then estimate the value of $S/N_{\alpha}$ in the Cosmicflows definition, denoted $S/N_{\alpha\to CF}$ hereafter. Because $e_W$ in the Cosmicflows definition depends only on the S/N (Sect. 2), this conversion also provides the line width error in the Cosmicflows definition, $e^{\alpha\to CF}_W$, for galaxies whose spectra are not available.

Comparing the ALFALFA and Cosmicflows line widths
The line widths of the 70% ALFALFA release (α70) and of the Cosmicflows catalogs were compared, without reprocessing the spectra, by Kourkchi et al. (2019), using a subsample of galaxies with adequate-quality HI measurements that were part of both catalogs at that time. They found $W^{CF}_{mx} = W^{\alpha}_{50} - 6$, where $W^{CF}_{mx}$ is the line width in km s^-1 taken from ADHI (including corrections; see Eq. (1)) and $W^{\alpha}_{50}$ is the line width provided by the ALFALFA α70 catalog.
In this section, we follow the same methodology to compare the line widths from both collaborations, now considering the full 100% ALFALFA catalog (Haynes et al. 2018), which contains 32 612 galaxies. Line widths extracted from this catalog are denoted $W^{\alpha}_{50}$. We compare them with a sample of 15 433 good quality line width measurements extracted from the ADHI dataset, corrected with Eq. (1) and denoted $W^{CF}_{mx}$. The two datasets have 3970 galaxies in common.
In the right panel of Fig. 5, we compare the line widths $W^{\alpha}_{50}$ and $W^{CF}_{mx}$ by plotting the difference $W^{CF}_{mx} - W^{\alpha}_{50}$ as a function of $S/N_{\alpha}$. The dispersion is large at low S/N, $S/N_{\alpha} < 15$ (threshold marked by a black dotted line). Line width measurements above this threshold yield the following linear relationship between the two parameters (Eq. (5)) [...], represented in the right panel by a solid red line. This relation differs slightly from the one obtained by Kourkchi et al. (2019); we attribute the difference to their use of an earlier ALFALFA data release, α70.
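The fitting step can be sketched as follows, assuming a relation of the form $W^{CF}_{mx} = a\,W^{\alpha}_{50} + b$ fitted by ordinary least squares to the common galaxies with $S/N_{\alpha} \ge 15$. The fitting method and the function names are assumptions, and the coefficients of Eq. (5) are not reproduced here.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = a * x + b; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx


def calibrate_widths(w_alpha, w_cf, sn_alpha, sn_min=15.0):
    """Fit W_CF_mx = a * W_alpha_50 + b using only galaxies with S/N_alpha >= sn_min."""
    pairs = [(wa, wc) for wa, wc, sn in zip(w_alpha, w_cf, sn_alpha) if sn >= sn_min]
    xs, ys = zip(*pairs)
    return fit_line(xs, ys)
```

The fitted $a$ and $b$ would then be applied to $W^{\alpha}_{50}$ for α100 galaxies not yet in ADHI, to predict their line widths on the Cosmicflows scale; excluding low-S/N pairs keeps their large scatter from biasing the slope.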

Application to the full ALFALFA data
In this section, Eq. (5) is applied to the full 100% ALFALFA catalog, excluding galaxies already included in ADHI with high quality measurements, to determine how many and which galaxies could be added to the upcoming Cosmicflows-4 catalog. Galaxies with HI line width errors, in the Cosmicflows definition, of $e^{\alpha\to CF}_W \le 20$ km s^-1 are suitable for the TF method and hence for inclusion in Cosmicflows-4. We identify 1515 new galaxies that satisfy this criterion. Figure 1 displays the sky distribution of these 1515 additional ALFALFA galaxies (green dots); the coverage is homogeneous over the region observed by Arecibo. Figure 2 shows the redshift distribution of the new ALFALFA galaxies in green, with three peaks at 2000, 5000, and 8000 km s^-1. This sample will be a significant contribution to CF4 and includes distant galaxies up to 17 000 km s^-1. However, only 338 galaxies (∼22% of the sample) have a redshift larger than 8000 km s^-1.
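In terms of the Cosmicflows error model of Sect. 2, the cut $e^{\alpha\to CF}_W \le 20$ km s^-1 amounts to a threshold on the converted S/N. The short derivation below uses only the two regimes of $e_W$ given there:

```latex
\begin{aligned}
2 < S/N < 17: &\quad e_W = 21.6 - 0.8\,S/N < 21.6 - 0.8 \times 2 = 20~\mathrm{km\,s^{-1}},\\
S/N \ge 17:   &\quad e_W = 8~\mathrm{km\,s^{-1}} \le 20~\mathrm{km\,s^{-1}}.
\end{aligned}
```

Hence every galaxy with $S/N_{\alpha\to CF} > 2$ passes the cut. Reading the cut as an equivalence additionally assumes that measurements with $S/N \le 2$ carry $e_W > 20$ km s^-1, a regime not specified in this text.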
The left panel of Fig. 3 shows the integrated flux as a function of the line width, with ALFALFA galaxies represented by green dots. The blind ALFALFA survey mostly detects galaxies near the peak of the HI luminosity function, where the largest volume is explored at a given flux. Hence, detection favors giant galaxies with large line widths and large intrinsic HI fluxes; smaller galaxies at the same apparent flux level are drawn from much smaller volumes.
The distribution of HI mass as a function of redshift is shown in the right panel of Fig. 3, with ALFALFA galaxies again displayed as green dots. The luminosity function (LF) of galaxies is defined as the number of galaxies per Mpc^3 in a luminosity interval dM centered on magnitude M (Zwaan et al. 2001). In Fig. 3, we observe that the new galaxies successfully observed with the GBT are significantly less massive than typical ALFALFA galaxies. The ALFALFA remeasurements are consistent with other surveys in terms of the luminosity function. Our GBT observations, in contrast, are not intended to form a representative sample of the luminosity or mass distribution of galaxies. Our goal is to measure galaxies at the largest possible recession velocities, in order to extend the range of velocities with independent (TF) distances for our studies of peculiar velocities. In other words, rather than a representative sample following a luminosity function, we target galaxies spanning as wide a range as possible in sky position, recession velocity, and distance.

Selection of targets
Targets were selected from a compilation of two samples: a sample of flat galaxies from the Revised Flat Galaxy Catalog (RFGC; Karachentsev et al. 1999), to which a cut of δ > 36 deg was applied to avoid overlap with the ALFALFA sample; and a sample of galaxies near the Dipole Repeller (DR), extracted from the LEDA database (Makarov et al. 2014) with the cuts α > 20 deg, 16 < δ < 65 deg, 3000 < cz < 12 000 km s^-1, and i > 50 deg.
Several samples were added later to increase the number of potential targets. First, a sample of spiral galaxies was extracted from LEDA with the cuts δ > 36 deg (no overlap with ALFALFA), 3000 < cz < 9000 km s^-1, and i > 55 deg. These cuts were later modified to obtain more distant galaxies: δ > 38 deg, 3000 < cz < 10 000 km s^-1, and 40 < i < 85 deg. Lastly, a few SN Ia hosts from the SNfactory (Aldering et al. 2002; Rigault et al. 2020) with redshifts up to 12 000 km s^-1 were added to the list of targets.
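As an illustration, the catalog cuts listed above can be encoded as simple predicates, one per parent sample. This is a hypothetical sketch: the `Galaxy` record, the `CUTS` table, and the `select` helper are not part of any published code, α is read as right ascension in degrees, and the inequalities are taken exactly as written. The removal of galaxies already adequate in ADHI (described next) is modeled as a set of PGC numbers; the by-eye inspection is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Set


@dataclass
class Galaxy:
    """Minimal galaxy record; field names are illustrative."""
    pgc: int
    ra: float    # right ascension, deg
    dec: float   # declination, deg
    cz: float    # recession velocity, km/s
    incl: float  # inclination from face-on, deg


# Cuts quoted in the text, one per parent sample (strict inequalities as written).
CUTS: Dict[str, Callable[[Galaxy], bool]] = {
    "RFGC": lambda g: g.dec > 36,
    "DR": lambda g: g.ra > 20 and 16 < g.dec < 65 and 3000 < g.cz < 12000 and g.incl > 50,
    "LEDA_first": lambda g: g.dec > 36 and 3000 < g.cz < 9000 and g.incl > 55,
    "LEDA_revised": lambda g: g.dec > 38 and 3000 < g.cz < 10000 and 40 < g.incl < 85,
}


def select(sample: str, galaxies: Iterable[Galaxy], in_adhi: Set[int]) -> List[Galaxy]:
    """Apply one parent sample's cut and drop galaxies already adequate in ADHI."""
    cut = CUTS[sample]
    return [g for g in galaxies if cut(g) and g.pgc not in in_adhi]
```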
All galaxies already in the ADHI catalog with adequate HI profiles were removed from the samples described above. Several galaxies with nearly adequate HI profiles from the GB300 and Nançay telescopes were also added to the list of potential targets.
The resulting list contains 5460 galaxies, all of which were inspected; a galaxy was observed only if it satisfied the following criteria. First, we needed to make sure that the galaxies were spiral galaxies containing HI gas. To this end, we looked at the Pan-STARRS (Chambers et al. 2016) optical images in the g, r, i, z, and y bands: the disk of a gas-rich galaxy is hardly visible in the y band, while it is very bright in the g band. We also searched the optical spectra from SDSS DR12 (Alam et al. 2015) or LAMOST (Luo et al. 2015) for a significant Hα emission line, which indicates the presence of young stars and therefore of HI gas. Conversely, spectra dominated by Na, Mg, and Ca absorption lines are indicative of old stellar populations and little HI. The available Pan-STARRS images and SDSS or LAMOST spectra were systematically inspected by eye for each target. Only 54% of the targets had an optical spectrum available; this fraction is not entirely satisfactory, but we could not find recent optical spectra for all of our targets. Second, we did not observe colliding or interacting galaxies: if the distribution of the HI gas is disturbed, the line width no longer traces the rotation speed of the galaxy. Moreover, the size of the telescope beam must be considered: two galaxies within the beam at the same redshift would produce a confused spectrum, because their HI lines would overlap. Last but not least, we also checked the photometry quality and the inclination of each potential target to ensure that the TF relation is applicable. Inclinations are indispensable to deproject the HI line widths, and we require them to be greater than 45 degrees from face-on.
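The beam-confusion criterion can be made concrete with a simple screen. The sketch below is hypothetical: it flags a neighbor as a confusion risk if it lies within one beam width (9 arcmin for the GBT L-band receiver, as given in the next section) of the target and if the two HI lines, approximated as boxes of width $W$ centered on $cz$, overlap in velocity. Both the radius and the overlap rule are assumptions, not the criteria applied by the authors.

```python
import math

BEAM_ARCMIN = 9.0  # GBT L-band beam size quoted in the text


def angular_sep_arcmin(ra1, dec1, ra2, dec2):
    """Great-circle separation via the haversine formula; inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h))) * 60.0


def may_be_confused(target, neighbor, radius_arcmin=BEAM_ARCMIN):
    """Flag a neighbor whose HI line could blend with the target's.

    target, neighbor: (ra_deg, dec_deg, cz_kms, width_kms) tuples.
    """
    ra1, dec1, cz1, w1 = target
    ra2, dec2, cz2, w2 = neighbor
    close_on_sky = angular_sep_arcmin(ra1, dec1, ra2, dec2) < radius_arcmin
    lines_overlap = abs(cz1 - cz2) < 0.5 * (w1 + w2)
    return close_on_sky and lines_overlap
```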
After inspection of the 5460 potential targets, 628 galaxies were selected and observed. The target selection procedure was not entirely successful, in the sense that we observed well-defined targets that met our initial selection criteria but ultimately did not give a satisfactory result after observation.

Table 2 notes. The full table is available at the CDS. The error code $e_W$ = 100 km s^-1 corresponds to the confusion case, where, for example, two galaxies are in the detection field, and $e_W$ = 500 km s^-1 corresponds to the no-detection case.

Observations: Strategy and planning
A total of 610 h of observations were conducted from December 2017 to May 2019, with the telescope configured as follows. We used the L-band receiver Rcvr1_2, which covers frequencies between 1.15 GHz and 1.73 GHz. For this receiver, the gain of the GBT is 2 K Jy^-1 and the beam size is 9 arcmin. The spectral resolution used during the observations is 0.9 km s^-1.
We used the following ON-OFF-ON strategy throughout our observations, which allows two ON-OFF pairs to be obtained on the same source during a single observing session. It consists of one 300 s scan on the source (ten integrations of 30 s), followed by a 300 s scan on the sky, then a second 300 s scan on the source, for a total of 15 min per source and per session. Because the OFF scan is shared by the two ON scans, this strategy provides an additional 5 min of on-source time per source and per session compared to a single ON-OFF pair.
During an observing run, targets are selected automatically from the list of inspected targets based on their proximity to the current telescope pointing. The other selection criteria are 0 < δ < 85 deg and an angular separation from the Sun greater than 10 deg. After each run, we reduced the collected data as described in Sect. 3.3.3 below and updated the target list accordingly, using keywords to label and identify the galaxies still to be observed. The ON-OFF-ON cycles are repeated until the signal is sufficient to give an adequate line width measurement (i.e., an S/N high enough to give an acceptable error on the line width; see Sect. 2). The total integration time per source ranges from 15 min (one ON-OFF-ON cycle) to several hours for the most distant galaxies, or when the spectra were contaminated by radio frequency interference (RFI), as detailed in Sect. 3.3.3 below.
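To estimate how many cycles a faint target needs, one can assume ideal radiometer behavior, in which the noise decreases as the square root of the integration time, so that the S/N grows as $\sqrt{N}$ with the number $N$ of ON-OFF-ON cycles. The sketch below ignores RFI flagging and baseline systematics; the target S/N is a free parameter (for instance 17, above which $e_W$ reaches its 8 km s^-1 floor), and the function name is hypothetical.

```python
import math

CYCLE_MIN = 15.0  # duration of one ON-OFF-ON cycle, minutes


def cycles_needed(sn_one_cycle, sn_target):
    """Number of ON-OFF-ON cycles to reach sn_target, assuming S/N grows as sqrt(N)."""
    if sn_one_cycle <= 0:
        raise ValueError("S/N after one cycle must be positive")
    return max(1, math.ceil((sn_target / sn_one_cycle) ** 2))


if __name__ == "__main__":
    n = cycles_needed(3.0, 10.0)
    print(f"{n} cycles, {n * CYCLE_MIN / 60:.1f} h")
```

For example, a source reaching S/N = 3 after one cycle needs 12 cycles, i.e., 3 h, to reach S/N = 10, which is consistent with integration times of several hours for the faintest targets.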

Data reduction and RFIs
Each integration of a single ON-OFF pair is calibrated with the getsigref function of GBTIDL to obtain the final spectra. After removing any RFI as described below, Hanning smoothing (hanning function) is applied to obtain a resolution of 3.6 km s^-1.
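The calibration and smoothing steps can be illustrated in plain Python. This is a schematic stand-in, not the GBTIDL implementation: it applies the standard switched-power calibration $T_A = T_{sys}\,(\mathrm{ON} - \mathrm{OFF})/\mathrm{OFF}$ to each pair, averages the two pairs of an ON-OFF-ON cycle (which share the OFF scan), converts to flux density with the 2 K Jy^-1 gain quoted above, and applies one pass of a three-channel Hanning kernel. Per-integration averaging, RFI interpolation, and the number of smoothing or decimation passes needed to reach 3.6 km s^-1 are not modeled; all function names are hypothetical.

```python
GAIN_K_PER_JY = 2.0  # GBT L-band gain quoted in the text


def calibrate_pair(sig, ref, tsys):
    """Antenna temperature of one ON-OFF pair: Tsys * (ON - OFF) / OFF, per channel."""
    return [tsys * (s - r) / r for s, r in zip(sig, ref)]


def on_off_on(on1, off, on2, tsys):
    """Average the two ON-OFF pairs of an ON-OFF-ON cycle, which share the OFF scan."""
    a = calibrate_pair(on1, off, tsys)
    b = calibrate_pair(on2, off, tsys)
    return [(x + y) / 2.0 for x, y in zip(a, b)]


def to_jy(ta):
    """Convert antenna temperature (K) to flux density (Jy) with the quoted gain."""
    return [t / GAIN_K_PER_JY for t in ta]


def hanning(spec):
    """One pass of (1/4, 1/2, 1/4) Hanning smoothing; the end channels are kept as-is."""
    out = list(spec)
    for k in range(1, len(spec) - 1):
        out[k] = 0.25 * spec[k - 1] + 0.5 * spec[k] + 0.25 * spec[k + 1]
    return out
```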
In total, 628 HI spectra of galaxies were obtained at the GBT, of which 407 correspond to detections and 385 are acceptable for use with the TF relation. The raw HI parameters measured on these 628 spectra are listed in Table 2, together with complementary data. Column 1 gives the ID number of the target in the Principal Galaxy catalog (PGC), and Col. 2 gives an alternative name. Columns 3 and 4 give the input (heliocentric) velocity $V_{hel}$ and the velocity measured on the profile, $V_{50}$. The line width $W_{50}$ and its uncertainty $e_W$, the S/N, and the measured HI line flux are provided in Cols. 5-8, respectively. We use the error codes $e_W$ = 100 km s^-1 and $e_W$ = 500 km s^-1 for confused spectra and non-detections, respectively. The inclination is listed in Col. 9, and the total B-band magnitude in Col. 10. All non-HI parameters (input velocity, inclination, and magnitude) were extracted from the LEDA database. All HI profiles and the corresponding Pan-STARRS optical images of the 628 observed galaxies can be found in Table 3, which is available electronically at the journal's homepage.

Table 3 notes. Columns 1 and 3 give the PGC number, the heliocentric velocity $V_{hel}$ in km s^-1, and the HI profile after the measurement of the line width; in the case of non-detection, the raw spectrum obtained from the GBT is shown in red. Columns 2 and 4 give the raw line width $W_{50}$ and its error $e_W$ in km s^-1, as well as the Pan-STARRS composite image in the y/i/g bands. The full table is available at the CDS.

The main problems faced during data reduction arose from RFI. Single-peak RFI is easily removed by interpolation and does not affect the final spectra or the measurement of the HI parameters.
However, a strong RFI located at 8500 km s^-1 (about 1381 MHz) was encountered frequently. It may be generated by a radar or by a GPS signal used to detect nuclear tests. This RFI destroyed a significant number of HI spectra of distant galaxies located between 8000 km s^-1 and 9000 km s^-1. It is visible, for example, in the HI spectrum of PGC6177 at cz = 8252 km s^-1, shown in Fig. 6. On contaminated spectra, the HI line is not visible at all, as the amplitude of the RFI (up to ≈10 Jy) is much larger than the usual amplitude of an HI line (≈15 mJy). We chose to discard the individual integrations contaminated by this RFI when deriving the final HI spectra. The FFT methodology (sigma clipping in Fourier space) suggested by Hunt et al. (2016) was tested on contaminated spectra without success; we refer to that publication for more details on the method. Finally, 192 targets were observed for which no HI line was detected, and 29 spectra are confused. Unfortunately, a significant amount of observing time was lost on these targets, even though all targets had been thoroughly inspected prior to observation to check whether an HI disk might be present.

Fig. 6. HI spectrum of PGC6177 at cz = 8252 km s^-1. A strong radio frequency interference (RFI) occurring frequently at 8500 km s^-1 can be seen, while the HI line is not visible.
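The correspondence between recession velocity and observed frequency shows where this RFI falls in the spectra. The sketch below assumes the optical velocity convention, $cz = c\,(f_0/f - 1)$ with $f_0 = 1420.405751768$ MHz; with the radio convention, the numbers shift by about 1 MHz at this velocity. Function names are illustrative.

```python
C_KM_S = 299792.458        # speed of light, km/s
F_HI_MHZ = 1420.405751768  # HI rest frequency, MHz


def cz_to_freq(cz_kms):
    """Observed frequency (MHz) of the HI line for an optical-convention cz (km/s)."""
    return F_HI_MHZ / (1.0 + cz_kms / C_KM_S)


def freq_to_cz(freq_mhz):
    """Optical-convention cz (km/s) at which the HI line lands on freq_mhz."""
    return C_KM_S * (F_HI_MHZ / freq_mhz - 1.0)


if __name__ == "__main__":
    print(f"cz = 8500 km/s -> {cz_to_freq(8500.0):.1f} MHz")
    print(f"1150 MHz -> cz = {freq_to_cz(1150.0):.0f} km/s")
```

With these definitions, `cz_to_freq(8500)` gives about 1381.2 MHz, and the lower edge of the receiver band, 1.15 GHz, corresponds to $cz \approx 70\,500$ km s^-1.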

New additions to the ADHI collection
The sky distribution of the 385 new additions to the ADHI collection is shown in Fig. 1, where GBT galaxies are displayed as blue triangles. We focused on the northern celestial hemisphere to address the lack of Cosmicflows-3 data coverage in this region.
The redshift distribution of these galaxies is shown in blue in Fig. 2. The galaxies observed and added to ADHI are mostly distant, up to 10 000 km s^-1.
The left panel of Fig. 3 shows the log of the integrated flux as a function of the log of the HI line width. We mostly observed massive galaxies with low flux. The right panel shows the distribution of the HI mass as a function of the redshift.

From Cosmicflows-3 to Cosmicflows-4
The sky distribution of all new line width measurements adequate for TF is shown in Fig. 7. The black triangles correspond to the CF3 catalog, while the red dots represent the new HI data. Most of the new data are located in the north in order to reduce the north-south asymmetry of Cosmicflows-3, and the imbalance in the number of galaxies is indeed partly corrected: in CF3, ∼70% of the data were located in the south and only ∼30% in the north, whereas after adding the new HI data presented in this paper, ∼55% of the data should be in the south and ∼45% in the north.
In Fig. 8, we compare the redshift distributions of the CF3 data and of the new HI data to be added to CF4. The CF3 data are shown in gray with a black dotted line, and the new HI data are overplotted in red. The combination of CF3 with these new data, which produces CF4, is shown in gray with a solid black line. The contribution of the TF method to CF4 becomes less significant as cz increases. The new HI data added in the north are not distant enough to fully compensate for the lack of data in this region, compared to the south, which is mainly covered by the fundamental plane method with 6dF.
We have updated the ADHI catalog with all galaxies with new HI observations at the GBT and those with reprocessed Nançay HI measurements presented in this study, as listed in Table 4. Column 1 gives the ID number of the target in the Principal Galaxy catalog (PGC), and Col. 2 an alternative name. The source of the spectra and the telescope used for the observation are given in Cols. 3 and 4, respectively. The heliocentric velocity $V_{hel}$ is provided in Col. 5, and the main HI parameters are listed in Cols. 6-8.

Conclusion
We have acquired 385 high quality HI measurements from 628 targets using the GBT, and 889 good measurements of the HI line width from 2692 remeasured Nançay spectra. These 1274 newly observed and reprocessed galaxies have been added to our ADHI catalog, maintained at the Extragalactic Distance Database (EDD). Furthermore, we identify 1515 new ALFALFA galaxies whose HI parameters may be sufficient for TF distance measurements. In total, this paper brings 2789 galaxies that were not previously in our All Digital HI catalog. However, this study does not fully counterbalance the north-south asymmetry in the sky coverage of the HI observations in the Cosmicflows catalogs. In the north, future HI surveys, such as Apertif in the Netherlands and FAST in China, could help reduce this imbalance.
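The counts quoted throughout the paper can be cross-checked with simple arithmetic, using only numbers stated in the text (the variable names are illustrative):

```python
# Good-quality (TF-ready) additions reported in the text
nibles_new_tf = 565
klun17_new_tf = 324
gbt_tf = 385
alfalfa_new = 1515

nancay_tf = nibles_new_tf + klun17_new_tf  # Nancay additions
adhi_added = nancay_tf + gbt_tf            # new measurements added to ADHI
total_new = adhi_added + alfalfa_new       # including the rescaled ALFALFA galaxies

# GBT bookkeeping: detections + non-detections + confused spectra
gbt_observed = 407 + 192 + 29
# Nancay spectra reprocessed: NIBLES I + KLUN17
nancay_reprocessed = 1864 + 828

print(nancay_tf, adhi_added, total_new, gbt_observed, nancay_reprocessed)
# → 889 1274 2789 628 2692
```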