The ALPINE-ALMA [CII] survey: Survey strategy, observations, and sample properties of 118 star-forming galaxies at 4 < z < 6

The ALPINE-ALMA [CII] survey is aimed at characterizing the properties of a sample of normal star-forming galaxies (SFGs). The ALMA Large Program to INvestigate [CII] at Early times (ALPINE) features 118 galaxies observed in the [CII]-158 µm line and far infrared (FIR) continuum emission during the period of rapid mass assembly, right after the end of the HI reionization, at redshifts of 4 < z < 6. We present the survey science goals, the observational strategy, and the sample selection of the 118 galaxies observed with ALMA, with an average beam minor axis of about 0.85″, or ∼5 kpc at the median redshift of the survey. The properties of the sample are described, including spectroscopic redshifts derived from the UV rest-frame, stellar masses, and star-formation rates obtained from spectral energy distribution (SED) fitting. The observed properties derived from the ALMA data are presented and discussed in terms of the overall detection rate in [CII] and FIR continuum, with the observed signal-to-noise distribution. The sample is representative of the SFG population in the main sequence at these redshifts. The overall detection rate in [CII] is 64% for a signal-to-noise ratio (S/N) threshold larger than 3.5, corresponding to a 95% purity (40% detection rate for S/N > 5). Based on a visual inspection of the [CII] data cubes together with the large wealth of ancillary data, we find a surprisingly wide range of galaxy types, including 40% that are mergers, 20% extended and dispersion-dominated, 13% compact, and 11% rotating discs, with the remaining 16% too faint to be classified. This diversity indicates that a wide array of physical processes must be at work at this epoch, first and foremost, those of galaxy mergers.
This paper sets a reference sample for the gas distribution in normal SFGs at 4 < z < 6, a key epoch in galaxy assembly, which is ideally suited for studies with future facilities, such as the James Webb Space Telescope (JWST) and the Extremely Large Telescopes (ELTs).


Introduction
The mass assembly in galaxies at different epochs is the result of several physical processes which, in combination, produce the remarkable observed evolution of the star formation rate density (SFRD) with cosmic time (Silk & Mamon 2012; Madau & Dickinson 2014; Dayal & Ferrara 2018, and references therein). The SFRD first rises during the reionization epoch to reach its peak at z ∼ 2−3 following a ∼1 dex increase in ∼3 Gyr. It then decreases by ∼0.8 dex in ∼10 Gyr up to the current point in time (Madau & Dickinson 2014; Bouwens et al. 2015). Along with star formation, the total stellar mass density (SMD) in galaxies is observed to rise steeply from early times to z ∼ 2, followed by a slower increase at z < 2 (Ilbert et al. 2013).
The key element at the root of the SFRD and SMD evolution is the transformation of gas into stars within a hierarchical picture of galaxy assembly. Two main processes are shown, thanks to increasingly detailed simulations, to drive this evolution: gas accretion and galaxy merging (Hopkins et al. 2006; Dekel et al. 2009; Bournaud et al. 2011; Naab & Ostriker 2017). We expect this to be tempered by feedback processes from gas expelled from galaxies by strong AGN or stellar jets and winds (Silk 1997, 2013; Hopkins et al. 2008). While this is appealing from the standpoint of theory and simulation, there is actually very little observational evidence for a comprehensive, consistent, and quantitative picture, particularly at the early cosmic epochs, when the major phase of mass assembly is underway. Galaxy mergers, major and minor, are observed at all epochs (e.g., Conselice 2014), with a major merger rate increasing from the local universe to z ∼ 2 (e.g., Lotz et al. 2011; López-Sanjuan et al. 2013; Mantha et al. 2018), and possibly flattening to z ∼ 4−5 (e.g., Tasca et al. 2014; Ventou et al. 2017), while gas accretion suffers from weak signatures that are difficult to identify observationally, and its effects are only identified indirectly (e.g., Bouché et al. 2013). On the other hand, feedback processes are directly measured (e.g., Le Fèvre et al. 2019) and thought to affect both the bright and faint ends of the galaxy luminosity function (LF; e.g., Croton et al. 2006; Hopkins et al. 2008; Gabor et al. 2010, 2011), primarily through AGN and stellar processes, respectively.
To disentangle the relative contributions of these processes, the far infrared (FIR) domain, which is redshifted into the sub-mm for high-z galaxies, is proving to be a particularly rich resource of information. From the sub-mm, it is now possible to investigate the properties of star-forming galaxies up to the epoch of HI reionization. The [CII]-158 µm line is the dominant coolant, making it one of the strongest FIR lines. The [CII] emission comes primarily from photo-dissociation regions (PDRs) and the cold neutral medium (CNM) of molecular clouds.
[CII] at high-z has raised considerable interest as it probes the gas out of which stars form in normal galaxies (e.g., Ferrara et al. 2019, and references therein). It also broadly traces star formation activity, offering an important window on galaxy formation (Carilli & Walter 2013; De Looze et al. 2014). This has led to the detection of strong [CII] emitters up to very high redshifts (Capak et al. 2015; Carniani et al. 2017), a measurement that is easier than that of the FIR continuum. Searching for [CII] emission, interpreting and simulating the observations, and comparing them with other emission lines, such as Lyman-α, has therefore become a major new way of studying high-z galaxies. The strong UV radiation in high-z galaxies results in a non-negligible fraction of [CII] emission from the extended warm interstellar medium (ISM; e.g., Capak et al. 2015; Faisst et al. 2017). The [CII] emission together with its morphology provides important information on the SFR and ISM properties (e.g., Vallini et al. 2015; Wellons et al. 2016; Olsen et al. 2017). The FIR continuum emission adjacent to [CII] is near the peak of the FIR emission. It constrains the total FIR luminosity and provides a measurement of the total SFR when combined with UV continuum measurements. It can also be combined with UV colors and luminosity to construct the infrared-excess (IRX, defined as L_FIR/L_UV) versus UV color diagnostic, providing insights into the spatial distribution of dust, dust grain properties, and metallicity (Reddy et al. 2012; Faisst et al. 2017).
Simulations of galaxy formation during and right after reionization provide information on the possible properties of these galaxies, despite the difficulty of accounting for early galaxy formation processes at these epochs in a consistent way (e.g., Dayal et al. 2013; Maiolino et al. 2015). Specific predictions related to [CII] emission are useful for guiding and making comparisons with the observations (e.g., Yue et al. 2015; Vallini et al. 2015; Olsen et al. 2017; Kohandel et al. 2019).
This whole domain opened up at high redshift (z > 4) with the ALMA interferometer becoming fully operational, when pilot observations showed that [CII] in normal galaxies could be detected routinely even with short on-source exposure times (Capak et al. 2015). Galaxies with star formation rates as low as a few M⊙ yr⁻¹ have been detected in [CII] at z ∼ 5 (Riechers et al. 2014; Capak et al. 2015), and [CII] is now detected for galaxies well into the reionization epoch (e.g., Bradač et al. 2017; Smit et al. 2018; Harikane et al. 2018; Fujimoto et al. 2019; Sobral et al. 2019; Hashimoto et al. 2019).
However, existing observations of [CII] in normal galaxies at these epochs are still scarce. As strong sub-mm sources have primarily been targeted (e.g., Maiolino et al. 2009; Carilli & Walter 2013; Wagg et al. 2012; Riechers et al. 2014), they provide us with a view biased towards the intensely star-forming population with SFR > 1000 M⊙ yr⁻¹. Normal galaxies, that is, galaxies with SFR in the range from ∼10 up to a few hundred solar masses per year, lying on the so-called main sequence at these redshifts (e.g., Speagle et al. 2014; Tasca et al. 2015; Tomczak et al. 2016; Pearson et al. 2018; Khusanova et al. 2020), have not been observed in statistically representative numbers. The Capak et al. (2015) observations proved that this was feasible and prompted us to submit the ALPINE Large Program, which has been largely designed based on the properties of the Capak et al. (2015) sample. A key element was the availability of large samples of these normal galaxies with accurate spectroscopic redshifts (Le Fèvre et al. 2015; Hasinger et al. 2018), which made it possible to define ALMA observations with a high success rate in detecting [CII].
This paper presents a general overview of the survey, in combination with papers presenting a detailed account of the data processing, [CII] flux and continuum measurements (Bethermin et al. 2020), and ancillary data with the computation of physical parameters (Faisst et al. 2019). The layout of this paper is as follows. In Sect. 2, we present the general ALPINE survey design, with our science goals and sample selection as in the original proposal. We give an overview of the ALMA observations, as well as a summary of the large amount of ancillary observations, in Sect. 3. In Sect. 4, we describe the main properties of the sample, including the redshift distribution, detection rates in [CII] and continuum, and observed flux limits. Flux maps of all sample galaxies in the [CII] line are presented in Sect. 5. We use these maps, kinematic data, and all ancillary imaging data to perform an empirical visually-based morpho-kinematic classification as described in Sect. 6. We summarize our findings in Sect. 7. Throughout the paper, we use a ΛCDM cosmology with H0 = 70 km s⁻¹ Mpc⁻¹, ΩΛ = 0.70, Ωm = 0.30. All magnitudes are given in the AB system.
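As a rough cross-check of the physical scales quoted in this paper, the adopted cosmology can be turned into an angular-to-physical conversion. The following minimal Python sketch assumes a flat ΛCDM model with the parameters given above (radiation neglected) and integrates the comoving distance numerically. The function names and the example redshift of 4.7 are illustrative assumptions and are not taken from the ALPINE pipeline.

```python
import math

C_KM_S = 299792.458   # speed of light in km/s
H0 = 70.0             # km/s/Mpc, value adopted in the text
OMEGA_M = 0.30
OMEGA_L = 0.70


def e_of_z(z):
    """Dimensionless Hubble parameter E(z) for flat LambdaCDM (no radiation)."""
    return math.sqrt(OMEGA_M * (1.0 + z) ** 3 + OMEGA_L)


def comoving_distance_mpc(z, steps=10000):
    """Line-of-sight comoving distance in Mpc, trapezoidal integration of 1/E(z)."""
    dz = z / steps
    total = 0.5 * (1.0 / e_of_z(0.0) + 1.0 / e_of_z(z))
    for i in range(1, steps):
        total += 1.0 / e_of_z(i * dz)
    return (C_KM_S / H0) * total * dz


def kpc_per_arcsec(z):
    """Proper transverse scale in kpc subtended by one arcsecond at redshift z."""
    d_a_mpc = comoving_distance_mpc(z) / (1.0 + z)   # angular diameter distance
    arcsec_in_rad = math.pi / (180.0 * 3600.0)
    return d_a_mpc * 1000.0 * arcsec_in_rad


# Illustrative redshift (assumption): z = 4.7
scale = kpc_per_arcsec(4.7)
beam_kpc = 0.85 * scale   # average beam minor axis of 0.85 arcsec
print(f"{scale:.2f} kpc/arcsec, 0.85 arcsec beam = {beam_kpc:.1f} kpc")
```

Under these assumptions, a 0.85″ beam corresponds to a physical scale of order 5 kpc, in line with the value quoted for the survey.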

Science goals
The main goals of ALPINE at 4 < z < 6 are broadly defined as follows:
Characterize the use of [CII] as a SFR indicator at these epochs. The prevalence of [CII] in high-z galaxies is a promising tool to estimate SFRs of FIR continuum-faint galaxies. While local studies find a good correlation between [CII] and SFR, this relation may change at the lower metallicities of high-z galaxies. There is also evidence that [CII] is often emitted from the diffuse CNM or HII regions in addition to PDRs (Herrera-Camus et al. 2017; Pineda et al. 2013; Vallini et al. 2015), and it also traces the diffuse ionized gas (see Pavesi et al. 2016, 2019). ALPINE allows calibrating this relation by comparing [CII]-derived SFR to other indicators (FIR, UV, SED) over a large range of physical properties.
A comprehensive and precise (accuracy better than 20%) measurement of the SFRD at these epochs from the UV+FIR continuum and [CII] emission allows us to constrain the mechanisms which fuel the initial growth of typical galaxies in the early universe. The total SFRD at z > 4, a key epoch in galaxy assembly, is a crucial element in understanding galaxy formation. However, it remains a difficult observational measurement, as we do not yet know how much of the star formation is hidden from the wealth of existing deep UV observations. Only a survey of sources selected on the basis of their FIR emission could settle the question, but this is currently not feasible for a statistically representative sample given the small field of view of the ALMA telescope. With ALPINE, we follow a stepped approach. Starting from well-studied sources in the UV, the goal is to obtain FIR continuum and [CII] measurements to measure the fraction of their star formation that is hidden by dust. Combined with the SFR derived from the UV continuum, this would deliver the total star formation of UV-selected samples.
Estimate the remaining fraction of the star formation that may not be traced by UV signposts. In addition to the above, we aim to use the serendipitous survey assembled by ALPINE on a total area of about 25 square arcminutes to estimate what fraction of star formation is missed in obscured sources. Together these measurements will result in a first estimate of the [CII] luminosity function and the total SFRD at 4 < z < 6 consolidated from the UV and FIR.
The first detailed characterization of the ISM properties using L_FIR/L_UV and [CII]/FIR diagnostics. The evolution of [CII] emission and its resolved velocity profile provide important information on the SFR and ISM properties, setting constraints on the dynamical and gas masses of galaxies. The morphology of the [CII] emission, and more generally of the gas distribution, indicates whether star formation is compact or extended, which is an important element to understand with regard to high-z star formation (e.g., Wellons et al. 2016). Beyond the line flux, the FIR continuum emission constrains the total FIR luminosity and provides a good measurement of the total SFR when combined with UV continuum measurements. The continuum flux can also be combined with UV color and luminosity to construct the IRX-β diagnostic, providing insight into the spatial distribution of dust, dust grain properties, and metallicity (Reddy et al. 2012; Faisst et al. 2017). The [CII] line has also been used as a tracer of the molecular gas content (Hughes et al. 2017; Zanella et al. 2018), which is shown to be more reliable in low-metallicity environments than, for example, CO.
Star-forming main sequence and merger rates at z > 4. The SFRD across cosmic time as well as the shape and scatter of the star-forming main sequence (MS) in the SFR-stellar mass plane provide important constraints on the starburst duty cycle and merger rate of galaxies (Rodighiero et al. 2011; Guo et al. 2013; Tomczak et al. 2016; Tacchella et al. 2015) and ultimately their mass growth (cold accretion vs. merger paradigm; Dekel et al. 2009; Davé et al. 2011; Tasca et al. 2014; Faisst et al. 2017). The independent SFR measurements from FIR and [CII], in addition to the SED fitting, are meant to help constrain the true scatter of the MS at z > 4 as a function of stellar mass. The comparison to Hα-based SFRs on <100 Myr timescales from Spitzer colors (Faisst et al. 2016), or later from JWST spectroscopy, is aimed at allowing us to put constraints on the starburst duty cycles. The serendipitous [CII] detections of UV-faint galaxies in interaction with the main targets (extrapolating from Capak et al. 2015, we expected to find ∼15-30 mergers in ALPINE) allow us to model the merger rates and thus constrain the dominant mode of mass build-up of galaxies in the early universe.
A first measurement of dynamical masses from spectrally resolved [CII], combined with stellar masses and statistical estimates of dark matter halo masses to measure dust content, gas fraction, and their evolution. The [CII] line is an excellent tracer of gas dynamics, which can be used in the same way as Hα at z < 2 (e.g., Förster Schreiber et al. 2009; Epinat et al. 2012; Molina et al. 2017). The ALPINE observations are aimed at providing a first approach at dynamical measurements at 4 < z < 6, deep into the epoch of early galaxy formation. Dynamical masses (M_dyn) can be estimated from the velocity dispersion, σ_vel, derived from the [CII] line emission, while the brightest galaxies resolved in [CII] enable a direct measurement of the [CII] extent. It is expected that resolved galaxy sizes (Ribeiro et al. 2016) lead to a useful dynamical mass measurement for ∼20% of the sample. Upper limits are derived for the rest of the sample.
Gas fractions at z > 4: Comparing M_dyn to reliable emission-line-corrected stellar masses from deep Spitzer imaging (Laigle et al. 2016; Faisst et al. 2019) and gas mass estimates from FIR measurements (e.g., Scoville et al. 2016) puts the first direct constraints on the gas fractions f_gas at z > 4. The contribution of dark matter to the total velocity component at 1-2 R_e, where R_e is the effective radius of the galaxy, is expected to be low (Barnabè et al. 2012), but can be estimated by matching our sample to the output of state-of-the-art hydro-dynamical simulations such as EAGLE (McAlpine et al. 2016), FIRE (Hopkins et al. 2014), or Illustris TNG (Davé et al. 2019), or by estimating average dark halo masses from HOD modeling of the correlation function (Durkalec et al. 2015). Combined with SFRs derived from UV, FIR, [CII], and Hα, this will constrain SFR efficiencies and gas depletion times, thus providing insight into the emergence and growth of massive galaxies (Tacconi et al. 2013, 2018; Genzel et al. 2015; Dessauges-Zavadsky et al. 2017).
The role of feedback processes in the early Universe. The prevalence of feedback processes from galactic winds produced by massive stars, supernovae, and AGN is studied using kinematic diagnostics and [CII] line profiles (see, from ALPINE, Ginolfi et al. 2019).

Sample selection
The sample is drawn from large spectroscopic survey samples of normal SFGs in the COSMOS (Scoville et al. 2007a) and ECDFS fields (Giacconi et al. 2002).
A key element of the selection is based on galaxies having a reliable spectroscopic redshift in 4.4 < z_spec < 5.9 (⟨z_spec⟩ ∼ 4.7), excluding 4.65 < z < 5.05 (where [CII] falls in a low-transmission atmospheric window). Galaxies are UV-selected (see Le Fèvre et al. 2015, for more details) with L_UV > 0.6 L* to include most of the star formation traced by the UV, and excluding type I AGN identified from broad spectral lines. Accurate redshifts come from extensive spectroscopic campaigns at the VLT (VUDS, Le Fèvre et al. 2015) and Keck (DEIMOS, Hasinger et al. 2018). While the VUDS sample is unbiased against Lyman-α emitters or absorbers (Khusanova et al. 2020), Lyman-α emitters may be over-represented in the DEIMOS sample; this will need to be taken into account in subsequent analysis of the whole ALPINE sample. Galaxies in the parent sample were mostly selected on the basis of photometric redshifts from SED fitting, followed by UV spectroscopy to secure the redshift (see Le Fèvre et al. 2015; Hasinger et al. 2018 for more details). The absolute UV luminosity cut (M_UV < −20.2) is equivalent to SFR > 10 M⊙ yr⁻¹, as seen in Fig. 1. Assuming the De Looze et al. (2014) relation, this SFR limit is equivalent to L_[CII] > 1.2 × 10^8 L⊙. Some galaxies with SFR below ∼1 solar masses per year were included when made possible by the observational setup. This sample is representative of the overall SFG population, rather than ultra luminous infrared galaxies (ULIRGs), that is, it consists mostly of galaxies positioned on or near the so-called main sequence in the SFR versus M_star plane observed at these redshifts (e.g., Speagle et al. 2014; Tasca et al. 2015; Tomczak et al. 2016; Pearson et al. 2018; Khusanova et al. 2020), with M_star and SFR derived from SED fitting with the redshift fixed to the spectroscopic redshift. We selected 118 galaxies based on these criteria. More details on the sample properties are given in Sect. 4.
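The redshift limits above map directly onto observed [CII] frequencies, which is what ties the selection to the ALMA band and to the atmospheric transmission. The short Python sketch below illustrates this mapping; it assumes a [CII] rest frequency of 1900.537 GHz, and the helper names are hypothetical rather than part of any ALPINE software.

```python
NU_REST_CII_GHZ = 1900.537   # rest frequency of the [CII] 158 micron line (GHz)


def observed_frequency_ghz(z):
    """Observed frequency of [CII] for a source at redshift z."""
    return NU_REST_CII_GHZ / (1.0 + z)


def in_redshift_selection(z):
    """Redshift window quoted in the text: 4.4 < z < 5.9, excluding 4.65 < z < 5.05."""
    return 4.4 < z < 5.9 and not (4.65 < z < 5.05)


# Frequencies at the edges of the selection and of the excluded window
for z in (4.4, 4.65, 5.05, 5.9):
    print(f"z = {z:4.2f} -> [CII] observed at {observed_frequency_ghz(z):6.1f} GHz")
```

With these numbers, the excluded redshift interval corresponds to observed frequencies between roughly 314 and 336 GHz.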
This program was awarded an ALMA Large Program status under number 2017.1.00428.L for a total of 69 h, including on-source time, calibrations, and overheads. ALMA observations were carried out in Band-7 starting in May 2018 during Cycle 5 and completed in February 2019 in Cycle 6. Each target was observed for between about 30 minutes and one hour of integration time, with the phase centers pointed to the UV rest-frame positions of the sources. The availability of spectroscopic redshifts allowed us to accurately set the main spectral window on the expected [CII] frequencies. In order to minimize overheads, targets making use of a similar setup were grouped. Two spectral windows were placed in the side band containing the [CII] lines of the two grouped sources. In addition, we placed two spectral windows to cover the other side band. They are used for FIR continuum measurements, an important component of this program, as continuum measurements can be used as a proxy for the total FIR luminosity and, therefore, for the SFR. The main calibration for the [CII]-SFR relation partly relies on those galaxies with continuum data.

Observational data
At these redshifts, the velocity width of one ALMA bandpass in Band-7 is as narrow as ∼3000 km s⁻¹ per sideband. Samples with photometric redshifts accurate to ∼0.05 × (1 + z) at these redshifts (e.g., Ilbert et al. 2013) would have added a considerable uncertainty on the detection of [CII] and the associated incompleteness, making it more hazardous to build, for example, the [CII] luminosity function. With an accuracy of a few hundred km s⁻¹ even at low spectral resolution from optical (UV rest-frame) spectroscopy (Le Fèvre et al. 2015), the availability of z_spec is therefore a key element of this program, ensuring a high [CII] detection rate and, for those galaxies that remain undetected, stringent upper limits.
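The argument above can be made concrete by converting a redshift uncertainty into a velocity uncertainty, Δv = c Δz / (1 + z). The Python sketch below is only illustrative: the example redshift z = 5 and the variable names are assumptions, and the sideband width is the approximate value quoted in the text.

```python
C_KM_S = 299792.458   # speed of light in km/s


def velocity_uncertainty_km_s(dz, z):
    """Velocity uncertainty corresponding to a redshift uncertainty dz at redshift z."""
    return C_KM_S * dz / (1.0 + z)


z = 5.0                               # illustrative redshift (assumption)
photo_dz = 0.05 * (1.0 + z)           # photometric redshift accuracy quoted in the text
dv_photo = velocity_uncertainty_km_s(photo_dz, z)

sideband_width = 3000.0               # km/s, approximate width of one sideband
ratio = dv_photo / sideband_width

print(f"photo-z uncertainty: {dv_photo:.0f} km/s, {ratio:.1f} x the sideband width")
```

A photometric redshift accuracy of 0.05 × (1 + z) corresponds to about 15 000 km s⁻¹ at any redshift, several times the width of a sideband, whereas a spectroscopic accuracy of a few hundred km s⁻¹ fits comfortably within it.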
We use the TDM mode of the ALMA correlator, which offers the largest bandwidth to optimize the continuum sensitivity. The resolution varies with redshift from 26 to 35 km s⁻¹. We assume a 235 km s⁻¹ FWHM line width (or sigma ∼100 km s⁻¹), which is the average width measured in the Capak et al. (2015) sample. Emission lines were thus expected to be spectrally resolved, giving the possibility to measure the line width when the signal-to-noise ratio (S/N) is sufficient.
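The correspondence between the FWHM and sigma quoted above follows from the standard Gaussian relation FWHM = 2 sqrt(2 ln 2) σ. A minimal Python sketch, assuming a Gaussian line profile (the variable names are illustrative), also shows how many channels such a line would span at the two ends of the quoted resolution range.

```python
import math

# Conversion factor between FWHM and sigma for a Gaussian profile (~0.4247)
FWHM_TO_SIGMA = 1.0 / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def fwhm_to_sigma(fwhm):
    """Gaussian sigma corresponding to a given FWHM (same units)."""
    return fwhm * FWHM_TO_SIGMA


line_fwhm = 235.0                      # km/s, assumed line width from the text
sigma = fwhm_to_sigma(line_fwhm)
print(f"sigma = {sigma:.1f} km/s")

# Number of channels across the FWHM at the two ends of the resolution range
channels = {width: line_fwhm / width for width in (26.0, 35.0)}
for width, n in channels.items():
    print(f"{width:.0f} km/s channels: {n:.1f} channels across the FWHM")
```

Under this assumption the line is sampled by roughly seven to nine channels across its FWHM, consistent with the expectation that lines are spectrally resolved.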
We prioritized detection over spatial resolution: with the typical size of the Capak et al. (2015) sources being 0.5-0.7 arcsec, we elected to use ALMA array configurations (C43-1 or C43-2) offering a beam size not smaller than 0.7 arcsec. The minimum achieved minor-axis size of the ALMA beam is 0.72 arcsec and the average beam size is 0.85″ × 1.13″.
We provide the ALPINE source list in Table A.1, including (RA, Dec) positions, spectroscopic redshift, and [CII] S/N when detected above 3.5σ. This is the threshold at which our simulation indicates a 95% reliability. As explained in Bethermin et al. (2020, see Sect. 6.2 for a full description of the purity estimate), where the data analysis process and the validation simulations are described, we can use this low detection threshold because we are searching for S/N peaks close, both spatially and spectrally, to the candidates, and the number of detectable objects is high. In other words, searching for a line in a reduced number of beams and channels reduces the expected number of spurious detections from noise fluctuations.
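The last point can be illustrated with a back-of-the-envelope estimate. The Python sketch below assumes independent Gaussian noise, which is a simplification (interferometric noise is correlated, and the purity quoted above comes from dedicated simulations, not from this formula). The numbers of resolution elements are purely hypothetical and only serve to contrast a blind search with a targeted one.

```python
import math


def gaussian_tail_probability(threshold_sigma):
    """One-sided probability that Gaussian noise exceeds threshold_sigma."""
    return 0.5 * math.erfc(threshold_sigma / math.sqrt(2.0))


def expected_spurious_peaks(n_independent_elements, threshold_sigma):
    """Expected number of noise peaks above the threshold among independent elements."""
    return n_independent_elements * gaussian_tail_probability(threshold_sigma)


p = gaussian_tail_probability(3.5)

# Hypothetical element counts (beams x channels); not ALPINE values
blind = expected_spurious_peaks(100_000, 3.5)   # search over a whole cube
targeted = expected_spurious_peaks(50, 3.5)     # search close to a known target

print(f"P(> 3.5 sigma) = {p:.2e}")
print(f"blind search: {blind:.1f} spurious peaks, targeted search: {targeted:.3f}")
```

The expected number of spurious peaks scales linearly with the number of elements searched, which is why restricting the search to the vicinity of a known position and redshift permits a lower threshold.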

Ancillary data
The choice of the COSMOS and ECDFS fields for the selection of the sample is driven by the availability of z_spec as described above and also by the large suite of multi-wavelength data available in these fields, as is extensively described in Faisst et al. (2019). These data enable us to construct the SED for each galaxy, which is essential for measuring the fundamental physical parameters needed to assess the general properties of the observed galaxy population.
All ALPINE targets in the COSMOS field (Scoville et al. 2007b) are covered by HST F814W i-band imaging (Koekemoer et al. 2007), Subaru optical imaging (Taniguchi et al. 2007), deep NIR YJHK imaging from the UltraVista Survey (McCracken et al. 2012), and Spitzer 3.6 and 4.5 µm imaging (Sanders et al. 2007), with good X-ray coverage with both XMM-Newton (Hasinger et al. 2007) and Chandra (Elvis et al. 2009; Civano et al. 2016), as well as radio coverage (Smolčić et al. 2017). While most of the imaging has point-spread functions (PSFs) with a spatial resolution similar to that of the ALMA imaging, typically FWHM ∼ 0.8 arcsec, the HST F814W i-band imaging provides a sharper look with a PSF ∼ 0.1 arcsec. A source catalog with matched photometry is available as described in Laigle et al. (2016). The data that we use in the area of the ECDFS are from the CANDELS survey (Grogin et al. 2011; Koekemoer et al. 2011). The derivation of key physical quantities from the SED fitting, including stellar mass, SFR, and dust extinction, is described in detail in Faisst et al. (2019).

Redshift distribution
The redshift distribution of the ALPINE sample is presented in Fig. 2 for all galaxies as well as those with [CII] detected at more than 3.5σ above the noise. There are two separate peaks in the N(z) owing to the atmospheric visibility windows.
Comparing the redshifts obtained from the UV spectra to those obtained from [CII], which defines the systemic redshift of the gas component, allows us to probe the velocity difference between the stellar and gas components. This is explored in Faisst et al. (2019).

Detection rate in [CII] and continuum
The S/N values of the integrated [CII] line flux detections are presented in Fig. 3. The median S/N is 6.2 for the detected objects. See Bethermin et al. (2020) for more details. Taking 3.5σ as a conservative detection limit (purity > 95%) owing to the somewhat correlated noise of ALMA interferometric imaging, ALPINE detected [CII] in 75 galaxies out of 118, hence a success rate of 64% (40% for S/N > 5), as presented in Table A.1. In the continuum adjacent to [CII], 25 galaxies, or 21% (9% for S/N > 5), are detected with S/N > 3.5 (95% purity, Bethermin et al. 2020). These rates are high given the redshift of the sources and the short integration times. The lower continuum detection rate is as expected from the SED models in the FIR (Béthermin et al. 2017). Stacking of the continuum data will allow us to place useful constraints on the fainter emitters. The S/N of most other targets varies in the range from 0.5 to 3, providing useful upper limits.
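The quoted detection rates follow directly from the source counts. As an illustration, the Python sketch below recomputes them and attaches a simple binomial standard error. This uncertainty estimate is an assumption added here for orientation and is not a quantity reported in the text.

```python
import math


def detection_rate(n_detected, n_total):
    """Return the detection fraction and its binomial standard error."""
    rate = n_detected / n_total
    std_err = math.sqrt(rate * (1.0 - rate) / n_total)
    return rate, std_err


N_TARGETS = 118

rate_cii, err_cii = detection_rate(75, N_TARGETS)     # [CII] detections, S/N > 3.5
rate_cont, err_cont = detection_rate(25, N_TARGETS)   # continuum detections, S/N > 3.5

print(f"[CII]:     {100 * rate_cii:.0f}% +/- {100 * err_cii:.0f}%")
print(f"continuum: {100 * rate_cont:.0f}% +/- {100 * err_cont:.0f}%")
```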
We identified a number of line emitters detected serendipitously in the data cubes. These are to be matched with the ancillary source catalogs in these fields with the aim of identifying whether the line is [CII] at the redshift of the targeted sample, or some other line at lower redshift (Loiacono et al. 2020).

Positional offsets between [CII] and UV rest-frame
Positional offsets between the [CII] flux distribution and the UV rest-frame images have been reported for galaxies at similar redshifts as the ALPINE sample (e.g., Carniani et al. 2017). Offsets from frame centers representing the UV position are clearly evident for some objects in Figs. 4 through A.5. A preliminary analysis shows that for most (90%) of the sources the differences in RA and Dec are well represented by Gaussians centered at zero offset with a sigma of about 0.18″, which is consistent with the typical uncertainty in position of the ALMA sources, and that there is a significant offset (up to a maximum of ∼1″) present only for 10% of the sources. See Faisst et al. (2019) for more details.

[CII] flux maps
[…] which can be a posteriori confirmed to be different from the UV (see Bethermin et al. 2020). These flux maps are compared to HST i-band F814W images (Koekemoer et al. 2007, 2011) in Figs. 5, A.2, A.4, and A.6, representing the UV rest-frame, with [CII] contours overlaid.
These images give a first view of the shape of the [CII] emission in normal galaxies at 4 < z < 6. There are several facts worth noting. Even though the observations were carried out with a beam size providing moderate spatial resolution with FWHM ∼ 0.7″, about two thirds of the sources are resolved in [CII]. This means that intrinsic (total) sizes as seen in atomic gas must be about the size of the beam, or a significant fraction thereof, hence physical source sizes reaching several kiloparsecs. By itself this fact gives an indication that physical processes at work in those galaxies are puffing up their sizes beyond being compact. Another striking feature of these images is the large diversity of [CII] emission morphology. Some objects appear very extended (e.g., bottom-left object COSMOS 881725 in Fig. A.3), some others show double merger-like components (e.g., third row, second-from-left object COSMOS 351640 in Fig. 4), while others are compact (unresolved). This diversity must also reflect a diversity in the physical processes at work.
We focus more on this in attempting a visual classification in the following section, Sect. 6, and future papers will concentrate on the quantitative properties of the [CII] emission in the different classes of the ALPINE sample.

Morpho-kinematic classification
As presented in Sect. 5, the [CII] emission appears spatially very diverse. As we have at our disposal not only the flux maps, but also the full (α, δ, velocity) 3D data-cubes for all sources as well as all ancillary information presented in Sect. 3.2, we are able to perform an empirical visually-based morpho-kinematic classification, as discussed below.
The observational data were assembled in one slide per object, including the ALMA data with flux map, the channel maps cut in 25 km s⁻¹ velocity intervals, the velocity field map (moment 1), position-velocity (PV) diagrams projected along the major and minor axes of the velocity map, and the integrated 1D [CII] spectrum, together with multi-band optical and NIR images including HST ACS F814W i-band and/or WFC3-F160W images when available, as presented in Fig. A.7. All these data served as input to the object classification described below.
Following previous work from 3D integral field spectroscopy using Hα (e.g., Förster Schreiber et al. 2009; Epinat et al. 2012), we define the following morpho-kinematic classes:
- Class 1: Rotator. This class is defined for galaxies satisfying the following criteria:
  - Smooth transition between intensity channel maps
  - Obvious gradient in velocity field map (moment 1)

All the observational material described above was then visually inspected, independently, by eight people in the team, each assigning a class to each ALPINE object. This provided a statistical basis to estimate the mode of the classification for each object and a rough dispersion obtained as the average of the difference between the mode and each individually-measured class. To mitigate somewhat the well-known effect of a dominant class not necessarily being "the truth" (e.g., most people missing some key evidence while only a few spotted it), the mode and extremes of the classification were examined by two people who proposed to the team the class satisfying all identified evidence. One last iteration was then performed, with individual team members asked to identify the objects with whose classification they most disagreed, followed by a last round to agree on a final classification. This led to the final classification listed in Table A.1.
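The mode and rough dispersion described above can be sketched in a few lines of Python. This is only an illustration: the votes are hypothetical, the difference is taken in absolute value, and ties are broken towards the lowest class number, none of which is specified in the text.

```python
from collections import Counter


def classification_summary(votes):
    """Return (mode, dispersion) for a list of integer class votes.

    Assumptions: ties are broken towards the lowest class number, and the
    dispersion is the mean absolute difference between each vote and the mode.
    """
    counts = Counter(votes)
    highest = max(counts.values())
    mode = min(cls for cls, n in counts.items() if n == highest)
    dispersion = sum(abs(v - mode) for v in votes) / len(votes)
    return mode, dispersion


# Hypothetical votes from eight independent classifiers for one object
votes = [2, 2, 2, 3, 2, 1, 2, 3]
mode, dispersion = classification_summary(votes)
print(f"mode = class {mode}, dispersion = {dispersion:.3f}")
```

Because the class numbers are labels rather than an ordered scale, such a dispersion is only a coarse indicator of disagreement, which is why the subsequent iterations among team members matter.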
In order to classify these galaxies using our qualitative methods, bright line emission must be present in multiple consecutive channels. For a given S/N, a narrow line will thus be much easier to classify than a broad line, as the S/N per channel is higher. This is seen in VUDS_COSMOS_510605533 (S/N_[CII] = 4.9, class 5, broad line) vs CANDELS_GOODSS_21 (S/N_[CII] = 4.2, class 4, narrow line). While the S/N_[CII] in Table A.1 is an indicator of the overall line strength, the classification depends on both the S/N_[CII] and FWHM.
It is also possible that noise peaks in the data cube could be interpreted as galaxy components, earning the galaxy a merger classification (class 2). To reduce this error, we examined the integrated spectrum, moment maps, and channel maps, ignoring emission peaks that were narrow and kinematically distant from the galaxy. It should be noted that simulations of z > 4 galaxies (e.g., Pallottini et al. 2017; Kohandel et al. 2019) show many minor satellites around each galaxy on small scales, which suggests that our relatively low resolution survey may underestimate the true number of mergers.
The distribution in the different classes is presented in Fig. A.8. We find 13.3% rotating discs in class 1, 40% of galaxies in the merger class 2, 20% in the extended and dispersion-dominated class 3, 10.7% compact in class 4, and the remaining 16% of the sample being too difficult to classify (class 5). If we consider only the S/N_[CII] > 5 objects, the distribution is slightly different: 17% of rotating discs, 51% of mergers, 32% of extended and dispersion-dominated systems, 0% of compact systems, and 0% of objects too faint to be classified. It is not surprising that there is no object in this last category at higher S/N. No object is found in the compact class either. This may be caused by the higher S/N threshold, which could be biased against lower mass and lower SFR systems. We note that the relative contribution of the three other classes (1, 2, and 3) does not change significantly (<2σ). We find that while our sample contains both mergers and rotators, there is also a large number of dispersion-dominated sources. This diversity of kinematic classes for a sample of galaxies with similar SFRs and UV characteristics suggests that the evolutionary tracks of these galaxies in the early Universe (0.9-1.4 Gyr after the Big Bang) already had significant variations. Systematic studies of the morpho-kinematic classification as a function of the galaxy properties will be presented in future papers (Romano et al. in prep., Jones et al. in prep.).
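The cosmic ages quoted above can be checked against the adopted cosmology. The following Python sketch uses the closed-form age of a flat ΛCDM universe (radiation neglected) with H0 = 70 km s⁻¹ Mpc⁻¹, Ωm = 0.30, and ΩΛ = 0.70, evaluated at the edges of the sample redshift range, 4.4 and 5.9. The function name and unit constants are illustrative choices.

```python
import math

H0 = 70.0                               # km/s/Mpc, value adopted in the paper
OMEGA_M = 0.30
OMEGA_L = 0.70
KM_PER_MPC = 3.0856775814913673e19      # kilometres in one megaparsec
SEC_PER_GYR = 3.15576e16                # seconds in 1e9 Julian years


def age_gyr(z):
    """Age of a flat LambdaCDM universe at redshift z, in Gyr (no radiation)."""
    hubble_time_gyr = KM_PER_MPC / H0 / SEC_PER_GYR
    x = math.sqrt(OMEGA_L / OMEGA_M) * (1.0 + z) ** -1.5
    return hubble_time_gyr * 2.0 / (3.0 * math.sqrt(OMEGA_L)) * math.asinh(x)


for z in (5.9, 4.4):
    print(f"z = {z}: age = {age_gyr(z):.2f} Gyr")
```

Under these assumptions the ages at z = 5.9 and z = 4.4 bracket the 0.9-1.4 Gyr interval given in the text.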
We note the high fraction of mergers (40%), which indicates that mass assembly through merging is frequent at these redshifts for normal main sequence SFGs. This value is significantly higher than that of Ventou et al. (2017; $8.3^{+7.6}_{-3.7}$% at 4 < z < 6), but their objects have $\log(M_\star) < 9.5$, while most ALPINE objects are more massive. The methods for classifying mergers and the observed wavelength are also very different. A preliminary examination of spatial and velocity information indicates that most merging systems would merge within 0.5 to 1 Gyr (e.g., the triple merger system presented by Jones et al. 2019), which means that most of these mergers would end up forming a single galaxy by z ∼ 2.5. Merging systems observed at sub-mm wavelengths have been reported previously, but for more starburst-like objects (e.g., recently, Danielson et al. 2017; Tadaki et al. 2018; Zhang et al. 2018; Díaz-Santos et al. 2018; Hodge et al. 2019). The presence of extended [CII] nebulae in class 3 is also quite striking, as it indicates that large extended gas reservoirs were readily available to fuel star formation right after reionization ended.
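The conversion from a merging timescale of 0.5 to 1 Gyr to a redshift by which the systems would have coalesced can be sketched as follows. This is a minimal illustration that assumes a flat ΛCDM cosmology with H0 = 70 km s$^{-1}$ Mpc$^{-1}$ and Ωm = 0.3 (values not stated in this section) and uses the analytic age-redshift relation for a matter plus Λ universe. The function names are illustrative.

```python
import math

# Assumed cosmology (not given in this section): flat Lambda-CDM.
H0 = 70.0                      # km/s/Mpc
OMEGA_M = 0.3
OMEGA_L = 1.0 - OMEGA_M
HUBBLE_TIME_GYR = 977.8 / H0   # 1/H0 in Gyr (977.8 Gyr for H0 = 1 km/s/Mpc)


def age_gyr(z):
    """Age of a flat matter + Lambda universe at redshift z (analytic form)."""
    x = math.sqrt(OMEGA_L / OMEGA_M) * (1.0 + z) ** -1.5
    return 2.0 / (3.0 * math.sqrt(OMEGA_L)) * math.asinh(x) * HUBBLE_TIME_GYR


def redshift_at_age(t_gyr):
    """Inverse of age_gyr: redshift at which the universe has age t_gyr."""
    s = math.sinh(1.5 * math.sqrt(OMEGA_L) * t_gyr / HUBBLE_TIME_GYR)
    return (math.sqrt(OMEGA_L / OMEGA_M) / s) ** (2.0 / 3.0) - 1.0


z_obs = 4.7  # redshift quoted for the survey in the figure captions
for delay_gyr in (0.5, 1.0):
    z_merged = redshift_at_age(age_gyr(z_obs) + delay_gyr)
    print(f"observed at z = {z_obs}, merged {delay_gyr} Gyr later at z = {z_merged:.2f}")
```

Under these assumptions, a system observed at z = 4.7 that merges within 1 Gyr has coalesced before z ∼ 2.5.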
Simulations and extrapolations from local studies suggest that the merger rate was high at z > 4 (e.g., Mantha et al. 2018; Kohandel et al. 2019), and targeted [CII] observations of normal z > 4 galaxies have revealed some galaxies with clumpy morphologies (Carniani et al. 2018), which are likely caused by mergers (Calabrò et al. 2019). However, several observations have revealed evidence for ordered rotation (e.g., De Breuck et al. 2014; Jones et al. 2017; Smit et al. 2018; Bakx et al. 2020; Tadaki et al. 2020), so the true distribution of kinematic states has remained unknown. For the first time, we can examine the kinematics of a statistically significant number of galaxies in order to determine this distribution.
The properties of galaxies in these different samples will be extensively described in future papers from the ALPINE team. It may well be that some objects in class 4 would appear as rotators when observed at higher spatial resolution, as seen in near-IR integral field observations of z ∼ 2 massive galaxies (e.g., Förster Schreiber et al. 2018). On the other hand, galaxies at z = 1−3 with stellar masses comparable to those of the ALPINE sample show a larger proportion of dispersion-dominated galaxies than galaxies with higher masses.

Summary and conclusions
The ALMA-ALPINE [CII] survey (A2C2S) provides an unprecedented view of a representative sample of 118 star-forming galaxies observed during their assembly right after the end of HI reionization, at redshifts 4 < z < 6. Galaxies are selected on the basis of an existing reliable spectroscopic redshift, using the SED-based SFR to predict the [CII] flux from the De Looze et al. (2014) relation and selecting SFRs such that $L_{\rm [CII]} > 1.2 \times 10^{8}\,L_\odot$. The overall detection rate is 64% in [CII] for galaxies detected at 3.5σ above the noise, and 21% in the continuum. We present the ALPINE survey strategy and sample properties, along with the projected [CII] flux maps. Combining these maps with velocity channel maps, velocity fields, and all available ancillary information, we have established a classification scheme. We find a surprisingly wide range of galaxy types, including 40% mergers, 20% extended and dispersion-dominated, 13.3% rotating discs, and 10.7% compact, with the remaining 16% too faint to be classified. This diversity of types indicates that several physical processes are at work in the assembly of mass in these galaxies, first and foremost galaxy merging. While galaxy merging is commonly associated with starbursts above the main sequence (MS), at least up to z ∼ 3, merging systems in ALPINE at z ∼ 4.7 lie mainly on the MS; therefore, merging is also a dominant process for normal SFGs at this epoch. This will be further investigated in future papers.
The ALPINE sample offers a unique opportunity to study galaxies in the process of assembly. This paper is the first in a series, and future papers will present analyses of specific populations as well as general statistical properties. ALPINE galaxies are ideally suited for follow-up with future facilities such as the James Webb Space Telescope (JWST) and the Extremely Large Telescopes (ELTs).
3.1. ALMA
Here we summarize the observations, pointing to Béthermin et al. (2020) for an extensive description of observations and data processing, following the best practice with customized pipelines based on the Common Astronomy Software Applications (CASA) tools (McMullin et al. 2007).

Fig. 1. Stellar mass vs. SFR distribution of ALPINE sources as selected prior to ALMA observations, colored by expected [CII] line peak flux estimated from Capak et al. (2015). Stellar masses and SFRs were obtained from SED fitting of the multi-wavelength photometry available at the start of this program (see Faisst et al. 2019 for more details). The cross represents the average 1σ uncertainty. The large proposed sample aims to quantify average trends over a large range of galaxy properties. Those galaxies detected at >3σ in [CII] are identified with crosses. ALPINE provides [CII] emission line measurements at >3.5σ for 63% of these galaxies (see text).

Fig. 2. Redshift distribution of the ALPINE sample, using the spectroscopic redshifts measured from the UV rest-frame spectra (Le Fèvre et al. 2015; Hasinger et al. 2018). The empty histogram is for all observed sources, while the red shaded histogram is for those sources with [CII] measured at more than 3.5σ above the noise.

Fig. 3. S/N of the integrated [CII] line flux for those sources with [CII] measured at more than 3.5σ above the noise.
Images in the [CII] line of sources with [CII] detected at more than 3.5σ are presented in Figs. 4, A.1, A.3, and A.5. These so-called [CII] "flux images" (flux maps hereafter) are produced using the CLEAN algorithm and the immoments routine, optimally extracted using an iterative process that determines the line profile of the source and then collapses the channels within [f_cen − FWHM; f_cen + FWHM], where f_cen is the central frequency of [CII],

Fig. 4. Velocity-integrated [CII] flux maps obtained by collapsing the cube channels containing the [CII] line (see text). Each panel is 5″ × 5″, or about 33 × 33 kpc at the mean redshift z = 4.7 of the survey, centered on the position of the source in the UV rest-frame based on HST F814W images. The object name and morpho-kinematic class (see Sect. 6) are indicated on top of each sub-panel.
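The channel collapse behind these flux maps can be sketched as follows. This is a toy illustration in plain Python, not the CASA immoments implementation used for the survey: the function name, the list-based cube layout, and the numerical values are all hypothetical. It sums the channels whose frequency lies within [f_cen − FWHM; f_cen + FWHM] and scales by an assumed channel width.

```python
def collapse_channels(cube, freqs_ghz, f_cen_ghz, fwhm_ghz, chan_width_kms):
    """Sum the channels within [f_cen - FWHM, f_cen + FWHM] into a 2D map.

    cube is a list of 2D planes (lists of rows), one per channel, in Jy/beam;
    the returned map is in Jy/beam km/s.
    """
    lo, hi = f_cen_ghz - fwhm_ghz, f_cen_ghz + fwhm_ghz
    selected = [plane for plane, f in zip(cube, freqs_ghz) if lo <= f <= hi]
    if not selected:
        raise ValueError("no channel falls inside the line window")
    ny, nx = len(selected[0]), len(selected[0][0])
    return [[chan_width_kms * sum(plane[y][x] for plane in selected)
             for x in range(nx)]
            for y in range(ny)]


# Toy cube: 5 channels of 2 x 2 pixels (all values are made up).
freqs = [333.0, 333.1, 333.2, 333.3, 333.4]          # GHz
cube = [[[float(i), 0.0], [0.0, 1.0]] for i in range(5)]

# Window of +/- 0.15 GHz around 333.2 GHz keeps the three central channels.
flux_map = collapse_channels(cube, freqs, f_cen_ghz=333.2,
                             fwhm_ghz=0.15, chan_width_kms=25.0)
print(flux_map)
```

In the iterative procedure described in the text, the line centre and FWHM would be re-measured from the extracted spectrum and the collapse repeated until the window stabilizes; that loop is omitted here.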

Fig. 5. HST F814W images (Koekemoer et al. 2007, 2011) corresponding to the [CII] flux maps in Fig. 4. Each panel is 5″ × 5″, or about 33 × 33 kpc at the mean redshift z = 4.7 of the survey, centered on the position of the source in the UV rest-frame HST F814W images. The white contours represent the distribution of the [CII] flux from Fig. 4, and the grey-filled ellipse is the ALMA beam size.
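The conversion between the angular panel size and the physical size quoted in the captions can be checked with a short calculation. The sketch below assumes a flat ΛCDM cosmology with H0 = 70 km s$^{-1}$ Mpc$^{-1}$ and Ωm = 0.3 (not stated in this section) and integrates the comoving distance numerically with Simpson's rule. The function names are illustrative.

```python
import math

# Assumed cosmology (not given in this section): flat Lambda-CDM.
H0 = 70.0             # km/s/Mpc
OMEGA_M = 0.3
OMEGA_L = 1.0 - OMEGA_M
C_KMS = 299792.458    # speed of light in km/s


def inv_e(z):
    """1/E(z) for a flat matter + Lambda universe."""
    return 1.0 / math.sqrt(OMEGA_M * (1.0 + z) ** 3 + OMEGA_L)


def comoving_distance_mpc(z, steps=1000):
    """Line-of-sight comoving distance via composite Simpson integration.

    steps must be even.
    """
    h = z / steps
    total = inv_e(0.0) + inv_e(z)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * inv_e(i * h)
    return C_KMS / H0 * total * h / 3.0


def kpc_per_arcsec(z):
    """Proper transverse scale at redshift z, in kpc per arcsecond."""
    d_a_mpc = comoving_distance_mpc(z) / (1.0 + z)   # angular diameter distance
    arcsec_in_rad = math.pi / (180.0 * 3600.0)
    return d_a_mpc * 1000.0 * arcsec_in_rad


scale = kpc_per_arcsec(4.7)
panel_kpc = 5.0 * scale
print(f"{scale:.2f} kpc/arcsec, 5 arcsec panel = {panel_kpc:.1f} kpc")
```

With these assumed parameters, the 5″ panel comes out close to the roughly 33 kpc quoted in the captions; the exact value depends on the adopted cosmology.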

1)
  - Tilted PV along the major axis, straight PV on the minor axis
  - Possible double-horned profile in spectrum
  - Single component in ancillary data
- Class 2: Pair-Merger (major or minor), interacting system:
  - Complex behavior in channel maps
  - Separate components in flux maps or PV diagrams
  - Multiple components in ancillary data
- Class 3: Extended Dispersion dominated
  - No positional shift of emission across intensity channel maps
  - Straight PV diagrams
  - Extended beyond the ALMA beam in flux maps
- Class 4: Compact Dispersion dominated
  - No positional shift of emission across intensity channel maps
  - Straight PV diagrams
  - Unresolved in flux maps
- Class 5: too weak to be classified

Fig. A.2. HST F814W images (Koekemoer et al. 2007, 2011) corresponding to the [CII] flux maps in Fig. A.1. Each panel is 5″ × 5″, or about 33 × 33 kpc at the mean redshift z = 4.7 of the survey, centered on the position of the source in the UV rest-frame HST F814W images. The white contours represent the distribution of the [CII] flux from Fig. A.1, and the grey-filled ellipse is the ALMA beam size.

Fig. A.3. Velocity-integrated [CII] flux maps obtained by collapsing the cube channels containing the [CII] line (see text). Each panel is 5″ × 5″, or about 33 × 33 kpc at the mean redshift z = 4.7 of the survey, and centered on the position of the source in the UV rest-frame based on HST F814W images. The object name and morpho-kinematic class (see Sect. 6) are indicated on top of each sub-panel.

Fig. A.4. HST F814W images (Koekemoer et al. 2007, 2011) corresponding to the [CII] flux maps in Fig. A.3. Each panel is 5″ × 5″, or about 33 × 33 kpc at the mean redshift z = 4.7 of the survey, centered on the position of the source in the UV rest-frame HST F814W images. The white contours represent the distribution of the [CII] flux from Fig. A.3, and the grey-filled ellipse is the ALMA beam size.

Fig. A.5. Velocity-integrated [CII] flux maps obtained by collapsing the cube channels containing the [CII] line (see text). Each panel is 5″ × 5″, or about 33 × 33 kpc at the mean redshift z = 4.7 of the survey, and centered on the position of the source in the UV rest-frame based on HST F814W images. The object name and morpho-kinematic class (see Sect. 6) are indicated on top of each sub-panel.

Fig. A.6. HST F814W images (Koekemoer et al. 2007, 2011) corresponding to the [CII] flux maps in Fig. A.5. Each panel is 5″ × 5″, or about 33 × 33 kpc at the mean redshift z = 4.7 of the survey, centered on the position of the source in the UV rest-frame HST F814W images. The white contours represent the distribution of the [CII] flux from Fig. A.5, and the grey-filled ellipse is the ALMA beam size.

Fig. A.7. Summary panel with all information available for the classification of galaxies. Top left: velocity channel maps, in 25 km s$^{-1}$ velocity intervals. Top right: optical and NIR images, with contours from the [CII] emission. Bottom, left to right: flux map in [CII] (red contours) and continuum emission (blue contours) overlaid on top of the i-band F814W HST image; velocity map with major and minor axes used to produce the PV diagrams of the next two panels on the right; [CII] line emission in velocity (V = 0 being from the UV-derived spectroscopic redshift). This galaxy has been classified as a merger (Class 2).

Fig. A.8. Distribution of morpho-kinematic classes in the ALPINE sample for sources with [CII] measured at more than 3.5σ above the noise.

Table A.1. Properties of the ALPINE sample of galaxies with [CII] detected above 3.5σ, including position, spectroscopic redshift z_spec derived from the UV spectra, S/N, as well as the object class as described in Sect. 6.