GOODS-ALMA 2.0: Source catalog, number counts, and prevailing compact sizes in 1.1 mm galaxies

Submillimeter/millimeter observations of dusty star-forming galaxies with the Atacama Large Millimeter/submillimeter Array (ALMA) have shown that dust continuum emission generally occurs in compact regions smaller than the stellar distribution. However, it remains to be understood how systematic these findings are. Studies often lack homogeneity in the sample selection, target discontinuous areas with inhomogeneous sensitivities, and suffer from modest $uv$ coverage coming from single array configurations. GOODS-ALMA is a 1.1mm galaxy survey over a continuous area of 72.42arcmin$^2$ at a homogeneous sensitivity. In this version 2.0, we present a new low resolution dataset and its combination with the previous high resolution dataset from the survey, improving the $uv$ coverage and sensitivity reaching an average of $\sigma = 68.4\mu$Jy beam$^{-1}$. A total of 88 galaxies are detected in a blind search (compared to 35 in the high resolution dataset alone), 50% at $S/N_{peak} \geq 5$ and 50% at $3.5 \leq S/N_{peak} \leq 5$ aided by priors. Among them, 13 out of the 88 are optically dark or faint sources ($H$- or $K$-band dropouts). The sample dust continuum sizes at 1.1mm are generally compact, with a median effective radius of $R_{e} = 0"10 \pm 0"05$ (a physical size of $R_{e} = 0.73 \pm 0.29$kpc at the redshift of each source). Dust continuum sizes evolve with redshift and stellar mass resembling the trends of the stellar sizes measured at optical wavelengths, albeit a lower normalization compared to those of late-type galaxies. We conclude that for sources with flux densities $S_{1.1mm}>1$mJy, compact dust continuum emission at 1.1mm prevails, and sizes as extended as typical star-forming stellar disks are rare. The $S_{1.1mm}<1$mJy sources appear slightly more extended at 1.1mm, although they are still generally compact below the sizes of typical star-forming stellar disks.


Introduction
Galaxies luminous in the infrared (IR) and submillimeter/millimeter (submm/mm) wavelengths are intense starforming systems, some of which constitute the most powerful starbursts in the universe. They are the so-called dusty starforming galaxies (DSFGs; see Casey et al. 2014, for a review). Characterized by star formation rates (SFRs) of hundreds and up to thousands of solar masses per year (e.g., Magnelli et al. 2012;Swinbank et al. 2014;Simpson et al. 2015a), their high dust content absorbs the intense ultraviolet (UV) emission from the burst of star formation and radiates it at far-IR and mm wavelengths. Their redshift distribution peaks at z ∼ 2−3 (e.g., Chapman et al. 2005;Yun et al. 2012;Smolčić et al. 2012;Dudzevičiūtė et al. 2020) and they constitute a key galaxy population emitting half of the IR luminosity of the universe at z ∼ 2 (e.g., Magnelli et al. 2011, see also Pérez-González et al. 2005;Magnelli et al. 2009). Furthermore, being already massive galaxies (M * > 10 10.5 M , e.g., Wardlow et al. 2011;Hainline et al. 2011;Michałowski et al. 2012;Pannella et al. 2015) and capable of assembling large amounts of stellar mass very quickly owed to their high SFRs, and they have been proposed as progenitors of quiescent galaxies at high redshift and eventually the most massive elliptical galaxies in the local universe (e.g., Cimatti et al. 2008;Ricciardelli et al. 2010;Fu et al. 2013;Ivison et al. 2013;Toft et al. 2014;Gómez-Guijarro et al. 2018;Valentino et al. 2020).
From the first samples of mid-IR and far-IR bright galaxies uncovered by the IRAS satellite (e.g., Neugebauer et al. 1984;Rowan-Robinson et al. 1984;Elbaz et al. 1992) and the SCUBA bolometer at submm wavelengths (e.g., Smail et al. 1997;Hughes et al. 1998; Barger et al. 1998), the Atacama Large Millimeter/submillimeter array (ALMA) has recently opened a new era in the studies of DSFGs (see Hodge & da Cunha 2020, for a review). ALMA is capable of performing high-resolution and high-sensitivity observations. Its improved angular resolution enables secure galaxy identification of those otherwise affected by source confusion and blending singledish observations. Its sensitivity reaching submilliJansky (sub-mJy) levels (e.g., Carniani et al. 2015;Fujimoto et al. 2016;González-López et al. 2017) allows for one to access a less extreme DSFG population. Even more normal star-forming galaxies, such as those located within the scatter of the correlation between the SFR and the stellar mass of SFGs (the socalled main sequence (MS) of star-forming galaxies (SFGs), e.g., Noeske et al. 2007;Elbaz et al. 2007;Daddi et al. 2007), are detected at submm/mm wavelengths with ALMA (e.g., Papovich et al. 2016;Schreiber et al. 2017a;Dunlop et al. 2017). Therefore, ALMA observations are bringing together the previously existing gap between massive extreme starbursts and more normal MS-like SFGs, newly incorporated into the overall population of DSFGs as those galaxies luminous at IR and submm/mm wavelengths.
In the last years, thanks to the ALMA capabilities, a number of studies have uncovered that the submm/mm dust continuum emission occurs in compact areas smaller than the stellar sizes (e.g., Simpson et al. 2015a;Ikarashi et al. 2015;Hodge et al. 2016;Fujimoto et al. 2017;Gómez-Guijarro et al. 2018;Elbaz et al. 2018;Lang et al. 2019;Rujopakarn et al. 2019;Gullberg et al. 2019;Franco et al. 2020a). These findings are not only associated with the dust continuum, as other studies including CO lines (e.g., Puglisi et al. 2019) or radio emission (e.g., Jiménez-Andrade et al. 2019) have found more compact emission compared to the stellar sizes in these tracers as well. However, there are also examples of observations of more extended galaxy-wide dust continuum emission (e.g., Rujopakarn et al. 2016;Cheng et al. 2020;Sun et al. 2021;Cochrane et al. 2021) and simulations indicating that differential attenuation could play an important role in how observations compare the extent of the dust continuum emission to that of the stars (Popping et al. 2022). Therefore, it remains to be understood how systematic compactness is in DSFGs.
Many ALMA studies have been biased to follow-ups of galaxy samples, often targeting discontinuous areas and with inhomogeneous sensitivities. Recent ALMA blind surveys started to tackle these issues going from a deep pencil beam approach to larger areas at shallower depths (e.g., Walter et al. 2016;Dunlop et al. 2017;Franco et al. 2018;Hatsukade et al. 2018;Decarli et al. 2019;Zavala et al. 2021). However, another major roadblock in ALMA studies to date is the detection and measurement of accurate fluxes and sizes of sources spanning a wide range of intrinsic properties. While ALMA provides a tremendous advantage by resolving dust continuum sizes, single array configurations providing angular resolution sufficient to measure sizes of intrinsically compact sources could be missing more extended sources for which a coarser angular resolution would be better suited. Therefore, it has yet to be understood whether the current literature results have been affected by observational biases in order to answer to the question of how systematic compactness is in DSFGs.
In this work, we present GOODS-ALMA 2.0, an ALMA blind survey at 1.1 mm in the Great Observatories Origins Deep Survey South field (GOODS-South; Dickinson et al. 2003;Giavalisco et al. 2004). In this version 2.0, we present a new low resolution dataset and its combination with the previous high resolution dataset, GOODS-ALMA 1.0 (see Franco et al. 2018), improving the uv coverage and sensitivity. We aim at addressing the matters outlined above with the particularity of covering a large contiguous area using two array configurations at a similar and homogeneous depth over the whole field. The layout of the paper is as follows. An overview of the GOODS-ALMA survey, the observations and data processing involved in this work is in Sect. 2. In Sect. 3 we present the catalog of sources, including their flux and size measurements. Section 4 is dedicated to the study of the number counts, containing the relevant completeness and flux tests through simulations. We characterize source redshifts and stellar masses in Sect. 5, where we also report some new optically dark/faint sources. In Sect. 6 we discuss how systematic compactness is in DSFGs. We summarize the main findings and conclusions in Sect. 7.

Survey overview
GOODS-ALMA is a 1.1 mm galaxy survey carried out with ALMA Band 6 in GOODS-South. GOODS-ALMA 2.0 covers a continuous area of 72.42 arcmin 2 (primary beam response level ≥20%) centered at α = 3 h 32 m 30 s , δ = −27 • 48 00 at a homogeneous sensitivity with two different array configurations, to include both small and large spatial scales (see Fig. 1). Cycle 3 observations (program 2015.1.00543.S; PI: D. Elbaz) correspond to a more extended array configuration providing the highresolution small spatial scales (hereafter, high resolution dataset and abbreviated as high-res in figures). Cycle 5 observations (program 2017.1.00755.S; PI: D. Elbaz) concern a more compact array configuration supplying the low-resolution large spatial scales (hereafter, low resolution dataset and abbreviated as low-res in figures). The high resolution dataset was presented in Franco et al. (2018). In this 2.0 version, we present the low resolution dataset and its combination with the high resolution dataset (hereafter, combined dataset). In Table 1 we summarize the angular resolution and sensitivity of the high resolution, low resolution, and combined datasets.
The survey area corresponds to a field of ∼10 × 7 (15.1 Mpc × 10.5 Mpc comoving scale at z = 2). In order to cover this extension both the high and low resolution observations were designed as a 846-pointing mosaic separated by 0.8 times the antenna primary beam (half power beam width, HPBW ∼ 23 ), with each high and low resolution pointing centered at the exact same position. The pointings were grouped into six parallel and slightly overlapping ∼6.8 × 1.5 slice submosaics, with a position angle of 70 deg and composed of 141 pointings each.

Observations and data processing
Low resolution dataset observations were carried out between 2018 July 17 and 2019 March 22. Each slice submosaic was completed in three execution blocks (EBs), 18 EBs in total. The number of antennas ranged 42−47, in configuration C43-2. The array configuration was not the most compact of the ALMA configurations due to a scheduling problem. The shortest and longest baselines were 15.0 m and 360.5 m. The mean precipitable water vapor (PWV) ranged 1.16−2.90 mm. The resulting total on source exposure time was 14.39 h (∼2.40 h per slice submosaic). The correlator was set up in a single tuning centered at 265.0 GHz (λ = 1.13 mm) containing four spectral windows 1.875 GHz wide with 128 channels of 15.625 MHz each (17 km s −1 at 265.0 GHz) in dual polarization, centered at 256.0, 258.0, 272.0, and 274.0 GHz covering a bandwidth of 7.5 GHz in the frequency range 255.0625−274.9375 GHz. These frequencies correspond to the highest frequencies of the band 6, optimal for a continuum survey. This tuning is the same as that of the high resolution dataset observations. The radio quasar J0519−4546 was observed as a bandpass and flux calibrator in 10 EBs, J0522−3627 in three, J0423−0120 in three, and J2357−5311 in two EBs. The radio quasar J0342−3007 was observed as a phase calibrator in 10 EBs, while J0348−2749 was used in seven, and J0336−2644 in one EB. High resolution dataset observations were presented in Franco et al. (2018). The shortest and longest baselines were 16.7 m and 1808.0 m and the resulting total on source exposure time was 14.06 h (∼2.34 h per slice submosaic). For further details about the high resolution dataset observations, we refer the reader to Franco et al. (2018).
The Common Astronomy Software Applications (CASA; McMullin et al. 2007) version 5.6.1-8 was employed for data reduction using the scripts provided by the observatory. We processed both the high resolution and low resolution datasets in a similar manner to avoid introducing systematics. In the case of the high resolution dataset, it implies an independent data processing to the one already presented in Franco et al. (2018), leading to a 20% improved sensitivity when compared at the same angular resolution achieved by using the same natural weighting scheme. We inspected the visibilities and added additional flagging required besides those already included in the original scripts. We checked the flux calibration by verifying the calibrator flux density estimations. In order to reduce the computational time required for subsequent imaging we also averaged the calibrated visibilities in time over 120 s and in frequency over 8 channels.
Imaging was performed using the multifrequency synthesis algorithm collapsing all the frequency channels for continuum imaging. This is implemented within the CASA task TCLEAN, allowing for one to generate a dirty map. We decided to work with the dirty map instead of the clean map as: (1) the coverage in the uv plane is well sampled (see Fig. 2), which results in the absence of strong sidelobes; (2) the absence of very bright sources or a large dynamic range in flux densities; (3) the absence of very extended emission as the sources are marginally resolved, since the overall purpose is to measure global sizes. Imaging and deconvolution techniques suffer from specific issues related to the combination of datasets coming from multiple array configurations due the differences in the shapes of the synthesized dirty and clean beams, which requires to implement corrections (Czekala et al. 2021). Therefore, working directly on the dirty map is the best choice. This choice is also supported by the negligible difference (<1%) in the noise comparison between the dirty and clean maps. A natural weighting scheme was also chosen to get the best Notes. The columns show the slice ID, date, number of antennae, time on target, total time (time on target + calibration time), angular resolution, and sensitivity for the high resolution, low resolution, and combined datasets. point-source sensitivity, optimal for source detection. We combined each slice submosaic of the high resolution and low resolution datasets separately to produce both a high resolution and a low resolution dataset mosaic. Each of the high resolution and low resolution slice submosaics were also combined together to produce combined slice submosaics, which are also concatenated to form the combined dataset mosaic. All these combinations are always done following the original weight of each visibility.
In Table 1 we summarize the angular resolution and sensitivity of the high resolution, low resolution, and combined mosaics and slice submosaics. Since the observing conditions were slightly different during the observations of the different slices, these submosaics have small differences in angular resolution and sensitivity. The point spread function (PSF) associated to each slice submosaic are shown in Fig. 3. There are slight differences in the PSF profile, but these differences do not introduce systematics in neither the source detection nor the flux density measurements, the two pieces of the data analysis for which imaging was used and the resulting synthesized beam plays a role (see Sects. 3.1.1 and 3.2, for details). In particular, for flux density measurements, the beam used was that corresponding to the specific location of the source for which the measurement was carried out and, thus, there are no systematics associated to the location of a source in the map. Therefore, we did not apply any specific tapering to homogenize the beam, since that would be at the expense of the survey depth compromising the optimal strategy for source detection, while measurements in the untapered map do not lead to systematics. On average, the high resolution and low resolution mosaics have similar sensitivities of 89.0 and 95.2 µJy beam −1 at an angular resolution of 0. 251 × 0. 232 and 1. 33 × 0. 935, respectively (synthesized beam full width at half maximum (FWHM) along the major × minor axis). The combined mosaic reaches an average sensitivity of 68.4 µJy beam −1 at an average angular resolution of 0. 447 × 0. 418.

Noise map
In addition to the high resolution, low resolution, and combined image maps, we built a noise map for each of these datasets by using a sliding box sigma clipping methodology. Every 2 × 2 pixels in the image map, we calculated the standard deviation in a 10 × 10 box (200 × 200 pixels for a 0. 05/pix scale). This box is large enough to converge to the noise level, but small enough to reflect the local noise variations. The pixel step is below the scale where the noise varies significantly (i.e., FWHM) and saves computation time. The pixels with values above 3σ from the median value are masked (clipped). This procedure is repeated three times. After these clipping iterations the standard deviation is computed one last time and assigned as the noise level of the 2 × 2 pixels. The box slides across the mosaic until the entire map is covered. We note that these noise maps are built using the nonprimary beam corrected image maps. In Table 1 we summarize the noise levels of the high resolution, low resolution, and combined mosaics and slice submosaics.

Source detection
In order to detect the sources we employed the Python Blob Detector and Source Finder (PyBDSF 1 ), a tool designed to decompose radio interferometry images into sources. PyBDSF reads in the input image, calculates background rms and mean images, finds islands of emission, fits Gaussians to the islands, and groups the Gaussians into sources. A threshold for separating sources from noise is set, either using a constant hard threshold or a false detection rate algorithm. In the constant hard threshold method, the user defines a pixel threshold (σ p ), which corresponds to the signal-to-noise ratio (S/N) to identify an island of emission, and an island threshold (σ f ), which corresponds to the S/N that defines the boundaries of an island of emission. Islands of contiguous emission are identified by finding all the pixels greater that the pixel threshold. Starting from each of these pixels, all contiguous pixels (the surrounding eight pixels) higher than the island boundary threshold are identified as belonging to one island. Next, it fits multiple Gaussians to each island. If multiple Gaussians are fit and one of them is a bad solution then the number of Gaussians is decreased by one and the fit is done again, until all solutions in the island are good. After that, it groups nearby Gaussians within an island into sources. Once the Gaussians that belong to a source are identified, fluxes for the grouped Gaussians are summed to obtain the total flux of the source. The source position is set to be its centroid which, along with the source size, is determined from moment analysis.

Blind detection
In this work, we only used PyBDSF for source detection, but not for flux density or size measurements. We performed a blind 1 https://www.astron.nl/citt/pybdsf/ detection by running the process_image task with our own rms noise map built as explained in Sect. 2.3, overriding the internal background rms calculated by PyBDSF to have better control of the detection process. We used the hard threshold method with a grid of σ p = 3.0−6.0 and σ f = 2.0−4.0 in steps of 0.05, as opposed to the false detection rate algorithm to also have control of the false detection probability. Since the sources are only marginally extended and substructure is beyond the scope of the survey we activated the option of grouping by island (group_by_isl = TRUE), which assigns a single source per detected island. Besides, we included the sources for which the Gaussian fit for flux measurements failed (incl_empty = TRUE). These sources do not have a valid PyBDSF flux density measurement, but we do not use PyBDSF for flux density measurements, only for source detection. As shown in Sect. 2.2 the synthesized beam varies slightly over the image map. We tested whether the observed beam variation influences the detection method by running PyBDSF modifying the beam FWHM in the header of the image map within the FWHM range across the different slice submosaics (see Table 1). We found that the detection method does not depend on this beam variation. Therefore, we proceeded with a single blind detection carried out with a common average beam across the field.
In the left panel of Fig. 4 we show the pixel distribution of the combined dataset map. The right-hand side tail reflects the excess created by real sources. The sources detected in the image map are positive sources. However, the noise is Gaussian and negative detections also exist. The difference between positive and negative sources is a proxy for the number of real sources, being the purity defined as: We inverted the image map by multiplying it by −1 and repeated the detection procedure for the inverse image map. In the right panel of Fig. 4 we plot the number of positive and negative detections in the combined dataset map as a function of σ p for a fixed σ f = 2.7. Given the chosen detection options in the process_image task explained above, the number of detections does not depend on σ f . For consistency within the GOODS-ALMA survey we chose a value σ f = 2.7, equivalent to the floodclip threshold σ f = 2.7 used with the tool BLOBCAT (Hales et al. 2012) in Franco et al. (2018). A purity of p = 1 (sources detected with a purity of 100% associated to the absence of negative detections) is found for σ p ≥ 4.4 in the combined dataset map. This value is σ p ≥ 5.2 in the high resolution map and σ p ≥ 4.2 in the low resolution map. The number of sources with a purity of 100% is 44 in the combined map. Independently detected, this number is 8 in the high resolution map and 38 in the low resolution map. In Table 2 we present the main catalog of 100% pure sources detected in the combined dataset, labeling also which ones appear independently detected in the high resolution or low resolution datasets as 100% pure sources. We note that in Table 2 we include the detection S /N peak for each source, defined as the PyBDSF peak flux density divided by the average background rms noise of the island. This detection S /N peak is not strictly the same as the pixel threshold σ p . The former is usually higher by construction of the PyBDSF detection methodology. While the PyBDSF peak flux density is calculated by the code performing Gaussian fitting and then divided by the average background rms noise of the island to obtain the detection S /N peak , the pixel threshold σ p is the number of sigma above the mean for an individual pixel and, thus, it involves the read of the image and noise maps at a given pixel.

Prior-based selection
At σ p = 3.0, the blind detection results in 573 positive and 475 negative detections in the combined dataset map. A proxy for the expected number of real sources is N real = N pos − N neg = 98 ± 32 (where the uncertainty is calculated from Poisson statistics: ∆N real = (∆N pos ) 2 + (∆N neg ) 2 = ( N pos ) 2 + ( N neg ) 2 . This number is much larger than those in the 100% pure main catalog. Therefore, we created a prior-based supplementary catalog following the methodology presented in Franco et al. (2020b) using Spitzer/IRAC and the Karl G. Jansky Very Large Array (VLA) prior positions. The combined IRAC images reach a 5σ point source sensitivity of 26.7 and 26.5 AB mag at 3.6 and 4.5 µm, respectively. VLA observations in GOODS-South (PI: W. Rujopakarn) were taken at 3 GHz (10 cm), covering the entire GOODS-ALMA field reaching a rms noise of ∼2.1 µJy at an angular resolution ∼0. 3 (Rujopakarn et al., in prep.), and at 6 GHz (5 cm), around the HUDF with partial coverage of GOODS-ALMA reaching a rms noise of ∼0.32 µJy and an angular resolution of 0. 31 × 0. 61 (Rujopakarn et al. 2016).
First, the purpose is to get the distance within which a given ALMA source has a secure IRAC/VLA counterpart. We searched for IRAC/VLA counterparts in the 100% pure main catalog. In the case of IRAC we used the last publicly available catalog in GOODS-South by Ashby et al. (2015), which includes all the IRAC datasets listed above except for IGOODS, enough for the purpose of getting the distance within which a given ALMA source has a secure IRAC counterpart. For VLA, we employed the 3 GHz (10 cm) image catalog (Rujopakarn et al., in prep.). We note that in order to make the counterpart assignation we corrected the coordinates in the ancillary catalogs and images for the GOODS-South astrometry offsets reported in the literature when comparing with ALMA coordinates, except for the case of the VLA catalogs and images which do not suffer from such offsets. Franco et al. (2020b) reported a global astrometry offset of ∆RA = −96 mas, ∆Dec = 252 mas, but also a non negligible local offset caused by distortions in the original HST/ACS and WFC3 GOODS-South mosaics that can reach ∼0. 15 in the edges of the GOODS-ALMA field. We applied both in the present analysis.
Using the 100% pure main catalog we find that 44/44 have an IRAC counterpart. Among them, 35/44 have an IRAC counterpart located at ≤0. 4. However, we visually inspected the images Notes.
(1) ALMA source ID; (2) right ascension (J2000) in degrees of the ALMA source as detected by PyBDSF; (3) declination (J2000) in degrees of the ALMA source as detected by PyBDSF; (4) ID associated to the stellar counterpart in the ZFOURGE catalog (5) redshift with spectroscopic redshifts shown with three decimal digits; (6) stellar mass; (7) detection signal to noise ratio from PyBDSF, measured as the peak flux density over the local rms noise; (8) 1.1 mm flux density measured using a 1. 6 diameter aperture corrected from aperture losses and flux boosting; (9) source size given by the FWHM of a circular Gaussian model fit in the uv plane; (10) and (11) source presence in the high/low resolution datasets (Yes or No) and whether it was detected as a 100% pure source or using priors (100pur or Prior) in those datasets; (12)  . Therefore, ≤0. 4 is the robust counterpart search radius to look for further prior-based ALMA sources with either IRAC or VLA counterpart in the σ p = 3.0 blind detection. We calculated the probability of random association between a poten-tial ALMA source and an IRAC/VLA counterpart. In order to do this, we selected 100 000 random positions in the GOODS-ALMA field and checked for counterparts in the IRAC/VLA catalogs at a distance ≤0. 4. The probability of randomly finding an IRAC counterpart located at ≤0. 4 of a given ALMA source is 0.5% and 0.05% in the case of VLA. The number of negative ALMA sources with an IRAC counterpart is consistent with this estimation as there is one negative ALMA source with an IRAC counterpart (the probability calculation yields 475 · 0.5% = 2.4). The number of negative ALMA sources with a VLA counterpart is zero (the probability calculation yields 475 · 0.05% = 0.2). Second, we validated the ≤0. 4 counterpart radius by checking the stellar masses of the counterparts. We checked whether there are negative detections with a massive counterpart at ≤0. 4, providing the number of expected spurious detections. The FourStar galaxy evolution survey (ZFOURGE; PI. I. Labbé) provides the deepest K s -band image with a total 5σ sensitivity of up to 26.5 AB mag, after combining their own survey image with all the other K s -band images available in GOODS-South. We looked for counterparts in the K s -band selected ZFOURGE catalog by Straatman et al. (2016). There are no negative detections at log(M * /M ) ≥ 9.0 (see Fig. 5, panels C and D) for IRAC/VLA ≤0. 4 counterparts in the σ p = 3.0 blind detection. Therefore, we selected all the σ p = 3.0 detections with an IRAC or VLA counterpart at ≤0. 4 with stellar masses log(M * /M ) ≥ 9.0. In Table 3 we present the supplementary catalog of prior-based selected sources detected in the combined dataset, adding another 44 sources. We include the detection S /N peak for each source as defined in Sect. 3.1.1. We also label which ones appear independently in the high resolution or low resolution datasets at σ p = 3.0.
Finally, we checked the limits of the prior methodology. For IRAC, at ≤1. 2 and log(M * /M ) ≥ 10.0 there is still an excess of positive sources with a massive counterpart associated compared to negative detections (see Fig. 5, panel E), expecting around three to be spurious. For VLA, there are still no negative detections found at log(M * /M ) ≥ 9.0 for counterparts at ≤1. 0 (see Fig. 5,panel F). We report these extra 16 sources as uncertain sources (i.e., sources with either an IRAC counterpart at ≤1. 2 with stellar masses log(M * /M ) ≥ 10.0 or a VLA counterpart at ≤1. 0 with stellar masses log(M * /M ) ≥ 9.0, see Table B.1), but we did not employ them for further analysis.
As a sanity check, we visually inspected the IRAC 3.6 and 4.5 µm ultradeep images and VLA 3 GHz maps at the positions of ALMA source candidates and tagged "real" counterparts by our own personal judgement. We created an alternative priorbased catalog leading to the exact same result compared to the analysis using the catalog counterparts and the stellar mass validation criteria.
In comparison with the high resolution dataset analysis presented in Franco et al. (2018Franco et al. ( , 2020b there are three sources (namely AGS32, AGS33, and AGS39) reported in Franco et al. (2020b) using the prior methodology that do not appear in the high resolution dataset in this work, although they are found in the low resolution dataset (A2GS20, A2GS27, and A2GS39, respectively). We note that Franco et al. (2018Franco et al. ( , 2020b detection in the high resolution dataset is slightly different compared to the high resolution dataset detection here, since the former was carried out in a tapered map with a homogeneous and circular synthesized beam of 0. 6 FWHM and, besides, with a different detection tool (BLOBCAT). Data tapering is beneficial to optimize the sensitivity to sources that are larger than the angular resolution. Therefore, some differences are expected between the sources detected in the tapered high resolution dataset used in Franco et al. (2018Franco et al. ( , 2020b and the ones detected in the untapered high resolution dataset employed here. In addition, Franco et al. (2020b) reported three sources that are not part of the 100% pure main catalog or the prior-based supplementary catalog presented in this work (namely AGS34, AGS35, and AGS36), being among those with the lowest S/N (S /N = 3.72, 3.71, and 3.66) in Franco et al. (2020b). Similarly two sources appear in the high resolution dataset here, but do not in the high resolution dataset analysis in Franco et al. (2018Franco et al. ( , 2020b (namely A2GS58 and A2GS71). This points to the differences in the detection tool PyBDSF versus BLOBCAT introducing some differences in the sources detected at the lowest S/N regimes.

Flux and size measurements
GOODS-ALMA 2.0 was designed to retrieve the spatial information for both the small and large spatial scales. Compact array configurations, sensitive to large spatial scales with large maximum recoverable scales, are suitable to get total flux measurements with minimum flux losses. Extended array configurations yield information on the small spatial scales. Together, the information on multiple spatial scales makes it possible to measure a large range of intrinsic source sizes, while retrieving accurately the total fluxes.
A variety of techniques are used for flux density measurements in the literature, including aperture photometry or 2D functional fitting in the image plane, peak flux measurements in the case of unresolved sources also in the image plane, or functional fitting in the uv plane. In this work, we explored flux densities measurements using these techniques. Eventually, we report the values obtained from aperture photometry. This is the Table 3. Source catalog: Prior-based.  best choice for the dataset in this work because this technique does not assume an a priori functional form and it can be applied to both the 100% pure main catalog and the prior-based supplementary catalog, as the latter sources are not detected at a S /N peak high enough for reliable uv plane estimates. We measured total flux densities of the sources in both the 100% pure main catalog and the prior-based supplementary catalog using aperture photometry in the primary beam corrected combined dataset dirty map (e.g., Simpson et al. 2015a;Scoville et al. 2016;Liu et al. 2019). Through growth curves, we tested a range of apertures from 0.2 to 4 diameter in steps of 0. 2. We chose an aperture of 1. 6 diameter, which gave the optimal trade-off between total flux retrieval and S /N total , defined as the total flux density divided by its uncertainty. The fluxes were corrected by the appropriate aperture corrections to account for the flux losses outside the aperture. This aperture correction was calculated by dividing the flux within the aperture of 1. 6 diameter by the flux enclosed in the synthesized dirty beam within the same aperture (normalized to its maximum value). As shown in Sect. 2.2, while there are slight variations in the beam profile over the map, these differences do not introduce systematics in the flux measurements. The reason is that the beam used to perform the correction was that associated to the specific location of a given source. Besides, the nature of the aperture correction is independent of the specific shape of the synthesized dirty beam or its deviation from a Gaussian shape compared to a clean beam. This also supports the aperture technique as the best choice for the dataset in this work. In any case, we checked whether a common aperture correction significantly influences the flux density measurements. This is shown in the right panel of Fig. 3, where the aperture correction is by definition the inverse of the y-axis. The correction range is 1.38−1.55 for the aperture radius 0. 8 across the beams associated to the different slice submosaics, yielding a flux density variation of <10% if a common average beam correction across the field is used.
As a sanity check, we measured total flux densities using 2D functional fitting in the image plane via the CASA task imfit, yielding consistent results with the aperture photometry methodology with a relative difference between them given by the median (S imfit − S ap )/S ap = 0.01 ± 0.14 (where the uncertainty is the median absolute deviation). In Tables 2, 3, and B.1 we present the flux density measurements obtained from the aperture photometry methodology.
In comparison with the flux measurements in Franco et al. (2018Franco et al. ( , 2020b for the sources in common with this work, the relative difference between them given by the median (S AGS − S A2GS )/S A2GS = −0.11 ± 0.28 (where the uncertainty is the median absolute deviation). Therefore, on average the flux measurements in Franco et al. (2018Franco et al. ( , 2020b are systematically slightly lower due to limited uv coverage. Sizes can be measured in the image plane or directly in the uv plane using the information from the complex visibilities, leading to more reliable results than image plane techniques. Therefore, we measured the sizes directly in the uv plane using the combined dataset to include the information on multiple spatial scales and access the largest possible range of intrinsic source sizes with minimum biases. We employed the CASA task UVMODELFIT that allows us to fit single component models to single sources. Since the scope of this work is to get global size measurements, we fit a Gaussian model with fixed circular axis ratio. In order to isolate each source we split the combined dataset mosaic into single source measurement sets as follows: for each source pair of coordinates we searched for all the pointings that contained data on that source (each source is covered by six pointings typically). Next, each pointing was phase shifted to set the phase center at the source coordinates using the CASA task fixvis. Data and weights are modified to apply the appropriate primary beam correction that correspond to the phase shift, by using the CASA toolkit MeasurementSet module. After that, by using the CASA task fixplanets the phase center was set to the source coordinates for all the pointings that contained data on the source and the visibility weights recomputed with the task statwt. Finally, the pointings were concatenated into a single measurement set. In Table 2, we present the size measurements obtained from the uv plane fitting methodology in the combined dataset. We only report the values for the sources in the 100% pure main catalog, which have a detection S /N peak ≥ 5 (defined in Sect. 3.1.1) as the PyBDSF peak flux density divided by the average background rms noise of the island). Below this S/N, size measurements are unreliable. For this subset of sources we also determined the minimum possible size that can be reliably measured from the formula by Martí-Vidal et al. (2012): with λ c = 3.84 and β = 0.75 as in Franco et al. (2018Franco et al. ( , 2020b. The minimum size (θ min ) is given in units of the synthesized beam FWHM (θ beam ) depending on the source S/N. Values below this minimum are assumed to be unresolved and we report them as upper limits in Table 2.

Size distribution
In Fig. 6 we show the size distribution of the sources in the 100% pure main catalog (gray). Sizes are displayed as the effective (half-light) radius of the circular Gaussian model fit in the uv plane. We distinguish between sources present in the high resolution dataset (blue) and sources detected in the low resolution but . Size distribution of the 100% pure main catalog detected in the combined dataset (gray). We show histograms with Poisson error bars and probability density curves (kernel density estimates, by definition normalized to an area under the curve equal to one). Medians are displayed as a solid vertical line. The sizes were measured as the effective radius of the circular Gaussian model fit in the uv plane. Sources detected in the high resolution dataset are shown in blue, while sources also present in the low resolution dataset but not in the high resolution dataset, are shown in red. We note that the black, blue, and red histograms are overlaid, not stacked. The histogram bins are such that all the upper limits fall in the first bin to keep a correct shape. not in the high resolution dataset (red). There is only one source that appears in the 100% pure main catalog from the combined dataset but was not in the high/low resolution datasets, namely A2GS33.
First, we notice that the distribution appears skewed toward small sizes. Sources are compact with a median effective (halflight) radius of R e = 0. 10 ± 0. 05 and a median physical size of R e = 0.73 ± 0.29 kpc calculated at the redshift of each source (where the uncertainties are given by the median absolute deviation). Among them R e = 0. 09 ± 0. 03 (R e = 0.70 ± 0.23 kpc calculated at the redshift of each source) corresponds to those also present in the high resolution dataset and R e = 0. 15 ± 0. 10 (R e = 1.21 ± 0.82 kpc calculated at the redshift of each source) corresponds to those detected in the low resolution but not in the high resolution dataset (upper limits are taken at face value, with three and five sources respectively on each group). Franco et al. (2018) extraction in the high resolution dataset was carried out in a tapered map with a homogeneous and circular synthesized beam of 0. 6 FWHM under the assumption that the sources were point-like at that angular resolution. For these sources detected in the tapered map, Franco et al. (2018) measured sizes in another high resolution dataset map constructed by employing a natural weighting scheme leading to the same resolution we achieve in this work with the same dataset in our independent analysis. Franco et al. (2018) reported a median size of R e = 0. 09 (0. 18 FWHM also fit with a circular Gaussian model using UVMODELFIT), with 85% of the sources with R e < 0. 125. Therefore, Franco et al. (2018) results are perfectly consistent with our median for sources detected in the high resolution dataset, with 73% of them indeed below R e < 0. 125. In conclusion, even after including the information on multiple spatial scales making possible to access a wider range of intrinsic source sizes, the results in Franco et al. (2018) hold and were not biased to small values due to limited uv coverage.
A variety of literature studies in the ALMA era have concluded that the dust continuum emission appears located in compact regions with median sizes among the different samples within a circularized R e = 0. 10−0. 15 (e.g., Simpson et al. 2015a;Ikarashi et al. 2015;Hodge et al. 2016;Fujimoto et al. 2017;Gómez-Guijarro et al. 2018). In principle, some of the results could be biased toward small sizes if lacking uv coverage and/or sensitivity. However, Elbaz et al. (2018) also reported compact dust continuum emission (R e = 0. 10−0. 15, circularized) in a sample of DSFGs at z ∼ 2 with long integration times reaching a typical S /N ∼ 35. Our results with improved uv coverage are consistent with the conclusion that dust continuum emission occurs in compact regions. Of course, GOODS-ALMA 2.0 is a flux limited survey and we discuss this conclusion more extensively in this regard in Sect. 6.
We notice in Fig. 6 a small difference between the size distribution of sources present in the high resolution dataset compared to those that are in the low resolution but not in the high resolution dataset, with the latter skewed toward larger sizes and with a larger scatter. There is also one outlier to the smooth size distribution, namely A2GS30 with R e = 0. 86 ± 0. 16. A2GS30 is located at a distance <5 with respect to another source, namely A2GS20 with a size of R e < 0. 13. The latter could be an indication that our size measurements are biased to larger sizes in the vicinity of close neighbors. Therefore, we inspected all <5 pairs. A2GS33/37 are slightly larger than the average and have a similar z phot (see Table 2). It could be an indication that size measurements of close pairs are systematically affected or that they are physically larger due to interactions. However, there are another three <5 pairs (A2GS12/24, A2GS9/21, A2GS14/18) and none of them have anomalously large sizes even when located at similar z phot (A2GS9/21). As a sanity check, we measured sizes for these sources using the GILDAS task uvfit as an alternative uv plane fitting tool. We fit two Gaussian models with fixed circular axis ratio simultaneously, yielding consistent results. A common characteristic that the three galaxies with the larger sizes (A2GS30/33/37) share is their ID ≥ 30, pointing to the S/N as a potential reason since our IDs are ordered with decreasing detection S /N peak . Either some of the lower S /N peak sources are systematically larger due to an artificial bias in the size measurements for lower S /N peak or, as a low S /N peak is also related with a generally lower flux density, these sources are physically fainter and larger. Franco et al. (2020b) indeed argued that the priorbased methodology allowed for one to access a population of fainter and slightly larger sources. While a detail analysis about the dust continuum emission profiles is out of the scope of this paper, we discuss potential size variations due to differences in flux densities in Sect. 6.
It is important to consider that the dust continuum sizes at 1.1 mm in this work are associated to sources that span a wide redshift range (0 < z < 5; see Sect. 5). The dust continuum emission in the Rayleigh-Jeans (RJ) side (λ rf 250 µm) of the IR spectral energy distribution (SED) is more sensitive to the dust mass, while the dust continuum emission around the peak of the IR SED is sensitive to variations in the dust temperature. Most of the sources are located in the redshift range 1 < z < 4, spanning rest-frame wavelengths 0.22−0.55 µm and, therefore, the dust continuum emission traced is that of the RJ side. For sources with increasingly high redshifts, specially those at z > 4, the dust continuum emission gets closer to the peak of the IR SED. Popping et al. (2022) has recently studied the extent of the dust continuum emission for thousands of MS galaxies drawn from the TNG50 simulation Pillepich et al. 2019) between 1 < z < 5. The authors concluded that the half-light radii of galaxies at observed-frame wavelengths from 700 µm to 2 mm are similar to those at rest-frame 850 µm at the 5−10% level, for galaxies at redshifts 1 < z < 5 and stellar masses 9.0 < log(M * /M ) < 11.0. Therefore, the dust continuum sizes for the sources in this work which rest-frame dust continuum emission gets closer to the peak of the IR SED are at most 10% smaller than those associated to the RJ side of the IR SED.

Test: Flux growth curves from tapering
Although the uv coverage is sensitive to both small and large spatial scales and flux losses are expected to be negligible, we tested the flux density measurements using another methodology: a growth curve built after tapering the data (Xiao et al., in prep.). Data tapering adds an additional weight function that reduces the weights of the outer visibilities at the expense of also reducing the collecting area and, thus, the sensitivity. Nevertheless, the tapering also reduces the angular resolution, which is beneficial to optimize the sensitivity to sources that are larger than the angular resolution. We created tapered mosaics of the combined dataset, starting from the original resolution and stopping when the resulting tapered PSF R e = 1. 5, much larger than any reasonable source size. Then, for a given source, we measured the peak flux density on every tapered mosaic and built growth curves (peak flux as a function of the tapered PSF R e ). When the tapering reaches the point where the intrinsic source size is below the angular resolution, the entire flux of the source is retrieved in a single beam and the total flux can be read as the peak flux. In order to decide what tapering length is the one for which the entire source flux is measured in a single beam (the position in the x-axis at which we read the source flux in the source growth curve) we set the criteria: (1) measure always at least when the maximum S /N total is reached; (2) measure either when the first derivative of the S /N total (signal increase with respect to noise increase) is below one (more noise than signal enters the beam respect to the previous step) or the first derivative of the S /N total has a local minimum (to avoid including nearby noise peaks in the flux measurements). In the left panel of Fig. 7 we compare the flux densities obtained using the aperture and the tapering methodologies. The relative difference between them is given by the median (S ap − S tap )/S tap = −0.03 ± 0.18 (where the uncertainty is the median absolute deviation), with −0.04 ± 0.15 for the 100% pure main catalog and −0.01 ± 0.21 for the priorbased supplementary catalog. In addition, we compare the flux densities associated to the size measurements in the uv plane for the 100% pure main catalog in the right panel of Fig. 7, being the relative difference in this case 0.04 ± 0.08. Therefore, the different methodologies provide consistent flux density measurements and in particular the agreement with the fluxes obtained in the uv plane also contributes to the robustness of the size measurements.

Number counts
Calculation of the number counts requires to assess different aspects of the survey that influence the ability to retrieve sources of a given flux and size or the flux measurements themselves. For this reason, before jumping into the calculation of the number counts, we dealt with these aspects.

Completeness and boosting
The completeness is the fraction of real sources that are actually detected for a given flux and size. In order to compute the completeness, we performed simulations by injecting artificial model sources in the combined dataset map. We modeled Gaussian sources, convolved them with the combined dataset synthesized dirty beam, and injected them in the combined dataset map at random locations. In total we injected 450 sources for a given input flux and size. This number is such that, considering the number of independent beams, the probability of two sources overlapping is negligible (∼1%). Besides, the scarcity of real sources in the map allowed us to work directly with the dirty map. Even if the overlapping probability is very low, model sources could be close enough to other model or real sources and affect their flux measurements. To be sure that the latter is not the case, we eliminated any model source within 5 diameter of another model or real source. The simulations were carried out for flux densities ranging 0.1−3 mJy in steps of 0.1 mJy and sizes from pure point sources to 1 FWHM in steps of 0. 1. In total, a grid of 30 fluxes and 11 sizes composed of 450 sources each. After the injection of artificial model sources in the combined dataset map, we performed the same blind source detection procedure described in Sect. 3. In the left panel of Fig. 8 we show the completeness as a function of the input flux density (S in ) for the different simulated source sizes as detected for σ p = 3.0. The survey reaches a ∼100% completeness for all simulated sizes for flux densities S in > 1 mJy. The completeness is also lower for increasing source sizes at fixed flux densities.
For the purpose of knowing the behavior of the survey and the detection procedure in retrieving certain types of sources, the completeness analysis in S in is relevant. However, input fluxes in the real data are unknown by nature and, thus, for the practical purpose of applying a completeness correction to the number counts we need to know its behavior as a function of the flux that we are actually able to measure, the output flux density (S out ). S out measurements were performed following the same aperture photometry methodology described in Sect. 3.2. In the right panel of Fig. 8 we show the completeness as a function of S out for the different simulated source sizes as detected for σ p = 3.0. This plot provides the completeness correction to be applied to the number counts. Qualitatively, the behavior in terms of S in or S out is similar (i.e., the completeness reaches ∼100% for sources with S 1.1 mm > 1 mJy, it progressively decays below this value, and it is also lower for increasing source sizes at fixed flux densities), although quantitatively the behavior changes.
Another aspect to characterize is flux boosting, which consists in the artificial increase of S out compared to S in typically observed for sources detected at very low S/N (e.g., Murdoch et al. 1973;Hogg & Turner 1998;Coppin et al. 2005). Flux boosting is connected to the detection threshold, as it reflects the fact that detectable sources at very low S/N are those located in a noise peak and, thus, their flux measurements are systematically boosted, leading to the observed increase of S out compared to S in at low S/N. We used the same set of simulations to study the S out /S in ratio as a function of the output detection S /N peak (S /N peak out ) as shown in the left panel of Fig. 9, where we represent the range of sizes (0. 1−0. 4 FWHM), S /N peak (3.5−20), and flux densities (0.25−2.75 mJy) measured in the real sources. The S out /S in ratio is stable over the whole studied range of detection S /N peak and fluxes, as traced by the sliding median. There is evidence for a small level of flux boosting at 3.5 < S /N peak out < 5.0, reaching 4% at S /N peak out   9. Ratio of the output over input flux densities (S out /S in ) as a function of the output detection S /N peak (S /N peak out ) from PyBDSF, measured as the peak flux density over the local rms noise (left panel), and flux density accuracy ((S out − S in )/S out ) as a function of the output flux density (S out ) measured with the aperture photometry methodology (right panel) for simulated sources with sizes ranging 0. 1−0. 4 FWHM. The distribution of the whole set of simulations is shown as gray symbols with their sliding median and standard deviation in solid and dashed green, respectively. accordingly in the flux density measurements presented in Tables 2, 3, and B.1.
The set of simulations was also used to assess the accuracy of the flux density measurements and whether they are affected by systematics to be corrected. In the right panel of Fig. 9 we show the flux density measurements accuracy as given by (S out − S in )/S out as a function of S out . The sliding median of (S out − S in )/S out is ∼1 over the entire S out range studied. Therefore, we did not add any further correction to the measured flux densities based on our simulations.
We also notice the fact that the 1. 6 diameter aperture where we measured the fluxes provides on average a similar S /N total to that of the detection S /N peak . This is seen in the left panel of Fig. 9 as how the detection S /N peak out (x-axis) coincides with the S out /S in ratio (y-axis), a proxy for S /N total . For example, S /N peak out = 10 is associated with S out /S in ∼ 0.9−1.1 and, thus, a ∼10σ detection has typically a S /N total ∼ 10. The latter does not necessarily mean that a source that has a flux density 10 times over the rms noise has a flux accuracy of ∼10%, since this is only true for pure point sources. This is seen in the right panel of Fig. 9 as, for example, a S /N peak out = 10 detection, considering the average sensitivity of σ = 68.4 µJy beam −1 , has a total flux density of 0.68 µJy in the case of a pure point source associated with S out /S in ∼ 0.9−1.1. However, (S out − S in )/S out is on the level of ∼15%, which is expected since what we represent are extended sources with a range of sizes 0. 1−0. 4 FWHM and, thus, there are sources with high total flux densities but with low detection S /N peak widening the distribution of the (S out − S in )/S out ratio as a result.

Effective area
The survey sensitivity is not perfectly homogeneous and there are small differences between regions seen between the different slice submosaics (see Table 1). We calculated the effective area for a given sensitivity as shown in the curve in

Differential and cumulative number counts
The contribution of a source i to the number counts at a frequency v is: where p(S peak ν,i ) is the purity as defined in Eq. (1), A eff (S peak ν,i ) is the effective area as given by the curve in Fig. 10, which are both associated to the detection S /N peak and, thus, described in terms of the peak flux density S peak ν,i . C(S ν,i , θ) is the completeness for a pair flux density, size (S ν,i , θ), as explained in Sect. 4.1.
The differential number counts are obtained adding the contribution of sources within a flux density interval ∆S ν : Cumulative number counts are calculated summing over all the sources with a flux density higher than S ν : We calculated the contribution of each source to the number counts. In the case of the purity correction we applied p = 1 for all sources. This is the case for the 100% pure main catalog by definition. For the prior-based supplementary catalog the purity correction to be applied is, in principle, that studied in Sect. 3.1.1 as a function of the detection S /N peak . However, the purity correction applied in this way is valid only for sources from the blind detection procedure before the prior-based selection. Once the sources are validated by the priors the purity correction has to be adjusted to a smaller value, reflecting the better knowledge of the actual real sources aided by priors. In order to assess what the adjusted purity correction is, we compared the number of sources in the 100% pure plus prior-based catalogs (88 sources) with the expected number of real sources (98 ± 32, see Sect. 3.1.2). Both are consistent and, therefore, likely possible that we are capturing all the real sources down to a detection S /N peak = 3.5. Therefore, we assumed p ∼ 1 for the sources in the prior-based supplementary catalog, knowing that if a (∼10%) fraction of the sources with at S /N peak = 3.5−5.0 were missed it does not significantly affect the number counts.
For the size dependency we used the sizes from the uv plane fitting as derived in Sect. 3.2 for the 100% pure main catalog. If a source lacked of size estimation we used the median size of the 100% pure sources. In the case of the prior-based supplementary catalog sources, whose sizes are not reliable through uv plane fitting, we employed the median size of the 100% pure sources at S 1.1 mm < 1 mJy, since the prior-based sources are in this flux density regime.
We decided the optimal bin width and first bin to calculate the number counts as an optimal trade-off between resolution and sufficient number of sources per bin. The chosen bin width was ∆ log(S ν ) = 0.20. The uncertainties for each bin were calculated from 10 000 Monte-Carlo simulations varying the source fluxes randomly within their uncertainties added in quadrature to the Poisson uncertainties.
Another correction applied here was the Eddington bias, as it is sometimes called. According to this effect, because the sources with lower luminosities are more numerous than brighter sources, the noise leads to an overestimation of the number counts in the fainter flux bins. To take into account this effect we simulated a physically informed number of sources using the slope of the number counts in Franco et al. (2018) and added Gaussian noise to each simulated source. We applied a correction factor to each flux density bin as the ratio between the flux distribution before and after adding the noise.
Cosmic variance was not taken into account when calculating the uncertainties in the number counts. As also discussed in Franco et al. (2018), while cosmic variance it is expected to be significant for massive galaxies in small solid angles, the strong negative K-correction for redshifts above z > 1.8 at 1.1 mm and up to the highest redshift in our catalog (z = 4.73) counterbalances it. The comoving volume over ∆z ∼ 3 of 1400 Gpc 3 is relatively large. Using Moster et al. (2011) to estimate the effect of cosmic variance, it results in ∼15% and, thus, it does not significantly affect the number counts.
The resulting number counts derived using the whole 100% pure main plus prior-based supplementary catalogs are shown in Table 4 and in Fig. 11 (black symbols; combined). Additionally, we studied the contribution to these combined number counts from the 100% pure main catalog, the prior-based supplementary catalog, and the high and low resolution datasets independently to have an idea of which type of sources contribute to the number counts as a function of flux density. In order to do this we derived alternative versions of the number counts as if the only sources detected were those associated to the desired type of sources to be studied. The completeness correction applied was that associated to the combined number counts from the whole 100% pure plus prior-based catalogs and, thus, by definition they are insufficient to reach the combined number counts level, but in turn they reflect the contribution from that particular type of sources to the number counts. The results from the 100% pure main catalog only (gray symbols) manifest the effect of the prior methodology in the number counts when compared to the combined number counts from the 100% pure plus prior-based catalogs (black symbols). Adding, the prior-based supplementary catalog it is Notes.
(1) Central flux density in the bin; (2) differential number counts; (3) number of sources per bin in the differential number counts; (4) cumulative number counts; (5) number of sources per bin in the cumulative number counts. Bin width is ∆ log(S ν ) = 0.20. The uncertainties are calculated from Monte-Carlo simulations and Poisson added in quadrature. In parentheses, the first bin not used in our analysis.
possible to access a population of fainter 1.1 mm sources as indicated by Franco et al. (2020b). The results from the high resolution (blue symbols) and low resolution (red symbols) datasets manifest that both are able to capture the bright end of the combined number counts, but the high resolution dataset is very inefficient in retrieving the faint end as it rapidly decays at S 1.1 mm < 1 mJy. The low resolution dataset is much more complete at lower flux densities and similar to the 100% pure main catalog from the combined dataset. The difference between the high resolution and low resolution can be explicitly seen as the dashed red line that represents the sources detected in the low resolution dataset but not present in the high resolution dataset. In conclusion, the low resolution dataset is very efficient at retrieving sources for a wide range of flux densities, but the high resolution dataset is biased to the brighter ones. The combination of the high resolution and low resolution datasets (combined dataset), along with the prior methodology, allow us to be more complete at the faint end.
In comparison with the number counts analysis presented in Franco et al. (2018) from the high resolution dataset, the cumulative number counts in our analysis are slightly higher for a fixed flux density bin (see right panel of Fig. 11, dark gray circles from Franco et al. 2018 compared to blue squares from this work). We note that Franco et al. (2018) extraction in the high resolution dataset is slightly different compared to the high resolution dataset detection here, since the former was carried out in a tapered map with a homogeneous and circular synthesized beam of 0. 6 FWHM. Besides Franco et al. (2018) used a 80% pure catalog, while in this work we employed the 100% pure plus prior-based catalogs. This work is more complete and less dependent on both the purity and completeness corrections compared to Franco et al. (2018) analysis and, thus, the latter is potentially more affected by systematics that lead to slightly different number counts.
In Fig. 11 we also compare our number counts with the literature that studied number counts at similar wavelengths from both ALMA (Hatsukade et al. 2013(Hatsukade et al. , 2016(Hatsukade et al. , 2018 single-dish (e.g., Lindner et al. 2011;Scott et al. 2012;Geach et al. 2017;Simpson et al. 2019;Shim et al. 2020) facilities. When the wavelength is different than 1.1 mm we applied a correction factor to the flux densities so the number counts are comparable. These factors were calculated using a modified black body (MBB) model, assuming a dust emissivity index β = 1.5 and dust temperature T dust = 35 K. The corrections are: S 1.1 mm /S 1.3 mm = 1.79, S 1.1 mm /S 1.2 mm = 1.36, S 1.1 mm /S 870 µm = 0.44, S 1.1 mm /S 850 µm = 0.41. Our results are in agreement with the general trends of literature studies covering the flux densities around the knee of the number counts accurately with dozens of galaxies in most of the flux density bins.
A discrepancy between the ALMA results from Karim et al. (2013) and some of the single-dish studies (e.g., Lindner et al. 2011;Scott et al. 2012) exists at the bright end (e.g., Franco et al. 2018). The results presented here reduce the tension compared to the results in Franco et al. (2018), although the discrepancy still exist. The origin of the discrepancy was suggested to come from the challenging boosting and blending effects that affect the single-dish measurements leading to an overestimation of the bright end of the number counts (e.g., Karim et al. 2013;Ono et al. 2014;Simpson et al. 2019). Recent ALMA studies at the bright end (Stach et al. 2018;Simpson et al. 2020) have shown indeed agreement with the first ALMA results by Karim et al. (2013). The latter tendency has also been strengthened by more recent single-dish studies (Geach et al. 2017;Simpson et al. 2019;Shim et al. 2020).
The wealth of ALMA studies in the literature along with the results presented in this work cover a wide range of flux densities spanning from the faint to the bright end of the number counts, making it possible to accurately perform a fit. We fit a Schechter (Schechter 1976) and a double power law (DPL) function to both the differential and cumulative number counts (see Table 5): is a modified Schechter function conventionally used in similar studies at these wavelengths (e.g., Coppin et al. 2006;Knudsen et al. 2008;Austermann et al. 2010), where N 0 is the normalization, S 0 the characteristic flux density, and α is the faint-end slope.
is the DPL (e.g., Scott et al. 2002;Coppin et al. 2006;Knudsen et al. 2008). In this fit we did not consider our first flux density bin since it appears clearly incomplete. We did not take into account upper limits presented in some of the literature studies and neither did we consider the results from single-dish surveys to keep the fit as clean as possible restricted to ALMA. The Schechter fit performs better than the double power law, specially at the bright end. In Fig. 11 we also display the predictions of an updated version of the modeled Simulated Infrared Dusty Extragalactic Sky (SIDES) by Béthermin et al. (2017) at 1.1 mm for both the differential and cumulative number counts. This model shows an overall good agreement with our derived best-fit to the ALMAbased observed 1.1 mm number counts, albeit a prediction of a slightly more pronounced flattening at the faint end and a mild excess of sources at the bright end. In the right panel of Fig. 11 we also include the cumulative number counts predictions by different galaxy evolution models in the literature   Fig. 11. Differential (left panel) and cumulative (right panel) number counts from the combined dataset using the 100% pure plus prior-based catalogs (black symbols). The first bin (white symbol) is not used in our analysis. Also displayed the contribution of the 100% pure catalog only (gray symbols), the sources extracted in the high resolution (blue symbols) and low resolution (red symbols) datasets. In both panels we display literature studies of ALMA number counts at similar wavelengths of 1.3, 1.2, 1.1 mm, 870 µm, and 850 µm (filled circles) converted to 1.1 mm as explained in the main text. We also add some other single-dish literature studies (open circles). Best-fit Schechter and DPL functions using the entire set of number counts from all the ALMA studies are shown in red and black solid lines, respectively, with their uncertainties corresponding to the 16% and 84% percentiles as a shaded area. Predicted number counts from several models in the literature are shown with dotted lines. Table 5. Best-fit parameters to the differential and cumulative number counts for a Schechter and double power law function. Notes. The uncertainties correspond to the 16% and 84% percentiles. (Schreiber et al. 2017b;Popping et al. 2020;Lagos et al. 2020;Zavala et al. 2021). These models also predict a more pronounced flattening at the faint end with different degrees of normalization compared to our derived best-fit to the ALMA-based observed 1.1 mm cumulative number counts. The latter is steeper since it includes both the flatter tendency observed by Muñoz Arancibia et al. (2018) and González-López et al. (2019), and the steeper results by Fujimoto et al. (2016). Around the knee the models are overall consistent with our best-fit result, except for the model in Schreiber et al. (2017b) which exhibits a higher normalization. The model by Zavala et al. (2021) also departs from the ALMA-based observations toward higher values as it approaches the brighter flux density regimes. In the bright end the models predict an excess in the cumulative number counts not seeing so far in the observations.

Cosmic infrared background
The extragalactic background light (EBL) is the integrated intensity of all of the light emitted throughout the history of the universe across the whole of the electromagnetic spectrum. The EBL constitutes the second most energetic source of background after the cosmic microwave background (CMB). The EBL SED is composed of two main components: the cosmic optical background (COB) and the cosmic infrared background (CIB). While the COB is due to radiation from stars, the CIB comes from the absorption of UV/optical emission that is re-emitted at IR wavelengths by dust. COB and CIB have a similar contribution to the total EBL (e.g., Dole et al. 2006), which indicates that half of the stellar emission in galaxies is absorbed and re-emitted by dust. Millimeter ALMA number counts can resolve 50−100% of the CIB (e.g., Carniani et al. 2015;Fujimoto et al. 2016;González-López et al. 2020). The surface brightness of the CIB down to a given flux limit S lim ν is given by integrating the differential number counts: We calculated the amount of CIB resolved in this work by integrating the Schechter fit to the differential number counts down to the faintest flux density bin probed in the survey, which corresponds to 0.3 mJy. The result is 0.0289 +0.0011 −0.0006 MJy sr −1 (where uncertainties correspond to the 16% and 84% percentiles). The Cosmic Background Explorer (COBE) measured the absolute surface brightness of the CIB (e.g., Fixsen et al. 1998;Lagache et al. 1999;Odegard et al. 2019). Using the analytical fit from Fixsen et al. (1998), we calculated a reference absolute value of 0.076 MJy sr −1 at 1.13 mm. Therefore, the amount of CIB resolved by the survey is 37.9 +1.4 −0.8 % down to 0.3 mJy.

Source properties
The wealth of ancillary data in the GOODS-South field allows us to study some basic properties of our ALMA sources. Particularly, we are interested in characterizing the redshift and stellar mass distributions.
A43, page 16 of 29 We looked for stellar counterparts of the 100% pure main and the prior-based supplementary catalogs in the K s -band selected ZFOURGE catalog by Straatman et al. (2016), which provides photometry and other products including photometric redshift and stellar mass estimations. ZFOURGE (PI: I. Labbé) is a program carried out with the FourStar instrument (Persson et al. 2013) on the 6.5 m Magellan Baade Telescope using five near-IR medium bands (J 1 , J 2 , J 3 , H s , and H l ), covering the same range as classical J and H broadband filters, and a K s -band. It includes the CDFS field (encompasing GOODS-South), among other fields. ZFOURGE combines dedicated FourStar/K s -band observations with pre-existing K-band imaging to create superdeep detection images. In the CDFS it incorporates VLT/HAWK-I/K from HUGS (Fontana et al. 2014), VLT/ISAAC/K from GOODS, with ultra deep data in the HUDF region (Retzlaff et al. 2010), CFHST/WIRCAM/K from TENIS (Hsieh et al. 2012), and Magellan/PANIC/K in HUDF (PI: I. Labbé). In addition to the dedicated observations, the ancillary CDFS UV to near-IR filters in ZFOURGE include VLT/VIMOS/U, R-imaging (Nonino et al. 2009), HST/ACS/B, V, I, Z-imaging (Giavalisco et al. 2004;Wuyts et al. 2008), ESO/MPG/WFI/U 38 , V, R cimaging (Erben et al. 2005;Hildebrandt et al. 2006), HST/WFC3/ F098M, F105W, F125W, F140W, F160W and HST/ACSF606W, F814W-imaging (Grogin et al. 2011;Koekemoer et al. 2011;Windhorst et al. 2011;Brammer et al. 2012), and 11 Subaru/ Suprime-Cam optical medium bands (Cardamone et al. 2010). We also searched for updated spectroscopic redshifts in a recent compilation in GOODS-South (N. Hathi, priv. comm.) and additional recent surveys that have supplied new spectroscopic information in the field: VANDELS (Garilli et al. 2021), the MUSE-Wide survey (Herenz et al. 2017;Urrutia et al. 2019), and ASPECS LP González-López et al. 2019;Boogaard et al. 2019). In Tables 2 and 3 we specify the reference for each spectroscopic redshift. We note that when more than one redshift was available we chose the one with the highest reported quality. Stellar masses and photometric redshifts were taken from ZFOURGE, except when there were updated spectrocopic redshifts. In that case we calculated the stellar masses using the same methodology as ZFOURGE to keep a consistent analysis. Photometry was fit using FAST++ 2 , an updated version of the spectral energy distribution (SED) fitting code FAST (Kriek et al. 2009) employed in ZFOURGE. The stellar population models are from Bruzual & Charlot (2003, BC03), with exponentially declining star formation histories (SFHs), a Calzetti et al. (2000) dust attenuation law, and fixed solar metalicity. FAST++ input files had the same parameters and grid of models as in ZFOURGE. The stellar masses were multiplied by a factor of 1.7 to scale them from a Chabrier (Chabrier 2003) to a Salpeter (Salpeter 1955) IMF (e.g., Reddy et al. 2006;Santini et al. 2012;Elbaz et al. 2018). We note that the stellar masses are dependent on the methodology and the assumptions made in the SED fitting. In the literature some studies have tested the impact of different codes and SFHs on the stellar masses of DSFGs 2 https://github.com/cschreib/fastpp (e.g., Hainline et al. 2011;Michałowski et al. 2014;Simpson et al. 2020). Michałowski et al. (2014) reported that exponentially declining models with a code which does not assume energy balance, as employed here, were able to recover the stellar masses of their simulated submillimeter galaxies (SMGs), albeit a slight underestimation (∼0.05 dex) and significant scatter. Conversely, alternative approaches using a code assuming an energy balance between the UV emission absorbed and radiated at far-IR and mm wavelengths resulted in a systematic overestimation (∼0.1 dex) of the stellar masses of their simulated SMGs.
Additionally, we substituted the stellar masses and redshifts of the six optically dark sources studied in detail in Zhou et al. (2020), namely AGS4, AGS11, AGS15, AGS17, AGS24, and AGS25 (A2GS2, A2GS15, A2GS10, A2GS7, A2GS29, and A2GS17, respectively). Zhou et al. (2020) reported spectroscopic confirmation for AGS4 (A2GS2) and AGS17 (A2GS7) and argued for AGS11, AGS15, and AGS24 to be at the median redshift of a z ∼ 3.5 overdensity. Although independent spectroscopic confirmation is still needed for these sources, we used the assumed redshifts in Zhou et al. (2020) for them. For these sources, Zhou et al. (2020) used as well the same methodology as ZFOURGE to keep a consistent analysis (i.e., photometry was fit using FAST++, BC03 stellar population modes, exponentially declining SFHs, and Calzetti et al. 2000 dust attenuation law).
After visual inspection of all the ALMA sources in comparison with the K s -band image, there are five of them with blending issues in the ZFOURGE catalog. It leads to a smaller number of catalog entries than actual sources and, thus, the photometry is affected and requires an improved tailored analysis. One of them is AGS24 in Zhou et al. (2020, A2GS29), who already solved the blending issue and we used their stellar masses and redshifts. The other four are A2GS28, A2SGS30, A2GS33, and A2GS60. An extra source, A2GS38 does not have a counterpart in the ZFOURGE catalog and also required photometry, in this case without blending from close neighbors.
For the sources above that required an improved tailored analysis, we carried out the photometry following the methodology described in Gómez-Guijarro et al. (2018) for crowded and blended sources. Since we used ZFOURGE there are three types of datasets: HST bands that provide the best spatial resolution, ground-based bands homogenized to a common Moffat PSF profile (0. 9 FWHM; see Straatman et al. 2016), and Spitzer/IRAC bands which are the ones with a coarser PSF and more affected by blending. The Hubble Legacy Fields (HLF) v2.0 for the GOODS-South region (HLF-GOODS-S) includes all ultraviolet, optical, and IR data taken to date by HST over 14 years across the field (Illingworth et al. 2016;Whitaker et al. 2019). Therefore, instead of the ZFOURGE HST data, we used the updated HLF-GOODS-S v2.0 mosaics homogenized to the WFC3/F160W-band PSF, along with the ground-based and Spitzer/IRAC bands. Briefly, the photometry methodology is as follows: all the bands affected by blending were fit with a model using GALFIT (Peng et al. 2002), where the number of priors is set to the number of sources in the F160W-band image. While for those bands unaffected by blending, we performed aperture photometry with aperture diameters 0. 7 for HST (as in Whitaker et al. 2019) and 1. 2 for the ground-based (as in Straatman et al. 2016), along with aperture corrections derived by tracing the PSF growth curves to account for the flux losses outside the aperture, plus also PSF photometry with GALFIT for Spitzer/IRAC bands. Uncertainties were derived from empty aperture measurements.

Optically dark sources
In the last years a new population of galaxies missed in optical surveys but bright at far-IR/mm wavelengths has been discovered. This type of DSFGs differs from previously known intense starbursts (e.g., Walter et al. 2012;Riechers et al. 2013;Marrone et al. 2018) as they have lower SFRs characteristic of MS SFGs, rather than starbursts. Their space density is two orders of magnitude greater than equally massive z ∼ 3−4 SFGs (Wang et al. 2019). They are of great interest as they are seen as a key population of galaxies that dominate the contribution of massive (M * > 10 10.3 M ) galaxies to the SFR density of the universe at z > 3 (Wang et al. 2019). These optically-invisible massive galaxies (also known as HST-dark galaxies) are currently undetected or very faint in all optical and near-IR bands up to and including the H-band (H-band dropouts) in the deepest cosmological fields (H > 27 mag; 5σ point source), but bright at longer near-IR bands ([4.5] < 24 mag) (Wang et al. 2016(Wang et al. , 2019Alcalde Pampliega et al. 2019). It should be noted, as is not always the case, that the selection and characterization of this population of galaxies depends on the depth of the observations. For a fixed [4.5] magnitude, it is particularly important to know the depth at which they remain undetected or very faint in the H-band, as their z > 3 and massive nature relies in their red H−[4.5] color (e.g., Wang et al. 2019). DSFGs surveys with ALMA combined with shallower H-band observations hinted for this optically dark galaxies (e.g., Simpson et al. 2014, H > 23 mag; 3σ), although their red H−[4.5] color remained uncertain in the absence of deeper H-band observations. More examples of this galaxy population continue to be discovered in new surveys (e.g., Franco et al. 2018Franco et al. , 2020bYamaguchi et al. 2019;Williams et al. 2019;Romano et al. 2020;Toba et al. 2020;Umehata et al. 2020;Gruppioni et al. 2020;Smail et al. 2021;Fudamoto et al. 2021).
In GOODS-ALMA, Franco et al. (2018Franco et al. ( , 2020b) already reported six of these galaxies (AGS4, AGS11, AGS15, AGS17, AGS24, and AGS25, which correspond to A2GS2, A2GS15, A2GS10, A2GS7, A2GS29, and A2GS17, respectively). In this work, there exist some additional sources without or with very faint emission at bands up to and including the H-band (H-band dropouts), namely: A2GS40, A2GS47, A2GS57, A2GS82, and A2GS87 (see Fig. 12). Furthermore, A2GS33, after subtraction of the F160W-band neighbors, shows no emission in the K sband image at the position of the ALMA source. We performed aperture photometry in the residual image, confirming no significant (<3σ) emission in all bands up to and including the K s -band (K s -band dropouts). A2GS33 is also a candidate Spitzer/IRAC 3.6 µm dropout (see Fig. 12), although its emission is highly contaminated by blending with close neighbors in this band. In addition, A2GS38, which did not have a ZFOURGE counterpart at the start, is another K s -band dropout, but detected in Spitzer/IRAC 3.6 µm (see Fig. 12). A2GS33 and A2GS38 coincide respectively with ID 20 and 17 in Yamaguchi et al. (2019), who also reported them as K-band dropouts.
Therefore, the total number of optically dark/faint sources uncovered so far in GOODS-ALMA is 13 (ALMA detected  Tables 2 and 3, which represent the best redshift estimate for each source from a spectroscopic or photometric origin. Top right panel: detection fraction of sources in GOODS-ALMA 2.0 compared to all the galaxies in ZFOURGE located within the same area as a function of redshift for galaxies with stellar masses 10.0 < log(M * /M ) < 11.5. Bottom left panel: similar to the top left panel but in terms of the stellar mass. Bottom right panel: similar to the top right panel but in terms of the stellar mass for galaxies with redshift 1 < z < 4. In all panels, we represent the whole 100% pure plus prior-based catalogs from the combined dataset (black), the sources detected in the high resolution (blue) and low resolution (red) datasets, along with the sources that appear in the low resolution but do not in the high resolution dataset (dashed red). We note that for simplicity, in the case of the sources that appear in the low resolution but do not in the high resolution dataset the histograms are omitted and only the probability density curves are displayed.
H-or K-band dropouts). In particular, Zhou et al. (2020) analyzed in detail the six optically dark sources in Franco et al. (2018Franco et al. ( , 2020b and found that almost all of them (4/6) are associated to the same z ∼ 3.5 structure. In fact, along with the latter sources we detected around a dozen more ALMA sources potentially associated with the same structure (see the southwest region of the GOODS-ALMA 2.0 map in Fig. 1), with some of them located along two streams that connect at the center of the structure (pinpointed by AGS24/A2GS29). Spectroscopic confirmation is still needed to confirm this hypothesis. A detailed analysis of the optically dark/faint sources in this work and their potential link to overdense structures is beyond the scope of this paper.

Redshift and stellar mass distributions
In Fig. 13 we present the redshift distribution along with the detection fraction of sources in GOODS-ALMA 2.0 compared to all the galaxies in ZFOURGE located within the same area as a function of redshift. The redshift distribution was constructed using the values in Tables 2 and 3, which represent the best redshift estimate for each source from a spectroscopic or photometric origin. The different datasets studied in this work are represented: combined, high resolution, and low resolution datasets, and also those sources that appear in the low resolution but do not in the high resolution dataset.
First, we see that the high resolution dataset redshift distribution is skewed toward higher redshifts compared to that of the low resolution dataset, which has a similar distribution compared to the combined dataset. In fact, the dashed red line that represents the sources that appear in the low resolution but do not in the high resolution dataset reflects the difference between the high resolution and low resolution datasets clearly as it is skewed toward lower redshifts. The difference between the two datasets is also clear in the detection fraction, with the high resolution dataset being efficient in picking up sources at higher redshifts, while the low resolution dataset achieves a higher detection fraction and exhibits a similar shape as the combined dataset across redshift. The combined dataset reaches naturally a higher detection fraction than the low resolution dataset since it is a deeper map.
The median redshift of the GOODS-ALMA 2.0 survey from the combined dataset using the 100% pure plus prior-based catalogs is z med = 2.46. This value is in line with literature studies of DSFGs peaking at z = 2−3 (e.g., Chapman et al. 2005;Yun et al. 2012;Smolčić et al. 2012;Dudzevičiūtė et al. 2020). Among other literature studies at ∼1 mm in GOODS-South, Dunlop et al. (2017) reported a lower mean redshift of z = 2.15 in a ×2.0 deeper 1.3 mm survey (average sensitivity of 35 µJy beam −1 at an average angular resolution of 0. 7), while covering a ×16 smaller area (4.5 arcmin 2 ). Yamaguchi et al. (2020) reported a slightly lower median redshift of z = 2.38 in a ×1.0−2.3 deeper 1.2 mm survey (average sensitivity of 30−70 µJy beam −1 at an average angular resolution of 0. 59×0. 53), while covering a ×2.8 smaller area (26 arcmin 2 ). Aravena et al. (2020) reported a lower redshift, yielding a median value of z = 1.8 in a ×7.4 deeper 1.2 mm survey (average sensitivity of 9.3 µJy beam −1 at an average angular resolution of 1. 53 × 1. 08), while covering a ×15 smaller area of 5 arcmin 2 . These results are in line with the idea that shallower and larger-area surveys are better at detecting brighter sources at higher redshifts, while deeper and smaller-area surveys access fainter sources at lower redshifts and, therefore, the redshift distribution of DSFGs is dependent on the survey depth (e.g., Ivison et al. 2007;Béthermin et al. 2015;Aravena et al. 2020).
A second peak appears in the redshift distribution for all datasets at 3 < z < 4 due to the z ∼ 3.5 overdensity of sources reported in Zhou et al. (2020). This second peak is more prominent in the high resolution dataset where these source are detected. We note that although the histogram of the high resolution dataset shows a higher peak at the location of the overdense structure (all sources are located in a single histogram bin), the probability density curve shows that the main peak is that at z = 2−3 (sources across various histogram bins add up and account for a higher percentage of the total number of sources), consistent with the other datasets involved in this work.
Similarly, in Fig. 13 we present the stellar mass distribution along with the detection fraction for the different datasets studied. The differences here are more subtle and specially show up in the dashed red line of sources that appear in the low resolution but do not in the high resolution dataset. Sources missed by the high resolution are skewed to lower stellar masses. In terms of the detection fraction we see again that the low resolution dataset achieves a higher detection fraction than the high resolution dataset and the deeper combined dataset reaches a higher detection fraction than the low resolution dataset. In particular, the fraction of detected sources at log(M * /M ) > 11.0 in the low resolution dataset is very similar to that of the combined dataset. In other words, the newly incorporated sources in the combined dataset are more in the lower stellar mass regime.
Complementing the redshift and stellar mass characterization, Fig. 14 shows the stellar mass as a function of redshift for the sources in the 100% pure plus prior-based catalogs from the combined dataset. We distinguish between sources extracted in the high resolution dataset (median redshift and stellar masses: z = 2.7±1.1, log(M * /M ) = 10.96±0.46, where the uncertainties are given by the median absolute deviation), sources that appear in the low resolution but do not in the high resolution dataset (z = 2.29 ± 0.73, log(M * /M ) = 10.83 ± 0.43), and also sources that Fig. 14. Stellar mass versus redshift for the sources in the 100% pure plus prior-based catalogs from the combined dataset, distinguishing between sources that appear in the high resolution dataset (blue), in the low resolution but not in the high resolution dataset (red), and in the combined but not in the high resolution or low resolution datasets (yellow). Medians are shown as big squares, where the uncertainties are given by the median absolute deviation. appear in the combined but do not in the high resolution or low resolution datasets (z = 2.38±0.92, log(M * /M ) = 10.55±0.59). Therefore, we see that the sources missed by the high resolution that are in the low resolution dataset are skewed to lower redshifts and stellar masses, and the sources missed by these two datasets individually and retrieved in the combined dataset are skewed to lower stellar masses. We note that the median redshift of the high resolution dataset becomes slightly lower if the sources in the z ∼ 3.5 overdensity are not taken into account (z = 2.56 ± 0.65), but even in this case the sources missed by the high resolution that are in the low resolution dataset are still skewed to lower redshifts.

ALMA array configuration impact on source detection
ALMA has opened the possibility of resolving dust continuum emission providing size estimates of the emitting regions. However, a very important concern in studies to date is the detection and measurement of accurate fluxes and sizes of sources spanning a wide range of intrinsic properties. Single array configurations providing angular resolution sufficient to measure sizes of intrinsically compact sources could be missing more extended sources for which a coarser angular resolution would be better suited.
A given array configuration yielding an angular resolution smaller than the intrinsic source size limits the survey depth to avoid false detections resulting from the excessively large number of independent beams. This is directly related with the concept of purity as explained in Sect. 3.1.1. A survey with a large number of independent beams leads to a high detection S /N peak required for a source to be regarded as real and, thus, effectively limits the survey depth. In addition, completeness decreases as a function of the source size as described in Fig. 8 and, thus, such array configuration providing high angular resolution is worse suited for increasingly larger source sizes. Besides, this type of high resolution array configuration could not properly A43, page 20 of 29 account for the flux coming from larger spatial scales leading to potential flux losses. Tappering techniques can be used to mitigate the aforementioned purity and completeness issues, but they come at the expense of the survey depth. Combining high resolution and low resolution array configurations improves the purity by reducing the number of independent beams, which avoids compromising the survey depth to mitigate false detections and improves completeness with minimum flux biases in a wider range of intrinsic source properties.
Our analysis regarding source sizes in Sect. 3.2.1 concludes that dust continuum emission occurs in compact regions, in line with a variety of literature studies in the ALMA era (e.g., Simpson et al. 2015a;Ikarashi et al. 2015;Hodge et al. 2016;Fujimoto et al. 2017Fujimoto et al. , 2018Gómez-Guijarro et al. 2018;Elbaz et al. 2018;Franco et al. 2018;Rujopakarn et al. 2019;Gullberg et al. 2019). One remaining question to address is the reason why the sources that appear in the low resolution but do not in the high resolution, although skewed to slightly larger sizes and with larger scatter as shown in Fig. 6, did not appear originally in the high resolution dataset being overall compact sources. The most relevant property that distinguish them are the flux densities. The number counts analysis in Sect. 4.3 reflects that the sources detected in the high resolution dataset are brighter than those that appear in the low resolution but do not in the high resolution dataset. The answer to this question comes jointly from purity and completeness as outlined above. In terms of purity, a compact source that appears in the low resolution but do not in the high resolution dataset, being fainter has a lower detection S /N peak compared to another similarly compact source with a higher flux density. These sources have a higher purity in the low resolution compared to that in the high resolution dataset. This effect is seen in the resulting σ p (directly related to S /N peak ) for a purity of p = 1 found to be σ p ≥ 4.2 in the low resolution map and σ p ≥ 5.2 in the high resolution map. Besides, the Tables 2 and 3 are ordered with decreasing detection S /N peak . As we move down the table the sources are no longer detected in the high resolution dataset (see Table 2, Col. 10). In terms of completeness, a compact source that appear in the low resolution but do not in the high resolution dataset, being fainter has a lower completeness compared to another similarly compact source with a higher flux density. These sources have a higher completeness in the low resolution compared to that in the high resolution dataset. Both to mitigate purity and completeness issues, this is the reason why in Franco et al. (2018) a tapering technique was applied at the expense of lowering the survey sensitivity to an average of 182 µJy beam −1 at an angular resolution of 0. 614 × 0. 587.

Conversion of angular into physical sizes: redshift and stellar mass dependency
It is well known that the stellar sizes measured at optical wavelengths vary with redshift and stellar mass for both early and late-type galaxies (e.g., van der Wel et al. 2014). Galaxies are smaller with increasing redshift at fixed stellar mass and larger with increasing stellar mass at fixed redshift. Therefore, in order to fairly compare galaxy sizes they need to be expressed in the same terms of redshift and stellar mass. We could correct the dust continuum sizes of each source to a common redshift and stellar mass by using the R e (z, M * ) dependency of late-type galaxies of van der Wel et al. (2014). However, we first need to verify the hypothesis of whether dust continuum sizes also vary in terms of redshift and stellar mass resembling the behavior known for the stellar sizes measured at optical wavelengths.
In order to verify the aforementioned assumption, we split the source catalog in four bins according to whether the redshifts and stellar masses are above or below the median values of the whole 100% pure main plus prior-based supplementary catalogs (z med = 2.46 and log(M * med /M ) = 10.79). We stacked the sources in each bin, measuring the dust continuum size of the stack. This approach has the advantage of higher number statistics, although the ALMA centroids become less constrained for increasingly lower detection S /N peak and, thus, potentially leading to an artificial increase of the stacked size (e.g., Fujimoto et al. 2017). Alternatively, we also performed the stacks using the Spitzer/IRAC 3.6 counterparts, detected at much higher S/N and, thus, with better constrained centroids. The stacking was performed in the uv plane as follows: for each source pair of coordinates to be stacked we searched for all the pointings that contained data on that source (each source is covered by six pointings typically). Next, each pointing was phase shifted to set the phase center at the source coordinates using the CASA task fixvis. Data and weights are modified to apply the appropriate primary beam correction that correspond to the phase shift with the CASA toolkit MeasurementSet module. After that, by using the CASA task fixplanets the phase center was set to a common pair of coordinates α = 00 h 00 m 00 s; δ = 00 d 00 m 00 s (J2000) for all the pointings to be stacked and the visibility weights recomputed with the task statwt. Finally, the pointings were concatenated into a single measurement set, where the weight of each source was normalized by its flux density. We measured the sizes of the stacks as explained in Sect. 3.2 employing the CASA task UVMODELFIT fitting a Gaussian model with fixed circular axis ratio. There are no significant differences in the stacked dust continuum sizes when ALMA or Spitzer/IRAC 3.6 centroids are used. Finally, we quantified the R e (z, M * ) dependency to be compared with that for the stellar sizes measured at optical wavelengths by fitting the dust continuum sizes measured for the stacks in each bin using the expressions: where A is a normalization constant given in kpc and α expresses the R e −M * dependency, and where B is a normalization constant given in kpc and β expresses the R e −z dependency. In Table 6 we present the dust continuum sizes measured for the stacks in each bin. The results of the fits are shown in Table 7. In Fig. 15 we show the R e −M * and R e −z planes displaying the sources in the 100% pure main catalog, the dust continuum sizes measured for the stacks, and the fits. Along with these, we display the stellar size measured at optical wavelengths evolution with redshift and stellar mass for both early and late-type galaxies from van der Wel et al. (2014) at the median values of the source catalog (z med = 2.46 and log(M * med /M ) = 10.79) for comparison.
Dust continuum sizes appear to be smaller with increasing redshift at fixed stellar mass and larger with increasing stellar mass at fixed redshift. Overall, the R e −M * dependency given by α resembles that of the stellar sizes measured at optical wavelengths of late-type galaxies in (e.g., van der Wel et al. 2014), albeit a lower normalization constant. The R e −z dependency given by β resembles as well that of the stellar sizes measured at optical wavelengths of late-type galaxies in (e.g., van der Wel et al. 2014), with a lower normalization constant. Therefore, we A43, page 21 of 29 A&A 658, A43 (2022) Notes. The columns show the size measurements for the stacks of sources in four redshift and stellar mass bins according to their location relative to the median values of the source catalog (z med = 2.46 and log(M * med /M ) = 10.79). The number of sources in each bin is also indicated. The physical dust continuum sizes in kpc are calculated at the median redshift and stellar mass of each bin.
Notes. R e −M * at fixed redshift and R e −z * at fixed stellar mass fit to Eqs. (9) and (10), respectively. Results are given by fitting the sizes measured for stacks of sources in bins of redshift and stellar mass shown in Table 6.
verified the hypothesis of whether dust continuum sizes vary in terms of redshift and stellar mass resembling the behavior known for the stellar sizes measured at optical wavelengths. At the moment the relatively small number of sources in this work for this type of study does not allow us to better constrain α and β or to group the sources within narrower redshift and stellar mass bins to explore the R e (z, M * ) dependency for dust continuum size measured at 1.1 mm more accurately. Consequently, to correct the dust continuum sizes of each source to a common redshift and stellar mass, we employed the R e (z, M * ) dependency of late-type galaxies of van der Wel et al. (2014). The lower normalization values of dust continuum sizes compared to those of the stellar sizes of late-type galaxies better resemble those of early-type galaxies. This fact reinforces the idea of the stellar mass-size place locus as a proof of DSFGs being progenitors of massive elliptical galaxies (e.g., Toft et al. 2014;Barro et al. 2016;Gómez-Guijarro et al. 2018;Tadaki et al. 2020;Franco et al. 2020a) and, reflected in the compact sizes of various far-IR to radio tracers, the build-up of central stellar cores prior to the quenching of star formation (e.g., Barro et al. 2017;Jiménez-Andrade et al. 2019;Gómez-Guijarro et al. 2019;Puglisi et al. 2021;Suess et al. 2021).
In the literature Fujimoto et al. (2017) studied dust continuum sizes at 1.1 mm for a large compilation of 1627 massive SFGs observed with ALMA, finding that dust continuum sizes are more compact than those at UV/optical wavelengths, although by a factor somewhat smaller than the factor 3−4 more compact than the typical sizes of stellar disks for latetype galaxies at UV/optical wavelengths we find. Fujimoto et al. (2017) also argued that dust continuum sizes at 1.1 mm follow a similar evolutionary trend with redshift than the stellar sizes of late-type galaxies. Jiménez-Andrade et al. (2019) studied radio sizes at 3 GHz for a mass-complete sample of 3184 radio-selected SFGs and found a flatter evolutionary trend with redshift β = −0.26 ± 0.08 (0.12 ± 0.14) for galaxies on (above) the MS of SFGs, but no clear variation of radio sizes with stellar mass. Using 3 GHz and 6 GHz radio emission Jiménez-Andrade et al. (2021) found radio emission more compact by a factor 2−3 than UV/optical sizes and also no variation of radio sizes with stellar mass.

The systematicy of compactness in DSFGs
Our results conclude that dust continuum emission occurs in compact regions. However, GOODS-ALMA 2.0 is a flux limited survey and we need to understand the extend of this conclusion in terms of the flux density completeness limits. In addition, the sources that appear in the low resolution but do not in the high resolution, although compact, are skewed to slightly larger sizes and with larger scatter as shown in Fig. 6. These sources are mainly characterized by lower flux densities, but are also located at slightly lower redshifts and stellar masses.
In order to be able to fairly compare galaxy sizes across flux densities, they need to be expressed in the same terms of redshift and stellar mass as explained above, where we verified that dust continuum sizes evolve with redshift and stellar mass resembling the trends of the stellar sizes measured at optical wavelengths, albeit a lower normalization compared to those of late-type galaxies. Therefore, we corrected the dust continuum sizes of each source to a common redshift and stellar mass as given by the median values of the source catalog (z med = 2.46 and log(M * med /M ) = 10.79) by using the R e (z, M * ) dependency of late-type galaxies of van der Wel et al. (2014). In Fig. 16 we show the corrected dust continuum sizes at 1.1 mm as a function of the 1.1 mm flux densities for the sources in the 100% pure plus prior-based catalogs from the combined dataset. We also display the sources in the prior-based supplementary catalog even if their sizes are unreliable owing to their low S/N as we do know their fluxes and, thus, while their position in the y-axis is uncertain, their location in the x-axis is well constrained. We clearly see that the upper-right quadrant of the plane is empty.
In Fig. 16 we also show the typical size of the stellar distribution measured at optical wavelengths for both early and late-type galaxies from van der Wel et al. (2014) evaluated at the common redshift and stellar mass (z med = 2.46 and log(M * med /M ) = 10.79). Sources with S 1.1 mm > 1 mJy are always below the typical size of star-forming stellar disks with a median corrected size of R cor e = 0.72 ± 0.03 kpc (R e = 0. 094 ± 0. 004), meaning that there is no dust continuum emission as extended as typical star-forming stellar disks in the S 1.1 mm > 1 mJy regime. At most only one source is consistent within the scatter of the typical size of star-forming stellar disks, namely AGS17 (A2GS7). However, this galaxy exhibits signs of being a merger with a double peak dust continuum emission (see Fig. A.1). It was also reported as an extended source in the analysis of Franco et al. (2018), where the extraction was carried out in the high resolution dataset using a tapered map with a homogeneous and circular synthesized beam of 0. 6 FWHM under the assumption that the sources were point-like at that angular resolution. AGS17 was reported as one of the outliers that did not meet the pointlike criteria.
In terms of completeness, in Fig. 16 we also show the ∼100% completeness region (blank area without the line grid) drawn from the simulations for sources up to 1 FWHM presented in Sect. 4.1. This angular size corresponds to a physical size of R e = 4.05 kpc at the common redshift and stellar mass  Tables 6 and 7 are displayed. The dust continuum sizes measured for individual sources in the 100% pure main catalog are also shown distinguishing sources with a detection S /N peak > 6.5 (big black circles, representing approximately the top third of the source catalog) and sources with a detection 5 < S /N peak < 6.5 (small black circles). For comparison, the stellar size measured at optical wavelengths evolution with redshift and stellar mass for both early and late-type galaxies from van der Wel et al. (2014)  (z med = 2.46 and log(M * med /M ) = 10.79). We are ∼100% complete for sources with S 1.1 mm > 1 mJy covering the scatter of the typical size of star-forming stellar disks. Therefore, we conclude that compact dust continuum emission at S 1.1 mm > 1 mJy prevails and that dust continuum emission as extended as typical star-forming stellar disks at these flux densities is rare (at most we only detected one source in the entire 72.42 arcmin 2 area of the survey, if we consider AGS17 as such type of source).
At lower flux densities in the S 1.1 mm < 1 mJy regime the picture is less clear. In this flux regime we have a mix of sources in the 100% pure main catalog with reliable size measurements, plus all sources in the prior-based supplementary catalog with unreliable size measurements. The median corrected size of the S 1.1 mm < 1 mJy sources in the 100% pure main catalog is R cor e = 0.92±0.06 kpc (R e = 0. 138±0. 009). This value appears slightly more extended compared to that of the S 1.1 mm > 1 mJy sources. In order to confirm the slight size difference between both flux density regimes we stacked the sources at S 1.1 mm > 1 mJy and at S 1.1 mm < 1 mJy and measured the dust continuum size of the stacks as explain in Sect. 6.2. S 1.1 mm > 1 mJy sources result in R cor e = 0.80 ± 0.05 kpc and S 1.1 mm < 1 mJy sources result in R cor e = 0.97 ± 0.05 kpc, confirming the slight size difference in these flux density regimes.
It is clear is that, beyond any slight difference between flux density regimes, even in the S 1.1 mm < 1 mJy regime most of the sources appear compact below the sizes of typical star-forming stellar disks as in the S 1.1 mm > 1 mJy regime. The most distinct characteristic of the S 1.1 mm < 1 mJy regime is the larger scatter with some of the sources possibly entering the scatter of the typical size of star-forming stellar disks. However, the lower S/N could lead to systematically larger sizes due to an artificial bias at this S/N and larger samples with higher S/N would be required for confirmation. We note also that in the regime of S 1.1 mm < 1 mJy, and specially for larger sources, the completeness drops (see Fig. 8) and further data would be also required to know the abundance of these sources. Naturally, there should be a regime where the millimeter observations get deep enough to start detecting secular star formation in regular star forming disks, with their associated extended dust continuum disks. In the literature some studies have suggested that fainter millimeter sources could be physically larger. Franco et al. (2020b) argued that the prior-based methodology allowed for one to access a population of fainter and slightly larger sources. The ALMA Frontier Fields work by González-López et al. (2017) at 1.1 mm probed the sub-mJy regime and found similar results to our work. The lensing-corrected sizes were in the same range as those measured in brighter samples, with possibly larger dispersion. 2/3 of the sources had sizes a factor ×1.6 larger that the brighter sources and suggested that a substantial portion of the sub-mJy sources may be mildly more extended than brighter ones. Rujopakarn et al. (2016) reported more extended galaxywide dust continuum emission at 1.1 mm in a sample of 11 normal MS SFGs, Cheng et al. (2020) also found extended dust continuum emission at 870 µm in four DSFGs, mostly with mild IR luminosities log(L IR /L ) < 12.0, and Sun et al. (2021) also found extended dust continuum emission at 1.3 µm in two DSFGs with mild IR surface brightness associated to less vigorous star formation.

Summary and conclusions
The GOODS-ALMA survey is a 1.1 mm galaxy survey carried out with ALMA in the GOODS-South field. GOODS-ALMA 2.0 covers a continuous area of 72.42 arcmin 2 at a homogeneous sensitivity with two different array configurations: a more extended configuration providing the high-resolution small spatial scales (high resolution dataset) and a more compact configuration supplying the low resolution large spatial scales (low resolution dataset). The results based on the first high resolution dataset alone were already published in Franco et al. (2018Franco et al. ( , 2020a and Zhou et al. (2020). In this 2.0 version we present the low resolution dataset and its combination with the high resolution dataset (combined dataset), reaching an average sensitivity of σ = 68.4 µJy beam −1 at an average angular resolution  Fig. 16. Physical dust continuum size at 1.1 mm corrected to a common redshift and stellar mass (z med = 2.46 and log(M * med /M ) = 10.79) versus 1.1 mm flux density for the sources in the 100% pure plus priorbased catalogs from the combined dataset. For sources in the 100% pure main catalog we distinguish detection S /N peak > 6.5 (big black circles, representing approximately the top third of the source catalog) and detection 5 < S /N peak < 6.5 (small black circles). The sizes of sources in the prior-based supplementary catalog (which have a detection S /N peak < 5) are unreliable (shown as crosses), but their 1.1 mm flux densities are well constrained and, thus, we know their position in the x-axis. Median corrected dust continuum sizes for S 1.1 mm > 1 mJy sources (orange circle) and S 1.1 mm < 1 mJy sources (brown circle) are also displayed. The typical size of the stellar distribution measured at optical wavelengths for both early and late-type galaxies from van der Wel et al. (2014) at z med and M * med are also shown with their scatter as a shaded areas. The grid of purple lines shows the region where we are no longer ∼100% complete. Compact dust continuum emission at S 1.1 mm > 1 mJy prevails and sizes as extended as typical star-forming stellar disks in this flux density regime are rare. of 0. 447 × 0. 418. In particular we construct a source catalog, deriving number counts, and dust continuum sizes at 1.1 mm. In summary we find: -A total of 88 galaxies are detected in a blind search, compared to 35 in the high resolution dataset alone. We find 44 sources with a detection S /N peak ≥ 5 associated to a purity p = 1 (100% pure main catalog). Using a prior-based methodology we find another 44 sources with a detection 3.5 ≤ S /N peak ≤ 5 (prior-based supplementary catalog). -We find a total of 13 optically dark/faint sources (ALMA detected H-or K-band dropouts). This adds seven sources to those already reported in the GOODS-ALMA survey. -Number counts are fully consistent with the wealth of literature studies. We derived best-fit parameters to the differential and cumulative number counts from a compilation of ALMA-based studies at 850 µm-1.3 mm (see Table 5).
In addition, we dissected the contribution to the number counts from our different datasets: while the high resolution dataset is efficient at picking up the bright end of the number counts at S 1.1 mm > 1 mJy, it missed sources at S 1.1 mm < 1 mJy that appear in the low resolution dataset, which is efficient at retrieving sources for a wide range of flux densities. The combination of the high resolution and low resolution datasets (combined dataset), along with the prior methodology, allow us to be more complete at the faint end S 1.1 mm < 1 mJy regime.
-Dust continuum sizes at 1.1 mm are generally compact with a median effective radius of R e = 0. 10±0. 05, corresponding to a physical size of R e = 0.73 ± 0.29 kpc (at the redshift of each source). This result takes advantage of the improved uv coverage and sensitivity in a wider range of spatial scales given the combination of high resolution and low resolution datasets from two array configurations. -Dust continuum sizes at 1.1 mm evolve with redshift and stellar mass resembling the trends of the stellar sizes measured at optical wavelengths, albeit a lower normalization compared to those of late-type galaxies. -Compact dust continuum emission at 1.1 mm prevails for sources with flux densities S 1.1 mm > 1 mJy. Sizes as extended as typical star-forming stellar disks are rare. -At S 1.1 mm < 1 mJy, dust continuum emission at 1.1 mm appears slightly more extended, although they are still generally compact below the sizes of typical star-forming stellar disks. A larger scatter in the sizes in this flux regime is also seen, with some of the sources possibly entering the regime of the typical size of star-forming stellar disk, but the lower S/N and completeness associated to this flux regime would require further data to confirm and evaluate the abundance of such sources. After covering a large contiguous area using two array configurations at a similar and homogeneous depth providing both small and large spatial scales, our findings indicate that dust continuum emission occurring in compact regions smaller than the stellar distribution appears to be a norm in DSFGs. Table B.1 contains the extra 16 sources that we labeled as uncertain drawn from the analysis or the prior methodology and stellar mass verification pushed to the limit as described in Sect. 3.1.2. For IRAC priors, at ≤ 1. 2 and log(M * /M ) ≥ 10.0 there exists an excess of positive sources with a massive counterpart associated compared to negative detections, expecting around three to be spurious. For VLA, there are no negative detections found at log(M * /M ) ≥ 9.0 for counterparts at ≤ 1. 0.