Diagnosing deceivingly cold dusty galaxies at 3.5<z<6: a substantial population of compact starbursts with high infrared optical depths

Using NOEMA and ALMA 3mm line scans, we measure spectroscopic redshifts of six new dusty galaxies at 3.5<z<4.2 by solidly detecting [CI](1-0) and CO transitions. The sample was selected from the COSMOS and GOODS-North super-deblended catalogs with FIR photometric redshifts z>6, based on template IR spectrum energy distribution (SED) from known submillimeter galaxies at z=4--6. Dust SED analyses explain the photo-z overestimate from seemingly cold dust temperatures (Td) and steep Rayleigh-Jeans (RJ) slopes, providing additional examples of cold dusty galaxies impacted by the Cosmic Microwave Background (CMB). We therefore study the general properties of the enlarged sample of 10 ``cold"dusty galaxies over 3.5<z<6. We conclude that these galaxies are deceivingly cold at the surface but actually warm in their starbursting cores. Several lines of evidence support this scenario: (1) The high infrared surface density and cold Td from optically thin models appear to violate the Stefan-Boltzmann law; (2) the gas masses derived from optically thin dust masses are inconsistent with estimates from dynamics and CI luminosities; (3) the implied high star formation efficiencies would conflict with cold Td; (4) high FIR optical depth is implied even using the lower, optically-thick dust masses. This work confirms the existence of a substantial population of deceivingly cold, compact dusty starburst galaxies at z>~4, together with the severe impact of the CMB on their RJ observables, paving the way for the diagnostics of optically thick dust in the early universe. Conventional gas mass estimates based on RJ dust continuum luminosities implicitly assume an optically thin case, overestimating gas masses by a factor of 2--3 on average in compact dusty star-forming galaxies.


Introduction
The interstellar medium (ISM) is key to diagnosing the evolutionary state of galaxies in the early Universe. In particular, the available reservoir of molecular hydrogen (M H 2 ) dictates the ongoing star-and subsequent dust formation. However, measuring the molecular gas mass M gas of galaxies, both in the local and the high−z Universe is not a trivial task, as H 2 lacks dipole moment and thus is not detectable under the most typical, most representative ISM conditions. As gas and dust are well mixed (e.g., Bohlin et al. 1978;Boulanger et al. 1996), an efficient method to infer M gas indirectly is to use far-infrared(FIR)/submillimeter(submm) dust continuum observations to derive the (observationally cheaper) dust mass (M dust ) Marie Curie Fellow of a galaxy that is subsequently converted to M gas through the metallicity-dependent dust-to-gas-mass ratio technique (e.g., Leroy et al. 2011;Eales et al. 2012;Magdis et al. 2012Magdis et al. , 2017Santini et al. 2014;Genzel et al. 2015;Berta et al. 2016;Schinnerer et al. 2016;Tacconi et al. 2018Tacconi et al. , 2020Liu et al. 2019b;Kokorev et al. 2021). Similarly, single band measurements of the dust emission flux on the Rayleigh-Jeans side of the SED have also been calibrated against CO observations and under a series of assumptions (e.g., dust temperature, α CO conversion factor) can be used to infer gas mass estimates (e.g., Groves et al. 2015;Scoville et al. 2016Scoville et al. , 2017aSchinnerer et al. 2016;Liu et al. 2019b).
However, the derivation of M gas from M dust or from a single band Rayleigh-Jeans (RJ) dust continuum flux density are usually based on the assumption that the observed FIR/submm Article number, page 1 of 14 arXiv:2206.10401v2 [astro-ph.GA] 23 Mar 2023 A&A proofs: manuscript no. aanda emission is optically thin. If instead the dust opacity of a galaxy extends to λ > 100µm, its M dust based on optically thin models can be severely overestimated (a factor of 2-20 e.g., Galliano et al. 2011;Jin et al. 2019;Cortzen et al. 2020;Fanciullo et al. 2020). Subsequently, the inferred M gas would also be problematic. While there is mounting evidence for optically thick dust emission in both local and high-z dusty star-forming galaxies (e.g., Blain et al. 2003;Draine & Li 2007;Conley et al. 2011;Casey 2012;Spilker et al. 2016;Hodge et al. 2016;Scoville et al. 2017b;Simpson et al. 2017;Riechers et al. 2014Riechers et al. , 2017Riechers et al. , 2020Cortzen et al. 2020), the large degeneracy between the solutions of the thick and thin dust model SEDs prevents a definitive conclusion.
Besides the dust SEDs, submm lines provide more powerful diagnostics for ISM properties, revealing the presence of optically thick dust emission at FIR wavelengths in the starbursting cores of submillimeter galaxies (SMGs) in the local Universe. For example, Papadopoulos et al. (2010) developed a method to identify the optically thick dust using ratios between optically thick and optically thin lines in local galaxies (e.g., HCN(1-0)/CO(1-0) and CO(6-5)/CO(3-2)) and proposed that dust optical depths can significantly suppress high-J CO and C + emission. Based on James Clerk Maxwell Telescope (JCMT) observations, they found that the very faint CO(6-5) lines in Arp220 and other ultraluminous infrared galaxies (ULIRGs) can be attributed to high dust optical depths at FIR wavelengths immersing those lines in a strong dust continuum. Furthermore, using high-resolution ALMA CO(1-0) mapping, Scoville et al. (2017b) confirmed that the dust in the two nuclei of Arp 220 is optically thick well into the FIR, and in one of them even at 2.6 mm with a high dust brightness temperature of 147 K.
Such high optical depths are also expected in the ISM of high-z star-forming galaxies that were more compact and rich in gas and dust. For example, Jin et al. (2019) discovered a sample of "apparently" extremely cold dusty galaxies at z = 4-6 using ALMA line scans. Their dust spectral energy distributions (SEDs) peak at longer rest-frame wavelengths than most literature sources at z > 4, and their dust temperatures (24-42 K) as inferred with optically thin models are between 1.5 and 2.0 times colder than that of main sequence (MS) galaxies at these redshifts. Given the massive dust content and compact morphology of these galaxies, the "apparent" cold dust temperatures are best understood as evidence for large optical depths that effectively attenuate the dust continuum emission in the FIR and shift the peak of the observed SED to longer wavelengths. In the same direction, Cortzen et al. (2020) used the two neutral carbon transitions in GN20, a star-bursting galaxy at z = 4.05, and found that the excitation temperature as inferred by the [CI](2-1)/(1-0) ratio is significantly higher than the abnormally cold dust temperature derived from SEDs under the assumption of an optically thin FIR dust emission. On the other hand, the excitation temperature is fully consistent with the dust temperature of a general opacity model with λ eff ≈ 170µm. These findings suggest that high FIR optical depths could be a widespread feature of starbursts across comic time. Follow-up observations of seemingly "cold", high-z star-bursting galaxies with compact-dust-emitting sizes are necessary to push this investigation forward.
In this paper, we present an in-depth investigation of the ISM properties of six such galaxies based on new NOEMA and ALMA observations. These six galaxies are used to extend the original sample of Jin et al. (2019) to a total of ten, which is used in the remainder of this paper. In Section 2 we describe the sample selection and data processing. In Section 3 we discuss redshift confirmation, SED modeling, and size measurements.
We present evidence of optically thick dust in Section 4. We discuss the implication of these results in Section 5 and provide our summary and conclusions in Section 6. We adopt cosmology H 0 = 73, Ω M = 0.27, Λ 0 = 0.73; and a Chabrier initial mass function (IMF) (Chabrier 2003).

Sample, observations, and data reduction
Six new dusty star-forming galaxies were selected for follow-up observations in this study, after being culled for potentially lying at very high redshifts (z > 6) based on their SEDs. We show their multi-wavelength images in Fig. 1, including deep ALMA and NOEMA continuum data. The NOEMA and ALMA observations are summarized in Table 1. Five are in the COSMOS field and one is in the GOODS-North field.
Among the five COMSOS sources, ID3117, 5340, 6309, and 7549 were selected from the COSMOS super-deblended catalog (Jin et al. 2018). They were originally detected in the VLA 3GHz catalog (Smolčić et al. 2017) and are optically dark, that is, with no detection in HST I or H band, and are therefore not included in the COSMOS2015 catalog (Laigle et al. 2016). In the COS-MOS super-deblended catalog (Jin et al. 2018), these four galaxies are detected with significant FIR emission (S/N FIR+mm > 5) in Herschel, SCUBA2, AzTEC, and/or MAMBO (Geach et al. 2017;Cowie et al. 2017;Aretxaga et al. 2011;Bertoldi et al. 2007). Fitting of their dust SEDs returns very high photometric redshifts of z phot,FIR > 6 using the z = 6.3 HFLS3 template (Riechers et al. 2013). We carried out follow-up observations of the four galaxies with NOEMA 3mm line scans (ID: S18DI, PI: S. Jin) in the winter 2018 semester.
ID12646 was selected from the FIR/(sub)mm superdeblended catalog in the GOODS-North field  as the source with the highest photometric redshift z phot,FIR ∼ 6. During 2017 to 2019, Liu et al. carried out follow-up analyses of this source with NOEMA 3mm (IDs: W17EK, W19DQ) and 1mm observations (ID: W18EY).

NOEMA
The four galaxies, ID3117, 5340, 6309, and 7549, were observed with NOEMA 3mm line scans in the winter of 2018 (ID: S18DI, PI: S. Jin), continuously covering the frequency range 74.7-106 GHz. The observations were conducted in track-sharing mode with array configurations C and D. Each source was observed with two to three tracks in each tuning, reaching rms sensitivities of 0.13-0.16 mJy per 500 km/s with synthesized beams as listed in Table 1. We re-observed the two sources ID5340 and 6309 with one NOEMA tuning setup in January and February 2021 (ID: W20DM, PIs: E. Daddi & S. Jin) in order to obtain solid [CI](1-0) detections. ID7549 is also well-detected at 2mm continuum with NOEMA (ID: W20DM, PI: S. Jin).
ID12646 was firstly followed up with NOEMA 3mm line scans in the winter of 2017 (IDs: W17EK, PI: D. Liu); however, only one line was detected at 89.7 GHz. As the line might have been CO(6-5) at z = 6.7, a NOEMA 1mm observation (ID: W18EY, PI: D. Liu) was carried out to search for the [CII]158um  Liu et al. (2018) catalog, while the remaining IDs are originally from the radio catalog of Smolčić et al. (2017) and are also used in the COSMOS super-deblended catalog (ID Jin =ID Smolcic +2E7).
S. Jin, et al.: Diagnosing Optically Thick Dust in Deceivingly Cold Galaxies at z ∼ 4  Liu et al. (2018) catalog, while the remaining IDs are from the radio catalog of Smolčić et al. (2017) and correspond to ID+2E7 in the deblended FIR/(sub)mm catalog (Jin et al. 2018). Article number, page 3 of 14 A&A proofs: manuscript no. aanda line, but no line was detected. In the winter of 2019, this galaxy was observed with a NOEMA 3mm setup (ID: W19DQ, PI: D. Liu).
The NOEMA data were reduced and calibrated using IRAM GILDAS software packages. We then produced uv tables to perform further analysis with IRAM GILDAS working on the uv space (visibility) data. The resulting synthesized beam at 3mm is in most cases rather elongated (Table 1) and no source is expected to be resolved given the resolution from the compact NOEMA configurations used. We therefore extracted the spectra by fitting a point source model in the uv space at the known spatial positions from the high-resolution ALMA imaging. The NOEMA spectra are shown in Fig. 2.

ALMA
The above-mentioned four NOEMA sources in COSMOS have public imaging data in the ALMA archive at 345 GHz (ID: 2016.1.00463.S, PI: Y. Matsuda), in which two of them are also observed at 210 GHz (ID: 2016.1.00279.S; PI: I. Oteo). The four galaxies are solidly detected in the dust continuum and their fluxes were accurately measured. No serendipitous line was found in the ALMA data cubes.
ID9316 was observed with five ALMA setups at 3mm, continuously covering frequencies from 89.5 to 103.5 GHz. We processed each setup by reproducing the observatory calibration with their custom-made script based on the Common Astronomy Software Application package (CASA; McMullin et al. 2007). We then exported the data into uvfits format to generate uv tables to perform further analysis with the IRAM GILDAS working on the uv space (visibility). The spectrum of each setup is extracted in the uv space on the position of the continuum peak using a Gaussian kernel determined from the continuum and kept fixed in size and position in each spectral channel. The final spectra, as shown in Fig. 2, are produced by stacking the spectra of the five setups. We measure a redshift of z = 4.072 via detecting multiple lines, including CO(4-3), [CI](1-0), and CO(5-4). The 1mm and 870µm ALMA data of ID9316 were well processed as part of the A 3 COSMOS catalog (Liu et al. 2019a). We therefore directly adopt their measurements.
As a sanity check, we compared the ALMA 870µm photometry for all COSMOS sources with the SCUBA2 850µm photometry from the super-deblended catalog, finding that they are in excellent agreement (see a similar comparison, with consistent results, in Jin et al. 2018 Fig. 18 and 19). The source ID3117 has ACA (Atacama Compact Array) 630µm observations (2019.1.01832.S, PI: J. Zavala), and its 630µm flux from our SED fitting also agrees well with the ACA photometry. Together, these checks confirm the expected high fidelity of the flux measurements from the super-deblended technique.
We emphasize that we analyzed both NOEMA and ALMA data in uv space using the same algorithms that were applied in Jin et al. (2019), Puglisi et al. (2019), and Valentino et al. (2018Valentino et al. ( , 2020a. FIR/(sub)mm photometry is measured using the same Super-deblending technique (Jin et al. 2018;Liu et al. 2018). The consistencies therefore allow us to directly compare results in this work to these previous studies and to merge the samples.

Line detection and redshift identification
Adopting the line-search algorithm from Jin et al. (2019), we blindly searched for lines by sweeping through the full 3 mm continuum-subtracted spectrum. We present their NOEMA and ALMA 3mm spectra in Fig. 2 with detected transitions highlighted in green, and the line measurements are listed in Table 2. Every galaxy is detected in at least two transitions, where the brightest lines are detected with 6-14σ and secondarily bright ones are detected at 4.5-6.0σ. We thus confirm redshifts of all sources to be at z =3.545-4.147. ID3117 has the lowest redshift in this sample, of namely z = 3.545, which is confirmed by CO(3-2) and CO(4-3) emission lines (S/N=9, 13). ID5340, 6309, and 7549 are confirmed at z = 3.815, 4.032, and 4.095 with CO(4-3) and [CI](1-0) emission lines, respectively. ID12646 is detected with three lines: CO(5-4), CO(4-3), and [CI](1-0) at z = 4.147. ID9316 is confirmed at z = 4.074 with ALMA, detecting CO(4-3), CO(5-4), and [CI](1-0) emission with S/N=30, 23, and 19, respectively. An absorption of H 2 O(1 10 -1 01 ) is also detected with S/N=6.2, which appears two times stronger than the continuum at the same frequency, similar to the strong H 2 O absorption found in HFLS3 by Riechers et al. (2022). Notably, five of the six sources are detected with [CI](1-0) transition at 4.5-19σ. Given that most CI detections are in the local Universe (e.g., Jiao et al. 2017Jiao et al. , 2019Jiao et al. , 2021, the largest high-z [CI] sample is at z ∼ 1.2 from (Valentino et al. 2018(Valentino et al. , 2020b, and several [CI](2-1) detections at z ∼ 3 were presented by Yang et al. (2017). This [CI] sample is comparable in size to the SPT ones (Vieira et al. 2013;Bothwell et al. 2017), thus constituting one of the first sizable [CI] samples at z 4.
Remarkably, the redshifts of the five NOEMA sources are all lower than the original photometric redshifts. In the same way as the four galaxies found by Jin et al. (2019), this sample was selected as photometric z > 6 candidates but eventually confirmed at z ∼ 4. We note that the redshift of ID9316 is consistent with our own z phot,IR = 3.9 ± 0.8 from the super-deblended catalog (Jin et al. 2018), but we keep this galaxy in the sample for completeness, distinguishing it from the remaining galaxies where appropriate. The full sample of ten galaxies is studied here as an enlarged version of the sample studied by Jin et al. (2019).

Cold dust SEDs and CMB impact
As mentioned above, the sources are confirmed at lower redshifts than the early expectations. To estimate the characteristic dust temperatures, we performed a detailed SED analysis by fitting their deblended FIR photometry and ALMA/NOEMA continuum fluxes. Following the identical SED recipes adopted in Jin et al. (2019), we fitted the FIR/(sub)mm photometry of this sample in three ways, using: (1) optically thin MBB (modified black body) models from Magdis et al. (2012); (2) optically thin MBB models from Magdis et al. (2012) Table 4 and optically thick SEDs are presented in Fig 3. The radio data points are not directly used for the fittings, but we extrapolated the expected radio emission based on the measured IR luminosities using the mass-stratified IR-radio correlation from Delvecchio et al. (2021). The predicted radio fluxes are in good agreement with the observations for most cases in this sample, showing little evidence for radio AGN emission.
The five NOEMA sources show clearly steep Rayleigh-Jeans slopes β > 2.5 − 2.7 for optically thin MBB models; see the β noCMB thin listed in Table 4. These Rayleigh-Jeans slopes are similar and even steeper than the findings in Jin et al. (2019). Benefiting from high signal-to-noise ratios (S/Ns) from the ALMA and NOEMA photometry, the constraints on the Rayleigh-Jeans  -0.78±0.11 -700 Notes: a The CO(5-4) line is on edge of the spectra, we estimated the line flux assuming an identical line profile to CO(4-3). b The H 2 O(1 1,0 − 1 0,1 ) line is in absorption.
slopes are more robust than those in Jin et al. (2019). However, no indisputable evidence has yet been produced in the literature for galaxies with steep Rayleigh-Jeans slope β > 2. A recent ALMA large survey revealed that the ALESS SMG sample can be characterised by β 1.9 ± 0.4 (da Cunha et al. 2021), which is consistent with the emissivity index of the dust in the Milky Way and other local and high-z galaxies, indicating little evolution in dust grain properties between high-z SMGs and local dusty galaxies. Therefore, the steep Rayleigh-Jeans slopes β > 2 appear not to be genuine in these galaxies, similarly to what is discussed in Jin et al. (2019). On the other hand, the CMB temperature is warmer at higher redshifts, reaching 14 K at z = 4. Article number, page 5 of 14  The black (red) curve shows the best fit to the observed SED accounting (not accounting) for CMB. The blue curve marks the stellar component. Note that the radio data are not included in the fitting, we extrapolated a radio component to the dust SEDs using the infrared-radio correlation from Delvecchio et al. (2021). 3.5 × 10 −3 ) between this sample (including Jin et al. 2019) and Simpson et al. (2017), indicating that these galaxies belong to a new population and are significantly different from general high-z submm galaxies in the literature. Formally, these measurements are inconsistent with the Stefan-Boltzmann law even when accounting for their uncertainties and error bars. The Stefan-Boltzmann (S-B) law applies to blackbody emission, and acts as an upper envelope to the measurable SFR density in real galaxies. It has been shown empirically to confine the range of observations of previously known samples of distant dusty galaxies, e.g., Simpson et al. 2017. However, the Stefan-Boltzmann law is only valid for blackbody emission that is optically thick at all wavelengths, it is unclear if greybodies like galaxies could break the Stefan-Boltzmann envelope.
Here we demonstrated that the Stefan-Boltzmann law is a strict upper limit that should not be violated in any case. The Stefan-Boltzmann law states that the emission of a blackbody depends only on the its size and its temperature T . The emission is optically thick at all wavelength, and the emitted energy is E = AσT 4 . For the case of a grey body, it has E = A σT 4 where is the emissivity ( = 1 − e −τ , τ = (ν/ν 0 ) β ) , and E < E since < 1. We integrated the formula of grey bodies with different effective wavelengths λ eff from 1µm up to 1E7µm, i.e., from optically thin to blackbody. The resulted Stefan-Boltzmann law for greybody can be described as logL IR = a+b×logT d in cgs unit, the coefficients are listed in Table 7 for different λ eff . When a galaxy is thick at all wavelengths, it emits like a blackbody (L IR ∝ T 4 ) rather than a greybody (L IR ∝ T 4+β in the thin limit).
Article number, page 7 of 14 Fig. 3. SEDs of galaxies in this study. Photometry is from the super-deblended catalogs (Jin et al. 2018;Liu et al. 2018) and ALMA and NOEMA measurements. The dust SEDs are fitted with optically thick MBB models accounting for CMB impact on the continuum (da Cunha et al. 2013). The black (red) curve shows the best fit to the observed SED accounting (not accounting) for CMB. The blue curve marks the stellar component. We note that the radio data are not included in the fitting; we extrapolated a radio component to the dust SEDs using the IR-radio correlation from Delvecchio et al. (2021). This reduces the observable dust continuum of cold dusty galaxies notably at the longest wavelengths (da Cunha et al. 2013;Zhang et al. 2016); e.g., 3mm. This suggests that the CMB has an impact on the Rayleigh-Jeans slopes and accounting for it can explain their abnormally steep observed values (Jin et al. 2019).
After accounting for the CMB effect on the continuum using da Cunha et al. (2013) models, the Rayleigh-Jeans slopes reduce to lower best-fitting values β =2.0-2.5. We note that the bestfitting Rayleigh-Jeans slopes of ID3117, 5340, and 6309 remain at β > 2 in optically thin models even though the CMB effect has been accounted for. Their slopes can be further reduced to β ∼ 2 if we fit their SEDs using an optically thick dust template (plus the CMB effect).
In Fig. 4, we compare the dust temperatures from optically thin models to literature data. In the five NOEMA sources, their dust temperatures from optically thin models are 24-26 K, which is two times colder than the average T d of main-sequence (MS) galaxies at the same redshift (Schreiber et al. 2018), and similar to the extreme cold sample in Jin et al. (2019). This work therefore confirms the existence of a new population of apparently Article number, page 6 of 14 Jin et al.: Optically thick dust in galaxies at 3.5 < z < 6   Fig. 4. Dust temperature T d versus redshift for galaxies in the literature and this work. Left panel shows dust temperatures in optically thin models. Right panel shows dust temperatures in optically thick models for galaxies in Jin et al. (2019) and this work, while in optically thin T d for remaining data points. This figure shows a population of cold dusty galaxies at z =4-6 in optically thin models, but they might be comparably warm like main-sequence galaxies if they have optically thick dust. In Fig. 6, we show the Σ IR − T d for different galaxy samples and the normalized greybody models with different λ eff (dashed curves). The thicker dust models tend to approach higher IR surface density with colder T d , but they are limited at the Stefan-Boltzmann law (Σ IR = 5.5 × 10 5 T 4 d,thin ) where λ eff is infinite. The SFR densities derived from global size measurements are lower limits to the actual Σ IR . Hence it is noticeable that most of the galaxies in our new sample violate explicitly the Stefan-Boltzmann envelop. In order to quantify the bias introduced by the mis-use of thin models, we fit thick dust templates using thin models and compare the intrinsic temperatures to the measured ones T d,thin . In Fig. 6-left, we show the 10% systematic bias of T d,thin as a dot-dashed curve, for any data points consistent with or above the curve their T d will be underestimated by at least 10% by thin models. The expression of the dot-dash curve is Σ IR = 1.5 × 10 5 T 4.21 d,thin , providing a useful limit for thin models, above this curve the systematic bias of T d,thin becomes significant and optically thick dust is needed. On the other hand, using T d in optically thick models ( Fig. 6-right) resolves the tension, as the galaxies show a distribution that appears comparable with what was seen for other regular dusty galaxy samples, e.g., the SCUBA2 sources (Simpson et al. 2017) and the ALESS sample (Hodge et al. 2019). Therefore, the Stefan-Boltzmann law disproves the extremely cold T d from thin model and indicates optically thick dust.
In Fig. 6-right, we also compared to the AS2UDS sample (Stach et al. 2019) with quantities calculated by Gullberg et al. (2019) and Dudzevičiūtė et al. (2020), finding that the AS2UDS Article number, page 8 of 14 Right panel shows dust temperatures in optically thick models for galaxies in Jin et al. (2019) and this work, and in optically thin T d for remaining data points. This figure shows a population of cold dusty galaxies at z =4-6 in optically thin models, but they might be comparably warm, as in main sequence galaxies, if they have optically thick dust. cold dusty galaxies at z ≈ 4, further strengthening the conclusion drawn by Jin et al. (2019). However, their dust temperatures would be two times warmer if the dust is optically thick in FIR (Table 4), thus becoming comparably warm as typical dusty starforming galaxies at high-z (Fig. 4-right). We present a diagnosis of the dust opacity of this sample in Sect. 4.

Size measurements
The ancillary 870µm and 1mm data from ALMA provide highresolution imaging for our sample. We measure the sizes of the dust continuum in the uv space using 870µm and/or 1mm observations, following the same method applied in Puglisi et al. (2019) and Valentino et al. (2020b). The size measurements are listed in Table 6. The six sources are well resolved with FWHM sizes in the range of 1.7-3.4 kpc. In Fig. 5, we compare sizes against a control sample of MS galaxies at z ∼ 1.2 (Valentino et al. 2020b;Puglisi et al. 2019) and from the GOODS-ALMA sample (Franco et al. 2020;Gómez-Guijarro et al. 2022a,b). Our sample has a dust continuum FWHM which is 2.4 ± 1.3 times smaller than that of comparable FIR luminous galaxies at z ∼1-2, while it has comparable FWHM sizes with the GOODS-ALMA sample that was selected from high-resolution ALMA 1mm images (resolution= 0.6 , Franco et al. 2020). Interestingly, an anti-correlation is seen in the figure, that is, log(FWHM size/kpc) = −0.38 × log(L IR /10 10 L ) + 1.42, (1) with 1σ scatter of 0.24 dex, indicating galaxies with higher IR luminosity tend to have a more compact dust morphology. This correlation appears to be intrinsic rather than due to selection effects: as the local spirals, LIRGs, and ULIRGs (green crosses in Fig. 5) follow a similar anti-correlation, where high-z dusty galaxies have either larger sizes than the local ones, higher luminosities, or a combination there of. This is consistent with the findings of Tacconi et al. (2006), who showed that SMGs at z =2-3 have larger sizes than local ULIRGs, suggesting that high-z SMGs are scaled-up versions of local ULIRGs at the maximum starburst limit.
This correlation is in opposition with the positive correlation reported in Fujimoto et al. (2017). We note that the sample of Fujimoto et al. (2017)  The solid line shows the best fit to the high-z data points, and the dashed lines mark the 1σ scatter. Both local and high-z galaxies show an anticorrelation between dust continuum size and infrared luminosity. mixes all types of galaxies observed with variable configurations and sensitivities.

Near-infrared and IRAC photometry and stellar mass
We cross-matched this sample to the deepest wide-field nearinfrared(NIR)/IRAC catalogs from COSMOS2020 (Weaver et al. 2022). Two sources, ID6309 and 7549, are found in both The Farmer and Classic versions. Because of the typically greater precision of profile-fitting, we chose to adopt photometry from The Farmer for these two sources. ID5340 is not included in the catalog derived using The Farmer because it is in a masked region where photometry was not performed. However, it is found in the Classic catalog but the IRAC channel 2, 3, and 4 measurements are missing, and so for this source we use Classic photometry (VISTA and IRAC channel 1), and IRAC channel 2 to 4 photometry from the S-COSMOS catalog (Sanders et al. 2007). For the remaining sources in COSMOS (including the four in Jin et al. 2019), their UltraVISTA and IRAC photometry are measured by performing new measurements with the The Farmer (Weaver et al. 2022), adopting point-like models centered on their ALMA positions. Models of known sources neighboring these positions from COSMOS2020 were re-fitted to improve deblending, which is consistent with the methods of the original COS-MOS2020 catalog from The Farmer. The IRAC photometry of the GOODS-N source ID12646 is from the GOODS-Spitzer project (PI: M. Dickinson). Stellar masses, attenuation, and luminosity-weighted ages are summarized in Table 5, which are estimated using the MICHI2 code (Liu et al. 2021) by fitting the NIR+IRAC photometry with star-formation templates from BC03 (Bruzual & Charlot 2003), allowing for dust attenuation (Calzetti et al. 2000) and multiple ages. The E(B − V) values indicate high attenuation, that is, 2 < A V < 3, if adopting the Milky Way reddening parameter R V = 3.1 and 3 < A V < 4.6 if adopting R V = 5 for dense dust, which is in line with the optically thick dust suggested by FIR data (Section 4). The age is not reliably constrained because of the high attenuation. Note that we are fitting stellar and dust SEDs separately rather than using an energy balance between optical and FIR. This is because dust and stellar components are often found to be spatially unassociated in high-z dusty galaxies, e.g., Franco et al. (2018); Hodge et al. (2019). Literature work has shown that in this type of galaxy the star formation from optical emission is just a tiny fraction of the total SFR (Puglisi et al. 2017;Calabrò et al. 2018). Therefore, there is no reason to assume that the dust and stellar are well mixed.

Cold dust temperature against high IR surface
brightness T d -Σ IR Figure 6 shows that the new sample has cold dust temperatures and high IR surface densities that are more extreme than those of the other dusty galaxy samples. By performing a bootstrapped Kolmogorov-Smirnov test in T d -Σ IR space, we found a low probability of similarity (median= 2.4 × 10 −3 , semi-interquartile= 3.5 × 10 −3 ) between this sample (including Jin et al. 2019) and that of Simpson et al. (2017), indicating that these galaxies belong to a new population and are significantly different from general high-z submm galaxies in the literature. Formally, these measurements are inconsistent with the Stefan-Boltzmann law even when accounting for their uncertainties and error bars. The Stefan-Boltzmann (S-B) law applies to black body emission, and acts as an upper envelope to the measurable SFR density in real galaxies. It has been shown empirically to confine the range of observations of previously known samples of distant dusty galaxies; see for example Simpson et al. 2017. However, the Stefan-Boltzmann law is only valid for black body emission that is optically thick at all wavelengths, and it is unclear whether or not grey bodies like galaxies could break the Stefan-Boltzmann envelope.
Here we demonstrate that the Stefan-Boltzmann law is a strict upper limit that should not be violated in any case. The Stefan-Boltzmann law states that the emission of a black body depends only on its size and temperature T . The emission is optically thick at all wavelengths, and the emitted energy is E = AσT 4 . For the case of a grey body, it has E = A σT 4 where is the emissivity ( = 1−e −τ , τ = (ν/ν 0 ) β ) , and E < E because < 1. We integrated the formula of grey bodies with different effective wavelengths λ eff from 1µm up to 1E7µm, that is, from optically thin to black body. The resulting Stefan-Boltzmann law for grey body can be described as logL IR = a + b × logT d in cgs Article number, page 8 of 14 Jin et al.: Optically thick dust in galaxies at 3.5 < z < 6  fect the [CI](1-0) luminosities to gas mass conversion factor, e.g., CI/H 2 abundance scales linearly with metallicity Z (Heintz & Watson 2020). This would reduce the requirement for excess metallicity over MS galaxies to factors of 2 − 3×, but it would still be problematic for these gas-rich galaxies. We note though that dynamical constraints discussed in the next section rule out an anomalous [CI](1-0) luminosities to gas mass conversion factor. Therefore, the low SFE/high Z explanation for cold dust is strongly disfavored, and the optically thick dust becomes the only rational explanation left.
We note that recent studies have questioned the molecular gas mass estimates from CI. For example, Izumi et al. (2020) found that [CI](1-0) luminosity is enhanced by a factor of 10 by type 1 AGN in a local galaxy. Similar cases are also found in other local galaxies (Liu et al. 2022). Such extreme cases occur only in the very central region (∼300 pc) around luminous AGN where a non-LTE condition exists. However, such bias would be diluted in unresolved galaxies or starbursting cores, significantly reducing the impact on global molecular gas estimates. This is also supported by CI+CO study in a local starburst galaxy from Salak et al. (2019), they found [CI](1-0)/CO(21) is uniform within statistical errors in the central 1 kpc, and suggested that [CI](1-0) luminosity can be used as a CO-equivalent tracer of molecular gas mass. More recently, Papadopoulos et al. (2022) studied line ratios of the two CI transitions of 106 galaxies and found their average excitation conditions to be strongly subthermal. This non-LTE excitation of the CI lines can add uncertainty 4.3. Gas masses from optically thin dust models are overestimated In Jin et al. (2019), we argued that the dust mass of the cold galaxy ID85001674 is overestimated when derived using optically thin dust models, as it leads to an abnormally high gas mass that cannot be reconciled with the gas mass derived from [CI](1-0) emission and from dynamics. On the other hand, the dust mass would be 3× smaller if the dust is optically thick in the FIR and the appropriate thick models are used, bringing the gas mass into agreement with the gas mass derived from [CI](1-0) and from dynamics. Here the same test can be performed in a larger sample, thanks to the larger numbers of [CI](1-0) detections and resolved submm sizes (e.g., Jiao et al. 2017Jiao et al. , 2019Jiao et al. , 2021. We adopt the same approach for this sample, and estimate gas mass in three ways: (1) The gas mass M gas,dust derived from dust mass assuming a gas-to-dust-ratio G/D depending on metallicity. The dust masses are derived from SED fitting (Table 4) 6. IR luminosity surface density vs dust temperature for dusty galaxies in the literature and this work. Left and right panels show dust temperatures from optically thin and thick models, respectively. Blue squares mark data points from Simpson et al. (2017); red diamonds show the ALESS sample in Hodge et al. (2019). The AS2UDS sources (Stach et al. 2019) are shown in open circles, which are calculated using results from (Gullberg et al. 2019) and Dudzevičiūtė et al. (2020). The Stefan-Boltzmann (S-B) law is shown as a solid curve, representing the expected limit for a black body. The dot-dashed curve marks the 10% systematic bias of T d,thin , where above the curve T d would be underestimated more than 10% by thin models. Dashed curves show the expected relations for general models with different λ eff as labeled. This figure provides a diagnostic method identifying optically thick dust. units, and the coefficients are listed in Table 7 for different λ eff . When a galaxy is thick at all wavelengths, it emits like a black body (L IR ∝ T 4 ) rather than a grey body (L IR ∝ T 4+β in the thin limit). In Fig. 6, we show the Σ IR − T d for different galaxy samples and the normalized grey-body models with different λ eff (dashed curves). The thicker dust models tend to approach higher IR surface density with colder T d , but they are limited at the Stefan-Boltzmann law (Σ IR = 5.5 × 10 5 T 4 d,thin ) where λ eff is infinite. The SFR densities derived from global size measurements are lower limits to the actual Σ IR . It is therefore noticeable that most of the galaxies in our new sample explicitly violate the Stefan-Boltzmann envelop. In order to quantify the bias introduced by the misuse of thin models, we fit thick dust templates using thin models and compare the intrinsic temperatures to the measured ones T d,thin . The left panel of Fig. 6 shows the 10% systematic bias of T d,thin as a dot-dashed curve; for any data points consistent with or above the curve, their T d will be underestimated by at least 10% by thin models. The expression of the dot-dash curve is Σ IR = 1.5 × 10 5 T 4.21 d,thin , providing a useful limit for thin models; above this curve the systematic bias of T d,thin becomes significant and optically thick dust is needed. On the other hand, using T d in optically thick models (Fig. 6-right) resolves the tension, as the galaxies show a distribution that appears comparable with what was seen for other regular dusty galaxy samples; for example, the SCUBA2 sources (Simpson et al. 2017) and the ALESS sample (Hodge et al. 2019). Therefore, the Stefan-Boltzmann law dis-Article number, page 9 of 14 A&A proofs: manuscript no. aanda proves the extremely cold T d from the thin model and indicates optically thick dust.
In the right panel of Fig. 6 we also compare with the AS2UDS sample (Stach et al. 2019) with quantities calculated by Gullberg et al. (2019) and Dudzevičiūtė et al. (2020), finding that the AS2UDS sample has more scatter violating the Stefan-Boltzmann limit. We note that the SFR and T d of the AS2UDS are derived using a different method from this work, namely MAGPHYS optical+FIR+radio SED fitting with optical-FIR energy balance; this could lead to systematic errors.

High star formation efficiency from [CI](1-0)
Dust temperature can be expressed in terms of the average intensity of the radiation field U , according to the relation: U = (T d /18.9) 6.04 (Magdis et al. 2012). For the parameter U , the following relation applies U ∝ SFE/Z (i.e., star formation efficiency SFE over metallicity Z, Magdis et al. 2012). The U measurements for our cold sample are approximately ten times lower than those of MS galaxies at z = 4 from Magdis et al. (2017), or about 3.3 times lower than those of MS galaxies at z ∼ 2 (see Fig. 4-right in Jin et al. 2019). Hence, in Jin et al. (2019) we discussed several interpretations for the cold dust temperatures and low U values: (1) the galaxies have optically thick dust in FIR, which are hot in the cores but appear cold for observers; (2) the galaxies have very low SFEs (i.e., long depletion timescales) and/or (3) the ISM is cooling efficiently due to very high metallicities.
Given that five of the six sources in this work have been detected with [CI](1-0) and all FIR SEDs are well constrained, we can directly measure their SFEs to disentangle the above SFE−Z degeneracy. We estimate their molecular gas masses by adopting the empirical calibration from Valentino et al. (2018), assuming a CI excitation temperature T exc,CI = T d,thick . SFRs are converted from IR luminosity from the FIR SED fitting, that is, SFR = L IR × 10 −10 . In Table 6, we list gas depletion times τ dep = M mol,CI /SFR, the inverse of SFEs. We note that the IR luminosity is invariant upon optical depth, and the [CI](1-0) emission is almost always optically thin (see specific discussion in Jin et al. 2019 for these cold galaxies), and therefore the measurements of gas depletion times and SFEs are robust regardless of dust optical depth. We show molecular gas masses and SFRs in the top panel of Fig. 7, and compare them to scaling relations for typically star-forming and starbursting galaxies at z ∼ 2 (Sargent et al. 2014). The two sources with the highest [CI](1-0) luminosities, ID 6309 and 7549, are in line with the MS scaling. The remaining galaxies fall between the MS and starburst scaling relations, showing SFEs higher than MS galaxies by a factor of 1.8-2.5. Even if adopting the canonical CI-to-gas-mass conversion from Weiß et al. (2005), which scales up the M gas,CI by a factor of two, the SFEs would still be in line with the MS scaling. Hence, there is no evidence for anomalously low SFE.
Metallicity might be responsible for the anomalously cold dust. To explain these U measurements together with the observed SFEs, the metallicity Z would need to be between six to eight times higher than that seen in MS galaxies at z ∼ 2. This appears unrealistic for galaxies at z ∼ 4. However, in principle, a much higher metallicity could also affect the [CI](1-0) luminosity-to-gas-mass conversion factor, for example, CI/H 2 abundance scales linearly with metallicity Z (Heintz & Watson 2020). This would reduce the requirement for excess metallicity over MS galaxies to factors of two to three, but it would still be problematic for these gas-rich galaxies. Nevertheless, we note that dynamical constraints discussed in the following section rule out an anomalous [CI](1-0) luminosity-to-gas-mass conversion factor. Therefore, the low SFE/high-Z explanation for cold dust is strongly disfavored, and the optically thick dust becomes the only rational explanation left.
We note that recent studies questioned the molecular gas mass estimates from CI. For example, Izumi et al. (2020) found that [CI](1-0) luminosity is enhanced by a factor of ten by type 1 AGN in a local galaxy. Similar cases are also found in other local galaxies (Liu et al. 2022). Such extreme cases occur only in the very central region (∼300 pc) around luminous AGN where there is non-local thermodynamic equilibrium (non-LTE). However, such bias would be diluted in unresolved galaxies or starbursting cores, significantly reducing the impact on global molecular gas estimates. This is also supported by a study of CI+CO in a local starburst galaxy by Salak et al. (2019), who found [CI](1-0)/CO(2-1) is uniform within statistical errors in the central 1 kpc, and suggested that [CI](1-0) luminosity can be used as a CO-equivalent tracer of molecular gas mass. More recently, Papadopoulos et al. (2022) studied line ratios of the two CI transitions of 106 galaxies and found their average excitation conditions to be strongly subthermal. This non-LTE excitation of the CI lines can add uncertainty in CI-based molecular gas mass estimates, but is only effective when the [CI](2-1) line is used, while the [CI](1-0)-based gas mass is substantially less affected (Papadopoulos et al. 2022). Therefore, [CI](1-0) is still a good gas tracer for galaxies in this work, as supported by the dynamical estimates in the following sections.
In the bottom panel of Fig. 7, we compare this sample to the MS at z = 4 (Schreiber et al. 2015). Intriguingly, all these cold galaxies have high sSFRs (2-30× MS), and therefore clearly belong predominantly to the starburst population.

Gas masses from optically thin dust models are overestimated
Jin et al. (2019) argued that the dust mass of the cold galaxy ID85001674 is overestimated when derived using optically thin dust models, as it leads to an abnormally high gas mass that cannot be reconciled with the gas mass derived from [CI](1-0) emission and from dynamics. On the other hand, the dust mass would be three times smaller if the dust were optically thick in the FIR and the appropriate thick models were used, bringing the gas mass into agreement with the gas mass derived from [CI](1-0) and from dynamics. Here the same test can be performed for a larger sample thanks to the larger numbers of [CI](1-0) detections and resolved submm sizes (e.g., Jiao et al. 2017Jiao et al. , 2019Jiao et al. , 2021. We adopt the same approach for this sample, and estimate gas mass in three ways: (1) The gas mass M gas,dust is derived from dust mass assuming a gas-to-dust ratio, G/D, depending on metallicity. The dust masses are derived from SED fitting (Table 4), for both the thick and thin dust cases. For G/D, the metallicities are derived using the FMR relation Magdis et al. (2012). We added a 20% uncertainty to the derived gas mass estimates to account for variations in the assumed metallicity (see Jin et al. 2019 for examples of the effect).
(2) The gas mass M gas,dyn is derived from the dynamical masses of the galaxies, which are based on CO line widths and submm ALMA sizes, following Daddi et al. (2010) and Coogan et al. (2018); see Table 6. The dynamical masses are converted to total gas mass using the relation: M gas,dyn = M dyn,2Re × 0.8 − M * . The factor 0.8 is used to account for dark matter (Daddi et al. 2010). The inclinations are measured from the sizes of major and minor axes, which are obtained by fitting elliptical models Article number, page 10 of 14 emission of the galaxies 1 . The statistical dispersion in the average is propagated to the dynamical mass error.
(3) Gas mass M gas,CI derived from [CI](1-0) emission. To account for the small dependence on CI excitation temperature we assume this equal to the dust temperature estimated from thick models (Cortzen et al. 2020). The molecular gas mass is derived using Eq. 2 in Valentino et al. (2018), as listed in Table 6. Note that high-J CO transitions are impractical to constrain molecular gas mass due to uncertain excitation Daddi et al. 2015) and can be significantly suppressed by thick dust (Papadopoulos et al. 2010). The [CI](1-0) transition is optically thin in all reasonable conditions, and is thus an optimal tracer of molecular gas. 1 Note that ID3117 and 6309 have relatively narrow linewidths, their dynamical masses appears lower than their stellar masses. This is because ID3117 has low and uncertain inclination angle of i = 16 ± 40 • , making it impossible to obtain a reasonable constraint on its M gas,dyn . ID6309 has a reliable inclination measurement of i = 60 ± 6 • but shows excessive stellar mass with respect to the dynamical mass M * > M dyn,2Re . Therefore we do not constrain their M gas,dyn in this work.
In Fig. 8, we compare the three gas mass derivations. The gas mass from optically thin dust, M gas,dust , are explicitly (3 − 5×) higher than the ones derived from dynamical mass M gas,dyn for 3 galaxies, and 1.4-4× higher than gas mass from [CI](1-0) M gas,CI for 5 galaxies. In Fig. 8-left, only the less extreme galaxy ID9316 falls well on the identity line of M gas,dust − M gas,dyn , while the other source ID12646 appears on the identity line but with large uncertainty on M gas,dyn . In terms of M gas,dust −M gas,CI comparison, ID12646 is the only with M gas,dust agreeing well with M gas,CI . Both comparisons indicate that the dust mass from optically thin dust models are overestimated on average by factors of 2-3. On the other hand, optically thick models predict dust masses that are indeed lower than the thin ones by just about the same factors of 2-3 ( Fig. 8-right) and produce estimates fully consistent on average with both CI and dynamical derivations.
We note that the dynamical masses are calculated using high-J CO(4-3) velocity widths, while sizes are from 1mm dust continuum, i.e., we are assuming that high-J CO gas and dust emission are equivalent. Although in some bright SMGs the low-J CO(1-0) sizes can be much larger than the high-J ones (e.g., Ivison et al. 2011;Riechers et al. 2011), in typical distant starforming galaxies dust continuum sizes are found to be consistent with the high-J CO sizes (Puglisi et al. 2019), and the high-J CO reflecting accurately the star forming gas (Daddi et al. 2015;Liu et al. 2015) that emits from the same region as the dust. We could check this equivalence only for the ALMA 3mmdetected ID9316 in our sample, and found that its CO(4-3) size (0.39 ± 0.07 ) is in agreement with the dust continuum size (Table 6).

High dust opacity at FIR
In Fig. 9, we show direct estimates for the dust opacity at 100µm rest-frame, i.e., τ 100µm = κρR e , where for κ we are adopting the value from Jones et al. (2013) and ρ is the dust volume density assuming spherical geometry. Similarly to what was discussed in terms of the Stefan-Boltzmann law, because the galaxy might actually be composed of clouds, the opacity derived in this way is an upper limit. For our sample we show τ 100µm derived for both optically thin and thick cases. We directly adopted the dust masses (optically thin) and sizes from Valentino et al. (2020a), as a comparison sample. They were measured in a consistent way as in this work. As shown in in Fig. 9, a correlation is seen between τ 100µm and Σ SFR following τ 100µm = 3 × 10 −2 Σ SFR . This correlation is in fact probably driven by the correlation between dust mass and IR luminosity.
Remarkably, the τ 100µm,thin of our galaxies are all greater than 1, even for the optically thick case, which already suggests that the FIR emission at 100µm is generally optically thick, sometimes very much so. In the optically thick case, as shown by the blue squares, the opacity is systematically lower simply because the dust masses are lower, but it is still high (0.5 < τ 100µm < 3.0 with a median of 1.1). This further demonstrates the need for optically thick dust in these galaxies.

Caution for gas mass estimates based on dust continuum
Nowadays, as an alternative approach to CO observations, the Rayleigh-Jeans dust continuum emission is widely used to estimate dust and molecular ISM masses at high redshift (Eales et al. 2012;Magdis et al. 2012;Scoville et al. 2016Scoville et al. , 2017a. However, Article number, page 11 of 14 to the ALMA emission of the galaxies 1 . The statistical dispersion in the average is propagated to the dynamical mass error. (3) The gas mass M gas,CI is derived from [CI](1-0) emission. To account for the small dependence on CI excitation temperature, we assume that CI excitation temperature equals to the dust temperature estimated from thick models (Cortzen et al. 2020). The molecular gas mass is derived using Eq. 2 in Valentino et al. (2018), as listed in Table 6. We note that high-J CO transitions are impractical for constraining molecular gas mass due to uncertain excitation Daddi et al. 2015) and can be significantly suppressed by thick dust (Papadopoulos et al. 2010). The [CI](1-0) transition is optically thin in all reasonable conditions, and is therefore an optimal tracer of molecular gas. 1 We note that ID3117 and 6309 have relatively narrow line widths, their dynamical masses appear lower than their stellar masses. This is because ID3117 has a low and uncertain inclination angle of i = 16 ± 40 • , making it impossible to obtain a reasonable constraint on its M gas,dyn . ID6309 has a reliable inclination measurement of i = 60 ± 6 • but shows excessive stellar mass with respect to the dynamical mass M * > M dyn,2Re . Therefore we do not constrain their M gas,dyn .
In Fig. 8, we compare the three gas mass derivations. The gas masses from optically thin dust, M gas,dust , are explicitly (3 to 5 times) higher than those derived from dynamical mass M gas,dyn for three galaxies, and 1.4-4 times higher than gas mass from [CI](1-0) M gas,CI for five galaxies. In the left panel of Fig. 8, only the less extreme galaxy ID9316 falls well on the identity line of M gas,dust − M gas,dyn , while the other source ID12646 appears on the identity line but with large uncertainty on M gas,dyn . In terms of M gas,dust − M gas,CI comparison, ID12646 is the only with M gas,dust agreeing well with M gas,CI . Both comparisons indicate that the dust masses from optically thin dust models are overestimated by factors of 2-3 on average. On the other hand, optically thick models predict dust masses that are indeed lower than the thin ones by just about the same factors of 2-3 ( Fig. 8right) and produce estimates that are fully consistent on average with both CI and dynamical derivations.
We note that the dynamical masses are calculated using high-J CO(4-3) velocity widths, while sizes are from 1mm dust continuum; that is, we are assuming that high-J CO gas and dust emission are equivalent. Although in some bright SMGs the low-J CO(1-0) sizes can be much larger than the high-J ones (e.g., Ivison et al. 2011;Riechers et al. 2011), the dust continuum sizes in typical distant star-forming galaxies are found to be consistent with the high-J CO sizes (Puglisi et al. 2019), and the high-J CO accurately reflects the star forming gas (Daddi et al. 2015;Liu et al. 2015) that emits from the same region as the dust. We were only able to verify this equivalence for the ALMA 3mmdetected ID9316 in our sample, and find that its CO(4-3) size (0.39 ± 0.07 ) is in agreement with the dust continuum size (Table 6). Figure 9 shows direct estimates for the dust opacity at 100µm rest-frame,that is, τ 100µm = κρR e , where for κ we are adopting the value from Jones et al. (2013) and ρ is the dust volume density assuming spherical geometry. Similarly to what was discussed in terms of the Stefan-Boltzmann law, because the galaxy might actually be composed of clouds, the opacity derived in this way is an upper limit. For our sample, we show τ 100µm derived for both optically thin and thick cases. We directly adopted the dust masses (optically thin) and sizes from Valentino et al. (2020a), as a comparison sample. These were measured in a way that is consistent with the method used in this work. As shown in Fig. 9, a correlation is seen between τ 100µm and Σ SFR following τ 100µm = 3 × 10 −2 Σ SFR . This correlation is in fact probably driven by the correlation between dust mass and IR luminosity.

High dust opacity at FIR
Remarkably, the τ 100µm,thin of our galaxies are all greater than 1, even for the optically thick case, which already suggests that the FIR emission at 100µm is generally optically thick, and sometimes very much so. In the optically thick case, as shown by the blue squares, the opacity is systematically lower simply because the dust masses are lower, but it is still high (0.5 < τ 100µm < 3.0 with a median of 1.1). This further demonstrates the need for optically thick dust in these galaxies. dust emission is presumed to be optically thin in literature studies, and calibrated relations are based on this hypothesis. This assumption will work for typical galaxy populations, but would become problematic for dusty galaxies with compact optically thick dust similar to our sample. As already discussed in Sect. 4, their M dust , M gas and SFEs can be misjudged by the simple Rayleigh-Jeans flux to gas conversion if using optically thin fits. The dust masses in this work are overestimated by 2.2 ± 0.5 using optically thin fits, and simulations indicate that such overestimation can be even up to a factor of 20 when accounting for variable emissivity (Fanciullo et al. 2020). Therefore, conventional gas conversion based on dust continuum should be treated with caution particularly for the most luminous systems.

Discussion
Moreover, the Rayleigh-Jeans slopes of cold dusty galaxies are more substantially impacted by the CMB at high-z, which further complicates the dust and gas mass estimates. According to the SED results with and without CMB (the red and black curves in Fig. 3), the 3mm continua can be lower than the intrinsic fluxes by a factor of 1.3-1.6. This indicates that models that do not account for the impact of CMB would bias the dust and gas masses in apparently cold galaxies, this time towards underestimates. Several literature studies are using 870µm and 1mm continuum observations to constrain gas masses of galaxies at intermediate redshifts for convenience. It is true that the CMB impact is much smaller at these wavelengths, however, the dust mass measurements are less accurate closer to the peak of the SED. Going to longer wavelengths like 2mm or 3mm can provide potentially more reliable derivations, but this only works when properly accounting for the CMB.

A procedure to identify optically thick sources
Based on our results, we suggest a procedure to follow to check whether optically thick models should be considered for a given source/sample. 1) Calculate the dust opacity (upper limit) τ 100µm and SFR surface density Σ SFR (Fig. 9). The threshold above which one should seriously consider optically thick dust is τ 100µm 1 and/or Σ SFR 20 M yr −1 kpc −2 .
2) If a linewidth measurement is available (e.g., from CO), check dynamical estimate of gas masses and compare with gas masses derived from dust using optically thin models. If the latter have substantially larger estimates than the former, this points towards optically thick IR dust. If CI is available, do the same with gas estimates from CI. Note that CI gas estimation also depends on CI/H2 abundance.
3) If a dust temperature T d can be measured from the SED, compute location in Σ IR −T d diagram (Fig. 6) and compare to the Stefan-Boltzmann limit. Optically thick dust is needed if Σ IR 1.5 × 10 5 T 4.21 d,thin (10% bias limit) and is indispensable for galaxies violating the Stefan-Boltzmann law. Also one can compare to the T d vs redshift trend for normal galaxies. 4) if a [CI] excitation temperature is available, a significantly higher value than T d,thin from optically thin models will also suggest optically thick dust (e.g., Cortzen et al. 2020).
Article number, page 12 of 14 Fig. 9. Dust opacity τ 100µm vs SFR surface density Σ SFR . The dashed line shows a fit to the data points under the optically thin assumption, i.e., τ 100µm = 3.1 × 10 −2 Σ SFR . In both the optically thin and thick models, our sample shows high dust opacity at FIR wavelengths. 2012; Magdis et al. 2012;Scoville et al. 2016Scoville et al. , 2017a. However, dust emission is presumed to be optically thin in literature studies, and calibrated relations are based on this hypothesis.
This assumption will work for typical galaxy populations, but would become problematic for dusty galaxies with compact optically thick dust similar to those of our sample. As already discussed in Sect. 4, their M dust , M gas , and SFEs can be misjudged by the simple Rayleigh-Jeans flux-to-gas conversion if using optically thin fits. The dust masses in this work are overestimated by 2.2 ± 0.5 using optically thin fits, and simulations indicate that such overestimation can even be up to a factor of 20 when accounting for variable emissivity (Fanciullo et al. 2020). Therefore, conventional gas conversion based on dust continuum should be treated with caution particularly for the most luminous systems.
Moreover, the Rayleigh-Jeans slopes of cold dusty galaxies are more substantially impacted by the CMB at high-z, which further complicates the dust and gas mass estimates. According to the SED results with and without CMB (the red and black curves in Fig. 3), the 3mm continua can be lower than the intrinsic fluxes by a factor of 1.3-1.6. This indicates that models that do not account for the impact of CMB would bias the dust and gas masses in apparently cold galaxies, this time towards underestimates. Several literature studies are using 870µm and 1mm continuum observations to constrain gas masses of galaxies at intermediate redshifts for convenience. It is true that the CMB impact is much smaller at these wavelengths, but the dust mass measurements are less accurate closer to the peak of the SED. Going to longer wavelengths like 2mm or 3mm can provide potentially more reliable derivations, but this only works when properly accounting for the CMB.

A procedure to identify optically thick sources
Based on our results, we suggest using the following procedure in order to check whether optically thick models should be considered for a given source or sample.
2) If a line width measurement is available (e.g., from CO), check the dynamical estimate of gas masses and compare them with gas masses derived from dust using optically thin models. If the latter have substantially larger estimates than the former, this points towards optically thick IR dust. If CI is available, do the same with gas estimates from CI. We note that CI gas estimation also depends on CI/H2 abundance.
3) If a dust temperature T d can be measured from the SED, compute its location in a Σ IR − T d diagram (Fig. 6) and compare this to the Stefan-Boltzmann limit. Optically thick dust is needed if Σ IR 1.5 × 10 5 T 4.21 d,thin (10% bias limit) and is indispensable for galaxies violating the Stefan-Boltzmann law. Also, one can compare to the T d versus redshift trend for normal galaxies. 4) If a [CI] excitation temperature is available, a significantly higher value than T d,thin from optically thin models will also suggest optically thick dust (e.g., Cortzen et al. 2020).