Massive galaxies on the road to quenching: ALMA observations of powerful high redshift radio galaxies

We present 0.3"(band 6) and 1.5"(band 3) ALMA observations of the (sub)millimeter dust continuum emission for 25 radio galaxies at 1<z<5.2. Our survey reaches a rms flux density of ~50$\mu$Jy in band 6 and ~20$\mu$Jy in band 3. This is an order of magnitude deeper than single-dish 850 $\mu$m observations, and reaches fluxes where synchrotron and thermal dust emission are expected to be of the same order of magnitude. Combining our sensitive ALMA observations with radio data from ATCA, VLA, and IR photometry from Herschel and Spitzer, we have disentangled the synchrotron and thermal dust emission. We determine the star-formation rates (SFR) and AGN IR luminosities using our newly developed spectral energy distribution fitting code MrMoose. We find that synchrotron emission contributes substantially at ~1 mm. Through our sensitive flux limits and accounting for a contribution from synchrotron emission in the mm, we revise downward the median SFR by a factor of 7 compared to previous estimates based solely on Herschel and Spitzer data. The hosts of these radio-loud AGN appear predominantly below the main sequence of star-forming galaxies, indicating that the star formation in many of the host galaxies has been quenched. Future growth of the host galaxies without substantial black hole mass growth will be needed to bring these objects on the local relation between the supermassive black holes and their host galaxies. Given the mismatch in the timescales of any star formation that took place in the host galaxies and lifetime of the AGN, we hypothesize that a key role is played by star formation in depleting the gas before the action of the powerful radio jets quickly drives out the remaining gas. This positive feedback loop of efficient star formation rapidly consuming the gas coupled to the action of the radio jets in removing the residual gas is how massive galaxies are rapidly quenched.


Introduction
The connection between active galactic nuclei (AGN), their host galaxies and environments has been one of the central questions in extra-galactic astrophysics for over 30 years (Balick & Heckman 1982). This contemporary debate centers around two predominant issues concerning the influence of AGN on their environment: Is their influence "positive" or "negative", meaning do they either increase or decrease the star-formation efficiency of their hosts? How do super-massive black holes (SMBHs) regulate their own growth? These two questions are intertwined. When the SMBH is active, it may well regulate its own growth while also enhancing or inhibiting the stellar or baryonic mass growth of its host. The empirical, approximately linear relationship between the mass of SMHB and both galaxy bulge mass and the velocity dispersion (e.g. Magorrian et al. 1998;Geb-hardt et al. 2000;Ferrarese & Merritt 2000;Häring & Rix 2004), suggests that the growth of these two components is concomitant along the observed relationship. However, it is not clear if this relation is causal or if it simply reveals a connection between galaxy-galaxy mergers and that the growth of galaxy components are limited asymptotically (central limit theorem, e.g., Peng 2007;Jahnke & Macciò 2011).
SMBHs and host galaxies share several properties. Both SMBHs and galaxies have exponential cut-offs at the high mass end of their co-moving space densities (e.g., Shankar et al. 2009;Ilbert et al. 2013;Kelly & Shen 2013). The population of both SMBHs and galaxies also exhibit mass downsizing whereby the oldest, in the case of galaxies, and the most massive of SMBHs grew early and rapidly (e.g., Thomas et al. 2005Thomas et al. , 2010Merloni & Heinz 2008). However, there is a mismatch in both the shape and co-moving number density between galaxies and dark matter Article number, page 1 of 47 arXiv:1809.09427v2 [astro-ph.GA] 26 Sep 2018 A&A proofs: manuscript no. SED_ALMA_paper halos, especially at the low and high mass ends of these functions (Benson et al. 2003). Because powerful AGN can have a mechanical and radiative energy output similar to or exceeding that of the binding energy of a massive galaxy and dark matter halo, AGN are thought to play a key role in regulating the baryonic growth of galaxies. Both observations and simulations have suggested that there may be a positive trend between the mean black hole accretion rate and star-formation rate (SFR; e.g., Delvecchio et al. 2015;McAlpine et al. 2017), while the mean SFR as a function of black hole accretion rate shows no correlation for low luminosity sources (e.g., Stanley et al. 2015;McAlpine et al. 2017). One should be cautious when interpreting both theoretical and observational results in the definition of what exactly AGN feedback is and how AGN affect their host galaxies to explain the properties of an ensemble of galaxies (Scholtz et al. 2018). The strength and nature of AGN feedback -the cycle whereby the SMBH regulates both its own growth and that of its host -depends on galaxy mass and morphology. For example, the most massive elliptical galaxies are generally metal-rich and old, while less massive lenticular galaxies, which make up the bulk of the early-type galaxy population, have star formation histories that lasts significantly longer (Thomas et al. 2005(Thomas et al. , 2010Emsellem et al. 2011;Krajnović et al. 2011). Clearly, if AGN feedback plays a crucial role in shaping the ensemble of galaxies, its impact on massive dispersion dominated galaxies must result in somewhat different characteristics in these galaxies compared to rotationally-dominated and predominately less massive lenticular galaxies.
To gain a deeper understanding about how the growth of host galaxy and the SMBH are intertwined, it is important to study the characteristics of the star formation occurring in the host galaxies of actively fueled black holes. Within this context, powerful radio galaxies generally, and high-redshift radio galaxies (HzRGs) in particular, are important test beds of our ideas on the physics underlying AGN feedback. At almost all redshifts, powerful radio-loud AGN are hosted by galaxies that are among the most massive (Bithell & Rees 1990;Lehnert et al. 1992;van Breugel et al. 1998;Rocca-Volmerange et al. 2004;Best et al. 2005). Since we know that at low redshift the star formation history of many of these galaxies was brief, but intense (e.g., Thomas et al. 2005;Tadhunter et al. 2011), and they lie at the exponential high mass tail of the stellar mass distribution (e.g., Seymour et al. 2007;Ilbert et al. 2013), if AGN play an important role in shaping massive galaxies, it is in these galaxies that this must be most evident. Most importantly, HzRGs are luminous sources not only in the radio, but also throughout the mid-infrared (MIR) and sub-mm continuum. This generally implies that they have high rates of star-formation (Archibald et al. 2001;Reuland et al. 2003), and a rapid accretion onto the SMBH. Due to the fact that the broad line region and restframe ultraviolet-optical continuum emission from their accretion disks is obscured, we can observe their stellar emission De Breuck et al. 2010). All of these arguments make HzRGs important targets for understanding the complex relationship between AGN and massive galaxies.
The substantial mechanical and radiative AGN luminosity and apparently significant star-formation rates of HzRGs leads to a quandary. If AGN feedback effectively suppresses black hole and galaxy growth, why do the host galaxies of AGN grow so rapidly (mass doubling times of ∼0.1-1 Gyr, Drouart et al. 2014)? This is the "coordination problem", the apparent contradiction which is a paradoxical situation where the strongest phase of energy and momentum injection into the interstellar medium of the host galaxy by the AGN is not co-concomitant with strong suppression of star formation (see e.g., Drouart et al. 2014Drouart et al. , 2016. Models predict an offset between the fueling of the AGN and the suppression of star formation because the timescales for fueling the AGN is substantially shorter than the star-formation timescale within the host. Could the impact of AGN feedback be to increase the star formation efficiency in galaxies (i.e., "positive feedback"; Silk 2013; Kalfountzou et al. 2017)?
The bulk of the bolometric output of HzRGs is emitted in the infrared (IR; e.g., Miley & De Breuck 2008). Both AGN and star-formation (SF, dust heated by stars) contribute energy to the dust continuum spectral energy distribution; the AGN heats the dust to warm temperatures (T 60 K) emitting in the MIR, while the SF generally heats the dust to lower temperatures (T 60 K) and emits mainly in the far-infrared (FIR). The Herschel space telescope provided an opportunity to cover both sides of the peak of thermal emission allowing us to disentangle the dust components heated by AGN and SF. In our Herschel radio galaxy evolution (HeRGÉ) project, we constrained spectral energy distributions (SEDs) for a sample of 70 radio galaxies at 1< z <5.2 (Drouart et al. 2014). While this led to substantially improved estimates of the SFR, one limitation of Herschel data are their low spatial resolution (e.g., 36 at 500 µm). As shown by several arcsecond resolution follow-up observations, the dust continuum emission often splits into several components, which are not necessarily coincident with the AGN host galaxy (De Breuck et al. 2005;Ivison et al. 2008Ivison et al. , 2012Nesvadba et al. 2009;Emonts et al. 2014;Gullberg et al. 2016). Sub-arcsecond resolution imaging is therefore essential to separate the star formation occurring in the AGN host galaxies from that occurring in the nearby companion galaxies. We have therefore started a large systematic follow-up program of our HeRGÉ sample with the Atacama Large Millimeter Array (ALMA).
As HzRGs are, by selection, the brightest radio sources at each redshift, synchrotron radiation may make a substantial contribution at (sub)mm wavelengths. This was already discussed by Archibald et al. (2001), who concluded the 850 µm fluxes of three sources in their sample of 47 may be dominated by synchrotron emission. However, to make such an assessment, one has to assume that the sub-mm fluxes are a straight power-law extrapolation of the radio SED. While one may expect the spectra to steepen at high frequencies due to aging of high energy electrons, there are also suggestions that the SEDs in at least some HzRGs remain a power law with a constant exponent even at the highest observed radio and/or mm frequencies (Klamer et al. 2006).
The high sensitivity of ALMA also allows us to reach flux density levels more than an order-of-magnitude fainter than previous single-dish observations with LABOCA or SCUBA. Reaching such depths implies that we may reach flux density levels of the extrapolated synchrotron emission in most of the sources in our sample. It is therefore essential to disentangle the thermal dust and synchrotron components. To achieve this, we adopt two strategies: (1) multi-frequency photometry covering the range 10< ν obs <200 GHz, and (2) spatially resolving the radio core and lobes. For this first strategy, we combine our ALMA data with 7 mm and 3 mm observations from the Australia Telescope Compact Array (ATCA; Emonts et al. 2014). For the second strategy, we use the available radio maps from the Very Large Array (VLA; Carilli et al. 1997;Pentericci et al. 2000;De Breuck et al. 2010), which have similar spatial resolution as our ALMA data. A more detailed study of the physics of the high-frequency synchrotron emission is deferred to a forth-coming paper. For now, we simply consider the possibility that synchrotron component will impact our (sub)mm observations and our estimates of the SFRs of the AGN host galaxy.
The three main SED components, synchrotron emission, AGN and SF heated thermal dust emission constitute the SEDs of HzRGs. Because each component potentially makes a significant contribution to the over all SED, it requires an analysis of photometry covering an order-of-magnitude range in spatial scales -sub arcsecond to 10s of arc seconds. To this end, we developed the Multi-resolution and multi-object/origin spectral energy distribution fitting procedure Mr-Moose (Drouart & Falkendal 2018). This versatile code allows us to isolate the SF heated dust emission L IR SF from the two spectrally adjacent components -the AGN heated dust component at higher frequencies and the synchrotron emission at lower frequencies. We revise the L IR SF downwards by a factor of many compared to previous Herschel determinations.
This paper is structured as follows: after introducing the sample and observations in §2, we briefly describe our fitting code in §3. The overall results are described in §4, with detailed descriptions of each individual source given in the appendix. We discuss the implications of our results in §5, and summarize our conclusions in §6.

Sample
Our sample consists of 25 HzRGs over the redshift range 1<z<5.2. This is a subsample of the parent HeRGÉ sample of 70 HzRG a project dedicated to observed HzRGs with Herschel (described in detail in Seymour et al. 2007;De Breuck et al. 2010). To summarize, the parent samples sources were selected to have luminosities at rest-frame 3 GHz greater than 10 26 W Hz −1 and have ultra-steep radio spectra (α=-1.0; S ν ∝ν α , at ν obs ∼1.4 GHz). The parent sample has complete 12-band 3.6 to 850 µm photometry from Spitzer, Herschel SCUBA/LABOCA (De Breuck et al. 2010;Drouart et al. 2014). The subsample of 25 sources observed with ALMA were chosen to be easily observable by ALMA, i.e., the southern part of the parent sample and were grouped in such a way that multiple sources share a phase calibrator to minimize the overheads to the extent possible.

ALMA Observations
ALMA Cycle 2 band 6 (and band 4 for source TN J2007-1316) observations were carried out from June 2014 to September 2015 (Table 1). We used four 1.875 GHz spectral windows, tuned to cover molecular lines at the specific redshift of each source (Mc-Mullin et al. 2007). The data was calibrated in CASA (Common Astronomy Software Application) with the supplied calibration script (with exception of MRC 2224-273, for which the provided script was changed to correctly compensate for different averaging factor in one of the spectral windows). Since a significant fraction of our sources have a low SNR, we decided to optimize the sensitivity by using natural weighting to construct images. For all sources, except TN J2007-1316, atmospheric absorption noise was present in observations, therefore we excluded the affected channels from the final images. The settings used in our data reduction were: cell size of 0.06 arcsec (roughly five times smaller than the beam size), barycentric reference frame (BARY), and the mode "mfs" (multi-frequency synthesis emulation). For the brightest source MRC 0114-211, a phase self-calibration was done which decreased the RMS noise in the final image from 87 µJy to 59 µJy.
The ALMA Cycle 3 band 3 (and band 4 for source MRC 0943-242) observations were conducted from March 2016 to September 2016 (Table 1). We used four 1.875 GHz spectral windows tuned to include the [CI] 3 P 1 − 3 P 0 line in one of the side bands. Just as for the Cycle 2 observations the data were calibrated in CASA with the calibration scripts provided by the observatory and the continuum maps were produced in the same way, i.e., natural weighting, cell size of 0.28 arcsec, BARY and mode mfs, to be consistent over the whole sample.
For the source, 4C 23.56, we use the ALMA band 3 and 6 continuum data presented in Lee et al. (2017). Please see Lee et al. (2017) for details of the observations and data reduction.

ATCA 7 mm and 3 mm data
The ATCA observations of 7 mm continuum in some of the sources were conducted over 2009 − 2013. These continuum data were part of a survey to search for cold molecular CO(1-0) gas in high redshift radio galaxies (Emonts et al. 2014). The data were obtained using the compact hybrid H75, H168 and H214 array configurations, with maximum baselines of 89, 192 and 247 m, respectively. This resulted in synthesized beams ranging from roughly 6−13 arcsec. We used the Compact Array Broadband Backend (CABB; Wilson et al. 2011) with an effective 2 GHz bandpass and 1 MHz channels, centered on the redshifted CO(1-0) line (32−48 GHz; Emonts et al. 2014). The data were calibrated using the software package, MIRIAD (Sault et al. 1995). The observing and data reduction strategy, as well as the basic data products, were previously described in Emonts et al. (2011b) and Emonts et al. (2014). For MRC 0943-242, we also used the ATCA 3 mm ATCA/CABB system on March 21 2012 in the H168 array configurations and on September 30 and Oct 1 2012 in the H214 array configuration to obtain an upper limit of the radio continuum at 88.2 GHz. This frequency corresponds to the redshifted CO(3-2) line, which was not detected in our observations. We used PKS 1253-055 (March) and PKS 0537-441 (Sept/Oct) for bandpass calibration, PKS 0919-260 for frequency gain calibration, and Mars (March) and Uranus (Sept/Oct) to set the absolute flux levels. The total on-source integration time was 5.1 hrs. We used a standard observing and data reduction strategy to calibrate these 3 mm data, matching the strategy of the 7 mm data. The resulting synthesized beam of these 3 mm data is 2.4 × 1.9 (PA 82 • ) after robust +1 weighting (Briggs 1995). The rms noise level of these 3 mm data are 0.3 mJy beam −1 .

Mr-Moose
Mr-Moose is a new SED fitting code, developed with the goal of being able to handle multi-resolution photometric data that have multiple spatially-resolved detections in the same photometric band. The specific motivation for developing this new code is to be able to make full use of the information contained in deep and high resolution ALMA data. When combining ALMA and radio interferometric (such as JVLA and ATCA) data with previous low resolution data (such as Herschel or Spitzer) where the beam is too large to resolve individual components, one needs a SED fitting tool able to handle multiple components in order to make full use of the interferometric data which often contain multiple resolved components.  Table 1: Details about the ALMA observations. Several sources were observed more than once because the first observation did not meet the requested sensitivity and/or resolution. The additional data were included in the final reduction when they improved the signal-to-noise of the resulting image. 1 The ALMA are from Lee et al. (2017), see that study for details concerning the reduction and observations. and the current version operates in MIR to radio wavelengths (Drouart & Falkendal 2018). The code relies on simple analytic models to describe the underlying physical processes. It is up to the user to define which data point should be fitted to which analytic models. Each photometric data point can be associated to any number of models and combinations of different models. It therefore only requires the user to make educated guesses about the underlying physical processes responsible for the observed flux and does not require the user to select only one possible choice. The code fits simultaneously all pre-selected analytic models and uses Bayesian statistics to find the most likely solution given the set of observed fluxes. The Bayesian, Monte Carlo Markov Chain (MCMC) approach provides marginalized posterior probability density functions (PDF) for each of the free parameter.
In this paper, we model the MIR through radio spectral energy distributions with three components: a component representing dust heated by an AGN which we model as a power law with a slope, γ, and an exponential cut-off at ν cut (1); a component representing dust heated by the young stellar population of the host galaxy or companion which we model as a modified blackbody (2); and a component representing synchrotron emission which we modeled as a simple power law with constant slope, α, with no cut-off at high frequencies (3). Table 2 summarizes the allowed range and the number of free parameters in the three different models.  Table 2: List of the free parameters in our models and their allowed ranges. N AGN is the normalization of the AGN power law with slope, γ, and rest-frame cut-off wavelength, λ cut . The λ cut is fixed for all galaxies except 4C 23.56. We choose this particular galaxy to fit this parameter because it is AGN-dominated (see Fig. 1). N BB is the normalization of the modified blackbody of temperature, T. N sync is the normalization of the synchrotron power-law with slope α.

Analytic models
In this paper, simple analytic models are used to fit each component of the SED, instead of, using a (perhaps more realistic) physical models because the limited number of data points available. As is often the case for high-redshift studies, we are limited to broad band photometric data. So even though this sample of galaxies has been observed with Herschel, Spitzer, LABOCA, and now ALMA, we are still limited to ∼10 data points in the infrared. To properly constrain more complex models, more data points are needed. We therefore rely on less complicated, but empirically justifiable models, to fit the data in a statistically robust way and to prevent the temptation to over-interpret the physical processes underpinnings of our results (e.g., the characteristics of the AGN torus which may be responsible for reprocessing the emission from the accretion disk).

AGN model
The IR-luminosity of the source is modeled following the study of Casey (2012). The mid-and far-infrared SEDs are fitted with a combination of a simple power law with a low frequency exponential cut-off and a single temperature modified blackbody. These two simple models represents the dust heated by both the AGN and star-formation in the galaxy respectively. However, in this paper, the two models to describe the AGN and SF component are de-coupled and normalized individually to fit the photometry. The functional form of the dust heated by the AGN is, where ν cut is the rest-frame frequency of the exponential cut-off of the power law. The shape at longer wavelengths of the heated dust by AGN is not well constrained because cold dust emission often completely dominates at longer wavelengths making it very difficult to determine the actual shape of the AGN emission. Therefore, we use a simple exponential with a fixed rest-frame cut-off wavelength/frequency at λ cut =33 µm or ν cut =9.085 THz. This value was determined by letting λ cut be a free parameter for one galaxy and finding the best fit value. The mid-infrared emission of 4C 23.56 appears to predominately due to warm dust emission from its AGN and the far-infrared emission is very faint suggesting that it has a very low star formation rate. Within our sample, this makes 4C 23.56 the most obviously suitable choice for determining λ cut for this sample of HzRGs. Fig. 1 shows the best fit with λ cut as a free parameter. The chosen model for the warm dust used in this paper has also been adopted in other studies. For example, Younger et al. (2009) found good agreement when fitting the IR emission of luminous high-redshift galaxies with a modified blackbody paired with a power law component at short wavelengths. .56 when assuming that the AGN is solely responsible for the mid-infrared emission and allowing λ cut to be an additional free parameter. The black solid line shows the best fit with λ cut = 33 +3 −3 µm. The dashed line is the best fit AGN model used in Drouart et al. (2014) which itself is the average AGN used in the DecompIR SED fitting code (Mullaney et al. 2011). See Table A.23 for details about the photometric data.

Star-formation model
The far-infrared emission from dust heated by star-formation is fitted by a simple single temperature modified blackbody (i.e., a "graybody"), where B ν (ν, T) is a blackbody (BB) distribution, ν 0 is the critical frequency where the source becomes optically thin (assumed to be fixed at ν 0 = 1.5 THz; Conley et al. 2011) and β is the emissivity, which we fixed at β = 2.5. The value of the emissivity depends on the characteristics of the dust grain size, composition, distribution and how efficiently the grains re-emit the absorbed energy. It is common to assume a β in the range 1-2 (Hildebrand 1983) but values of up to 2.5 at sub-mm wavelengths have been found in the integrated SEDs of galaxies (Galametz et al. 2012;Cortese et al. 2012). A value of β = 2.5 has been adopted in this paper. This is justified because sources with both LABOCA and ALMA detections in the submm have SEDs that are better fit with β=2.5 compared to models with β ∼ 1.5 − 2. Whether or not this is physical is a difficult question to answer since we do not know the precise composition of the dust grains in these sources. Also, the emission from dust in galaxies is most likely a combination of regions with different temperatures and mixtures of grain size distributions and compositions. This means that when a single modified blackbody is fitted to the total dust emission from a galaxy the estimated value of beta is affected by these different effects which may result in an increase in β.

Synchrotron model
The synchrotron emission is fitted with a single power law, with a constant slope α. Such a simple representation is perhaps not a physical model in that it does not include the possibility of a steepening or cut-off due to the aging of the electron populations. At which frequency this happens is not constrained with the data we have available for most of the sources. Therefore, it is important to realize that with this simple power law, we are fitting the maximum possible contribution from synchrotron to the radio and mm frequencies. For the sources with a good photometric data coverage in the high radio frequencies (i.e., >10 GHz), there is no evidence for a high frequency cut-off or steeping, in agreement with previous work (Klamer et al. 2006;Emonts et al. 2011b).

Fitting procedure
Mr-Moose fits a pre-selected number of models to the rest-frame photometry of a galaxy. To do this, it uses Bayesian parameter estimation to find posterior distributions of the free parameters which are determined based on the prior distribution (uniform) and the likelihood function. Mr-Moose uses the Monte Carlo Markov Chain (MCMC) core provided in the Python package, Emcee (Foreman-Mackey et al. 2013). The best fit model is determined by minimizing the likelihood (χ 2 ); a parameter of the goodness of the fit, calculated by comparing the observed data with the model values for the combination of all models at the same time for each photometric band. The parameter space is explored by "walkers" taking random steps. Each walker makes a chain of random steps with each new step being only dependent on the previous step in the sequence. The new value of a parameter after each step is accepted if the χ 2 is lower than the likelihood of the previous step in the chain, or rejected if it is higher, in which case, the previous value is retained. The process continues resulting in a random walk. The parameter space is thus explored during these "walks" and from the combination of the chain of each individual walker, posterior probability density functions (PDF) are estimated. These PDFs are used to find the best fit values and the uncertainties of each parameter. The likelihood function, χ 2 , is calculated as described in Sawicki (2012). One particular aspect of calculating the likelihood function in this way, is to emphasize that it treatments upper limits in a continuous way. There is no sharp upper cut to the allowed value of the modeled data can be for data points reported as upper limits. The upper limits are included in a continuous way, and the modeled value can also go above the 3σ upper limit of the observed data, but in that case the χ 2 increase rapidly when the model starts to over predict the flux of the upper limit. In the case where all of the observed photometric data are detections, the likelihood function reduces to the classical expression of χ 2 . We refer the reader to the Appendix: The maximum likelihood formalism for SED fitting with upper limits in (Sawicki 2012) for details about the derivation of the maximum-likelihood. We also refer the reader to the paper (Drouart & Falkendal 2018) for a more detailed description of Mr-Moose.

Setting up the SED for fitting
For each source, input files need to be specified individually. This is necessary because the code can fit a large number of possible models to each data set. Depending on the complexity of the source, if there are several individual resolved components, the user can launch the code with any number of models adapted to the specific nature of each source. If all the photometric data is unresolved then the multi-component part of the code is not applicable and only a single combination of synchrotron emission, modified blackbody and power-law component is needed.
It is up to the user to assign which data point belongs to which model or set of models, and thus, requires some knowledge of the source properties and what underling processes contribute to the observed flux. The code is not made to be applied blindly to a large sample of galaxies. Even though the process of assigning analytic models to each photometric data point may sound subjective, we actually let the code decide between multiple models as illustrated in Table 3. In case of doubt, we provide the code with many flexible options. For the sources with spatially-separated detections in the same band, it is easy to connect them to corresponding detections at other frequencies. To show how the set up is done, we take MRC 0114-211 as an example. This source has unresolved Spitzer IRS through to LABOCA data, resolved ALMA band 6 data with two spatially resolved continuum components, unresolved ALMA band 3, ATCA and VLA L data and resolved VLA C-and X-band data with two resolved components. The two radio components coincide with the two detections in ALMA. The FIR data are unresolved and it is unclear if this is the combined flux from the two individual sources detected in ALMA or just the flux from the host galaxy with no contribution from the companion. For the two detections in ALMA it is unclear if these are two thermal dust emitting objects or the high frequency end of the synchrotron emission. The unresolved radio data is the total flux from both synchrotron components. In this situation, since we cannot decide what is the nature of the two ALMA components and therefore allow contributions to their flux from both synchrotron emission and the modified blackbody model. We refer the reader to the full list of components and how they are assigned to various models which is shown in Table 3. The code then determines what contributes to each component, not the astrophysicist.
The best fit is determined by minimizing χ 2 , which is calculated by fitting models to each spatial and photometric data point. The MCMC attempts to find the most likely solution considering all the data at the same time. For example, as we already outlined, in the case of MRC 0114-211, two black bodies and two synchrotron models were assigned to the spatially resolved ALMA band 6 data points and what came out of the fitting procedure is that the eastern component is consistent with being dominated by synchrotron emission and the western is dominated by thermal dust emission from the host galaxy (Sect. A.2 and Table 3).

Results of the SED fitting with Mr-Moose
Combining Spitzer, Herschel, SCUBA/LABOCA, ALMA, ATCA and VLA data, we fit the FIR-radio SED with Mr-Moose Article number, page 6 of 47  to derive the IR luminosities of both the SF and AGN components in our sources. Importantly, we are able to disentangle the contribution of synchrotron at ∼1 mm, which can otherwise masquerade as thermal dust emission. The contribution of synchrotron to the long wavelength thermal dust emission, if not well-constrained, can lead to a general over-estimate of L IR SF and thus the SFR (see also Archibald et al. 2001). The focus of this paper is to disentangle the emission of cold (assumed to be heated by young stars) and warm (assumed to be heated by the AGN) dust emission from individual components and to separate the emission from nearby objects and radio hot spots/lobes by identifying independent emission components. We did not include the Spitzer IRAC bands since these can be dominated by stellar photospheric emission and emission from PAH bands. No models for photospheric emission from stars are included in the version of the SED fitting code, Mr-Moose, we used in this paper.

SF and AGN IR luminosities
From our well-sampled SEDs we estimate the total IR luminosity of the AGN and SF component (L IR AGN and L IR SF ). We estimate the total IR luminosity as the integrated flux density over rest-frame 8-1000 µm continuum emission. The flux densities are derived from the best fit of the analytic models, (1) and (2) for the AGN and SF component respectively. To determine the total integrated luminosity, the analytic models need to be well constrained. For several sources, this is not the case for the SF component. Either because there are only upper limits in the FIR and ALMA bands (e.g., Fig. A.21), the measured ALMA flux is not consistent with originating from pure dust emission (e.g., Fig. A.9) or there is only one detection in the FIR (e.g., Fig.  A.27) which is not enough to constrain a model with two free parameters. In these cases, only an upper limit of L IR SF can be estimated. This is done by scaling a modified BB (with fixed β = 2.5 and T=50 K) to the ALMA data point.
The estimated upper limit of the L IR SF is dependent on the exact values of β and T that are assumed. A flatter slope, β = 1.5−2, will lower the inferred IR luminosity. Assuming a higher (or lower) temperature then 50 K for a fixed β will increase (or decrease) the integrated IR luminosity. We illustrate these dependencies in Fig. 4. In our sample, eight sources are not detected in our ALMA data: TN J0205+2242, MRC 0324-228, MRC 0350-279, TN J2007-1316, MRC 2025-218, MRC 2048-272, MRC 2104. Four sources have ALMA fluxes which are consistent with being dominated by synchrotron emission: MRC 0037-258, MRC 0152-209, MRC 0406-244 and MRC 1017-220, The SFR for these sources are determined by scaling a modified BB to the ALMA detection and are thus very conservative upper limits (i.e., they could be much lower).
In the case of only upper-limits in the MIR, it is not possible to constrain the AGN contribution to the SED. This is the case for four sources: TN J0121+1320, TN J0205+2242, TN J0924-2201 and MRC 2048-272. For these four galaxies, L IR AGN is given as a upper-limit and have been estimated by scaling the analytic AGN model, (1), to the Spitzer IRS 16 µm upper-limit with a fixed slope of γ=2. The best fit values and upper-limits of L IR AGN and L IR SF are given in Table 5.

Calculating uncertainties of the integrated IR luminosity
The estimated total integrated luminosity results from our fitting the SEDs. Each combination of the parameters affects the total infrared luminosity in a unique way. There are degeneracies between parameters, meaning that several combinations of parameters can give the same integrated luminosity. Therefore the standard uncertainty estimates, such as quadratically summing the uncertainties for the luminosities of each individual component, is not an accurate reflection of the true uncertainty. In order to estimate accurately the total luminosity and its associated uncertainty, we performed an after-the-fit post-processing calculation. Because each Monte Carlo chain contains all the required information about each fitted parameter, we build the marginalized distribution for the integrated luminosity for each step and each walker after convergence by integrating the model in the defined wavelength limits (8-1000 µm in the rest-frame in our case). From this distribution we are therefore able to derive the percentile values that are listed in Table 5.

High frequency synchrotron
The synchrotron emission is modeled by assuming a power-law with a constant slope without any steepening or cut-off at high frequencies. This means that we are estimating the maximal possible contribution from individual synchrotron components out to frequencies where the low frequency tail of the cold dust emission and synchrotron possibly overlap. Through the use of already published VLA L, C and X-band data (Carilli et al. 1997;Kapahi et al. 1998;Condon et al. 1998;Pentericci et al. 2000;De Breuck et al. 2010;Broderick et al. 2007) as well as ATCA 7 mm and ALMA band 3 for a subset of our sample, we were able to determine whether or not the detections in ALMA are likely to be the high frequency extrapolation of the synchrotron emission or low frequency thermal emission from dust. There are in total 13 individually resolved detections in ten ALMA maps which have been found though the SED fitting procedure to most likely be dominated by synchrotron emission. For four sources, MRC 0037-258, MRC 0156-252 (both components), MRC 0406-244, and MRC 1017-220, the total ALMA flux is consistent with being dominated by synchrotron emission. For these sources their L IR SF and SFR are only given as Integrated fluxes are determined over regions where the signal has a significance >1.5σ. Upper limits are 3σ above the noise at the IRAC position of each source over an area of one ALMA beam. The noise for each source is estimated by measuring the root-mean-square of the pixels in the non-primary beam corrected images. The peak flux is the deconvolved value from a single Gaussian fit. The signal-to-noise estimates are from the image and do not include the uncertainties in the flux calibration. The characteristics of various components of MRC 0943-242 are indicated as "Y" for Yggdrasil, "T" for Thor, "O" for Odin, and "F" for Freja (see Gullberg et al. 2016, for details). Other notations are "N" for the northern component, "H" indicating the host galaxy, "S" for the southern component, "E" for the eastern component, and "W" for the western component. * Sources with multiple components for which the individual components are not robustly separated in the ALMA continuum images. The flux of these sources is deblended by fitting two Gaussian profiles using CASA 4.5.0. The flux is determined by integrating the fit, peak values are deconvolved with the beam, the position is the flux center of the fit, and noise level is estimated from the RMS of the uncorrected primary beam image. For MRC 0152-209, the flux in the ALMA continuum data is divided between a northern and southern component (see Emonts et al. 2015b, for details  0.55 L IR AGN is the integrated AGN luminosity. L IR SF is the integrated SF luminosity over the wavelengths 8-1000 µm in the rest-frame assuming β = 2.5 ( § 4.1). Upper-limits in L IR SF where estimated assuming β = 2.5 and T=50 K (Sec. 4.1). The star-formation rates, SFR, are calculated from the L IR SF via the conversion given in Kennicutt (1998) but we scaled these values from the original Salpeter IMF assumed in Kennicutt (1998) to a Kroupa IMF by dividing by a factor of 1.5. LAS is the largest angular size of the radio source as given in the references indicated by the superscript: a Kapahi et al. The difference between the total luminosity of heated dust by star-formation, L IR SF (this paper) and the total luminosity of dust heated by starbursts L IR SB (Drouart et al. 2014). Two of sources contain additional sources in the ALMA images and for these two sources, we indicate the host galaxy and companions as "H" and "C", respectively.

IR luminosity model comparison
The IR luminosities were calculated by integrating the flux density of the two analytic models used in this analysis (see Eqs. (1) and (2)). This procedure differs from the previous SED fitting work on the parent sample where starburst templates and an average AGN model were used to fit the SED (Drouart et al. 2014). To investigate how these different approaches may influence our results, we now make a direct comparison of our results with those of Drouart et al..
The study of Drouart et al. used the AGN model implemented in DecompIR (see Fig. 1; Mullaney et al. 2011) The simple empirically motivated model implemented in this paper does not deviate much form the average AGN model of Mullaney et al. (2011). A direct comparison of the total integrated AGN luminosities (Fig. 2), L IR AGN suggests a modest, ∼20%, offset in the median of L IR AGN in between the two approaches.
We compare the star-forming models in this paper and those from Drouart et al. The differences are predominately due to having data in the rest-frame submm which is more sensitive, has higher spatial resolution, and covers a wider wavelength range. As can be seen from the SED shapes of the starburst/star-forming components, they differ both in the presence of PAH features in the Drouart et al. models, and sometimes in the overall shape depending on which starburst template is used (Fig. 3). The PAH features do not contribute significantly to the total integrated L IR SF . However, in general when it comes to SED fitting, the choice of template can change the integrated L IR by up to a factor of four, as mentioned in Sect 4.4 of Drouart et al. (2014). The factor of four is estimated in the case one cannot discriminate between the most extreme starburst templates. Even in the less extreme case, there is still a factor of ∼2 between the sets of templates that are consistent with the same data points. The SED parameters that influence this difference include the assumed dust tempera- We identify four general categories where the SED fits lead to significant differences in L IR SF between the two studies: (1) only upper limits in the FIR; (2) one detected ALMA component and upper limits in the rest of the FIR bands; (3) two spatially resolved detections in the ALMA bands; and (4) when the FIR consists mainly of detections. It is clear that there are cases where the differences are not significant (e.g., MRC 0156-252). However, in some cases, isolating a sub-component in high resolution, 0 . 3, sensitive ALMA imaging leads to a significantly lower L IR SF which simply is not possible using only the lowresolution Herschel data (e.g., MRC 0251-273). There are also  Table 6: The average ratio of IR luminosities of this paper (denoted as L IR SF ) and those from (Drouart et al. 2014, denoted as L IR SB ). The ratio is given for three cases: the L IR estimated for sources with detections in both ALMA and Herschel and thus the infrared luminosities are constrained in both studies (Constrained L IR ); when the infrared luminosity is given as an upper limit in this paper but is given as a detection in Drouart et  cases where significantly deeper ALMA data ( > ∼ 10× deeper than any previous submm/mm observations) still does not detect any emission (e.g., MRC 0350-279). Accordingly, our limits on L IR SF are also much more stringent, but formally still consistent with the shallower upper limits of Drouart et al. Furthermore, we also include a more robust extrapolation of the synchrotron component due to now including the ALMA and ATCA ∼90 GHz data. As mentioned above, it is important to note that choice of template can change the L IR SF by ∼2 even with good photometric coverage of the peak of the thermal dust emission. In our sample, six sources have ≥3 FIR detections from Herschel and LABOCA. Out of these six, three have good agreement in the infrared luminosity, having ratios of 0.72-0.89, with Drouart et al.. One is an ALMA source with multiple components, MRC 0943-242, which explains the large difference in L IR SF . Only two sources, MRC 0211-256 and 4C 03.24, have poor agreement due to differences between a modified blackbody and the starburst templates used in Drouart et al. (2014). Considering that Drouart et al. found a potential difference of ∼2 within the templates used in their study, it is to be expected that we are finding a factor of two to three difference compared to their results for a few of our sources, especially when we also include an additional ALMA data point that constrains the Rayleigh-Jeans side of the emission peak.
Quantitatively, we find that our estimated far-infrared luminosities of the component due to star formation are only a fraction of those found by Drouart et al. (Fig 2 and Tab. 6). If we only include detections, we find that our estimates of L IR SF are only ∼50% of those estimated in Drouart et al. If we also include detections and upper limits of L IR SF in either of the two papers, then our estimates are only ∼40% of those in Drouart et al., (Table 3). We discuss the implications of these significantly lower L IR SF in Sects. 5 and 6. If we compare all the sources together and estimate the median IR luminosity including upper limits of the overlapping 25 sources in both studies, we find our IR luminosities are a factor ∼7 lower. 2

Notes on the Stellar Masses
Given the importance of stellar masses in our analysis, we briefly discuss the nature of the mass estimates we are utilizing. All stellar masses used in this paper are based on those estimated in 10 10 10 11 10 12 10 13 10 14 Observed frequency [Hz] 10 -1  Our stellar masses are based on 6-band Spitzer photometry covering 3.6-24 µm, augmented with near-IR imaging. The AGN in our sample may contribute flux to the optical to mid-IR photometry used to determine the stellar masses. AGN emission contributes from both direct and scattered continuum (dominating at λ rest <1 µm), and dust emission from the torus (dominating at λ rest >5 µm). Our sample is composed of Type-2 AGN where the direct AGN contribution is obscured by the dusty torus. One exception, MRC 2025-218, has a SED which is consistent with AGN-dominated continuum emission and thus, although it is detected in the photometry used to estimate masses, we assume its stellar mass is an upper limit. Spitzer photometry used in Seymour et al. (2007)  While such estimates are reasonable, they may slightly overestimate the masses (for a detailed discussion, see Seymour et al. 2007). The remedy this, Drouart et al. (2016) combined the Spitzer data with existing optical and near-IR photometry on a sub-sample to perform a multi-component SED fitting through population synthesis. In cases of overlap, we use the stellar masses derived by Drouart et al. (2016). The paper from which each mass estimate is taken is indicated in Table 5.

Relationship between radio galaxies and their supermassive black holes
We can now compare the relative growth rates of both galaxies and their central supermassive black holes, using estimates of the star-formation rates and the mass accretion rates, respectively. Such an investigation addresses the question of how galaxies and black holes evolved to the black hole mass-bulge mass relation we observe locally (Magorrian et al. 1998;Gebhardt et al. 2000;Ferrarese & Merritt 2000;Häring & Rix 2004). We have already discussed this issue for radio galaxies in Drouart et al. (2014), so we only briefly highlight how our results re-enforce the conclusions of that paper. The differences between our approach and the one of Drouart et al. (2014) are detailed in Section 4.4. Other than these differences, we follow the analysis of Drouart et al. quite closely, that is, we use the same conversion factors between IR luminosities and SFR and AGN accretion rates, and a very similar SED to determine the AGN luminosities (see Fig. 1 and 2 for a direct comparison). . Filled black circles are galaxies detected with ALMA and with constrained SF and AGN luminosity estimates. Purple arrows are sources with upper limit in the ALMA band. The upper limits of L IR SF are approximated by scaling a modified blackbody to the 3-σ upper limit estimated assuming β=2.5 and T=50 K. We assumed β=2.5 for all of our fits and 50 K is approximately the medium temperature of our best fits (Table 5). Black arrows indicate galaxies which are detected in ALMA but where the observed ALMA flux(es) are consistent with an extrapolation of the lower frequency synchrotron emission implying little or no contribution from thermal dust emission. We indicate with bars in the lower right corner of the plot, how the upper limits of the L IR SF would shift if one of the fixed parameters, β or T, are changed with respect to our assumed values of β = 2.5 and T=50 K.

Determining growth rates of star-formation and SMBH
To estimate the star formation rates of galaxies, we use the conversion factor from Kennicutt (1998): where the efficiency factor = 0.1 and the bolometric correction factor κ Bol AGN = 6. We refer to Drouart et al. (2014) for a more detailed discussion. A large fraction of the sample lies above the one-to-one line between the L IR SF and L IR AGN , showing that the FIR emission from the SF component is generally much weaker than the luminosity of the AGN. This is consistent with other samples of powerful AGN (e.g., Netzer et al. 2014Netzer et al. , 2016Stanley et al. 2015) 5.2. Relative growth rates of galaxies and SMBHs What are the relative growth rates of the stellar and black hole mass? To make this comparison, we simply scale the IR luminosities of each component as just described. If the galaxies evolve along the local relation, we would expect the accretion rate,Ṁ acc BH , to be about ∼0.2% of the SFR (we chose 0.2% to be consistent with Drouart et al. and is within the uncertainty of estimates in the literature at the time; see Kormendy & Ho 2013). Of course, there are many assumptions that must be made in order to use these relations and one should be aware that the empirical relation is really between integrated IR luminosities, L IR SF and L IR AGN , with scaling factors. Nevertheless, we find that our sample lies more than an order of magnitude above the local parallel growth relation of 0.2% (Fig. 4). This suggests that the black holes can become overly massive relative to their host galaxies if the accretion time spans the same time scale as the star formation. In fact, given that the host galaxies are already massive, it is likely that this implies that the SMBHs are overly massive at the epoch they are observed.
In Drouart et al. (2014), we argued that for the growth of the host galaxy and SMBH to ultimately be consistent with the local relation, the on-going star formation would have to last about a factor of eight longer than the observed level AGN activity. Shifting L IR SF downwards by about a factor of seven, now implies that the star formation must last over a factor of 50 longer. If the lifetime of the radio loud phase is ∼25 Myrs (Martini & Weinberg 2001;Schmidt et al. 2017), this would suggest that the star formation has to last more than a Gyr. Since we predominately have upper limits for the star formation rates of the majority of the galaxies, this appears unlikely. There is evidence at high redshift that perhaps SMBHs are already overly massive compared to their host galaxies, where overly massive means that they do not have the local value of the black hole mass to spheroidal mass ratio (e.g., Nesvadba et al. 2011;Wang et al. 2013;Willott et al. 2015;Trakhtenbrot et al. 2015;Shao et al. 2017;Vayner et al. 2017, but see Willott et al. 2017). Thus, the time required for the stellar mass to "catch up" to the mass of the SMBH is actually much longer than we have estimated here. Our new results therefore exacerbate the problem already discussed in Drouart et al. that it appears difficult for the mass ratio of the SMBH and the spheroidal component of the radio galaxies to fall on the local relation through star formation. We caution however that the average black hole accretion rates over longer time scales of star formation are not well constrained by the relatively instantaneous estimates provided here and in Drouart et al. (Hickox et al. 2014;Stanley et al. 2015;Volonteri et al. 2015b).

Keeping up with rapid SMBH growth
The host galaxies of HzRGs need to catch up with the growth of the SMBH, as they appear to be already overly massive. In order to end up on the local mass relationship, the stellar component needs to grow through a mechanism that does not fuel substantially the supermassive black hole. In the following sections, we discuss the possibility of growth by mergers as a way to explain how the sample of high-z galaxies in our study can evolve on to the local relationship.

Growth through major mergers
High redshift powerful radio galaxies like the ones studied here are found in environments which are over-dense (e.g., Wylezalek et al. 2013;Hatch et al. 2014;Dannerbauer et al. 2014;Cooke et al. 2015Cooke et al. , 2016Noirot et al. 2016Noirot et al. , 2018. In such environments, mergers are likely an important mode of galaxy growth. However, a few caveats must be kept in mind when considering galaxy mergers as the mechanism allowing galaxies and SMBHs of powerful radio galaxies to evolve onto the local mass relation. The first requirement is that mergers do not bring substantial amounts of gas to grow the SMBH significantly compared to the mass of the accreted stars. Major mergers, which may increase the stellar mass considerably, would have to be gas poor galaxies as major mergers can carry gas efficiently to small scales (kpcscales) through dissipation which may lead to significant black hole growth. Generally, massive galaxies at high redshift, those that would constitute major mergers for radio galaxies, are gasrich (e.g., Bolatto et al. 2015;Noble et al. 2017;Emonts et al. 2018). So unless massive galaxies within the over-dense environments of radio galaxies are particularly gas poor (see e.g., Emonts et al. 2014;Lee et al. 2017;Dannerbauer et al. 2017;Emonts et al. 2018) then major mergers do not appear to be particularly favored for growing the stellar content of radio galaxies. Having said that, the quenching time of moderately massive galaxies in clusters is likely a small fraction of the Hubble time (e.g., Muzzin et al. 2012;Foltz et al. 2018) but with reduced efficiency with increasing redshift (Nantais et al. 2016(Nantais et al. , 2017. The second significant problem with major mergers as the driver of the stellar growth is that the merging galaxy likely also contains a supermassive black hole. In the early universe, the merger partner may have a black hole that is massive relative to the mass of its host (e.g., Willott et al. 2015). After the merger has advanced to the coalescence stage of the merger, which occurs in a few dynamical times of the most massive galaxy, the black holes will merge in less than a Hubble time ( < ∼ 1 Gyr for M ∼10 11 M , which is approximately the stellar masses of our galaxies; Berczik et al. 2006;Merritt et al. 2007) A final, but perhaps less important limitation in such a picture is that the relative velocities of the merging galaxies should be relatively low, of-order the internal dynamical velocity of the stars in the most massive galaxy. Thus, relative low speed encounters are favored for efficient merging. In the over-densities surrounding the high redshift radio galaxies, the dispersion of the most massive galaxies in the potential appears to be high (Kuiper et al. 2011;Noirot et al. 2018).

Growth through minor mergers
Minor mergers may be an effective way to allow the mass of old stellar populations to grow in massive galaxies without fueling significant SMBH growth. There are several pieces of evidence that suggest hypothesizing that minor mergers contributed significantly to the stellar mass growth of massive galaxies. High resolution imaging suggests that there may be low mass galaxies in the surroundings of some radio galaxies (Miley et al. 2006;Seymour et al. 2012). So the potential merging sources are close at hand. Beyond just the necessary association of low mass galaxies, there are a number of lines of evidence that support the notion that massive early-type galaxies grew substantially through minor mergers. Some of these are: (1) the size evolution of massive galaxies in the early Universe to the present may be driven principally through minor mergers (e.g., Daddi et al. 2005;van Dokkum et al. 2008;Delaye et al. 2014;Vulcani et al. 2016;Hill et al. 2017); (2) the change in the mass and luminosity function of galaxies with redshift and as a function of environment (e.g., Ilbert et al. 2013;Sarron et al. 2017); (3) the elemental abundance ratios, abundance gradients, and age gradients in the outer regions of local massive spheroids are consistent with accreting galaxies with a range of masses, perhaps predominately low mass, which had their star formation truncated early in their growth (Huang et al. 2013;Greene et al. 2013;Barbosa et al. 2016); and (4) the mass of massive early-type galaxies grew by about a factor of four over the last ∼10 Gyr (e.g., van Dokkum et al. 2010;Ilbert et al. 2013).
Interestingly, Hill et al. (2017) identified the epoch at which the stellar growth of very massive galaxies, M > ∼ 10 11.5 M , transition from growing substantially through star formation to one where mergers dominate the stellar mass growth. This is at the low redshift end of the objects in our study but is overall consistent with the quenching we observe. Moreover, at the average redshift we are observing our sample, they also find a factor of three increase in the stellar mass, which again is similar to what is needed to close the gap between the mass of the supermassive black holes and the host galaxies (see also van Dokkum et al. 2010;Vulcani et al. 2016). Both the fossil record in nearby massive galaxies and their in situ evolution suggest that minor mergers played a role in their stellar mass growth and physical properties (e.g., Greene et al. 2013;Hilz et al. 2013;Laporte et al. 2013;Hirschmann et al. 2015).

A comparison with BCGs and X-ray-selected AGNs
Radio galaxies lie in over-densities and have been suggested to be progenitors of brightest cluster galaxies (BCGs; Hatch et al. 2014), a more relevant comparison is not with the general properties of massive galaxies but the stellar mass growth of BCGs. Statistical samples of BCGs are limited to redshifts lower than about 1 which is lower than the median redshift of our sample, z∼2.4. Results from these studies suggest that BCG typically grew by about a factor of two over the last 8-10 Gyrs (Aragon-Salamanca et al. 1998;Bellstedt et al. 2016). Overall, these growth rates are consistent with semi-analytic models which indicate that, since about z∼1-1.5, BCGs grew by about a factor of 2-4 (De Lucia & Blaizot 2007; Tonini et al. 2012). However, any theoretical result explaining the growth of BCGs is sensitive to the treatment of dynamical friction and tidal stripping through galaxy-galaxy interactions as galaxies move through the cluster potential (e.g., Shankar et al. 2015). We conclude that if some of the galaxies in our sample are destined to become BCGs, then our overall current understanding of the stellar growth of these massive clusters galaxies is consistent with closing the difference in relative masses of the supermassive blackholes and their host galaxies.
Other samples, such as X-ray selected AGN, show a range of relative growth rates of SMBH and host galaxies. Some studies, like those that select star forming galaxies and then investigate their black hole accretion rates (using amount of X-ray emission observed above that expected that due to galaxy stellar populations) and SFR, find that black holes and galaxies are growing in lock-step (e.g, Delvecchio et al. 2015). Similarly, some studies of X-ray selected AGN with a wide range of AGN bolometric luminosities that black holes and galaxies grow in lock-step (z∼2, e.g., Mullaney et al. 2012). But such results are not found universally. Netzer et al. (2016), again for an X-ray selected sample of AGN, but now at somewhat higher redshifts than previous studies, find that SMBHs are growing more rapidly on average than their hosts. Cisternas et al. (2011) find that there is no evolution in the black hole-to-galaxy mass ratio out to z∼1, except perhaps for high mass black holes where black holes are overly massive relative to their host galaxies. Could the variety of results be simply due to the mass of both the galaxy and the SMBH (Cisternas et al. 2011)?

SFR and Main Sequence Comparison: On the road to quenching
The main sequence of star-forming galaxies -the empirical relation between the star-formation rate and stellar mass with a slope of approximately 1 and a scatter of about a factor of twohas been studied extensively both theoretically and observationally (the MS, e.g., Brinchmann et al. 2004;Noeske et al. 2007;Elbaz et al. 2007;Daddi et al. 2007;Santini et al. 2009Santini et al. , 2017Peng et al. 2010;Whitaker et al. 2012;Stark et al. 2013;Speagle et al. 2014;Lehnert et al. 2015;Davé et al. 2016;Lee et al. 2018;López Fernández et al. 2018;Davidzon et al. 2018). The relation is observed over a wide redshift range, z∼0-7, and its normalization increases with increasing redshift. The MS has often been parameterized as a simple power-law but there has also been evidence that the MS flattens at higher stellar masses (e.g Whitaker et al. 2014;Lee et al. 2015;Tasca et al. 2015;Schreiber et al. 2015;Tomczak et al. 2016;Lee et al. 2018). The slope of the MS and the turnover mass in the case of a flattening MS depend on the sample selection, redshift range and the technique used to determine the stellar mass and SFR. The estimated turnover stellar mass is ∼10 10 -10 10.5 M (e.g., Schreiber et al. 2015). The MS can be used to study and classify galaxies according to their relative SFR, where large deviation from the MS suggests that galaxies are "starbursts" if they lie above the MS, or quenched, if they fall below the MS. The definition of whether a galaxy is a starburst or is quenched varies in literature, but is often taken as either offset by more than three times the scatter or a factor of ten above or below the mean relation (e.g., Rodighiero et al. 2011).
We now compare our sample of HzRGwith the MS and discuss what the implications are now that we include spatially-resolved FIR ALMA data and account for possible contamination from synchrotron emission.

Comparison with the MS and impact of submm spatial resolution
Our sample of radio galaxies has a wide range of relative SFR compared to galaxies which lie along the MS ( Fig. 5; Schreiber et al. 2015;Santini et al. 2017). Unfortunately, the number of possible MS parameters such as slope, zero-point, and whether or not the MS was fitted with a turnover at high stellar masses makes a direct comparison with our results challenging. To make matters worse, there are additional potential differences in the methods and wavelength range used to estimate stellar masses and SFR, the range of stellar masses studied, and, for consistency with the broad redshift range of our sample, the redshift range that any individual study covered. For example, Santini et al. (2017) redshift span of 1.3< z <6 and stellar mass range,∼10 7.5 -10 11 M . Our masses are generally higher than their upper mass limit. Schreiber et al. (2015) span stellar masses from ∼10 9 to 10 11.3 M , which makes their mass range comparable to the radio galaxies in our sample. Unfortunately, their redshift range is rather limited, 0.5≤ z ≤2.5, for making a robust comparison with our sample difficult. Of course, these limitations are purely observational depending on the depth and area covered by the surveys from which these results are derived. Our HzRG sample  2015), with a turnover at higher stellar masses. Our stellar masses have been scaled from a Kroupa to a Salpeter IMF to be consistent with the IMF used to make the MS in this comparison. In the highest redshift bin, 3< z <4, two sources, TN J1338+1942 and TNJ0924-2201, have been added to the right most panel despite having redshifts outside of the range used to construct the MS (z=4.110 and 5.195 respectively). Given the redshift dependence on the normalization of the MS, these galaxies may lie relatively lower in comparison with the mean relation of a MS derived using galaxies over a more appropriate, higher redshift range (see Fig. 6).
consists of the rarest, most massive galaxies selected from all sky radio surveys. HzRGs thus allows us to extend MS studies to the most massive end of the galaxy mass distribution, and as now discussed, our comparison suggests that the HzRGs in our sample fall either on or below the star-forming MS (Fig. 5).
We first compare our results with the MS from Schreiber et al. (2015) (gray regions in Fig. 5), covering 0.5< z <4 and for M * =10 11.33 M , close to the mean stellar mass of 10 11.35 M of our sample. In comparison to Schreiber et al. (2015), who fits a turnover in the MS at high stellar masses, we find two sources lie above, 14 sources lie within the 0.3 dex scatter of the MS, and nine lie below the MS. We note however, that three of the sources that lie along the MS have only upper limits in their estimated SFR. In addition, two sources have uncertainties in their SFR estimate, which give them a significant probability of lying below the MS. All of these sources could well lie below the MS. Thus, in comparison with the results of Schreiber et al. (2015), we find that the galaxies in our sample generally fall below the main sequence.
Since Schreiber et al. fit their data in the SFR-stellar mass plane with a function that has a turnover, comparing our results with a study that does not allow for such a turnover, may result in a change in how we characterize our results. We therefore also compare our results with the MS from Santini et al. (2017), which does not include such a turnover (colored regions in Fig. 5). We again find that our HzRG sample significantly lies below the MS. None of our sources lie above the MS of Santini et al. (2017), six sources lie along the MS, five sources lie at at the lower ±0.3 dex boundary, and six sources lie below. The remaining seven sources have upper limits in SFR, and may well lie below the MS of Santini et al. (2017). At any rate, the comparison with both Schreiber et al. (2015) and Santini et al. (2017) suggest that many of our sample galaxies lie below the MS and are consistent with being quenched.
Three out of the four galaxies with the highest SFR in our sample and which lie on or above the MS, TN J0121+1320, MRC 0152-209, PKS 0529-549, and TN J1338-1942, have mor-phological features in the ALMA continuum suggesting that perhaps they might be mergers (we indicate these four sources by name in Fig. 5 to highlight the fact that they are high in the SFR-stellar mass plane compared to the other galaxies in our sample). The Dragonfly galaxy, MRC 0152-209, has two clearly interacting mm components. Indeed, for this source there is strong evidence that the dynamics of this system are consistent with a strong interaction of three components with significant mass transfer between them (Emonts et al. 2015b,a). Both TN J0121+1320 and PKS 0529-549 have elongated shapes which are different from the synthesized beam, which in analogy with the Dragonfly galaxy, is consistent with them being mergers. We would need (sub-)mm line kinematics and continuum morphology at higher resolution to know definitively whether or not they are mergers. TN J1338-1942 is unresolved in our data and, within this criterion, is not consistent with being a merger.
We can also compare the sSFR our sample with that of the ensemble of star-forming galaxies as a function of redshift (Fig. 6). Just as with our comparison with main sequences at various redshifts, we find that a large fraction of our sample falls below the main sequence and its evolution. To be conservative, we made this comparison with the MS evolution estimated in Schreiber et al. (2015), which includes a turnover in their fitting of the MS. Even though the galaxies are massive and lie above the mass at which the turnover starts and the MS flattens, our galaxies generally lie below. Of course, if we used main sequences that do not allow for a turnover in the stellar mass-SFR relation, then the difference between our sources and the evolution of the MS at constant mass would be even more dramatic. We note that since the slope of the MS is about one and if there is no flattening in the MS at high stellar masses, then the exact mass used for comparison with the radio galaxies is not particularly important.
Our results are different from previous studies. In previous studies massive high redshift radio galaxies have often been associated with high BH accretion rate combined with high SFR because they are extremely luminous in both the mid-IR (Ogle  (Archibald et al. 2001;Reuland et al. 2003;Stevens et al. 2003). Not surprisingly given the sensitive upper limits or finding that synchrotron emission may dominate the emission at mm wavelengths, our star formation rate estimates are lower than those found by other studies. On the other hand, our results on the AGN-heated dust luminosity are consistent with previous studies, which is not surprising given that we are using the same or similar data. Our finding that HzRGs fall mainly along or below the MS is therefore new. Most of the sources lying below the MS relation are non-detections in the ALMA band, and often not detected in the SPIRE bands either. These galaxies have very low SFR and are an order-of-magnitude weaker then what was previously estimated. These galaxies are not star forming, they are on their way to being quenched.

Comparison with X-ray selected high redshift AGNs
Naturally, given the importance of understanding the relation between the growth of the stellar population of galaxies and the influence of AGN, it is useful to compare our results for the star-formation rates of powerful high redshift radio galaxies with those of other types AGN. Within the perspective of the overall population of AGN, it is not clear exactly what the role of the AGN is in quenching the star formation. We find that many of our sources do not currently have significant star formation. Our upper limits, constraining the median SFR to be ∼100 M yr −1 , are also consistent with little or no star formation, especially for those sources with upper limits in the Herschel bands and (likely) only synchrotron emission in the mm. As we already discussed, the typical lifetime of AGN is of-order a few 10 Myrs. The IR and sub-mm thermal dust emission probes star formation, past and present, over timescales of 100 Myrs or more (i.e., comparable to a few internal dynamical times of the galaxies; Lehnert & Heckman 1996;Boquien et al. 2014Boquien et al. , 2016. These differing timescales implies that unless the AGN are significantly longer lived than we currently understand them to be, the luminous or mechanical output from AGN is unlikely to be the sole mechanism for shutting down star formation in these galaxies. We can perhaps understand this by comparing the results of studies of the AGN influence on the star formation within their hosts. In Fig. 7, we show such a comparison focusing on X-ray selected AGN from various studies (based on a similar figure in Netzer et al. 2016, but see also Rosario et al. 2012;Stanley et al. 2015Stanley et al. , 2017 for earlier or different renditions of the same plot). Our galaxies lie at the upper end of the distribution of AGN luminosities and their total infrared luminosities are dominated by the emission from the AGN they host. Generically, our host galaxies are forming stars at a much lower rate on average than the X-ray selected AGN, at least for those that have well determined star formation rates. Netzer et al. (2016) found that if they stacked the X-ray selected AGN without detections in the infrared results in a much lower mean IR luminosity due to young stars. This stacked luminosity and the implied SFR is comparable to our sample. These final results may suggest that the key parameter in the difference in differential growth rates of actively fueled SMBHs is in fact is the luminosity of the AGN. All of our sources and those of Netzer et al. are among the most luminous at their respective redshifts.
Such an hypothesis would explain a wide range of results. If we focus on studies that sample a wide range of AGN bolometric or IR luminosities and estimated SFRs, the evidence points to both the AGN and galaxies growing in lock step Rovilos et al. 2012;Delvecchio et al. 2015) and there is generally no widespread evidence for quenching (e.g., Harrison et al. 2012;Rosario et al. 2013;Stanley et al. 2017). However, one has to be cautious in these simple relations between galaxies being on the main sequence and the impact of AGN on their star formation rates. The effect of AGN on star formation rates may be subtle and may also influence the control sample of galaxies without active black holes. Scholtz et al. (2018) make the interesting point that the star-formation rate distribution at constant mass of galaxies with or without AGN is similar, and this result actually agrees with simulations. They suggest that the agreement between AGN and non-AGN in lying on the same relation in the SFR-stellar mass plane is because the impact of AGN is evident in both samples, and it is this effect that is driving the slope of decreasing sSFR with increasing mass (see also Mainieri et al. 2011). So the effect of AGN feedback is subtle, broadening distributions and not necessarily correlating with AGN power/luminosity as might be naively expected. In a study similar to ours, Mullaney et al. (2015) used ALMA observations of X-ray selected AGN, finding that the estimated SFRs were lower relative to previous findings. This reduction lead to AGN hosts having a different distribution than the star forming galaxy population, meaning the AGN population of star forming galaxies had a broader, and perhaps even offset distribution of star formation rates (Rovilos et al. 2012;Rosario et al. 2015).

A significant synchrotron contribution in submm
Our high resolution ALMA data allowed us to do component separation such that the contribution of synchrotron to the mm emission is now clear, originating from radio lobes and the nuclei. For nine out of 25 sources, the best-fit SEDs of specific spatial components are consistent with a power-law extrapola-   Rosario et al. (2012) to the mean relation for AGNs over the redshift range 1.5< z <2.5 scaled up by a factor of two (see Netzer et al. 2016, for details). We indicate when the luminosity due to star formation equals that of the AGN with a black solid line. tion from radio frequencies. These spatial components are likely dominated by synchrotron emission. The total flux of five galaxies, not just the lobes, are associated with pure synchrotron emission and are consistent with no contribution from thermal dust emission. For two host galaxies, which have two spatiallyresolved emission components in the ALMA band, the majority of the mm-flux is due synchrotron with only a minor contribution from thermal dust emission.
It is surprising to find our high frequency data is consistent with an extrapolation extrapolation of the synchrotron intensities in the radio (Jaffe & Perola 1973;Carilli et al. 1991;Blundell et al. 2006). The energy loss though aging of the electrons and inverse Compton scattering should steepen the observed spectrum at higher frequencies. But what we see in a few sources is that the synchrotron spectrum continues straight out to high frequencies (e.g., Gopal-Krishna et al. 2001). One possible explanation is that the electrons are continuously accelerated within the lobes, perhaps through strong oblique shocks (e.g., Summerlin & Baring 2012). The details and analdysis of the radio spectra including ALMA synchrotron detections will be presented in a subsequent study.

Star formation and radio source sizes
The spatial separation of the emission from the radio lobes -the projected distance between the two lobes or between the core and lobes -can provide useful constraints on the nature of the AGN. The separation, if the sources expand at constant velocity, can be used as a proxy for the age of the radio source (e.g., Carilli et al. 1991). Larger sources correspond to older sources. A possible test of a scenario where the AGN is the agent driving the quenching of star formation is that galaxies with the lowest star formation rates have the largest radio sources. In fact, we do see a possible trend between the largest angular size of a radio source and the predominance of upper limits to the star formation rates (Fig. 8). This trend is robust in the sense that high significance estimates of the SFR all appear below largest angular sizes of ∼3 arcsec. At the typical redshifts of our sources, this corresponds to approximately 10-15 kpc in radius, or about the expected extent of the interstellar medium of a massive galaxy. We do not have a sufficient number of sources or sensitive enough upper-limits to say that there is a correlation. Such a correlation would be interesting, as it would support a simple model of propagating jets that may enhance star formation when they are still confined within the host galaxy, but then quickly preventing further star formation after they break out.
Timescale arguments may also complicate the comparison between radio size and SFR. The duration of AGN activity is thought be around 10 7 to 10 8 yrs (e.g., Martini & Weinberg 2001;Hopkins et al. 2005;Schmidt et al. 2017). The star formation likely lasts much longer. Other than observing a proclivity for the upper-limits in the SFR to be associated with radio sources larger than the host galaxy proper, it may not be that surprising that we do not see a very clear correlation. There are several factors that may influence the rate at which radio jets expand into the surrounding media. Perhaps the most important is the environment into which the jets are expanding. For jets which propagate outwards into denser environments, the radio source may be confined and can either simply expand more slowly, or can decelerate (Shabala et al. 2017). There may not be a simple linear relationship for individual sources and they may not all take the same amount of time to reach the same size. There are also of course projection and beaming effects to consider. Since we observe a wide range of SFRs, radio morphologies, and lobe asymmetries in this sample (De Breuck et al. 2010), it is likely that the characteristics of the surrounding environments of the individual radio sources play a significant role in determining the scatter in this relation. Moreover, given that the star formation is expected to occur on time scales longer than the duration of the AGN activity perhaps not seeing a clear cut correlation is not unexpected, regardless of the processes that can influence radio source sizes and morphologies. Other effects may play a role since the galaxies in our sample with high SFRs, have morphologies consistent with being mergers (Sect. 6).

Possible interpretation of our results
Our interpretation of our results is that: (i) the quenching we observe for the majority of the sample is naturally explained by our galaxies predominately being at the high mass end of the galaxy population (Ilbert et al. 2013) and having very massive SMBHs (Nesvadba et al. 2011). We know that such galaxies in the local universe must be quenched rapidly and soon after their most substantial period of growth (e.g., Thomas et al. 2005Thomas et al. , 2010. They also likely grew subsequently, after they are quenched, through the accretion of relatively low mass, likely gas-poor galaxies (e.g., Greene et al. 2013). Thus, we can say that their quenching is mainly a result of the fact that they are already massive only a few Gyrs after the Big Bang; and, (ii) given the fact that some of galaxies in the sample of high redshift radio galaxies have in- tense star formation and young stars (Dey et al. 1997, Man et al. 2018, it is likely that star formation played a key role in quenching the host galaxies of distant radio galaxies. This is a natural conclusion, given that the AGN are likely only luminous for 10s of Myrs (Schmidt et al. 2017), which is generally shorter than we would expect bursts of star formation to last. Other samples of AGN with lower host masses have substantial outflows (e.g., Harrison et al. 2016), and are not generally quenched. It is difficult to isolate the impact of mass and environment on the rate and timing of quenching. Mass quenching is more important at earlier times in the evolution of galaxies and may be more important in denser regions (e.g., Peng et al. 2010;Muzzin et al. 2012;Lee et al. 2015;Darvish et al. 2016;Kawinwanichakij et al. 2017;Darvish et al. 2018). And in the case of powerful radio galaxies, which lie in over-dense environments, both gas-rich and gas-poor mergers likely play an important role in both the growth of the stellar mass and the black holes. Volonteri et al. (2015a) suggest that in the merger phase, the AGN dominates the bolometric luminosity but the accretion can be very stochastic (see also Gabor & Bournaud 2013;DeGraf et al. 2017). It appears that the galaxies in our sample with the highest star-formation rates all host very powerful AGN, and are potentially all advanced mergers, consistent with this picture. In fact, PKS 0529-549, which has one of the highest SFRs of all the galaxies in our sample, has a modest gas fraction of about 15%, a high star-formation efficiency (SFR/molecular gas mass), and has been transforming its gas into stars rapidly (Man et al., in prep.). The star formation efficiencies in the other radio galaxies with high SFRs also appear extreme (10-100 Gyr −1 ; Man et al.). But of course, that does not explain our results in themselves. Dubois et al. (2015), in a study using numerical simulations of the relative growth of SMBHs and their host galaxies, found that star formation may regulate the black hole accretion rate. During the most rapid, gas-rich phase of the growth of massive galaxies, it may be that a larger fraction of the gas in the ISM is not available to fuel the SMBHs, but is consumed via star formation (see DeGraf et al. 2017). As the gas fractions decline, the relative power of the AGN compared to that of the star formation in-creases, resulting in an increased star formation efficiency. Concomitantly, the increased star formation rate can then disperse the dense gas making it easier for the jets to drive vigorous and efficient outflows (Nesvadba et al. 2006(Nesvadba et al. , 2017. So we suggest that mass is the primary difference in the characteristics of our sample of galaxies compared to other samples of AGN (see also Mainieri et al. 2011;Stanley et al. 2017). This difference makes it then much easier for the AGN to dominate the bolometric output as the SMBHs in massive galaxies may be "overly" massive for their host masses. Even relatively low accretion rates would lead to powerful AGN emission. The mechanical and radiative output from the young massive stars, heats the interstellar medium, changes its phase distribution, and puffs it up. In such a state, the coupling between the mechanical energy of the jet and heated and expanded interstellar medium of the galaxy, would be high (Biernacki & Teyssier 2018). This leads to a positive feedback loop at the end of the most rapid growth phase of massive galaxies where the residual gas of star formation is rapidly removed through the action of the AGN, leading to the rapid session of star-formation. The galaxy is on the road to being quenched.

Conclusions
With 0 . 3 resolution mm ALMA data, and utilizing the new SED fitting tool, Mr-Moose, we have estimated L IR SF and L IR AGN . We have disentangled the IR luminosity into components heated by the stellar populations and that heated by the luminous AGN that reside in these host galaxies. With our deep ALMA observations, we reach depths at which we expect the thermal dust and synchrotron contributions to the mm emission to be of the same order-of-magnitude. This high sensitivity is the reason why it is essential to disentangle both spectrally and spatially the contribution from the high frequency synchrotron emission at (sub-)mm wavelengths. From a study of 25 powerful high redshift radio galaxies, our main conclusions are: • We find that the SFRs are lower than previously estimated, having a median which is ∼7 times lower (cf. Drouart et al. 2014). This is a result of having deep ALMA data enabling us to estimate robust upper limits, to disentangle several emission components, and determine if the mm emission represents the high frequency end of the power-law extrapolation of the radio emission. We interpret any flux density estimate that is consistent with the power-law extension from the radio to the (sub-)mm as synchrotron emission, and not as dust heated by young stars. • Unlike many studies of high redshift AGN, we find that a large fraction of our sources do not lie within the scatter of the relation between the stellar mass and the star-formation rate of star forming galaxies (the "main sequence"). Since the deviation of the radio galaxies with upper limits does not meet the general criterion for quenched galaxies, we interpret such galaxies as being "on their way to being quenched". • Finding lower star formation rates generally exacerbates the problem of differential growth between the black hole and the host galaxy discussed by Drouart et al. (2014). This favors a scenario where the host galaxies ultimately grow by dissipationless merging which adds additional stellar mass without increasing significantly the mass of the black hole. We find that at the typical specific star-formation rates we estimate for the host galaxies, the host galaxy needs ∼50 times longer than the time over which the SMBH is active to "catch up" to having the local ratio of black hole mass to stellar mass.
• There is no clear relation between the radio size and SFR, although we note that the upper limits on the star formation rate tend to be in sources where the LAS is larger than ∼3 arc seconds. However, given that the star formation is expected to occur on time scales of the same order or longer than the duration of the AGN activity (10 7 to 10 8 yrs) perhaps not finding a clear cut correlation is not completely surprising even if the radio jets are driving the quenching we observe.
From these results, we suggest that mass is the primary difference in the characteristics of our sample of galaxies compared to other samples of AGN. The host galaxies of our sample of radio galaxies are generally more massive and likely harbor high mass black holes than AGN selected using other methods. Hosting more massive black holes means that it is also easier for the AGN to dominate the bolometric output of the galaxy, even if it forming stars vigorously. In order to comprehend how the host galaxies are quenching so rapidly, we hypothesize a positive feedback loop between the AGN and star formation. The star formation leads to rapid gas depletion and its intense energy input heats the remaining gas allowing the interaction between the gas and jets to be efficient. The AGN then blows away the residual gas at the end of the star formation episode. It is in this way perhaps that the host galaxies of powerful high redshift radio galaxies are "on the road to being quenched". The colored data points indicate the data which have sub-arcsec resolution and black points indicate data of low spatial resolution. Green pentagons represent the fluxes from the radio core and the blue diamond indicates ALMA band 6 detection. Filled black circles indicate detections (>3σ) and downward pointing triangles indicate the 3σ upper limits (Table A.1).

Appendix A: SED fits
Appendix A.1: MRC 0037-258 MRC 0037-258 is detected with a single continuum component which coincides with the position of the radio core (Fig. A.2). SED fitting with Mr-Moose is done with three components, a synchrotron power law for the radio core (the lobes are excluded), a modified BB and an AGN component. The VLA data are assigned to only the synchrotron component while the ALMA flux density is assigned to both the synchrotron and modified BB. The PACS, SPIRE, MIPS and IRS data are fitted to the modified BB and AGN components. In the best fit model, the ALMA flux is dominated by synchrotron emission (Fig. A.1). This is because the 3σ upper limits in the FIR are not given a high weight and there is a solution where the synchrotron power law can account for all of the observed flux. The plot of the marginalized distribution of each free parameter of the fit have been omitted from the paper, but are available online 4 for each 25 sources.  MRC 0114-211 has two detected continuum components in ALMA band 6. Both detections coincide with the two lowerfrequency radio components (Fig. A.4). The SED fitting is done with five components, two synchrotron components (eastern and western radio components), two modified BB (one for each ALMA detection) and one AGN component. The western radio component (detected in VLA bands C and X) is assigned to the western ALMA detection and the same setup is also applied to the eastern radio and ALMA components. The VLA band X, ATCA 7 mm, and ALMA band 3 detections do not resolve the individual components and are only considered for fitting the total radio flux (the combination of the western and eastern synchrotron power-law components). The two ALMA band 6 points are both fitted with a combination of a synchrotron power-law and a modified BB. The PACS. SPIRE, MIPS and IRS data points are fitted to the combination of the two modified black bodies and a AGN component. The best fit model is where the brighter western component in ALMA band 6 is pure synchrotron and the eastern component is dominated by thermal emission (Fig. A.3).  2.21 ± 0.31 this paper ALMA 6 w 9.82 ± 0.9 this paper ALMA 3 32.6 ± 3.2 this paper ATCA (7mm) 82.5 ± 24.7 this paper VLA X w 599.9 ± 59 a A VLA X e 9.99 ± 0.99 a A VLA C w 1084.5 ± 108 a A VLA C e 63.7 ± 6.37 a A VLA L 4091.6 ± 409 C Notes ( (Table A.3).

Appendix A.3: TN J0121+1320
TN J0121+1320 has one continuum detection in the ALMA band 3 observations which coincides marginally with the compact radio component (Fig. A.6). SED fitting is done with three components, a single synchrotron power law, a modified BB, and an AGN component. The single component in radio VLA C and X bands is assigned to a synchrotron power-law, the ALMA detection is assigned to both a synchrotron and a modified BB model. The PACS, SPIRE, MIPS and IRS data points are fitted with a combination of the modified BB component and an AGN component. The best fit solution implies that the ALMA emission is due to thermal dust emssion but the AGN component is unconstrained as there are only upper limits in the mid-infrared.  MRC 0152-209 has two detected continuum components in the ALMA observations and both of which coincide with lower frequency radio detections (Fig. A.8). SED fitting is done with four components, one total synchrotron, two modified BB and one AGN component. The VLA data is fitted to one synchrotron power-law since it is not possible to resolve individual components in any of the observed bands. The two ALMA band 6 detections are fitted to the same total synchrotron power-law and two individual modified BB for each ALMA detection. The LABOCA, PACS, SPIRE, MIPS and IRS data points are all fitted with a combination of the two individual modified BB and an AGN component. The best fit model is one where both of the ALMA detections are dominated by dust emission. The radio slope is too steep to be able to account for the observed ALMA continuum. It is unclear wherever the northern or southern dust component corresponds to the host galaxy, but since the southern component (indicated by the blue BB line in Fig. A.7) is brighter, this component is more likely to be the host galaxy and the northern emission, is an interacting companion. Though this is not enough to determine which of the component host the radio-loud AGN. In Emonts et al. (2015b) they argue that the NW component is the host, which might be more correct. If this is the case, then the SFR would be one order of magnitude lower and the galaxy would be on the MS, instead of lying above the MS relation. Though the morphology of the source strongly suggest it is a merger, a SFR∼1000 M yr −1 is reasonable for a merging system.    (Table A.5).

Appendix A.5: MRC 0156-252
MRC 0156-252 has two detected continuum detections, where one detection coincides with the radio core and the other with the northern radio lobe (Fig. A.9). SED fitting is done with five components, two synchrotron power-laws (one for the core, one for the northern lobe, the southern lobe is excluded from the fit), two modified BB for the two ALMA detections, and one AGN component. The VLA L and ATCA 7 mm bands are fitted to the total synchrotron emission since these data do not resolve the individual radio components, while the VLA C and X band fluxes are fitted with two individual synchrotron power-laws. The two ALMA band 6 flux points are fitted to a combination of the individual synchrotron power-law and modified black bodies. The LABOCA, PACS, SPIRE, MIPS and IRS data points are fitted by a combination of a modified BB and an AGN component. The best fit model is one where both the ALMA detections are dominated by synchrotron emission. This is because there are only upper limits in the FIR and they do not require the modified BB to be very bright (Fig. A.9). The blue and pink markers show the two ALMA detections and correspond to the same markers used for the SED in Fig A.9. The core and northern radio lobes are color coded in the same colors as the flux markers in Fig. A.7, the flux of the southern lobe (red radio contours) was not used in the SED fit. Panel B: PACS 70 µm continuum map with radio contours overlaid.   (Table A.6).

Appendix A.6: TN J0205+2242
TN J0205+2242 has no continuum detection (Fig. A.12). SED fitting with Mr-Moose is done with five models, two synchrotron power-laws (northern and southern radio component), two modified BB and one AGN component. The northern radio component (detected in VLA bands C and X) is assigned to the northern ALMA upper limit and the same setup is also applied to the southern radio and ALMA upper limit. The VLA band X detection does not resolve the individual components and are only considered for fitting the total radio flux (the combination of the northern and southern synchrotron power-law components). The two ALMA band 3 upper limits are both fitted with a combination of a synchrotron power-law and a modified BB. The SCUBA, PACS, SPIRE and IRS data points are fitted to the combination of the two modified black bodies and a AGN component. The best fit only constrain the two synchrotron models, both the SF and AGN dust components are unconstrained, due to the fact that all FIR and MIR data points are upper limits (Fig. A.11).    (Table A.7).
Appendix A.7: MRC 0211-256 MRC 0211-256 has one single continuum detection, which does not coincide with the radio emission (Fig. A.14). SED fitting with MrMosse is done with three components, one synchrotron power-law, one modified BB and one AGN component. VLA L, X, and C bands are only considered for fitting the total radio flux since only the total integrated flux for band L and X, C are reported in Condon et al. (1998) and Kapahi et al. (1998). The ALMA detection is assigned to both the synchrotron power-law and a modified BB component. The LABOCA, SPIRE, PACS, MIPS and IRS data points are fitted to the combination of the modified BB and a AGN component. The best fit model gives a solution where the ALMA band 6 flux is dominated by thermal dust emission (Fig. A.13).    (Table A.8). The open diamond shows available ATCA data but only plotted as a reference and was not used in the SED fit.
Appendix A.8: TXS 0211-122 TXS 0211-122 have one single continuum detection, which coincide with the radio core (Fig. A.16). SED fitting with Mr-Moose is done with three components, one synchrotron power-law (for the radio core, the two lobes are excluded in the fit), one modified BB and one AGN component. The VLA data is fitted to the synchrotron component, the ALMA data point is assigned to both the synchrotron power-law and a modified BB. The LABOCA, PACS, SPIRE, MIPS and IRS are fitted to the combination of the modified BB and a AGN component. The best fit model gives a solution where the ALMA band 6 flux is dominated by emission from heated dust (Fig. A.15). The reason why the PACS 160 µm data point is not well fitted is due to the fact that the AGN powerlaw have a fixed exponential cut off at 33 µm rest frame and the ALMA point puts hard constraints on the normalization parameter of the modified BB.  Fig. A.2, σ = 53 µJy). The blue diamond indicates the ALMA detection and is the same marker used in Fig. A.15. The green contours show the portion of the VLA data that is used in the SED fit, the red contours are excluded in the fit. Panel B: MIPS 24 µm continuum map.  MRC 0251-273 has one continuum detection which coincides with one of the two radio components (Fig. A.18). SED fitting with Mr-Moose is done with four components, two synchrotron (northern and southern radio component), one modified BB and one AGN component. The northern radio component (detected in VLA bands C and X) is assigned to the ALMA detection and fitted with with an individual power-law. The southern radio component is only assigned to the VLA bands C and X. The ALMA detection is associated to the northern synchrotron power-law and a modified BB. The SCUBA, SPIRE, PACS, MIPS and IRS data are fitted to the combination of the modified BB and a AGN component. The best fit model gives that the ALMA detection is dominated by dust emission (Fig. A.17). The slope of the northern synchrotron component is very steep and this can be due to not properly extracted photometry because of blending between the two components.   The northern radio component (detected in VLA bands C and X) is assigned to an individual synchrotron power-law and the same setup is also applied for the southern radio component. The VLA band L and ATCA 7 mm data do not resolve the individual components and are only considered for fitting the total radio flux (the combination of the northern and southern synchrotron power-law components). The ALMA band 6 detection is also assigned to the total radio flux and to a modified BB. The LABOCA, SIPRE, PACS, MIPS and IRS data are fitted to the combination of the modified BB and a AGN component. The best fit model does not constrain the modified BB due to only upper limits in the FIR. Without an intermediate data point between ALMA and ATCA 7 mm it is not possible to determine where the steepening occurs, but the upper limit in ALMA is consistent with a continued synchrotron slope.   A.22). SED fitting with Mr-Moose is done with four components two synchrotron power-law (northern and southern radio component), one modified BB and one AGN component. The northern radio component (detected in VLA bands C and X) is assigned to an individual synchrotron power-law and the same setup is also applied for the southern radio component. The VLA band L and ATCA 7 mm data do not resolve the individual components and are only considered for fitting the total radio flux (the combination of the northern and southern synchrotron power-law components) The ALMA band 6 upper limit is assigned to the total radio flux and a modified BB. The LABOCA, SIPRE, PACS, MIPS and IRS data are fitted to the combination of the modified BB and a AGN component. The best fit model gives a solution where the ALMA detection is dominated by synchrotron, but the combined models is slightly over predict the 3σ upper limit at 235 GHz (Fig. A.21).  MRC 0406-244 has one single continuum detection which coincides with the radio core (Fig. A.24). SED fitting with Mr-Moose is done with three components, one synchrotron (of the radio core, the two lobes are excluded in the fit), one modified BB and one AGN component. The VLA data is fitted to the synchrotron power-law, the ALMA detection is assigned to both the synchrotron component and a modified BB. The LABOCA, SPIRE, PACS, MIPS and IRS is fitted to combination of the modified BB and a AGN component. The best SED fits favors a solution where the ALMA detection is dominated by synchrotron emission, due to the many upper limits in FIR and that there is no intermediate data point which could conclude whether the synchrotron is steepening at higher frequencies. The modified BB is completely unconstrained (Fig. A.23).  PKS 0529-549 has two continuum detections, both coinciding with the two radio components (Fig. A.26). The SED fit is done with five components, two synchrotron power-laws (eastern and western radio components), two modified BB and one AGN component. The western radio component (detected in ATCA 8640-18496 MHz) is assigned to the western ALMA detection and fitted with an individual synchrotron power-law and the same setup is also applied to the eastern radio and ALMA components. The ATCA 7 mm data do not resolve the individual components and are only considered for fitting the total radio flux (the combination of the western and eastern synchrotron power-law components). The eastern ALMA detection is assigned to the first synchrotron component and to a modified BB component, while the western ALMA detection is similarly assigned to the second synchrotron component and another modified BB component. The SPIRE, PACS, MIPS and IRS data are fitted to the combination of the two modified black bodies and an AGN component. The best solution finds that the western ALMA detection is dominated by thermal dust emission and the eastern component is pure synchrotron emission (Fig. A.25).  Appendix A.14: TN J0924-2201 TN J0924-2201 has one single continuum detection which coincides with the eastern radio component (Fig. A.28). SED fitting with Mr-Moose is done with four components, two synchrotron power laws (eastern and western radio component), one modified BB and one AGN component. The western radio component (detected in VLA bands U and K) is assigned to the ALMA detection and fitted with an individual synchrotron power-law. The eastern radio component (detected in VLA bands U and K) is only fitted with a synchrotron power-law Sync, 2). The VLA bands C and L do not resolve the individual components and are only considered for fitting the total radio flux (the combination of the western and eastern synchrotron power-law components). The western ALMA detection is assigned to the western synchrotron component (Sync, 1), and to a modified BB component. The SCUBA, SPIRE, PACS, MIPS and IRS data is fitted to the combination of the modified BB and a AGN component. The SED fit only constrain the synchrotron models and the modified BB and AGN are unconstrained. This is due to the fact that there is only one detection in the FIR and it is not possible to constrain the modified BB with two free parameters to only one data point. The solution for the two radio lobes shows that the ALMA detection is likely completely dominated by thermal dust emission. The plotted modified BB in Fig.  A.27 is scaled to the ALMA detection with slope and temperature fixed to β = 2.5 and T= 50, like in the case where the upper-limit of the L IR SF is calculated for the sources with ALMA non-detections.    MRC 0943-242 has several continuum detections with ALMA band 6, one at the host galaxy and three resolved companions (Fig. A.30). With ALMA band 4 the three companions are also detected, as well as two additional continuum components which coincide with the two radio components seen with VLA. SED fitting with Mr-Moose is done with five models, two synchrotron power-laws (north and south component), two modified BB (one for the host and one for the three companions grouped together) and one AGN component. The VLA X, C and ALMA band 4 resolve the synchrotron components and the two radio lobes are fitted with two individual power-laws. The VLA L, ATCA 3 mm and 7 mm do not resolve any individual components and are assigned to the combination of both synchrotron models. The LABOCA, SPIRE, PACS, MIPS and IRS data are assigned to both the modified BB of the host and companions as well as the AGN component. The best fit model, shows that ALMA detection at the host galaxy is dominated by thermal dust emission, and the three companions are also consistent with emission from heated dust. The two synchrotron components are consistent with a power-law from radio to ALMA band 4 ( Figure A.29). One reason why the PACS 100 µm is not well fitted is because either the temperature of the modified BB must be higher which is outside the parameter space or because the fixed ν cut does not allow the AGN component to account fully for the observed flux.   Carilli et al. (1997) using the listed I 4.7 and α 8. MRC 1017-220 has one single continuum detection with ALMA, which coincides with the unresolved synchrotron component (Fig. A.32). SED fitting with Mr-Moose is done with three components, one synchrotron, one modified BB and one AGN component. The VLA L, C, X and ATCA 7 mm bands are assigned to the synchrotron component and the ALMA detection is assigned to both the synchrotron power-law and a modified BB. The LABOCA SPIRE, PACS, MIPS and IRS points are assigned to the combination of the modified BB and an AGN component. The best fit solution gives a dominating synchrotron contribution to the ALMA detection (Fig. A.31), due to the many upper limit in the FIR. But studying the SED it seems likely that the synchrotron is turning over at high frequencies but without a data point at an intermediate frequency between the radio and ALMA data, it is not possible to make a final conclusion whether the ALMA detection is synchrotron dominated or not.  4C 03.24 two continuum detections in ALMA band 3, one that coincides with the north synchrotron lobe and one that covers both the radio core and hot spot of the south lobe (Fig. A.34). The SED fitting with Mr-Moose is done with six components, three synchrotron (core, northern and southern radio component), two modified BB (northern and southern) and one AGN component. The VLA L detection does not resolve the individual components and are only considered for fitting the total radio flux (the combination of the western, eastern and core synchrotron power-law components). The C and X bands resolves the three components are there assigned to individual synchrotron models. The northern ALMA detection is assigned to the northern synchrotron power-law and one modified BB and the southern ALMA detection is assigned to a combination of the southern and core synchrotron components, as well as a modified BB. The SCUBA, SPIRE, PACS, MIPS and IRS are fitted to the combination of the two modified BB and a AGN component. The best fit model finds that the north ALMA detection is dominated by thermal dust emission and the southern ALMA detection is dominated by synchrotron emission from the south radio lobe.  TN J1338-1942 has one continuum detection with ALMA, which coincides with the northern radio component (Fig. A.36). SED fitting with Mr-Moose is done with four components, two synchrotron (northern and southern radio component), one modified BB and one AGN component. The northern radio component (detected in VLA bands C and X) is assigned to the northern ALMA detection and fitted with an individual synchrotron power-law. The southern radio component (detected in VLA bands C and X) is also fitted with an individual power-law and assigned to the southern ALMA upper limit. The VLA L data do not resolve the individual components and are only considered for fitting the total radio flux (the combination of the northern and southern synchrotron power-law components). The northern ALMA detection is assigned to the norther synchrotron component and to a modified BB component. The MAMBO SCUBA, SPIRE, PACS, MIPS and IRS are fitted to the combination of the modified BB and a AGN component. The best fit is where the ALMA flux is a combination of the northern synchrotron lobe and dust emission.   (Fig. A.38). SED fitting with Mr-Moose is done with four components, two synchrotron (northern and southern radio component), one modified BB and one AGN component. The northern and southern radio components (detected in VLA bands C and X) are each assigned to an individual synchrotron power-law. The VLA L band do not resolve individual components and is only considered for fitting the total radio flux (the combination of the north and south synchrotron power-law). The ALMA upper limit at the AGN host location is assigned to both synchrotron components and the modified BB. The SPIRE, PACS, MIPS and IRS data a fitted to the combination of the modified BB and a AGN component. The best fit model only constrains the synchrotron and AGN components, while the modified BB remains unconstrained because there are only upper limits in the FIR (Fig. A.37).     MRC 2048-272 has no continuum detection with ALMA ( Fig. A.42). SED fitting with Mr-Moose is done with four components, two synchrotron (north and south radio component), one modified BB and one AGN component. The northern and southern radio component (detected in VLA bands C and X) are each assigned to an individual synchrotron power-law. The VLA L and ATCA 7 mm bands do not resolve any individual components and are only considered for fitting the total radio flux (the combination of the northern and southern synchrotron power-law components). The ALMA upper limit at the AGN host location is associated to both synchrotron components and the modified BB. The LABOCA, SPIRE, PACS, MIPS and IRS data are fitted to the combination of the modified BB and AGN component. The best fit model only constrains the synchrotron, and both the AGN the modified BB models are unconstrained because the FIR through mid-infrared data are only upper limits.     A.44). SED fitting with Mr-Moose is done with three components, one synchrotron (only the core, the northern and southern radio components are excluded in the fit), one modified BB and one AGN component. The VLA C and X band data are assigned to the sychrotron power law of the radio core. The ALMA upper limit at the AGN host location is associated to both synchrotron components and the modified BB. The SPIRE, PACS, MIPS and IRS data are fitted to the combination of the modified BB and a AGN component. The best fit model only constrains the synchrotron and AGN component, the modified BB models are unconstrained due to the FIR data consisting only of upper limits (Fig. A.43).  Notes (c) Radio core, (a) changed to upper limit because of contaminating forground object (b) flux estimated using AIPS from original radio map, convolved to the resolution of the VLA C band (*) data not used in fitting.  4C 23.56 has no continuum detection in ALMA band 6, but is detected in band 3. The detection coincides with the radio core ( Fig. A.46). SED fitting with Mr-Moose is done with three components, one synchrotron power-law (of the core, the north and south lobe are excluded for the fit), one modified BB and one AGN component. The VLA C and X band fluxes are assigned to the synchrotron power-law. The ALMA band 3 detection is assigned to both the synchrotron model and modified BB and the same set up is used for the ALMA band 6 upper limit. The LABOCA, SPIRE, PACS, MIPS and IRS are assigned to the combination of the modified BB and a AGN component. The best fit model constrains the synchrotron and gives that the ALMA band 3 detection is dominated by synchrotron emission. The modified BB is unconstrained (Fig. A.45) due to the fact that all the FIR data points are upper limits. This source has a low SFR and the IR emission is dominated by the AGN. 4C 23.56 was use to fix the ν cut =33 µm for the model describing the AGN heated dust, because the SF contribution in the FIR is very weak.  <12.9 C ALMA 6 <0.4 a D ALMA 3 0.23 ± 0.05 a this paper VLA X c 5.11 ± 0.51 b E VLA C c 8.77 ± 0.87 b E Notes (c) Radio core, (a) Flux estimated using AIPS for primary beam corrected images. (b) flux estimated using AIPS from original radio map, convolved to the resolution of the VLA C band.  4C 19.71 has three detected continuum components in ALMA band 3, one coincides with the AGN host galaxy and two coincide with the northern and southern synchrotron lobes (Fig.  A.48). SED fitting with Mr-Moose is done with four components, two synchrotron (northern and southern radio components), one modified BB and one AGN component. The northern radio component (detected in VLA bands C and X) is assigned to the northern ALMA detection and fitted to a individual synchrtoron component, the same setup is also applied to the southern radio and ALMA components. The VLA L detection does not resolve any individual components and are only considered for fitting the the total radio flux (the combination of the northern and southern synchrtorn power-laws). The ALMA detection at the host location is assigned to only the modified BB and the SCUBA, SPIRE, PACS, MIPS and IRS are fitted to the combination of the modified BB and a AGN component. The best fit model constrains the two synchrotron components and they are fitted out to ALMA band 3. The ALMA detection at the host is dominated by thermal dust emission.    MRC 2224-273 has a single continuum detection which coincides with the compact synchrotron emission (Fig. A.50). SED fitting with Mr-Moose is done with three components, one synchrotron power-law, one modified BB and one AGN component.
The VLA X, C and L data are assigned to the synchrotron powerlaw, the ALMA detection is assigned to both synchrotron component and a modified BB. The LABOCA, SPIRE, PACS, MIPS and IRS are fitted to the combination of the modified BB and a AGN component. The best fit model gives that the ALMA detection is dominated by thermal dust emission. Without a flux at intermediate radio frequencies between the VLA and ALMA data, it is not possible to constrain whether or not the synchrotron is steeping. In any case, the synchroton emission does not significantly contribute to the emission detected by ALMA.