Towards a census of high-redshift dusty galaxies with $\mathit{Herschel}$: A selection of"500 $\mu$m-risers"

$\mathit{Herschel}$ extragalactic surveys offer a unique opportunity to efficiently select a significant number of rare and massive dusty objects, and thus gain insight into the prodigious star-forming activity that takes place in the very distant Universe. To search for $z\geq4$ dusty star-forming galaxies, in this work we consider red SPIRE objects with fluxes rising from 250 $\mu$m to $500\:\mu$m (so-called"500 $\mu$m-risers"). We aim to implement a novel method to obtain a statistical sample of"500 $\mu$m-risers"and fully evaluate our selection inspecting different models of galaxy evolution. We consider one of the largest and deepest ${\it Herschel}$ surveys, the Herschel Virgo Cluster Survey. We develop a novel selection algorithm which links the source extraction and spectral energy distribution fitting. We select 133"500 $\mu$m-risers"over 55 deg$^{2}$, imposing the criteria: $S_{500}>S_{350}>S_{250}$, $S_{250}>13.2$ mJy and $S_{500}>$30 mJy. Differential number counts are in a fairly good agreement with models, displaying better match than other existing samples. In order to interpret the statistical properties of selected sources, which has been proven as a very challenging task due the complexity of observed artefacts, we make end-to-end simulations including physical clustering and lensing. The estimated fraction of strongly lensed sources is $24^{+6}_{-5}\%$ based on models. We present the faintest known statistical sample of"500 $\mu$m-risers"and show that noise and strong lensing have crucial impact on measured counts and redshift distribution of selected sources. We estimate the flux-corrected star formation rate density at $44$ sources.


Introduction
The abundance of dusty galaxies at high-redshifts (z > 4) constrains our theories about early galaxy formation, since it is generally stated they are the progenitors of massive ellipticals seen in overdense regions of the local Universe (e.g. Eales et al. 2017, Toft et al. 2014, Casey et al. 2014. The widely accepted picture is that dusty star-forming galaxies (DSFGs) occupy most massive halos in early epochs and lie on the most extreme tail of the galaxy stellar mass function (e.g. Ikarashi et al. 2017, Fudamoto et al. 2017, Oteo et al. 2016, Mancuso et al. 2016, Michałowski et al. 2014). These DSFGs are usually selected in the far-infrared (FIR) regime where the star formation rates can be directly measured.
Large FIR surveys, such as those conducted with the Herschel Space Observatory (Pilbratt et al. 2010), provide an op-portunity to build a thorough census of prodigious starbursts over cosmological redshifts using wide and blind concept of searches. The Herschel SPIRE photometer (Griffin et al. 2010) was often used for mapping large areas at wavelengths of 250 µm, 350 µm and 500 µm. The redshift peak of most Herschel detected sources matches with the redshift where galaxies have formed most of their stars (z ∼ 2, Pearson et al. 2013, Lapi et al. 2011, Amblard et al. 2010. Considering that rest-frame dust Spectral Energy Distribution (SED) of a galaxy typically peaks between 70-100 µm, colours of sources in Herschel SPIRE bands were used to select candidate high-redshift dusty objects. To search for z 4 candidates, there is a particular interest to exploit the sources having red SPIRE colours, with rising flux densites from 250 to 500 microns (so-called "500 µm-risers"). Such galaxies should lie at z 4 unless they have dust temperatures that are Article number, page 1 of 28 arXiv:1709.00942v2 [astro-ph.GA] 1 Feb 2018 A&A proofs: manuscript no. ddrisers notably lower than is seen in local FIR-bright equivalents (Asboth et al. 2016, Yuan et al. 2015, Dowell et al. 2014, Roseboom et al. 2012. If the selection of "500 µm-risers" is free of contaminants such are blended systems and powerful non-thermal sources (e.g. quasars), it is expected to offer us insight into very distant and dusty, star-forming galactic events. Recently, a rapidly flourishing literature on dusty high-z galaxies has grown, including handful of serendipitously discovered "500 µm-risers" (e.g. Daddi et al. 2009, Cox et al. 2011, Capak et al. 2011, Combes et al. 2012, Vieira et al. 2013, Miettinen et al. 2015, Negrello et al. 2017). However, these findings had serious shortcomingsthey were limited to few objects with red SPIRE colours. 1 Being primarily focused on individual (usually strong lensing) candidates or millimeter selected samples regardless of the galaxy colour, they poorly constrained the statistics of "500 µmrisers". To derive a larger number of potentially unlensed "500 µm-risers" and analyse them in a more standardized manner, several works used map-search technique (Asboth et al. 2016, Dowell et al. 2014. They have led to the discovery of most distant dusty starburst galaxies known to date: SPT0311-58 at z = 6.902 (Strandet et al. 2017), HFLS3 at z = 6.34 (Riechers et al. 2013) and G09-83808 at z = 6.02 (Zavala et al. 2017, Fudamoto et al. 2017. Furthermore, studies that used lowest-resolution Herschel SPIRE-maps to select "500 µm-risers" (Dowell et al. 2014 andAsboth et al. 2016) claimed significant overprediction of observed number of "500 µm-risers" with those predicted by existing models (e.g. Hayward 2013, Béthermin et al. 2012).
Even if Herschel offers a direct insight to the IR emission of high-z objects, there are critical limitations like sensitivity of detectors and low spatial resolution. These are responsible for biases such as source confusion. The sensitivity of SPIRE instrument allows to directly detect only the brightest, thus rarest objects at the tip of luminosity function (Cowley et al. 2015, Karim et al. 2013) and we therefore need large surveys to increase the statistics. Study of Dowell et al. (2014) analyzed maps of three different extragalactic fields observed as part of the HerMES (Herschel Multi-tiered Extragalactic Survey, Oliver et al. 2010) program, while Ivison et al. (2016) and Asboth et al. (2016) probed much wider but shallower area of H-ATLAS and HeLMS (HerMES Large Mode Survey) field respectively (see Section 2.1 for details about field's properties).
In this work we aim to introduce a slightly different approach to build a statistically significant sample of red, potentially z > 4 sources. The new selection scheme we propose is motivated by the size and the depth of the field we chose to investigate. Herschel Virgo Cluster Survey (HeViCS, Davies et al. 2010) is deeper than the one used in the analysis of Asboth et al. (2016) and Ivison et al. (2016), and larger than the area analysed by Dowell et al. (2014).
The paper is organized as follows: in Section 2 we describe the methods we used for the data analysis and our new selection criteria for "500 µm-risers". In Section 3 and Section 4 we present expected redshift/luminosity trend of selected galaxies and differential number counts. In Section 5 we compare our results to different models. We perform simulations to review all selection biases and highlight the necessity of a further refinement of selection criteria in searching for z 4 sources. The 1 Throughout the text we adopt following terminology regarding colours of IR-detected sources: red colours and red sources refer to "500 µm-risers", while "350 µm peakers" refer to galaxies having SED peak at 350 µm.
nature of "500 µm-risers" and main conclusions are outlined in Section 6 and Section 7 respectively. We assume a Planck Collaboration et al. (2016a) cosmology.

HeViCS field
HeViCS is a fully-sampled survey that covered a region centered at the Virgo cluster (Davies et al. 2010, Davies et al. 2012. It is one of the largest uniform Herschel surveys, and its main advantage is the sensitivity and the uniformity of data. In this survey, Herschel observed four overlapping fields (fields V1-V4) in fast parallel-mode. The total entire survey region is 84 square degrees, where 55 square degrees are observed at unvarying depth with eight orthogonal cross scans (see Auld et al. 2013 andPappalardo et al. 2015 for more details). HeViCS observations reached a depth close to the confusion limit at the shortest SPIRE wavelength (250 µm). Because of the number of repeated scans, instrumental noise is significantly reduced in HeViCS maps, giving the 1σ levels of 4.9, 4.9 and 5.5 mJy at 250 µm, 350 µm and 500 µm respectively (Auld et al. 2013). In the overlapped, deeper regions recorded by 16 scans these values are even smaller, namely 3.5, 3.3 and 4.0 mJy. Due to the presence of bright sources (see Section 2.3), global noise estimation is not a straightforward task. We excluded bright sources from the map by masking them, and after their removal the global noise is derived from the variance of the map. It reaches the 1σ values of 6.58, 7.07 and 7.68 mJy (250 µm, 350 µm and 500 µm respectively) for a major area covered by 8 cross-scans. The extensive contributor to the overall noise measured in HeViCS maps is confusion noise, usually defined as the the variance in the sky map due to the fluctuations of unresolved sources inside the SPIRE beam. We calculate confusion using the relation σ conf = σ 2 tot − σ 2 inst , where σ tot is the total noise measured in the map, and σ inst is the instrumental noise. Values determined for the confusion noise are 4.4, 5.2 and 5.5 mJy at 250 µm, 350 µm and 500 µm band respectively, and almost identical to the ones presented in Auld et al. (2013). These values are also close to the confusion noise measured in HerMES maps, within twice the uncertainty of 3σ-clipping estimates from Nguyen et al. (2010) (3.8 mJy, 4.7 mJy and 5.2 mJy at 250 µm, 350 µm and 500 µm band respectively). In this work we use only SPIRE data. In parallel, each HeViCS tile has been observed by the Herschel PACS instrument, but the depth of PACS data (5σ tot =70 mJy at 100 µm, Pappalardo et al. 2016) is not sufficient to directly detect our "FIR-rising" sources. PACS data at 100 µm and 160 µm will be added together with a deep optical-NIR maps from the Next Generation Virgo Cluster Survey (NGVS, Ferrarese et al. 2012) in a following paper analysing ancillary data.

An overview of other Herschel fields
There are several large Herschel fields (e.g. with an observed area θ > 10 deg 2 ) used for detection of "500 µm-risers". Summary of their properties is shown in Table 1. All the values are taken from the literature (Oliver et al. 2012, Wang et al. 2014. The H-ATLAS survey (Eales et al. 2010) is used to select red candidates by Ivison et al. (2016). H-ATLAS is designed to uniformly cover 600 deg 2 of sky, but with its two scans survey did not reach the level of confusion noise at 250 µm. Studies of Asboth et al. (2016) and Dowell et al. (2014) acquired the data D. Donevski et al.: Towards a census of high-redshift dusty galaxies with Herschel The later six fields are areas with different design levels nested as a part of HerMES: Total area covered in HerMES is 380 deg 2 . Shallow HeLMS field covers the area of 274 deg 2 , and deeper Level 1-Level 6 fields cover the total area of about 80 deg 2 . FLS and Lockman-SWIRE have been used to probe "500 µm-risers" selection by Dowell et al. (2014), while Asboth et al. (2016) applied the same selection method in the HeLMS field. from different fields which are part of HerMES survey (Oliver et al. 2012). The HerMES survey observed 380 deg 2 of the sky. The survey has a hierarchical structure containing 7 levels, ranging from very deep observations of clusters to wider fields with varying size and depth. The largest (and the shallowest) observed area is HeLMS, with its 274 square degrees. HeViCS maps consist of two or four time more scans comparing to other fields listed in Table 1. It leads to a reduction of instrumental noise by a factor of √ 2. The global 250 µm noise in HeViCS is smaller than in H-ATLAS and HeLMS. However, it is still higher than in other HerMES fields, showing that confusion is a very compelling supplier to the noise budget for point sources in the HeViCS field. We clarify that statement repeating our analysis on regions overlapped between the tiles, which have greater coverage. We found no significant noise reduction, implying that the maps are dominated by confusion noise.

Map filtering
Prior to performing source extraction on SPIRE maps, we reduce the background contamination. 250 µm map of V2 field in HeViCS is strongly affected by galactic cirrus emission. This contamination peaks at around 150-200 µm (Bracco et al. 2011, Valiante et al. 2016, implying it is the brightest in the shortest SPIRE bands. The main effect of cirrus emission is to increase the confusion in the maps, but the small-scale structure within it can also lead to spurious detections in the catalogues. We follow the method applied by Pappalardo et al. (2015). Using the SEXtractor (Bertin & Arnouts 1996) we re-grid the 250 µm map into meshes larger than the pixel size. The SEXtractor makes an initial pass through the pixel data, and compute an estimator for the local background in each mesh of a grid that covers the whole adopted frame. We apply repetitive iterations to estimate the mean and standard deviation of the pixel value distribution in boxes, removing outlying pixels at each iteration. Important step in this procedure is the choice of the box size, since we do not want the background estimation be affected by the presence of objects and random noise. The box size should generally be larger than the typical size of sources in the image, but small enough to encapsulate any background variations. We therefore fix the mesh size to 8 pixels, adopting the result from Pappalardo et al. (2015). The local background is clipped iteratively to reach ±3σ convergence around its median. After the background subtraction, the number of sources appears uniform for regions with different cirrus emission. We use the background subtracted map as an input for source extraction process described in next subsection.

Extraction of sources
Blind SPIRE source catalogues have been produced for HeViCS (Pappalardo et al. 2015). Nonetheless, when density of sources is very high, which is the case in highly crowded HeViCS field, blind source extraction cannot separate blended point sources in an efficient way. Additionally, it remains difficult to properly cross-match sources at different wavelengths, since central positions from blind catalogues are not well constrained. To deal with source multiplicity we choose to perform extraction of 350 µm and 500 µm fluxes at exact a − priori position of 250 µm band detections, allowing much precise identification work. The potential limitation of such a method is that we might be eventually missing some "500 µm-risers" that are not included in the prior list after the first iteration of source extraction. We thus run our method iteratively and add new sources that may appear in the residuals at each iteration.
We use source finding algorithm optimized for isolated point-sources, SUSSEXtractor (Savage & Oliver 2007), to create the catalogue of galaxies detected in the 250 µm map as a prior to extract the flux densities at longer wavelengths. We then implement our novel technique (see Section 2.6) where source deblending and single temperature modified blackbody (MBB) fitting are combined in the same procedure. In following we explain our source extraction pipeline (see Fig.1 for graphical description): 1. We run SUSSEXtractor with use of fully-overlapped HeViCS 250-micron map. The SUSSEXtractor works on the flux-calibrated, Level 2 Herschel SPIRE maps. We create a point response function (PRF) filtered image, smoothed with the PRF. In SUSSEXtractor PRF is assumed to be Gaussian by default, with full-width-half-maximum (FWHM) provided by the FWHM parameter. We apply values of 17.6", 23.9" and 35.2" at 250, 350 and 500 µm respectively. Subsequently, pixel sizes at these bands are 6", 10" and 14". The algorithm searches for a local maximum which is the highest pixel value within a distance defined by pixel region. The position assigned to the possible source is then refined by fitting a quadratic function to certain pixels in the PSF-filtered image. 2. To search for even fainter sources that are closer to the confusion limit and usually masked/hidden in highly confused fields like HeViCS, we perform additional step looking for detections in our residual maps. Applying such additional step, we increase total number of sources by around 4%. Errors in the position estimated to be a source are determined as 0.6×(FWHM/signal-to-noise), up to a maximum of 1.0 pixels, as suggested in the literature (Ivison et al. 2007). 3. We build an initial list of 250 µm sources selecting all point sources above the threshold=3 (Bayesian criteria in SUSSEXtractor). We also impose the flux density cut choosing the values equal or higher than 13.2 mJy, which corresponds to S 250 3σ conf . As a result of our source extraction pipeline at 250-micron maps, we listed 64309 sources, similarly distributed among the four (V1-V4) fields. 4. List of 250-detections is then divided in "no-neighbour" list of sources (250 µm sources without another detection inside the 500 µm beam) and "close-neighbour" list where we add all sources having another 250 µm detection within the 500 µm beam. Prior to assign initial 250 µm list as an input to our modified blackbody fitting procedure, we clean the catalogue from potentially extended sources.

Extended sources
The Virgo cluster is one of the richest local clusters and we expect to have a significant number of extended sources. Objects that are extended on the SPIRE beam scale (see Wang et al. 2014 or Rigby et al. 2011) are not expected to be accurately identified with point-source extracting codes, and large galaxies may be misidentified as multiple point sources. To prevail the problem, we implement the same method used in Pappalardo et al. (2015). They define a mask using the recipe of SEXtractor, which detects a source when a fixed number of contiguous pixels is above a σ-threshold estimated from the background map. We keep the same value which has been tested in Pappalardo et al. (2015) -70 contiguous pixels above the 1.2 σ. In this case, most of the sources larger than 0.7 arcmin 2 are rejected from the sample. Additionally, we cross-match all remaining HeViCS 250 µm detections with their nearest (within 36") counterpart in the 2MASS Extended Source Catalog (Jarrett et al. 2000). We remove any detection supposed to be a counterpart with a Kron  Fig. 1: Schematic representation of selection of "500 µm-risers" in the HeViCS field. Coloured in orange are segments of source extraction prior to use MBB-fitter. They are (from upper left, following the arrows): 1.) Subtraction of strong cirrus emission 2.) SUSSEXtractor list of initial (threshold=3) 250 µm detections; 3.) Assumed a − priori list cleaned from extended sources. Enveloped by smaller black square are parameters considered for the fitting procedure with MBB-fitter and it corresponds to all pixels in the map where we have 250 µm detections; Coloured in green are segments of our selection after performing the photometry, subsequently: 1.) MBB photometry (S 250−500 ) at SPIRE wavelengths using 250 µm priors ; 2.) Parent list of "500 µm-risers" (140 sources in total), not cleaned from strong synchrotron contaminants. 3.) Final list (133 in total) of "500 µm-risers" after excluding local radio-sources. We apply following selection criteria for the final sample: S 500 > S 350 > S 250 , S 250 > 13.2 mJy and S 500 >30 mJy. elliptical aperture semi-major larger than 9". In total we suppressed 812 sources from the analysis, thus decrease the number of 250-detections in our parent list from 64309 to 63497.
List of sources detected at 250 µm down to 13.2 mJy, cleaned from galactic cirrus contaminants and extended sources, is further used as a − priori list for our simultaneous modified blackbody fitting.

Modified blackbody fitter
As described in Section 2.1, our maps are limited by confusion noise caused by the high density of sources relative to the resolution of the Herschel instrument. In cases where SPIRE images have multiple sources per beam, measured fluxes might be biased if we treat several blended sources as one. There are different approaches for source "de-blending" which has been invoked in the literature. We can separate them in three groups: Firstly, they are "de-blending" methods that use only positional priors as an input information (e.g. Elbaz et al. 2010, Béthermin et al. 2010, Roseboom et al. 2010. The second group of de-blenders consists of methods which combine positional information with statistical techniques (Hurley et al. 2017, Safarzadeh et al. 2015. For example, Safarzadeh et al. (2015) and Hurley et al. (2017) developed the codes based on the Monte Carlo Markov Chain (MCMC) sampling for prior source detec- Columns from left to right: 250 µm, 350 µm an 500 µm maps of sources (first column), subtracted models (second column) and residuals (third column). Note that in this example two central sources are considered for the fitting.
tions. While Safarzadeh et al. (2015) used Hubble Space Telescope (HST) H-band sources as a positional argument, Hurley et al. (2017) developed a similar MCMC-based prior-extraction for Spitzer 24 µm detections. Such an algorithm is efficient in obtaining Bayesian PDFs for all priors. However, they might still have some limitations, and the most important one is potential inability of unveiling the true high-z sources in the case they are not included in the initial list. Finally, third group of de-blenders consists from those using SED modelling as an addition to positional arguments (MacKenzie et al. 2014, Shu et al. 2016, Merlin et al. 2016, Liu et al. 2017).
Due to the lack of Spitzer 24 µm data for the HeViCS field, and unsufficiently deep existing optical images, we have no opportunity to use positional priors from shorter wavelength surveys, and we based our analysis on SPIRE data instead. We fur-ther use models to test the whole procedure of selecting "500µmrisers" (see Section 5).
MBB-fitter is a code developed to extract sources from multiwavelength bolometric observations (Boone et al, in prep.). It combines positional priors and spectral information of sources, such that SEDs of fitted galaxies should follow modified blackbody (MBB) shape as defined by Blain et al. (2003). Our MBB deblending approach is thus very similar to the method applied by MacKenzie et al. (2014). Although such a model can encapsulate just a segment of the general complexity of astrophysical processes in a galaxy, it can account for the SED data for a wide variety of dusty galaxies. The MBB-fitter method is described in detail and is tested on simulations in Boone et al. (in prep.). We summarize here the main points. In MBB-fitter the map (M) of a sky region is described as a regular grid corresponding to the sky flux density distribution convolved by the point spread function (PSF) of an instrument and sampled in pixels. Thus M i = M(α i , δ i ) refers to the value of the map at the i th pixel with coordinates α i , δ i . We further assume that the sources have a morphology independent of the frequency. The flux density distributions in the 3D space s k (α, δ, ν), where ν is frequency, can therefore be decomposed into a spatial distribution term (morphology) and a SED term: where PS F j is the PSF of the map at given frequency (ν j ), N sources is a number of sources considered for the fitting, with its reference coordinates (α κ , δ κ ). Here we assume that thermal continuum SED of galaxies in FIR domain can be modelled as a modified blackbody of the form (Blain et al. 2003): where ν w is the lower boundary of Wien's regime, h is the Planck constant and k is the Stefan-Boltzmann constant. Thus, Eq.1 describes a general model for a set of maps at different wavelengths, and each source SED is defined by a set of parameters arranged into a vector of following form P κ = [L IR κ , T d κ , z κ , β κ , γ κ ], where z κ is the redshift of given source, β κ its emissivity index, γ κ is the index of the power law to substitute the Wien's regime and L IR is the IR luminosity of a source. This model is parametrized by the source SED parameters, P κ , and coordinates α κ , δ κ . The total number of parameters is therefore N par =N s × (N SED par + 2), where N SED par is the number of SED parameters and N s is the number of sources. Introduced set of parameters reflects global properties of the source. There are potential degeneracies, meaning that different values of parameters may give very similar SEDs. This is the most prolific challenge when using the FIR peak as a redshift indicator -dust temperature and the redshift are completely degenerate (see e.g. Pope & Chary 2010).
Defined model can be compared to a set of observed maps and its fidelity can be assessed with a figure of merit such as the chi-square exemplified as the sum of the pixel deviations: whereM i j is the multiwavelength set of maps, σ i j is the Gaussian noise level of the i-th pixel of the map at the j-th frequency, N pix is the number of accounted pixels, while N freq represents the number of maps, which is equal to the number of different frequencies used. We use the Levenberg-Marquardt algorithm to find the minimum of the chi-square function in the space of parameters. The algorithm also returns an estimation of the errors on the parameters based on the covariance matrix. We note that all the pixels at maps need not to be included in the chi-square computation, it can be restricted to a subset of pixels (contiguous or not) and sky regions without prior sources can be excluded.
We impose our list of 250-detected sources as a prior list, and use MBB-fitter to perform two-step photometry: (1) simultaneously fitting the sources affected by surrounding source-blend (12135 sources in total), and (2) performing the photometry of sources that are more isolated without a surrounding companion inside the beam (51362 sources in total). We set emission spectral slope and dust temperature fixed at β = 1.8, and T d = 38 ± 7 K. These values are chosen to provide a very good description of the data and they are based on our current knowledge of dust temperatures in SPIRE detected sources (e.g. Yuan et al. 2015, Symeonidis et al. 2013, Lapi et al. 2011, see also Schreiber et al. 2017b). An example of simultaneous MBB "deblending" is illustrated in Fig.2. 2.7. Final data sample of "500 µm-risers" We select the final list of 133 "500 µm-risers" over the area of 55 deg 2 . Selected sources fulfil criteria accepted for the final cut: S 500 > S 350 > S 250 , S 250 > 13.2 mJy and S 500 > 30 mJy. The full catalogue is presented in Table 8. We performed several tests and chose to set flux density cut in 500 µm band at S 500 30 mJy, which is related to 4σ above total noise measured in HeViCS maps. Using this value, we reach the completeness level of ∼ 80% at 500 µm (see Fig.5) avoding larger uncertainties in colours due to lower signal-tonoise ratios (see Table 5 and Section 5.2). A 250 µm flux cut (S 250 > 13.2 mJy) corresponds to S 250 > 3σ conf after applying an iterative 3σ clipping to remove bright sources (see Section 2.1). Amongst the final "500 µm-risers" (133 in total) we have 11 red candidates with one or two additional 250 µm detections inside the 500 µm beam. Example 2D cutouts of several "500 µm-risers" from our final sample are presented in Fig.3 (see also Appendix C). Our final list of "500 µm-risers" in the HeViCS field is cleaned from strong radio sources that have flat spectra and very prominent FIR emission (7 objects in total). They may have colours similar to those of dusty, star-forming systems we are interesting to select. Contamination due to radiobright galaxies is eliminated by cross-matching existing radio catalogues: HMQ (Flesch 2015), NVSS (Condon et al. 1998), FIRST (Helfand et al. 2015) and ALFA ALFA (Giovanelli et al. 2007). Identified radio-bright sources are classified as quasars. In Fig.4 we show actual flux distribution of "500 µm-risers" selected in this work, along with red sources from other studies. We see that our catalogue contains in average more fainter objects than other existing samples. The median 500 µm flux of our sample is 38 ± 4 mJy, while median fluxes found in Dowell et al. (2014), Asboth et al. (2016) and Ivison et al. (2016) are 45 mJy, 65 mJy and 47 mJy respectively. Therefore, our selection is arguably the faintest sample of "500 µm-risers" available. We note that underlying flux distribution of selected galaxies might be still affected by strong gravitational lensing, which we discuss in detail in Section 6.2. Along with galaxy-galaxy lensing, cluster-lens amplification might affect measured fluxes. The effect is the strongest when the deflector is located half-way between the observer and the lensed object (e.g. Kneib & Natarajan 2011). Strong lensing events are thus more numerous for clusters at 0.1 < z < 0.5 (Johnson et al. 2014). Virgo cluster is very close along the line of sight (z = 0.003) and for this reason we expect negligible impact on SPIRE fluxes of "500 µm-risers". Colours of sources selected in this work, together with colours of "500 µm-risers" with spectroscopic redshifts from other studies are plotted in Fig.6. Median observed colour of our sample is S 500 /S 350 = 1.11 ± 0.10, whereas median colours from Dowell et al. (2014), Asboth et al. (2016) and Ivison et al. (2016) are S 500 /S 350 = 1.08, S 500 /S 350 = 1.12 and S 500 /S 350 = 1.23 respectively. In Section 5.1.5 and Section 5.2 we illustrate and discuss how important is the impact of the noise on observed counts and colours.

Completeness and flux accuracy
We check the quality of our data analysis performing the tests of completeness and flux accuracy. To determine these values, we use real "in-out" simulations. We calculate completeness by considering the number of injected sources with certain flux density S which are recovered in the simulated maps as real detections. Since our selection is based on a − priori 250-micron positions, in this test we mostly inspect that band.
We use Monte-Carlo simulation, artificially adding Gaussian sources and projecting them at randomized positions to our real 250 µm data, then convolving with the beam. Simulated sources in 350 µm and 500 µm maps are then placed at same positions. FWHM values we used to convolve are 17.6", 23.9" and 35.2", values we consider for our SUSSEXtractor detections. In each HeViCS field we add sources spanning the wide range of flux densities separated in different flux bins (first bin starts from 1-5 mJy, then 5-10 mJy, 10-20 mJy, 20-30 mJy and so, up to the 90-100 mJy). We perform the source extraction pipeline extracting the list of recovered point sources using SUSSEXtractor and searching for matched detections. The HeViCS field is significantly crowded and we can perform missmatch with an unrelated source close to the input (x, y) coordinate. We match input to output catalogue consider-  ing source's distances smaller than half of the 250 µm beam size (9"), measured from the centre of 250 µm detection. Quality assessment results are summarized in Table 2 and Fig.5. The plotted completeness curves are the best-fitting logistic functions, describing the completeness as a function of input flux. We can see that positions of recovered sources do not show any significant offset to assigned inputs. More than 70 % of sources with S 250 > 13.2 mJy have offset lower than 6", which is the value smaller than a 250 µm pixel size. Completeness and flux accuracy are consistent over all four HeViCS fields. The top panel in Fig.5 shows our completeness result compared to one obtained by Pappalardo et al. (2015) confirming that our results do not change notably due to different methods of flux estimation. With measured numbers in hand, we can adjust the detection cut to see variations in completeness, which is connected to function of input flux density. Our completeness at 250-micron maps shows the trend of fast decrease below the ∼ 25 mJy, and reaches ∼ 50% at ∼ 13.2 mJy, the value we choose as the lowest cut for our prior S 250 list. The fitting logistic function reaches the saturation at the level of 250 µm fluxes larger than 40 mJy. For the 500 µm band, our completeness is above 80% at S 500 > 30 mJy. This 500 µm flux we imply as the threshold for the final selection of "500 µm-risers". The bottom panel in Fig.5 shows that fluxes of sources below S 250 = 7 − 10 mJy are systematically overestimated because of "flux-boosting".
Reliability test is performed to quantify eventual spurious detections. This is important value for measuring the total number counts rather than for selection of "500 µm-risers", since we expect that eventually spurious source would be present in one waveband but not in all three SPIRE bands. However, since our prior list is based on as much as close to the confusion limit 250 µm detections, we make an analysis of possible false detections injecting fake sources into noise maps. We also quantify the range of statistical outliers present in our measurements. Outlier contamination is measured as a function of output flux density. Sources are considered as contaminants if their output fluxes are more than 3σ above the value inferred for injected sources. Considering the lowest detection threshold in 250 µm band (13.2 mJy) we found the low sample contamination of 2.7%.

Expected L IR and z distribution of "500 µm-risers"
Without known interferometric positions and confirmed spectroscopic redshifts of our sources, we are limited to the properties of FIR SEDs of selected "500 µm-risers". Nevertheless, our data may be very useful in providing approximate redshift/luminosity distributions of large samples and candidates for follow-ups. To fit an MBB model to our photometric data, we should consider a degeneracy between β − T d , and T d − z (Blain et al. 2003, Pope & Chary 2010. The peak of the SED is determined by the ν/T d term (see Eq.2), giving that a measurement of the colours alone constrains only (1 + z)/T d ratio. However, assuming reasonable priors (in our case β and T d ) it is possible to estimate a qualitative redshift distribution of "500µm-risers" (see T d −z degeneracy illustrated in Fig.B.1).
We fit a single temperature MBB model to the data. We fix the power-law slope β = 1.8 and dust temperature T d = 38 K, and we determine the median redshift of 4.22 ± 0.49. The choice of β is arbitrary, likely 1.5 < β < 2.0 (Casey et al. 2014, MacKenzie et al. 2014, Roseboom et al. 2012, and here we chose the value which is in the middle of that range. Because of aforementioned degeneracy in (1 + z)/T d space, decreasing  the dust temperature, e.g. from 38 K to 30 K for the fixed β, peak of the redshift distribution decreases to z = 3.57 (with 33% of sources at z > 4). Modelled redshift tracks for different MBB parameters are provided in Appendix B (see Fig.B.1).
To further investigate possible redshift/luminosity range we consider collection of SEDs of extensively studied IR-bright galaxies both from the local and higher-z Universe, namely: the compact local starburst M82, typical interacting, local dusty system Arp220, and the "Cosmic Eyelash", strongly cluster-lensed dusty galaxy at z = 2.3 (Swinbank et al. 2010).
Sample of dusty galaxies at z > 4 are expected to be a combination of intrinsically luminous starbursts and strongly lensed systems , Casey et al. 2014. It has been shown that lack of observational constraints caused some SEDs to express too low fluxes on the long λ-side of the FIR peak (Lutz 2014, Elbaz et al. 2010. Another important point considered here is that Ultra-luminous Infrared Galaxies (ULIRGs) at higher redshifts express a wider variance in dust temperatures comparing to their local analogues (e.g. Smith et al. 2014, Symeonidis et al. 2013. To overcome these differences we use empirical template representative for H-ATLAS galaxies (Pearson et al. 2013). The template is built for subset of 40 H-ATLAS survey sources with accurately measured redshifts covering the range from 0.5 to 4.5. It adopts two dust components with different temperatures and it is already used in the literature as a statistical framework for characterizing redshifts of sources selected from SPIRE observations (see Ivison et al. 2016). With different templates in hand we can better characterize the systematics and uncertainties of measured redshifts. Shifting these templates to fit with our FIR fluxes, we forecast redshift ranges of "500 µm-risers" in the HeViCS field. Probability distribution function (PDF) is built based on fitted χ 2 values and it allows us to derive an estimate of the median redshift. Median redshift of "500 µm-risers" is 3.87 if we apply "Cosmic Eyelash" template, 4.14 for Arp220 and 4.68 for M82. Next, we apply H-ATLAS empirical template accepting the following best-fit parameters: T cold = 23.9 K, T hot = 46.9 K, and the ratio between the masses of cold and warm dust of 30.1, as same as in Pearson et al. (2013). The result is shown in Fig.7. We determine a median redshiftẑ = 4.28, and an interquartile range of 4.08-4.75. This estimate matches 1-sigma uncertainty range of photometric redshifts calculated by Dowell et al. (2014) and Asboth et al. (2016). To determine an average photometric redshift for red sources selected from several HerMES fields, Dowell et al. (2014) used the affine invariant method (Foreman-Mackey et al. 2013) to fit the MBB model with fixed emissivity and rest-frame wavelength peak for each source. They found z = 4.7±0.9. The median photometric redshift estimated in Ivison et al. (2016) is somewhat lower (ẑ = 3.7), but note that not all of sources in their catalogue of high-z candidate DSFGs are "500 µm-risers". In Section 5.2 we analyse how the noise and resolution effects might impact the observed colours of DSFGs and thus the redshift distribution of selected "500 µm-risers".
To compute the IR luminosity, which is defined as the integral over the rest frame spectrum between 8 µm to 1000 µm, we rely on the same approach. Our results favour the presence of very luminous systems with no regard of chosen templatein 90% of cases selected "500 µm-risers" have L IR ≥ 10 13 L , with minimum and maximum values L IR = 8.1 × 10 12 L and L IR = 5.1×10 13 L respectively. Concerning the MBB model, we obtainL IR = 2.14 × 10 13 L , but it is worth to note that the model accounts for mid-IR excess using a very simplified method (see A&A proofs: manuscript no. ddrisers Eq.2). Furthermore, we apply the H-ATLAS representative template to determine corresponding median rest-frame IR luminos-ityL IR = 1.94 × 10 13 L . The result is again in consistency with Dowell et al. (2014) and Ivison et al. (2016). These studies performed SED modeling of the sources with known photometric or spectroscopic redshifts, referring that "500 µm-risers" have in average L IR ≥ 10 13 L . For example, Ivison et al. (2016) report the median value L IR = 1.3 +0.7 −0.5 × 10 13 L . The corresponding luminosities in their sample range from L IR = 6.0 × 10 12 L to L IR = 5.8 × 10 13 L .  : Infrared luminosity of our "500 µm-risers" (y-axis) as a function of redshift. Orange crosses are values estimated applying the empirical template made from H-ATLAS galaxies covering the redshift range from z = 0.5 to z = 4.5 (Pearson et al. 2013). We imposed the same fitting parameters as used for H-ATLAS galaxies: T cold = 23.9 K, T hot = 46.9 K, and the cold-to-hot dust mass ratio equal to 30.1.
In Fig.7 we illustrate the L IR -redshift trend of our "500 µmrisers". We see that expected redshift distribution is not fully uniform, expressing the heavy-tail towards the higher redshifts. It indicates that very red submm galaxies are present at redshfits above the interquartile range, but it could also be due to strong gravitational lensing or unresolved blending. We should account with the possibility to have a much wider range of L IR − z values under larger uncertainties in dust temperatures. The IR luminosity emitted by a MBB source at a given T d depends on the emissivity and size of radiation area. Therefore, eventual gravitationally lensed candidates would lead to an apparent boost in dust luminosity at a given T d due to a larger magnification . We consider and quantify effects of strong lensing in our selection (see Section 5.1.5 and Section 6.2). We should emphasize here that the proper "training" of selected templates requires addition of higher-resolution data, and just in that case we can estimate systematic uncertainties and reject unreliable templates.
3.1. Comparison to red sources from the literature In Fig.6 we show how the sources from this work relate to some of "500 µm-risers" with known redshifts selected from the literature. Redshift tracks of Arp 220 and Cosmic Eyelash are added for comparison. We see that colours of our sources agree very well with a large number of red DSFGs at z > 4 found in different studies. There are some exceptions, and one of them is the most distant "500 µm-riser", SPT0311-58 at z = 6.9 (Strandet et al. 2017). The possible reason for this might be additional contribution from AGN, since it has been shown that SPT0311-58 experiences additional mechanical energy input, probably via AGN driven outflows. There are also examples of sources with redshifts somewhat lower than what would be expected from their very red colours. These sources usually have much lower T d compared to median dust temperature of "500 µm-risers" at z > 4 . Such an example is "500 µm-riser" at z = 4.2 (Weiß et al. 2013). It is strongly lensed red source with colours very similar to those of known "500 µm-risers" at z > 6. However it has estimated T d = 30 K. For comparison, all known "500 µm-risers" at z > 6 have dust temperatures higher than 50 K (Zavala et al. 2017, Strandet et al. 2017, Riechers et al. 2013. We draw attention to some interesting "500 µm-risers" selected in this work. According to MBB model introduced in Section 2.6 , it is plausible to expect a source at z ∼ 6 if it has T d ≥ 45 K and satisfies colour criteria S 500 /S 350 > 1.3 and S 350 /S 250 > 2.4 (see Fig.B.1 in Appendix B). In the literature there are 2 out of 3 "500 µm-risers" at z > 6 that fulfil these, so-called "ultra-red" requirements (Zavala et al. 2017, Riechers et al. 2013. We have 7 such sources in our sample, and we highlight them as suitable very high-redshift candidates. They are catalogued as HVS 5, HVS 6, HVS 21, HVS 35, HVS 47, HVS 75, HVS 85 and HVS 94. We also stress 5 of our "500 µm-risers" (HVS27, HVS60, HVS 64, HVS115 and HVS 130) that reside in potential overdense regions unveiled by Planck (Planck Collaboration et al. 2016b). If these overdensities are not just part of random fluctuations in the galaxy number density within the Planck beam, it makes them candidate protoclusters of high-z DSFGs (Greenslade et al. 2018, Smolčić et al. 2017, Oteo et al. 2017a, Clements et al. 2016).

Differential number counts
We measure the raw 500 µm differential number counts and test our selection in respect to previous studies (this section), but also models of galaxy evolution (Section 5). We determine our counts placing 133 "500 µm-risers" in seven logarithmic flux bins between 30 mJy and 100 mJy. The resulting plot is shown in Fig.8. The total raw number density of "500 µm-risers" in 55 deg 2 of HeViCS is N = 2.41 sources per deg 2 , with the corresponding 1σ uncertainty of 0.34. Note that this value is uncorrected for non-red contaminants and completeness. In Section 5.2 we fully inspect and quantify biases that affect observed distribution, correcting for these effects. Furthermore, the presented differential number counts ignore eventual source multiplicity. Without interferometric data at longer wavelengths our counts should be interpreted as the density of sources in respect to the beam resolution. Potential impact of SPIRE source multiplicity on our "FIR-riser" selection is discussed in Section 6.
In Table 3 we presented raw differential and integral number counts. In Fig.8  to select red sources. The method homogenizes beams to the size of 500 µm, and creates the DMAP where SPIRE images are all smoothed to an identical angular resolution. Prior to source identification, weighted combination of the SPIRE maps is formed applying the relation D = 0.920 × M 500 − 0.392 × M 250 , where M 500 and M 250 are the smoothing values after matching SPIRE maps to the 500 µm resolution. To select "500 µm-risers", they performed red colour map-based search, starting from lowest resolution 500 µm maps. From Fig.8 it can be seen that dif- (1) In order to calculate the cumulative (integral) number counts we have sum over all the "500 µm-risers" above the specified flux density (S min ). Here we presented raw differential and integrated number counts.
ferential number counts notably differ, especially in the lowest two flux bins where our measurements are in consistency with Dowell et al. (2014). The slope obtained by Dowell et al. (2014) in the first two flux bins is almost flat, which is different from HeViCS curve, where a steeper decrease is present. While our number counts show fairly good agreement with those measured by Dowell et al. (2014) especially in lower flux bins, they are well below the values measured by Asboth et al. (2016). Depending on a chosen flux bin discrepancy factor ranges from 3 to 10. We note that discrepancy between our number counts and those from Asboth et al. (2016) are due to several effects. Asboth et al. (2016) included possible blends in their final catalogue under assumption that at least one component of a blend is a red DSFG. This introduces a non-negligible number of contaminants, which is the effect we analyse in Section 5.2. The effect of blending on our number counts is reduced since we apply a new technique of source extraction based on positional and SED priors. Additionally, Bethermin et al. (2017) simulated the same process of source extraction and selection as in Asboth et al. (2016) and found number of observed "500 µm-risers" higher by factor of 8 compared to the number of genuine (modelled) "500 µm-risers". They concluded that combination of noise, resolution effects, and the steepness of flux density distributions produce numerous red artefacts that match "500 µm-risers" criteria. In Section 5.1.5 and Section 5.2 we analyse in details the effect of noise and explain how it affects our selection.
Apart from samples mentioned above, it is possible to find other "500 µm-risers" in the literature, but they are not selected in uniform manner (e.g. Casey et al. 2012, Miettinen et al. 2015, Negrello et al. 2017. "500 µm-risers" are serendipitously detected in relatively wide and shallow surveys that use wave-lengths longer than 500 µm for initial detection (e.g. South Pole Telescope, see Vieira et al. 2013). These samples contain some of brightest "500 µm-risers", all of them significantly amplified by gravitational lensing (S 500 > 100 mJy).

Comparison to models of galaxy evolution
In this section we compare our results with expectations from the most recent models of galaxy evolution. To evaluate our selection method, we perform simulations by generating realistic SPIRE maps from catalogues of simulated galaxies.

Models
In the literature there is a number of methods aiming to predict the evolution of number counts at different IR wavelengths. In this work we consider galaxy evolution models based on multiband surveys. In general such models rely on the combination of observed SED templates of galaxies and luminosity functions. It has been shown (Dowell et al. 2014) that some galaxy evolution models, mostly from "pre-Herschel era", underpredict either the total number of high-z, red sources (Le Borgne et al. (2009), or the number of red sources which lie at z > 4 (Béthermin et al. 2011). There are some exceptions (e.g. Franceschini et al. 2010), that anticipate larger number of red, high-z galaxies, but predicts at the same time large amount of very low-z (z < 2) red objects which seems unphysical. Additionally, we expect that "almost red" sources (described as sources having a peak in their FIR SEDs at wavelengths between 350 and 500 µm) contaminate the sample of "500 µm-risers", making observational artefacts caused by the noise. It is then very important to have models that predict significant number of such sources, to check the systematics in the detection rate of these contaminants. In this work we consider phenomenological models of Béthermin et al. (2012), Schreiber et al. (2016) and Bethermin et al. (2017), hereafter denoted as B12, S16 and B17 respectively. These models were built on Herschel data and accurately match the total IR number counts. Summary of models and their main ingredients is given in Table 3. Common for all models is their usage of stellar mass function (SMF) as a starting point from which properties of galaxies are generated. They share the same general description of star-forming galaxies, with star formation rate (SFR) assigned based on the dichotomy model of Sargent et al. (2012). It decomposes bolometric FIR-luminosity function with main-sequence (MS) and starburst (SB) galaxies. The MS galaxies are described as secularly evolving galaxies tightly relating stellar mass and SFR, while SB galaxies show an offset from the main sequence, expressing very high specific star formation rates (sSFR = SFR/M ). Shape of the SEDs is controlled by the galaxy type (MS or SB) and mean intensity of the radiation field U , which couples with the temperature of dust. Differences between the models are described briefly in following subsections -the most important are scatter from the main sequence, parameters chosen to fit stellar masses, and redshift evolution of a dust temperature. Models presented here have also slightly different description of mergers.

Béthermin et al. (2012) model
The B12 model describes the stellar mass function (SMF) with a Schechter function with redshift invariant characteristic mass parameter (see eq. 4 in Béthermin et al. 2012, but also Peng et al. 2010). IR SED template are based on Draine & Li (2007) models, with the mean radiation field U evolving with redshift up to z = 2.0 . The dispersion of the MS log-normal distribution is σ MS =0.15 dex, following Sargent et al. (2012) and Salmi et al. (2012). The model takes into account strong lensing effects reckoning the magnification rate PDF from Hezaveh & Holder (2011). These lensed sources contribute ∼ 20% to the bright-end submm counts (PLW 100 mJy). The AGN contribution is statistically associated based on results from Aird et al. (2012) and Mullaney et al. (2011).

Bethermin et al. (2017) model
The B17 model is based on similar prescriptions as B12, but implies several improvements in order to match recent interferometric results that disclosed notable overestimation of submm number counts due to the source blending (Karim et al. 2013, Simpson et al. 2015. The model accounts for detailed consideration of clustering and resolution effects. To describe the physical clustering an abundance matching procedure is used to populate the dark-matter halos of a light cone constructed from the Bolshoi-Planck simulation (Rodríguez-Puebla et al. 2016).The MS-scatter is updated (σ MS =0.3 dex) in order to match the measurements of Ilbert et al. (2015) and Schreiber et al. (2015). The model uses a new parametric form to fit the redshift evolution of the radiation field U . Evolution of dust temperature of a main sequence SF galaxies stops at z = 4 (instead at z = 2.0 as in B12) and remains constant at higher values. Contribution of AGNs and strong lensing effects are modelled as in B12. The weak lensing regime is modelled with magnifications that are randomly drawn from a Gaussian distribution. Their width and mean values are derived based on a cosmological simulation of Hilbert et al. (2007).

Schreiber et al. (2016) model
Similarly to B12 and B17, the S16 model is based on the MS-SB dichotomy. MS galaxies are modelled based on a stellar mass and redshift from Schreiber et al. (2015) and Pannella et al. (2015). The scatter of the main sequence σ MS =0.3 dex. Randomly-selected galaxies (5%) are placed in the SB mode by enhancing their SFR by a factor of ∼ 5. The distribution of stellar masses is described by double power-law Schechter fit, with parameters evolving with redshift. These parameters are chosen accordingly to observational data from Schreiber et al. (2015) (see also Grazian et al. 2015). A new set of template SEDs is used to model the dust emission of star-forming galaxies. These SEDs are based on the physically-motivated dust model of Galliano et al. (2011). Redshift evolution of a dust temperature is modelled as The "starburstiness" term R SB is used to quantify the SFR offset between MS and SB galaxies (R SB = SFR/SFR MS ). Dependence of sSFR of the stellar mass shows that sSFR is constant at lower masses, and drops at the highest masses, M > 10 10.5 M * (e.g. Schreiber et al. 2015, Whitaker et al. 2015. This trend is similar to the one used in B17, but different from the fit used in B12. To summarize, in S16 both sSFR and T d evolve continuously with redshift, which is not the case for other two models (see Table 3). Effects of strong lensing and AGNs are not included in the model.

Mock catalogues
We set mock catalogues based on models described in previous section. Our goal is to compare the number counts of observed "500 µm-risers" to the ones predicted by models. 3 The catalogues based on B12, B17 and S16 cover sufficiently large areas to offer accurate inspection of our observational criteria in the HeViCS field. Namely, for the B12 model we used the catalogue which is a result of 500 deg 2 simulations . We also used the catalogue created by the B17 model covering the area of 274 deg 2 (Simulated Infrared Dusty Extragalactic Sky, SIDES, Bethermin et al. 2017). The size of simulated area is equal to the size of HeLMS field, thus perfectly suited for a comparison of our selection technique to one performed by Asboth et al. (2016). With S16 model, we generated a mock catalogue covering the size of the HeViCS field (55 deg 2 ).
We further apply our "500 µm-risers" selection criteria (S 500 > S 350 > S 250 , S 250 > 13.2 mJy and S 500 >30 mJy) on modelled sources. Left panel in Fig.9 shows observed counts of HeViCS-selected "500 µm-risers" plotted against the model predictions. We see that observed counts are in a fairly good agreement with models, while corresponding power-law slope show no significant difference from the slopes anticipated by models. Observed values are steep at the fainter-end and flatter at the bright-end (S 500 > 80 − 90 mJy) which is due to flux magnification by gravitational lensing. Our HeViCS differential number counts have the perfect agreement with B12, but we have to note there are recent evidences (see Bethermin et al. 2017 for more details) that the simulated catalogue based on the B12 model overpredicts the number counts at 500 µm fluxes below 50 mJy by a factor of 2-3. The number of sources satisfying our selection in B17 and S16 are 0.45 deg −2 and 0.40 deg −2 respectively. Predictions of B17 and S16 are consistent with the 1σ error bars of observed counts in the higher flux bins (60 mJy< S 500 < 100 mJy). In lower flux bins (30 mJy< S 500 < 60 mJy) discrepancy factors are between 1 and 6 depending on a chosen bin. In Section 5.1.5 we discuss in more details potential reasons of a present difference. However, it is clear that we have a much better agreement with empirical models than previous studies (Asboth et al. 2016) who claim observational discrepancy of an order of magnitude.

Effects of noise on measured counts
The results described in Section 5.1.4 neglect the effect of noise on the number counts of "500µm-risers". Therefore, we simulate the effect of both confusion and instrumental noise by adding a random Gaussian noise drawn from the values measured in the HeViCS field (Section 2.1). The comparison between observations and models with simulated Gaussian noise are illustrated in the right panel in Fig.9. The significant increase of counts in the lowest two flux bins is clearly seen, while their slopes do not express significant change compared to intrinsic ones (left panel in Fig.9).
Considering the addition of noise, the number of simulated sources that appear as "500 µm-risers" in B17 jumps to 473, resulting to an increase of number counts from 0.45 per deg 2 to 1.73 per deg 2 . We obtain a very similar result with S16 -a num-Article number, page 13 of 28 This work B12+noise S16+noise B17+noise Fig. 9: Observed differential number counts in the HeViCS field (black line) confronted to models. Expected values from models are overplotted as coloured lines. Le f t: Comparison between observed and modelled counts. Models are represented by full, coloured lines: cyan for B12 , red for B17 (Bethermin et al. 2017 and green for S16 (Schreiber et al. 2016). Effect of simulated noise is ignored. Right: Comparison between observed and modelled counts if the effect of noise is simulated. The effect of confusion and instrumental noise is simulated by adding a random Gaussian noise to the modelled fluxes. Differential counts are then represented by dashed, coloured lines. ber density increases from 0.4 per deg 2 to 1.54 per deg 2 . As a consequence, observed HeViCS values match predictions of the B17 and S16 model even in the faintest 500 µm flux regime. 4 We find that noise often tends to increase the number of gen- (1) Modelled density of 500 µm-risers before and after the addition of random Gaussian noise. Criteria used to select "500 µm-risers" are: S 500 > S 350 > S 250 , S 250 > 13.2 mJy and S 500 > 30 mJy. Numbers reported in brackets refer to the percentage of strongly lensed galaxies; (2) Contribution of intrinsically red but faint sources (S 500 < 30mJy) to the population of modelled sources that match our "500 µm-risers" selection criteria after the addition of modelled Gaussian noise.
uinely red sources that are slightly below our 500 µm flux cut. 4 In B17 star formation rate (SFR) is limited to a maximum 1000 M yr −1 .
These galaxies have modelled flux densities S 500 < 30 mJy, but they can pass our final "500 µm-risers" selection after the addition of a Gaussian noise. As shown in Table 5 contribution of these sources to the modelled "500 µm-risers" vary between 52-67%. In Section 4, we mentioned that for the wide and shallow HeLMS field, Bethermin et al. (2017) found an increase of observed number counts by factor of 8 compared to their modelled values. This strong increase is caused by the combination of noise and clustering, and it is two times higher than noise-driven increase of density of detected "500 µm-risers" in the HeViCS field (see Table 5). We thus conclude that the effect of noise is more prominent for shallower fields, while for deeper fields (e.g. HeViCS and deep HerMES fields analysed in Dowell et al. 2014) the noise has just a mild impact on measured number counts. The effect of noise also changes the relative contribution of modelled lensed and unlensed "500 µm-risers". The percentage of weakly lensed and non-lensed "risers" in B17 increases from 49% (modelled values) to 76% (modelled values+Gaussian noise). A closer inspection suggests that weak lensing (1 < µ < 2) is a contributor to the flux budget for a non-negligible number of sources (166 out of 473, or 35%). Fainter, weakly lensed red sources can pass our final "500 µm-risers" criteria more often if they are on a positive fluctuations of the magnification. This has an important role in producing observed "500 µm-risers" without a support of strong lensing.

Simulated maps
Mock mapping simulations are needed to explore the exact nature of selected sources and possible biases of our Herschel SPIRE selection. These are mostly induced by the limited angular resolution combined with the noise effects. Having used two different models, we can uncover systematic uncertainty about the purity of detected sources. This is a crucial step if we want to investigate our photometric pipeline of "500 µm-risers", since it S500=S350 > 0:9 S500=S350 > 0:97 S500=S350 > 1 S16 B17 Fig. 10: Left and right panel attribute to S16 and B17 respectively. Upper panels: Galaxies detected in mock maps. "FIR-riser" criteria are imposed as for real HeViCS maps (S 500 > S 350 > S 250 , S 250 > 13.2 mJy, S 500 > 30 mJy). Intersected area coloured in dark orange depicts recovered red sources. Light orange area represents detected contaminants. Violet area represents missed sources. Those genuinely red galaxies are present in our catalogue, but not as "500 µm-risers". Lower panels: Redshift distribution of modelled galaxies. Different S 500 /S 350 colour cuts are imposed. Those cuts are related to statistical properties of the confusion noise, which are the most responsible for colour uncertainties. (see also Table 6).
is necessary to count with the numbers of sources we potentially miss and to determine fraction of contaminants. We generate a set of simulated SPIRE maps with use of Empirical Galaxy Generator Code (EGG, Schreiber et al. 2017a). We filled our mock maps with the sources that has been drawn from S16 and B17 mock catalogues. We convert mock catalogues into maps, assigning theoretical Herschel PSF and the same noise properties as measured in our real HeViCS images. We chose to simulate clustering. The mock galaxies from the catalogues are then placed on the sky at coordinates with a fixed angular two-point correlation function to implement this effect. Positions of galaxies in S16 are clustered by default with use of Soneira & Peebles (1978) algorithm, which produces a twopoint correlation function with a power-law shape. To assign clustered coordinates for mock maps created from B17, we follow prescriptions from simulation described in Bethermin et al. (2017). It adopts sophisticated clustering model with the subhalo abundance matching procedure used to populate the darkmatter halos of a light cone constructed from the Bolshoi-Planck simulation (Rodríguez-Puebla et al. 2016). With this we ensure that simulated maps contain a realistic background of clustered sources that can contribute to the confusion noise.
Due to computational dependencies, we generated maps covering the area of one of the four HeViCS fields (16 deg 2 each). As shown in Section 2.4, our source extraction lead to an homogeneous distribution of sources over all HeViCS fields, so we can easily extrapolate our results for the whole observed area of 55 deg 2 . We use the methods described above and perform the same source detection pipeline as we did in our raw HeViCS maps. We also check positional and flux accuracies in the same way we did for real HeViCS images.
Results from our simulations are summarized and presented in Fig.10. We detected all modelled "500 µm-risers" via our detection procedure, but not all of them are characterised with their intrinsic colours.We thus disclose three groups of sources described below.
As expected from the mock catalogue analysis (Section 5.1.4), population of recovered risers is a mix of both lensed and unlensed galaxies. We confirm that our criteria to select "500 µm-risers" unveil significant group of dusty and potentially very high-z sources that are not biased toward higher magnifications.
As introduced in Section 5.1.5 noise effects might produce colours somewhat redder than original ones. We quantify this as a "reddening", namely: where σ PLWm and σ PMW are 1-sigma total noise ratios measured for appropriate bands in the HeViCS field (see Section 2.1). The differences between intrinsic and observed colours of recovered 500 µm-risers are presented in Table  6. We see that the largest difference is for the lowest flux bin, where we also expect largest flux uncertainties due to the lower S/N ratio. Redshift distribution of recovered "500 µm-risers" in Fig.10 reflects differences between models. Following S16, all red galaxies are at z > 3, and most of them (70%) at z > 4 . The mean redshift value is z = 4.56 ± 0.94. The B17 model has a comparable high redshift tail, but this model peaks at lower redshift, causing the mean value ( z = 3.89 ± 0.9) to be lower than in S16.

Contaminants
Light-orange region in Fig.10 represents a fraction of contaminants. Sources are detected as "500 µm-risers", but having assigned their modelled colours and fluxes, they should not pass our selection criteria. The total percentage of contaminants is 39% for B17 and 43% for S16. They consist of two different populations of sources: the first one are genuinely red, but fainter galaxies, with modelled 500 µm fluxes not bright enough for the final catalogue inclusion (S 500 < 30 mJy). Nonetheless, their observed fluxes and colours are sufficiently high to pass "500 µm-risers" criteria. The relative contribution of these contaminants is 60% for B17 and 72% for S16. The second population of contaminants are "non-red" sources with narrowly ranged intrinsic colours between 0.95<S 500 /S 350 <1.0. They are weakly lensed or unlensed "almost red" sources. We also find that non-red contaminants are accompanied with sources clustered at 1 < z < 2. The largest 500 µm excess is caused if clustered neighbours are at z = 1.3 ± 0.1. These sources are massive galaxies with M * > 10 9.8 M . It confirms the conclusion from Section 5.1.4 that combined with weak lensing, confusion which arises from clustering and noise has a tendency to increase the number of observed red sources.
Having exploited both models, we see that all detected contaminants are high-z dusty galaxies, with just one source at z < 3. Redshift distribution of contaminants peaks at 3.48 for B17 (z min = 2.2 and z max = 5.11), and 4.17 for S16 (z min = 3.28 and z max = 5.86). Additionally, according to both models, contaminants are galaxies with warm dust temperatures (40 K < T d < 50 K).
3. Missed "500 µm-risers" With our extraction procedure we detected all modelled "500 µm-risers" from simulated maps. However there is a fraction of sources for which we failed to properly characterize their colours with MBB-fitter. Violet region in Fig.10 represents sources having S 500 /S 350 < 1 or S 500 < 30 mJy. In this way we missed to select around 20% of genuine "500 µm-risers". According to models, missed "500 µmrisers" have colours and fluxes very close to the lowest selection threshold. In our simulated maps, they often immerse in complex blends, having a nearby, unresolved blue source at 0.5 < z < 1.0 inside the 250 µm beam. Therefore, missed sources passes both 250 µm and 500 µm flux selection cuts but their observed colours are non-red. We quantify that the difference between observed and modelled colours of missed "risers" is shifted bluewards by a factor of ∼ 0.03, thus giving negative ∆. Interestingly, not just observed, but the modelled ∆ of missed sources exhibits negligible change due to noise effects. It tends to make their colours marginally redder or even bluer compared to the average value we measure for "500 µm-risers" ( ∆ = 0.01 for the whole population, while for missed sources ∆ < 0.003 having consulted both models). The final colour-criterion is thus highly sensitive to multiple effects including the strong clustering of blue sources. The redshift distribution of missed "500 µm-risers" shows that 50% of them are weakly lensed galaxies at z > 4. It becomes vital to account for the missed population of red sources while searching for z > 4 dusty galaxies (see Section 5.2.1). (1) Catalogued colours from B17. Noise is not added to initial flux values; (2) Measured colours in simulated maps.; (3) Colour difference between observed and simulated flux values. It is quantified as "reddening" (∆ = S 500 + σ PLW S 350 + σ PMW − S 500 S 350 ). Values in brackets show the difference between catalogued fluxes before and after the additon of modelled Gaussian noise. 5.2.1. Modifying criteria to select galaxies at z > 4 ?
Percentage of simulated red sources recovered by our detection pipeline is not as high as the values presented in Asboth et al. (2016) and Dowell et al. (2014). These studies reported purity of almost 90 per cent. Yet, there is an important and significant difference between our simulations. While we created mock images assuming the colours of sources that have been drawn directly from the simulated catalogues, Asboth et al. (2016) and Dowell et al. (2014) omitted modelled red sources from their maps, injecting "risers" with SPIRE colours fixed to the median value measured in their raw maps. Such an approach may lead to a bias, since we found (see Section 5.1) that most of unlensed "500 µm-risers" have S 500 /S 350 very close to one. Additionally, S16 and B17 predict large number of "almost red" FIR sources (e.g. 0.9 < S 500 /S 350 < 1.0 and S 250 > 13.2 mJy). The modelled number density of such defined sources is between 8-10 per deg 2 . They are included in our initial "prior" catalogue, and due to effects of noise, clustering and/or lensing, some of them might be observed "redwards", thus being responsible for lower purity index.
Considering the colour differences (parameter ∆ in Table  6), it appears clear that distinguishing between the population with S 500 /S 350 = 0.97 and S 500 /S 350 = 1 is very troublesome in crowded environments, even for fields with significantly reduced instrumental noise. As expected, colour uncertainties increase towards the lower fluxes. We conclude that some flexibility on a colour threshold is needed to account for noise and environmental effects. Here we test several colour cuts. The goal Table 7: Relation of different colour cuts to the number of observed/missed red sources and their redshift distributions. Sources observed in simulated SPIRE maps are divided in three columns according to different colour limits (S 500/350 > 0.9, S 500/350 > 0.97, S 500/350 > 1.0). In each of three cells left column (m) substitute to percentage of missed, genuine "500 µm-risers", while the right column represents the percentage of z > 4 sources contained in selected populations.

Models
Colour-cuts S 500/350 > 0.9 S 500/350 > 0.97 S 500/350 > 1.0 of such a test is twofold: to find the colour threshold that increases the number of true (modelled) "500 µm-risers", but introducing in the same time low contamination of 2 < z < 4 sources. We quantify what is the contribution of these lower-z objects for each colour limit. The result of the analysis is summarized in Table 7 and histograms in the lower panel of Fig.10.
Having imposed S 500 /S 350 > 0.9 we anticipate that almost all "500 µm-risers" will be found independently of chosen models. Nevertheless, contribution of 2 < z < 4 sources is significant, and vary from 57% (S16) to 77% (B17). Redshift distribution peaks at z = 3.25 for B17 and z = 3.92 for S16. If we increase the colour cut requirement to S 500 /S 350 > 0.97, percentage of detected simulated intrinsic "500 µm-risers" ranges between the 82% and 85%. Contamination is heavily reduced, and just a few galaxies are at z < 3 (e.g. in S16 all the galaxies that satisfy this colour cut are at z > 3). Furthermore, redshift peak shifts to higher values, z = 3.68 for B17 and z = 4.29 for S16. Following these results, we suggest that a cut at S 500 /S 350 > 0.97 might be imposed to search for z 4 galaxies in the future. It is a good compromise that accounts for several effects which decrease the percentage of recovered red galaxies. In the same time, it works well against the lower-z contaminants, especially those at z < 3, limiting their contribution to a maximum 15%. Recent follow-up observations of red sources selected in H-ATLAS field ) also support our quest for modifying "500 µm-risers" colour criteria. They report just 30% of sources selected as "risers" are lying at z > 4, claiming that such a low fraction is perhaps due to significant number of spurious-red contaminants in their catalogue. To account for a high level of noise in the H-ATLAS field, authors agreed that some refined selection technique is needed. In this paper we keep our final selection based on traditional threshold S 500 /S 350 > 1 due to easier comparison to existing studies. 6. What are "500 µm-risers"?

Problem of multiplicity
The potential problem in performing refined selection criteria for "500 µm-risers" is our limited knowledge of source multiplicity. Different rates of multiplicities are claimed by studies found in the literature (e.g. Cowley et al. 2015, Simpson et al. 2015. Their conclusions are tightly related to selection criteria and instrument used to measure the flux (single dish or interferometer). Some observations suggest that multiplicity fraction may be dependent on the observed FIR flux, where is more liable for brighter sources to have multiple components (Karim et al. 2013, Bussmann et al. 2015. This assumption includes strongly lensed sources as well, since they are known to be very bright in SPIRE bands. More recent SPIRE multiplicity study is done by Scudder et al. (2016). They used improved version of XID+ code (Hurley et al. 2017) to search for many potential contributing sources per 250 µm detection. They have considered dusty galaxies covering a wide range of photometric redshifts and 250 µm flux densities. For 250 µm fluxes between 30 mJy and 45 mJy they found that combination of the two brightest components accounts for approximately 90% of the flux in the 250 µm source. Additionally, the brightest component contributes with more than 60% to the total flux.
Since we work with SPIRE data only, our multiplicity analysis is based on simulations. We perform photometry on simulated SPIRE maps and compare the flux emitted from the brightest galaxy to the total single-dish flux ratio. For each source detected as "500 µm-riser" in our mock maps, we assign the brightest component in the radius no larger than the half of 250 µm beam (9"). We then measure the ratio between the flux of this brightest galaxy from mock catalogues and the measured singledish flux in our simulated maps. We place computed values in several flux bins. Our results are presented in Fig.11. For galaxies detected as "500 µm-risers", we found that 72% to 85% of observed 250 µm flux is emitted by the brightest galaxy (78% in average). The brightest source contribution S16 B17 S16_missed B17_missed Scudder+ '16 Overall, we see almost insignificant change for S16 and B17, with the trend expressing slight increase towards higher 250 µm fluxes. Repeating the same analysis for 500 µm beam we find the similar trend with somewhat lower brightest galaxy fraction (64 % in average) due to stronger resolution effects. Even from such a simple multiplicity analysis, it seems plausible to expect that selection of red sources from prior 250-micron detections does not experience strong resolution effects. Situation is somewhat different for missed red sources -the contribution of brightest components is lower (47-65%) with the tendency of decreasing towards larger 250 µm fluxes. This is due to a presence of "blue" blends close to position of "500 µm-risers" (r < 9 ).
Fraction of the brightest 250 µm component to the total flux measured in simulated maps is higher than one found in Scudder et al. (2016). They use deep multiwavelength data and apply different definition of the flux density fraction. Additionally, their analysis consider sources regardless of SPIRE colours making it difficult for a direct comparison here. Our multiplicity predictions appear to be in a good agreement with the measurements of Greenslade et al. (in prep.). They performed Submillimeter Array (SMA) follow up program for 36 of "500 µm-risers" selected from various fields and found the multiplicity rate of 33% with brightest components contributing 50 − 75% to SMA flux. To shed a new light on this issue, the contamination of sources at intermediate redshifts (1 < z < 2) seen in deep NGVS and PACS images will be discussing in our following paper.

Lensing and clustering
Considering the increase with redshift of the optical depth to lensing and the magnification bias, we account for the possibility that some of our "500 µm-risers" are strongly lensed (Bethermin et al. 2017, Negrello et al. 2017, Cai et al. 2013. Without interferometric data which would allow us to identify the true lensing fraction of our "500 µm-risers", we cannot precisely correct for the lensed population in our final sample. We thus investigate their relative contribution making the comparison with recent ALMA/NOEMA follow-up results of "500 µm-risers" selected in other fields (H-ATLAS, HerMES and HeLMS, Oteo et al. 2017b, see also Fudamoto et al. 2017), and using predictions from B17. Our findings are summarized in Fig.12 and Table 8.
Sample used for high-resolution ALMA follow-up in Oteo et al. (2017b) contain 44 red DSFGs. They were chosen from complete samples of red DSFGs presented in Ivison et al. (2016), Asboth et al. (2016) and Dowell et al. (2014). The sources span the wide range of 500 µm fluxes, from 30 mJy to 162 mJy. We note here that the sample is highly incomplete, accounting just for the reddest sources whose colours are consistent with high photometric redshifts, estimated to be z ∼ 4 − 6. Observed total fraction of strongly lensed galaxies in Oteo et al. (2017b) is 40 % (18 out of 44). However, most of the lensed sources are in the bright flux regime of red DSFGs (S 500 > 52 mJy) 5 To make the sample suitable for comparison with our sources and model predictions, we split it into three subsamples according to methods used for their selections: 21 sources from Ivison et al. (2016), 13 sources from Asboth et al. (2016) and 10 sources from Dowell et al. (2014). In Fig.12 we show 500 µm flux distribution of our sample alongside the sub-samples drawn from Oteo et al. 5 Here we distinguish between fainter (S 500 < 52 mJy), and brighter "500 µm-risers" (S 500 > 52 mJy). The chosen separation is based on model predictions (B17), where highly-magnified sources are contributing more than 50% to the population of red, DSFGs at fluxes higher than S 500 > 52 mJy.
(2017b). We see that "500 µm-risers" selected in this work have average fluxes fainter than other presented galaxies. We show observed number of strongly lensed objects in H-ATLAS, Her-MES and HeLMS in the second column of Table 8. The subsample of "500 µm-risers" in HeLMS contains the largest fraction of lensed sources (75%), while the lowest observed fraction is for H-ATLAS (23%).
We further use B17 and consider the same selection criteria from initial studies , Asboth et al. 2016, Dowell et al. 2014. For each sub-sample we examine the range of fluxes identical to what is presented in Oteo et al. (2017b) and simulate the effect noise as explained in Section 5.1.5. From Table 8 we see there is a very good match between model predictions and observations. We estimate the predicted contribution of strongly lensed sources to our sample of "500 µm-risers". Applying our selection criteria we find that B17 predicts 24 +6 −5 % of strongly lensed "500 µm-risers" in our sample. More precisely, 17% of strongly magnified galaxies are expected to lie at fainter flux regime (S 500 < 52 mJy) where most of our sources reside (119 out of 133, or 89%). The median redshift of modelled, strongly lensed sources isẑ = 4.2 ± 0.4, whilst median magnification differs fromμ = 5.3 ± 3 andμ = 8.1 ± 5 for the fainter and brighter flux regime respectively. It implies that strong lensing can affect the observed luminosity function of our "500 µm-risers" and lowers the fraction of intrinsically bright sources. After correcting for lensing, we expect to have 24% of galaxies with luminosities falling in the range 1.3 × 10 12 L < L IR < 10 13 L (classified as ULIRGs), and 76% of sources with L IR > 10 13 L , classified as hyperluminous IR galaxies (HyLIRGs).
Apart from strong lensing, clustering might be responsible in producing observed excess in number counts of "500 µmrisers". Several studies of clustering of submm-bright galaxies at 1 < z < 3 (e.g. M * > 10 11 M , and L IR > 10 12 L , see Farrah et al. 2006, Wilkinson et al. 2017 have been shown they are hosted in moderately strongly clustered massive halos. Herschel detected star-forming galaxies are in average more strongly clustered at higher-redshifts (z ∼ 2) than nearby objects (Béthermin et al. 2014, Ono et al. 2014. Since the beam size in Herschel is a direct function of the wavelength, we expect largest SPIRE beam (500 µm) to be more affected than other two beams. Bethermin et al. (2017) have studied in details influence of clustering on FIR continuum observations. They found that the confusion which arises from clustering increases the number of observed "500 µm-risers". The number of such contaminants causes the peak of intrinsic redshift distribution of "500 µm-risers" is somewhat lower than what would be expected from observed SPIRE colours. 6 Therefore, future spectroscopic confirmations of full samples of selected "500 µm-risers" are needed to quantify and compare exact fraction of sources at z > 4.
Simulations presented in Section 5.2 have some limitations meaning that they are affected by cosmic variance beyond simple Poissonian, thus containing under-or overdensities at some redshifts. On top of that, pure probabilistic treatment is used to model the lensing, lacking the physical connection to the overdensity of lower-z systems. To improve our understanding of how lensing and clustering might shape "500 µm-risers" selection, different solutions can be considered. Along with deeper, interferometric observations, more complex cosmological simulations are needed. They should include larger halo volumes,  Fig. 12: Flux distribution of "500 µm-risers" from this work (coloured blue area), compared to "500 µm-risers" considered for a submm, interferometric follow-up in different studies. Orange, black and green stepfilled lines represent sub-samples selected from Ivison et al. (2016), Asboth et al. (2016) and Dowell et al. (2014) respectively. Flux distribution of red sources from the B17, that are modelled applying the same selection criteria presented in this work, is represented by dashed, red line. (1) Flux distribution of "500 µm-risers" from this work compared to 44 red DSFGs presented in Oteo et al. (2017b). These are H-ATLAS, HerMES and HeLMS sub-samples observed with ALMA and NOEMA at 870 µm, 1 mm and 3 mm (Oteo et al. 2017b, Fudamoto et al. 2017). The second column shows median 500 µm fluxes along with the median 500 µm flux from this work. Relative contribution of observed, strongly lensed sources in various selections is shown in the third column. The fourth column shows relative contribution of modelled, strongly lensed sources from B17. A relative contribution of strongly lensed sources is modelled applying the same selection cuts of "500 µm-risers" for tabulated extragalactic fields.
high-resolution to unveil fainter galaxies responsible for confusion, environmental effects and refined treatment of a lens modelling.

Star-formation rate density
The bright individually-detected "500 µm-risers" play an important role in understanding the evolution of massive systems. Here we test if such a selection is suitable to measure the total star formation rate density (SFRD). We make a rough estimate of the SFRD assuming the statistical properties of selected "500 µmrisers". We determine SFRD at 4 < z < 5 applying the equation (Vincoletto et al. 2012, Hogg 1999: where ρ * (z) is SFRD, defined as a total sum of SFRs per comoving volume, while SFR IR is determined from IR luminosity. Bottom number (denominator) in Eq.6 is the co-moving volume contained within 4 < z < 5, and IR luminosity is converted to SFR applying the standard conversion formula from Kennicutt (1998): SFR IR (M yr −1 ) = 1.71 × 10 10 L IR (L ) We apply our selection criteria to models (B17). Since B17 provides magnification factor for each source, we further use the modelled, lensing-corrected luminosities and redshift distribution of "500 µm-risers". We then correct for the number of missed and contaminant sources applying the result from our simulations (Fig.10, Section 5.2). We place IR luminosities in a wide bin by the co-moving volume per deg 2 . Scaling with the observed area and applying cosmological parameters Ω M = 0.307, Ω Λ = 0.693 and H 0 = 67.8 km/s/Mpc (Planck Collaboration et al. 2016a), we find that SFRD = 1.99 × 10 −4 M yr −1 Mpc −3 , for "500 µm-risers" at 4 < z < 5. This result is shown with a filled, red star in Fig.13. Large uncertainties (±1.4 × 10 −4 ) determined by Monte Carlo bootstrapping are due to the Poisson noise and the fact that we make constrains with SPIRE data only. We additionally cross-check this result and make the same estimation of SFRD but adopting median IR luminosity (1.94 × 10 13 L ) from observed L IR − z distribution (see Section 3.). Based on expected fraction of strongly amplified objects found in Section 6.2 (24%), we randomly chose sources from our catalogue to correct for lensing, namely 17% of sources at S 500 < 52 mJy, and 83% at S 500 > 52 mJy. These percentages correspond to predictions from B17. We then adopt a median magnifications from B17 that correspond to chosen fluxregime (µ = 5.3 for fainter, and µ = 8.1 for brighter "500 µmrisers"), and produce the observed, lensing-corrected luminosity distribution. We find SFRD = 2.87 × 10 −4 M yr −1 Mpc −3 . Our model-based SFRD estimation is consistent with recent observational findings from Duivenvoorden et al. (2018). Note also that offset between the study of Dowell et al. (2014) and our own may be explained by the fact that estimation by Dowell et al. (2014) assumed substantially higher IR luminosities and purity ratios. Furthermore, we can use the shape of luminosity function to correct for the fainter red sources. We extrapolate the contribution of "500 µm-risers" to SFRD -firstly we remove 500 µm flux limit (thus accounting for a red sources fainter than S 500 < 30 mJy), and then we remove lower 250 µm flux cut, thus accounting for all "500 µm-risers" below the Herschel sensitivity limit. Extrapolated contributions are depicted by empty red triangles in Fig.13 Here we assume the luminosity distribution from B17, but we should mention that very similar result (higher by a factor of 1.2) we obtain applying the luminosity function from S16. As expected, the largest portion of sources responsible for a stel-lar budget at z > 4 are missed due to sensitivity limitations of Herschel instruments and not because of colour selection.
The contribution of the emission of dusty star-forming galaxies (DSFGs) to the total infrared (IR) luminosity functions up to z = 5 is partially investigated (e.g. Koprowski et al. 2017, Rowan-Robinson et al. 2016, Madau & Dickinson 2014, Burgarella et al. 2013. However, there is no consensus about whether or not ultraviolet (UV) estimates have notably underestimated the contribution of dustenshrouded star formation in DSFGs because our knowledge at z > 3 is severely incomplete, mostly due to optical obscuration. Attempting to extend the cosmic SFRD up to z = 6 Rowan-Robinson et al. (2016) exploited existing 500 µm Herschel selections of "500µm-risers". They report no decline in dust-obscured SFRD between z = 3 − 6, and very high SFRD value at 4 < z < 5 order of magnitude higher than our estimation. However, our result is in a much better agreement with studies of Bourne et al. (2016) (marked with diamonds in Fig.13), who used very deep millimetre imaging to reduce the effects of confusion on cosmic SFRD measurements.

Conclusions
We have performed a systematic search of dusty, background galaxies onto 55 deg 2 area. We introduce an innovative method to select "500 µm-risers" (S 500 > S 350 > S 250 ). In order to estimate the 250-500 µm flux of blended sources, we use the multiwavelength MBB-fitter code. We assign positions of galaxies detected in the 250 µm map as a prior to extract the flux densities at longer wavelengths, iteratively fitting their SEDs as modified blackbodies.
-We select 133 "500 µm-risers" which fulfil following criteria: S 500 > S 350 > S 250 , S 250 > 13.2 mJy and S 500 >30 mJy. We reject all strong synchrotron sources by cross-matching available radio catalogues. -We use statistical properties of selected galaxies and estimate a median redshift valueẑ = 4.28 and the corresponding median rest-frame IR luminosityL IR = 1.94 × 10 13 L . -The total raw number density of selected sources is 2.41 ± 0.34 deg −2 . To evaluate our results, we retrieve mock catalogues based on models of Béthermin et al. (2012), Bethermin et al. (2017) and Schreiber et al. (2016). Differential number counts are fairly consistent with model predictions. Namely, towards the brighter end (S 500 > 70 mJy), HeViCS counts have both slope and values identical to ones anticipated by models. At the lowest flux end, discrepancy is the effect of noise and lensing. We find that noise tends to increase number counts of "500 µm-risers". It may be high enough to fully match with observations within 1-sigma uncertainty. We confirm that selecting "500 µm-risers" applying our selection criteria is a direct way to detect significant population of unlensed, dusty, high-z galaxies. -We build simulated maps. Clustering and lensing are modelled in order to fully resolve effects that produce colour uncertainties. We propose an modified colour criterion (S 500 /S 350 > 0.97) that should be probed in the future selections of potentially z > 4 dusty sources. Motivation to apply this criterion is twofold: (1) it would account for colour uncertainties that arise from effects of noise, clustering and lensing; (2) it increases the sample of candidate z > 4 galaxies. Anticipated contribution of 2 < z < 3 contaminants satisfying this colour cut is small, and reach the maximum of 13%.
-We inspect the effect of multiplicity working with simulated SPIRE data. The brightest galaxy inside the beam contributes in average 75% and 64% to the total single-dish flux measured at 250 µm and 500 µm respectively. -We use the model of Bethermin et al. (2017) and find that 24 +6 −5 % of "500 µm-risers" selected with our criteria would be strongly lensed. After correcting measured luminosities for lensing, we expect 24% of sources with luminosities 10 12 L < L IR < 10 13 L (classified as ULIRGs) and 76% of sources with L IR > 10 13 L (classified as HyLIRGs).
-We use statistical properties of our galaxies to determine the role of "500 µm-risers" to the SFRD for 4 < z < 5. We show that after correcting for fainter sources, projected contribution of "500 µm-risers" is factor of 2-3 below the total, dust-corrected SFRD.
D. Donevski et al.: Towards a census of high-redshift dusty galaxies with Herschel   Fig. B.1: Criteria for selecting "500 µm-risers". Plotted is SPIRE colour diagram related to the modified blackbody (MBB) model. The plot is a result of a Monte Carlo simulation in which we start with a single-temperature modified blackbody with the flux S 500 =30 mJy. The mock SEDs are generated to account different fixed T d and β illustrating temperature-redshift degeneracy. Plotted against all possible colours are simulated sources with T d =45 K and β = 1.8 (upper panel) and T d =38 K and β = 1.8 (lower panel). Left : Colour-redshfit plot. Red lines represent redshift tracks rising from right to left. The coloured background indicates the average redshift in these colour-colour spaces. Blue lines define the 95% confidence limit region. Correction factor due to modelled effect of CMB heating on colours is applied. However, our simulations shown that we do not expect major contribution from that effect at sources lying below z ∼ 7; Right : Different chi-square fitting values reflected on SPIRE colour-colour ratios. For our final "500 µm-risers" selection we kept just those sources whose chi-square reside inside 95% confidence limit region (χ 2 < 3.84).
Article number, page 27 of 28   .