Free Access
Issue
A&A
Volume 660, April 2022
Article Number A9
Number of page(s) 16
Section Cosmology (including clusters of galaxies)
DOI https://doi.org/10.1051/0004-6361/202142224
Published online 30 March 2022

© ESO 2022

1. Introduction

The next generation of photometric surveys will produce unprecedented galaxy statistics that will fuel large-scale structure studies (LSST Science Collaboration 2009; Laureijs et al. 2011; Benitez et al. 2014; Dark Energy Survey Collaboration 2016). Compared with their spectroscopic counterparts (Le Fèvre et al. 2005; Driver et al. 2011; Guzzo et al. 2014; DESI Collaboration 2016), photometric surveys go deeper and faster; however, the surveying efficiency comes at the cost of spectral resolution. Imaging surveys are limited to photometric measurements such as broad-band colours to infer the redshifts of galaxies (Connolly et al. 1995; Bolzonella et al. 2000; Benítez 2000). The minimum error in a photometric redshift estimate with optical and near-infrared broad-band photometry is σz/(1 + z)∼0.02 due to fundamental degeneracies in the colour-redshift parameter space (Salvato et al. 2019). Nevertheless, with the promise of large sample sizes, this precision is often acceptable for large-scale structure studies based on galaxy clustering and weak lensing analyses. The redshift of individual galaxies is not required for these analyses, but instead precise knowledge of the redshift distribution of the sample is needed to properly interpret the statistics. A systematic error in the redshift distribution estimate propagates directly to biases in the results (Newman et al. 2015).

The ensemble redshift distribution of a photometrically selected galaxy sample can be measured directly by targeting a representative sub-sample with spectroscopy. Currently the complete calibration of the colour-redshift relation (C3R2) campaign is underway using 8m class telescopes to construct a calibration dataset for Euclid1 (Masters et al. 2019; Euclid Collaboration 2020; Stanford et al. 2021). It is challenging to build fully representative spectroscopic samples particularly at the faint end at both low and high redshift. For past surveys it was necessary to include corrections for incompleteness in the spectroscopic measurements (Lima et al. 2008; Hartley et al. 2020) and these corrections depend on a complex set of parameters related to observing conditions and intrinsic galaxy properties (Scodeggio et al. 2018). An alternative solution, the clustering redshift estimator, uses the signal encoded in the spatial correlation between a photometric sample and reference spectroscopic samples to infer the redshift distribution of the photometric sample (Schneider et al. 2006; Newman 2008; Schmidt et al. 2013; Scottez et al. 2016). This approach is expected to be successful when applied to the Euclid data set. Each method to calibrate photometric redshift distributions comes with its own assumptions and sources of systematic errors; therefore, it is worthwhile to develop complementary methods that can provide robust cross-checks. We focus here on an approach that will be enabled by the rich data set provided by Euclid’s slitless spectroscopy.

Slitless spectroscopy provides a unique tool since a measurement of spectral flux can be extracted for every source detected in imaging (e.g., Zwicky 1941; Momcheva et al. 2016). The ESA Euclid mission will be the first modern all-sky survey programme to employ a slitless spectrograph (Laureijs et al. 2011; SPHEREx, Doré et al. 2018, and the NASA Nancy Grace Roman, Akeson et al. 2019, missions will follow). By design the Euclid spectroscopy will detect and measure the redshifts for the brightest emission line galaxies primarily using the Hα line in the redshift range 0.9 < z < 1.8. The majority of photometrically detected sources will be fainter and give only a very low signal-to-noise ratio spectrum precluding a direct redshift measurement. However, by stacking the spectra we can extract physical information from the ensemble and augment photometric studies.

The ensemble photometric redshift method proposed by Padmanabhan et al. (2019) aims to constrain the redshift distribution of a photometrically selected galaxy sample by using the stacked spectrum built from the average of many low signal-to-noise ratio spectra. Since broad-band photometric measurements have coarse wavelength resolution, galaxies with different spectral types at different redshifts can have degenerate colours. These degeneracies lead to catastrophic photometric redshift errors which are characterised by multiple peaks and long tails in the redshift distribution. Adding information from stacked spectroscopy can break these degeneracies since spectral features leave their signature in the stack. The spectra cannot be used to infer the redshift of individual sources due to the low signal-to-noise ratio of the measurements, but the ensemble can be used to infer the redshift distribution. The stacked spectrum will be a mixture of galaxy spectral types at different redshifts and with template fitting a unique decomposition may be found to recover the redshift distribution.

In this work we implement and test the approach on a mock galaxy survey considering ground based photometry from the Vera C. Rubin Observatory and near-infrared photometry and slitless spectroscopy from Euclid. We select galaxy samples based on the photometry and infer the redshift distributions using the combination of stacked spectroscopy and stacked photometry. The Euclid near-infrared spectrograph (NISP) has a wavelength range 1.25–1.85 μm; therefore, it will measure the rest-frame spectral energy distribution (SED) at λ < 9000 Å for galaxies at z > 1. Thus the spectroscopy can augment the near-infrared photometry by adding continuum shape information. The 4000 Å break which is a key feature for redshift estimation will be accessible for galaxies at z > 2. The redshift distributions inferred from broad-band photometry alone are generally not accurate because of the dependence on the template priors (Benítez 2000). However, the joint fit of spectroscopy and photometry together proves to be a powerful tool for extracting redshift distributions: broad-band photometry provides a broad wavelength coverage and spectroscopy gives higher spectral resolution that can break colour degeneracies. We demonstrate this in the case of Euclid in Appendix A. We quantify the accuracy of the constraints by considering the inference of the baryon acoustic oscillation (BAO) scale from angular two-point correlation function measurements. The BAO scale is not the only feature that will be measured in photometric galaxy clustering analyses; the full shape of the galaxy power spectrum encodes relevant information for cosmological studies. However, we can consider that the uncertainty on the BAO scale provides a lower limit on the information contained in the power spectrum and thus is a useful metric for quantifying systematic errors. This metric is also applicable to weak lensing studies that require determinations of the mean redshift of tomographic bins.

This paper is organised as follows. In Sect. 2, we present the ensemble photometric redshift method and describe how we reduce the formal problem of finding the redshift distribution of a group of galaxies with similar colours to a linear problem. Then, in Sect. 3, we describe the construction of the mock catalogues used to test the method (Sect. 3.1). In this section we also discuss the SED templates used to fit the redshift distributions, how the galaxies are partitioned into colour groups and the quantitative benchmark for the redshift distribution estimates based on the measurement of the BAO scale. Finally, in Sect. 4, we present the results of the analyses of both ideal noiseless spectroscopy data and realistic cases with noise. In Sect. 5 we summarise our results and discuss the applications and possible improvements that may be made.

2. The ensemble photometric redshift method

2.1. Method

The distribution of galaxies in colour-redshift space can be constrained by adding information from stacked spectroscopy. This is illustrated in Fig. 1. The left panel shows a three-fold degeneracy in colour at Y − J = 0.8 for starburst and elliptical galaxy spectral types. This colour can correspond to a population of elliptical galaxies at z ∼ 1.7, starburst galaxies either at z ∼ 1.9 or z ∼ 2.5, or to different mixes of these populations. In the right panel we show that these galaxy populations have unique spectral shapes, and so, the stacked spectrum encodes information about the distribution of redshifts and the mixture of galaxy types. We now consider the general case with many photometric colour measurements and describe how the information in the stacked spectroscopy can be extracted.

thumbnail Fig. 1.

Example of how stacked spectroscopy breaks the colour-redshift degeneracy. Left: to illustrate the method we show a colour degeneracy in Y − J as a function of redshift for starburst (SB) and elliptical (Ell) galaxy templates from Ilbert et al. (2009). The dashed horizontal line indicates Y − J = 0.8 and shows three redshift solutions consistent with the colour: elliptical galaxies at z ∼ 1.7, and starburst galaxies at z ∼ 1.9 and z ∼ 2.5. Right: we see that each solution gives a unique spectral shape in the near infrared range probed by the Euclid NISP instrument (red shaded area). The red line shows the elliptical template at z ∼ 1.7 and the blue and green lines show the starburst template at z ∼ 1.9 and 2.5. The spectra are normalised at the effective wavelength of the Y NISP filter. The Euclid NISP Y and J filter transmission are overplotted. The spectral resolution of the plotted templates is lower than the Euclid NISP spectrograph one. The stacked spectroscopy at fixed colour is built from the linear combination of these templates and encodes enough information to recover the relative contributions of spectral type at each redshift.

The normalised stacked spectrum (hereafter stacked spectrum) of a sample of galaxies with similar colours (hereafter a colour group) is defined as the weighted mean of the individual spectral flux measurements,

(1)

where i indexes the galaxies, f(λ) is the measured galaxy flux as a function of wavelength and is the integrated flux for normalisation, therefore the normalised stacked spectrum is expressed in units Hz−1. The calculation of the integrated flux will be discussed in Sect. 2.3. In the analysis we generalise f(λ) to also include broad-band photometric measurements (see Sect. 2.4). The observed flux spectrum can be written in terms of the rest-frame SED of the galaxy,

(2)

thus, as the product of a flux normalisation, a, and a rest-frame template transformed to redshift z, T(λ | z).

Galaxy SEDs can be modelled by a finite set of parameters (e.g., Marchetti et al. 2013). Therefore, the expression for the stacked spectrum can be rewritten as a sum over discrete SEDs indexed by α and weighted by their frequency as a function of redshift, pα(z),

(3)

The template normalisation required to equate will be discussed in Sect. 2.3.

In order to carry out the numerical analysis we discretise the expression over a regular grid of redshift,

(4)

where NSED is the number of templates, and zmin and zmax are the minimum and maximum redshift over which to evaluate the redshift distribution. The overall redshift distribution is therefore given by the summation over templates

(5)

In principle the redshift distribution can be found by substituting the observed flux for in Eq. (4) and solving for the coefficients pα, z.

2.2. Machine implementation

Equation (4) describes a linear set of equations that can be written in matrix notation as

(6)

where f is the spectral flux data vector with Nλ elements. The matrix T is constructed from NSED SED templates each shifted to Nz redshifts; therefore T has dimension (NSEDNzNλ. The redshift distribution of each template is encoded in the vector p which has length NSEDNz. Since the product of templates and redshifts NSEDNz is much greater than the number of spectral elements Nλ, the system is under-constrained and does not have a unique solution.

We can make progress in solving Eq. (6) by using a linear regression algorithm that employs regularisation terms to identify the most interesting solutions. We add two pieces of information: first, we are not interested in unphysical solutions with negative pα, z, and second, galaxy spectra are well fitted by a small number of SED templates. These two considerations led us to impose a non-negativity constraint and to use a shrinkage estimator2 to find the minimum set of templates that can fit the stacked spectra.

We tested three linear regression methods with non-negativity constraints: first, the non-negative least squares method (NNLS, Lawson & Hanson 1987), second, the least absolute shrinkage and selection operator (LASSO, Tibshirani 1996), and last, the elastic net regularisation (ElasticNet, Zou & Hastie 2005).

All three methods minimise a cost function of the form

(7)

but adopt different penalty functions Q. The penalty function acts to reward solutions that use fewer templates. LASSO adds the l1 penalty of the form Q(p, α) = α|p| and ElasticNet uses Q(p, α, β) = α[β|p|+0.5(1−β)p2]. The variables α and β are free parameters that must be chosen in the analysis (see Sect. 3.5). NNLS is a variant of the standard least squares solver and does not introduce a penalty term. In this work we used the implementation of the NNLS algorithm in the Python SciPy library optimize.nnls (Virtanen et al. 2020). For LASSO and ElasticNet we used the implementations found in the Scikit-learn library (Pedregosa et al. 2011).

The attractiveness of the ensemble photometric redshifts method as it has been presented here comes from its ability to infer the underlying distribution using only a chosen template set and no additional information. However, adding physical priors, for example the galaxy luminosity function or galaxy type-redshift distributions, may improve the method performance. Considering how the problem was reduced to a set of linear equations (Eq. (6)), to take into account physical priors is not a trivial task. Possibly, the most straightforward way to do so is to rewrite the problem in terms of likelihood maximisation in a Bayesian framework. The likelihood could be sampled in the parameter space via a Markov chain Monte Carlo (MCMC) algorithm. This approach could have a high computational cost since the parameter space is very large.

2.3. Normalisation

The normalisation of the spectra is important in the stacking process (Eq. (1)) to standardise the contribution from the faintest and brightest sources. We chose to normalise the galaxy spectroscopy by the integrated flux. However, since the measured spectra are very noisy, the integration cannot be carried out on the spectra themselves. Instead, we used the broad-band photometry to set the normalisation. The photometry is typically deeper than the spectroscopy and so gives a robust normalisation.

In this analysis we used the near-infrared photometry in the Y, J and H bandpasses that will be measured by the Euclid NISP instrument. That is, the integrated flux used to weight the measured spectra in Eq. (1) is given by

(8)

where fY, fJ and fH represent the measured photometric flux in the Y, J and H bandpasses.

The SED templates, Tα, z(λ), were normalised in the same way by computing the integrated flux in the three NISP bandpasses and summing them. The flux integrated over a bandpass response function R(λ) is

(9)

where fλ(λ) is the spectral flux in units erg cm−2 s−1 Ås−1 and hc is the product of Planck’s constant and the speed of light.

2.4. Combining photometry and spectroscopy

The extension of the observed stacked spectrum f with photometry is straightforward. We generalised the spectroscopic wavelength λ in Eq. (1) so that it also referred to photometric bands. The first part of the data vector f contains the actual stacked spectrum, while its last Nb elements, where Nb is the number of photometric filters, is the observed stacked photometry in each filter. The photometric data were stacked following Eq. (1) in the same way as the spectra were and have the same normalisation as well (Eq. (8)). Therefore, the extended stacked spectrum f is a vector with Nλ + Nb components.

In order to extend the template matrix with photometry we computed the photometric fluxes in the bandpasses of interest with Eq. (9) for each one of the NSEDNz templates in the matrix. These fluxes were normalised in the same way as the SED templates. The dimension of the template matrix T becomes (NSEDNz)×(Nλ + Nb) and its columns have the same order as the elements of the extended stacked spectrum.

Lastly, in this work we do not weight the data with their observational errors. This choice was dictated by the great number of galaxies in the colour groups. There are so many galaxies in a colour group that the noise in the spectra becomes negligible. This is seen in Fig. 2 right panel which illustrates the spectral stack with 2 × 106 galaxies. However, an inverse-error weighting may be applied to improve the performance in the analysis of less populous colour groups. The weights may be defined using the variance of the stacked spectrum,

(10)

thumbnail Fig. 2.

Stacked spectro-photometry for two groups of galaxies that have been photometrically selected (see Sect. 3.3). The photometric data includes EuclidYJH and the Vera C. Rubin Observatory ugrizy bands and the spectrocscopic data is from the Euclid NISP instrument with simulated noise. The photometric bandpasses used in the analysis are over-plotted. In both plots the photometric uncertainty bars are smaller than the markers. Left: an example stack for a group of galaxies with mean redshift z = 1.10. The stack includes 2 × 103 galaxies. Right: stacked flux for a group at mean redshift z = 1.44 with 2 × 106 galaxies.

where σ(λ) is the observed galaxy flux error as a function of wavelength. The stacked standard deviation is the square root of the stacked variance and its inverse can be used to weight the stacked spectrum and the columns of the template matrix. If the data are weighted following this recipe, the computation of p remains a linear problem with the form of Eq. (6) with the only difference being that we substitute the stacked spectrum and the template matrix with their weighted counterparts.

3. Application to mock Euclid data

3.1. Survey simulation

We synthesised mock spectroscopic and photometric observations representative of the Euclid survey to validate the ensemble photometric redshifts method. We based the simulations on the Euclid Flagship mock galaxy catalogue v1.8.43. The Euclid Flagship simulation is a dark matter N-body simulation with a box size of 3780 h−1 Mpc and particle mass of 2.4 × 109M (Potter et al. 2017).

The cosmic web of dark matter halos in the Flagship simulation was populated with galaxies using an extended halo occupation distribution model (Carretero et al. 2015; Crocce et al. 2015) by the SciPIC collaboration (Carretero et al. 2017; Tallada et al. 2020) and a full-sky light cone was produced spanning the redshift range from 0 to 2.3. Galaxy properties, including the SEDs and broad-band magnitudes were assigned to match the luminosity function and galaxy clustering measurements at z = 0.1 and extrapolated to higher redshift.

In this work we used the Flagship catalogue v1.8.4 covering one octant of the sky (5157 deg2) in the redshift range 0 < z < 2.3. We used the SED of each galaxy to simulate the Euclid grism spectroscopy in the near infrared as well as the Euclid broad-band photometry Y, J and H bands and the six bands from the Vera C. Rubin Observatory: u, g, r, i, z and y4. The flux from spectral lines was not simulated in the SEDs or broad-band photometry. This choice simplified the SED fitting procedure but is an idealisation that should be addressed in a future analysis. However, the addition of emission line flux on the photometry is minor compared with the effect of internal attenuation which we describe next.

The attenuated mock SEDs are based on the COSMOS template set (Ilbert et al. 2009) with a variation in the internal galaxy attenuation curves with the addition of the 2175 Å bump (Prevot et al. 1984; Calzetti et al. 2000). Each mock SED has an associated attenuation model that comes from matching against the COSMOS catalogue. We used three mock catalogue versions in the analysis with different attenuation models:

  1. ‘non-attenuated’ – galactic attenuation was not applied to the SEDs;

  2. ‘fixed’ – a fixed attenuation model was applied to all galaxy SEDs (see below);

  3. ‘real’ – the value of E(B − V) for each galaxy in the Flagship catalogue was used to apply attenuation to the SED.

In each case the broad-band photometry was computed in a consistent way from the SED with Eq. (9).

We applied attenuation to the SED in the following way that is consistent with the construction of the Flagship mock galaxy catalogue. The attenuated SED was computed as the product of the non-attenuated SED and an attenuation factor,

(11)

where fatt(λ) is one out of the four attenuation curves from Prevot et al. (1984) and Calzetti et al. (2000), f0(λ) is a constant function per unit frequency and E(B − V) is the colour excess. We note that the parametrisation in Eq. (11) is peculiar to the construction of the Flagship catalogue. To build the fixed attenuation catalogue all of the galaxy SEDs were multiplied by the same attenuation factor computed with the attenuation curve from Prevot et al. (1984) and E(B − V) = 0.2. In the case of the real attenuation catalogue, the attenuation curve and the value of E(B − V) specified for each galaxy in the Flagship catalogue were used.

We simulated the measurement uncertainty of the mock spectroscopy and photometry using a simple photometric model. The signal-to-noise ratio (S/N) was defined as

(12)

where f is the band flux and σ its measurement uncertainty. Then the variance of a given flux can be computed as

(13)

where flim is the flux corresponding to the instrumental depth in the chosen band, S/Nlim is the signal-to-noise ratio at which the depth is expressed and f is the true galaxy flux. We adopted the 10 σ depth values (J. C. Cuillandre, priv. comm.) listed in AB magnitudes in Table 1. We generated the observed photometric flux by drawing a value from a Gaussian distribution centred on the real photometric flux and with standard deviation σf. Moreover, in order to simulate the measured galaxy sample we applied an H-band magnitude selection H < 24 and a signal-to-noise ratio threshold of S/NH > 5 for the redshift distribution analysis. The total number of galaxies in each catalogue was about 109 after the signal-to-noise ratio selection.

Table 1.

10 σ depths.

The Euclid NISP spectrograph is sensitive over the wavelength range 1.25 < λ < 1.85 μm. The pixel dispersion is Δλ = 13.4 Å pixel−1 such that the spectral data vector has Nλ = 488 elements. We modelled the measurement uncertainty with instrumental and astrophysical noise sources. The variance on the detector in electron count units per pixel is

(14)

where Nexp is the number of exposures, texp is the exposure time. The detector noise has contributions from the dark current ndark and the read noise . The astrophysical background nsky includes contributions from zodiacal emission and scattered light. The noise per pixel was propagated to the flux-calibrated one-dimensional spectrum σλ(λ) in units erg cm−2 s−1 Ås−1 with

(15)

Here, A is the collecting area of the telescope, Δλ is the spectral dispersion in Å pixel−1, w is the extraction window in pixels, qe is the detector quantum efficiency and T is the transmission function. The measurement uncertainty was assigned to the model flux spectra by computing the spectral variance . The noisy realisations of the spectra were generated by drawing values from a Gaussian distribution with the given variance and adding them to the model flux spectra.

In Fig. 2 we show an example stacked spectrum traced by ugrizyYJH photometry and NISP spectroscopic measurements. The left and right panels show stacks built from 2 × 103 and 2 × 106 sources, respectively. The measurement uncertainty on the photometric points is not visible in both cases while the uncertainty on the spectroscopy is evident. The spectroscopic noise on the other hand becomes negligible with > 106 sources. Finally, the two uncertainty models for photometry and spectroscopy we described above simulate only observational uncertainties. A discussion of systematic uncertainties is left to the final conclusions (Sect. 5).

3.2. Description of templates

For the ensemble photometric redshifts method to be successful, the analysis template set used to build the matrix T should be representative of the observed galaxy SEDs and also span the range of attenuation values. In this study we used the same template set that was used to generate the mock galaxy SEDs. Clearly this is an idealised situation that can lead to over-optimistic results. A discussion of this issue will be left to the final conclusions (Sect. 5). The COSMOS template set includes a mix of elliptical, spiral, lenticular and starburst galaxy types making a total of 31 templates (indexed from 0 to 30; Ilbert et al. 2009). The templates are plotted in Fig. 3.

thumbnail Fig. 3.

Arbitrary scaled COSMOS templates used for the Euclid Flagship mock galaxy catalogue. In red are shown the elliptical templates, in blue the lenticular and spiral and in green the starburst ones.

The three versions of the mock galaxy catalogue described in Sect. 3.1 used different assumptions on the attenuation and therefore required different template sets for analysis. In the first case, for the analysis of the non-attenuated catalogue we used templates without applying attenuation. This provided an idealised case study that we used to assess the impact that attenuation has on the result.

In the second case with fixed attenuation the attenuation model is assumed to be known a priori. We applied the fixed attenuation model, which we described in Sect. 3.1, to all galaxy SEDs and also to the template set.

Finally, in the third catalogue that is the most realistic case, realistic attenuation is applied to the mock galaxies. The attenuation curve and the E(B − V) value are assigned to each galaxy from the Flagship catalogue. In this case we used a combination of attenuated and non-attenuated templates in the analysis. The attenuated templates were constructed by applying the attenuation factor (Eq. (11)) with the Prevot et al. (1984) and Calzetti et al. (2000) attenuation curves and E(B − V) fixed to the median value from the Flagship catalogue, which cover a colour excess range from 0 to 0.5. In Table 2 we show the correspondence between the COSMOS templates and the attenuation curves. This procedure followed the recipe used for the Flagship mock galaxy catalogue (Carretero et al., private communication). In this way we had a very general set of 47 templates that also contains the attenuation model. A more representative set of templates could be built using different values of E(B − V); however, a greater number of templates would also increment the number of parameters that need to be fitted in order to compute the redshift distributions (see Sect. 5).

Table 2.

Attenuation curves and COSMOS templates.

In order to build the matrix, the templates need to be shifted to Nz redshifts (see Sect. 2.2). The redshift distributions will be measured on the grid of these Nz redshifts. We used the same redshift grid for the analysis of all the three catalogues, it ranged from redshift 0 to 2.30 with a step of 0.01 for a total of 231 redshifts. In the analysis of the real attenuation catalogue, the dimension of the template matrix was (NSEDNz)×(Nλ + Nb) = 10 857 × 497, while for the non-attenuated and fixed attenuation catalogue analyses it was 7161 × 497.

3.3. Colour selection

We used a self-organising map (SOM) for the colour group division. The SOM (Kohonen 1982, 1990) is an unsupervised machine learning algorithm that projects high-dimensional data on a lower-dimensional grid, usually a two-dimensional map. Its main characteristic is that the lower-dimension representation preserves the characteristic of the high-dimensional data. In the last few years SOMs have grown in popularity as a data driven method to estimate photometric redshifts (e.g., Masters et al. 2015; Wilson et al. 2020). However, in this work we simply exploited the efficiency with which SOMs are able to cluster and group data with similar features. Other approaches could be used to select the samples such as the K-means clustering algorithm, or photometric redshift bins.

For each galaxy we had nine photometric fluxes (see Sect. 3.1) from which we computed eight colours used to build the SOM. Using SOMPY, a SOM library for Python by Moosavi et al. (2014), we built a 20 × 20 rectangular cell SOM using the principal component analysis (PCA) as the initialisation method. The SOMs we built have much smaller dimensions than the SOMs used for photometric redshift measurements or to estimate physical properties (Masters et al. 2015; Davidzon et al. 2019). We had two goals when we chose the SOM size: firstly, we wanted galaxy samples that spanned relatively narrow redshift ranges which are appropriate for measuring clustering statistics. Secondly, we needed these samples to be highly populated in order to be able to average out the spectroscopic noise during the stacking process. However, there is an intrinsic tension between these two goals due to the nature of the SOM algorithm. In this work we deemed more important the second goal and opted for a coarse SOM (see Appendix B) to have highly populated cells (Ngal > 106) without the need to group cells together. Comparing Fig. 2 left and right panels it can be seen how the increase in the number of galaxies in the cells reduces the noise in the stacked spectroscopy making it negligible; the issues related to the analysis of less populated groups (see Fig. 2 left panel) will be discussed in Sect. 5. Nevertheless, the choice of the SOM size will depend on the application, such as galaxy clustering studies or weak lensing, and should be optimised for the analysed survey.

We used the cells defined by the SOM grid to define the colour groups for analysis. In Fig. 4 we illustrate how the cells of the SOM grid map to redshift. In the analysis we focused on colour groups with compact redshift distributions and, therefore, we selected groups with standard deviation in redshift below 0.20 which are marked with red spots in in the left panel of Fig. 4. The mean number of galaxies in a group is 2.5 × 106.

thumbnail Fig. 4.

Photometric redshift division with SOM. Left: the two-dimensional projection of the galaxy colour groups constructed with the SOM. The colour scale indicates the mean redshift of the SOM cells, which we define as colour groups. The red spots identify the cells with σz < 0.20. Right: the colour groups sorted by their mean redshift. The error bar represents the standard deviation of redshift, σz, in each group. The groups with red error bars are the one with σz < 0.20.

3.4. Quantifying the method performance

In order to quantify the error in the redshift distributions measured using the ensemble photometric redshift method and to understand if they can be useful in cosmological studies we compared the angular position of the BAO peak computed with the real redshift distribution and the one measured with the ensemble photometric redshifts method.

The position of the BAO peak in the angular correlation function is determined by the projection of the sound horizon rs at the comoving distance r(z) to the galaxy sample. In the case of a thin redshift shell at redshift z, the angular scale is ϑBAO = rs/r(z) (Sánchez et al. 2011). A systematic shift in the redshift distribution Δz propagates to an error in the angular position to first order as

(16)

where is the mean redshift of the shell.

However, the full shape of the angular correlation function also depends on the evolution of the correlation function integrated over the redshift distribution. This can have a substantial impact on the measurement of the BAO scale particularly when the redshift distribution has extended tails. We therefore used a full model of the angular correlation function to propagate the error. We wrote the angular correlation function in terms of the three-dimensional galaxy power spectrum Pg(k, z) and normalised redshift distribution p(z),

(17)

where r is the radial comoving distance, J0 the Bessel function of order zero and the redshift z = z(r) is a function of the radial comoving distance (Elvin-Poole et al. 2018). This expression was derived using the Limber and flat-sky approximations which are valid at the redshifts we probe. We modelled the relation between the galaxy power spectrum, Pg(k, z), and the matter power spectrum, Pm(k, z), with a linear bias Pg(k, z) = b2Pm(k, z).

We computed the matter power spectrum using the CLASS code (Blas et al. 2011). We adopted a flat ΛCDM cosmological model with h = 0.7, Ωb = 0.05, ΩCDM = 0.25 and ΩΛ = 0.7. We used only the position of the BAO peak in the analysis so the details of the galaxy bias model, overall power spectrum amplitude and broad-band shape of the power spectrum do not significantly influence the results. We integrated Eq. (17) numerically over the redshift range 0 < z < 2.3 (the limit of the mock catalogue). To achieve convergence the integration range was set with kmax = 10 which corresponds to max = 53 000.

To locate the BAO peak, we fitted the angular correlation function computed with Eq. (17) with a template that consists of a power law, which describes the decreasing part of the correlation function, added to a Gaussian component that represents the peak,

(18)

There are six parameters in this model: ynorm, an integration constant, c1 and c2, two coefficients and the three parameters characteristic of the correlation function, γ, which determines the slope, σ, which is the BAO peak width, and finally ϑBAO, the angular position of the BAO. We used the optimize.curve_fit algorithm from SciPy (Virtanen et al. 2020) to fit the correlation function. We computed the BAO angular position using both the measured redshift distribution and the real one, known from the mock catalogue data (see Sect. 3.1), for every colour group. The error in the measured redshift distribution was quantified by the relative error of the BAO angular position (hereafter BAO relative error):

(19)

We used the error on the BAO scale as the performance metric; however, the ensemble photometric redshift method may also be applied in tomographic weak lensing analyses. In the context of weak lensing, estimates of the mean redshifts of the tomographic bins are needed. Equation (16) shows that in the limit of narrow redshift distributions, the error on the BAO scale can be equated with the error on the mean redshift. Therefore our results can also be interpreted as systematic errors for weak lensing analyses.

3.5. Optimising regression parameters

The regression methods LASSO and ElasticNet described in Sect. 2.2 have free parameters that must be chosen. We analysed the fixed attenuation catalogue over a range in the parameter space to test the quality of the regression result and tune the parameters. The LASSO method has the single parameter αLASSO, while ElasticNet has two parameters αElasticNet, βElasticNet. We quantified the goodness of fit with the BAO relative error, Eq. (19).

We found that the regression parameters are weakly dependent on the mean redshift, zmean, of the analysed colour group. Therefore we decided to limit the redshift range of our analyses by selecting only colour groups with zmean > 1. Due to this selection we could neglect the regression parameters dependence on the redshift and use the same set of parameters for all the analysed groups. As discussed in the introduction, we are most interested in the ability of the method to fit redshift distributions at high redshifts, z > 1, rather than low redshift ones. At z > 1 the Euclid near IR spectroscopy will measure the rest-frame SED at λ < 9000 Å which carries more information in the continuum shape to constrain photometric redshifts.

In Fig. 5 we show the means over the colour groups of the absolute relative error on the BAO scale (hereafter mean BAO error) obtained with different parameters sets. For LASSO we observed a clear minimum in the goodness of fit that identifies α = 5 × 10−5 as the best fit parameter both for the analyses with and without spectroscopic noise. We used this value as αLASSO for all the analyses we present in the following section. On the other hand, for the ElasticNet method we found a broad minimum in the parameter space. The analyses without and with noise respectively have best fit parameters αno noise = 5 × 10−5 and βno noise = 0.6, and αnoise = 5 × 10−5 and βnoise = 0.7. These two sets of fit parameters were used as αElasticNet and βElasticNet in the following section; we used the parameters with subscript ‘no noise’ to compute the results presented in Sect. 4.1, and the ones with subscript ‘noise’ for the analyses in Sect. 4.2.

thumbnail Fig. 5.

Mean BAO error for different sets of LASSO and ElasticNet fit parameters. The colour groups for the analysis were selected to have zmean > 1 and standard deviation in redshift lower than 0.15.

We found that the parameter choice also affects the smoothness of the redshift distribution estimates seen by eye. However, it was not possible to achieve the minimum error and smoothness simultaneously. We therefore optimised only for the error.

4. Results

Having explained how the method is implemented, the data prepared and the redshift distribution computed we now discuss the results we obtained in the analyses of the three catalogues. We present the joint analysis of stacked photometry and spectroscopy. The analysis with broad-band photometry only or spectroscopy only proves to be significantly less accurate. We discuss these cases in Appendix A.

In Sect. 3.1 we discussed how uncertainties are added to the photometry and the spectroscopy. The errors in the spectral fluxes are greater than those in photometric data (see Fig. 2) and we expect them to be the main sources of uncertainty in the redshift distribution fits; thus, we firstly analysed the catalogues adding only the photometric error and considered the spectroscopic noise only in later analyses. The analyses without spectroscopic noise can be considered as the limit in which there are enough galaxies in a colour group such that the noise in the stacked spectrum is negligible.

4.1. Analyses without noise

In the left panels of Fig. 6 we present an example of a fitted redshift distribution for each one of the analysed catalogues. With these plots we highlight not only the general features related to the analysis without spectroscopic noise, but also the characteristics of the different catalogue analyses.

thumbnail Fig. 6.

Results on ideal spectroscopy without measurement noise. Top row: results of the non-attenuated catalogue analyses. Middle row: results of the fixed attenuation catalogue analyses. Bottom row: results of the real attenuation catalogue analyses. NNLS, LASSO and ElasticNet results are respectively plotted in orange, green and red. The left column shows examples of redshift distribution fits with the real distribution plotted in blue. Shown on the right is the BAO relative error of the analysed colour groups ordered by redshift.

Firstly, in the non-attenuated and fixed attenuation analyses the method is able to recover and fit the position, the width and the height of the redshift distributions; moreover, the detailed shapes of the distributions are well fitted. The three algorithms, NNLS, LASSO and ElasticNet give comparable results. However, in some cases spurious secondary peaks are evident. The occurrence of spurious peaks is reduced in the LASSO and ElasticNet results due to the shrinkage and selection processes in the algorithms which suppress secondary solutions in favour of the principal ones.

Secondly, we find a loss of precision in the real attenuation case. All three estimators fail to reproduce the shape of the redshift distributions accurately. As seen in the example shown in Fig. 6, the principal peak in the estimated redshift distribution is narrower and many significant spurious peaks are seen.

The right-hand panels of Fig. 6 show the relative error in the recovered BAO position, Eq. (19), for each colour group. The error is weakly dependent on the mean redshift of the group. The three estimators tend to show more stable performance and give lower error at higher redshift z > 1.4. The NNLS estimator gives the largest error while the LASSO and ElasticNet estimators perform similarly.

The best performance is found in the case of the non-attenuated catalogue which has a mean error of approximately 0.5–0.7% in the BAO position from the three estimators (see Table 3). The fixed attenuation analysis shows similar error of 0.4–0.9%. However, when allowing the attenuation to be free in the real attenuation case, the error grows to 2%. Attenuation introduces a degeneracy in colour-redshift space that leads to spurious peaks in the redshift distributions which biases the BAO angular position. Underestimation of the BAO position signifies that the redshift was biased high, as seen for the NNLS estimator at z > 1.8. Overall the three methods have a similar error around 2% in the presence of attenuation with NNLS and ElasticNet giving the best fits. LASSO on average performed less well due to a small number of groups that were poorly fitted. At z > 1.5 the LASSO and ElasticNet algorithms perform better in the presence of attenuation, which may be attributed to the training process. Indeed, the performance will depend on the internal galaxy attenuation models used in the template set and the training sample.

Table 3.

Results.

4.2. Analyses with noise

The results of the analyses with noisy spectroscopy are shown in Fig. 7, on the left side panels we show an example redshift distribution for each one of the analysed catalogues. The non-attenuated and fixed attenuation catalogue analyses with noise show similar results to the analyses without noise. At lower redshift the method is able to recover the redshift distributions with great detail with all three the regression methods, however that is not the case at higher redshifts, z > 1.5. At high redshift the computed distribution tend to be very noisy and less smooth (see Fig. 7 middle left panel). The degree of this behaviour depends on the regression method used in the fit with NNLS showing more spurious peaks. Even though the smoothness of the redshift distributions is lost at high redshifts, the position and the width of the distributions are still recovered with enough precision in order to have small relative errors in the BAO angular position measurements. As for the analyses without spectroscopic noise we observe more stable performance and lower error at higher redshifts.

thumbnail Fig. 7.

Results with noisy spectroscopy. The panels show the same as in Fig. 6, the example redshift distribution fits plotted here are from different colour groups than the ones shown in Fig. 6. The redshift distributions estimated for colour groups at higher redshift tend to be less smooth with a greater frequency of spurious peaks. The LASSO and ElasticNet regression algorithms have free parameters that can be adjusted to give smoother distributions but at the cost of lower accuracy.

In the real attenuation catalogue analysis we observe more spurious peaks separated from the principal peak than the ones in the other two catalogue analyses. However these spurious structures usually are more peaked and frequent with respect to the much wider ones we have in the noiseless analysis of this same catalogue. Moreover, the fitted redshift distributions tend to be very noisy and lose smoothness even at lower redshifts (see Fig. 7 bottom left panel).

The measured error on the BAO position are plotted in the right-hand side panels of Fig. 7. The non-attenuated and fixed attenuation cases show similar trends to the analyses without noise. In both cases we are able to recover the BAO position well, however, at z > 1.5 the NNLS algorithm shows a trend of underestimating the BAO scale.

In the case of the real attenuation analysis, colour groups at z ∼ 1.05 tend to have the BAO position overestimated. This trend was not evident in the analysis without noise and indicates an added degeneracy related to the attenuation curves and how attenuation is modelled in the presence of spectroscopic noise. At z > 1.5 similar behaviour is found with and without spectroscopic noise.

The mean BAO errors are reported in Table 3. The trends are consistent with the analysis without spectroscopic noise but we find that the error degrades by approximately 10%. This indicates that the error introduced by the internal galaxy attenuation and its imperfect modelling is the most important factor that limits the fit performance.

5. Conclusions

In this pilot study we have tested the use of stacked spectra from Euclid near infra-red grism spectroscopy to reconstruct the ensemble redshift distribution of photometrically selected galaxy samples. The general approach in the context of slitless spectroscopic surveys was proposed by Padmanabhan et al. (2019). Here we considered the combination of broad-band photometry including the ugrizy bands from the Vera C. Rubin Observatory and Euclid NISP YJH augmented with stacked NISP grism spectroscopy using the Euclid Flagship mock galaxy catalogue. Since the optimisation of the photometric galaxy selection in Euclid is ongoing (Pocino 2021), we selected mock galaxy samples in colour space using the SOM algorithm. These galaxy samples have compact distributions in both colour and redshift. The redshift distributions inferred from broad-band photometry alone prove to be unreliable as shown in Appendix A. This is not unexpected since the constraints from SED fitting depend on the template priors which we do not consider (Benítez 2000). However, we find that the full application of the joint analysis of photometry and spectroscopy on mock survey data is promising and very informative of both the method’s limits and its potential applications.

To assess the quality of the redshift distribution estimation we focused on the cosmological application of inferring the BAO scale with photometric galaxy clustering measurements. Currently the best constraints of the BAO scale with photometric measurements is ∼4% (Seo et al. 2012; Abbott et al. 2019). This error depends on the survey area, the redshift of the sample as well as the width of the redshift distribution. We can expect that Euclid will make measurements of the BAO scale with percent-level statistical precision in multiple redshift bins from 0 < z < 3. Thus, it will be necessary to reduce the systematic error propagated from uncertainty in the redshift distributions to the sub-percent level.

We tested the quality of the redshift distribution estimates in progressively more realistic cases on mock galaxy catalogues considering grism spectroscopy with and without measurement noise. In the most idealised configuration without internal galactic attenuation the redshift distributions were reconstructed with excellent accuracy on the BAO scale of about 0.5%. The presence of spectroscopic noise degraded this error to about 0.6%. We compared three regression algorithms, NNLS, LASSO and ElasticNet. All three performed well but ElasticNet which has two free parameters gave the best results.

Our main conclusion is that the accuracy of the redshift distribution estimation is limited primarily by internal galaxy attenuation and its modelling. Compared with the non-attenuated and fixed attenuation cases, we found a significant loss of precision in the real attenuation analysis where the attenuation curve varies for each galaxy. This was the case in both analyses we carried out considering spectra with and without measurement noise (Sects. 4.1 and 4.2). Nevertheless, despite the degeneracies introduced by attenuation we found that the BAO scale could be recovered with a precision better than 2%. However, this behaviour reveals the importance of the template set and attenuation model that must be representative of the galaxy sample.

On this matter, we made a preliminary investigation to understand if increasing the number of templates affects the method performance. We expanded the template set with three values of E(B − V) to give three times the number of templates. Using this template set, we analysed a random sub-sample of the real attenuation catalogue without spectroscopic noise. The resulting error on the BAO scale remained stable and did not decrease by the addition of more templates to the fit. So we expect that choosing a larger, but more representative set of templates for the fit will improve the method performance.

In this work we used the same template set to build the galaxy spectra and the template matrix (see Sect. 3.2); this is an ideal situation that is not possible when analysing real observations. We expect a further loss in precision in a realistic case when the template set is not fully representative of the galaxy sample. However, optimisations may be made in the template set with the addition of priors that may improve the fitting performance. Spectroscopic campaigns such as the ongoing C3R2 will build representative redshift catalogs that can provide invaluable information to improve the templates, constrain the attenuation models, and set priors.

Another idealisation made while building the template matrix that needs to be highlighted is the range of the redshift grid. The redshift grid we used for our analyses covers only the redshift range that is simulated in the Euclid Flagship catalogue. Real catalogues will contain higher redshift galaxies, hence a wider range should be spanned by the redshift grid. Nevertheless, we still expect that extending the redshift range will not produce a significant loss in precision although we may find spurious peaks at high redshift if they are degenerate with the adopted attenuation model.

Moreover, it may be possible to improve the method fitting performance by introducing inverse-error weights in the stacked spectrum and template matrix as we suggested in Sect. 2.4. These weights will be useful in the analyses with noisy spectra to balance the relative importance of the photometry and spectroscopy in the fit and produce smoother redshift distributions. In addition it could make feasible the analysis of less populous colour groups, in which there are not enough galaxies to average the noise out of the stacked spectrum.

In the case of real observations we should also account for contamination from stars and quasars for which the template fits may be unreliable. We will also face additional sources of systematic error that we have not addressed here. Grism spectroscopy suffers from contamination due to overlapping spectra (Kümmel et al. 2009). This contamination can particularly spoil the measurement of the galaxy continuum. However, we expect that the spurious signals will be uncorrelated between spectra and average out in the stack. The spectrophotometric calibration error on the other hand can systematically alter the shape of all spectra in the stack and bias the fit. The importance of these sources of error will be investigated in a later work. Future work must also investigate the effect of emission lines in the galaxy SEDs on the stacked spectrum. We expect emission lines to appear in the stacked spectrum as bumps, the width of which will depend on the photometric redshift bin width. In order to take the emission lines into account in the analysis, they need to be modelled and added to the template matrix SEDs. Potentially the emission lines signal would help constrain the template fitting and improve the results, but they could also make the analysis more sensitive to the choice of the template set.

Our analyses confirm that in the case of Euclid, stacked spectroscopy adds information that can help to break degeneracies in colour space that affect statistical studies based on photometric redshifts. The approach provides an internal method for calibrating the redshift distributions without relying on representative spectroscopic samples. This is particularly important at the high redshifts and faint galaxy luminosities probed by Euclid where statistically complete samples of spectroscopic galaxy redshifts are lacking for calibration.


2

In statistics, shrinkage is a process that aims to reduce overfitting and the effect of sampling variation. It can be implemented with the addition of penalties to the cost function of interest.

4

The Euclid and Vera C. Rubin Observatory filter transmission functions were obtained from the Euclid data model version 1.8.

Acknowledgments

The Euclid Consortium acknowledges the European Space Agency and a number of agencies and institutes that have supported the development of Euclid, in particular the Academy of Finland, the Agenzia Spaziale Italiana, the Belgian Science Policy, the Canadian Euclid Consortium, the Centre National d’Etudes Spatiales, the Deutsches Zentrum für Luft-und Raumfahrt, the Danish Space Research Institute, the Fundação para a Ciência e a Tecnologia, the Ministerio de Economia y Competitividad, the National Aeronautics and Space Administration, the National Astronomical Observatory of Japan, the Netherlandse Onderzoekschool Voor Astronomie, the Norwegian Space Agency, the Romanian Space Agency, the State Secretariat for Education, Research and Innovation (SERI) at the Swiss Space Office (SSO), and the United Kingdom Space Agency. A complete and detailed list is available on the Euclid web site (http://www.euclid-ec.org). This work has made use of CosmoHub. CosmoHub has been developed by the Port d’Informació Científica (PIC), maintained through a collaboration of the Institut de Física d’Altes Energies (IFAE) and the Centro de Investigaciones Energéticas, Medioambientales y Tecnológicas (CIEMAT) and the Institute of Space Sciences (CSIC & IEEC), and was partially funded by the “Plan Estatal de Investigación Científica y Técnica y de Innovación” programme of the Spanish government.

References

  1. Abbott, T. M. C., Abdalla, F. B., Alarcon, A., et al. 2019, MNRAS, 483, 4866 [NASA ADS] [CrossRef] [Google Scholar]
  2. Akeson, R., Armus, L., Bachelet, E., et al. 2019, ArXiv e-prints [arXiv:1902.05569] [Google Scholar]
  3. Benítez, N. 2000, ApJ, 536, 571 [Google Scholar]
  4. Benitez, N., Dupke, R., Moles, M., et al. 2014, ArXiv e-prints [arXiv:1403.5237] [Google Scholar]
  5. Blas, D., Lesgourgues, J., & Tram, T. 2011, JCAP, 2011, 034 [Google Scholar]
  6. Bolzonella, M., Miralles, J. M., & Pelló, R. 2000, A&A, 363, 476 [NASA ADS] [Google Scholar]
  7. Calzetti, D., Armus, L., Bohlin, R. C., et al. 2000, ApJ, 533, 682 [NASA ADS] [CrossRef] [Google Scholar]
  8. Carretero, J., Castander, F. J., Gaztañaga, E., Crocce, M., & Fosalba, P. 2015, MNRAS, 447, 646 [NASA ADS] [CrossRef] [Google Scholar]
  9. Carretero, J., Tallada, P., Casals, J., et al. 2017, Proceedings of The European Physical Society Conference on High Energy Physics - PoS(EPS-HEP2017), 488 [CrossRef] [Google Scholar]
  10. Connolly, A. J., Csabai, I., Szalay, A. S., et al. 1995, AJ, 110, 2655 [NASA ADS] [CrossRef] [Google Scholar]
  11. Crocce, M., Castander, F. J., Gaztañaga, E., Fosalba, P., & Carretero, J. 2015, MNRAS, 453, 1513 [NASA ADS] [CrossRef] [Google Scholar]
  12. Dark Energy Survey Collaboration (Abbott, T., et al.) 2016, MNRAS, 460, 1270 [Google Scholar]
  13. Davidzon, I., Laigle, C., Capak, P. L., et al. 2019, MNRAS, 489, 4817 [Google Scholar]
  14. DESI Collaboration (Aghamousa, A., et al.) 2016, ArXiv e-prints [arXiv:1611.00036] [Google Scholar]
  15. Doré, O., Werner, M. W., Ashby, M. L. N., et al. 2018, ArXiv e-prints [arXiv:1805.05489] [Google Scholar]
  16. Driver, S. P., Hill, D. T., Kelvin, L. S., et al. 2011, MNRAS, 413, 971 [Google Scholar]
  17. Elvin-Poole, J., Crocce, M., Ross, A. J., et al. 2018, Phys. Rev. D, 98, 042006 [Google Scholar]
  18. Euclid Collaboration (Guglielmo, V., et al.) 2020, A&A, 642, A192 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  19. Euclid Collaboration (Pocino, A., et al.) 2021, A&A, 655, A44 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  20. Guzzo, L., Scodeggio, M., Garilli, B., et al. 2014, A&A, 566, A108 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  21. Hartley, W. G., Chang, C., Samani, S., et al. 2020, MNRAS, 496, 4769 [Google Scholar]
  22. Ilbert, O., Capak, P., Salvato, M., et al. 2009, ApJ, 690, 1236 [NASA ADS] [CrossRef] [Google Scholar]
  23. Kohonen, T. 1982, Biol. Cybern., 43, 43 [Google Scholar]
  24. Kohonen, T. 1990, Proc. IEEE, 78, 78 [Google Scholar]
  25. Kümmel, M., Walsh, J. R., Pirzkal, N., Kuntschner, H., & Pasquali, A. 2009, PASP, 121, 59 [CrossRef] [Google Scholar]
  26. Laureijs, R., Amiaux, J., Arduini, S., et al. 2011, ArXiv e-prints [arXiv:1110.3193] [Google Scholar]
  27. Lawson, C., & Hanson, R. J. 1987, Solving Least Squares Problems (Philadelphia: SIAM) [Google Scholar]
  28. Le Fèvre, O., Vettolani, G., Garilli, B., et al. 2005, A&A, 439, 845 [Google Scholar]
  29. Lima, M., Cunha, C. E., Oyaizu, H., et al. 2008, MNRAS, 390, 118 [Google Scholar]
  30. LSST Science Collaboration (Abell, P. A., et al.) 2009, ArXiv e-prints [arXiv:0912.0201] [Google Scholar]
  31. Marchetti, A., Granett, B. R., Guzzo, L., et al. 2013, MNRAS, 428, 1424 [Google Scholar]
  32. Masters, D., Capak, P., Stern, D., et al. 2015, ApJ, 813, 53 [Google Scholar]
  33. Masters, D. C., Stern, D. K., Cohen, J. G., et al. 2019, ApJ, 877, 81 [Google Scholar]
  34. Momcheva, I. G., Brammer, G. B., van Dokkum, P. G., et al. 2016, ApJS, 225, 27 [Google Scholar]
  35. Moosavi, V., Packmann, S., & Vallés, I. 2014, SOMPY: A Python Library for Self Organizing Map (SOM), https://github.com/sevamoo/SOMPY [Google Scholar]
  36. Newman, J. A. 2008, ApJ, 684, 88 [Google Scholar]
  37. Newman, J. A., Abate, A., Abdalla, F. B., et al. 2015, Astropart. Phys., 63, 81 [Google Scholar]
  38. Padmanabhan, N., White, M., Chang, T. C., et al. 2019, ArXiv e-prints [arXiv:1903.01571] [Google Scholar]
  39. Pedregosa, F., Varoquaux, G., Gramfort, A., et al. 2011, J. Mach. Learn. Res., 12, 12 [Google Scholar]
  40. Potter, D., Stadel, J., & Teyssier, R. 2017, Comput. Astrophys. Cosmol., 4, 2 [NASA ADS] [CrossRef] [Google Scholar]
  41. Prevot, M. L., Lequeux, J., Maurice, E., Prevot, L., & Rocca-Volmerange, B. 1984, A&A, 132, 389 [Google Scholar]
  42. Salvato, M., Ilbert, O., & Hoyle, B. 2019, Nat. Astron., 3, 212 [NASA ADS] [CrossRef] [Google Scholar]
  43. Sánchez, E., Carnero, A., García-Bellido, J., et al. 2011, MNRAS, 411, 277 [CrossRef] [Google Scholar]
  44. Schmidt, S. J., Ménard, B., Scranton, R., Morrison, C., & McBride, C. K. 2013, MNRAS, 431, 3307 [Google Scholar]
  45. Schneider, M., Knox, L., Zhan, H., & Connolly, A. 2006, ApJ, 651, 14 [Google Scholar]
  46. Scodeggio, M., Guzzo, L., Garilli, B., et al. 2018, A&A, 609, A84 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  47. Scottez, V., Mellier, Y., Granett, B. R., et al. 2016, MNRAS, 462, 1683 [NASA ADS] [CrossRef] [Google Scholar]
  48. Seo, H.-J., Ho, S., White, M., et al. 2012, ApJ, 761, 13 [NASA ADS] [CrossRef] [Google Scholar]
  49. Stanford, S. A., Masters, D., Darvish, B., et al. 2021, ApJS, 256, 9 [NASA ADS] [CrossRef] [Google Scholar]
  50. Tallada, P., Carretero, J., Casals, J., et al. 2020, Astron. Comput., 32, 32 [Google Scholar]
  51. Tibshirani, R. 1996, J. R. Stat. Soc. Ser. B (Methodol.), 58, 58 [Google Scholar]
  52. Virtanen, P., Gommers, R., Oliphant, T. E., et al. 2020, Nat. Methods, 17, 261 [Google Scholar]
  53. Wilson, D., Nayyeri, H., Cooray, A., & Häußler, B. 2020, ApJ, 888, 83 [NASA ADS] [CrossRef] [Google Scholar]
  54. Zou, H., & Hastie, T. 2005, J. R. Stat. Soc.: Ser. B (Stat. Methodol.), 67, 67 [Google Scholar]
  55. Zwicky, F. 1941, PASP, 53, 242 [NASA ADS] [CrossRef] [Google Scholar]

Appendix A: Spectro-photometry vs. photometry

In Sect. 1 we stated that the combination of stacked spectroscopy and photometry is needed in order to break the colour-redshift degeneracy and recover detailed redshift distributions. Here we justify this claim by comparing the results of different analyses that use the combination of stacked spectroscopy and photometry, stacked spectroscopy alone and stacked photometry.

We analysed a subset of the non-attenuated catalogue and one of the real attenuation catalogue without measurement noise. The colour groups for the analyses were selected with the same criterion used for the analyses presented in the paper (zmean > 1 and σz < 0.2). In Figs. A.1 and A.2 we present the results of the analysis that used the ElasticNet regularisation, which was the best performing linear regression method, with the best fitting parameters labelled as αno noise and βno noise in Sect. 3.5. Figure A.1 panels show two example redshift distribution fits, in the top panel for the non-attenuated catalogue and in the bottom for the real attenuation catalogue. From the plots it is clear that the analyses with stacked spectroscopy alone (green line) are not able to localise the peak of the redshift distribution. On the other hand, stacked photometry (orange line) is able to roughly locate the redshift distribution, but does not fit its substructure and presents spurious peaks. Finally, the combination of stacked spectroscopy and photometry (black line) breaks the colour-redshift degeneracies and recovers the redshift distribution with a significant improvement in accuracy that can be seen by eye.

thumbnail Fig. A.1.

Two example redshift distributions from the analyses of the non-attenuated catalogue, top panel, and the real attenuation one, bottom panel, without the measurement noise and obtained with the ElasticNet regularisation. The results from the combination of stacked spectroscopy and photometry, stacked spectroscopy alone and stacked photometry are respectively plotted in black, green and orange. The real distribution is the filled blue histogram.

thumbnail Fig. A.2.

BAO relative errors as a function of redshift for the samples shown in Fig. A.1. The results from the combination of stacked spectroscopy and photometry and stacked photometry are respectively plotted in black and orange. The BAO position could not be fitted in the spectroscopy-only analysis so the error in this case is not shown.

Figure A.2 shows the BAO relative error derived for all of the colour groups with the combination of stacked spectroscopy and photometry, and for stacked photometry alone. We were unable to fit the BAO angular position for the analysis with stacked spectroscopy alone due to the disperse distributions that were recovered and so it is not shown on the plots. The mean BAO error of the non-attenuated catalogue analyses is 0.0044 for the combination of stacked spectroscopy and photometry and 0.010 for the analyses with stacked photometry alone; for the real attenuation catalogue they are respectively 0.014 and 0.028. Thus, the addition of spectroscopy in the analysis reduces the error by a factor of 2.

These results justify the choice of using the combination of stacked spectroscopy and photometry. Photometry is indeed needed in order to locate the redshift distribution, but the addition of spectroscopic information helps to break the degeneracies in colour-redshift space and significantly improves the constraints.

Appendix B: Colour division in SOMs

In this work we used a very coarse SOM, as we need highly populated colour groups in order to average out the noise in the stacked spectrum (see Sect. 3.3 and Fig. 2). Figure B.1 shows how the colours are mapped to the SOM cells, and B.2 shows the colour distributions of two SOM cells. In Fig. B.2 the colour distribution of two colour groups (blue and red contours) are over-plotted on the distribution of the whole catalogue. The distributions are compact in comparison to the catalogue one, showing that the SOM groups galaxies with similar colours. In addition, the two colour distributions are separated from one another in the bottom panels of Fig. B.2. This gives a first visual proof that even with a limited number of cells the SOM is able to divide the galaxies into distinct colour groups.

thumbnail Fig. B.1.

SOM shown on the left panel of Fig. 4. The colour scale indicate the mean colour of the SOM cell, each panel represents one of the eight colours used to build the SOM.

thumbnail Fig. B.2.

Contour plots of the colour distributions. The filled lines show the distribution of all galaxies in the catalogue, while the red and blue ones represent the colour distributions of two cells of the SOM. The inner and outer contour lines contain 90% and 68% of the samples. The figure shows how the SOM groups galaxies with similar colours in the same cells and how different cells have different colour distributions.

All Tables

Table 1.

10 σ depths.

Table 2.

Attenuation curves and COSMOS templates.

All Figures

thumbnail Fig. 1.

Example of how stacked spectroscopy breaks the colour-redshift degeneracy. Left: to illustrate the method we show a colour degeneracy in Y − J as a function of redshift for starburst (SB) and elliptical (Ell) galaxy templates from Ilbert et al. (2009). The dashed horizontal line indicates Y − J = 0.8 and shows three redshift solutions consistent with the colour: elliptical galaxies at z ∼ 1.7, and starburst galaxies at z ∼ 1.9 and z ∼ 2.5. Right: we see that each solution gives a unique spectral shape in the near infrared range probed by the Euclid NISP instrument (red shaded area). The red line shows the elliptical template at z ∼ 1.7 and the blue and green lines show the starburst template at z ∼ 1.9 and 2.5. The spectra are normalised at the effective wavelength of the Y NISP filter. The Euclid NISP Y and J filter transmission are overplotted. The spectral resolution of the plotted templates is lower than the Euclid NISP spectrograph one. The stacked spectroscopy at fixed colour is built from the linear combination of these templates and encodes enough information to recover the relative contributions of spectral type at each redshift.

In the text
thumbnail Fig. 2.

Stacked spectro-photometry for two groups of galaxies that have been photometrically selected (see Sect. 3.3). The photometric data includes EuclidYJH and the Vera C. Rubin Observatory ugrizy bands and the spectrocscopic data is from the Euclid NISP instrument with simulated noise. The photometric bandpasses used in the analysis are over-plotted. In both plots the photometric uncertainty bars are smaller than the markers. Left: an example stack for a group of galaxies with mean redshift z = 1.10. The stack includes 2 × 103 galaxies. Right: stacked flux for a group at mean redshift z = 1.44 with 2 × 106 galaxies.

In the text
thumbnail Fig. 3.

Arbitrary scaled COSMOS templates used for the Euclid Flagship mock galaxy catalogue. In red are shown the elliptical templates, in blue the lenticular and spiral and in green the starburst ones.

In the text
thumbnail Fig. 4.

Photometric redshift division with SOM. Left: the two-dimensional projection of the galaxy colour groups constructed with the SOM. The colour scale indicates the mean redshift of the SOM cells, which we define as colour groups. The red spots identify the cells with σz < 0.20. Right: the colour groups sorted by their mean redshift. The error bar represents the standard deviation of redshift, σz, in each group. The groups with red error bars are the one with σz < 0.20.

In the text
thumbnail Fig. 5.

Mean BAO error for different sets of LASSO and ElasticNet fit parameters. The colour groups for the analysis were selected to have zmean > 1 and standard deviation in redshift lower than 0.15.

In the text
thumbnail Fig. 6.

Results on ideal spectroscopy without measurement noise. Top row: results of the non-attenuated catalogue analyses. Middle row: results of the fixed attenuation catalogue analyses. Bottom row: results of the real attenuation catalogue analyses. NNLS, LASSO and ElasticNet results are respectively plotted in orange, green and red. The left column shows examples of redshift distribution fits with the real distribution plotted in blue. Shown on the right is the BAO relative error of the analysed colour groups ordered by redshift.

In the text
thumbnail Fig. 7.

Results with noisy spectroscopy. The panels show the same as in Fig. 6, the example redshift distribution fits plotted here are from different colour groups than the ones shown in Fig. 6. The redshift distributions estimated for colour groups at higher redshift tend to be less smooth with a greater frequency of spurious peaks. The LASSO and ElasticNet regression algorithms have free parameters that can be adjusted to give smoother distributions but at the cost of lower accuracy.

In the text
thumbnail Fig. A.1.

Two example redshift distributions from the analyses of the non-attenuated catalogue, top panel, and the real attenuation one, bottom panel, without the measurement noise and obtained with the ElasticNet regularisation. The results from the combination of stacked spectroscopy and photometry, stacked spectroscopy alone and stacked photometry are respectively plotted in black, green and orange. The real distribution is the filled blue histogram.

In the text
thumbnail Fig. A.2.

BAO relative errors as a function of redshift for the samples shown in Fig. A.1. The results from the combination of stacked spectroscopy and photometry and stacked photometry are respectively plotted in black and orange. The BAO position could not be fitted in the spectroscopy-only analysis so the error in this case is not shown.

In the text
thumbnail Fig. B.1.

SOM shown on the left panel of Fig. 4. The colour scale indicate the mean colour of the SOM cell, each panel represents one of the eight colours used to build the SOM.

In the text
thumbnail Fig. B.2.

Contour plots of the colour distributions. The filled lines show the distribution of all galaxies in the catalogue, while the red and blue ones represent the colour distributions of two cells of the SOM. The inner and outer contour lines contain 90% and 68% of the samples. The figure shows how the SOM groups galaxies with similar colours in the same cells and how different cells have different colour distributions.

In the text

Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.

Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.

Initial download of the metrics may take a while.