Issue: A&A, Volume 638, June 2020
Article Number: L1
Number of page(s): 8
Section: Letters to the Editor
DOI: https://doi.org/10.1051/0004-6361/201936154
Published online: 29 May 2020
Letter to the Editor
KiDS+VIKING450 and DESY1 combined: Cosmology with cosmic shear
^{1} Department of Physics, University of Oxford, Denys Wilkinson Building, Keble Road, Oxford OX1 3RH, UK
email: shahab.joudaki@physics.ox.ac.uk
^{2} Ruhr-Universität Bochum, Astronomisches Institut, German Centre for Cosmological Lensing, Universitätsstr. 150, 44801 Bochum, Germany
^{3} Argelander-Institut für Astronomie, Universität Bonn, Auf dem Hügel 71, 53121 Bonn, Germany
^{4} Institute for Astronomy, University of Edinburgh, Royal Observatory, Blackford Hill, Edinburgh EH9 3HJ, UK
^{5} Leiden Observatory, Leiden University, PO Box 9513, 2300 RA Leiden, The Netherlands
^{6} Department of Physics and Astronomy, University College London, Gower Street, London WC1E 6BT, UK
Received: 22 June 2019
Accepted: 22 April 2020
We present a combined tomographic weak gravitational lensing analysis of the Kilo-Degree Survey (KV450) and the Dark Energy Survey (DESY1). We homogenize the analysis of these two public cosmic shear datasets by adopting consistent priors and modeling of nonlinear scales, and determine new redshift distributions for DESY1 based on deep public spectroscopic surveys. Adopting these revised redshifts results in a 0.8σ reduction in the DES-inferred value for S_{8}, which decreases to a 0.5σ reduction when including a systematic redshift calibration error model from mock DES data based on the MICE2 simulation. The combined KV450+DESY1 constraint on S_{8} = 0.762_{−0.024}^{+0.025} is in tension with the Planck 2018 constraint from the cosmic microwave background at the level of 2.5σ. This result highlights the importance of developing methods to provide accurate redshift calibration for current and future weak-lensing surveys.
Key words: cosmology: observations / galaxies: photometry / gravitational lensing: weak / surveys
© ESO 2020
1. Introduction
Weak gravitational lensing tomography has entered the phase of precision cosmology, with observational constraints on the best-measured parameter, S_{8} = σ_{8}√(Ω_{m}/0.3), at a level of precision ≲5% for all current surveys (Hildebrandt et al. 2020, hereafter H20; Troxel et al. 2018, hereafter T18; Hikage et al. 2019; Joudaki et al. 2017; Jee et al. 2016). Here, σ_{8} refers to the root-mean-square of the linear matter overdensity field on 8 h^{−1} Mpc scales, and Ω_{m} is the present mean density of nonrelativistic matter relative to the critical density. This phase has been reached as a result of the success in accounting for the systematic uncertainties that affect the measurements. However, as the statistical precision of weak-lensing surveys increases with depth and area, the requirements on their ability to control systematic uncertainties increase as well. In Hildebrandt et al. (2017), it was shown that the contribution of systematic uncertainties to the total error budget for the Kilo-Degree Survey (KiDS; Kuijken et al. 2015) is comparable to that of the statistical uncertainties. Given the similar constraining power of concurrent weak-lensing surveys, such as the Dark Energy Survey (DES; Abbott et al. 2018a) and the Subaru Hyper Suprime-Cam survey (HSC; Aihara et al. 2018), a continued reduction in the systematic uncertainties is crucial to obtain unbiased cosmological constraints and to exploit the full statistical power of current and future weak-lensing datasets.
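As a small illustration of why cosmic shear constrains this particular parameter combination, the sketch below evaluates the standard definition S_{8} = σ_{8}√(Ω_{m}/0.3). The input values are hypothetical and chosen only to show that different (σ_{8}, Ω_{m}) pairs along the lensing degeneracy map to the same S_{8}.

```python
import math

def s8(sigma8: float, omega_m: float, pivot: float = 0.3) -> float:
    """Return S_8 = sigma_8 * sqrt(Omega_m / pivot), with pivot = 0.3 by convention."""
    if omega_m <= 0:
        raise ValueError("omega_m must be positive")
    return sigma8 * math.sqrt(omega_m / pivot)

# Hypothetical inputs, not taken from any survey.
print(s8(0.80, 0.30))                        # Omega_m at the pivot: S_8 equals sigma_8
print(s8(0.90, 0.30 * (0.80 / 0.90) ** 2))   # different (sigma_8, Omega_m), same S_8
```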
The most notable systematic uncertainties pertain to the intrinsic alignment (IA) of galaxies, additive and multiplicative shear calibration, baryonic feedback affecting the nonlinear matter power spectrum, and photometric redshift errors (see Mandelbaum 2018 and references therein). All current weak-lensing surveys have reached a statistical precision where notable changes to the cosmological parameter constraints are found when accounting for these systematic uncertainties in the analysis (e.g. Hikage et al. 2019; T18; H20). The expectation is that the final parameter constraints are robust when marginalized over all known systematics. This is generally well motivated through the vast range of checks and extensions of the systematic models beyond the standard approach considered by these surveys. The uncertainty in the redshift distributions, n(z), of weakly lensed galaxies is, however, more difficult to account for, and has been shown to be the only systematic uncertainty to impact the posterior mean of S_{8} by ∼1σ (H20).
The redshift uncertainty is arguably the most challenging systematic to control in both current and future lensing surveys. In KiDS, the estimation of the redshift distributions has benefited from the fully overlapping near-infrared imaging data from the VISTA Kilo-Degree Infrared Galaxy Survey (VIKING; Edge et al. 2013). The combined KiDS and VIKING dataset (‘KiDS+VIKING450’ or ‘KV450’; Wright et al. 2019) has allowed for an increased precision in the estimation of photometric redshifts that are used to assign sources to tomographic bins. In addition, KiDS targets deep pencil-beam spectroscopic surveys permitting the redshift distributions to be determined via the weighted direct estimation, or ‘DIR’, approach (Lima et al. 2008; Hildebrandt et al. 2017; H20), which is fully decoupled from the photo-z. This DIR method assigns KiDS sources to spectroscopic galaxies by a k-nearest-neighbor matching in order to estimate weights for the spectroscopic objects. The weighted distribution of spectroscopic redshifts can then be used to estimate the n(z) of the sources. The uncertainty Δz_{i} in the mean redshift of each tomographic bin i is obtained from a spatial bootstrap resampling of the spectroscopic calibration sample and propagated in the cosmological analysis as n_{i}(z) → n_{i}(z − Δz_{i}) (H20).
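The reweighting idea can be sketched in a few lines. The example below is a toy illustration rather than the KiDS pipeline: it assumes a density-ratio estimator in the spirit of Lima et al. (2008), in which each spectroscopic object is weighted by the number of photometric sources inside the radius that encloses its k nearest spectroscopic neighbors. The function names, the one-dimensional magnitude space, and all data values are hypothetical.

```python
import math

def dir_weights(spec_mags, phot_mags, k=2):
    """Density-ratio weights for spectroscopic objects in magnitude space."""
    weights = []
    for i, s in enumerate(spec_mags):
        # Radius enclosing the k nearest *other* spectroscopic objects.
        d_spec = sorted(math.dist(s, t) for j, t in enumerate(spec_mags) if j != i)
        radius = d_spec[k - 1]
        # Count photometric sources inside the same radius.
        n_phot = sum(1 for p in phot_mags if math.dist(s, p) <= radius)
        weights.append(n_phot / k)
    total = sum(weights)
    if total == 0:
        raise ValueError("no photometric sources near any spectroscopic object")
    return [w / total for w in weights]

def weighted_mean_z(spec_z, weights):
    """Mean redshift of the weighted spectroscopic sample."""
    return sum(z * w for z, w in zip(spec_z, weights)) / sum(weights)

# Hypothetical toy data: the spectroscopic sample over-represents bright,
# low-redshift objects relative to the photometric source sample.
spec_mags = [(20.0,), (20.5,), (21.0,), (23.0,), (23.5,), (24.0,)]
spec_z = [0.20, 0.25, 0.30, 0.80, 0.90, 1.00]
phot_mags = [(20.2,), (22.8,), (23.1,), (23.4,), (23.6,), (23.9,), (24.1,)]

w = dir_weights(spec_mags, phot_mags, k=2)
print(weighted_mean_z(spec_z, w))       # reweighted estimate for the source sample
print(sum(spec_z) / len(spec_z))        # unweighted spectroscopic mean
```

In this toy case the weights up-weight the faint spectroscopic objects, so the reweighted mean redshift is higher than the unweighted one. A real application would use multi-band magnitudes and a much larger calibration sample.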
The DIR approach has been found to produce cosmological results consistent with other n(z) estimation techniques, such as the angular cross-correlation of photometric and spectroscopic galaxy samples (where the spectroscopic samples are obtained from overlapping wide and shallow surveys; Morrison et al. 2017; Johnson et al. 2017). In H20, it was also shown that the cosmological constraints from KV450 are robust to the specific combination of spectroscopic calibration samples used to obtain the DIR n(z) as long as the spectroscopic datasets provide a sufficient coverage in depth and redshift.
Both DES and HSC calibrate their redshift distributions with a high-quality photometric redshift catalog in the COSMOS field (Laigle et al. 2016). A similar calibration of the KV450 data yielded a 0.6σ larger value of S_{8} (H20). One hypothesis is that outliers in the COSMOS photo-z catalog cause the estimated redshifts to be biased low. Alternatively, there could be a bias in the fiducial KV450 DIR calibration. Here, we construct mock KV450 and DESY1 catalogs based on the MICE2 simulation and quantify the extent to which the redshift distributions might be reliably estimated. As the DESY1 data are slightly shallower than KiDS, which matches the depth of the public spectroscopic redshift catalogs, we spectroscopically calibrate the DESY1 redshift distributions^{1}. Using these newly determined n(z), we evaluate the impact on the cosmological constraints, and perform a combined cosmological analysis with KV450.
2. KV450 and DESY1 cosmological constraints with a homogenized analysis
To meaningfully compare the cosmological constraints from KV450 and DESY1, we begin by homogenizing the cosmological priors and treatment of astrophysical systematic uncertainties (Fig. 1). We consider the KV450 and DESY1 measurements and covariance in H20 and T18, respectively^{2}. We do not remeasure the respective data vectors and covariance, and use only the angular scales advocated in H20 and T18. As KV450 and DESY1 observations do not overlap on the sky, we treat the two surveys as distinct.
Fig. 1. Marginalized posterior contours in the S_{8}–Ω_{m} plane (inner 68% CL, outer 95% CL). We show the KV450 constraints in green (solid) using an analysis setup that follows H20, but including an additional redshift dependence of the IA signal (denoted ‘KV450’). In black (dashed), we show the DESY1 constraints corresponding to the original T18 analysis, noting that the sum of neutrino masses is varied in this analysis (and hence the contour should not be directly compared with the orange (solid) Planck 2018 contour where the neutrino mass is fixed). The blue (solid) contours show the DESY1 constraints where an identical setup to the KV450 analysis is used (along with the original DESY1 redshift distributions). 
The cosmological constraints on KV450 and DESY1 are obtained using the COSMOLSS^{3} likelihood code (Joudaki et al. 2018) in a Markov chain Monte Carlo (MCMC) analysis. This code has been used to benchmark the LSST-DESC Core Cosmology Library (CCL; Chisari et al. 2019) computation of tomographic cosmic shear, galaxy-galaxy lensing, and galaxy clustering observables. For completeness, we reproduced the COSMOLSS DESY1 constraints with both COSMOSIS (Zuntz et al. 2015) and the Planck Collaboration’s lensing likelihood in COSMOMC (Planck Collaboration VI 2020). In H20, we moreover showed that the KV450 constraints from COSMOLSS, COSMOSIS, and MONTE PYTHON (Audren et al. 2013) are in excellent agreement.
For both surveys, we implement the cosmological priors of H20 (see Table 3 therein). In the case of DESY1, this includes not only a change in the size of the parameter priors, but notably also a change in the size of the parameter space by fixing the sum of neutrino masses to 0.06 eV instead of varying it freely, a change in the uniform sampling of A_{s} → ln(10^{10}A_{s}), and a change from HALOFIT (Takahashi et al. 2012) to HMCODE (Mead et al. 2015) for the modeling of the nonlinear corrections to the matter power spectrum. Compared to the fiducial DESY1 and KV450 analyses, we also switch from MULTINEST (Feroz et al. 2009) to MCMC sampling of the parameter space. Following H20, we allow baryonic feedback to modify the nonlinear matter power spectrum. This does not particularly affect the DESY1 constraints given the conservative scale cuts in T18. We keep the shear calibration and photometric redshift uncertainties distinct between the two surveys (given by Table 2 in T18 and Table 3 in H20, respectively).
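To see why the choice between sampling uniformly in A_{s} and uniformly in ln(10^{10}A_{s}) matters, the sketch below compares the prior median of the amplitude under the two choices over the same interval. A prior that is flat in the logarithm corresponds to a density proportional to 1/A_{s}, so it places more weight on low amplitudes. The prior range used here is a hypothetical placeholder, not the range adopted in H20 or T18.

```python
import math

def median_uniform(lo, hi):
    """Median of x when the prior is uniform in x on [lo, hi]."""
    return 0.5 * (lo + hi)

def median_log_uniform(lo, hi):
    """Median of x when the prior is uniform in ln(x) on [ln(lo), ln(hi)]."""
    return math.sqrt(lo * hi)

# Hypothetical prior range on ln(10^10 A_s); not taken from this text.
ln_lo, ln_hi = 1.7, 5.0
lo, hi = math.exp(ln_lo), math.exp(ln_hi)   # the same range expressed in 10^10 A_s

print(median_uniform(lo, hi))       # prior median if sampling uniformly in A_s
print(median_log_uniform(lo, hi))   # prior median if sampling uniformly in ln(10^10 A_s)
```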
Conservatively, we allow KV450 and DESY1 to have independent parameters governing the IA, using both an amplitude and redshift dependence (as a result, in the combined KV450+DESY1 analysis there are 4 free IA parameters). We use a pivot redshift of z_{0} = 0.3, in agreement with past KiDS analyses and direct measurements of the IA (e.g. Mandelbaum et al. 2011; Joachimi et al. 2011). We find that the S_{8} constraints are robust to the specific treatment of the IA, such as removal of the redshift dependence or by assuming that the IA parameters are shared between the two surveys^{4}.
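A common way to parameterize an IA amplitude with a redshift dependence around a pivot is a power law in (1 + z). The sketch below assumes that functional form. The text above only states that an amplitude, a redshift dependence, and a pivot of z_{0} = 0.3 are used, so the symbol names and the power-law form are assumptions for illustration.

```python
def ia_amplitude(z, a_ia, eta, z0=0.3):
    """Effective IA amplitude at redshift z: a_ia * ((1 + z) / (1 + z0)) ** eta."""
    return a_ia * ((1.0 + z) / (1.0 + z0)) ** eta

# Hypothetical parameter values. With independent parameters per survey,
# a combined analysis carries two (a_ia, eta) pairs, i.e. four IA parameters.
kv450_ia = {"a_ia": 1.0, "eta": 2.0}
desy1_ia = {"a_ia": 0.8, "eta": -1.0}

for z in (0.3, 0.6, 0.9):
    print(z, ia_amplitude(z, **kv450_ia), ia_amplitude(z, **desy1_ia))
```

At the pivot redshift the effective amplitude reduces to a_ia for any eta, which is what decorrelates the two parameters.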
We compare the KV450 and DESY1 constraints with the Planck 2018 cosmic microwave background (CMB) temperature and polarization measurements (Planck Collaboration VI 2020)^{5}, using the S_{8} constraint from the ‘TT,TE,EE+lowE’ data combination. We exclude the CMB lensing measurements to isolate the high-redshift CMB temperature and polarization constraint on cosmology from the low-redshift Universe.
The KV450 constraint on S_{8} corresponds to a 2.4σ discrepancy with Planck 2018. For the original DESY1 cosmic shear analysis, we use the posterior mean of S_{8} from the publicly released chain^{6} (we note that T18 quotes the marginal posterior maximum of 0.782 instead of the more common posterior mean). Compared with the corresponding Planck 2018 result, where the neutrino mass varies, this is a 1.7σ difference. The DESY1 constraint on S_{8} using the KV450 setup differs by 1.0σ from the Planck 2018 constraint and by 1.1σ from the KV450 constraint. This change reflects a shift in the posterior mean and an increase in uncertainty as a result of using HMCODE instead of HALOFIT, wider priors on the amplitude and spectral index of the primordial power spectrum, uniformly sampling ln(10^{10} A_{s}) instead of A_{s}, and fixing the sum of neutrino masses instead of varying it.
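Differences quoted in units of σ can be approximated with a simple Gaussian estimator: the difference of the posterior means divided by the quadrature sum of the uncertainties. The sketch below assumes this estimator and independent datasets. This section does not specify how the quoted values were computed, so it should be read as a rough cross-check only, and the numbers are hypothetical.

```python
import math

def tension_sigma(mean_a, err_a, mean_b, err_b):
    """Gaussian tension between two independent 1D constraints, in units of sigma."""
    return abs(mean_a - mean_b) / math.hypot(err_a, err_b)

# Hypothetical constraints, not taken from the paper. For asymmetric errors,
# one option is to pass the error on the side facing the other measurement.
print(tension_sigma(0.80, 0.03, 0.74, 0.04))
```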
We note that when KV450 and DESY1 are homogenized to the same assumptions and using the fiducial angular scales, the constraining power of the two datasets is comparable, with the DESY1 uncertainty in S_{8} smaller by 8% (instead of 30% smaller uncertainty when simply comparing the DESY1 constraint in T18 with the KV450 constraint in H20). However, this does not account for the improvement in the DESY1 constraining power when extending the scale cuts from the fiducial approach in T18 to better agree with the range of angular scales θ probed by KV450. We find that such a modification to the angular scales (such that {θ_{+} > 3, θ_{−} > 7} arcmin for all tomographic bin combinations) in our correlation function analysis improves the DESY1 uncertainty in S_{8} by approximately 30% (with a 0.5σ decrease in the posterior mean) after marginalizing over baryonic feedback, increasing the deviation from Planck (see also Asgari et al. 2020 for a small-scale analysis with COSEBIs).
3. Spectroscopic determination of the DESY1 source redshift distributions
The redshift distributions for DES and HSC have so far been obtained by using data from the 30-band photometric dataset ‘COSMOS2015’ (Laigle et al. 2016). In HSCY1, the fiducial redshift distributions are obtained as a histogram of reweighted COSMOS2015 photometric redshifts (using the weights of the HSC source galaxies and a self-organizing map, or ‘SOM’), and the uncertainties in these distributions are obtained by comparing against the photometric redshift distributions from six different codes where the probability distribution functions of the source galaxy redshifts are stacked (Hikage et al. 2019). In DESY1, the Bayesian photometric redshift code BPZ (Benítez 2000) is used to compute a stacked redshift distribution, which is shifted along the redshift axis to best fit a combination of resampled COSMOS2015 redshift distributions and (for the first three tomographic bins) the clustering of the DES source galaxies and a high-quality photo-z reference sample (REDMAGiC; Rozo et al. 2016) over a limited redshift range (Hoyle et al. 2018).
To compare these approaches to direct spectroscopic determination, which fully decouples the photo-z from the determination of the n(z), H20 considered a DIR estimate of the KV450 redshifts with the help of COSMOS2015, finding a coherent downward shift in the redshift distributions and a consequent increase in the posterior mean for S_{8}. H20 argue that estimating the redshift distributions from COSMOS2015 might however be unreliable given the ‘catastrophic outlier’ fraction of ∼6% in the magnitude range 23 < i < 24 reported in Laigle et al. (2016)^{7} and a residual photo-z bias of ⟨z_{spec} − z_{phot}⟩ ≈ 0.01 after rejection of outliers. This can be compared to ∼1% unreliable redshifts for the combined spectroscopic calibration sample^{8}. The outliers in the COSMOS2015 photo-z are potentially also more problematic because their effect is most probably asymmetric. Outliers that are truly objects at high-z but are assigned a low COSMOS2015 photo-z are more likely to fall inside the DESY1 tomographic bins than outliers that are bona fide low-z galaxies but are assigned a high COSMOS2015 photo-z. Additionally, the bias in the core of the z_{spec} − z_{phot} distribution is in the same direction, that is, overall the redshifts might be underestimated by the COSMOS2015 photo-z.
In the DESY1 analyses, the case is made that a spectroscopic determination of the source redshift distributions would not be sufficiently accurate due to the incompleteness of the existing spectroscopic surveys at the faint end of the DES observations (Hoyle et al. 2018). We find, however, that even the deeper KV450 source sample is well covered by our spectroscopic compilation, implying that the coverage should also be sufficient for the calibration of the DESY1 sample. This is confirmed by a SOM approach to redshift calibration (Masters et al. 2015) presented in Wright et al. (2020).
DESY1 overlaps with almost the same deep spectroscopic redshift surveys that were used by H20. As shown in Fig. 2 (inset), this overlap contains some 30 000 objects with spectroscopic redshifts from zCOSMOS (Lilly et al. 2009), the DEEP2 Redshift Survey (Newman et al. 2013), the VIMOS VLT Deep Survey (VVDS; Le Fèvre et al. 2013), and the Chandra Deep Field South (CDFS; Vanzella et al. 2008; Popesso et al. 2009; Balestra et al. 2010; Le Fèvre et al. 2013). We find that the KV450 source sample is well covered as long as spectroscopic redshifts from DEEP2, the highest-redshift calibration survey, are included, and the same is true for DESY1. However, we note that Hoyle et al. (2018) and Hartley et al. (2020) have moreover argued that the 4-band DES data may be inherently less suitable to our reweighting scheme than the 9-band KiDS+VIKING data, which is a hypothesis that we assess in Appendix A (see also Buchs et al. 2019 for a way to solve this by leveraging the DES Deep fields).
Fig. 2. DESY1 redshift distributions for the four tomographic bins (in black, blue, cyan, and red, respectively), showing the publicly released distributions (dashed) and the spectroscopically determined distributions using the DIR approach (solid). The distributions based on spectroscopy are systematically shifted to higher redshifts than the original distributions (accounting for Δz_{i}), and hence favor a lower value of S_{8} than the original DESY1 analysis in T18. See Table 1 for the mean redshifts of the different tomographic bins for the two approaches. The vertical dotted lines denote the tomographic bin boundaries. The small inset shows the redshift distribution of the matched photometry/spectroscopy catalog for DESY1 containing approximately 30 000 objects used in the DIR method. The spectroscopic calibration samples are obtained from zCOSMOS, VVDSDeep (2h), CDFS, DEEP2 (2h), and VVDSWide (22h). We do not show the uncertainties in the n(z) for visual clarity (instead see Table 1 for uncertainties in the mean redshifts). 
The KV450 and DESY1 spectroscopic calibration samples used here differ in detail: DESY1 overlaps on the sky with VVDS in both the Deep (2h) and Wide (22h) fields compared to only the Deep (2h) field for KV450, and the DESY1 calibration does not include the 23h field of DEEP2 and the GAMAG15Deep sample (Kafle et al. 2018), which are included in the KV450 calibration. Overall, we obtain the DESY1 and KV450 redshift distributions using five and six spectroscopic calibration samples, respectively, four of which are identical^{9}. Note that no shear data from these calibration fields are used in either the KiDS or the DES cosmological analysis, maintaining independence in the measured shear correlation functions from the two surveys.
Figure 2 shows that the spectroscopic calibration shifts DESY1 redshift distributions to higher redshifts compared to the original photo-z recalibration with COSMOS2015, consistent with the findings of H20. Mean redshifts of the four tomographic bins are reported in Table 1 for both cases. The spectroscopically determined distributions peak closer to the center of the corresponding tomographic bins, and contain higher-redshift galaxies. These shifts between the spectroscopically estimated and published DESY1 n(z) are significant because of their coherence, that is, all tomographic bins shift in the same direction. We emphasize that widening the priors on the uncorrelated Δz_{i} nuisance parameters cannot account for such a coherent shift as this is fully degenerate with the cosmological parameters of interest (see the discussion at the end of Sect. 3 in H20).
Table 1. DESY1 mean redshifts of the four tomographic bins calibrated with COSMOS2015 (T18) and spectroscopic redshifts (this work).
In Appendix A, we further explore the robustness of the DIR calibration on mock KV450 and DESY1 catalogs. This analysis motivates the inclusion of a systematic error model in our analysis to account for potential biases in the DIR calibration. If the error model from the mock survey analysis is fully accurate and representative (see the caveats and discussion in Appendix A), the true mean redshifts of the DESY1 tomographic bins can be lowered by approximately 0.01−0.03 compared to the DIR results presented in Fig. 2 and Table 1.
4. Cosmological impact of DESY1 n(z) recalibration and combined constraints with KV450
We now quantify the impact of the spectroscopic calibration of the DESY1 redshift distributions on the cosmological parameter constraints. As it is now on an equal footing with KV450, we moreover perform a combined analysis of the two surveys, shown in Fig. 3.
Fig. 3. Marginalized posterior contours in the S_{8} – Ω_{m} plane (inner 68% CL, outer 95% CL) for KV450 (green), DESY1 following a spectroscopic calibration of the redshift distributions and identical setup to the KV450 analysis (purple), the above combined (pink), and Planck 2018 (orange). 
Following the spectroscopic calibration of the redshift distributions, the DESY1 posterior mean of S_{8} changes by ΔS_{8} = −0.029 compared to using the original redshift distributions, with a marginal (5%) improvement in the S_{8} uncertainty. We verified that this shift in S_{8} is largely recovered by coherently shifting the original DESY1 redshift distributions by the Δz_{i} difference with the spectroscopically calibrated distributions as reported in Table 1 (i.e. changes in the structure of the n_{i}(z) have a subdominant impact on S_{8}). This substantial change in the DESY1 constraint highlights the importance of the redshift calibration. The size of ΔS_{8} corresponds to a 0.8σ shift in terms of the larger DES uncertainty in the KV450 setup, and a 1.1σ shift in terms of the original DESY1 uncertainty quoted in T18. The DESY1 constraint using a KV450 analysis setup and spectroscopically calibrated redshift distributions is different from the Planck 2018 constraint on S_{8} by 1.9σ. The goodness of fit with the spectroscopically calibrated distributions is comparable to that of using the COSMOS2015 distributions (difference in the reduced χ^{2} by 6 × 10^{−3}).
Following the homogenization of the analysis setups, the combined KV450+DESY1 constraint is S_{8} = 0.762_{−0.024}^{+0.025}. This is almost exactly a factor of √2 improvement in precision compared to KV450 and DESY1 on their own. We find a best-fit χ^{2} = 413.4 for 397 degrees of freedom, which corresponds to a reduced χ^{2} of 1.04 and a p-value of 0.27. Using the log ℐ statistic (Joudaki et al. 2017) and Jeffreys’ scale (Jeffreys 1961; Kass & Raftery 1995), we find that KV450 and DESY1 are in ‘strong’ concordance (log ℐ = 1.4), which is an expected outcome given the S_{8} agreement between the two surveys. The KV450+DESY1 constraint is 2.5σ discordant with Planck 2018 (we do not evaluate the log ℐ statistic in this case as the Planck 2018 likelihood is not public). We note that for the cosmological priors used in T18, the combined KV450+DESY1 dataset is even more discordant with Planck. For this case (not shown in Fig. 3), the discordance with Planck 2018 is 3.0σ.
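The quoted goodness of fit can be cross-checked from the χ^{2} and the number of degrees of freedom alone. The sketch below uses the Wilson-Hilferty normal approximation to the χ^{2} distribution, chosen here only to keep the example free of external dependencies. It is an approximation, not the exact survival function, and is not necessarily how the quoted p-value was obtained.

```python
import math

def chi2_pvalue_wh(chi2, dof):
    """Approximate P(X >= chi2) for X ~ chi-square(dof) via Wilson-Hilferty."""
    v = 2.0 / (9.0 * dof)
    # The cube root of chi2/dof is approximately normal with mean 1 - v and variance v.
    zscore = ((chi2 / dof) ** (1.0 / 3.0) - (1.0 - v)) / math.sqrt(v)
    # Upper-tail probability of a standard normal variable.
    return 0.5 * math.erfc(zscore / math.sqrt(2.0))

chi2, dof = 413.4, 397          # values quoted in the text
print(round(chi2 / dof, 2))     # 1.04
print(chi2_pvalue_wh(chi2, dof))
```

The approximate p-value agrees with the quoted 0.27 to within rounding.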
The constraints on the astrophysical degrees of freedom, such as the IA amplitude and redshift dependence, do not change significantly in the combined analysis from either survey independently. This is partly a consequence of our analysis decision to keep the KV450 and DESY1 intrinsic alignment parameters distinct. We further note that the impact of the spectroscopic calibration for DESY1 decreases to ΔS_{8} = −0.017 (from the fiducial ΔS_{8} = −0.029) if a systematic error model for the DIR calibration from our study of mock DES data in Appendix A is included in the analysis. In the appendix, we show that a self-consistent change in the redshift distributions of both DESY1 and KV450, based on our mocks constructed for each survey, results in effectively the same combined KV450+DESY1 constraint on S_{8} as in the fiducial analysis (less than 0.1σ difference). While the inclusion of the DEEP2 sample is critical for the redshift calibration of both KV450 and DESY1 (Wright et al. 2020), the S_{8} constraints from both surveys are robust to a change in the spec-z calibrating fields to the four fields that they have in common. We note that the spectroscopically calibrated source redshift distributions will have a comparable impact on the S_{8} constraint from the DESY1 combined analysis of cosmic shear, galaxy-galaxy lensing, and galaxy clustering (Abbott et al. 2018b).
5. Conclusions
We have performed the first combined analysis of Stage-III cosmic shear surveys with KiDS+VIKING450 and DESY1. In obtaining reliable cosmological results, we homogenized the analysis setups and spectroscopically calibrated the DESY1 source redshift distributions, both of which have a substantial impact on the parameter constraints. We show that the cosmological constraints from KV450 and DESY1 are comparable when analyzed self-consistently over the angular scales advocated by each survey, and that the DESY1 constraint on S_{8} changes downward by 0.8σ when calibrating the redshift distributions using overlapping deep-field spectroscopy. The combined KV450+DESY1 constraint on S_{8} reflects a factor of √2 improvement in precision compared to each survey independently, and is 2.5σ discordant with the Planck CMB temperature and polarization. This increases to 3.0σ when employing the cosmological priors advocated by DESY1, and would only increase further by including smaller-scale DESY1 measurements sensitive to baryonic feedback.
The substantial change in the DESY1 redshift distributions and the corresponding impact on the S_{8} constraint suggests that a similar exercise with HSCY1 data would be valuable, and that a self-consistent combined analysis of all three current cosmic shear surveys may sharpen the tension with Planck 2018 even further. We note that the greater depth of HSC (but also future surveys such as the Legacy Survey of Space and Time, LSST) complicates a direct spectroscopic calibration of the redshift distributions and may instead require other approaches such as the cross-correlation between photometric and spectroscopic galaxies (Newman 2008). Ultimately, the advent of additional data expected for KiDS, DES, and HSC in the coming years along with self-consistent combined analyses of cosmic shear surveys will be crucial for resolving the current tension found with the Planck CMB.
A unified analysis of earlier cosmic shear datasets is performed in Chang et al. (2019).
Our comparisons are against the public chains as the Planck 2018 likelihoods were not publicly released at the time of this work. This is not fully self-consistent given the mostly narrower prior ranges used by Planck (compared to our KV450 and DESY1 runs), but has a negligible impact given the constraining power of the Planck dataset.
In Wright et al. (2020), we show that the change in the estimated redshift distributions from catastrophic spec-z failures in the spectroscopic compilation is negligible.
See also Hartley et al. (2020) who conduct a similar simulated analysis with a different calibration sample.
Acknowledgments
We thank Chris Blake, Pedro Ferreira, Christos Georgiou, Ian Harrison, Olivier Ilbert, Harry Johnston, Nicolas Martinet, Alexander Mead, Chris Morrison, and Mohammadjavad Vakili for useful discussions. We also thank Ian Harrison for help navigating the public DES data. We thank the DES team and in particular Daniel Grün and Michael Troxel for in-depth discussions that led to the inclusion of the simulation results reported in the appendix. We acknowledge the use of CAMB/COSMOMC packages (Lewis et al. 2000; Lewis & Bridle 2002). Author contributions: All authors contributed to the development and writing of this Letter. The authorship list is given in three groups: the lead authors (SJ, HHi, DT), followed by two alphabetical groups. The first alphabetical group includes those who are key contributors to both the scientific analysis and the data products. The second group covers those who have either made a significant contribution to the data products or to the scientific analysis. Part of this work was performed using the DiRAC Data Intensive service at Leicester operated by the University of Leicester IT Services, and DiRAC@Durham managed by the Institute for Computational Cosmology, which form part of the STFC DiRAC HPC Facility (https://dirac.ac.uk) acknowledging BEIS and STFC grants ST/K000373/1, ST/R002363/1, ST/R001014/1, ST/P002293/1, ST/R002371/1, ST/R000832/1. We acknowledge support from the European Research Council under grant numbers 693024 (SJ, DT), 770935 (HHi, AHW), 647112 (CH, MA, TT). SJ and DT acknowledge support from the Beecroft Trust and STFC. HHi is supported by Emmy Noether (Hi 1495/2-1) and Heisenberg grants (Hi 1495/5-1) of the Deutsche Forschungsgemeinschaft. NEC is supported by a Royal Astronomical Society research fellowship. HHo and AK acknowledge support from Vici grant 639.043.512, financed by the Netherlands Organisation for Scientific Research (NWO). KK acknowledges support by the Alexander von Humboldt Foundation.
LM acknowledges support from STFC grant ST/N000919/1. TT acknowledges funding from the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No 797794. We are indebted to the staff at ESO-Garching and ESO-Paranal for managing the observations at VST and VISTA that yielded the data presented here. Based on observations made with ESO Telescopes at the La Silla Paranal Observatory under programme IDs 177.A-3016, 177.A-3017, 177.A-3018, 179.A-2004, 298.A-5015, and on data products produced by the KiDS consortium. This project used public archival data from the Dark Energy Survey (DES). Funding for the DES Projects has been provided by the US Department of Energy, the US National Science Foundation, the Ministry of Science and Education of Spain, the Science and Technology Facilities Council of the United Kingdom, the Higher Education Funding Council for England, the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign, the Kavli Institute of Cosmological Physics at the University of Chicago, the Center for Cosmology and Astro-Particle Physics at the Ohio State University, the Mitchell Institute for Fundamental Physics and Astronomy at Texas A&M University, Financiadora de Estudos e Projetos, Fundação Carlos Chagas Filho de Amparo à Pesquisa do Estado do Rio de Janeiro, Conselho Nacional de Desenvolvimento Científico e Tecnológico and the Ministério da Ciência, Tecnologia e Inovação, the Deutsche Forschungsgemeinschaft, and the Collaborating Institutions in the Dark Energy Survey.
The Collaborating Institutions are Argonne National Laboratory, the University of California at Santa Cruz, the University of Cambridge, Centro de Investigaciones Energéticas, Medioambientales y Tecnológicas-Madrid, the University of Chicago, University College London, the DES-Brazil Consortium, the University of Edinburgh, the Eidgenössische Technische Hochschule (ETH) Zürich, Fermi National Accelerator Laboratory, the University of Illinois at Urbana-Champaign, the Institut de Ciències de l’Espai (IEEC/CSIC), the Institut de Física d’Altes Energies, Lawrence Berkeley National Laboratory, the Ludwig-Maximilians Universität München and the associated Excellence Cluster Universe, the University of Michigan, the National Optical Astronomy Observatory, the University of Nottingham, The Ohio State University, the OzDES Membership Consortium, the University of Pennsylvania, the University of Portsmouth, SLAC National Accelerator Laboratory, Stanford University, the University of Sussex, and Texas A&M University. Based in part on observations at Cerro Tololo Inter-American Observatory, National Optical Astronomy Observatory, which is operated by the Association of Universities for Research in Astronomy (AURA) under a cooperative agreement with the National Science Foundation.
References
Abbott, T. M. C., Abdalla, F. B., Allam, S., et al. 2018a, ApJS, 239, 18
Abbott, T. M. C., Abdalla, F. B., Alarcon, A., et al. 2018b, Phys. Rev. D, 98, 043526
Aihara, H., Armstrong, R., Bickerton, S., et al. 2018, PASJ, 70, S8
Asgari, M., Tröster, T., Heymans, C., et al. 2020, A&A, 634, A127
Audren, B., Lesgourgues, J., Benabed, K., et al. 2013, JCAP, 2013, 001
Balestra, I., Mainieri, V., Popesso, P., et al. 2010, A&A, 512, A12
Benítez, N. 2000, ApJ, 536, 571
Buchs, R., Davis, C., Gruen, D., et al. 2019, MNRAS, 489, 820
Chang, C., Wang, M., Dodelson, S., et al. 2019, MNRAS, 482, 3696
Chisari, N. E., Alonso, D., Krause, E., et al. 2019, ApJS, 242, 2
Crocce, M., Castander, F. J., Gaztañaga, E., et al. 2015, MNRAS, 453, 1513
Drlica-Wagner, A., Sevilla-Noarbe, I., Rykoff, E. S., et al. 2018, ApJS, 235, 33
Edge, A., Sutherland, W., Kuijken, K., et al. 2013, The Messenger, 154, 32
Feroz, F., Hobson, M. P., & Bridges, M. 2009, MNRAS, 398, 1601
Fosalba, P., Crocce, M., Gaztañaga, E., et al. 2015, MNRAS, 448, 2987
Gruen, D., & Brimioulle, F. 2017, MNRAS, 468, 769
Hartley, W. G., Chang, C., Samani, S., et al. 2020, MNRAS, submitted [arXiv:2003.10454]
Hikage, C., Oguri, M., Hamana, T., et al. 2019, PASJ, 71, 43
Hildebrandt, H., Viola, M., Heymans, C., et al. 2017, MNRAS, 465, 1454
Hildebrandt, H., Köhlinger, F., van den Busch, J. L., et al. 2020, A&A, 633, A69
Hoyle, B., Gruen, D., Bernstein, G. M., et al. 2018, MNRAS, 478, 592
Jee, M. J., Tyson, J. A., Hilbert, S., et al. 2016, ApJ, 824, 77
Jeffreys, H. 1961, Theory of Probability, 3rd edn. (Oxford, UK: OUP)
Joachimi, B., Mandelbaum, R., Abdalla, F. B., et al. 2011, A&A, 527, A26
Johnson, A., Blake, C., Amon, A., et al. 2017, MNRAS, 465, 4118
Joudaki, S., Blake, C., Heymans, C., et al. 2017, MNRAS, 465, 2033
Joudaki, S., Blake, C., Johnson, A., et al. 2018, MNRAS, 474, 4894
Kafle, P. R., Robotham, A. S. G., Driver, S. P., et al. 2018, MNRAS, 479, 3746
Kass, R. E., & Raftery, A. E. 1995, J. Am. Stat. Assoc., 90, 773
Kuijken, K., Heymans, C., Hildebrandt, H., et al. 2015, MNRAS, 454, 3500
Laigle, C., McCracken, H. J., Ilbert, O., et al. 2016, ApJS, 224, 24
Le Fèvre, O., Vettolani, G., Garilli, B., et al. 2005, A&A, 439, 845
Le Fèvre, O., Cassata, P., Cucciati, O., et al. 2013, A&A, 559, A14
Lewis, A., & Bridle, S. 2002, Phys. Rev. D, 66, 103511
Lewis, A., Challinor, A., & Lasenby, A. 2000, ApJ, 538, 473
Lilly, S. J., Le Fèvre, O., Renzini, A., et al. 2007, ApJS, 172, 70
Lilly, S. J., Le Brun, V., Maier, C., et al. 2009, ApJS, 184, 218
Lima, M., Cunha, C. E., Oyaizu, H., et al. 2008, MNRAS, 390, 118
Mandelbaum, R. 2018, ARA&A, 56, 393
Mandelbaum, R., Blake, C., Bridle, S., et al. 2011, MNRAS, 410, 844
Masters, D., Capak, P., Stern, D., et al. 2015, ApJ, 813, 53
Mead, A. J., Peacock, J. A., Heymans, C., et al. 2015, MNRAS, 454, 1958
Morrison, C. B., Hildebrandt, H., Schmidt, S. J., et al. 2017, MNRAS, 467, 3576
Newman, J. A. 2008, ApJ, 684, 88
Newman, J. A., Cooper, M. C., Davis, M., et al. 2013, ApJS, 208, 5
Planck Collaboration VI. 2020, A&A, in press, https://doi.org/10.1051/00046361/201833910
Popesso, P., Dickinson, M., Nonino, M., et al. 2009, A&A, 494, 443
Rozo, E., Rykoff, E. S., Abate, A., et al. 2016, MNRAS, 461, 1431
Takahashi, R., Sato, M., Nishimichi, T., et al. 2012, ApJ, 761, 152
Troxel, M. A., MacCrann, N., Zuntz, J., et al. 2018, Phys. Rev. D, 98, 043528
Vanzella, E., Cristiani, S., Dickinson, M., et al. 2008, A&A, 478, 83
Wright, A. H., Hildebrandt, H., Kuijken, K., et al. 2019, A&A, 632, A34
Wright, A. H., Hildebrandt, H., van den Busch, J. L., et al. 2020, A&A, 637, A100
Zuntz, J., Paterno, M., Jennings, E., et al. 2015, Astron. Comput., 12, 45
Appendix A: Tests on MICE2 mock catalogs
We test the spectroscopic DIR calibration described in Sect. 3 on mock catalogs created from the public MICE2 simulation (Fosalba et al. 2015; Crocce et al. 2015). These mock catalogs are similar to the ones used in Wright et al. (2020) and will be described in detail in van den Busch et al. (in prep.) for KV450. Here, we further describe how the mock catalogs are designed to resemble the DESY1 data. It is important to stress that this exercise is not meant to produce fully realistic mock catalogs that resemble the data in all aspects. Rather, it is aimed at producing mock catalogs that are comparable in complexity to the data. As such, the mock catalogs can be used to inform us about the expected size of systematic uncertainties in the DIR calibration.
We first estimate the observed size and shape of each simulated galaxy by taking the semi-major and semi-minor axes reported in the MICE2 catalog and adding the seeing (from Drlica-Wagner et al. 2018) in quadrature. Together with the 10σ limiting magnitudes quoted in Drlica-Wagner et al. (2018) we estimate the noise level of the evolution-corrected model magnitudes. Subsequently, drawing from the corresponding Gaussian distributions, we create a noise realization for each galaxy in each band and recalculate the magnitude uncertainty based on this realization. This yields a catalog of ‘observed’ magnitudes and their errors. We found that treating the limiting magnitudes from Drlica-Wagner et al. (2018) as 10σ limits in this way results in a mock catalog that is too shallow compared to the data, which might be attributed to aperture effects. Deliberately adapting the limits to 12σ yields a good match between data and mocks in terms of the noise level in the four DES bands. We note that we include weak-lensing magnification in all magnitude estimates although this particular aspect has virtually no impact on the results.
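The noise step above can be illustrated with a minimal sketch. It assumes that an Nσ limiting magnitude fixes a single 1σ flux noise per band, and it ignores the size-dependent (aperture) contribution described in the text. The function names, the zeropoint, and the treatment of non-positive fluxes are illustrative assumptions, not the pipeline used for the mocks.

```python
import math
import random


def flux_noise_from_limit(m_lim, n_sigma, zeropoint=30.0):
    """1-sigma flux noise implied by an n_sigma limiting magnitude.

    A source at m_lim has S/N = n_sigma by definition, so the noise is
    its flux divided by n_sigma. The zeropoint is an arbitrary assumption.
    """
    flux_at_limit = 10.0 ** (-0.4 * (m_lim - zeropoint))
    return flux_at_limit / n_sigma


def observe_magnitude(m_true, m_lim, n_sigma=12.0, zeropoint=30.0, rng=random):
    """Draw one Gaussian noise realization and recompute magnitude and error."""
    flux_true = 10.0 ** (-0.4 * (m_true - zeropoint))
    sigma_flux = flux_noise_from_limit(m_lim, n_sigma, zeropoint)
    flux_obs = rng.gauss(flux_true, sigma_flux)
    if flux_obs <= 0.0:
        # Treated as a non-detection in this sketch.
        return None, None
    m_obs = zeropoint - 2.5 * math.log10(flux_obs)
    # Magnitude error recomputed from the noisy flux (first-order propagation).
    m_err = 2.5 / math.log(10.0) * sigma_flux / flux_obs
    return m_obs, m_err
```

Interpreting the same quoted limit as 12σ rather than 10σ lowers the assumed flux noise, which is the direction needed to make a too-shallow mock deeper.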
Subsequently, we match each mock galaxy to its nearest neighbor in the data catalog in four-dimensional magnitude space and assign it the responsivity weight of that galaxy in the data. This yields a properly weighted mock source sample. We run BPZ to estimate photo-z for the mock galaxies using the setup described in Hildebrandt et al. (2020), but restricting the redshift range to that of MICE2 (0.06 < z < 1.4)^{10}. This setup differs slightly from what is done in DESY1 (Hoyle et al. 2018) but the properties of the resulting photo-z (scatter, bias, and outlier rate as a function of photo-z and magnitude) are very similar to what is seen when comparing the DES 4-band photo-z to the combined spec-z sample on the data.
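As an illustration of the weight assignment, the following sketch copies to each mock galaxy the weight of its nearest data galaxy in magnitude space. It assumes a Euclidean distance on the four magnitudes and uses a brute-force search for clarity; a catalog-scale implementation would need a spatial index such as a k-d tree. The function name and inputs are hypothetical.

```python
import math


def assign_weights(mock_mags, data_mags, data_weights):
    """Give each mock galaxy the weight of its nearest data galaxy.

    mock_mags, data_mags: sequences of magnitude tuples (e.g. g, r, i, z).
    data_weights: one weight per data galaxy.
    """
    weights = []
    for mags in mock_mags:
        # Index of the data galaxy closest in magnitude space.
        nearest = min(
            range(len(data_mags)),
            key=lambda j: math.dist(mags, data_mags[j]),
        )
        weights.append(data_weights[nearest])
    return weights
```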
The next crucial step is to select samples from the mock catalog that resemble the spec-z samples used in the DIR analysis presented in Sect. 3. Here, we apply the same selection criteria as the zCOSMOS (i < 22.5; Lilly et al. 2007), VVDS (i < 24; Le Fèvre et al. 2005, 2013), and DEEP2 (R < 24.1 plus color selection; Newman et al. 2013) teams to areas that correspond to the areas sampled by the data. Moreover, we implement the magnitude-dependent and partly also redshift-dependent spectroscopic success rates reported in those papers. Where necessary, we further downsample the catalogs to yield numbers comparable to the data. This is required because the number density as a function of redshift is not identical in the simulation and the real Universe.
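The three selection steps (magnitude cut, success rate, downsampling) might be sketched as follows. The success-rate function is a hypothetical placeholder for the rates reported in the survey papers, and the color selection used for DEEP2 as well as the area matching are omitted.

```python
import random


def select_spec_sample(galaxies, mag_limit, success_rate, n_target, rng):
    """Mimic a magnitude-limited spectroscopic sample drawn from a mock.

    galaxies: list of dicts with keys 'mag' and 'z'.
    success_rate: callable (mag, z) -> probability of a secure redshift;
        a placeholder for the published, survey-specific rates.
    n_target: sample size to match the real spec-z catalog.
    """
    # 1. Targeting: apply the survey magnitude limit.
    targeted = [g for g in galaxies if g["mag"] < mag_limit]
    # 2. Keep each target with its spectroscopic success probability.
    observed = [
        g for g in targeted
        if rng.random() < success_rate(g["mag"], g["z"])
    ]
    # 3. Downsample if the mock is denser than the data.
    if len(observed) > n_target:
        observed = rng.sample(observed, n_target)
    return observed
```

For a zCOSMOS-like sample one would call this with `mag_limit=22.5` on the i-band magnitudes.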
We find that the fiducial DEEP2 color selection yields a redshift distribution that looks somewhat different from the one in the data. This is probably due to the fact that galaxy colors in MICE2 are not fully realistic, especially at high redshift. Inspecting the location of those high-z galaxies that are supposed to be targeted by DEEP2 in B − R versus R − I color space, we slightly adapt the selection criteria to account for the different colors of MICE2 galaxies. This yields a better match to the observed spectroscopic redshift distribution. In the end, this adaptation does not have a strong influence on the results, as we find by running tests with the original as well as the adapted cuts.
We select tomographic bins from the MICE2 realization of the DESY1 data and calibrate those with the DIR method using the mock spec-z samples described above. Comparing the true mean redshifts of the galaxies in those four tomographic bins to the ones estimated from DIR on the mocks yields offsets of Δz_{1} = ⟨z_{1}⟩_{True} − ⟨z_{1}⟩_{DIR} = −0.026, Δz_{2} = −0.021, Δz_{3} = −0.033, and Δz_{4} = −0.012 (see also Table A.1). The exact values depend somewhat on the exact definition of the mock spec-z sample.
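To make the comparison concrete, the sketch below implements a generic k-nearest-neighbor reweighting in the spirit of Lima et al. (2008) and then evaluates Δz = ⟨z⟩_True − ⟨z⟩_DIR. The choice of k, the Euclidean metric, and the brute-force search are assumptions made for illustration and are not taken from the analysis in Sect. 3.

```python
import math


def dir_weights(spec_mags, phot_mags, k=5):
    """Weight each spec-z galaxy by the local photometric-to-spectroscopic
    density ratio in magnitude space (brute force, illustrative only)."""
    weights = []
    for i, s in enumerate(spec_mags):
        # Distance to the k-th nearest spectroscopic neighbor (self excluded).
        d_spec = sorted(
            math.dist(s, t) for j, t in enumerate(spec_mags) if j != i
        )
        radius = d_spec[k - 1]
        # Number of photometric galaxies inside the same volume.
        n_phot = sum(1 for p in phot_mags if math.dist(s, p) <= radius)
        weights.append(n_phot / k)
    return weights


def weighted_mean(values, weights):
    """Weighted mean of a list of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def delta_z(z_true, w_true, z_spec, w_dir):
    """Offset between the true and the DIR-estimated mean redshift of a bin."""
    return weighted_mean(z_true, w_true) - weighted_mean(z_spec, w_dir)
```

With this sign convention, a negative Δz means that the DIR estimate lies above the true mean redshift of the bin.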
Table A.1. KV450 and DESY1 changes in the mean redshift for each tomographic bin informed by the MICE2 mock catalogs (i.e. Truth – DIR_{MICE2}).
These results indicate that we might overestimate the true redshifts of the tomographic bins and hence underestimate S_{8}, an effect opposite to, albeit smaller than, the one seen when replacing the original DES n(z) with our spectroscopic recalibration. This could be attributed to the color preselection of DEEP2 in combination with a magnitude space that is limited to four dimensions, such that the photometric information from the DES griz filters alone is not capable of accurately breaking color-redshift degeneracies and downweighting the high-z DEEP2 galaxies. This problem was already noted in Gruen & Brimioulle (2017) using a similar technique. While MICE2 shows an impressive similarity to the real Universe, there is certainly the caveat that the simulation is limited to z < 1.4. The modeling of high-z tails is therefore not possible. The issue with the colors of high-z galaxies further illustrates the limitations of such a mock. Whether these results hold with a mock catalog extending farther in redshift remains to be seen and will be investigated in future work^{11}.
The DESY1 biases are comparable in size to what was found on very similar mocks resembling the KV450 data, as reported in Wright et al. (2020) and Table A.1. While the DIR calibration on 9-band KV450 data should be less prone to systematic uncertainties than the one on the 4-band DESY1 data, we suppose that the greater depth of KV450 complicates the calibration and leads to biases of the same order. Despite these limitations, the biases found in the mock analysis give an indication of the systematic error inherent in the DIR calibration with typical spectroscopic catalogs. In order to address this concern, we run another cosmological parameter analysis where we apply these shifts by centering the priors on the Δz_{i} parameters on these values instead of zero. As an uncertainty we use the standard deviation over 100 lines of sight in the mocks but multiply this by an arbitrary factor of two to account for limitations in the simulation.
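The construction of these priors can be sketched as below, assuming a Gaussian prior per tomographic bin whose mean is the mock offset and whose width is twice the scatter over lines of sight. The use of the sample standard deviation and the function names are assumptions for illustration.

```python
import math
import statistics


def mice2_prior(center, delta_z_per_los, inflation=2.0):
    """Return (mean, sigma) of a Gaussian prior on one delta-z parameter.

    center: offset found in the mock analysis for this bin.
    delta_z_per_los: offsets measured on the individual lines of sight.
    inflation: factor applied to the scatter (two in the text above).
    """
    width = inflation * statistics.stdev(delta_z_per_los)
    return center, width


def log_gaussian_prior(x, center, width):
    """Log-density of the Gaussian prior evaluated at x."""
    return (
        -0.5 * ((x - center) / width) ** 2
        - math.log(width * math.sqrt(2.0 * math.pi))
    )
```

For the first DESY1 bin, for example, `center` would be −0.026, whereas the fiducial analysis centers the prior on zero.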
Following this approach, we obtain updated constraints on S_{8} for KV450, for DESY1, and for KV450+DESY1 (as shown in Fig. A.1). These constraints correspond to changes of ΔS_{8} = [−0.014, +0.012, −0.002] and Δχ^{2} = [0.52, 2.0, 2.8]^{12} relative to our fiducial results in the main body of the paper for KV450, DESY1, and KV450+DESY1, respectively. Given the comparable size of the applied MICE2 and bootstrap errors on the Δz_{i} parameters, we do not find significant differences in the size of the S_{8} uncertainties (the largest difference corresponds to a 2% increase in the uncertainty).
Fig. A.1. Marginalized posterior contours in the S_{8} − Ω_{m} plane (inner 68% CL, outer 95% CL) following an alternative analysis of the cosmic shear datasets with MICE2 priors on the Δz_{i} parameters. We show KV450 in green, DESY1 in purple, KV450+DESY1 in pink, and Planck 2018 in orange. 
We note that the KV450 constraint on S_{8} shifts toward lower values despite substantial negative Δz_{i} shifts in the first three tomographic bins. This is explained by the greater constraining power of the higher-redshift fourth and fifth bins, which exhibit positive shifts in their mean redshifts. The change in S_{8} is −0.017 for DESY1 relative to the COSMOS2015-calibrated redshift distributions, which corresponds to a 0.5σ shift in terms of the larger DESY1 uncertainty in the KV450 setup (as compared to the fiducial shift of 0.8σ; we note that the significance of both shifts increases in terms of the original DESY1 uncertainty). In other words, the MICE2 mocks suggest that the redshift distributions from the pre-revised DIR in the case of KV450 and from COSMOS2015 in the case of DESY1 both result in an overestimated posterior mean of S_{8} (by 0.014 and 0.017, respectively). Here, the concordance in the MICE2-revised S_{8} constraints of KV450 and DESY1 is at the 1.1σ level (as compared to the stronger fiducial concordance in S_{8} of 0.6σ), while the combined KV450+DESY1 constraint on S_{8} remains effectively unchanged, shifting by 0.1σ relative to the fiducial result.
In summary, this analysis on the MICE2 mocks illustrates the importance of realistic mock catalogs for future analyses of weak-lensing surveys. In particular, a very wide redshift range is desirable to properly account for photo-z outliers in the systematic error estimates.