Complementary cosmological simulations

Context. Cosmic variance limits the accuracy of cosmological N -body simulations, introducing bias in statistics such as the power spectrum, halo mass function, or the cosmic shear. Aims. We provide new methods to measure and reduce the e ﬀ ect of cosmic variance in existing and new simulations. Methods. We ran pairs of simulations using phase-shifted initial conditions with matching amplitudes. We set the initial amplitudes of the Fourier modes to ensure that the average power spectrum of the pair is equal to the cosmic mean power spectrum from linear theory. Results. The average power spectrum of a pair of such simulations remains consistent with the estimated nonlinear spectra of the state-of-the-art methods even at late times. We also show that the e ﬀ ect of cosmic variance on any analysis involving a cosmological simulation can be estimated using the complementary pair of the original simulation. To demonstrate the e ﬀ ectiveness of our novel technique, we simulated a complementary pair of the original Millennium run and quantiﬁed the degree to which cosmic variance a ﬀ ected its the power spectrum. The average power spectrum of the original and complementary Millennium simulation was able to directly resolve the baryon acoustic oscillation features.


Introduction
Large-scale galaxy surveys are important probes to test the standard Lambda Cold Dark Matter (ΛCDM) model of cosmology in addition to alternative cosmologies.Past surveys, such as the Automated Plate Measurement (APM) galaxy survey (Maddox et al. 1990;Baugh & Efstathiou 1994), 2dF (Percival et al. 2001), Two Micron All-Sky Survey (2MASS) (Allgood et al. 2001), Sloan Digital Sky Survey (SDSS) (Tegmark et al. 2004), Dark Energy Survey (DES) (The Dark Energy Survey Collaboration 2005), Dark Energy Spectroscopic Instrument (DESI) (DESI Collaboration et al. 2016), and the Panoramic Survey Telescope and Rapid Response System (Pan-STARRS) (Chambers et al. 2016), have provided valuable data regarding the evolution of the matter in our Universe.The upcoming Rubin (LSST Science Collaboration et al. 2009), Roman (Green et al. 2012), Spectro-Photometer for the History of the Universe, Epoch of Reionization, and ices Explorer (SPHEREx) (Doré et al. 2014) and Euclid (Tutusaus et al. 2020) surveys will greatly extend these observations by mapping unprecedented volumes with previously unseen etendue.Constraints on cosmological parameters are derived by comparing observations and theoretical predictions using statistical quantities such as the power spectrum, halo mass function, or the cosmic-shear two-point correlation function.Cosmological N-body simulations are widely used for calculating these quantities at late times (e.g., Valdarnini & Borgani 1991;Rimes & Hamilton 2006;Jenkins et al. 2001;Casarini et al. 2012).To answer the demands of the ©2022.All rights reserved.large surveys, simulations were run to calculate the nonlinear evolution of the 3D matter distribution inside large volumes with high precision over the years, such as the Millennium run (Springel et al. 2005), Millennium-II (Boylan-Kolchin et al. 2009), Bolshoi (Klypin et al. 2011), Millennium-XXL (Angulo et al. 2012), MultiDark (Klypin et al. 2016), the Euclid flagship simulation (Potter et al. 2017), and the Outer Rim Simulation (Heitmann et al. 2019).The results of these simulations have been used in hundreds of research projects and are still regularly used today.
Both observations and cosmological simulations are affected by cosmic variance, because they are both only sampling a finite volume of the cosmic density field (Driver & Robotham 2010;Moster et al. 2011;Schneider et al. 2016).The most straightforward method to estimate the effects of cosmic variance is to use an ensemble of cosmological simulations such as the LasDamas (McBride et al. 2009), the Indra (Falck et al. 2021) or the ABACUSSUMMIT (Maksimova et al. 2021) simulation suites.By comparing the results of hundreds of simulations, the cosmic mean and variance can be directly calculated for a given quantity.Paired (Pontzen et al. 2016) and paired-and-fixed (Angulo & Pontzen 2016) initial conditions can effectively reduce the cosmic variance in simulations.Detailed descriptions of these initial conditions are presented in Sect.2.1 and Sect.2.2, respectively.By running just two simulations, the results of the paired-and-fixed initial conditions method can closely match the average of the ensembles of 300 simulations for density distribution, power spectrum, and bispectrum.Although the statistics of the average dark matter clustering are precise in these simulations, the covariances from paired-and-fixed simulations are suppressed, and as a consequence, these cannot be used for generating mock galaxy catalogs as shown in Klypin et al. (2020).Harnois-Déraps et al. (2019) pointed out that reducing the effects of cosmic variance on the cosmic shear covariance calculation is possible with matched pairs of simulations.
Our aim is to extend the techniques shown above by proposing alternative ways to generate initial conditions for cosmological simulations.The outline of this paper is as follows: In Sect.2, we overview the traditional algorithm used to generate initial conditions and propose modifications to this in order to reduce the cosmic variance.Then, in Sect.3, we compare the different simulations that started from the modified initial conditions.In Sect.4, we demonstrate our new best performing method on the original Millennium simulation by running its complementary pair.Using this new simulation, we estimate the effect of cosmic variance on the original Millennium power spectrum.Finally, we summarize our results.

Initial conditions for cosmological simulations
The main goal of initial-condition creation is to represent a δ(x) = ρ(x)/ρ − 1 overdensity field with discrete particles.For cosmological simulations, this field should be consistent with the P target (k) initial power spectrum.The first step in this process is to generate the Fourier transform of the overdensity field (Hockney & Eastwood 1988;Sirko 2005): As this is a complex field, it can be written as and it should satisfy the usual Hermitian constraints, because its inverse Fourier transform is a real field.For a random Gaussian realization of a target P target (k) initial power spectrum, a(k) and b(k) should be drawn independently from a Gaussian distribution (Klypin & Holtzman 1997) for every resolved k wavenumber vector.Equivalent to this is to generate the magnitude A(k) from the Rayleigh distribution and the corresponding phase ϕ(k) is chosen randomly from a uniform distribution on a range between 0 and 2π independently (Sirko 2005), where σ(k) 2 = V P target (k)/(2π) 3 , V is the volume of the simulation, and k = |k|.After the initial field is generated in Fourier space, the δ(x) overdensity field is calculated using the inverse Fourier transform.In the last step of the initialcondition generation, this Eulerian density is transformed into a Lagrangian representation of particles using the Zel'dovich approximation (Zel'dovich 1970) or the second-order Lagrangian perturbation theory (2LPT) (Crocce et al. 2006;Jenkins 2010).
The power spectrum of the traditional initial conditions, will not be equal to P target (k), because the initial amplitudes are drawn from a Rayleigh distribution, and this is the main source of the cosmic variance in simulations.In this article, we use |k |=k to denote the average value of the |k | = k surface in Fourier space.In the rest of this paper, we discuss modifications to this traditional method and show how these modifications can reduce the effects of cosmic variance in cosmological N-body simulations.

Inverted initial conditions
This method was proposed by Pontzen et al. (2016) and can be used to create a density-inverted counterpart to an existing initial condition.This can be achieved by running the same initial condition generator again with the same parameters and random seed, but with ϕ(k) phases shifted by π.As a consequence of this phase shift, the initial δ(x) overdensity fields are inverted in the counterpart, that is, underdensities are substituted for overdensities and vice versa.The simulations that start from the pairs of these initial conditions are called Paired simulations.The power spectrum of the original and the inverted initial condition are the same, because they only differ in the phases.Despite the fact that the pair have the same initial power spectrum, the cosmic variance can still be reduced by averaging the results of paired simulations, as the averaging cancels phase correlations that emerge from the late-time nonlinear evolution of the density field (Villaescusa-Navarro et al. 2018).

Fixed and paired-and-fixed initial conditions
The next step in reducing cosmic variance was developed by Angulo & Pontzen (2016).In their approach, the amplitudes of the initial conditions are fixed to the expected mean value by setting The simulations that start from this initial condition are called Fixed simulations.A further reduction of the cosmic variance can be achieved by combining this technique with the paired simulation method described in Sect.2.1 (Angulo & Pontzen 2016).These paired-and-fixed (PF) simulations can reduce the variance of the power spectrum by factors as large as 10 6 .A detailed description of the statistical properties of the paired and PF simulations can be found in Villaescusa-Navarro et al. (2018).
Although the PF method is very effective in reducing the variance of the power spectrum, the halo mass function, and the bispectrum, Klypin et al. (2020) found that the covariances are significantly suppressed, in contrast to the series of traditional simulations with normal Gaussian initial conditions.The main source of this bias is the fact that the initial conditions are missing the random amplitude fluctuations.As this is a well known and very effective method for reducing the cosmic variance, we use the PF simulations as a reference when we compare the different power spectra in this paper.

Matched pairs
An independent approach was used by Harnois-Déraps et al.
(2019) to reduce cosmic variance in cosmic-shear covariance calculations.These authors generated a large ensemble of independent initial conditions and chose two from the set that had an average that was closest to P target (k).This pair of initial conditions was used as the starting point for a pair of simulations.Unlike the Paired simulations described in Sect.2.2, the effects of phase correlations are not canceled out during the averaging of these simulations, as the ϕ(k) phases are completely independent in these pairs.

Paired-and-mean initial conditions
The first new method we propose in this article is the pairedand-mean (PM) initial conditions.According to this method, the initial δ(k) field is generated in the traditional way in the first step, but the initial amplitudes are scaled by where A PM (k) are the new PM amplitudes used in the initial condition, and A(k) are the amplitudes generated with the traditional technique.This transformation guarantees that the initial power spectrum is equal to P target (k), while there are still fluctuations in the amplitudes.We propose that these simulations be run in inverted pairs in order to reduce the effect of phase correlations.
In practice, PM initial conditions can be generated using traditional initial-condition generators such as NGen_IC (Springel 2015) or 2LPTic (Crocce et al. 2012(Crocce et al. , 2006) ) using the following target power spectrum: where P IC (k) is the power spectrum of a traditional initial condition generated with the same code, parameters, and random seed.As the Zel'dovich or 2LPT approximations also have an effect on the final power spectrum (Crocce et al. 2006), P IC (k) should be calculated from the generated particle distribution.

Complementary initial conditions
Our aim is to use pairs of simulations as part of this new technique, where the initial condition of the first simulation is generated in the traditional way, while the second has modified initial amplitudes and phases.We refer to the former as the "original" simulation, and the latter as the "complementary" simulation.We choose the initial A C (k) values in the complementary initial condition so that the average of the two initial P(k) matches the P target (k) power spectrum.This constraint can be written as where A C (k) is the amplitude in the complementary initial condition at k wavenumber.This can only be satisfied for the k wavenumbers, where must be a positive.Although this cannot be done for the entire |k| = k surface, this technique can lead to a significant reduction in the variance of the power spectrum by compensating amplitudes that satisfy this criterion.There are an infinite number of A C (k) fields that satisfy Eq. 9 when Eq. 10 is true.In this study, we chose to generate these fields with the following formula: For the modes that cannot be compensated, instead of setting them at zero, we chose in order to avoid introducing any strong unwanted beat-coupling effects (Hamilton et al. 2006).The advantage of this choice is that the complementary initial condition can be generated with the usual codes simply by setting the target power spectrum to with the same random seed and parameters, but shifted phases.With this new method, it is possible to provide complementary simulations for existing simulations.By averaging the results of these pairs, a significant reduction in the cosmic variance can be expected, which is similar to the results of the PF simulations.
We call these pairs original-complementary (O-C) pairs.It is possible to generate initial conditions without a phase shift, and these are called original-complementary amplitude (O-CA) pairs.

Comparing the different simulation methods
To test the effects of the new initial-condition-generation techniques, we ran four sets of cosmological N-body simulations.A summary of the simulations is provided in Table 1.All simulations were run with GADGET-4 (Springel et al. 2021), and two had the Planck 2018 cosmological parameters (Planck Collaboration et al. 2020), while two used the parameters of the Millennium run (Springel et al. 2005).In the L box = 260.0Mpc/hand L box = 800.0Mpc/hsets of simulations, there were a few modes where Eq. 10 was not satisfied: only one mode in the L box = 260.0Mpc/hset and two modes in the L box = 800.0Mpc/hseries.To compare the power spectra of the different simulation techniques, we used the PF power spectrum as a reference in the L box = 260.0Mpc/hand L box = 800.0Mpc/hsimulation sets, and calculated the Fig. 1.Reduced chi-squared statistics of the power spectrum.If this quantity is close to one, the power spectrum of the given method is consistent with the PF method.While the original simulation becomes inconsistent with the cosmic average power spectrum at late times, the average power spectrum of the PM and O-C simulations is a good match to the PF results.We used all available k ≤ 1.5h/Mpc modes when we calculated these χ 2 ν (z) functions.
reduced χ 2 for each M method, where ν is the total number of k i bins, ∆N m,i is the number modes per bin, and is the expected statistical error originating from the sample variance (Schneider et al. 2016).The choice of binning has a small impact on the calculated χ 2 ν (z) quantity, as the contribution of each bin in Eq. 15 is weighted by ∆N m,i .The calculated reduced chi-squared statistic as a function of the redshift is plotted in the top panel of Fig. 1.The same quantity was calculated for the L = 800Mpc/h simulation set, and this can be seen in the bottom panel of the same figure.Although the original simulations in both sets were close to χ 2 O 1 at early times, this value increased quickly as the matter field evolved.The power spectra from the P and O-CA simulations perform significantly better than the original simulations, but their χ 2 values are still above 1 at a redshift of z = 0.As expected, the PM and O-C simulations perform best during these tests: the χ 2 PM and χ 2 O−C values remain below 1 throughout the entire simulated redshift range.An addi-tional visualization of the effect of the new method on the power spectrum is shown in Fig. 2, which shows the ratio of the O-C pair and the PF power spectrum for the L = 800Mpc/h simulations with 1% and 1σ deviations.The complementary simulation method significantly reduces the cosmic variance of the power spectrum at all simulated redshifts, and has a sub-percent accuracy for all compensated modes, even at the largest resolved scales.
The final simulation set contains a single O-C pair.Each simulation in the pair has L = 500Mpc/h linear size with 10 billion particles and requires 0.74 million CPU hours to run.We used this set in Sect. 4 to measure the differences between two independent O-C realizations of the same cosmology.We call this simulation pair "NewMillennium".

Complementary Millennium-run
To further demonstrate the effectiveness of the O-C pair method in reducing cosmic variance, we planned to run the complementary pair of the original Millennium run.This dark matter only TreePM simulation contains 2160 3 particles in a periodic box with a linear size of L box = 500M pc/h, and follows the evolution of cosmic structures from redshift z = 127 to z = 0 with more than 10 000 time steps.The cosmological parameters are the following: Ω m = 0.25, Ω b = 0.045, h = 0.73, Ω Λ = 0.75, n = 1, and σ 8 = 0.9.As using the same Lean-Gadget-2 code (Springel et al. 2005;Springel 2005) that was used to run the original simulation is not possible on modern systems, we chose to use the more recent Gadget-4 for this task.To estimate the effects of using a different simulation code, we re-simulated the small volume version of the Millennium run called "Milli-Millennium" with Gadget-4.We find a percent-level discrepancy in the power spectrum at the final state between the results of the LeanGadget-2 and Gadget-4 simulations.We conclude that these differences emerge from a restart of the original simulation, and from some unknown differences in the default setups for the transition between the particle mesh and tree force calculations.As we were unable to minimize the effects emerging from using different versions of the same code, we re-simulated the original Millennium run with Gadget-4.We used the resources of the Texas Advanced Computing Center (TACC) to run the Millennium simulation and its complementary pair.
During the initial condition creation of the complementary pair, we were able to compensate all modes of the original initial condition, because all k i bins satisfy Eq. 10.The mass distributions of both simulations are plotted in Fig. 3.The dimensionless power spectrum of both simulation and the estimated cosmic average power spectrum can be seen in Fig. 4. To further show that the new simulation method produces consistent results, we  compared the power spectrum of the Millennium complementary pair with the average power spectrum of the NewMillennium simulation set in Fig. 5. Using the complementary pair, we estimated the bias of the power spectrum of the original Millennium run due to the initial sampling variance and mode coupling by calculating the weighted mean power spectrum ratio where P mill (k) is the Millennium power spectrum, and P OC (k) is the averaged O-C power spectrum.We find that the original Millennium run, on average, underestimated the power spectrum by 0.997% for k ≤ 1.0h/Mpc scales and by 0.0881% for 50h/Mpc ≥ k > 1.0h/Mpc wavenumbers at the z = 0 redshift.While by itself the original simulation is not large enough to resolve the baryonic wiggles in the power spectrum due to the sample variance at the low-k modes, the average of the two new complementary simulations is able to effectively follow the evolution of these baryon acoustic oscillation (BAO) features, as it can be seen in Fig. 6 with the linear and estimated nonlinear spectrum calculated by Code for Anisotropies in the Microwave Background (CAMB; Lewis et al. (2000)).We calculated similar results from the original simulation, but this was only achievable by rescaling the linear initial power spectrum by the estimated scale-dependent nonlinear growth function of the Millennium run.
The new Millennium run and its complementary pair are available to the public on the SciServer platform (URL: https: //www.sciserver.org/)hosted by the Institute for Data Intensive Engineering and Science at the Johns Hopkins University (Taghizadeh-Popp et al. 2020).We made all particle data, halo (Davis et al. 1985) and subhalo (Springel et al. 2001) catalogs, and power spectra available here.

Conclusions
1. We reviewed the traditional initial condition generation method, and propose two new techniques for pairs of simulations with reduced cosmic variance: the paired-and-mean and the complementary simulation method.2. We compared the power spectrum calculated from the new methods with the paired-and-fixed method, and show that the paired-and-mean and original-complementary pairs produce similar results to the paired-and-fixed method.3. A covariance estimation is left to future work, but based on the fact that the initial amplitudes are not fixed in the new methods, we can expect more precise results than those achievable with the paired-and-fixed method.4. We show that complementary pairs for existing simulations can be generated, and the sample-variance errors of the original run can be estimated with this pair due to the fact that structures are evolving in the opposite way in the new simulation.5. To demonstrate the effectiveness of the originalcomplementary method, we generated the complementary pair of the Millennium-run.Using this new simulation, we show that the original simulation underestimates the power spectrum at all scales at the sub-percent level.The average power spectrum of these two simulations is able to directly resolve the BAO features of the power spectrum.
In this paper, we demonstrated that our new methods can effectively reduce the cosmic variance in N-body simulations.These new methods will be useful in making predictions for future surveys, testing cosmological models, and estimating errors of independent cosmological simulations.Fig. 6.Power spectra of the dark matter density field in the Millennium run and its complementary pair at the BAO scales.All data points have been divided by a linear dark matter-only power spectrum.While the original Millennium simulation cannot resolve these scales by itself due to sample variance, the average spectrum of the original and complementary run can efficiently follow the evolution of the BAO features and matches well at the linear scales with the linear CAMB power spectra.

Fig. 2 .
Fig. 2. Ratio of the O-C and PF power spectra in the L = 800Mpc/h simulation set.The red shaded regions show the ranges where uncompensated modes were present in the complementary simulation.Although these modes were not fully compensated in the average, we included these in the χ 2 ν (z) statistics plotted in Fig. 1.

Fig. 3 .
Fig. 3. Dark matter density field of the Millennium run (left) and its complementary pair (right) at z = 0 redshift.The bottom row is a zoom onto the region outlined by the red boxes in the figures in the top row.As the ϕ(k) phases are shifted by π in the initial condition, a region that collapses to a halo in the original Millennium run tends to expand into a void in the corresponding complementary pair and vice versa, which is similar to the paired simulation method.

Fig. 5 .
Fig.5.Estimated standard deviation (top) and bias (bottom) of the Millennium simulation and its complementary pair.We used the NewMillennium simulation pair as a reference.For this plot, we used 1100 wave number bins with ∆k = 9 • 10 −3 h/Mpc bin size in 0.0125h/Mpc < k < 10.0h/Mpc range.While individually the two simulations show significant variance in this wave number range and binning, the average of the pair matches the independent NewMillennium average power spectrum with sub-percent accuracy.

Table 1 .
Parameters of simulations