Validating the Fisher approach for stage IV spectroscopic surveys

In recent years forecasting activities have become a very important tool for designing and optimising large scale structure surveys. To predict the performance of such surveys, the Fisher matrix formalism is frequently used as a fast and easy way to compute constraints on cosmological parameters. Among them lies the study of the properties of dark energy which is one of the main goals in modern cosmology. As so, a metric for the power of a survey to constrain dark energy is provided by the Figure of merit (FoM). This is defined as the inverse of the surface contour given by the joint variance of the dark energy equation of state parameters $\{w_0,w_a\}$ in the Chevallier-Polarski-Linder parameterisation, which can be evaluated from the covariance matrix of the parameters. This covariance matrix is obtained as the inverse of the Fisher matrix. Inversion of an ill-conditioned matrix can result in large errors on the covariance coefficients if the elements of the Fisher matrix have been estimated with insufficient precision. The conditioning number is a metric providing a mathematical lower limit to the required precision for a reliable inversion, but it is often too stringent in practice for Fisher matrices with size larger than $2\times2$. In this paper we propose a general numerical method to guarantee a certain precision on the inferred constraints, like the FoM. It consists on randomly vibrating (perturbing) the Fisher matrix elements with Gaussian perturbations of a given amplitude, and then evaluating the maximum amplitude that keeps the FoM within the chosen precision. The steps used in the numerical derivatives and integrals involved in the calculation of the Fisher matrix elements can then be chosen accordingly in order to keep the precision of the Fisher matrix elements below this maximum amplitude...


Introduction
Since the discovery of the acceleration of the expansion of the Universe in the late 90s (Riess et al. 1998;Perlmutter et al. 1999), the ΛCDM model remains the most successful model in cosmology. It provides a simple and accurate description of the properties of the Universe, with a very limited number of parameters that are now constrained at the few percent level using measurements from the Planck CMB mission (Planck Collaboration: Aghanim et al. 2018) and other experiments (see e.g. Percival et al. (2001); Blake et al. (2011); Alam et al. (2017); Abbott et al. (2018)). The accelerated expansion of the Universe can be explained by postulating a new form of matter or mechanism dubbed "dark energy", this component coming in addition to the dark matter. Dark energy dominates the energy budget of the Universe today, making up for ∼ 70% of its total energy density.
During the past two decades a variety of dark energy experiments have been proposed in order to study the observed cos-mic acceleration of the Universe through various probes: optical spectroscopic and photometric galaxy clustering, weak gravitational lensing, supernovae, galaxy clusters being the most commonly used. In order to quantify the performance of a survey and in particular its ability to constrain the properties of dark energy, the Dark Energy Task Force (DETF) defined a metric, the figure of merit (FoM, Albrecht et al. 2006): a figure inversely proportional to the surface bounded by the confidence contours for the w 0 and w a parameters from the Chevalier-Polarski-Linder (CPL) parameterisation (Chevallier & Polarski 2001;Linder 2003). The FoM is now standardly used to quantify the performance of a particular experiment in constraining the dark energy parameters, We are currently entering the era of high-precision cosmology with a suite of stage IV experiments about to be deployed. Currently ongoing and forthcoming ground based experiments as well as space missions include DESI 1 (DESI Collaboration: Aghamousa et al. 2016a), Euclid 2 (Laureijs et al. 2011;Amendola et al. 2018;Euclid Collaboration: Blanchard et al. 2019), LSST 3 (LSST Science Collaboration: Abell et al. 2009;Ivezić et al. 2019;Chisari et al. 2019), WFIRST 4 (Akeson et al. 2019;Dore et al. 2019), and the SKA 5 (Square Kilometre Array Cosmology Science Working Group: Bacon et al. 2020). Forecasting activities are the key for predicting future constraints on cosmological parameters and the corresponding dark energy FoM. Indeed, forecasts not only allow to predict the performance of a future survey but also help designing and optimising it.
There exist different ways to compute forecasts. For instance one can use a simulated realization of the data and use a Monte Carlo Markov Chain (MCMC) method to sample the likelihood of the parameters of interest. This approach is known to provide reliable estimations of the constraints on the cosmological parameters (see e.g. Dunkley et al. 2005). However, it is often quite time consuming, especially if one wishes to explore a wide variety of cases. The Fisher matrix formalism as adopted by the DETF, is a computationally fast approach, which allows to get constraints in a few seconds or minutes. However, this method is technically touchy as the covariance matrix is obtained by inversion of the Fisher matrix, which coefficient calculations often require the computation of derivatives and integrals which can lead to inaccurate results, e.g. if the step sizes (as shown in later sections) are not chosen properly to ensure the appropriate precision. Moreover, the Fisher formalism can miss contour shapes and thereby potential degeneracies (see, for example Wolz et al. 2012) as it assumes a Gaussian likelihood in frequentist approach or a Gaussian posterior of the parameters in Bayesian approach; that might not be true if, in particular, parameters are not tightly constrained by the experiment(s). Several methods have been proposed to handle these complex situations (Joachimi & Taylor 2011;Sellentin et al. 2014;Sellentin 2015;Amendola & Sellentin 2016).
In the present study we propose a numerical approach that is independent of the specific problem treated, with the purpose to test the numerical reliability of constraints obtained within the Fisher matrix formalism. As a working example, we consider the spectroscopic galaxy clustering probe for the following stage IV surveys: DESI, Euclid and WFIRST-2.4. The paper is organised as follows: in Section 2 we give a brief description of the aforementioned surveys. In Section 3 we describe the different tests performed in order to validate the Fisher approach: namely, we determine the required precision to compute a valid Fisher matrix, carry out several stability and convergence tests, and perform a comprehensive comparison with the MCMC approach. In Section 4 we present the results and address the following question: How can one numerically validate the Fisher matrix approach? Finally, we report our conclusions in Section 5.

Euclid
Euclid is a medium-class ESA mission with the main objective to understand the Universe's expansion, dark energy, and dark matter. Euclid will measure the shape of more than one billion galaxies and provide tens of millions of spectroscopic redshifts. Euclid will use two main cosmological probes, namely galaxy clustering and weak gravitational lensing. These will allow to probe the dark sector of the Universe with unprecedented precision. Euclid (Laureijs et al. 2011) is targeting a FoM of 400 and is also aiming to distinguish between different theories of gravity, i.e. testing general relativity against alternative models, by measuring the exponent of the growth factor γ with an precision better than 0.02 at 1σ. Euclid will also measure the neutrino masses with a 1σ precision better than 0.03 eV. Combined with Planck, Euclid will also probe some inflation models through the measurement of the non-Gaussianity of initial conditions. Two surveys are planned: a wide one of 15, 000 deg 2 and a deep one of 40 deg 2 . The 1.2m telescope contains two instruments that will allow to observe around 2 billion galaxies and produce 50 million precise redshift measurements. Euclid's large format visible imager (VIS), will probe the photometric galaxy clustering in the optical wavelength range. The near-infrared imaging and spectroscopy (NISP) instrument will perform photometry imaging and produce spectra in the near infrared range.

DESI (Dark Energy Spectroscopic Instrument)
Known as the successor of the stage III BOSS and eBOSS surveys (Dawson et al. 2013;Dawson et al. 2016), DESI is a Stage IV ground-based dark energy experiment whose primary objectives are measuring the baryonic acoustic oscillations (BAO) and constraining the growth of structure through redshift space distortions (RSD) measurements. DESI will allow to estimate the deviations from Gaussianity through the f NL parameter. DESI will also measure the neutrinos masses with a precision of 0.02 eV at 1σ, for a maximum scale k max < 0.2 h.Mpc −1 . DESI will scan a 14, 000 deg 2 sky area and measure more than 30 million spectra from 4 galaxy populations: the bright galaxies (BGS) at low redshift (until z ≤ 0.6), the luminous red galaxies (LRGs) composed of highly biased objects at intermediate redshift z < 1, the bright emission lines galaxies (ELGs) up to z = 1.7 (which can only be detected at high resolution since the OII doublet must be well resolved), and the quasi-stellar objects (QSOs). The latter allow tracing both the matter distribution at high redshift and the neutral hydrogen by the Ly − α absorption in the spectra. DESI is now underway and will observe during five years (2019-2023) using the Mayall 4-meter telescope at Kitt Peak.

WFIRST (Wide Field Infrared Survey Telescope)
WFIRST is a NASA near-infrared imaging and low-resolution spectroscopy observatory that, similarly to Euclid, aims to address fundamental questions about the accelerated expansion of the Universe. WFIRST will determine the expansion history of the Universe and structure growth in order to constrain dark energy and modified gravity models. To estimate the dark energy equation of state, WFIRST will perform weak lensing, supernovae distances and baryonic acoustic oscillations measurements. In this work we use the spectroscopic probe of WFIRST-2.4, a 2, 000 deg 2 survey that will observe more than 20 million galaxies in the redshift range 1 ≤ z ≤ 3.

The surveys combination specification
In general, using a single probe of a stage IV survey does not constrain cosmological parameters well enough to get a Gaussian posterior distribution, in which case testing simply the Fisher formalism is meaningless. However, when the constraints are very tight, the posterior is likely to be close to gaussian. For this reason we have chosen to investigate the Fisher formalism in a case were one can anticipate gaussian posterior (an assumption that is checked by the MCMC forecast used for the full validation of the method). In order to substantially improve the constraints, we will therefore combine Euclid, DESI and WFIRST-2.4. However we note that DESI and WFIRST-2.4 cover Euclid's entire redshift range. Hence, if one wants to fully and consistently combine these three surveys, computing the cross-correlation power spectra between those surveys is necessary. For the purposes of our study we do not wish to consider the cross spectra, therefore we have to remove all the DESI and WFIRST-2.4 redshift bins that overlap with Euclid. In practice, we perform the full spectroscopic Euclid forecasts in the redshift range of 0.9 ≤ z ≤ 1.8, the DESI forecasts at low redshift 0 < z < 0.9 and the WFIRST-2.4 forecasts at high redshift 1.8 < z < 2.7. Moreover, between z = 0.6 and z = 0.9 the ELGs, LRGs and QSOs populations from DESI also overlap. Each of these populations has a different bias, and their cross power spectra would have to be computed as well. For simplicity we only consider the ELGs population at z = {0.6, 0.9} because it has the highest density of galaxies and provides potentially the best constraints over the parameters. At z < 0.6 we consider the BGS population. We consider 4 redshift bins for the DESI survey with the following binning: 2 redshift bins for the BGS population centered at z = {0.125, 0.375} with a redshift size ∆z = 0.25 and 2 redshift bins for the ELGs population centered at z = {0.6, 0.8} with ∆z = 0.2. Both Euclid and WFIRST-2.4 are utilising 3 redshift bins with ∆z = 0.3, centered at z = {1.05, 1.35, 1.65} and z = {1.95, 2.25, 2.55}, respectively. The full binning procedure is summarized in Table 1.

Two approaches towards Fisher matrix cosmological constraints
In this work, we will follow two approaches for the Fisher matrix forecast of cosmological constraints using the galaxy clustering probe. We describe these two approaches, dubbed M1 and M2, below.

Approach M1
In the first approach, M1, the cosmological model considered is an extension of the ΛCDM model in which we assume a variable dark energy equation of state parameter described by the CPL parameterisation. Furthermore, we allow for non-flatness by letting Ω DE vary. We also assume massless neutrinos. The full
set of cosmological parameters is: The cosmological parameters are the quantities of direct interest here. Ω b , h, Ω m , n s are parameters that modulate the shape of the power spectrum. Ω b is the baryon density parameter. h is the reduced Hubble constant defined as h = H 0 /(100km.s −1 .Mpc −1 ) where H 0 is the present day Hubble parameter. Ω m is the matter density parameter, and n s the spectral scalar index. Ω DE is the dark energy content directly linked to the curvature of the universe, and (w 0 ,w a ) represent respectively the dark energy equation of state at z = 0 and its derivative with respect to scale factor at z = 0, in the so-called CPL form (Chevallier & Polarski 2001;Linder 2003). The equation of state being then given by : The σ 8 parameter gives the amplitude of matter fluctuations in the linear regime on a scale of 8h −1 Mpc. Furthermore, at each redshift bin we consider two nuisance parameters: the logarithm of the product between the bias and σ 8 (z), i.e. ln(bσ 8 ), and the (residual) shot noise P shot (z) due to the finite self-pair counts in two-point statistics. The shot noise affects the power spectrum by adding a white systematic component increasing thus the statistical uncertainties. We consider 10 redshift bins z i , thus the total number of nuisance parameters amounts to 20: The cosmological parameters fiducial values are summarized in Table 2, and the nuisance parameters are given in Table 3.  Table 4: Fiducial values of the shape and redshift dependent parameters for the approach M2.

Approach M2
In the second approach, M2, adopted in Euclid Collaboration: Blanchard et al. (2019), 4 quantities which determine the shape of the power spectrum are treated as independent of the cosmological parameters. We call these 4 quantities the shape parameters. The other quantities, the angular diameter distance D a (z), the Hubble rate H(z), and the growth rate of cosmic structure f (z) are called the redshift dependent (rd) parameters. With the addition of 3 redshift dependent parameters and 2 redshift dependent nuisance parameters (the same as in M1) per redshift bin, for ten redshift bins, this amounts to 30 redshift dependent parameters and 20 nuisance parameters. The full set consists of 54 parameters (see Table 4) : The nuisance parameters are identical in the two approaches. As in Euclid Collaboration: Blanchard et al. (2019), we estimate the constraints of the physical baryon density (ω b = Ω b h 2 ), and the physical matter density (ω m = Ω m h 2 ) parameters. In the M2 approach, the constraints on the final cosmological parameters (θ cosmo ) are obtained from the 54 parameters stated above by projection. We also note that taking the logarithm of the parameters helps to stabilise matrix inversion by reducing the dynamic range between minimum and maximum eigenvalues, as described in Euclid Collaboration: Blanchard et al. (2019).

The Fisher matrix formalism
The Fisher matrix formalism (see Coe 2009, for a pedagogical introduction) is commonly used in cosmology to forecast the Gaussian uncertainties of a set of model parameters under some constraints. The Fisher approach relies on the assumption of Gaussian posterior distribution. The Fisher matrix corresponding to the information on cosmological parameters provided by a galaxy clustering survey is given by (Tegmark 1997;Tegmark et al. 1998): where α and β run over the parameters we vary. k is the total wave vector magnitude in Mpc −1 , µ the cosine of the angle to the line-of-sight, and V eff (z, k, µ) the effective volume of the survey: with n(z) the number density of galaxies in each redshift bin and V s (z) the volume of each redshift bin. The observed power spectrum P obs (z, k, µ), taking into account nonlinear effects is given by: The first term represents the Alcock-Paczynski (AP) volume dilation effect that reflects spurious anisotropies in the power spectrum which arises when assuming a cosmology (that might be different from the true cosmology) to convert redshifts into distances. The coefficient q = H ref (z)/H(z) is given by the ratio between the Hubble parameter in the true cosmology and the one in the reference cosmology. The coefficient given by the ratio between the angular distance in the reference cosmology and the one in the true cosmology. The term, between parentheses, contains the linear redshift space distortions (RSD) due to the growth of structure, and finger-of-god effects that aim to describe the nonlinear damping due to the velocity dispersion of satellite galaxies inside host halos. σ p = 1/(6π 2 ) P m (k, z)dk represents the linear-theory velocity dispersion that can be computed from the linear matter spectrum P m (k, z). The third term, P dw , is the dewiggled power spectrum (Eisenstein & Hu 1998) that takes into account nonlinearities in the matter power spectrum. The fourth term, F z , represents the uncertainties coming from the spectroscopic precision of the survey under consideration and P shot (z) the residual shot noise. For a comprehensive description of the observed power spectrum model introduced above we recommend the recent Euclid paper on validated forecasts (Euclid Collaboration: Blanchard et al. 2019), where the specific form of the function F z as well as the rescaling of k and µ due to the AP effect are given in Equations (74)-(79). In terms of nonlinearities, we should note that the model used accounts for the finger-of-God effect as well as for the damping of the BAO feature. More details can be found in Section 3.2.2 of Euclid Collaboration: Blanchard et al. (2019). We move on to describe the estimation of the uncertainties using the Fisher matrix formalism. The inverse of the Fisher matrix gives the covariance matrix that represents the likelihood curvature evaluated at the fiducial values of α and β. According to the Cramér-Rao inequality, the square root of each covariance matrix diagonal elements gives the 1σ lower bound constraint for each parameter (marginalised over all other parameters) where the posterior distribution is Gaussian: In order to compute the Fisher matrix we use SpecSAF (Spectroscopic Super Accurate Forecast), a modified and improved version of a code first used in Tutusaus et al. (2016). SpecSAF computes high-point stencil derivatives with a very high precision level. This code is linked to CAMB (Lewis et al. 2000) and CLASS (Lesgourgues 2011) to compute matter power spectra, and allows the user to directly compute the FoM and plot the contours. We note that SpecSAF is one of the validated codes used in Euclid Collaboration: Blanchard et al. (2019).

Precision of the Fisher matrix formalism
In order to obtain reliable final constraints from the covariance matrix one has to estimate the precision needed on the Fisher elements themselves, to achieve a given precision (for instance a relative precision lower than 10% for the square root of elements of the diagonal of the covariance matrix). This problem is related to the conditioning of the Fisher matrix that needs to be inverted. The condition number C N of a matrix A is known to provide an upper limit to the precision δx of the determination of the solution x of the equation That is, the elements of b and A have to be known with a pre- That is, if the condition number is large, even a very small error in b will result in a large error in x. The condition number (Belsley et al. 2005) is given by the ratio between the largest eigenvalue to the smallest eigenvalue. This is however an upper limit that can be far too stringent in practice: as an example, in our case the condition number in approach M1 is 1.3 × 10 5 , while in the M2 approach the condition number is 4.4 × 10 11 . This formally calls for a precision of ∼ 10 −6 and ∼ 10 −13 to ensure an accuracy of 10% in the final constraints. However, a diagonal matrix (provided that no term in the diagonal vanishes) is well conditioned in practice, even if the condition number is very large. Therefore for a matrix with size significantly greater that 2 × 2, the condition number does not provide useful information on the conditioning of the matrix. This observation has important consequences in practice, as the elements of the Fisher matrix are generally computed from numerical derivatives whose required precision has to be determined in order to achieve reliable results on the covariance. To better quantify the precisoin needed for reliable constraints we introduce a simple method that consists in vibrating (perturbing) each element of the Fisher matrix as follows: with α and β, indices which run over all unique pair of parameters. N(0, 1) is a normal distribution centered at 0 with variance 1 and the amplitude of the perturbations. In order to keep the Fisher matrix symmetric, we only perturb the lower triangle part of the Fisher and compute the symmetric part across the diagonal. The coefficients in the Fisher matrix are computed from Boltzmann codes and are therefore subjects to correlated (numerical) noise. In our approach this correlation is lost during the Fisher matrix vibration. In the approach M1, we apply the perturbations for 2000 amplitudes, , regularly log-spaced in the range 10 −12 -10 −1 . To ensure enough statistical data and reduce the Poisson noise, we produce 10, 000 vibrated Fisher matrix per epsilon values. In the M2 approach this procedure is more expensive in terms of computational time. We therefore reduced the number of values to 600 in the range 10 −6 -10 −1 and we produced 2000 vibrated Fisher matrices per value. The constraints obtained with each vibrated Fisher matrix are then compared with the original Fisher matrix by computing the relative error. For each parameter we compute the number of draws that gives an agreement level on the constraints at the chosen precision (10% in our case) or better. The tolerated precision is the largest value of for which 68% of the constraints on each parameter remain lower than the chosen precision (10% in our case). The results are illustrated in Fig. 1 for the cosmological parameters in the M1 approach. In the M2 approach the size of the matrix is too large so in Fig. 2 we only show the shape parameters and the redshift dependent parameters for a single redshift bin (z = 1.35).

Stability and Convergence tests
The precision and stability (convergence) of numerical derivatives can be an important issue and testing it is an essential validation step. Fortunately, testing the stability of the derivative of the observed power spectrum over any parameter is straightforward and determines the appropriate step sizes to use (and which ones to avoid). However the stability itself does not prove that a numerical derivative is correctly computed since it only shows to what extent the derivative changes as a function of step size. Ideally, a convergence test (computing the relative error between the analytical derivative and the numerical one) also has to be performed. Unfortunately, directly exploiting the convergence with a relative error test is not possible when the derivatives cannot be computed analytically. This is the case for most of the parameters we consider in this study. For the cosmological parameters we need to use a Boltzmann code, and the semi-analytical form of the derivatives over D a (z) and H(z) contain derivatives of the matter power spectrum over the scale k (dP/dk). Here we follow an alternative approach to test derivative convergence by considering three different numerical methods to compute the derivatives: the 3, 5 and 7 points centered schemes. With the 3 (respectively 5, 7) scheme being the method in which we use 3 (resp. 5, 7) perturbed points of the function f(x) to determine the numerical derivative at x, we assume that the 7 points derivative is very near the true value when an appropriate step size is chosen. We compare the 7 points with the 3 and 5 points stencil for different step sizes and look for best agreement between the two derivatives, in other words we make convergence tests between the 3 and the 5 points stencil towards the 7 points stencil by comparing the relative error between the 7 points stencil and the 3 and 5 points stencils. More generally the convergence is always faster with the high stencils derivatives method than with the low stencils derivatives, which means that the optimum for the 5 points stencil often occurs at larger step values than the 3 points stencil. However this does not hold in all cases. For instance applying this method to a highly noisy/oscillatory function can lead to wrong results and finding stable behaviours is not possible. It is therefore very useful to plot the functions under consideration in order to identify any such features.

The MCMC sampling
In order to consolidate the reliability of our Fisher matrix constraints, we compare them with MCMC constraints.
As is well known, the MCMC sampling is much more timeconsuming than the Fisher matrix computation, due to the computational time needed to obtained the power spectrum at each iteration. Moreover, by construction a Markov chain cannot be parallelized, as each draw depends on the previous displacement (note, however, that it is possible to launch several chains at the same time to sample the parameter space more effectively). Con-sidering a set of data and a model equipped with a set of parameters Θ, the MCMC allows to use Bayesian inference and model comparison: with P(Θ|data) the posterior distribution which provides the constraints and the joint covariance between all the parameters considered, P(data|Θ) the Likelihood, P(Θ) the prior that we take to be flat in the present study, and P(data) the Bayesian evidence that can be safely ignored as it is a constant of no interest in our situation. Thus, the previous equation can be reduced to: We compute the posterior distribution with the MCMC module of SpecSAF using the Metropolis-Hastings algorithm (Metropolis & Ulam 1949;Robert 2015). This is described in more detail in Appendix A. We test the chains convergence using the Gelman-Rubin diagnostic (Gelman & Rubin 1992;Brooks & Gelman 1998) described in Appendix B. Our data are modeled by the a realisation of our fiducial model given by the equation 4.

MCMC Comparison
The full validation of the Fisher matrix approach is performed with a direct comparison with the results from the MCMC chains. We sample the two parameter spaces of the two approaches and compare the relative error given by the covariance matrix of the sampling and the inverse of the Fisher matrix. We also visualize the Fisher contours versus the MCMC sampling.
As we use exactly the same specifications for both methods we expect to get the same constraints with a relative error lower than 10% as long as the posterior distribution remains gaussian and doesn't present any degeneracies.

Precision needed
The vibration matrices for M1 and M2 are presented in Figure 1 and Figure 2, respectively. We can clearly see that the sensitivity of the constraints depends on the element vibrated. According to Figure 1, the diagonal elements are the most sensitive, more specifically, Ω m , Ω DE and w a . For our working requested precision (10%), these three parameters require a precision between 0.06% and 0.09%; for the other parameters the diagonal elements required precision spreads between 0.1% and 0.4%; the non-diagonal elements are less cumbersome: some of them can even change by a factor 2 without affecting the constraints (for instance Ω b vs [Ω DE , w 0 , w a , σ 8 ]). Most of the non-diagonal elements can be computed with a precision worse than 1%. The Fisher matrix produced following the M2 approach (Figure 2) shows a greater sensitivity to vibration than the M1 approach, in line with their different condition numbers. The diagonal element h seems to be the most sensitive parameter with a required precision around 0.0012%, which is however much less stringent than what the condition number would suggest (∼ 10 −11 %). The background quantities and ω m require a precision around 0.015%, while ω b and n s are one order of magnitude more tolerant with a target precision of 0.2%. The off-diagonal elements show a higher tolerance to perturbations, however only the background elements vs [Ω b , n s ] have a vibration tolerance higher than one percent.
To summarise, compared to the M1 approach, the Fisher matrix produced in the M2 approach is generally at least two orders of magnitude more sensitive for the most sensitive parameters, and one order of magnitude more sensitive for the less ones. These final figures are poorly reflected by the conditioning number of each approach. The above vibration matrix approach allows one to quantify the precision requested for the elements in the Fisher matrix to achieve the requested precision on the constraints of interest, the FoM in our case. This effective precision is hard to anticipate otherwise: the condition number provides a mathematical lower limit to it. However, for a simple 2 × 2 diagonal matrix, it is clear that the precision needed for each elements is just the one wished on constraints, while the condition number (being the product of the eigenvalues) can be arbitrary large. The precision requested for the elements in the Fisher matrix is therefore intimately related to the internal structure of the Fisher matrix.
In the next section we examine the steps sizes necessary to achieve the requested accuracy on the Fisher matrix elements.

Stability and convergence
In order to compute the elements of the Fisher matrix (Eq. 2), one has to compute several integral quantities involving derivatives. The precision on these elements is therefore directly related to the precision one can achieve on these computations, determined by the step sizes used and the numerical schemes used in these derivations (in the following, all the step sizes shown are given relative to the assumed fiducial value for each parameter, except w a and P shot because their fiducial values are 0). Moreover, we choose to illustrate one shape parameter and one redshift dependent parameter in the following plots, because the derivatives behavior between the shapes parameters is similar as well as the derivative behavior between the redshift dependent parameters. We have therefore examined the precision obtained on each of these elements by first examining the precision reached when the step size varies. In order to illustrate this we provide illustration for derivatives against step size, which are representative of general behavior. The upper panels of Figure 3 illustrate the stability of the square of the derivative of ln P(k, µ) with respect to w a at (k, µ) = (0.0121, 0.5) and lnD a at (k, µ) = (0.098, −0.1) (blue: 3 points stencil, red: 5 points stencil, green: 7 points stencil), at z = 1.35. The upper left panel demonstrates the cosmological parameters stability behaviour. The stability of the derivative is reached when the slope of the derivative value over the step size is close to zero (when the curve shows a horizontal behaviour) while decreasing the step size. These derivatives are most of the time relatively stable in the step size range [10 −4 , 10 −1 ] for each derivative. The truncation errors remains small in all of the range for most of the derivatives. We note that for the low step values, instabilities occur due to roundoff errors. The upper right panel is representative of the typical stability behaviour of ln(H(z)) and ln(D a (z)). For step sizes higher than 3 × 10 −3 instabilities begin to arise for the 3 points stencil. The truncation errors dominate and increase the total error budget. The instability range for the 5 and 7 points derivatives is smaller: typically around 1 × 10 −2 . The lower panels ( Figure  3) represent the corresponding derivative convergence tests with respect to the 7 points derivatives (relative error between the 7 points stencil and the (3, 5) points stencil in (blue, red)). Gen- Fig. 1: Vibration matrix: approach M1. The values are expressed in absolute values. Each cell containing 1 has to be taken as a lower bound value of the limit because we take 1 as a threshold perturbation amplitude. erally, convergence tests are made to compare the relative error between the analytical solution of a derivative and solutions obtained numerically when we vary the step size. The minimum relative error between the analytical form and the numerical solutions corresponds to the optimal step 6 ensuring the most accurate derivative. Because we cannot compute the analytical form of most of the derivatives presented here, we make convergence tests of the 5 and the 3 points stencil using the 7 points stencil results as a reference, since the latter is theoretically more ac-curate. In the bottom left panel, the 3 points derivative achieves its convergence when the step is equal to 10 −2 . The optimal step for the 5 points stencil is around 1 × 10 −1 at most. Concerning the angular diameter distance (bottom right panel), the 3 points convergence level is achieved down to a step of 3 × 10 −7 whereas for the 5 points derivatives it is around 5 × 10 −5 . The figure 3 shows that the convergence is always reached after the stability if we decrease the step size. For instance let's take a closer look on the figure 3 right panel (ln D a (1.35)). If we observe the stability and the corresponding convergence plots at a step of 10 −1 , we see that the convergence and the stability aren't reached yet. Now if we start to decrease the step size, we see that the stability is reached around a step of 10 −2 because the slope of the curve becomes near to 0 . However, the corresponding convergence plot still shows an error of 100% between the 7 points stencil and the 5 and 3 points stencils. If we continue to decrease the step size, the minimum of convergence occurs at 10 −4 for the 5 points stencil and 3.10 −7 for the 3 points stencil. This feature always appears as we can see it as well on the left panel of the figure 3 (at least for the 3 points stencil because we took a maximum step value of 10 −1 ).
Considering that (k, µ) is a (1000, 1000) grid in our forecasts (and we have multiple redshift bins), we preferably need to study convergence in the full grid instead of a specific (k, µ) value. The upper panels of Figure 4 summarize the optimal steps corresponding to the convergence level of the square of the derivative of ln P(k, µ) with respect to Ω b (3 points stencil: upper left panel, 5 points stencil: upper right panel) at z = 1.35. Concerning the 3 points stencil, the optimal step size is located around values of the order of 10 −2 at large scales and (10 −2 , 10 −3 ) at small scales. For the 5 points stencil, it is in the range 10 −1 to 10 −2 at large scales and 10 −1 to 10 −3 at small scales. The bot-  (1.35)). Blue: 3 points stencil, red: 5 points stencil, green: 7 points stencil. tom panels of Figure 4 demonstrate the Ω b critical step size giving a relative error comparable to the elements (Ω b , Ω b ) of the vibration matrix. In other words this is a map of the Ω b step sizes that can potentially lead to an error of 10% on the final constraints. As we have already seen, the truncation errors are small for the cosmological parameters, and here we show steps that are in the round-off errors regime (see Appendix C for these and other definitions relevant to the work). The 3 points stencil shows some round-off issues for steps going from 10 −5 (large scales) to 10 −3 (small scales). Concerning the 5 points stencil, the round-off issues occurs for steps from 10 −6 (large scales) to 10 −3 (small scales). The 5 points stencil shows more tolerance than the 3 points stencil. For cosmological parameters choosing a 5 points method with steps around 10 −2 is the safest way to compute derivatives. There is no dependence of the step size on µ in the M2 model because the derivatives only depend on the matter power spectrum. However, this dependence arises in the M1 model through the Alcock-Paczynski effect. Figure 5 shows the optimal (upper panel) and the critical (bottom panel) step size for ln(H(1.35)) for the 3 points (left) and 5 points (right) stencil derivative. The optimal step size mainly lies around 10 −6 for the 3 points stencil (even 1.5 × 10 −7 in cer-tain cells). The 5 points stencil mainly shows convergence from 10 −5 to a few 10 −4 . Regarding the critical step size, here we took the truncation error as a limit because of its dominance when we compute derivatives over background quantities. The critical range for the 3 points derivatives extends from 10 −1 to 10 −4 , while for the 5 points stencil the range is 10 −1 to a few 10 −3 . The lower bound of the critical range is one order of magnitude lower for the 5 points stencil. For the background quantities taking low step sizes is a good option to compute accurate derivatives. The round-off errors remain much less dominant than the truncation errors.
Ideally one would like to use an optimal step size for each value of the k and µ. However, from figures 4 and 5 one should choose -when possible -a single step size which will work efficiently over the whole parameter space. This single optimal step size should be below the critical value all over the parameter space. Tables 5 and 6 summarize the steps chosen for the M1 and M2 approaches using the 5 points method. For cosmological parameters a step between 10 −1 and 10 −2 is enough to compute accurate derivatives. For background quantities we use on the derivatives better than the chosen level while the latter is not expected to lead to the requested precision. The hypercritical steps are the same as the critical steps except for w a whose step size is twice smaller (M1 approach) in order to increase the numerical noise; we also multiply by a factor 2 the steps of the background quantities (M2 approach) to increase the truncation errors. The constraints obtained by these 3 step cases are compared with the MCMC sampling.

Consistency: approach M1 vs approach M2
In this section we test the consistency of our results by comparing the two approaches, M1 and M2. This can be achieved by projecting the Fisher matrix from the M2 parameter space to the M1 parameter space. Let us consider the initial set of parameters given by the model M2 (these are P initial [ω b , h, ω m , n s , ln(D a (0.125)), ln(H(0.125)), ln( f σ 8 (0.125)), ln(bσ 8 (0.125)), P shot (0.125), ..., ln(D a (2.55)), ln(H(2.55)), ln( f σ 8 (2.55)), ln(bσ 8 (2.55)), P shot (2.55)]) and the final set of parameters (model M1), that is P final [Ω b , h, Ω m , n s , Ω DE , w 0 , w a , σ 8 , ln(bσ 8 (0.125)), P shot (0.125), ..., ln(bσ 8 (2.55)), P shot (2.55)]. We can project the initial Fisher matrix to the final one by simply using the chain rule (Wang 2006;Wang et al. parameter h Ω m n s relative error (%) 1.857e-3 2.332e-3 -2.176e-2 2.155e-2 parameter Ω DE w 0 w a σ 8 relative error (%) -1.137e-3 -7.339e-3 -1.573e-3 -5.706e-3 Table 7: Relative errors on the cosmological parameter constraints between the M1 and M2 approaches (both with optimal step sizes) after projection to the θ cosmo parameter set. In both cases, the covariance matrices are marginalized over the nuisance parameters. 2010). The marginalization over the nuisance parameters can be done before or after the projection. In the latter case they have to be taken into account during the projection: where indices i and j run over all unique pair of parameters corresponding to the final model (we build the other half of the final Fisher matrix by computing the symmetric (by mirroring over the diagonal). Indices α and β run over all parameter pairs corresponding to the M1 approach. We summarize the constraints relative errors obtained from the comparison between the standard M1 approach computation against the constraints after projection from the parameters introduced in the M2 approach in Table  7. Both covariances are marginalized over the nuisance parameters. We find that the two methods are in very good agreement, with the relative errors not exceeding ∼ 0.02%. The corresponding contours are shown in Figure 6. As we can see, they are nearly perfectly superimposed. Therefore, the final Fisher matrices are highly consistent in the two approaches.

Fisher matrix and MCMC comparison
The relative errors between the MCMC and the Fisher constraints are presented in Table 8. With the optimal steps the relative error is lower than 5.5% for both models. With the critical steps, the w a relative error exceeds 10% (model M1). The other parameters have errors ranging from 2 × 10 −3 to ∼ 6%. In the second approach every step agrees well with the MCMC with a highest relative error of 9.118%. Choosing hypercritical steps leads to significant disagreement with the MCMC constraints. In the M1 approach, three cosmological parameters (Ω DE , w 0 and w a ) and one nuisance parameter (ln(bσ 8 (0.375))) do not fulfill the 10% agreement requirement level. The relative error on w a reaches 34%. In the M2 approach, the results are even worse, with most of the parameters having errors above 50%. Figures 7  and 8 show the Fisher contours using the optimal, critical, and hypercritical steps set. The covariances are marginalized over the nuisance parameters. For convenience, we only show the cosmological parameters for the M1 approach, and the cosmological as well as one redshift bin (z = 1.35) for the approach M2. In the M1 approach, Figure 7, the Ω b , h, Ω m , n s and σ 8 constraints are very stable for the 3 steps set. The Ω DE parameter shows a slight difference between the 3 steps set. The dark energy parameters w 0 and w a show more appreciable differences, especially w a . In the M2 approach (Figure 8) we can only see a few differences between the optimal and the critical steps set. However the hypercritical steps set gives very different results on all parameters except n s and P shot (z)). The constraints are heavily overestimated (the uncertainties are too small) and fill around one third  of the two other surface contours. We emphasise here that the only difference between the critical steps and the hypercritical steps is a factor of 2 on the steps for only one parameter in the M1 approach and two parameters on the M2 approach. The transition from the stable derivatives to the unstable derivatives is very sharp.
Checking the Fisher matrix derived constraints against the MCMC can give valuable information regarding the precision of the Fisher matrix estimation. However, this is not enough to validate the full procedure. Indeed, two multi dimensional inference methods can result in the same constraints for each individual parameter, but with different contours orientation. In order to ensure that this is not the case, the full posterior distribution has to be compared so one can check that all the contours orientations are similar. Figures 9 and 10 present the MCMC posterior distributions against the Fisher matrix ones (red lines) using the optimal steps settings. All the Fisher contours have the same orientation as the MCMC contours. Thus, the full procedure is validated. It is important to note that, fortunately, using one optimal step for all (k, µ, z) values is sufficient to have a very good level of agreement with the MCMC. That is to say, no adaptive derivatives approach is required. Moreover, the main differences between the Fisher and the MCMC contours come from slight non-Gaussianities on the posterior distribution. On some likelihood histograms in Figures 9 and 10 we can see that the posterior from the MCMC follows "perfectly" one side of the Fisher, but not the other one (for instance h in the M1 approach). This feature is less visible in the M2 approach likely due to the high number of accepted points in the MCMC chain, but it is still there, for example in the case of n s .

Conclusions
We have investigated how to perform a comprehensive and robust validation of Fisher matrix forecasts, using the combination of three Stage IV spectroscopic surveys. We built vibration matrices for two different approaches for a fiducial cosmological model: a non flat ΛCDM extended model where the dark energy equation of state follows the CPL parameterization which has two free variables: w 0 and w a . In the first approach (M1) the Fisher matrix is obtained directly for the cosmological parameters while in the second approach (M2) the background quanti-ties are varied independently from the shape parameters, resulting in a much larger matrix; the final constraints are projected on the same parameter space for both approaches. We also found that the larger Fisher matrix is globally more sensitive to perturbations (as reflected by its high condition number). For instance the Hubble parameter h is two orders of magnitude more sensitive in the second approach.
We studied the stability of the numerical derivatives over the cosmological parameters and found that the truncation error is particularly small within the step ranges considered. Cosmolog- Fig. 7: Fisher contours: M1 approach, for the optimal (green), critical (blue), and hypercritical (red) steps. The contours are significantly affected when using inappropriate step size, although the 1D likelihoods remains pretty stable.
ical parameters remain stable in the step range from 1 × 10 −1 to 3 × 10 −4 . Lower step values increase the round-off errors and result in higher total errors. Taking "large" steps is the safest way to perform the derivatives with respect to the cosmological parameters. The background quantities behave differently. The round-off errors remain small in all the step ranges considered. However steps larger than 1 × 10 −2 for a five points stencil derivative give unstable results. Taking "large" steps for the background quantities is not safe because the differentiated function can oscillate on a scale smaller than the step size; this is exactly what one must avoid when computing the numerical derivatives. The convergence tests with respect to the 7 points stencil derivatives are performing better for the 5 points stencil (the convergence is reached between one and two orders of magnitude before the 3 points stencil case), and similarly for the convergence level. The safe steps window range of the 5 points stencil is then wider.
Comparison with the MCMC sampling shows the efficiency of the Fisher formalism even with non adaptive derivatives (at least in the case where the posterior distribution is close to Gaussian). Although the optimal steps change with k and µ, choosing a unique step size for all the parameters does not produce appre- Fig. 8: Fisher contours: M2 approach, for the optimal (green), critical (blue), and hypercritical (red) steps. In this case, the use of critical step size leads to contours close the ones obtained with the optimal step size, while the hypercritical (twice the critical one) leads most of the contours to be incorrect, as well as most of the 1D likelihood.
ciably larger errors that would significantly alter the forecasts. Furthermore, we considered a 1000 × 1000 (k, µ) grid which, multiplied by the number of redshift bins, amounts to computing 10 7 derivatives per parameter. One of the reasons for the stability comes from the fact that the contribution of the large scales on the derivatives is very small compared to the small scales contribution. Thus, taking optimal steps from high k values remains the best option. We have also seen that the transition between stable and unstable contours is quite sharp. Changing a step size by a factor of 2 towards the wrong direction can give unstable contours.
Identifying and explaining the derivatives behaviour and the effects on the constraints is difficult without the tests presented. Our strategy allows for a comprehensive and robust validation of the constraints derived from the Fisher matrix formalism through a numerical method.   Table 9: MCMC vs Fisher relative errors (in percentage) for the M2 approach; hcritical refers to the hypercritical steps choice described in the main text.
The weighted sum of W and B is written as : Finally, the R number is defined as: withd the degrees of freedom estimate of a given distribution. The convergence is reached once (R − 1) < 0.03 (i.e., R must be sufficiently close to 1).

Appendix C: General definitions
In this last part of the Appendix we summarize the main definitions relevant for this work. The reader will find here a point of reference for the different definitions spread over the text.
Optimal step: Step size providing the minimum relative error between the analytical form and the numerical solutions. It ensures the most accurate derivative. Critical step: Step size that can potentially lead to an error of 10% on the final constraints. Hypercritical step: Step size that can potentially lead to catastrophic errors (several dozen percent) on the final constraints. Truncation error: Error introduced by considering too large step sizes. The approximation of computing the derivative with a finite amount of terms is in this case no longer valid. Round-off error: Error introduced by considering too small step sizes. The finite precision floating point numbers used on computers cannot distinguish between numbers that are too close. Vibration matrix: Matrix providing the largest value of the vibration for which 68% of the constraints on each parameter remain smaller than the chosen precision.