The nonGaussianity of the cosmic shear likelihood or how odd is the Chandra Deep Field South?
J. Hartlap^{1}  T. Schrabback^{2,1}  P. Simon^{3}  P. Schneider^{1}
1 
ArgelanderInstitut für Astronomie, Universität Bonn, Auf dem Hügel 71, 53121 Bonn, Germany
2  Leiden Observatory, Universiteit Leiden, Niels Bohrweg 2, 2333 CA Leiden, The Netherlands
3  The Scottish Universities Physics Alliance (SUPA), Institute for Astronomy, School of Physics, University of Edinburgh, Royal Observatory, Blackford Hill, Edinburgh EH9 3HJ, UK
Received 20 January 2009 / Accepted 19 June 2009
Abstract
Aims. We study the validity of the approximation of a Gaussian cosmic shear likelihood. We estimate the true likelihood for a fiducial cosmological model from a large set of raytracing simulations and investigate the impact of nonGaussianity on cosmological parameter estimation. We investigate how odd the recently reported very low value of
really is as derived from the Chandra Deep Field South (CDFS) using cosmic shear by taking the nonGaussianity of the likelihood into account, as well as the possibility of biases coming from the way the CDFS was selected.
Methods. A brute force approach to estimating the likelihood from simulations must fail because of the high dimensionality of the problem. We therefore use independent component analysis to transform the cosmic shear correlation functions to a new basis, in which the likelihood approximately factorises into a product of onedimensional distributions.
Results. We find that the cosmic shear likelihood is significantly nonGaussian. This leads to both a shift of the maximum of the posterior distribution and a significantly smaller credible region compared to the Gaussian case. We reanalyse the CDFS cosmic shear data using the nonGaussian likelihood in combination with conservative galaxy selection criteria that minimise calibration uncertainties. Assuming that the CDFS is a random pointing, we find
for fixed
.
In a WMAP5like cosmology, a value equal to or lower than this would be expected in 5% of the times. Taking biases into account arising from the way the CDFS was selected, which we model as being dependent on the number of haloes in the CDFS, we obtain
.
Combining the CDFS data with the parameter constraints from WMAP5 yields
and
for a flat universe.
Key words: gravitational lensing  cosmology: cosmological parameters  methods: statistical  methods: numerical
1 Introduction
Weak gravitational lensing by the largescale structure in the Universe, or cosmic shear, is becoming a more and more important tool to constrain cosmological parameters. It is largely complementary to other cosmological probes like the cosmic microwave background or the clustering of galaxies, and particularly sensitive to the matter density and the normalisation of the matter power spectrum . Important constraints have already been obtained by Benjamin et al. (2007), who compiled a set of five weak lensing surveys, and from the CFHT Legacy Survey (Hoekstra et al. 2006; Fu et al. 2008; Semboloni et al. 2006). In subsequent years, a new generation of surveys like KIDS or PanSTARRS (Kaiser & PanSTARRS Collaboration 2005) will allow cosmic shear to be measured with statistical uncertainties that are much smaller than the systematic errors both on the observational and the theoretical sides. Strong efforts are now being made to find sources of systematics in the process of shape measurement and shear estimation (e.g. Massey et al. 2007a). In addition, new methods of shape measurement are being explored, such as the shapelet formalism (Kuijken 2006; Refregier & Bacon 2003) or the methods proposed in Bernstein & Jarvis (2002) and Miller et al. (2007).
It is equally important to have accurate theoretical model predictions that can be fit to the expected highquality measurements. Currently, these models are all based on fitting formulae for the threedimensional matter power spectrum derived from Nbody simulations as given by Peacock & Dodds (1996) and more recently by Smith et al. (2003). However, these are only accurate at best to the percent level on the scales relevant to this and similar works when compared to raytracing simulations based on stateoftheart Nbody simulations (Hilbert et al. 2009), such as the Millennium Run (Springel et al. 2005). Therefore, there is a strong need for a large raytracing effort to obtain accurate seminumerical predictions for a range of cosmological parameters.
While a tremendous effort is currently being directed to the solution of these problems, the actual process of parameter estimation has so far received relatively little attention. Obviously, the statistical data analysis has to achieve the same accuracy as the data acquisition if the aforementioned efforts are not to be wasted.
The standard procedure for converting measurements of secondorder cosmic shear statistics into constraints on cosmological parameters is to write down a likelihood function and to determine the location of its maximum for obtaining estimates of the cosmological parameters of interest. To make this feasible, several approximations are commonly made. Despite the shear field being nonGaussian due to nonlinear structure growth, lacking an analytical description the likelihood is most often approximated by a multivariate Gaussian distribution. The covariance matrix for the Gaussian likelihood then remains to be determined, which is an intricate issue by itself.
In most previous studies, the dependence of the covariance matrix on cosmological parameters has been ignored when writing down the likelihood function. Instead, it was kept fixed to some fiducial cosmological model. The dependence of the covariance matrix on the cosmological parameters has been investigated in Eifler et al. (2009) for the case of Gaussian shear fields. The authors find that this has a significant effect on the constraints on cosmological parameters (reducing the size of the credible regions) and will be particularly important for future largearea surveys.
There are several approaches to determine the covariance for the fiducial set of parameters: Hoekstra et al. (2006) use the covariance matrix derived for a Gaussian shear field. Although this is rather easy to compute (Joachimi et al. 2008), the errors are strongly underestimated particularly on small scales. Another option is to estimate the covariance from the data itself (e.g. Massey et al. 2007b). This will become sensible and feasible mostly for the upcoming large surveys, which can be safely split into smaller subfields without severely underestimating cosmic variance. A third possibility, which currently seems to be the most accurate, is to measure the covariance matrix from a large sample of raytracing simulations. Semboloni et al. (2007) have provided a fitting formula which allows one to transform covariances computed for Gaussian shear fields into covariances including nonGaussianity. Another promising way, which would also easily allow one to take into account the dependence on cosmological parameters, is the semianalytical computation using the halo model (Takada & Jain 2009; Scoccimarro et al. 1999; Cooray & Hu 2001).
However, all these works are based on the assumption that the likelihood is well approximated by a Gaussian. In this paper, we study the impact of this assumption on the shape of the posterior probability distribution of the matter density parameter and the power spectrum normalisation . Furthermore, we compute Fisher matrix constraints for the fourdimensional parameter space spanned by , , h_{100} and . We propose a method to numerically compute the likelihood function from a large set of raytracing simulations based on the technique of independent component analysis (ICA, e.g. Jutten & Hérault 1991; Comon et al. 1991). ICA is a technique for the separation of independent source signals underlying a set of observed random variables, a statistical method related to factor analysis and principal component analysis (PCA). An approach similar to ours, called projection pursuit density estimation, which we use to verify our results, was proposed by Friedman et al. (1984).
In their cosmic shear analysis of the combined HST GEMS and GOODS data of the CDFS, Schrabback et al. (2007) (S07, from hereon) have found a very low value of . In the second part of this paper, we present a reanalysis of the cosmic shear data of S07. Using our estimate of the nonGaussian likelihood, we investigate whether cosmic variance alone is responsible for producing the low estimate or whether the criteria applied by Giacconi et al. (2001) to select a field suitable for deep Xray observations have a share in this.
The outline of our paper is as follows: in Sect. 2, we describe our sample of raytracing simulations which we use for the likelihood estimation. In Sect. 3, we briefly review the lensing quantities relevant for this paper and Bayesian parameter estimation. We introduce our method of estimating the ``true'' likelihood and illustrate the impact of nonGaussianity on parameter estimation using the example of a CDFSlike survey. In Sect. 4, we present the improved cosmic shear analysis of the CDFS and investigate possible reasons for the low power spectrum normalisation found in S07.
2 Raytracing simulations
We have performed a set of 10 Nbody simulations using the publically available code GADGET2 (Springel 2005), all of which are realisations of the same WMAP5like cosmology ( , , , , , h_{100}=0.73). The simulation boxes are on a side, populated by dark matter particles with masses of . We have started the simulations at z=50 and obtained snapshots from z=0 to z=4.5 in intervals of corresponding to the box size, so that a suitable snapshot is available for each lens plane.
In the following, we only give a brief description of our raytracing algorithm and refer the reader to, for example, Jain et al. (2000) or Hilbert et al. (2009) for a more detailed introduction.
The raytracing is performed by dividing the dark matter distribution into redshift slices and projecting each slice onto a lens plane. Starting at the observer, light rays are shot through this array of lens planes. We assume that deflections only take place at the planes themselves, and that the rays propagate on straight lines in the space between two planes. In our case, each redshift slice corresponds to one output box of the Nbody simulation and was projected as a whole onto a lens plane, preserving the periodic boundary conditions of the simulation box. To avoid repetition of structure along the line of sight, the planes were randomly shifted and rotated. The light rays are shot from the observer through the set of lens planes, forming a regular grid on the first plane. We then use FFT methods to compute the lensing potential on each lens plane, from which we obtain the deflection angle and its partial derivatives on a grid. The ray position and the Jacobian of the lens mapping for each ray are obtained by recursion: given the ray position on the current lens plane, its propagation direction (known from the position on the previous plane), and the deflection angle on the current plane interpolated onto the ray, we immediately obtain the ray position on the next plane. Differentiation of this recursion formula with respect to the image plane coordinates yields a similar relation for the Jacobian of the lens mapping, which takes into account the previously computed tidal deflection field (for a detailed description of the formalism used, see Hilbert et al. 2009). The recursion is performed until we reach the redshift cutoff at z=4.5.
We obtain the final Jacobian for a given source redshift distribution by performing a weighted average over the Jacobians for the light paths to each lens plane.
Since our aim is to create mock catalogues comparable to those of the CDFS field, we use the redshift distribution found for our revised galaxy catalogues (Smail et al. 1995, see Sect. 4.1):
where z_{0}=1.55, , and A is a normalisation constant. This corresponds to a mean source redshift of . We then create the mock source catalogue by randomly sampling the resulting shear maps with galaxies, where is the number density of sources and is the side length of the simulated field. In total, we have produced 9600 quasiindependent realisations of the CDFS field, based on different random shifts and rotations of the lens planes and the various Nbody simulations.
3 The nonGaussianity of the cosmic shear likelihood
3.1 Cosmic shear
Perhaps the most common way to extract the lensing information from the measured shapes of distant galaxies is to estimate the twopoint correlation functions of the distortion field. One defines two shear correlation functions (for more details, see e.g. Schneider 2006)
where are the tangential and cross components of the measured ellipticity relative to the line connecting the two galaxies, and is the angular separation. An unbiased estimator for the shear correlation functions for a random distribution of galaxies is given in Schneider et al. (2002):
Here, i and j label galaxies at angular positions and , respectively. The function is 1 if falls into the angular separation bin centred on , and is zero otherwise. Finally, is the number of pairs of galaxies in the bin under consideration.
3.2 Parameter estimation
Let us assume that we have measured the shear correlation functions on p/2 angular separation bins and now wish to infer some parameters of our model for . For what follows, we define the joint data vector , which in total is supposed to have p entries.
Adopting a Bayesian point of view, our aim is to compute the posterior likelihood, i.e. the probability distribution of a parameter vector
given the information provided by the data :
(3) 
Here, is the prior distribution of the parameters, which incorporates our knowledge about prior to looking at the data; such can originate from previous measurements or theoretical arguments. The evidence in this context simply serves as a normalisation factor. Hitherto, it has been assumed that the likelihood is a Gaussian distribution:
where is the covariance matrix of as predicted by the underlying model. Usually, however, the dependence of the covariance matrix upon cosmological parameters is not taken into account. Rather, the covariance that is computed for a fixed fiducial set of parameters is used in Eq. (4). Under this approximation, the likelihood is a function of the difference only:
3.3 Estimating the likelihood
The choice of the functional form of the likelihood as given by Eq. (4) is only approximate. Since the underlying shear field in the correlation function measurement becomes nonGaussian in particular on small scales due to nonlinear structure formation, there is no good reason to expect the distribution of the shear correlation function to be Gaussian. Our aim therefore is to use a very large sample of raytracing simulations to estimate the likelihood and explore the effects of the deviations from a Gaussian shape on cosmological parameter constraints.
In this work, we have to sustain the approximation that the functional form of the likelihood does not depend on cosmology in order to keep computation time manageable. Our raytracing simulations were all done for identical cosmological parameters, which is our fiducial parameter vector . Thus, as in Eq. (5) the likelihood depends on cosmology only through the difference .
Since
is the probability of obtaining the data
given the parameters
,
we in principle have to estimate the pdimensional distribution
of
from our sample of N raytracing simulations. However, due to the high dimensionality of the problem, a brute force approach to estimate the full joint distribution is hopeless.
The problem would simplify considerably if we could find a transformation
such that
Here, is in general a mapping from to ( ) and is our new data vector. This would reduce the problem to estimating onedimensional probability distributions instead of a single pdimensional one. Equation (7) is equivalent to the statement that we are looking for a new set of basis vectors of in which the components s_{i} of the shear correlation function are statistically independent. It is virtually impossible to find the (in general nonlinear) mapping . However, it is possible to make progress if we make the ansatz that is linear:
where is the transformation or ``unmixing'' matrix.
Our likelihood estimation procedure is as follows: the first step is to remove firstorder correlations from the data vector by performing a PCA (e.g. Press et al. 1992). This yields a basis in which the components of
are uncorrelated. If we knew that the distribution
of
were Gaussian, this would be sufficient, because in this case uncorrelatedness is equivalent to statistical independence. However, for a general distribution, uncorrelatedness is only a necessary condition for independence. Since we suspect that the likelihood is nonGaussian, a second change of basis, determined by the ICA technique (described in detail in the next section), is carried out which then results in the desired independence.
We then use a kernel density method (see e.g. Hastie et al. 2001; Venables & Ripley 2002, and references therein) to estimate and tabulate the onedimensional distributions
in this new basis. The density estimate is constructed by smoothing the empirical distribution function of the observations of s_{i},
(9) 
where s_{i}^{(j)} is the jth of N observations of s_{i} and is the Dirac deltafunction, with a smooth kernel K. The estimate of the desired density p_{si} then is given by
where s_{i}^{(j)} is the jth of N observations of s_{i} and b is the bandwidth. For the kernel K we use a Gaussian distribution. It has been shown that the shape of the Kernel K is of secondary importance for the quality of the density estimate; much more important is the choice of the bandwidth b. If b is too small, is essentially unbiased, but tends to have a high variance because the noise is not properly smoothed out. On the other hand, choosing a bandwidth that is too large results in a smooth estimate with low variance, but a higher bias, because real small scale features of the probability density are smeared out. Our choice of the bandwidth is based on the ``rule of thumb'' (e.g. Scott 1992; Silverman 1986; Davison 2003): . Here, is the sample standard deviation and R is the interquartile range of the sample.
Constraints on cosmological parameters can now be derived as follows: we transform our set of model vectors and the measured correlation function to the new ICA basis:
(11) 
(12) 
so that . The ICA posterior distribution is then given by
(13) 
3.4 Independent component analysis
We now briefly outline the ICA method (Hyvärinen & Oja 2000; Hyvärinen et al. 2001), which we use to find the new basis in in which the components of are (approximately) statistically independent. ICA is best introduced by assuming that the data at hand were generated by the following linear model:where is a vector of statistically independent source signals with nonGaussian probability distributions and is the mixing matrix. For simplicity, we will from now on only consider the case , in which case the mixing matrix is simply the inverse of the unmixing matrix in Eq. (8). The goal of ICA is to estimate both and from the data.
An intuitive, though slightly handwaving way to understand how ICA works is to note that a set of linear combinations Y_{i} of independent, nonGaussian random variables X_{j} will usually have distributions that are more Gaussian than the original distributions of the X_{j} (central limit theorem). Reversing this argument, this suggests that the X_{j} could be recovered from a sample of the Y_{i} by looking for linear combinations of the Y_{i} that have the least Gaussian distributions. These linear combinations will also be close to statistically independent. A more rigorous justification of the method can be found in Hyvärinen et al. (2001).
The ICA algorithm consists of two parts, the first of which is a preprocessing step: after subtracting the mean from , the data is whitened, i.e. a linear transformation is introduced such that , where is the unit matrix. This can be achieved by the eigendecomposition of the covariance matrix of , where , by choosing . Note that is orthonormal and that for all i. As will be discussed below, each source signal s_{i} can only be determined up to a multiplicative constant using ICA. We choose these factors such that . The effect of the whitening is that the new mixing matrix between and is orthogonal. This can be seen as follows: . Since we have chosen , the claim follows.
After the preprocessing, the components of
are uncorrelated. This would be equivalent to statistical independence if their distributions were Gaussian. However, as this is not the case here, a further step is needed. It consists of finding a new set of orthogonal vectors
(the row vectors of
)
such that the distributions
p_{zi}(z_{i}) of
maximise a suitable measure of nonGaussianity. A common method to achieve this is to minimise the entropy (or approximations thereof) of the z_{i}, which is defined by
Since it can be shown that the Gaussian distribution has the largest entropy of all distributions of equal variance, this can be rewritten as maximising the socalled negentropy of the z_{i}, defined by
Here, is a Gaussian random variable with the same variance as z_{i} and . Starting from randomly chosen initial directions , the algorithm tries to maximise iteratively (in practice, it is sufficient to use a simple approximation to the negentropy). For more details, the reader is again referred to Hyvärinen et al. (2001).
ICA suffers from several ambiguities, none of which, however, is crucial for this work. First of all, the amplitudes of the source signals cannot be determined, since any prefactor to the signal s_{i} can be cancelled by multiplication of the corresponding column of the mixing matrix by . Secondly, the order of the independent components is not determined, since any permutation of the s_{i} can be accommodated by corresponding changes to . Thirdly, ICA does not yield a unique answer if at least some of the s_{i} are Gaussian  the subset of Gaussian signals is only determined up to an orthogonal transformation. This is not an issue in our context, since the Gaussian signals will be uncorrelated thanks to the preprocessing steps, and uncorrelatedness implies statistical independence for Gaussian random variables.
Several interpretations of ICA and algorithms exist and are described in detail in Hyvärinen et al. (2001). In this work, we use an implementation of the fastICA algorithm (Hyvärinen & Oja 1997) for the R language (R Development Core Team 2007)^{}.
3.5 Tests
In this section, we present the results of a number of tests we have performed to insure that our results are not affected by convergence issues or statistical biases of any kind.The fastICA algorithm requires a set of randomly chosen directions as initial conditions. It then iteratively computes corrections to these vectors in order to increase the negentropy of the projections of the data vectors onto these directions (Eq. (15)), followed by an orthonormalisation step. It is not clear a priori whether the algorithm will settle in the same negentropy maxima for different sets of initial vectors. This concern is backed by the fact that at least some of the might be very close to Gaussian, which might hamper convergence even further. We have therefore tested whether we obtain the same set of basis vectors from a large number of different initial conditions. We find that this is indeed the case for those for which the distribution of departs significantly from a Gaussian. As expected, the directions leading to a rather Gaussian p_{zi} are different for different starting values, reflecting the inability of ICA to distinguish between Gaussian source signals. However, the posterior distributions derived using our algorithm do not differ notably when using different initial conditions. This is even true if the fastICA algorithm does not formally converge (i.e. when the differences of some of the basis vectors between two iterations is not small): after a few hundred iterations, the nonGaussian directions are determined and do not change anymore. The reason for not reaching convergence is that the algorithm still tries to find negentropy maxima in the subspace of Gaussian directions.
As has been noted in Hartlap et al. (2007), statistical biases can become significant already for the Gaussian approximation of the likelihood (Eq. (4)): care has to be taken if the covariance matrix of the correlation function (given on p bins) is estimated from a finite set of N simulations or observations. Inverting the estimated covariance yields a biased estimate of the inverse:
where is the estimated and the true covariance matrix. This bias leads to an underestimation of the size of credible regions by a factor of . We suspect that a similar bias occurs in our likelihood estimation procedure.
Figure 1: Area of the (dashed lines) and (solid lines) credible regions in the plane as function of the sample size N, for the Gaussian likelihood (red, upper curves) and the likelihood computed using the ICA algorithm (black, lower curves). Blue lines are the predicted areas based on Eq. (18). 

Open with DEXTER 
Our method to estimate the likelihood crucially depends on the assumption that a linear transformation makes the components of the shear correlation vectors statistically independent. A necessary condition for mutual statistical independence of all s_{i} is pairwise independence. The components i and j are called pairwise statistical independent if p(s_{i},s_{j})=p_{si}(s_{i}) p_{sj}(s_{j}). We therefore compare the joint pairwise distributions p(s_{i},s_{j}) to the product distributions p_{si}(s_{i}) p_{sj}(s_{j}), where we estimate p(s_{i},s_{j}) using a twodimensional extension (using a bivariate Gaussian kernel) of the kernel density method given by Eq. (10). We give two examples in Fig. 2, where we compare the joint and product distributions of the two mostnonGaussian components and two nearly Gaussian components. As expected, a simple PCA is not enough to achieve pairwise statistical independence in the nonGaussian case. Only after performing the ICA, pairwise independence is achieved.
A more rigorous test for mutual statistical independence for the multivariate, continuous case was proposed by Chiu et al. (2003).
It is based on the observation that if x is a continuous random variable and P(x) is its cumulative distribution function (CDF), then z=P(x) is uniformly distributed in [0,1]. If we are given a set of statistically independent random variables s_{i}, this means that the joint distribution of
z_{i}=P_{i}(s_{i}), where again P_{i} is the CDF of s_{i}, is uniform in the multidimensional unit cube. On the other hand,
if the assumption of statistical independence of the s_{i} is violated, the joint density
of the z_{i} is given by
=  
=  
=  (19) 
Here, p_{i}(s_{i}) is the distribution function of s_{i} only and is the joint distribution function of . This means that the joint distribution of the z_{i} is not uniform if the s_{i} are statistically dependent. Therefore, we can test if the s_{i} we obtain from the ICA procedure are indeed independent by computing their empirical cumulative distribution functions, carrying out the above transformation and finally testing for multivariate uniformity. Such a test was described in Liang et al. (2001), to which we refer the reader for more details.
Figure 2: Comparison of the joint distributions p(s_{i},s_{j}) (black dashed contours) and the product p_{si}(s_{i}) p_{sj}(s_{j}) (solid red contours) for the two most nonGaussian components (i=1, j=2) and two rather Gaussian ones (i=9, j=10), after performing ICA ( left panels) and PCA ( right panels). The components have been ranked and labelled according to their nonGaussianity; the ith PCA component is in general not the same as ith ICAcomponent. In the right panel of each plot, the distributions with respect to the PCA basis vectors are shown and in the left panel, the distributions in the ICA basis are displayed. Statistical independence is indicated by p(s_{i},s_{j})=p_{si}(s_{i}) p_{sj}(s_{j}). 

Open with DEXTER 
Applying the test to the s_{i} that we have obtained from our ICA procedure, we have to reject statistical independence at confidence. This means that the ICA does not remove all dependencies between the components of the shear correlation function. This result, however, does not give an indication of how these residual dependencies affect our likelihood estimate and the conclusions regarding constraints on cosmological parameters. We therefore compare the constraints derived from the ICA likelihood with the constraints from the likelihood estimated using an alternative method, called projection pursuit density estimation (PPDE; Friedman et al. 1984), which we describe in detail in Appendix A. This method is free from any assumptions regarding statistical independence and therefore provides an an ideal crosscheck for the ICA method. For the comparison, we have computed the shear correlation functions with p=10, and we also use independent components. The resulting contours in the plane are shown in Fig. 3. Both posterior likelihoods are very similar, although the credible regions of the PPDE posterior have a slightly smaller area than the contours of the ICA posterior (which actually supports the findings presented in the next section). Given the good agreement of the two methods, we will henceforth only make use of the ICA procedure, which is considerably faster and numerically less contrived than PPDE.
3.6 Results on the posterior
Figure 3: Comparison of the posterior likelihoods for , computed using the ICA likelihood (black contours) and the PPDE likelihood (red contours). Shown are the contours of the , and credible regions. 

Open with DEXTER 
Figure 4: Comparison of the posterior likelihoods for , computed using the ICA likelihood ( left panel) and the Gaussian approximation ( right panel). Shown are the contours of the , and credible regions. The maximum of the ICA posterior is denoted by ``x''; the maximum of the posterior based on the Gaussian likelihood coincides with the fiducial parameter set and is marked by the symbol ``o''. 

Open with DEXTER 
The most interesting question is how much the posterior distribution computed from the nonGaussian ICA likelihood will differ from the Gaussian approximation. We have investigated this for the case of the CDFS and the parameter set . Here and henceforth, we use 15 angular bins for and in the range from to , i.e. p=30. For the data vector, we do not use the correlation functions from our simulations, but take the theoretical prediction for our fiducial parameter set instead. This allows us to study the shape of the posterior likelihood independent of noise in the data and biases due to the fact that the theoretical model does not quite match the mean correlation function from the simulations. In Fig. 4, we show the contours of the posterior computed in this way from the likelihood estimated using our ICA method (left panel) and from the Gaussian likelihood. We have assumed for the dispersion of the intrinsic galaxy ellipticities. The shape of the ICA posterior is different from that of the Gaussian approximation in three respects: it is steeper (leading to smaller credible regions), the maximum is shifted towards higher and lower , and the contours are slightly tilted. The first two differences can be traced back to the shape of the distributions of the individual ICA components: most of the distribution functions p_{si} are generally slightly steeper than a Gaussian and most of the nonGaussian components are in addition strongly skewed, thus shifting the peak of the posterior. Generally, these differences are more pronounced in the direction of the degeneracy and towards lower values of both parameters, where the posterior is shallower.
Of more practical relevance is how the parameter constraints change when the ICA likelihood is used for the analysis of large weak lensing surveys. Here, we consider surveys consisting
of
CDFSlike fields.
Bayesian theory states that if
is large enough, the posterior probability distribution of the parameters becomes Gaussian, centred on the true parameter values, with covariance matrix
(e.g. Gelman et al. 2004). Here,
is the Fisher matrix (Kendall et al. 1987), which is defined by
where denotes the expectation value with respect to the likelihood function. If the likelihood is Gaussian and if the covariance matrix does not depend on cosmology, one can show that
Equation (20) provides us with a way to estimate the Fisher matrix for the nonGaussian likelihood. For each raytracing realisation of the CDFS, we compute the logarithm of the posterior distribution and its derivatives with respect to the cosmological parameters at the fiducial parameter values. Since we use uniform priors for all cosmological parameters, the derivatives of the logposterior are identical to those of the loglikelihood. We can then compute the Fisher matrix by averaging over all realisations:
In Appendix B, we show that the expression for the Fisher matrix of the ICA likelihood can be evaluated further to be
This equation allows a simpler, alternative computation of from the estimated p_{si}(s_{i}), as discussed in Appendix B.
Figure 5: Fisher matrix constraints for a hypothetical 1500 survey, consisting of 6000 CDFSlike fields. The plots on the diagonal show the 1D marginals, the offdiagonal plots the 2D marginals derived from the full 4D posterior. The red dashed (black solid) lines/contours have been computed using the Fisher matrix of the Gaussian likelihood (the ICA likelihood). 

Open with DEXTER 
We have used Eqs. (21) and (23) to compute the Fisher matrices for a 1500 survey ( ). We fit for four cosmological parameters ( ), keeping all other parameters fixed to their true values. To visualise the posterior, we compute twodimensional marginalised posterior distributions for each parameter pair as well as the onedimensional marginals for each parameter. The results are shown in Fig. 5. A general feature of the ICA likelihood, which has already been apparent in the 2Danalysis (Fig. 4), is that the credible intervals are significantly smaller than the ones derived from the Gaussian likelihood. For the twodimensional marginal distributions, the area of the credible regions derived from the ICA likelihood are smaller by . The onedimensional constraints are tighter by . In addition we find that the ICA Fisher ellipses in some cases are slightly tilted with respect to those computed using the Gaussian likelihood. This is particularly apparent for parameter combinations involving the Hubble parameter. Note that the shift of the maximum observed in the twodimensional case for a single CDFSlike field is absent here because it was assumed for the Fisher analysis that the posterior is centred on the true parameter values.
4 How odd is the Chandra Deep Field South?
4.1 The CDFS cosmic shear data
The second part of this work is based on the cosmological weak lensing analysis of the combined HST GEMS and GOODS data of the CDFS (Rix et al. 2004; Giavalisco et al. 2004), which was presented in S07. The mosaic comprises 78 ACS/WFC tiles imaged in F606W, covering a total area of . We refer the reader to the original publication for details on the data and weak lensing analysis, which applies the KSB+ formalism (Luppino & Kaiser 1997; Kaiser et al. 1995; Hoekstra et al. 1998).
In S07, the cosmic shear analysis was performed using two different signaltonoise and magnitude cuts. The first one selects galaxies with S/N>4 and has no magnitude cut, and the second one applies a more conservative selection with S/N>5 and m_{606}<27.0, where S/N is the shear measurement signaltonoise ratio as defined in Erben et al. (2001). The drizzling process in the data reduction introduces correlated noise in adjacent pixels. While these correlations are ignored in the computation of S/N, an approximate correction factor (see S07) is taken into account for , making the above cuts and respectively. The two selection criteria yielded moderately different estimates of 0.52^{+0.11}_{0.15} and 0.59^{+0.11}_{0.14} for (median of the posterior), not assuming a flat Universe. The errors include the statistical and redshift uncertainties. This translates to and 0.65^{+0.12}_{0.15} for our fiducial cosmology with . The difference of the two estimates was considered as a measure for the robustness and hence systematic accuracy of our shear measurement pipeline. While the analysis of the ``Shear TEsting Programme 2 (STEP2) image simulations (Massey et al. 2007a) indicated no significant average shear calibration bias for our method, a detected dependence on galaxy magnitude and size could effectively bias a cosmic shear analysis through the redshift dependence of the shear signal (see also Semboloni et al. 2009). In order to better understand the difference between the two estimates found in S07, and to exclude any remaining calibration uncertainty in the current analysis, we further investigate the shear recovery accuracy as a function of the signaltonoise ratio using the STEP2 simulations in Appendix C. Here we conclude that our KSB+ implementation underestimates gravitational shear for very noisy galaxies with , which likely explains the lower signal found in S07 when all galaxies with S/N>4 ( ) were considered. For the more conservative selection criteria we find no significant mean shear calibration bias and a variation as a function of magnitude and size of . Therefore we base our current analysis on the more robust galaxy sample with S/N>5 ( ) and m_{606}<27.0, which yields a galaxy number density of . Based on the simulations, any remaining calibration uncertainty should be negligible compared to the statistical uncertainty.
Note that Heymans et al. (2005) found a higher estimate of from GEMS, where they extrapolated the redshift distribution from the relatively shallow COMBO17 photometric redshifts (Wolf et al. 2004). Using deeper data from the GOODSMUSIC sample (Grazian et al. 2006), S07 were able to show that the COMBO17 extrapolation significantly underestimates the mean redshift for GEMS, leading to the difference in the results for .
In Fig. 6, we show the posterior distribution for based on this sample of galaxies. For the fit, all other cosmological parameters were held fixed at the fiducial values chosen for our raytracing simulations. This avoids complications in the discussion of cosmic variance and field selection biases due to the effect of parameter degeneracies. We choose a flat prior for , with a lower boundary of to cut off the tail of the posterior distribution towards small values of the power spectrum normalisation, which is caused by the fact that the difference (and therefore the likelihood) between the data and the model vectors changes only very little when (and therefore the shear correlation function) is very small. We have performed the fit for the ICA likelihood as well as for the Gaussian approximation to the likelihood. For the latter, the covariance matrix was in one case estimated from the full sample of our raytracing simulations, and in the other case computed analytically assuming that the shear field is a Gaussian random field (Joachimi et al. 2008). The striking similarity of the posterior densities derived from the ICA likelihood and using the Gaussian covariance matrix for this particular data vector is merely a coincidence and is in general not seen for our set of simulated correlation functions.
For estimates of , we use the maximum of the posterior (henceforth we write ICAMAP for the maximum of the nonGaussian likelihood, and GaussMAP if the Gaussian approximation is used), although we also quote the median (ICA median) for comparison with S07. In the first case, our credible intervals are highest posterior density intervals, whereas for the median we choose to report the interval for which the probability of of being below the lower interval boundary is as high as being above the upper boundary. The results are summarised in Table 1.
Table 1: Estimates of from the CDFS.
4.2 Cosmic variance
The original estimates for given in S07 and those found in the previous section for the Gaussian likelihood are rather low compared to the value reported by WMAP5 (Dunkley et al. 2009). This problem appears less severe when the full nonGaussian likelihood is used, but the estimate is still rather low. It is therefore interesting to know whether this can be fully attributed to cosmic variance or whether the way in which the CDFS was originally selected biases our estimates low.
To begin, we determine the probability of finding a low
in a CDFSlike field if the pointing is completely random. We estimate the sampling distribution of the MAP estimators for Gaussian and ICA likelihoods from the full sample of our raytracing simulations.
We compute the posterior likelihood for
using a uniform prior in the range
and determine the MAP estimator
.
As in the previous sections, we do this using both the Gaussian and the ICA likelihoods. To separate possible biases of the estimators from biases that might arise because the model prediction based on Smith et al. (2003) does not quite fit our simulations, we correct the simulated correlation functions for this: if
is the correlation function measured in the ith realisation, then
is the ``recentred'' shear correlation, where is the mean of all realisations and is our fiducial model.
Figure 6: Posterior distributions for as computed from the CDFS data. The black solid line corresponds to the ICA likelihood, the red dashed line is from the Gaussian likelihood whose covariance matrix was estimated from the raytracing simulations. The blue dotted line was computed from the Gaussian likelihood with an analytically computed covariance matrix, assuming that the shear field is Gaussian. The similarity of the posterior densities derived from the ICA likelihood and using the Gaussian covariance matrix is purely coincidental, occurring only for this particular data vector. 

Open with DEXTER 
The resulting sampling distributions of are shown in Figs. 7 (original ) and 8 (recentred ). All the distributions are well fit by a Gaussian. With the original correlation functions, we obtain estimates which are too high on average. This reflects the fact that the power spectrum fitting formula by Smith et al. (2003) underpredicts the small scale power in the simulations (see also Hilbert et al. 2009). If we correct for this, we see that the maximum of the ICA likelihood is a nearly unbiased estimator of in the onedimensional case considered here, and in addition has a lower variance than the maximum of the Gaussian likelihood.
We estimate the probability of obtaining a power spectrum normalisation as low as the one measured in the CDFS or lower, , by the ratio of the number of realisations which fulfil this condition to the total number of simulations. These estimates agree very well with those computed from the best fitting Gaussian distribution. The results for the MAP and median estimators are summarised in Table 2. As expected from the above considerations, we find higher probabilities for the recentred correlation functions. In this case, the ICAMAP estimator yields for the probability of obtaining an equally low or lower than the CDFS. This reduces to when the uncorrected correlation functions are used, because the misfit of our theoretical correlation functions to the simulations biases the estimates high. If we assume that our simulations are a reasonable representation of the real Universe, we can expect the same bias when we perform fits to real data. Therefore, as derived from the uncorrected correlation functions is most likely closest to reality. The probabilities computed from the GaussMAP estimates are generally smaller than the ICAMAP values because of the lower value of found using these estimators, even though the sampling distributions of the Gauss estimators are broader.
Figure 7: Sampling distributions of the MAP estimators of , derived from 9600 realisations of the CDFS. All other parameters were held fixed at their fiducial values for the fit. The histogram with red dashed lines has been obtained from the Gaussian likelihood, the one with solid lines from the ICA likelihood. Also shown are the best fitting Gaussian distributions. We indicate the fiducial value of and our estimates from the CDFS with vertical lines. 

Open with DEXTER 
Figure 8: Same as Fig. 7, but using recentred correlation functions. 

Open with DEXTER 
4.3 Influence of the CDFS selection criteria
We now investigate if and by how much the way in which the CDFS was selected can bias our estimates of the power spectrum normalisation low. Several local criteria had to be fulfilled by the future CDFS, such as a low galactic H I density, the absence of bright stars and observability from certain observatory sites. Since these conditions do not reach beyond our galaxy, we do not expect them to affect the lensing signal by the cosmological largescale structure.
Furthermore, the field was chosen such that no extended Xray sources from the ROSAT AllSky Survey (RASS), in particular galaxy clusters, are in the field of view. This is potentially important, since it is known from halomodel calculations that the cosmic shear power spectrum on intermediate and small scales is dominated by group and clustersized haloes. Therefore, the exclusion of Xray clusters might bias the selection of a suitable line of sight towards underdense fields. On the other hand, the RASS is quite shallow and thus only contains very luminous or nearby clusters, which have a limited impact on the lensing signal due to their low number or low lensing efficiency. We quantify the importance of this criterion using the halo catalogues of our Nbodysimulations. To each halo, we assign an Xray luminosity in the energy range from 0.1 to using the massluminosity relation given in Reiprich & Böhringer (2002) and convert this into Xray flux using the halo redshift. We then compute the average of the estimates from all fields which do not contain a cluster brighter than a certain flux limit. It is difficult to define an exact overall flux limit to describe the CDFS selection, because the RASS is rather heterogeneous. However, it is apparent from Fig. 9 that even a very conservative limit of will change the average estimate by at most . This bias is therefore most likely not large enough to explain our CDFS result alone.
Figure 9: The average values of the ICAMAP (solid black line) and GaussMAP (solid red line) estimators computed from CDFS realisations that do not contain clusters with an Xray flux larger than . For comparison, we also plot the averages of the corresponding median estimators (dashed lines). 

Open with DEXTER 
Table 2: for the CDFS.
Finally, the CDFS candidate should not contain any ``relevant NED source''. This is very hard to translate into a quantitative criterion, in particular because our simulations contain only dark matter. We model the effect of imposing this requirement by demanding that there be less than group or clustersized haloes ( ) in the redshift range from z=0 and z=0.5 in a CDFS candidate. The impact of this criterion on the estimated value of using the ICA and GaussMAP estimators is shown in Fig. 10. As expected, the median is a monotonically increasing function of . For fields with less than massive haloes, the probability of obtaining a power spectrum normalisation as low as in the CDFS rises above . Given that the average number of massive haloes in the specified redshift range is 18.5, it does not seem to be too unreasonable that fields with less than such haloes could be obtained by selecting ``empty'' regions in the NED. This is also in qualitative agreement with Phleps et al. (2007), who find that the CDFS is underdense by a factor of 2 in the redshift range from to .
Figure 10: Dependence of the ICAMAPestimator for on the number of group and clustersized haloes between z=0 and z=0.5. For each bin, we summarise the distribution of the corresponding subsample of simulated CDFSfields by giving a box plot: the thick horizontal line in each box denotes the median, the upper and lower box boundaries give the upper and lower quartiles of the distribution of the sample values. The error bars (``whiskers'') extend to the and quantiles, respectively. To visualise the tails of the distributions, the most extreme values are given as points. The width of each box is proportional to the square root of the sample size. For comparison, we also show for each subsample the median of the Gauss MAP estimators as red crosses. The solid black horizontal line indicates the true value of , the black dashed line the ICAMAP estimate for the CDFS and the red dotted line the GaussMAP estimate. The average number of haloes with and in a CDFSlike field is . 

Open with DEXTER 
Figure 11: Ratios r_{+} ( upper panel) and r_{} ( lower panel) of the shear correlation functions in a particular bin to the average correlation function of all realisations. The lowest (solid) curve represents the bin with , the second lowest the bin with , and so on. The highest ratio corresponds to the bin with . The error bars have been estimated from the fieldtofield variation. 

Open with DEXTER 
We estimate the impact of this selection criterion on the estimates of cosmological parameters by treating the number of haloes in the CDFS as a nuisance parameter in the process of parameter estimation. Similar to what we did to obtain Fig. 10, we bin the realisations of the CDFS according to the number of groupsized haloes in the realisations. For each bin, we obtain the mean shear correlation function and its ratio to the mean shear correlation function of all realisations,
.
The functions r_{+} and r_{} are shown in Fig. 11. The realisations with fewer (more) haloes than the average generally display a smaller (larger) shear correlation function. We fit the ratios in each bin with a double power law of the form
For values of which do not coincide with one of the bin centres, the functions are obtained by linear interpolation between the fits for the two adjacent bins. With this, we extend our model for the shear correlation function to . In Fig. 12, we show the resulting posterior distributions for and , keeping all other cosmological parameters fixed and using a uniform prior for . The twodimensional distribution shows a weak correlation between the two parameters: as expected, a low (high) value of requires a slightly higher (lower) value of . The marginalised posterior for is very similar to the one shown in Fig. 7, where the field selection is not taken into account. However, including increases the MAP estimate of by to for the ICA likelihood and by to for the Gaussian likelihood and the raytracing covariance matrix. The marginalised posterior distribution of shows a weak peak at (compared to the average of for all raytracing realisations) in the ICA case and even lower values if the Gaussian likelihood is used. Overall, however, the posterior is very shallow.
Having corrected for the field selection, we can now recompute the probabilities given in Table 2 for drawing the CDFS at random. We find for the ICAMAP estimate for the original shear correlation functions and for the recentred ones. For the Gaussian likelihood, we find (original) and (recentred), respectively.
Figure 12: Upper panel: posterior density for and computed using the ICA likelihood, keeping all other cosmological parameters constant. Lower panels: marginalised posterior densities of ( left panel) and ( right panel). Solid black curves show the results from using the ICA likelihood, dashed red lines from the Gaussian likelihood and the raytracing covariance. 

Open with DEXTER 
With this (approximate) treatment of the systematic effects caused by the field selection, we can now put the CDFS in context with the results from the WMAP fiveyear data. For this, we fit the shear correlation function for and , marginalising over h_{100} (with a Gaussian prior centred on h_{100}=0.7 and , as suggested by the Hubble Key Project; Freedman et al. 2001) and with a uniform prior. We use the WMAP Markov chain for a flat CDM model (lcdm+sz+lens; Komatsu et al. 2009; Dunkley et al. 2009), where again we marginalise over all parameters except and . The resulting posterior distributions for the CDFS only (blue dashed contours), WMAP only (red contours) and the combination of both measurements (thick black contours) are shown in Fig. 13. Clearly, the joint posterior is dominated by the WMAP data; however, the constraints from the CDFS allow us to exclude parameter combinations where both and are large. We find the MAP estimates and when marginalising over the other parameter.
Finally, note that the two criteria discussed in this section are not strictly independent. However, it is highly improbable that a single field will contain more than one massive halo above the Xray flux limit. Therefore, selecting fields without an Xraybright cluster prior to performing the steps that lead to Fig. 10 would change the halo numbers that go into the analysis by at most one and would not significantly influence the foregoing discussion.
Figure 13: Posterior density for and , where we have marginalised over the Hubble constant h_{100} and the number of haloes in the field . The dashed blue contours show the , and credible regions resulting from the cosmic shear analysis of the CDFS (using the ICA likelihood), the red contours show the posterior from the WMAP 5year data (using the flat CDM model). The combined posterior is shown with thick black contours. 

Open with DEXTER 
5 Summary and discussion
In this paper, we have investigated the validity of the approximation of a Gaussian likelihood for the cosmic shear correlation function, which is routinely made in weak lensing studies. We have described a method to estimate the likelihood from a large set of raytracing simulations. The algorithm tries to find a new set of (nonorthogonal) basis vectors with respect to which the components of the shear correlation functions become approximately statistically independent. This then allows us to estimate the highdimensional likelihood as a product of onedimensional probability distributions. A drawback of this method is that quite a large sample of realistically simulated correlation functions is required to get good results for the tails of the likelihood. However, this should become less problematic in the near future when increasingly large raytracing simulations will become available.We have investigated how the constraints on matter and vacuum energy density, Hubble parameter and power spectrum normalisation depend on the shape of the likelihood for a survey composed of fields and a redshift distribution similar to the CDFS. We find that if the nonGaussianity of the likelihood is taken into account, the posterior likelihood becomes more sharply peaked and skewed. When fitting only for and , the maximum of the posterior is shifted towards lower and higher , and the area of the highest posterior density credible region decreases by about compared to the case of a Gaussian likelihood. For the fourdimensional parameter space, we have conducted a Fisher matrix analysis to obtain lower limits on the errors achievable with a survey. As in the twodimensional case, we find the most important effect to be that the error bars decrease by compared to the Gaussian likelihood. Less severe is the slight tilt of the Fisher ellipses when marginalising over two of the four parameters, particularly when h_{100} is involved.
In the second part of this work, we have presented a reanalysis of the CDFSHST data. Using the nonGaussian likelihood, we find for (keeping all other parameters fixed to their fiducial values), compared to obtained from the Gaussian likelihood with a covariance matrix estimated from the raytracing simulations. We have then tried to quantify how (un)likely it is to randomly select a field with the characteristics of the Chandra Deep Field South with a power spectrum normalisation this low. We have used 9600 raytracing realisations of the CDFS to estimate the sampling distribution of the ICAMAP estimator for . For our fiducial, WMAP5like cosmology, we find that , assuming that the location of the CDFS on the sky was chosen randomly. The fact that the CDFS was selected not to contain an extended Xray source in the ROSAT AllSky Survey can lead to a bias of the estimated by at most . This is because the clusters excluded by this criterion are rare and mostly at low redshifts, and therefore not very lensingefficient. The second relevant selection criterion is that the CDFS should not contain any relevant NED source. We model this by selecting only those fields which contain a specific number of group and clustersized haloes. We find that for those realisations for which the number of such haloes is below the average, the estimates of can be biased low by about 5. We include this effect in our likelihood analysis by extending our model shear correlation function by a correction factor depending on and treating as a nuisance parameter. This increases the estimate of by to for the ICA likelihood and by to for the Gaussian likelihood. This procedure also yields tentative evidence that the number of massive haloes in the CDFS is only of the average, in qualitative agreement with the findings of Phleps et al. (2007).
Finally, we combine the CDFS cosmic shear results with the constraints on cosmological parameters from the WMAP experiment. We fit for and , where we marginalise over the Hubble constant and take into account the field selection bias by marginalising also over . While the posterior is clearly dominated by the WMAP data, the CDFS still allows us to exclude parts of the parameter space with high values of both and . Assuming a flat Universe, the MAP estimates for these two parameters are and .
Acknowledgements
We would like to thank Tim Eifler for useful discussions and the computation of the Gaussian covariance matrix, and Sherry Suyu for careful reading of the manuscript. We thank Richard Massey and William High for providing the STEP2 image simulations. JH acknowledges support by the Deutsche Forschungsgemeinschaft within the Priority Programme 1177 under the project SCHN 342/6 and by the BonnCologne Graduate School of Physics and Astronomy. T.S. acknowledges financial support from the Netherlands Organization for Scientific Research (NWO) and the German Federal Ministry of Education and Research (BMBF) through the TR33 ``The Dark Universe''. We acknowledge the use of the Legacy Archive for Microwave Background Data Analysis (LAMBDA). Support for LAMBDA is provided by the NASA Office of Space Science.
Appendix A: Projection pursuit density estimation
In order to have an independent check of the ICAbased likelihood estimation algorithm, we employ the method of projection pursuit density estimation (PPDE; Friedman et al. 1984). Like our ICA method, PPDE aims to estimate the joint probability density of a random vector , given a set of observations of . As starting point, an initial model for the multidimensional probability distribution has to be provided, for which a reasonable choice is e.g. a multivariate Gaussian with a covariance matrix estimated from the data. The method then identifies the direction along which the marginalised model distribution differs most from the marginalised density of the data points and corrects for the discrepancy along the direction by multiplying p_{0} with a correction factor. This yields a refined density estimate , which can be further improved by iteratively applying the outlined procedure.
More formally, the PPDE density estimate is of the form
where p_{M} is the estimate after M iterations of the procedure and p_{0} is the initial model. The univariate functions are multiplicative corrections to the initial model along the directions . The density estimate can be obtained iteratively using the relation . At the Mth step of the iteration, a direction and a function f_{M} are chosen to minimise the KullbackLeibler divergence (Kullback & Leibler 1951) between the actual data density and the density estimate ,
(A.2) 
as a goodnessoffit measure. The KullbackLeibler divergence provides a ``distance measure'' between two probability distribution functions, since it is nonnegative and zero only if , albeit not symmetric. Only the cross term
(A.3) 
of the KL divergence is relevant for the minimisation, all other terms do not depend on and f_{M}. By using Eq. (A.1), one sees that the minimum of W is attained at the same location as the minimum of
which is the expectation value of with respect to . The data density is unknown; however, the data comprise a set of N samples from this distribution. The expectation value of can therefore be estimated by
(A.5) 
For fixed , the minimum of Eq. (A.4) is attained for
where and are the marginal densities of the data and of model density from the (M1)st iteration along the direction , respectively. With this, the iterative process that leads to estimates of and f_{M} schematically consists of:
 choosing a direction ;
 computing the marginal densities and ;
 computing according to Eq. (A.6);
 computing ;
 choosing a new that decreases ;
 continuing from step 2 until a convergence criterion is fulfilled.
Note that the PPDE technique, although using very similar methodology as our ICAbased procedure, is different in the important point that it does not rely on the assumption that a linear transformation of the data leads to statistical independence of the components of the transformed data vectors. It therefore comprises a good test of the validity of this approximation.
Appendix B: Fisher matrix of the ICA likelihood
In this appendix, we give the derivation of Eq. (23). In
the general case, the Fisher matrix is given by (e.g. Kendall et al. 1987)
In our case, the likelihood depends on cosmological parameters only through the difference between data and model vector, i.e. (see Eq. (8)). This allows us to write
=  (B.2)  
=  (B.3) 
where in the last step we have made use of the fact that the likelihood factorises in the ICA basis. The expression for the Fisher matrix then can be written as
(B.4) 
Next, we compute the expectation value on the right hand side and obtain
=  (B.5)  
(B.6) 
The integrals in the first term of the right hand side vanish since the p_{si} drop to zero for very large and small values of s_{i}. This leaves us with
The derivatives in Eq. (B.7) can be strongly affected by noise in the estimated p_{si}(s_{i}), in particular in the tails of the distributions. For their numerical computation, we therefore choose the following fourpoint finite difference operator (Abramowitz & Stegun 1964):
(B.8) 
which we find to be more stable against this problem than its more commonly used twopoint counterpart. Because of this potential difficulty, we crosscheck our results with the alternative method provided by Eq. (22). This method is significantly slower, but numerically simpler. This is because the derivatives of the loglikelihood in Eq. (22) are on average computed close to the maximumlikelihood point, where the likelihood estimate is well sampled. Reassuringly, we find excellent agreement between the two methods. Finally, we have investigated the influence of the choice of the Kernel function K in Eq. (10), which might affect the computation of the numerical derivatives. Our results prove to be stable against variation of K, provided that we chose a differentiable Kernel function.
Appendix C: Further conclusions for our KSB+ pipeline from the STEP simulations
In this appendix we assume that the reader is familiar with basic KSB notation. For a short introduction and a summary of differences between various implementations see Heymans et al. (2006).Within the Shear TEsting Programme^{} (STEP) simulated images containing sheared galaxies are analysed in blind tests, in order to test the shear measurement accuracy of weak lensing pipelines. In these analyses the shear recovery accuracy has been quantified in terms of a multiplicative calibration bias m and additive PSF residuals c. From the analysis of the first set of simulations (STEP1, Heymans et al. 2006), which mimic simplified groundbased observations, we find that our KSB+ implementation significantly underestimates gravitational shear on average if no calibration correction is applied. After the elimination of selection and weightingrelated effects this shear calibration bias amounts to a relative underestimation of . According to our testing the largest contribution to this bias originates from the inversion of the tensor, which describes the response of galaxy ellipticity to gravitational shear. While a fulltensor inversion reduces this bias, it strongly increases the measurement noise (see also Erben et al. 2001) and dependence on galaxy selection criteria. We therefore decided to stick to the commonly applied approximation of , which we measure from individual galaxies, and correct the shear estimate using a multiplicative calibration factor of in the S07 analysis. This average calibration correction was found to be stable to the level between different STEP1 simulation subsets. However, note that the bias depends on the details of the KSB implementation, which might explain some of the scatter between the results for different KSB codes in STEP1. In particular, we identified a strong dependence on the choice of the Gaussian filter scale , which is used in the computation of the KSB brightness moments. For example changing from our default , where is the flux radius as measured by SExtractor (Bertin & Arnouts 1996), to , worsens the bias to .
The average calibration correction likewise proved to be robust for the second set of image simulations (STEP2, Massey et al. 2007a), which also mimics groundbased data but takes into account more complex PSFs and galaxy morphologies by applying the shapelets formalism (Massey et al. 2004). Yet, the STEP2 analysis revealed a significant magnitude dependence of the shear recovery accuracy for our implementation, with a strong deterioration at faint magnitudes. In this analysis we applied the same signaltonoise cut S/N>4.0 as in STEP1 (KSB S/N as defined in Erben et al. 2001), where we however ignored the strong noise correlations present in the STEP2 data, which was added to mimic the influence of drizzle.
Figure C.1: Estimate of the effective influence of the noise correlations in the STEP2 simulations: plotted is the ratio of the pixel value dispersion measured from large areas of N=M^{2} pixels to the estimate from the normal single pixel dispersion as a function of M, determined from an objectfree STEP2 image. In the absence of noise correlations r=1 for all M. The value for gives the factor by which the signaltonoise is overestimated when measured from the single pixel dispersion ignoring the correlations. 

Open with DEXTER 
In the case of uncorrelated noise the dispersion of the sum over the pixel
values of N pixels
scales as
where is the dispersion computed from single pixel values. Drizzling, or convolution in the case of the STEP2 simulations, reduces but introduces correlations between neighbouring pixels. The signaltonoise of an object is usually defined as the ratio of the summed object flux convolved with some window or weight function, divided by an rms estimate for the noise in an equal area convolved with the same weight function. If the noise estimate is computed from and scaled according to Eq. (C.1), the correlations are neglected and the noise estimate is too small.
In order to estimate the effective influence
of the noise correlations in STEP2, we use a pure noise image which was
provided together with the simulated images.
We compute the rms of the pixel sum
in independent quadratic
subregions of the image with side length
and determine the ratio
(C.2) 
which in the absence of correlated noise would be equal to 1 for all N. In the presence of noise correlations it will for large N converge to the factor by which underestimates the uncorrelated . This can be understood as drizzling or convolution typically redistributes pixel flux within a relatively small area. As soon as this kernel is much smaller than the area spanned by M^{2} pixels, the correlations become unimportant for the area pixel sum. The measured r(M) is plotted in Fig. C.1. Extrapolating to we estimate that ordinary noise measures based on the single pixel dispersion, which ignore the noise correlation, will overestimate the signaltonoise of objects by a factor . Hence, our original selection criterion S/N>4.0 corresponds to a very low true cut including much noisier objects than in STEP1.
Figure C.2: Calibration bias m as a function of the uncorrected KSB signaltonoise S/N for the TS analysis of the STEP2 simulations. Thin solid (dashed) lines show () estimates for individual PSFs, where we show individual errorbars only for one PSF for clarity. The bold solid line and errorbars show the mean and standard deviation of the individual PSF estimates and shear components. Note the deterioration of the shear estimate for the STEP2 galaxies with ( ). For this plot an adapted calibration correction of 1.08 was applied. 

Open with DEXTER 
Figure C.3: Calibration bias m and PSF residuals c as a function of input galaxy magnitude and size for our refined analysis of the STEP2 data. Thin solid (dashed) lines show () estimates for individual PSFs, where we include individual errorbars only for one PSF for clarity. Bold solid lines and errorbars show the mean and standard deviation of the individual PSF estimates and shear components. In this plot only galaxies with S/N>7 ( ) are taken into account, which strongly reduces the deterioration found in Massey et al. (2007a) for faint magnitudes. For this plot an adapted calibration correction of 1.08 was applied. 

Open with DEXTER 
We plot the dependence of our STEP2 shear estimate on the (uncorrected) S/Nin Fig. C.2. For , corresponding to , a significant deterioration of the shear signal occurs, with a mean calibration bias and a large scatter between the different PSF models. We conclude that this approximately marks the limit down to which our KSB+ implementation can reliably measure shear. If we apply a modified cut S/N>7.0 to the STEP2 galaxies, the resulting magnitude and size dependence of the shear calibration bias is 5% (top panels in Fig. C.3). The remaining galaxies are best corrected with a slightly reduced calibration factor , which we apply for the plots shown in this appendix and the updated shear analysis presented in this paper. The difference between the calibration corrections derived from STEP1 and STEP2 agrees with the estimated accuracy. Note that the error increases for the highly elliptical PSFs D and E ( ) in STEP2, for which in addition significant PSF anisotropy residuals occur (bottom panels in Fig. C.3). This should however not affect our analysis given that typical ACS PSF ellipticities rarely exceed , see e.g. S07.
References
 Abramowitz, M., & Stegun, I. A. 1964, Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, ninth edn. (New York: Dover) (In the text)
 Benjamin, J., Heymans, C., Semboloni, E., et al. 2007, MNRAS, 381, 702 [NASA ADS] [CrossRef] (In the text)
 Bernstein, G. M., & Jarvis, M. 2002, AJ, 123, 583 [NASA ADS] [CrossRef] (In the text)
 Bertin, E., & Arnouts, S. 1996, A&AS, 117, 393 [NASA ADS] [CrossRef] [EDP Sciences] (In the text)
 Chiu, K.C., Liu, Z.Y., & Xu, L. 2003, in Proc. 4th International Symposium on Independent Component Analysis and Blind Signal Separation (ICA2003), Nara, Japan, 751 (In the text)
 Comon, P., Jutten, C., & Hérault, J. 1991, Signal Processing, 24, 11 [CrossRef]
 Cooray, A., & Hu, W. 2001, ApJ, 554, 56 [NASA ADS] [CrossRef]
 Davison, A. C. 2003, Statistical Models, Cambridge Series in Statistical and Probabilistic Mathematics (Cambridge University Press)
 Dunkley, J., Komatsu, E., Nolta, M. R., et al. 2009, ApJS, 180, 306 [NASA ADS] [CrossRef] (In the text)
 Eifler, T., Schneider, P., & Hartlap, J. 2009, A&A, 502, 721 [NASA ADS] [CrossRef] [EDP Sciences] (In the text)
 Erben, T., van Waerbeke, L., Bertin, E., Mellier, Y., & Schneider, P. 2001, A&A, 366, 717 [NASA ADS] [CrossRef] [EDP Sciences] (In the text)
 Freedman, W. L., Madore, B. F., Gibson, B. K., et al. 2001, ApJ, 553, 47 [NASA ADS] [CrossRef] (In the text)
 Friedman, J., Stuetzle, W., & Schroeder, A. 1984, J. Am. Stat. Assoc., 79, 599 [CrossRef] (In the text)
 Fu, L., Semboloni, E., Hoekstra, H., et al. 2008, A&A, 479, 9 [NASA ADS] [CrossRef] [EDP Sciences]
 Gelman, A., Carlin, J. B., Stern, H., & Rubin, D. B. 2004, Bayesian Data Analysis (Chapman & Hall/CRC) (In the text)
 Giacconi, R., Rosati, P., Tozzi, P., et al. 2001, ApJ, 551, 624 [NASA ADS] [CrossRef] (In the text)
 Giavalisco, M., Ferguson, H. C., Koekemoer, A. M., et al. 2004, ApJ, 600, L93 [NASA ADS] [CrossRef]
 Grazian, A., Fontana, A., de Santis, C., et al. 2006, A&A, 449, 951 [NASA ADS] [CrossRef] [EDP Sciences] (In the text)
 Hartlap, J., Simon, P., & Schneider, P. 2007, A&A, 464, 399 [NASA ADS] [CrossRef] [EDP Sciences] (In the text)
 Hastie, T., Tibshirani, R., & Friedman, J. 2001, The Elements of Statistical Learning (Springer)
 Heymans, C., Brown, M. L., Barden, M., et al. 2005, MNRAS, 361, 160 [NASA ADS] [CrossRef] (In the text)
 Heymans, C., Van Waerbeke, L., Bacon, D., et al. 2006, MNRAS, 368, 1323 [NASA ADS] [CrossRef] (In the text)
 Hilbert, S., Hartlap, J., White, S. D. M., & Schneider, P. 2009, A&A, 499, 31 [NASA ADS] [CrossRef] [EDP Sciences] (In the text)
 Hoekstra, H., Franx, M., Kuijken, K., & Squires, G. 1998, ApJ, 504, 636 [NASA ADS] [CrossRef]
 Hoekstra, H., Mellier, Y., van Waerbeke, L., et al. 2006, ApJ, 647, 116 [NASA ADS] [CrossRef]
 Hyvärinen, A. & Oja, E. 1997, Neural Computation, 9(7), 1438 (In the text)
 Hyvärinen, A. & Oja, E. 2000, Neural Networks, 13(45), 411
 Hyvärinen, A., Karhunen, J., & Oja, E. 2001, Independent Component Analysis (Wiley Interscience)
 Jain, B., Seljak, U., & White, S. 2000, AJ, 530, 547 [NASA ADS] [CrossRef] (In the text)
 Joachimi, B., Schneider, P., & Eifler, T. 2008, A&A, 477, 43 [NASA ADS] [CrossRef] [EDP Sciences] (In the text)
 Jutten, C., & Hérault, J. 1991, Signal Processing, 24, 1 [CrossRef]
 Kaiser, N., & PanSTARRS Collaboration 2005, in BAAS, 37, 465 (In the text)
 Kaiser, N., Squires, G., & Broadhurst, T. 1995, ApJ, 449, 460 [NASA ADS] [CrossRef]
 Kendall, M. G., Stuart, A., & Ord, J. K. (eds.) 1987, Kendall's advanced theory of statistics (New York, NY, USA: Oxford University Press, Inc.) (In the text)
 Komatsu, E., Dunkley, J., Nolta, M. R., et al. 2009, ApJS, 180, 330 [NASA ADS] [CrossRef]
 Kuijken, K. 2006, A&A, 456, 827 [NASA ADS] [CrossRef] [EDP Sciences]
 Kullback, S., & Leibler, R. A. 1951, Annals of Mathematical Statistics, 22, 79 [CrossRef] (In the text)
 Liang, J.J., Fang, K.T., Hickernell, F. J., & Li, R. 2001, Math. Comput., 70, 337 [NASA ADS] (In the text)
 Luppino, G. A., & Kaiser, N. 1997, ApJ, 475, 20 [NASA ADS] [CrossRef]
 Massey, R., Refregier, A., Conselice, C. J., David, J., & Bacon, J. 2004, MNRAS, 348, 214 [NASA ADS] [CrossRef] (In the text)
 Massey, R., Heymans, C., Bergé, J., et al. 2007a, MNRAS, 376, 13 [NASA ADS] [CrossRef] (In the text)
 Massey, R., Rhodes, J., Leauthaud, A., et al. 2007b, ApJS, 172, 239 [NASA ADS] [CrossRef] (In the text)
 Miller, L., Kitching, T. D., Heymans, C., Heavens, A. F., & van Waerbeke, L. 2007, MNRAS, 382, 315 [NASA ADS] [CrossRef] (In the text)
 Peacock, J. A., & Dodds, S. J. 1996, MNRAS, 280, 19 [NASA ADS] (In the text)
 Phleps, S., Wolf, C., Peacock, J. A., Meisenheimer, K., & van Kampen, E. 2007, in Cosmic Frontiers, ed. N. Metcalfe, & T. Shanks, ASP Conf. Ser., 379, 327 (In the text)
 Press, W., et al. 1992, Numerical Recipes in C (Cambridge University Press) (In the text)
 R Development Core Team. 2007, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria (In the text)
 Refregier, A., & Bacon, D. 2003, MNRAS, 338, 48 [NASA ADS] [CrossRef]
 Reiprich, T. H., & Böhringer, H. 2002, ApJ, 567, 716 [NASA ADS] [CrossRef] (In the text)
 Rix, H.W., Barden, M., Beckwith, S. V. W., et al. 2004, ApJS, 152, 163 [NASA ADS] [CrossRef]
 Schneider, P. 2006, in SaasFee Advanced Course 33: Gravitational Lensing: Strong, Weak and Micro, ed. G. Meylan, P. Jetzer, P. North, P. Schneider, C. S. Kochanek, & J. Wambsganss, 269 (In the text)
 Schneider, P., van Waerbeke, L., Kilbinger, M., & Mellier, Y. 2002, A&A, 396, 1 [NASA ADS] [CrossRef] [EDP Sciences] (In the text)
 Schrabback, T., Erben, T., Simon, P., et al. 2007, A&A, 468, 823 [NASA ADS] [CrossRef] [EDP Sciences] (In the text)
 Scoccimarro, R., Zaldarriaga, M., & Hui, L. 1999, ApJ, 527, 1 [NASA ADS] [CrossRef]
 Scott, D. W. 1992, Multivariate Density Estimation: Theory, Practice, and Visualization (New York: John Wiley & Sons)
 Semboloni, E., Mellier, Y., van Waerbeke, L., et al. 2006, A&A, 452, 51 [NASA ADS] [CrossRef] [EDP Sciences]
 Semboloni, E., van Waerbeke, L., Heymans, C., et al. 2007, MNRAS, 375, 6 [NASA ADS] (In the text)
 Semboloni, E., Tereno, I., van Waerbeke, L., & Heymans, C. 2009, MNRAS, 397, 608 [NASA ADS] [CrossRef] (In the text)
 Silverman, B. W. 1986, Density Estimation (London: Chapman and Hall)
 Smail, I., Hogg, D. W., Yan, L., & Cohen, J. G. 1995, ApJ, 449, 105 [NASA ADS] [CrossRef] (In the text)
 Smith, R. E., Peacock, J. A., Jenkins, A., et al. 2003, MNRAS, 341, 1311 [NASA ADS] [CrossRef] (In the text)
 Springel, V. 2005, MNRAS, 364, 1105 [NASA ADS] [CrossRef] (In the text)
 Springel, V., White, S. D. M., Jenkins, A., et al. 2005, Nature, 435, 629 [NASA ADS] [CrossRef] (In the text)
 Takada, M., & Jain, B. 2009, MNRAS, 395, 2065 [NASA ADS] [CrossRef]
 Venables, W., & Ripley, B. 2002, Modern Applied Statistics with S (Springer)
 Wolf, C., Meisenheimer, K., Kleinheinrich, M., et al. 2004, A&A, 421, 913 [NASA ADS] [CrossRef] [EDP Sciences] (In the text)
Footnotes
 ...(R Development Core Team 2007)^{}
 http://www.rproject.org/
 ... Programme^{}
 http://www.physics.ubc.ca/~heymans/step.html
All Tables
Table 1: Estimates of from the CDFS.
Table 2: for the CDFS.
All Figures
Figure 1: Area of the (dashed lines) and (solid lines) credible regions in the plane as function of the sample size N, for the Gaussian likelihood (red, upper curves) and the likelihood computed using the ICA algorithm (black, lower curves). Blue lines are the predicted areas based on Eq. (18). 

Open with DEXTER  
In the text 
Figure 2: Comparison of the joint distributions p(s_{i},s_{j}) (black dashed contours) and the product p_{si}(s_{i}) p_{sj}(s_{j}) (solid red contours) for the two most nonGaussian components (i=1, j=2) and two rather Gaussian ones (i=9, j=10), after performing ICA ( left panels) and PCA ( right panels). The components have been ranked and labelled according to their nonGaussianity; the ith PCA component is in general not the same as ith ICAcomponent. In the right panel of each plot, the distributions with respect to the PCA basis vectors are shown and in the left panel, the distributions in the ICA basis are displayed. Statistical independence is indicated by p(s_{i},s_{j})=p_{si}(s_{i}) p_{sj}(s_{j}). 

Open with DEXTER  
In the text 
Figure 3: Comparison of the posterior likelihoods for , computed using the ICA likelihood (black contours) and the PPDE likelihood (red contours). Shown are the contours of the , and credible regions. 

Open with DEXTER  
In the text 
Figure 4: Comparison of the posterior likelihoods for , computed using the ICA likelihood ( left panel) and the Gaussian approximation ( right panel). Shown are the contours of the , and credible regions. The maximum of the ICA posterior is denoted by ``x''; the maximum of the posterior based on the Gaussian likelihood coincides with the fiducial parameter set and is marked by the symbol ``o''. 

Open with DEXTER  
In the text 
Figure 5: Fisher matrix constraints for a hypothetical 1500 survey, consisting of 6000 CDFSlike fields. The plots on the diagonal show the 1D marginals, the offdiagonal plots the 2D marginals derived from the full 4D posterior. The red dashed (black solid) lines/contours have been computed using the Fisher matrix of the Gaussian likelihood (the ICA likelihood). 

Open with DEXTER  
In the text 
Figure 6: Posterior distributions for as computed from the CDFS data. The black solid line corresponds to the ICA likelihood, the red dashed line is from the Gaussian likelihood whose covariance matrix was estimated from the raytracing simulations. The blue dotted line was computed from the Gaussian likelihood with an analytically computed covariance matrix, assuming that the shear field is Gaussian. The similarity of the posterior densities derived from the ICA likelihood and using the Gaussian covariance matrix is purely coincidental, occurring only for this particular data vector. 

Open with DEXTER  
In the text 
Figure 7: Sampling distributions of the MAP estimators of , derived from 9600 realisations of the CDFS. All other parameters were held fixed at their fiducial values for the fit. The histogram with red dashed lines has been obtained from the Gaussian likelihood, the one with solid lines from the ICA likelihood. Also shown are the best fitting Gaussian distributions. We indicate the fiducial value of and our estimates from the CDFS with vertical lines. 

Open with DEXTER  
In the text 
Figure 8: Same as Fig. 7, but using recentred correlation functions. 

Open with DEXTER  
In the text 
Figure 9: The average values of the ICAMAP (solid black line) and GaussMAP (solid red line) estimators computed from CDFS realisations that do not contain clusters with an Xray flux larger than . For comparison, we also plot the averages of the corresponding median estimators (dashed lines). 

Open with DEXTER  
In the text 
Figure 10: Dependence of the ICAMAPestimator for on the number of group and clustersized haloes between z=0 and z=0.5. For each bin, we summarise the distribution of the corresponding subsample of simulated CDFSfields by giving a box plot: the thick horizontal line in each box denotes the median, the upper and lower box boundaries give the upper and lower quartiles of the distribution of the sample values. The error bars (``whiskers'') extend to the and quantiles, respectively. To visualise the tails of the distributions, the most extreme values are given as points. The width of each box is proportional to the square root of the sample size. For comparison, we also show for each subsample the median of the Gauss MAP estimators as red crosses. The solid black horizontal line indicates the true value of , the black dashed line the ICAMAP estimate for the CDFS and the red dotted line the GaussMAP estimate. The average number of haloes with and in a CDFSlike field is . 

Open with DEXTER  
In the text 
Figure 11: Ratios r_{+} ( upper panel) and r_{} ( lower panel) of the shear correlation functions in a particular bin to the average correlation function of all realisations. The lowest (solid) curve represents the bin with , the second lowest the bin with , and so on. The highest ratio corresponds to the bin with . The error bars have been estimated from the fieldtofield variation. 

Open with DEXTER  
In the text 
Figure 12: Upper panel: posterior density for and computed using the ICA likelihood, keeping all other cosmological parameters constant. Lower panels: marginalised posterior densities of ( left panel) and ( right panel). Solid black curves show the results from using the ICA likelihood, dashed red lines from the Gaussian likelihood and the raytracing covariance. 

Open with DEXTER  
In the text 
Figure 13: Posterior density for and , where we have marginalised over the Hubble constant h_{100} and the number of haloes in the field . The dashed blue contours show the , and credible regions resulting from the cosmic shear analysis of the CDFS (using the ICA likelihood), the red contours show the posterior from the WMAP 5year data (using the flat CDM model). The combined posterior is shown with thick black contours. 

Open with DEXTER  
In the text 
Figure C.1: Estimate of the effective influence of the noise correlations in the STEP2 simulations: plotted is the ratio of the pixel value dispersion measured from large areas of N=M^{2} pixels to the estimate from the normal single pixel dispersion as a function of M, determined from an objectfree STEP2 image. In the absence of noise correlations r=1 for all M. The value for gives the factor by which the signaltonoise is overestimated when measured from the single pixel dispersion ignoring the correlations. 

Open with DEXTER  
In the text 
Figure C.2: Calibration bias m as a function of the uncorrected KSB signaltonoise S/N for the TS analysis of the STEP2 simulations. Thin solid (dashed) lines show () estimates for individual PSFs, where we show individual errorbars only for one PSF for clarity. The bold solid line and errorbars show the mean and standard deviation of the individual PSF estimates and shear components. Note the deterioration of the shear estimate for the STEP2 galaxies with ( ). For this plot an adapted calibration correction of 1.08 was applied. 

Open with DEXTER  
In the text 
Figure C.3: Calibration bias m and PSF residuals c as a function of input galaxy magnitude and size for our refined analysis of the STEP2 data. Thin solid (dashed) lines show () estimates for individual PSFs, where we include individual errorbars only for one PSF for clarity. Bold solid lines and errorbars show the mean and standard deviation of the individual PSF estimates and shear components. In this plot only galaxies with S/N>7 ( ) are taken into account, which strongly reduces the deterioration found in Massey et al. (2007a) for faint magnitudes. For this plot an adapted calibration correction of 1.08 was applied. 

Open with DEXTER  
In the text 
Copyright ESO 2009