Weak-lensing shear estimates with general adaptive moments, and studies of bias by pixellation, PSF distortions, and noise

Patrick Simon; Peter Schneider

doi:10.1051/0004-6361/201629591

Home

All issues

Volume 604 (August 2017)

A&A, 604 (2017) A109

Full HTML

Free Access

Issue		A&A Volume 604, August 2017


Article Number		A109
Number of page(s)		18
Section		Cosmology (including clusters of galaxies)
DOI		https://doi.org/10.1051/0004-6361/201629591
Published online		22 August 2017

A&A 604, A109 (2017)

Weak-lensing shear estimates with general adaptive moments, and studies of bias by pixellation, PSF distortions, and noise

Patrick Simon and Peter Schneider

Argelander-Institut für Astronomie, Universität Bonn, Auf dem Hügel 71, 53121 Bonn, Germany
e-mail: psimon@astro.uni-bonn.de

Received: 26 August 2016
Accepted: 18 May 2017

Abstract

In weak gravitational lensing, weighted quadrupole moments of the brightness profile in galaxy images are a common way to estimate gravitational shear. We have employed general adaptive moments (GLAM ) to study causes of shear bias on a fundamental level and for a practical definition of an image ellipticity. The GLAM ellipticity has useful properties for any chosen weight profile: the weighted ellipticity is identical to that of isophotes of elliptical images, and in absence of noise and pixellation it is always an unbiased estimator of reduced shear. We show that moment-based techniques, adaptive or unweighted, are similar to a model-based approach in the sense that they can be seen as imperfect fit of an elliptical profile to the image. Due to residuals in the fit, moment-based estimates of ellipticities are prone to underfitting bias when inferred from observed images. The estimation is fundamentally limited mainly by pixellation which destroys information on the original, pre-seeing image. We give an optimised estimator for the pre-seeing GLAM ellipticity and quantify its bias for noise-free images. To deal with images where pixel noise is prominent, we consider a Bayesian approach to infer GLAM ellipticity where, similar to the noise-free case, the ellipticity posterior can be inconsistent with the true ellipticity if we do not properly account for our ignorance about fit residuals. This underfitting bias, quantified in the paper, does not vary with the overall noise level but changes with the pre-seeing brightness profile and the correlation or heterogeneity of pixel noise over the image. Furthermore, when inferring a constant ellipticity or, more relevantly, constant shear from a source sample with a distribution of intrinsic properties (sizes, centroid positions, intrinsic shapes), an additional, now noise-dependent bias arises towards low signal-to-noise if incorrect prior densities for the intrinsic properties are used. We discuss the origin of this prior bias. With regard to a fully-Bayesian lensing analysis, we point out that passing tests with source samples subject to constant shear may not be sufficient for an analysis of sources with varying shear.

Key words: gravitational lensing: weak / methods: data analysis / methods: statistical

© ESO, 2017

1. Introduction

Over the past decade measurements of distortions of galaxy images by the gravitational lensing effect have developed into an important, independent tool for cosmologists to study the large-scale distribution of matter in the Universe and its expansion history (recent reviews: Schneider 2006; Munshi et al. 2008; Hoekstra & Jain 2008; Massey et al. 2010; Kilbinger 2015). These studies exploit the magnification and shear of galaxy light bundles by the tidal gravitational field of intervening matter. The shear gives rise to a detectable coherent distortion pattern in the observed galaxy shapes. The distortions are usually weak, only of the order of a few per cent of the unlensed shape of a typical galaxy image. Therefore, the key to successfully devising gravitational shear as cosmological tool is accurate measurements of the shapes of many, mostly faint and hardly resolved galaxy images.

There has been a boost of interest in methods of shape measurements in anticipation of the gravitational lensing analysis of the upcoming next generation of wide-field imaging surveys (e.g. Euclid; Laureijs et al. 2011). Despite being sufficient for contemporary surveys, current methodologies are not quite at the required level of accuracy to fully do justice to the quantity of cosmological information we expect from future lensing surveys (Heymans et al. 2006; Massey et al. 2007). The challenge all methodologies face is that observable, noisy (post-seeing) galaxy images are modifications of the actual (pre-seeing) image owing to instrumental and possible atmospheric effects. Post-seeing galaxy images are subject to pixellation as well as instrumental noise, sky noise, photon noise, and random overlapping with very faint objects (Kitching et al. 2012; Hoekstra et al. 2015). In addition, galaxies are not intrinsically circular such that their ellipticities are noisy estimators of the cosmic distortion. Current theoretical work consequently focuses on sources of bias in shape measurements, such as pixel-noise bias, shape-noise bias, underfitting bias, colour gradients, or several selection biases (Hirata & Seljak 2003; Hirata et al. 2004; Mandelbaum et al. 2005; Melchior et al. 2010; Viola et al. 2011; Melchior & Viola 2012; Kacprzak et al. 2012; Massey et al. 2013; Semboloni et al. 2013).

One major source of bias is the pixel-noise bias or simply noise bias hereafter. This bias can at least partly be blamed on the usage of point estimates of galaxy shapes in a statistical analysis, that is single-value estimators of galaxy ellipticities (Refregier et al. 2012). This begs the question of whether it is feasible to eradicate noise bias by means of a more careful treatment of the statistical uncertainties in the measurement of galaxy ellipticities within a fully Bayesian framework. Indeed recent advances in image processing for weak gravitational lensing strongly support this idea, at least for the inference of constant shear (Sheldon 2014; Bernstein & Armstrong 2014, BA14 hereafter; Bernstein et al. 2016, BAKM16 hereafter). In contrast, the contemporary philosophy with point estimates is to perform elaborate, time-consuming calibrations of biased estimators by means of simulated images; the calibration accuracy is, additionally, only as good as the realism of simulated images (e.g. Hoekstra et al. 2015). It is the case that code implementations of non-Bayesian techniques are typically an order of magnitude or more faster than Bayesian codes which could be a decisive factor for upcoming surveys.

Here we take a new look into possible causes of bias in shear measurements on a fundamental level. To this end, we examine, step-by-step with increasing complexity, a fully-Bayesian lensing analysis based on weighted brightness moments of galaxy images (Gelman et al. 2013; MacKay 2003). While the method in BA14 and BAKM16 is set in Fourier space, we work with moments in angular space which has benefits in the case of correlated noise or missing pixels in realistic images. Moment-based methods as ours are non-parametric; this means they are free from assumptions about the galaxy brightness profile. They hence appear to be advantageous for reducing bias, but nonetheless the specific choice of the adaptive weight for the moments is known to produce bias (Viola et al. 2014; Voigt & Bridle 2010). The origin of this problem, which principally also affects unweighted moments, becomes obvious in our formalism. We define as practical measure of galaxy shape a generalization of the impractical ellipticity ϵ expressed in terms of unweighted moments (Kaiser et al. 1995; Seitz & Schneider 1997, SS97 hereafter). Being Bayesian, our measurement of ellipticity results in a Monte-Carlo sample of the probability distribution function (PDF) of ϵ which should be propagated in a fully Bayesian analysis. That is: we do not devise point estimators in order to ideally stay clear of noise bias. This overall approach of general adaptive moments (GLAM ) is inspired by and comparable to Bernstein & Jarvis (2002) apart from the Bayesian framework and some technical differences: (i) for any adaptive weight, the perfectly measured GLAM ellipticity is an unbiased estimator of gravitational shear unaffected by shape-noise bias; (ii) the adaptive weight may have a non-Gaussian radial profile; (iii) our inference of the pre-seeing ellipticity is realised as forward-fitting of elliptical profiles (so-called templates), that is we do not determine the brightness moments of the post-seeing image and correct them to estimate the pre-seeing moments (cf. Hirata & Seljak 2003; Mandelbaum et al. 2005).

As a disclaimer, the GLAM methodology outlined here is prone to bias, even where a fully Bayesian analysis can be realised, and, at this stage, its performance is behind that of other techniques. The aim of this paper is to elucidate causes of bias, instead of proposing a new technique that is competitive to existing techniques. However, these findings are also relevant for other methodologies because model-based or moment-based approaches are linked to GLAM .

For this paper, we exclude bias from practically relevant factors: the insufficient knowledge of the PSF or noise properties, blending of images, and the selection of source galaxies (see e.g. Heymans et al. 2006; Hartlap et al. 2011; Dawson et al. 2016). We focus on the core of the problem of shape measurements which is the inference of pre-seeing ellipticities from images whose full information on the brightness profile have been lost by instrumental limitations.

The paper is laid out as follows. In Sect. 2, we introduce the formalism of GLAM for a practical definition of ellipticity with convenient transformation properties under the action of gravitational shear. We also analytically investigate the limits of measuring the pre-seeing ellipticity from a noise-free but both PSF-convolved and pixellated image. In Sect. 3, we construct a statistical model for the GLAM ellipticity of noisy post-seeing images. We then study the impact of inconsistencies in the posterior model of ellipticity in three, increasingly complex scenarios. First, we analyse with independent exposures of the same pre-seeing image the ellipticity bias due to a misspecified likelihood in the posterior (underfitting bias). Second, we consider samples of noisy images with the same ellipticity but distributions of intrinsic properties such as sizes or centroid positions. Here a new contribution to the ellipticity bias emerges if the prior densities of intrinsic properties are incorrectly specified (prior bias). Third in Sect. 4, we perform numerical experiments with samples of noisy galaxy images of random intrinsic shapes that are subject to constant shear. With these samples we study the impact of inconsistent ellipticity posteriors on shear constraints (shear bias). We also outline details on our technique to Monte-Carlo sample ellipticity or shear posteriors. We discuss our results and possible improvements of the GLAM approach in Sect. 5.

2. General adaptive moments

2.1. Definition

Let I(x) be the light distribution in a galaxy image of infinite resolution and without noise where x is the position on the sky. A common way of defining a (complex) ellipticity of a galaxy image uses the quadrupole moments, $Q_{ij} = \frac{\int d^{2} x (x_{i} - X_{0,i}) (x_{j} - X_{0,j}) I (x)}{\int d^{2} xI (x)},$ $\begin{equation} \label{eq:unwqij} Q_{ij}= \frac{\int\d^2x\;(x_i-X_{0,i})\,(x_j-X_{0,j})\,I(\vec{x})} {\int\d^2x\;I(\vec{x})}, \end{equation}$ (1)of I(x) relative to a centroid position, $X_{0} = \frac{\int d^{2} x x I (x)}{\int d^{2} xI (x)},$ $\begin{equation} \label{eq:uwx0} \vec{X}_0= \frac{\int\d^2x\;\vec{x}\,I(\vec{x})} {\int\d^2x\;I(\vec{x})}, \end{equation}$ (2)of the image (e.g. Bartelmann & Schneider 2001). For real applications, the quadrupole moments are soundly defined only if they involve a weight, decreasing with separation from the centroid position X₀, because galaxies are not isolated so that the normalisation ${}^{\int}d^{2} xI (x)$ $\hbox{$\int\d^2x\;I(\vec{x})$}$ and the brightness moments diverge. Hirata & Seljak (2003, H03 hereafter) address the divergence problems by realising an adaptive weighting scheme by minimising the error functional, sometimes dubbed the energy functional, $E (p | I) = \frac{1}{2} \int d^{2} x [I (x) - Af (ρ)]^{2},$ $\begin{equation} \label{eq:functional} E(\vec{p}|I)= \frac{1}{2} \int\d^2x\; \Big[I(\vec{x})-Af(\rho)\Big]^2, \end{equation}$ (3)with the quadratic form $ρ : = (x - x_{0})^{T} M^{-1} (x - x_{0})$ $\begin{equation} \rho:= (\vec{x}-\vec{x}_0)^{\rm T}\mat{M}^{-1}(\vec{x}-\vec{x}_0) \end{equation}$ (4)and the second-order tensor $M = \frac{T}{2} (\begin{matrix} 1 + e_{1} & e_{2} \\ e_{2} & 1 - e_{1} \end{matrix}) .$ $\begin{equation} \label{eq:ellipticity} \mat{M}=\frac{T}{2} \left( \begin{array}{cc} 1+e_1 & e_2 \\ e_2 & 1-e_1 \end{array} \right). \end{equation}$ (5)The tensor M is expressed in terms of the complex ellipticity e = e₁ + ie₂ and the size T of the image; f(ρ) is a weight function that HS03 chose to be a Gaussian weight f(ρ) = e^{− ρ/ 2}. In comparison to HS03, we have slightly changed the definition of ρ for convenience: here we use ρ instead of ρ². The set p = (A,x₀,M), comprises a set of six parameters on which the functional E depends for a given galaxy image I.

Frequently another definition of complex ellipticity, the ϵ-ellipticity, is more convenient (sometimes also known as the third flattening). It arises if we write ρ in the form $ρ = {| V^{-1} (x - x_{0}) |}^{2} = (x - x_{0})^{T} V^{-2} (x - x_{0}),$ $\begin{equation} \label{eq:rho} \rho =\left|\mat{V}^{-1}(\vec{x}-\vec{x}_0)\right|^2 =(\vec{x}-\vec{x}_0)^{\rm T}\mat{V}^{-2}(\vec{x}-\vec{x}_0), \end{equation}$ (6)where V is symmetric. Obviously, we have V² = M or $V = \sqrt{M}$ $\hbox{$\mat{V}=\sqrt{\mat{M}}$}$ . By writing V in the form $V = \frac{t}{2} (\begin{matrix} 1 + ϵ_{1} & ϵ_{2} \\ ϵ_{2} & 1 - ϵ_{1} \end{matrix})$ $\begin{equation} \mat{V}= \frac{t}{2} \left( \begin{array}{cc} 1+\epsilon_1 & \epsilon_2\\ \epsilon_2 & 1-\epsilon_1 \end{array} \right) \end{equation}$ (7)we see that V² = M implies 2T = t² (1 + | ϵ | ²), and $e = \frac{2 ϵ}{1 + | ϵ |^{2}}, ϵ = ϵ_{1} + i ϵ_{2} = \frac{e}{1 + \sqrt{1 - | e |^{2}}} \cdot$ $\begin{equation} \label{eq:elltransf} e =\frac{2\epsilon}{1+|\epsilon|^2}~,~ \epsilon= \epsilon_1+\i\epsilon_2=\frac{e}{1+\sqrt{1-|e|^2}}\cdot \end{equation}$ (8)We henceforth use ϵ as parametrisation of M because ϵ is an unbiased estimator of reduced shear in the absence of a PSF and pixellation (SS97). Conversely, the ellipticity e has to be calibrated with the distribution of unsheared ellipticities which poses another possible source of bias in a lensing analysis (H03).

As generally derived in Appendix A, the parameters p at the minimum of (3) are: $x_{0} = \frac{\int d^{2} x x I (x) f^{'} (ρ)}{\int d^{2} xI (x) f^{'} (ρ)}, A = \frac{\int d^{2} xI (x) f (ρ)}{\int d^{2} x f^{2} (ρ)}$ $\begin{equation} \label{eq:amx0} \vec{x}_0 = \frac{\int\d^2x\;\vec{x}\,I(\vec{x})f^\prime(\rho)} {\int\d^2x\;I(\vec{x})\,f^\prime(\rho)}~,~ A= \frac{\int\d^2x\;I(\vec{x})f(\rho)} {\int\d^2x\;f^2(\rho)}\ \end{equation}$ (9)and $M = β \frac{\int d^{2} x (x - x_{0}) (x - x_{0})^{T} I (x) f^{'} (ρ)}{\int d^{2} xI (x) f (ρ)},$ $\begin{equation} \label{eq:amm} \mat{M} = \beta \,\frac{\int\d^2x\; (\vec{x}-\vec{x}_0)(\vec{x}-\vec{x}_0)^{\rm T} I(\vec{x})f^\prime(\rho)} {\int\d^2x\;I(\vec{x})f(\rho)}, \end{equation}$ (10)where $β : = - 2 {^{\int}}_{0}^{\infty} d s f^{2} (s) / f^{2} (0)$ $\hbox{$\beta:=-2\int_0^\infty\d s\;f^2(s)/f^2(0)$}$ is a constant. These equations are derived by HS03 for a Gaussian f(ρ), for which we have f′(ρ): = df/ dρ = − f/ 2. This shows that the best-fitting f(ρ) has the same centroid and, up to a scalar factor, the same second moment M = V² as the with f′(ρ) adaptively weighted image I(x). This is basically also noted in Lewis (2009) where it is argued that the least-square fit of any sheared model to a pre-seeing image I(x) provides an unbiased estimate of the shear, even if it fits poorly.

In this system of equations, the centroid x₀ and tensor M are implicitly defined because both f(ρ) and f′(ρ) on the right-hand-side are functions of the unknowns x₀ and M: the weights adapt to the position, size, and shape of the image. Equations (9) and (10) therefore need to be solved iteratively. The iteration should be started at a point which is close to the final solution. Such a starting point could be obtained by using the image position, determined by the image detection software, as initial value for x₀, and tensor M determined from a circular weight function with the same functional form as f. Nevertheless, there is no guarantee that the solution of this set of equations is unique. In fact, for images with two brightness peaks one might suspect that there are multiple local minima in p of the functional E. This may occur, for instance, in the case of blended images. One standard solution to this particular problem is to identify blends and to remove these images from the shear catalogue. Alternatively we could in principle try to fit two template profiles to the observed image, that is by adding a second profile E₂(p₂ | I) to the functional (3) and by minimising the new functional with respect to the parameter sets p and p₂ of both profiles simultaneously.

2.2. Interpretation

If an image I(x) has confocal elliptical isophotes, with the same ellipticity for all isophotes, one can define the ellipticity of the image uniquely by the ellipticity of the isophotes. In this case, the ellipticity ϵ defined by the minimum of (3) coincides with the ellipticity of the isophotes for any weight f. We show this property in the following.

Assume that the brightness profile I(x) is constant on confocal ellipses so that we can write I(x) = S(ζ) where ζ: = (x − x_c)^TB^-2(x − x_c). Here x_c denotes the centre of the image, and the matrix elements of B describe the size and the shape of the image, in the same way as we discussed for the matrix V before. The function S(ζ) describes the radial brightness profile of the image. We start by writing (9) in the form $0 = \int d^{2} xS (ζ) f^{'} (ρ) (x - x_{0}),$ $\begin{equation} 0=\int\d^2x\;S(\zeta)\,f^\prime(\rho)\,(\vec{x}-\vec{x}_0), \end{equation}$ (11)and introduce the transformed position vector z = B^-1(x − x_c), or x = Bz + x_c. Then the previous equation becomes $0 = \int d^{2} zS (| z |^{2}) f^{'} (ρ) (B z + x_{c} - x_{0}),$ $\begin{equation} \label{eq:centroid1} 0=\int\d^2z\;S(|\vec{z}|^2)\,f^\prime(\rho)\,(\mat{B}\vec{z}+\vec{x}_{\rm c}-\vec{x}_0), \end{equation}$ (12)where in terms of z the quadratic form ρ is $ρ = (B z + x_{c} - x_{0})^{T} V^{-2} (B z + x_{c} - x_{0}) .$ $\begin{equation} \rho=(\mat{B}\vec{z}+\vec{x}_{\rm c}-\vec{x}_0)^{\rm T}\mat{V}^{-2}(\mat{B}\vec{z}+\vec{x}_{\rm c}-\vec{x}_0). \end{equation}$ (13)From these equations, we can see that x₀ = x_c is the solution of (12) since then ρ is an even function of z, S is an even function of z, whereas the term in the parenthesis of (12) is odd, and the integral vanishes due to symmetry reasons. Thus we found that our adaptive moments approach yields the correct centre of the image.

Next, we rewrite (10) in the form $\begin{matrix} V^{2} \int d^{2} xI (x) f (ρ) & = & β \int d^{2} x (x - x_{0}) (x - x_{0})^{T} I (x) f^{'} (ρ) \\ = \end{matrix}$ $\begin{eqnarray} \nonumber \mat{V}^2\int\d^2x\;I(\vec{x})f(\rho)&=& \beta\int\d^2x\;(\vec{x}-\vec{x}_0)\,(\vec{x}-\vec{x}_0)^{\rm T} I(\vec{x})\,f^\prime(\rho)\\ &=& \label{eq:secondorder1} \beta\det{\mat{B}}\int\d^2z\;S(|\vec{z}|^2)\,f^\prime(\rho)\,\mat{B}\vec{z}\vec{z}^{\rm T}\mat{B}, \end{eqnarray}$ (14)where β, see Eq. (10), is a constant factor (see Appendix A.3). We again used the transformation from x to z = B^-1(x − x_c) and employed the fact that x₀ = x_c. Accordingly, we have ρ = z^TBV^-2Bz. We now show that the solution of (14) is given by V = λB with λ being a scalar factor. Using this Ansatz, we get ρ = λ^-2 | z | ², and (14) can be written, after multiplying from the left and from the right by B^-1, as $B^{-1} V^{2} B^{-1} = λ^{2} 1 = β \frac{\int d^{2} zS (| z |^{2}) f^{'} (| z |^{2} / λ^{2}) z z T}{\int d^{2} zS (| z |^{2}) f (| z |^{2} / λ^{2})} \cdot$ $\begin{equation} \label{eq:secondorder2} \mat{B}^{-1}\mat{V}^2\mat{B}^{-1}=\lambda^2\mat{1} = \beta\frac{\int\d^2z\;S(|\vec{z}|^2)f^\prime(|\vec{z}|^2/\lambda^2)\vec{z}\vec{z}^{\rm T}} {\int\d^2z\;S(|\vec{z}|^2)f(|\vec{z}|^2/\lambda^2)}\cdot \end{equation}$ (15)Since both S and f′ in the numerator depend solely on | z | ², the tensor on the right hand side is proportional to the unit tensor 1, and (15) becomes a scalar equation for the scalar λ, $λ^{2} = β \frac{\int d ss S (s) f^{'} (s / λ^{2})}{\int d sS (s) f (s / λ^{2})},$ $\begin{equation} \lambda^2= \beta\frac{\int\d s\;s\,S(s)f^\prime(s/\lambda^2)} {\int\d s\;S(s)f(s/\lambda^2)}, \end{equation}$ (16)whose solution depends on the brightness profile S and the chosen weight function f. However, the fact that B differs from V only by the scalar factor λ implies that the derived ellipticity ϵ of V is the same as that of the elliptical image. Therefore we have shown that the approach of adapted moments recovers the true ellipticity with elliptical isophotes for any radial weight function f.

For a general brightness profile of the image, the interpretation of the GLAM ellipticity ϵ is less clear, and the ellipticity generally depends on the weight f. Nevertheless, ϵ is uniquely defined as long as a minimum of the functional (3) can be found. More importantly, for any weight f the GLAM ellipticity obeys the same simple transformation law under the action of gravitational shear, as shown in the following section.

2.3. Transformation under shear

We now consider the effect of a shear γ = γ₁ + iγ₂ and convergence κ on the GLAM ellipticity ϵ (Bartelmann & Schneider 2001). For an image with no noise, no pixellation, and no PSF convolution the ellipticity ϵ should be an unbiased estimate of the reduced shear g = g₁ + ig₂ = γ (1 − κ)^-1. This is clearly true for sources that intrinsically have circular isophotes since the isophotes of the sheared images are confocal ellipses with an ellipticity ϵ = g. The minimum of (3) is hence at g by means of the preceding discussion.

For general images, let ϵ_s = ϵ_s,1 + iϵ_s,2 be the complex ellipticity of the image in the source plane and ϵ = ϵ₁ + iϵ₂ its complex ellipticity in the lens plane. We show now that for any brightness profile and template f(ρ), GLAM ellipticities have the extremely useful property to transform under the action of a reduced shear according to $ϵ (g, ϵ_{s}) = {\begin{matrix} \frac{ϵ_{s} + g}{1 + g^{*} ϵ_{s}}, & | g | \leq 1, \\ \frac{1 + ϵ_{s}^{*} g}{ϵ_{s}^{*} + g^{*}}, & | g | > 1. \end{matrix}$ $\begin{equation} \label{eq:ss95} \epsilon(g,\epsilon_{\rm s})= \left\{ \begin{array}{ll} {\frac{\epsilon_{\rm s}+g}{1+g^\ast\epsilon_{\rm s}}} ,&\,|g|\le1,\\\\ {\frac{1+\epsilon_{\rm s}^\ast\,g}{\epsilon_{\rm s}^\ast+g^\ast}} ,&\,|g|>1. \end{array} \right. \end{equation}$ (17)This is exactly the well-known transformation obtained from unweighted moments (SS97). To show this generally for GLAM , let I_s(y) be the surface brightness of an image in the source plane with source plane coordinates y. The centroid y₀ and moment tensor M_s of I_s are defined by the minimum of (3) or, alternatively, by the analog of Eqs. (9) and (10) through $\int d^{2} y I_{s} (y) f (ρ_{s}) (y - y_{0}) = 0$ $\begin{equation} \label{eq:y0center} \int\d^2y\;I_{\rm s}(\vec{y})f(\rho_{\rm s})(\vec{y}-\vec{y}_0)=0 \end{equation}$ (18)and $M_{s} = β \frac{\int d^{2} y (y - y_{0}) (y - y_{0})^{T} I_{s} (y) f^{'} (ρ_{s})}{\int d^{2} y I_{s} (y) f (ρ_{s})}$ $\begin{equation} \label{eq:Ms} \mat{M}_{\rm s}= \beta\,\frac{\int\d^2y\;(\vec{y}-\vec{y}_0)(\vec{y}-\vec{y}_0)^{\rm T}I_{\rm s}(\vec{y})f^\prime(\rho_{\rm s})} {\int\d^2y\;I_{\rm s}(\vec{y})f(\rho_{\rm s})} \end{equation}$ (19)with $ρ_{s} = (y - y_{0})^{T} M_{s}^{-1} (y - y_{0}) .$ $\begin{equation} \label{eq:rhos} \rho_{\rm s}=(\vec{y}-\vec{y}_0)^{\rm T}\mat{M}_{\rm s}^{-1}(\vec{y}-\vec{y}_0). \end{equation}$ (20)The ellipticity ϵ_s is given by $M_{s} = V_{s}^{2}, V_{s} = \frac{t_{s}}{2} (\begin{matrix} 1 + ϵ_{s, 1} & ϵ_{s, 2} \\ ϵ_{s, 2} & 1 - ϵ_{s, 1} \end{matrix}) .$ $\begin{equation} \label{eq:ms1} \mat{M}_{\rm s}=\mat{V}_{\rm s}^2~,~ \mat{V}_{\rm s}= \frac{t_{\rm s}}{2} \left( \begin{array}{cc} 1+\epsilon_{{\rm s},1} & \epsilon_{{\rm s},2}\\ \epsilon_{{\rm s},2} & 1-\epsilon_{{\rm s},1} \end{array} \right). \end{equation}$ (21)We now shear the image I_s. The shear and the convergence are assumed to be constant over the extent of the image, that is lens plane positions x are linearly mapped onto source plane positions y by virtue of $\hbox{$\vec{y}-\vec{y}_{\rm c}={\cal A}(\vec{x}-\vec{x}_{\rm c})$}$ where $𝒜 = 𝒜^{T} = (1 - κ) (\begin{matrix} 1 - g_{1} & - g_{2} \\ - g_{2} & 1 + g_{1} \end{matrix}),$ $\begin{equation} {\cal A}={\cal A}^{\rm T}= (1-\kappa) \left( \begin{array}{cc} 1-g_1 & -g_2 \\ -g_2 & 1+g_1 \end{array} \right), \end{equation}$ (22)x_c and y_c are such that the point x_c is mapped onto the point y_c by the lens equation, and both are chosen to be the central points around which the lens equation is linearised. We then find for the centroid y₀ in the source plane $y - y_{0} = y - y_{c} + y_{c} - y_{0} = 𝒜 (x - x_{1}),$ $\begin{equation} \vec{y}-\vec{y}_0=\vec{y}-\vec{y}_{\rm c}+\vec{y}_{\rm c}-\vec{y}_0 ={\cal A}(\vec{x}-\vec{x}_1), \end{equation}$ (23)where $\hbox{$\vec{x}_1-\vec{x}_{\rm c}={\cal A}^{-1}(\vec{y}_0-\vec{y}_{\rm c})$}$ . This then yields for (20) $ρ_{s} = (x - x_{1})^{T} 𝒜 M_{s}^{-1} 𝒜 (x - x_{1}) .$ $\begin{equation} \rho_{\rm s}=(\vec{x}-\vec{x}_1)^{\rm T}{\cal A}\mat{M}_{\rm s}^{-1}{\cal A}\,(\vec{x}-\vec{x}_1). \end{equation}$ (24)In the next step, the expression (18) for the source centre can be rewritten by transforming to the image coordinates and using the conservation of surface brightness I_s(y(x)) = I(x): $\begin{matrix} 0 & = & \int d^{2} y I_{s} (y) f (ρ_{s}) (y - y_{0}) \\ = & \begin{matrix} \det \end{matrix} (𝒜) 𝒜 \int d^{2} xI (x) f (ρ_{s}) (x - x_{1}) . \end{matrix}$ $\begin{eqnarray} \label{eq:xcentre1} 0&=&\int\d^2y\;I_{\rm s}(\vec{y})\,f(\rho_{\rm s})\,(\vec{y}-\vec{y}_0)\\ \nonumber&=& \det{(\cal A)}\,{\cal A}\int\d^2x\;I(\vec{x})\,f(\rho_{\rm s})\,(\vec{x}-\vec{x}_1). \end{eqnarray}$ (25)With the same transformation, we rewrite the moment tensor (19) as $M_{s} = β 𝒜 \frac{\int d^{2} x (x - x_{1}) (x - x_{1})^{T} I (x) f^{'} (ρ_{s})}{\int d^{2} xI (x) f (ρ_{s})} 𝒜 .$ $\begin{equation} \label{eq:xtensor1} \mat{M}_{\rm s}= \beta{\cal A}\, \frac{\int\d^2x\;(\vec{x}-\vec{x}_1)(\vec{x}-\vec{x}_1)^{\rm T}I(\vec{x})f^\prime(\rho_{\rm s})} {\int\d^2x\;I(\vec{x})f(\rho_{\rm s})}\, {\cal A}. \end{equation}$ (26)On the other hand, minimising the functional (3) for the surface brightness I(x) in the lens plane yields the expressions (9) and (10) for x₀ and M, respectively. We then see that Eqs. (25), (26) and (9), (10) agree with each other, if we set $M_{s} = 𝒜M𝒜, x_{1} = x_{0},$ $\begin{equation} \mat{M}_{\rm s}={\cal A}\mat{M}{\cal A}~,~ \vec{x}_1=\vec{x}_0, \end{equation}$ (27)for which ρ_s = ρ. In particular, this shows that the centre x₀ of the image is mapped onto the centre y₀ of the source.

The relation $M = 𝒜^{-1} M_{s} 𝒜^{-1}$ $\begin{equation} \mat{M}={\cal A}^{-1}\mat{M}_{\rm s}{\cal A}^{-1} \end{equation}$ (28)can be rewritten in terms of the square roots of the matrix M as $M_{s} \equiv V_{s}^{2} = 𝒜 V^{2} 𝒜$ $\begin{equation} \label{eq:MsVs} \mat{M}_{\rm s}\equiv\mat{V}^2_{\rm s}={\cal A}\mat{V}^2{\cal A} \end{equation}$ (29)where $V_{s} = \sqrt{M_{s}}$ $\hbox{$\mat{V}_{\rm s}=\sqrt{\mat{M}_{\rm s}}$}$ is uniquely defined by requiring that for the symmetric, positive-definite matrix M_s, V_s is symmetric and positive-definite. Although both V and $\hbox{$\cal A$}$ are symmetric, $\hbox{$\mat{V}{\cal A}$}$ is in general not. Therefore V_s cannot be readily read off from (29). Instead, we use a rotation matrix $R (ϕ) = (\begin{matrix} \cos ϕ & - \sin ϕ \\ + \sin ϕ & \cos ϕ \end{matrix}), R^{T} (ϕ) = R^{-1} (ϕ) = R (- ϕ),$ $\begin{equation} \mat{R}(\varphi)= \left( \begin{array}{cc} \cos{\varphi} & -\sin{\varphi} \\ +\sin{\varphi} & \cos{\varphi} \end{array} \right)~,~ \mat{R}^{\rm T}(\varphi)=\mat{R}^{-1}(\varphi)=\mat{R}(-\varphi), \end{equation}$ (30)to write $V_{s}^{2} = 𝒜V R^{-1} (ϕ) R (ϕ) V𝒜 = [R (ϕ) V𝒜]^{T} [R (ϕ) V𝒜] .$ $\begin{equation} \mat{V}_{\rm s}^2 = {\cal A}\mat{V}\mat{R}^{-1}(\varphi)\mat{R}(\varphi)\mat{V}{\cal A} = [\mat{R}(\varphi)\mat{V}{\cal A}]^{\rm T}[\mat{R}(\varphi)\mat{V}{\cal A}]. \end{equation}$ (31)If we now choose ϕ to be such that $\hbox{$\mat{R}(\varphi)\mat{V}{\cal A}$}$ is symmetric, then $\hbox{$\mat{V}_{\rm s}=\mat{R}(\varphi)\mat{V}{\cal A}$}$ . After a bit of algebra, we find that the rotation angle ϕ is given through $e^{- i ϕ} = \frac{1 - ϵ g^{*}}{| 1 - ϵ g^{*} |},$ $\begin{equation} \e^{-\i\varphi}= \frac{1-\epsilon g^\ast}{|1-\epsilon g^\ast|}, \end{equation}$ (32)and we obtain as final result V_s as in (21) with $t_{s} = | 1 - ϵ g^{*} | t; ϵ^{s} = \frac{ϵ - g}{1 - ϵ g^{*}} \cdot$ $\begin{equation} \label{eq:epstrans} t_{\rm s}=|1-\epsilon g^\ast|\,t~;~ \epsilon^{\rm s}=\frac{\epsilon-g}{1-\epsilon g^\ast}\cdot \end{equation}$ (33)The inverse of (33) is given by $ϵ = \frac{ϵ_{s} + g}{1 + ϵ_{s} g^{*}} \cdot$ $\begin{equation} \label{eq:einverse} \epsilon= \frac{\epsilon_{\rm s}+g}{1+\epsilon_{\rm s}g^\ast}\cdot \end{equation}$ (34)We recover for the GLAM ellipticity ϵ exactly the transformation law of unweighted moments (SS97). The GLAM ellipticity ϵ is therefore an unbiased estimator of the reduced shear g along the line-of-sight of the galaxy, and there is no need to determine unweighted moments.

As a side remark, the transformation between ϵ and ϵ_s is a linear conformal mapping from the unit circle onto the unit circle, and from the origin of the ϵ-plane onto the point − g in the ϵ_s-plane. If | g | > 1, then g has to be replaced by 1 /g^∗ in Eq. (34), but we shall not be concerned here with this situation in the strong lensing regime.

2.4. Point spread function and pixellation

We have defined the GLAM ellipticity ϵ of an image I(x) relative to an adaptive weight f(ρ). This definition is idealised in the sense that it assumes an infinite angular resolution and the absence of any atmospheric or instrumental distortion of the image. Equally important, it ignores pixel noise. In this section, we move one step further to discuss the recovery of the original ϵ of an image after it has been convolved with a PSF and pixellated. The problem of properly dealing with noise in the image is discussed subsequently.

Let I_pre(x) be the original image prior to a PSF convolution and pixellation. This we call the “pre-seeing” image. Likewise, by the vector I_post of N_pix values we denote the “post-seeing” image that has been subject to a convolution with a PSF and pixellation. For mathematical convenience, we further assume that I_pre(x) is binned on a fine auxiliary grid with N ≫ N_pix pixels of solid angle Ω. We list these pixel values as vector I_pre. The approximation of I_pre(x) by the vector I_pre becomes arbitrarily accurate for N → ∞. Therefore we express the post-seeing image I_post = LI_pre by the linear transformation matrix L applied to the pre-seeing image I_pre. The matrix L with N × N_pix elements combines the effect of a (linear) PSF convolution and pixellation. Similarly, we bin the template f(ρ) in the pre-seeing frame to the grid of I_pre, and we denote the binned template by the vector f_ρ; as usual, the quadratic form ρ is here a function of the variables (x₀,ϵ,t), Eq. (6). The GLAM parameters p_pre of the pre-seeing image are given by the minimum of E(p | I_pre), or approximately by $E_{pre} (p | I_{pre}) : = (I_{pre} - A f_{ρ})^{T} (I_{pre} - A f_{ρ}) \approx 2 Ω^{-1} E (p | I_{pre}) .$ $\begin{equation} \label{eq:highresE} E_{\rm pre}(\vec{p}|\vec{I}_{\rm pre}):= (\vec{I}_{\rm pre}-A\vec{f}_\rho)^{\rm T}(\vec{I}_{\rm pre}-A\vec{f}_\rho) \approx 2\Omega^{-1}E(\vec{p}|I_{\rm pre}) . \end{equation}$ (35)For the recovery of the pre-seeing ellipticity ϵ, the practical challenge is to derive the pre-seeing parameters p_pre from the observed image I_post in the post-seeing frame. For this task, we assume that the transformation L is exactly known. We note that a linear mapping L is an approximation here; we ignore the nonlinear effects in the detector (Plazas et al. 2014; Gruen et al. 2015; Niemi et al. 2015; Melchior et al. 2015).

For a start, imagine a trivial case where no information is lost by going from I_pre to I_post. We express this case by a transformation L that can be inverted, which means that N = N_pix and L is regular. We then obtain p_pre by minimising E_pre(p | L^-1I_post) with respect to p: we map I_post to the pre-seeing frame and analyse I_pre = L^-1I_post there. This is equivalent to minimising the form (I_post − ALf_ρ)^T(LL^T)^-1(I_post − ALf_ρ) in the post-seeing frame.

Fig. 1

Examples of GLAM templates in the pre-seeing frame, f_ρ (top left), and the post-seeing frame, Lf_ρ (other panels); the templates are Gaussian radial profiles with f(ρ) = e^{− ρ/ 2}. The bottom left panel simulates only pixellation, whereas the right column also shows the impact of a PSF, indicated in the top right corner, without (top) and with pixellation (bottom).

For realistic problems where L^-1 does not exist, because N ≫ N_pix, this trivial case at least suggests to determine the minimum p_post of the new functional $E_{post} (p | I_{post}) : = (I_{post} - A L f_{ρ})^{T} U (I_{post} - A L f_{ρ})$ $\begin{equation} \label{eq:functional2} E_{\rm post}(\vec{p}|\vec{I}_{\rm post}):= (\vec{I}_{\rm post}-A\mat{L}\vec{f}_\rho)^{\rm T}\mat{U} (\vec{I}_{\rm post}-A\mat{L}\vec{f}_\rho) \end{equation}$ (36)as estimator of p_pre. This way we are setting up an estimator by forward-fitting the template f_ρ to the image in the post-seeing frame with the matrix U being a metric for the goodness of the fit. Clearly, should L^-1 exist we recover (35) only by adopting U = (LL^T)^-1. So we could equivalently obtain p_pre, without bias, by fitting Lf_ρ to the observed image I_post in this case. However, realistically L is singular: the recovery of p_pre from (36) can only be done approximately. Then we could at least find an optimal metric to minimise the bias. We return to this point shortly. In any case, the metric has to be positive-definite and symmetric such that always E_post ≥ 0. We note that the moments at the minimum of (36) are related but not identical to the adaptive moments in the post-seeing frame. To obtain the latter we would fit a pixellated f_ρ with U = 1 to I_post. The bottom and top right images in Fig. 1 display examples of post-seeing templates that are fitted to a post-seeing image to estimate p_pre with the functional (36).

For singular L, the minimum of the functional yields an unbiased p_pre for any U if

1.
I_pre(x) has confocal elliptical isophotes with the radial brightness profile S(x);
2.
and if we choose f(ρ) = S(ρ) as GLAM template;
3.
and if E_post has only one minimum (non-degenerate).

To explain, due to 1. and 2. we find a vanishing residual $R_{pre} = I_{pre} - A f_{ρ} = 0, for p = p_{pre},$ $\begin{equation} \vec{R}_{\rm pre}=\vec{I}_{\rm pre}-A\vec{f}_\rho=0~~,~{\rm for}~\vec{p}=\vec{p}_{\rm pre}\,, \end{equation}$ (37)at the minimum of E_pre and consequently E_pre(p_pre | I_pre) = 0. At the same time for any metric U, we also have E_post(p_pre | I_post) = 0 because I_post − ALf_ρ = LR_pre = 0 for p = p_pre. Because of the lower bound E_post ≥ 0, these parameters p_pre have to coincide with a minimum of E_post and hence indeed p_post = p_pre. We note that the previous argument already holds for the weaker condition LR_pre = 0 so that a mismatch between I_pre and the template at p_pre produces no bias if the mapped residuals vanish in the post-seeeing frame. In addition, if this is the only minimum of E_post then the estimator p_post is also uniquely defined (condition 3). An extreme example of a violation of condition 3 is the degenerate case N_pix = 1: the observed image consists of only one pixel. Then every parameter set (x₀,ϵ,t) produces E_post = 0, if A is chosen correspondingly. But this should be a rare case because it is not expected to occur for N_pix> 6, thus for images that span over more pixels than GLAM parameters.

A realistic pre-seeing image is neither elliptical nor is our chosen template f(ρ) likely to perfectly fit the radial light profile of the image, even if it were elliptical. This mismatch produces a bias in p only if L is singular, and the magnitude of the bias scales with the residual of the template fit in the pre-seeing frame. To see this, let ${\begin{matrix} ˜ \\ I \end{matrix}}_{pre}$ $\hbox{$\tilde{\vec{I}}_{\rm pre}$}$ be the best-fitting template Af_ρ with parameters p_pre. The residual of the template fit in the pre-seeing frame is $R_{pre} = I_{pre} - {\begin{matrix} _{˜} \\ I \end{matrix}}_{pre}$ $\hbox{$\vec{R}_{\rm pre}=\vec{I}_{\rm pre}-\tilde{\vec{I}}_{\rm pre}$}$ . In the vicinity of p_pre, we express the linear change of Af_ρ for small δp by its gradient at p_pre, $G = (G_{1}, ..., G_{6}) = \nabla_{p} (A f_{ρ}),$ $\begin{equation} \mat{G}=(\vec{G}_1,\ldots,\vec{G}_6)=\nabla_{\vec{p}}(A\vec{f}_\rho), \end{equation}$ (38)where $G_{i} = {\frac{\partial (A f_{ρ})}{\partial p_{i}} |}_{p = p_{pre}}, for i = 1, ..., 6.$ $\begin{equation} \vec{G}_i=\left.\frac{\partial(A\vec{f}_\rho)}{\partial p_i}\right|_{\vec{p}=\vec{p}_{\rm pre}} ~,\,{\rm for}~i=1,\ldots,6. \end{equation}$ (39)Each column G_i of the matrix G denotes the change of Af_ρ with respect to p_i. Therefore, close to p_pre we find the Taylor expansion $\begin{matrix} A f_{ρ} & = & {\begin{matrix} ˜ \\ I \end{matrix}}_{pre} + \sum_{i = 1}^{6} G_{i} δ p_{i} + O (δ p_{i} δ p_{j}) \\ = \end{matrix}$ $\begin{eqnarray} \nonumber A\vec{f}_\rho&=&\tilde{\vec{I}}_{\rm pre}+\sum_{i=1}^6\vec{G}_i\,\delta p_i+O(\delta p_i\,\delta p_j)\\ &=& \label{eq:taylor} \tilde{\vec{I}}_{\rm pre}+\mat{G}\,\delta\vec{p}+O(\delta p_i\,\delta p_j). \end{eqnarray}$ (40)Furthermore, since p_pre is a local minimum of E_pre, Eq. (35), we find at the minimum the necessary condition $\nabla_{p} E_{pre} (p | I_{pre}) = 0 = \Rightarrow G^{T} R_{pre} = 0 .$ $\begin{equation} \label{eq:epremin} \nabla_{\vec{p}}E_{\rm pre}(\vec{p}|\vec{I}_{\rm pre})=\vec{0} \Longrightarrow \mat{G}^{\rm T}\vec{R}_{\rm pre}=\vec{0}. \end{equation}$ (41)This means that the residual R_res is orthogonal to every G_i. Now let $R = L R_{pre} = I_{post} - L {\begin{matrix} _{˜} \\ I \end{matrix}}_{pre}$ $\hbox{$\vec{R}=\mat{L}\vec{R}_{\rm pre}=\vec{I}_{\rm post}-\mat{L}\tilde{\vec{I}}_{\rm pre}$}$ be the residual mapped to the post-seeing frame. Then (36) in the vicinity to p_pre is approximately $E_{post} (p_{pre} + δ p | I_{post}) \approx (R - LG δ p)^{T} U (R - LG δ p),$ $\begin{equation} \label{eq:Epost} E_{\rm post}(\vec{p}_{\rm pre}+\delta\vec{p}|\vec{I}_{\rm post}) \approx (\vec{R}-\mat{L}\mat{G}\delta\vec{p})^{\rm T}\mat{U} (\vec{R}-\mat{L}\mat{G}\delta\vec{p}), \end{equation}$ (42)which we obtain by plugging (40) into (36). This approximation is good if the bias is small, by which we mean that the minimum p_post of E_post(p | I_post) is close to p_pre. As shown in Aitken (1934) in the context of minimum-variance estimators, the first-order bias δp_min = p_post − p_pre that minimises (42) is then given by $δ p_{\min} = {(G^{T} L^{T} {ULG}^{)}}^{-1} G^{T} L^{T} U R = {(G^{T} U_{L} G^{)}}^{-1} G^{T} U_{L} R_{pre},$ $\begin{equation} \label{eq:bias} \delta\vec{p}_{\rm min}= \left(\mat{G}^{\rm T}\mat{L}^{\rm T}\mat{ULG}\right)^{-1}\mat{G}^{\rm T}\mat{L}^{\rm T}\mat{U}\,\vec{R} = \left(\mat{G}^{\rm T}\mat{U}_L\mat{G}\right)^{-1}\mat{G}^{\rm T}\mat{U}_L\,\vec{R}_{\rm pre}, \end{equation}$ (43)with U_L: = L^TUL. This reiterates that the bias δp_min always vanishes either for vanishing residuals R_pre = 0, or if L^-1 exists and we choose U = (LL^T)^-1 as metric. The latter follows from Eq. (43) with U_L = 1 and Eq. (41). We note that δp_min does not change if we multiply U by a scalar λ ≠ 0. Thus any metric λU generates as much bias as U.

With regard to an optimal metric U, we conclude from Eq. (43) and G^TR_pre = 0 that we can minimise the bias by the choice of U for which U_L ≈ 1, or if ∥ L^TUL − 1 ∥ ² is minimal with respect to the Frobenius norm ∥ Q ∥ ² = tr⁽QQ^T⁾. This optimised choice of U corresponds to the so-called pseudo-inverse (LL^T)⁺ of LL^T (Press et al. 1992). The pseudo-inverse is the normal inverse of LL^T if the latter is regular.

A practical computation of U = (LL^T)⁺ could be attained by choosing a set of orthonormal basis function b_i in the pre-seeing frame, or approximately a finite number N_base of basis functions that sufficiently describes images in the pre-seeing frame. For every basis function, one then computes the images Lb_i and the matrix $L L^{T} = L (\sum_{i = 1}^{\infty} b_{i} b_{i}^{T}) L^{T} \approx \sum_{i = 1}^{N_{base}} (L b_{i}) {(L b_{i})}^{T} .$ $\begin{equation} \label{eq:LL} \mat{L}\mat{L}^{\rm T}= \mat{L}\left(\sum_{i=1}^\infty\vec{b}_i^{}\vec{b}_i^{\rm T}\right)\mat{L}^{\rm T} \approx \sum_{i=1}^{N_{\rm base}} \left(\mat{L}\vec{b}_i\right)\left(\mat{L}\vec{b}_i\right)^{\rm T}. \end{equation}$ (44)The pseudo-inverse of this matrix is the metric in the post-seeing frame.

Specifically, for images that are only pixellated the matrix LL^T is diagonal. To see this, consider pixellations that map points e_i in the pre-seeing frame to a single pixels e′(e_i) in the post-seeing frame, or Le_i = e′(e_i); both e_i and e′(e_i) are unit vectors from the standard bases in the two frames. According to (44), the matrix LL^T = ∑ _ie′(e_i) [ e′(e_i) ] ^T is then always diagonal, typically proportional to the unit matrix, and easily inverted to obtain the optimal metric U. Unfortunately, this U only makes the linear-order bias (43) vanish while we still can have a higher-order bias because of the singular L.

2.5. Similarity between model-based and moment-based techniques

Through the formalism in the previous sections it becomes evident that there is no fundamental difference between model-based techniques and those involving adaptive moments. Model-based techniques perform fits of model profiles to the observed image, whereas moment-based ellipticities with the adaptive weight f′(ρ) are equivalently obtained from the image by fitting the ellipticial profile f(ρ) to the image. This requires, however, the existence of a unique minimum of the functional E(p | I) which we assume throughout the paper. We note that the similarity also extends to the special case of unweighted moments in Eqs. (1) and (2) which are in principle obtained by fitting f(ρ) = ρ since f′(ρ) = 1 in this case.

Nonetheless one crucial difference to a model-fitting technique is that a fit of f(ρ) does not assume a perfect match to I(x): the functional E(p | I) needs to have a minimum, but the fit is allowed to have residuals R_pre, this means E(p | I) ≠ 0 at the minimum. As shown, for the estimator based on (36) this may cause bias, Eq. (43), but only when analysing post-seeing images hence for L ≠ 1. In practice, different methodologies to estimate the pre-seeing moment-based ellipticity certainly use different approaches. Our choice of a forward-fitting estimator (36) is very specific but is optimal in the sense that it is always unbiased for LR_pre = 0 or regular L. Yet other options are conceivable. For instance, we could estimate moments in the post-seeing frame first and try to map those to the pre-seeing frame (e.g. H03 or Melchior et al. 2011, which aim at unweighted moments). It is then unclear which weight is effectively applied in the pre-seeing frame. Therefore the expression Eq. (43) for the bias is strictly applicable only to our estimator and adaptive moments. But it seems plausible that the bias of estimators of pre-seeing moments generally depends on the residual R_pre since the brightness moments x_0,i and M_ij are the solutions to a best-fit of elliptical templates.

In the literature the problem of bias due to residuals in model fits is known as model bias or underfitting bias (Zuntz et al. 2013; Bernstein 2010). Consequently, moment-based techniques are as prone to underfitting bias as model-based methodologies.

3. Statistical inference of ellipticity

Realistic galaxy images I are superimposed by instrumental noise δI. Therefore the pre-seeing GLAM ellipticity can only be inferred statistically with uncertainties, and it is, according to the foregoing discussion, subject to underfitting bias. For a statistical model of the ellipticity ϵ, we exploit the previous conclusions according to which the ellipticity of I(x) for the adaptive weight f′(ρ) is equivalent to ϵ of the best-fitting template f(ρ). This renders the inference of ϵ a standard forward-fit of a model ALf(ρ) to I.

We consider post-seeing images I with Gaussian noise δI, that means images I = I_post + δI. The covariance of the noise is N = ⟨ δIδI^T ⟩, while I_post = LI_pre is the noise-free image in the post-seeing frame. A Gaussian noise model is a fair assumption for faint galaxies in the sky-limited regime (Miller et al. 2007). Possible sources of noise are: read-out noise, sky noise, photon noise, or faint objects that blend with the galaxy image. If an approximate Gaussian model is not applicable, the following model of the likelihood has to be modified accordingly.

The statistical model of noise are given by the likelihood ℒ(I | p) of an image I = LI_pre + δI given the GLAM parameters p. We aim at a Bayesian analysis for which we additionally quantify our prior knowledge on parameters by the PDF P_p(p). We combine likelihood and prior to produce the marginal posterior $P_{ϵ} (ϵ | I) \propto \int d A d t d^{2} x_{0} ℒ (I | p) P_{p} (p)$ $\begin{equation} \label{eq:posterior} P_\epsilon(\epsilon|\vec{I})\propto \int\d A\,\d t\,\d^2x_0\;{\cal L}(\vec{I}|\vec{p})\,P_{\rm p}(\vec{p}) \end{equation}$ (45)of ellipticity by integrating out the nuisance parameters (x₀,A,t); the constant normalisation of the posterior is irrelevant for this paper but we assume that the posterior is proper (it can be normalised). Our choice for the numerical experiments in this study is a uniform prior P_p(p) for positive sizes t and amplitudes A, ellipticities | ϵ | < 1, and centroid positions x₀ inside the thumbnail image. As known from previous Bayesian approaches to shear analyses, the choice of the prior affects the consistency of the ellipticity posteriors (see, e.g. BA14). The origin of the prior-dependence will become clear in Sect. 3.3.

With regard to notation, we occasionally have to draw random numbers or vectors of random numbers x from a PDF P(x) or a conditional density P(x | y). We denote this by the shorthand x ~ P(x) and x ~ P(x | y), respectively. As common in statistical notation, distinct conditional probability functions may use the same symbol, as for instance the symbol P in P(x | y) and P(y | x).

3.1. Caveat of point estimates

Fig. 2

Toy-model demonstration of a maximum-likelihood estimator (red), a maximum-likelihood estimator with first-order bias correction (green), and an estimator exploiting the full posterior (blue). Data points display the estimator average (y-axis) over 10⁶ data points at varying signal-to-noise levels (x-axis). The true value to be estimated is x = 1. The panels show different signal-to-noise regimes; − nppt denotes y = 1 − n/ 10³.

The bias in a lensing analysis is not only affected by how we statistically infer galaxy shapes but also how we process the statistical information later on. To demonstrate in this context the disadvantage of point estimators in comparison to a fully Bayesian treatment, we consider here a simplistic nonlinear toy model. This model has one parameter x and one single observable y = x³ + n that is subject to noise n. By n ~ N(0,σ) we draw random noise from a Gaussian distribution N(0,σ) with mean zero and variance σ. From the data y, we statistically infer the original value of x. Towards this goal we consider the (log-)likelihood of y given x which is − 2lnℒ(y | x) = (y − x³)²σ^-2 + const.

A maximum likelihood estimator of x is given by x_est = y^{1 / 3}, the maximum of ℒ(y | x). We determine the bias of x_est as function of signal-to-noise ratio (S/N) x/σ by averaging the estimates of N_real = 10⁶ independent realisations of y. The averages and the standard errors are plotted as red line in Fig. 2. Clearly, x_est is increasingly biased low towards lower S/N levels. In the context of lensing, this would be noise bias. As an improvement we then correct the bias by employing the first-order correction in Refregier et al. (2012) for each realisation of y. As seen in the figure, this correction indeed reduces the systematic error, but nevertheless breaks down for S/N ≲ 3.

On the other hand in a fully Bayesian analysis, we obtain constraints on x that are consistent with the true value for any S/N. For this purpose, we make N_real independent identically distributed realisations (i.i.d.) y_i and combine their posterior densities P_post(x | y_i) ∝ ℒ(y_i | x) P_prior(x) by multiplying the likelihoods; we adopt a uniform (improper) prior P_prior(x) = const. This gives us for y = (y₁,...,y_{N_real}), up to a normalisation constant, the combined posterior $\ln P_{post} (x | y) + const = \sum_{i = 1}^{N_{real}} \ln ℒ (y_{i} | x) = - \frac{1}{2 σ^{2}} \sum_{i = 1}^{N_{real}} (y_{i} - x^{3})^{2} .$ $\begin{equation} \ln{P_{\rm post}(x|\vec{y})}+{\rm const}= \sum_{i=1}^{N_{\rm real}}\ln{{\cal L}(y_i|x)} = -\frac{1}{2\sigma^2}\sum_{i=1}^{N_{\rm real}}(y_i-x^3)^2. \end{equation}$ (46)As expected due to the asymptotic normality of posteriors (under regularity conditions), for i.i.d. experiments y_i the product density is well approximated by a Gaussian N(x₀,σ_x) and is consistent with the true value x (van der Vaart 1998). We plot values of x₀ and σ_x in Fig. 2 as blue data points.

In conclusion, keeping the full statistical information P_post(x | y) in the inference of x yields consistent constraints over the entire S/N range probed: the noise bias vanishes. Also note that the Bayesian approach has not substantially increased the error σ_x compared to the error of the point estimator (relative sizes of error bars); both approaches have similar efficiency.

3.2. Likelihood model and underfitting bias

Inspired by the foregoing Bayesian toy model that is free of noise bias, we set up a Bayesian approach for GLAM ellipticities. To construct a likelihood for a GLAM fit in the pre-seeing frame, we first consider, similar to Sect. 2.4, the trivial case where L is regular. This is straightforward since we can map the noisy image I → L^-1I back to the pre-seeing frame and determine the noise residual for given pre-seeing p, $δ I_{pre} (p) = L^{-1} I - R_{pre} - A f_{ρ} .$ $\begin{equation} \delta\vec{I}_{\rm pre}(\vec{p})= \mat{L}^{-1}\vec{I}-\vec{R}_{\rm pre}-A\vec{f}_\rho. \end{equation}$ (47)The inverse noise covariance in the pre-seeing frame is L^TN^-1L. The logarithmic likelihood of δI_pre(p) in the Gaussian case is thus $\begin{matrix} - 2 \ln ℒ_{pre} (I | p) + const . = δ I_{pre} (p)^{T} L^{T} N^{-1} L δ I_{pre} (p) \\ = (I - A L f_{ρ} - L R_{pre})^{T} N^{-1} (I - A L f_{ρ} - L R_{pre}) \\ = : ∥ I - A L f_{ρ} - L R_{pre} ∥_{N}^{2}, \end{matrix}$ $\begin{eqnarray} \nonumber \lefteqn{-2\ln{{\cal L}_{\rm pre}(\vec{I}|\vec{p})+{\rm const.}} =\delta\vec{I}_{\rm pre}(\vec{p})^{\rm T}\mat{L}^{\rm T}\,\mat{N}^{-1}\mat{L}\,\delta\vec{I}_{\rm pre}(\vec{p})} \\ \nonumber &&=\Big(\vec{I}-A\,\mat{L}\vec{f}_\rho-\mat{L}\vec{R}_{\rm pre}\Big)^{\rm T}\mat{N}^{-1}\, \Big(\vec{I}-A\,\mat{L}\vec{f}_\rho-\mat{L}\vec{R}_{\rm pre}\Big) \\ \label{eq:likee0} &&=:\,\|\vec{I}-A\mat{L}\vec{f}_\rho-\mat{L}\vec{R}_{\rm pre}\|^2_{\mat{N}}, \end{eqnarray}$ (48)where “const.” expresses the normalisation of the likelihood. Therefore, we can equivalently write the pre-seeing fit in terms of available post-seeing quantities. We note here that the transform LR_pre of the (unknown) pre-seeing residual is for singular L and R_pre ≠ 0 not equal to the post-seeing residual R_post, which we defined in a least-square fit of ALf_ρ to I_post (see Sect. 2.4).

In reality, L is singular so the previous steps cannot be applied. Nevertheless, Eq. (48) is the correctly specified likelihood of a forward fit if the model m(p): = Af_ρ + R_pre perfectly describes the brightness profile I_pre for some parameters p_true. Therefore we expect no inconsistencies for singular L as long as the correct R_pre can be given. Since R_pre is unknown, however, we investigate in the following the impact of a misspecified likelihood that does not properly account for our ignorance in the pre-seeing residuals. We do this by assuming R_pre ≡ 0 and employing $- 2 \ln ℒ (I | p) + const . = ∥ I - A L f_{ρ} ∥_{N}^{2} .$ $\begin{equation} \label{eq:likee1} -2\ln{{\cal L}(\vec{I}|\vec{p})}+{\rm const.}= \|\vec{I}-A\mat{L}\vec{f}_\rho\|^2_{\mat{N}}. \end{equation}$ (49)Obviously, this is a reasonable approximation of (48) if ∥ LR_pre ∥ ≪ ∥ I ∥ so that residuals are only relevant if they are not small when compared to the image in the post-seeing frame. It also follows from Eq. (48) that for LR_pre = 0 the likelihood is correctly specified even if R_pre ≠ 0 and L being singular, similar to the noise-free case. More generally we discuss later in Sect. 5 how we could modify the likelihood ℒ(I | p) to factor in our insufficient knowledge about R_pre. Until then the approximation (49) introduces underfitting bias into the likelihood model, making it inconsistent with the true p_pre.

To show the inconsistency, we proceed as in the foregoing section on the toy model. We consider a series of n i.i.d. realisations I_i of the same image I_post and combine their likelihoods ℒ(I_i | p) for a given p into the joint (product) likelihood $- 2 \ln 􏽙_{i = 1}^{n} ℒ (I_{i} | p) + const . : = \sum_{i = 1}^{n} ∥ I_{i} - A L f_{ρ} ∥_{N}^{2},$ $\begin{equation} -2\ln{\prod_{i=1}^n{\cal L}(\vec{I}_i|\vec{p})}+ {\rm const.}:= \sum_{i=1}^n\|\vec{I}_i-A\mat{L}\vec{f}_\rho\|^2_{\mat{N}}, \end{equation}$ (50)and we work out its limit for n → ∞. The joint likelihood can be written as $\begin{matrix} - 2 \ln 􏽙_{i = 1}^{n} ℒ (I_{i} | p) + const . = \sum_{i = 1}^{n} I_{i}^{T} N^{-1} I_{i} \\ - & 2 \sum_{i = 1}^{n} I_{i}^{T} N^{-1} (A L f_{ρ}) + \sum_{i = 1}^{n} (A L f_{ρ})^{T} N^{-1} (A L f_{ρ}) \\ = & tr (N^{-1} \sum_{i} I_{i}^{} I_{i}^{T}) - 2 \sum_{i} I_{i}^{T} N^{-1} (A L f_{ρ}) + n ∥ A L f_{ρ} ∥_{N}^{2} . \end{matrix}$ $\begin{eqnarray} \lefteqn{-2\ln{\prod_{i=1}^n{\cal L}(\vec{I}_i|\vec{p})}+{\rm const.} = \sum_{i=1}^n\vec{I}_i^{\rm T}\mat{N}^{-1}\vec{I}_i}\\ &-& 2\sum_{i=1}^n\vec{I}_i^{\rm T}\mat{N}^{-1}(A\mat{L}\vec{f}_\rho) + \sum_{i=1}^n(A\mat{L}\vec{f}_\rho)^{\rm T}\mat{N}^{-1}(A\mat{L}\vec{f}_\rho)\nonumber\\ &=&\nonumber \tr{\mat{N}^{-1}\,\sum_i\vec{I}_i^{}\vec{I}_i^{\rm T}}- 2\sum_i\vec{I}_i^{\rm T}\,\mat{N}^{-1}(A\mat{L}\vec{f}_\rho) + n\,\|A\mat{L}\vec{f}_\rho\|^2_{\mat{N}}. \end{eqnarray}$ (51)Here we have made use of the properties of the trace of matrices, namely its linearity $tr^{(} A + B^{)} = tr {}^{(}A^{)} + tr {}^{(}B^{)}$ $\hbox{$\tr{\mat{A}+\mat{B}}=\tr{\mat{A}}+\tr{\mat{B}}$}$ and that $I_{i}^{T} A I_{i} = tr {}^{(}I_{i}^{T} A I_{i}^{)} = tr^{(} A I_{i} {I_{i}^{T}}^{)}$ $\hbox{$\vec{I}_i^{\rm T}\,\mat{A}\,\vec{I}_i=\tr{\vec{I}_i^{\rm T}\,\mat{A}\,\vec{I}_i}=\tr{\mat{A}\,\vec{I}_i\,\vec{I}_i^{\rm T}}$}$ . For large n, we can employ the asymptotic expressions $\sum_{i} I_{i}^{T} \to n I_{post}, \sum_{i} I_{i} I_{i}^{T} \to n (N + I_{post} {I_{post}^{T}}^{)}$ $\begin{equation} \label{eq:Niapprox} \sum_i\vec{I}_i^{\rm T}\to n\,\vec{I}_{\rm post}~,~ \sum_i\vec{I}_i^{}\vec{I}_i^{\rm T}\to n\,\left(\mat{N}+\vec{I}_{\rm post}^{}\vec{I}_{\rm post}^{\rm T}\right) \end{equation}$ (52)to the following effect: $\begin{matrix} - 2 \ln 􏽙_{i = 1}^{n} ℒ (I_{i} | p) + const . \to n tr (N^{-1} N^{)} + n tr (I_{post}^{} I_{post}^{T} N^{-1}) \\ - 2 n I_{post}^{T} N^{-1} (A L f_{ρ}) + n (A L f_{ρ})^{T} N^{-1} (A L f_{ρ}) \\ = & n N_{pix} + n ∥ I_{post} - A L f_{ρ} ∥_{N}^{2} . \end{matrix}$ $\begin{eqnarray} \label{eq:Ppbias} \lefteqn{-2\ln{\prod_{i=1}^n{\cal L}(\vec{I}_i|\vec{p})}\!+\,{\rm const.} \to n\,\tr{\mat{N}^{-1}\mat{N}}\!+\! n\,\tr{\vec{I}^{}_{\rm post}\vec{I}_{\rm post}^{\rm T}\mat{N}^{-1}}} \\ &&\nonumber -2n\,\vec{I}_{\rm post}^{\rm T}\mat{N}^{-1}(A\mat{L}\vec{f}_\rho) +n\,(A\mat{L}\vec{f}_\rho)^{\rm T}\mat{N}^{-1}(A\mat{L}\vec{f}_\rho)\\ &=&\nonumber n\,N_{\rm pix}+ n\,\|\vec{I}_{\rm post}-A\mat{L}\vec{f}_\rho\|^2_{\mat{N}}. \end{eqnarray}$ (53)Thus the joint likelihood peaks for n → ∞ at the minimum p_post of ∥ I_post − ALf_ρ ∥ _N. This is equivalent to the location of the minimum of E_post(p | I_post), Eq. (36), with metric U = N^-1. Consequently, as in the noise-free case, we find an inconsistent likelihood if the residual R_pre is non-vanishing and if L^TN^-1L is not proportional to the unity matrix. We additionally expect a smaller bias δp = p_post − p_pre for smaller levels of residuals¹.

With regards to the noise dependence of underfitting bias, we find that δp does not change if we increase the overall level of noise in the image I. This can be seen by scaling the noise covariance N → λ N with a scalar λ. Any value λ> 0 results in the same minimum location for E_post(p | I_post) so that δp is independent of λ: there is no noise bias.

Moreover, the bias δp of a misspecified likelihood seems to depend on our specific assumption of a Gaussian model for the likelihood. It can be argued on the basis of general theorems on consistency and asymptotic normality of posteriors, however, that for n → ∞ we obtain the same results for other likelihood models under certain regularity conditions (van der Vaart 1998; Appendix B in Gelman et al. 2013). The latter requires continuous likelihoods that are identifiable, hence ℒ(I | p₁) ≠ ℒ(I | p₂) for p₁ ≠ p₂, and that the true p_true is not at the boundary of the domain of all parameters p. This is stricter than our previous assumptions for the Gaussian model where we needed only a unique global maximum of the likelihood ℒ(I_post | p). It could therefore be that a unique maximum is not sufficient for non-Gaussian likelihoods.

3.3. Prior bias

The foregoing section discusses the consistency of the likelihood of a single image I. We can interpret the analysis also in a different way: if we actually had n independent exposures I_i of the same pre-seeing image, then combining the information in all exposures results in a posterior $P_{p} (p | I) \propto {^{􏽑}}_{i = 1}^{n} ℒ (I_{i} | p) P_{p} (p)$ $\hbox{$P_{\rm p}(\vec{p}|\vec{I})\propto\prod_{i=1}^n\,{\cal L}(\vec{I}_i|\vec{p})\,P_{\rm p}(\vec{p})$}$ that is consistent with p_post for a uniform prior P_p(p) = 1. As discussed in van der Vaart (1998), the more general Bernstein-von Mises theorem additionally shows that under regularity conditions the choice of the prior is even irrelevant provided it sets prior mass around p_post (Cromwell’s rule). This might suggest that for a correctly specified likelihood ℒ(I_i | p), a fully Bayesian approach for the consistent measurement of ϵ might be found that is independent of the specifics of the prior and has no noise bias in the sense of Sect. 3.2. This is wrong as shown in the following.

Namely, in contrast to the previous simplistic scenario, sources in a lensing survey have varying values of p: they are intrinsically different. For more realism, we therefore assume now i = 1...n pre-seeing images that, on the one hand, shall have different centroid positions x_0,i, sizes t_i, amplitudes A_i but, on the other hand, have identical ellipticities ϵ. Our goal in this experiment is to infer ϵ from independent image realisations I_i = I_post,i + δI_i by marginalising over the 4n nuisance parameters q_i = (x_0,i,t_i,A_i). This experiment is similar to the standard test for shear measurements where a set of different pre-seeing images is considered whose realisations I_i are subject to the same amount of shear (e.g. Bridle et al. 2010). As a matter of fact, the inference of constant shear from an ensemble of images would just result in 2n additional nuisance parameters for the intrinsic shapes with essentially the same following calculations.

Let P_q(q_i) be the prior for the four nuisance parameters q_i of the ith image and P_ϵ(ϵ) = 1 a uniform prior for ϵ. We combine the GLAM parameters in p_i: = (q_i,ϵ), and we assume that all images have the same prior density and that the noise covariance N applies to all images. The marginal posterior of ϵ is then the integral $\begin{matrix} 𝒩 P_{ϵ} (ϵ | I_{1}, ..., I_{n}) & = & 􏽙_{i = 1}^{n} \int d^{4} q_{i} ℒ (I_{i} | p_{i}) P_{q} (q_{i}) P_{ϵ} (ϵ) \\ = & \int d^{4} q_{1} ... d^{4} q_{n} 􏽙_{i = 1}^{n} ℒ (I_{i} | p_{i}) \times 􏽙_{i = 1}^{n} P_{q} (q_{i}), \end{matrix}$ $\begin{eqnarray} \label{eq:Pproduct0} {\cal N}\,P_\epsilon(\epsilon|\vec{I}_1,\ldots,\vec{I}_n)&=& \prod_{i=1}^n\int\d^4q_i\; {\cal L}(\vec{I}_i|\vec{p}_i)\,P_{\rm q}(\vec{q}_i)\,P_\epsilon(\epsilon) \nonumber\\ &=&\int\d^4q_1\ldots\d^4q_n\; \prod_{i=1}^n{\cal L}(\vec{I}_i|\vec{p}_i) \times\prod_{i=1}^nP_{\rm q}(\vec{q}_i), \end{eqnarray}$ (54)with $\hbox{${\cal N}$}$ being a normalization constant.

The product of the likelihood densities inside the integral is given by $\begin{matrix} - 2 \ln 􏽙_{i = 1}^{n} ℒ (I_{i} | p_{i}) + const . = n tr (N^{-1} \frac{\sum_{i} I_{i}^{} I_{i}^{T}}{n}) \\ - 2 tr (N^{-1} \sum_{i} I_{i}^{T} (A_{i} L f_{ρ,i})) + \sum_{i} ∥ A_{i} L f_{ρ,i} ∥_{N}^{2} . \end{matrix}$ $\begin{eqnarray} \label{eq:Pproduct1} \lefteqn{-2\ln{\prod_{i=1}^n{\cal L}(\vec{I}_i|\vec{p}_i)+{\rm const.}} = n\,\tr{\mat{N}^{-1}\frac{\sum_i\vec{I}_i^{}\vec{I}_i^{\rm T}}{n}}}\\ &&-2\,\tr{\mat{N}^{-1}\sum_i\vec{I}_i^{\rm T}(A_i\mat{L}\vec{f}_{\rho,i})} + \sum_i\|A_i\mat{L}\vec{f}_{\rho,i}\|^2_{\mat{N}}.\nonumber \end{eqnarray}$ (55)Here we have taken into account that the GLAM parameters partly differ, indicated by the additional index in A_i and f_ρ,i. This is different in Eq. (53) where we take the product of full likelihoods in p-space without marginalization. In the limit of n → ∞, we find in addition to the relations (52) that $\begin{matrix} tr (N^{-1} \sum_{i} I_{i}^{T} (A_{i} L f_{ρ,i})) & = & tr (N^{-1} \sum_{i} (I_{post,i} + δ I_{i})^{T} (A_{i} L f_{ρ,i})) \\ \to & tr (N^{-1} \sum_{i} I_{post,i}^{T} A_{i} L f_{ρ,i}) \end{matrix}$ $\begin{eqnarray} \tr{\mat{N}^{-1}\sum_i\vec{I}_i^{\rm T}(A_i\mat{L}\vec{f}_{\rho,i})}&=& \tr{\mat{N}^{-1}\sum_i(\vec{I}_{{\rm post},i}+\delta\vec{I}_i)^{\rm T}(A_i\mat{L}\vec{f}_{\rho,i})}\nonumber\\ & \to& \tr{\mat{N}^{-1}\sum_i\vec{I}_{{\rm post},i}^{\rm T}\,A_i\mat{L}\vec{f}_{\rho,i}} \end{eqnarray}$ (56)because δI_i is uncorrelated to ALf_ρ,i so that $δ I_{i}^{T} (A L f_{ρ,i})$ $\hbox{$\delta\vec{I}_i^{\rm T}(A\mat{L}\vec{f}_{\rho,i})$}$ vanishes on average for many δI_i. Therefore, for the asymptotic statistic we can replace all I_i by I_post,i in Eq. (55) to obtain $􏽙_{i = 1}^{n} ℒ (I_{i} | p_{i}) \to 􏽙_{i = 1}^{n} ℒ (I_{post,i} | p_{i}),$ $\begin{equation} \prod_{i=1}^n{\cal L}(\vec{I}_i|\vec{p}_i)\to \prod_{i=1}^n{\cal L}(\vec{I}_{{\rm post},i}|\vec{p}_i), \end{equation}$ (57)and, as a result, for Eq. (54) $\begin{matrix} 𝒩 & P_{ϵ} (ϵ | I_{1}, ..., I_{n}) \\ \to 􏽙_{i = 1}^{n} \int d^{4} q_{i} ℒ (I_{post,i} | p_{i}) P_{q} (q_{i}) = : 􏽙_{i = 1}^{n} P_{ϵ} (ϵ | I_{post,i}), \end{matrix}$ $\begin{eqnarray} \label{eq:limit} & {\cal N}&\,P_\epsilon(\epsilon|\vec{I}_1,\ldots,\vec{I}_n)\nonumber\\ && \to \prod_{i=1}^n\int\d^4q_i\; {\cal L}(\vec{I}_{{\rm post},i}|\vec{p}_i)\,P_{\rm q}(\vec{q}_i) =:\prod_{i=1}^nP_\epsilon(\epsilon|\vec{I}_{{\rm post},i}), \end{eqnarray}$ (58)where P_ϵ(ϵ | I_post,i) is the marginal ellipticity posterior for the ith noise-free image.

The limit (58) has the interesting consequence that the consistency with p_post of the marginal posterior depends on the specific choice of the prior density P_q(q_i). To show this, consider one particular case in which, for simplicity, all pre-seeing images are identical such that I_post,i ≡ I_post. Then, according to (58), the ellipticity posterior converges in distribution to [ P_ϵ(ϵ | I_post) ] ⁿ which for n → ∞ peaks at the global maximum of P_ϵ(ϵ | I_post). It is then easy to see that we can always change the position of this maximum by varying the prior density in P_ϵ(ϵ | I_post). In particular, even if the likelihoods are correctly specified, we generally find an inconsistent marginal posterior depending on the prior. A similar argument can be made if I_post,i ≠ I_post,j for i ≠ j.

Fig. 3

Prior bias in the marginal posterior P_ϵ(ϵ | I₁,...,I_n) as function of S/N ν for different galaxy sizes r_h (in arcsec). The posterior assumes a uniform prior. Shown is the error δϵ = | ϵ − ϵ_true | of the inferred ϵ for a true ϵ_true = 0.3 as obtained by combining the marginal posteriors of 5 × 10³ exposures of the same galaxy with random centroid positions. The pixel size 0.1 arcsec equals the PSF size (Moffat). Galaxy profiles and GLAM templates have a Sérsic profile with n = 2: there is no underfitting.

For a concrete example, we perform for Fig. 3 a simulated analysis of 5 × 10³ noisy images with ϵ_true = 0.3. All pre-seeing images are identical to one particular template Af_ρ (Sérsic profile with n = 2). Therefore we have a correctly specified likelihood model and no underfitting. We adopt a uniform prior P_q(q) (and P_ϵ(ϵ)). The details on the simulated images and their analysis are given in Sect. 4. For each data point, we plot the mean and variance of the marginal posterior P_ϵ(ϵ | I₁,...,I_n) relative to the true ellipticity ϵ_true as function of S/N ν and for different image sizes r_h. Evidently, the bias δϵ increases for smaller ν thereby producing a noise-dependent bias which is not present when analysing the consistency of ℒ(I | p) of individual images as in Sect. 3.2.

The sensitivity of the posterior to the prior implies that consistency (for a correctly specified likelihood) could be regained by choosing an appropriate prior density P_q(q_i). On the other hand, the dependence on the prior runs against the conventional wisdom that the prior should become asymptotically irrelevant for n → ∞ as the joint likelihood starts dominating the information on inferred parameters. Indeed, general theorems show this under certain regularity conditions (see e.g. discussion in Chap. 4 of Gelman et al. 2013). These conditions are, however, not given here because the total number of model parameters is not fixed but rather increases linearly with n due to a new set of nuisance parameters q_i for every new galaxy image I_i. The observed breakdown of consistency is the result. A trivial (but practically useless) prior to fix the problem is one that puts all prior mass at the true values of the nuisance parameters which essentially leaves only ϵ as free model parameter. Likewise, an analysis with sources that knowingly have the same values for the nuisance parameters also yields consistent constraints; this is exactly what is done in Sect. 3.2. A non-trivial solution, as reported by BA14, is to use “correct priors” for the nuisance parameters that are equal to the actual distribution of q in the sample. On the downside, this raises the practical problem of obtaining correct priors from observational data.

In summary, for an incorrect prior of q, such as our uniform prior, we find a noise-dependent bias in the inferred ellipticity despite a fully Bayesian approach and a correctly specified likelihood. We henceforth call this bias “prior bias”. This emphasises the difference to the noise bias found in the context of point estimators.

4. Simulated shear bias

We have classified two possible sources of bias in our Bayesian implementation of adaptive moments: underfitting bias and prior bias. In this section, we study their impact on the inference of reduced shear g by using samples of mock images with varying S/N, galaxy sizes, and galaxy brightness profiles. In contrast to our previous experiments, the images in each sample have random intrinsic shapes but are all subject to the same amount of shear, and we perform numerical experiments to quantify the bias. Moreover, this section outlines practical details of a sampling technique for the posteriors of GLAM ellipticities and the reduced-shear, which is of interest for future applications (Sects. 4.4 and 4.5).

The PSF shall be exactly known for these experiments. For a large shear bias, we choose a relatively small PSF size and galaxy images that are not much larger than a few pixels. According to the discussion in Sect. 2.4, underfitting bias is mainly a result of pixellation which mathematically cannot be inverted. We note that a larger PSF size would reduce the underfitting bias since it spreads out images over more image pixels. Overall the values for shear bias presented here are larger than what is typically found in realistic surveys (e.g. Zuntz et al. 2013). We start this section with a summary of our simulation specifications.

4.1. Point-spread function

If not stated otherwise, our simulated postage stamps consist of 20 × 20 pixels (squares), of which one pixel has a size of 0.1′′; the simulated galaxies are relatively small with a half-light radius r_h of only a few times the pixel size (0.15′′ ≤ r_h ≤ 0.3′′). Additionally, we adopt an isotropic Moffat PSF, $I_{psf} (x) \propto {(1 + \frac{| x |^{2}}{α^{2}})}^{- β}, α : = \frac{θ_{FWHM}}{2 \sqrt{2^{1 / β} - 1}},$ $\begin{equation} I_{\rm psf}(\vec{x})\propto \left(1+\frac{|\vec{x}|^2}{\alpha^2}\right)^{-\beta}~~,~~ \alpha:=\frac{\theta_{\rm FWHM}}{2\sqrt{2^{1/\beta}-1}}, \end{equation}$ (59)with the full width half maximum (FWHM) of θ_FWHM = 0.1′′ and β = 5 (Moffat 1969).

4.2. GLAM template profiles

As GLAM templates we employ truncated Sérsic-like profiles with index n, $f (ρ) = \exp (- {[\frac{ρ}{ρ_{0}}]}^{\frac{1}{2 n}}) h (\sqrt{ρ}); h (x) : = \frac{1}{e^{5 (x - 3)} + 1}$ $\begin{equation} \label{eq:sersic} f(\rho)= \exp{\left(-\left[\frac{\rho}{\rho_0}\right]^{\frac{1}{2n}}\right)}\, h(\!\sqrt{\rho})~;~ h(x):=\frac{1}{\e^{5(x-3)}+1} \end{equation}$ (60)and $ρ_{0} : = (1.992 n - 0.3271)^{- 2 n}$ $\begin{equation} \rho_0:=(1.992n-0.3271)^{-2n} \end{equation}$ (61)(Capaccioli 1989). To avoid a numerical bias due to aliasing at the edges of the grid during the Fourier transformation steps in the sampling code, we have introduced the auxiliary function h(x) that smoothly cuts off the Sérsic profile beyond ρ ≈ 9. If f(ρ) were the radial light profile of an elliptical galaxy, the truncation would be located at about three half-light radii.

We use template profiles with n = 2 throughout. This index n falls between the values of n used for the model galaxies and thus is a good compromise to minimise the fit residual and thereby the underfitting bias.

4.3. Mock galaxy images

We generate postage stamps of mock galaxy images with varying half-light radii r_h, radial light profiles, and signal-to-noise ratios ν. To this end, we utilise the code that computes the post-seeing GLAM templates (Appendix B). We consider pre-seeing images of galaxies with elliptical isophotes of three kinds of light profiles, Eq. (60): (1) exponential profiles with Sérsic index n = 1 (EXP; exponential), (2) de-Vaucouleur profiles with n = 4 (DEV; bulge-like) and (3) galaxies with profiles n = 2 that match the profile of the GLAM template (TMP; template-like). TMP galaxies hence cannot produce underfitting bias.

We devise uncorrelated Gaussian noise in the simulation of postage stamps. To determine the RMS variance σ_rms of the pixel noise for a given ν, let f_i be the flux inside image pixels that is free of noise and f_tot = ∑ _if_i the total flux. From this, we compute a half-light flux threshold f_th defined such that the integrated flux f_hl = ∑ _{f_i ≥ f_th}f_i above the threshold is f_hl = f_tot/ 2 or as close as possible to this value. The pixels i with f_i ≥ f_th are defined to be within the half-light radius of the image; their number is N_hl: = ∑ _{f_i ≥ f_th}; the integrated noise variance within the half-light radius is $\sqrt{N_{hl}} σ_{rms}$ $\hbox{$\sqrt{N_{\rm hl}}\,\sigma_{\rm rms}$}$ . The signal-to-noise ratio within the half-light radius is therefore $ν = f_{hl} N_{hl}^{- 1 / 2} σ_{rms}^{-1}$ $\hbox{$\nu=f_{\rm hl}^{}\,N_{\rm hl}^{-1/2}\,\sigma_{\rm rms}^{-1}$}$ , or $σ_{rms} = \frac{f_{hl}}{\sqrt{N_{hl}} ν} \cdot$ $\begin{equation} \sigma_{\rm rms}= \frac{f_{\rm hl}}{\sqrt{N_{\rm hl}}\,\nu}\cdot \end{equation}$ (62)Figure 4 depicts four examples with added noise for different ν.

Fig. 4

Examples of simulated images of galaxies with random ellipticities. Signal-to-noise ratios from left to right and top to bottom: ν = 10, 20, 40, and 200. The radial light profile of the pre-seeing images is exponential with r_h = 0.2′′; the pixel size is 0.1′′. The FWHM of the PSF is indicated by the circle in the top right corners.

To simulate galaxy images with intrinsic ellipticities that are sheared by g, we make a random realisation of an intrinsic ellipticity ϵ_s and compute the pre-seeing ellipticity with (17). As PDF for the intrinsic ellipticities we assume a bivariate Gaussian with a variance of σ_ϵ = 0.3 for each ellipticity component; we truncate the PDF beyond | ϵ_s | ≥ 1.

In realistic applications, centroid positions of galaxy images are random within an image pixel, so we average all our following results for bias over subpixel offsets within the quadratic solid angle of one pixel at the centre of the image. This means: in an sample of mock images, every image has a different subpixel position which is chosen randomly from a uniform distribution. In this averaging process, care has to be taken to perform an even sampling of subpixel offsets. To ensure this with a finite number of random points in all our following tests, we employ a subrandom Sobol sequence in two dimensions for the random centroid offsets (Press et al. 1992).

4.4. Monte-Carlo sampling of ellipticity posterior

For practical applications of a Bayesian GLAM analysis, we produce a Monte Carlo sample of the posterior P_ϵ(ϵ | I), Eq. (45) with the approximate likelihood ℒ(I | p) in Eq. (49). This sample consists of a set (ϵ_i,w_i) of 1 ≤ i ≤ N_real pairs of values ϵ_i and w_i, which determine the sampling position ϵ_i and a sampling weight w_i. For this paper, we use N_real = 50 sampling points. In contrast to a lensing analysis with single-valued point estimators of ellipticity, a Bayesian analysis employs the sample (ϵ_i,w_i) of each galaxy. To attain the sample (ϵ_i,w_i) we invoke the importance sampling technique (e.g. Marshall 1956; Kilbinger et al. 2010).

For this technique, we define an approximation Q(p) of P_p(p | I), the so-called importance distribution function, from which we draw a random sample p_i. The ellipticity component ϵ_i of p_i is then assigned the weight w_i = P_p(p_i | I) Q(p_i)^-1 which we normalise to ∑ _iw_i = 1 afterwards. As importance function, we use a multivariate Gaussian with mean at the maximum p_ml of the likelihood ℒ(I | p) and a covariance defined by the inverse Fisher matrix F^-1 at the maximum (Fisher 1935; Tegmark et al. 1997). More implementation details are given in Appendix B.

If Q(p) is too different from P_p(p | I), the sample will be dominated by few points with large weights, usually due to sampling points in the tail of the posterior for which the importance function predicts a too low probability, this means w_i becomes large. This is indicated by a small effective number $N_{eff} = (\sum_{i} w_{i}^{2})^{-1}$ $\hbox{$N_{\rm eff}=(\sum_iw_i^2)^{-1}$}$ of points compared to N_real. This can produce a poor convergence and thus extreme outliers in the analysis that merely appear to have tight constrains on ϵ. Typically affected by this are images that are both small compared to the pixel size and low in signal-to-noise. These images tend to exhibit posteriors that allow parameter solutions with small sizes t compared to the true t or high eccentricities | ϵ | ≳ 0.7, giving the posterior a complex, distinctly non-Gaussian shape. In future applications, this problem can be addressed by finding a better model of the importance function.

For the scope of this paper, we address this problem at the expense of computation time by increasing the number of sampling points and by resampling. This means: for Monte-Carlo samples with N_eff<N_real/ 2, we draw more samples from the importance distribution until N_eff ≥ N_real/ 2 in the expanded sample. By an additional rule, we stop this process if the expanded sample size becomes too large and reaches 10 × N_real. This case is indicative of a failure of the importance sampling. If this happens, we switch to a time-consuming standard Metropolis algorithm to sample the posterior with 10³ points after 10² preliminary burn-in steps (Metropolis et al. 1953). This technique does not assume a particular shape for the posterior but performs a random walk through the p space, thereby producing a correlated sample along a Monte-Carlo Markov chain. As symmetric proposal distribution for this algorithm, we adopt a multivariate Gaussian with covariance 1.5 × F^-1; all points of the chain have equal weight w_i; the starting point of the chain is p_ml. Finally, in the resampling phase, we bootstrap the expanded or Metropolis sample by randomly drawing N_real points from it with probability w_i (with replacement). All selected data points are given the equal weights $w_{i} = N_{real}^{-1}$ $\hbox{$w_i=N_{\rm real}^{-1}$}$ in the final catalogue.

4.5. Posterior density of reduced shear

For the inference of g, we convert the ellipticity posterior P_ϵ(ϵ | I) into a posterior P_g(g | I) of shear. To this end, let us first determine the P_g(g | ϵ) for an exactly known ϵ. We express our uncertainty on the intrinsic ellipticity ϵ_s by the prior P_s(ϵ_s), and P_g(g) is our a-priori information on g. The values of g and ϵ_s shall be statistically independent, by which we mean that P_sg(ϵ_s,g) = P_s(ϵ_s) P_g(g)². Applying a marginalization over ϵ_s and then Bayes’ rule, we find $\begin{matrix} P_{g} (g | ϵ) = \int d^{2} ϵ_{s} P_{sg} (ϵ_{s},g | ϵ) = \int d^{2} ϵ_{s} \frac{P_{ϵ} (ϵ | g, ϵ_{s}) P_{sg} (ϵ_{s},g)}{𝒩 (ϵ)}, \end{matrix}$ $\begin{eqnarray} P_{\rm g}(g|\epsilon) =\int\d^2\epsilon_{\rm s}\;P_{\rm sg}(\epsilon_{\rm s},g|\epsilon) = \int\d^2\epsilon_{\rm s}\; \frac{P_{\epsilon}(\epsilon|g,\epsilon_{\rm s})\,P_{\rm sg}(\epsilon_{\rm s},g)} {{\cal N}(\epsilon)}, \end{eqnarray}$ (63)or, equivalently, $\begin{matrix} 𝒩 (ϵ) P_{g} (g | ϵ) & = & \int d^{2} ϵ_{s} P_{ϵ} (ϵ | g, ϵ_{s}) P_{g} (g) P_{s} (ϵ_{s}) \\ = & \int d^{2} ϵ_{s} δ_{D} (ϵ - ϵ (g, ϵ_{s})) P_{g} (g) P_{s} (ϵ_{s}) \\ = \end{matrix}$ $\begin{eqnarray} \label{eq:pgstep0} {\cal N}(\epsilon)\,P_{\rm g}(g|\epsilon)&=& \int \d^2\epsilon_{\rm s}\;P_\epsilon(\epsilon|g,\epsilon_{\rm s}) \,P_{\rm g}(g)\,P_{\rm s}(\epsilon_{\rm s})\\ &=&\nonumber \int \d^2\epsilon_{\rm s} \;\delta_{\rm D}\Big(\epsilon-\epsilon(g,\epsilon_{\rm s})\Big) P_{\rm g}(g)\,P_{\rm s}(\epsilon_{\rm s})\\ &=&\label{eq:pgstep1} P_{\rm g}(g) \,P_{\rm s}\Big(\epsilon_{\rm s}(g,\epsilon)\Big) \left|\frac{\d^2\!\epsilon_{\rm s}}{\d^2\!\epsilon}\right|\cdot \end{eqnarray}$ By $\hbox{${\cal N}(\epsilon)$}$ we denote the normalisation of P_g(g | ϵ), $𝒩 (ϵ) : = \int d^{2} g P_{g} (g) P_{s} (ϵ_{s} (g,ϵ)) | \frac{d^{2} ϵ_{s}}{d^{2} ϵ} | \cdot$ $\begin{equation} {\cal N}(\epsilon):= \int\d^2g\;P_{\rm g}(g) \,P_{\rm s}\Big(\epsilon_{\rm s}(g,\epsilon)\Big) \left|\frac{\d^2\!\epsilon_{\rm s}}{\d^2\!\epsilon}\right|\cdot \end{equation}$ (66)The normalisation $\hbox{${\cal N}(\epsilon)$}$ only depends on the modulus of ϵ for isotropic priors for symmetry reasons. The integration over the Dirac delta function δ_D(x) uses Eq. (17). The determinant | d²ϵ_s/ d²ϵ |, the Jacobian of the linear conformal mapping ϵ_s(g,ϵ), is a function of ϵ and g. For the weak lensing regime | g | ≤ 1 of interest, this is $| \frac{d^{2} ϵ_{s}}{d^{2} ϵ} | = \frac{(1 - | g |^{2})^{2}}{| 1 - ϵ g^{*} |^{4}}$ $\begin{equation} \left|\frac{\d^2\!\epsilon_{\rm s}}{\d^2\!\epsilon}\right|= \frac{(1-|g|^2)^2}{|1-\epsilon\,g^\ast|^4} \end{equation}$ (67)(Geiger & Schneider 1998). To now account for the measurement error of ϵ in the shear posterior, we marginalise P_g(g | ϵ), Eq. (65), over ϵ with P_ϵ(ϵ | I) as error distribution, $\begin{matrix} P_{g} (g | I) & = & \int d^{2} ϵ P_{g} (g | ϵ) P_{ϵ} (ϵ | I) \\ = & P_{g} (g) (1 - | g |^{2})^{2} \int d^{2} ϵ \frac{P_{s} (ϵ_{s} (g,ϵ)) P_{ϵ} (ϵ | I)}{𝒩 (ϵ) | 1 - ϵ g^{*} |^{4}} \cdot \end{matrix}$ $\begin{eqnarray} \nonumber P_{\rm g}(g|\vec{I})&=& \int\d^2\epsilon\; P_{\rm g}(g|\epsilon)\,P_\epsilon(\epsilon|\vec{I})\\ \label{eq:gtransform} &=& P_{\rm g}(g)\,(1-|g|^2)^2\, \int\d^2\epsilon\; \frac{P_{\rm s}\Big(\epsilon_{\rm s}(g,\epsilon)\Big) \,P_\epsilon(\epsilon|\vec{I})} {{\cal N}(\epsilon)\,|1-\epsilon\,g^\ast|^4}\cdot \end{eqnarray}$ (68)In our Monte Carlo scheme, we sample the ellipticity posterior of every galaxy i through (ϵ_ij,w_ij) ~ P_ϵ(ϵ | I_i) by 1 ≤ j ≤ N_real values ϵ_ij of ellipticity and statistical weight w_ij. To convert this sample to an approximation of the g posterior, we replace P_ϵ(ϵ | I_i) inside the integral (68) by the point distribution Σ_jw_ijδ_D(ϵ − ϵ_ij), $P_{g} (g | I_{i}) \approx P_{g} (g) (1 - | g |^{2})^{2} \sum_{j = 1}^{N_{real}} \frac{w_{ij} P_{s} (ϵ_{s} (g, ϵ_{ij})^{)}}{𝒩 (ϵ_{ij}) | 1 - ϵ_{ij} g^{*} |^{4}} \cdot$ $\begin{equation} \label{eq:gtransform2} P_{\rm g}(g|\vec{I}_i) \approx P_{\rm g}(g)\,(1-|g|^2)^2\, \sum_{j=1}^{N_{\rm real}} \frac{w_{ij}\,P_{\rm s}\!\left(\epsilon_{\rm s}(g,\epsilon_{ij})\right)} {{\cal N}(\epsilon_{ij})\,|1-\epsilon_{ij}g^\ast|^4}\cdot \end{equation}$ (69)For the following, we adopt a uniform prior for g, which means P_g(g) ∝ H(1 − | g |). For reasons discussed in Sect. 3.3, the prior on the nuisance ϵ_s presumably has to be equal to the distribution of intrinsic shapes for a consistent shear posterior. To investigate the sensitivity of shear bias as to P_s(ϵ_s) we therefore use two types of priors: the correct prior, which is the true intrinsic-shape distribution in Sect. 4.3, or a uniform prior H(1 − | ϵ_s |). For each prior P_s(ϵ_s), we numerically compute $\hbox{${\cal N}(\epsilon)$}$ for different ϵ once and interpolate between them later on in (69). We finally combine the posteriors P_g(g | I_i) of all images I_i in the sample by multiplying posterior values on a g-grid. For this, we apply (69) to every galaxy in the sample independently.

4.6. Results

Fig. 5

Plots of the multiplicative bias m for simulated images, based on Eq. (69), for two different priors P_s(ϵ_s) (filled data points: correct prior; open data points: uniform prior). Different styles for the data points indicate different image sizes r_h, see key inside figure, while colours vary with galaxy type: GLAM template (TMP; red); exponential (EXP; green); bulge-like (DEV; blue). Data points for r_h = 0.3′′, same galaxy type, and same prior are connected by dotted lines to guide the eye. The prior for galaxy sizes t, amplitudes A, and centroids x₀ is uniform giving rise to noise-dependent prior bias; the constant offset of EXP and DEV is due to underfitting. A square image pixel has the size 0.1′′, which also equals the PSF size. Results for m for a larger PSF size can be found in Table 1.

For each experiment, we consider a sample of statistically independent galaxy images $\hat{I} = {I_{1}, ..., I_{n}}$ $\hbox{$\hat{\vec{I}}=\{\vec{I}_1,\ldots,\vec{I}_n\}$}$ that are subject to the same reduced shear g but have random intrinsic ellipticities ϵ_s. To infer g from $\hat{I}$ $\hbox{$\hat{\vec{I}}$}$ in a fully Bayesian manner, we compute the posteriors P_g(g | I_i) of g for every image I_i separately with Eq. (69) and then combine all posteriors by the product $P_{g} (g | \hat{I}) \propto 􏽙_{i = 1}^{n} P_{g} (g | I_{i}) .$ $\begin{equation} \label{eq:combinedPg} P_{\rm g}(g|\hat{\vec{I}})\propto \prod_{i=1}^nP_{\rm g}(g|\vec{I}_i). \end{equation}$ (70)This test is related, but not identical, to the ring test in Nakajima & Bernstein (2007): we do not fix the ellipticity magnitude | ϵ_s | during one test run but instead use a random sample of intrinsic ellipticities. Each galaxy sample consists of n = 5 × 10⁴ simulated noisy galaxy images to typically attain a precision for g of the order 10^-3. This number applies for the non-uniform prior P_s(ϵ_s). The precision is lower for the less informative uniform prior. Moreover, for each galaxy with intrinsic ϵ_s we also use − ϵ_s in another image belonging to the sample. By doing so the mean of intrinsic ellipticities in an sample is always zero, which substantially reduces the numerical noise in the result. For every set of image parameters, we vary the input shear between the three values g_true ∈ { 0, ± 0.05, ± 0.1 }. For these five measurements, let ⟨ g | g_true ⟩ be the mean of the posterior (70) for a given g_true of the sample. We then determine the multiplicative bias m and the additive bias c in $⟨ g | g_{true} ⟩ = (1 + m) g_{true} + c$ $\begin{equation} \label{eq:mcbias} \ave{g|g_{\rm true}}= (1+m)\,g_{\rm true}+c \end{equation}$ (71)by best-fitting m and c to the mean and dispersion of (70) for the three input values of g_true; m shall be a real-valued number.

In Fig. 5, we plot m (filled data points) juxtaposed with the corresponding values based on a uniform prior (open data points); same data points indicate the same size r_h while colours, varying by row, indicate the light profiles of the images. For the numbers in the table and the data points in the figure, error bars quantify a 68% posterior interval of the measurement around the quoted mean. The size of error bars account for pixel noise and shape noise. For the range of ν studied here, the impact of pixel noise is subdominant: the errors increase only between 10% and 20% from ν = 200 to ν = 10; the increase is larger for small images. Values for the additive bias c (not shown) for all profiles are small within a few 10^-4, except at ν = 10 for DEV with r_h = 0.15′′,0.2′′ and for TMP at 0.15′′: there c can reach negative amplitudes of the order of 10^-3. This indicates that an additive bias could be relevant for image sizes close to the pixel size.

Table 1

Multiplicative bias m in per cent for three profiles TMP, EXP, DEV, and sizes r_h = 0.2′′ only.

In contrast, a multiplicative bias m is typically significant. At high S/N ν = 200, this bias in Fig. 5 is mainly underfitting bias which is consequently absent for TMP galaxies since their likelihood is correctly specified. The shear of bulge-like galaxies DEV, with steeper slope compared to f(ρ), is underestimated by m = − 9.6 ± 0.5%; the shear of exponential galaxies EXP, which have a shallower slope, is overestimated by m = + 7.7 ± 0.5%. These numbers are the mean and standard error for all galaxy sizes at ν = 200; they depend on the specific PSF and pixel sizes chosen for this experiment.

Compared to the underfitting bias at ν = 200, the bias m significantly drops if ν ≲ 20 for all galaxies, most prominently at r_h = 0.15′′, but stays roughly constant within a couple of per cent otherwise. We attribute the noise-dependence of m to prior bias owing to our choice of an (incorrect) uniform prior for the nuisance parameters q of the images (see Sect. 3.3). The prior bias thus becomes only relevant here at sufficiently low S/N. The particular value of S/N is specific to the PSF size. This can be seen in Table 1 which lists estimates for m for galaxies, in a slightly smaller sample n = 2 × 10⁴, with size r_h = 0.2′′ but for a modified, roughly twice as large PSF size 0.22′′. Now the noise-dependence of prior bias is already visible below ν ≲ 40. On the other hand, the underfitting bias, to be read off at high S/N ν = 200, is roughly halved since the images are spread out over more pixels.

The choice of a uniform prior for the intrinsic ellipticities, distinctively different from the true distribution of ϵ_s in the sample, has a negligible effect on shear bias here for both PSF sizes. Values of m between the two priors P_s(ϵ_s) are consistent, as found by comparing filled and open data points of same colour and at same ν in Fig. 5. Nevertheless, the resulting statistical errors of shear, as opposed to its bias, are larger by roughly the factor 1.7 with the uniform prior, which reflects the larger a-priori uncertainty on ϵ_s. This implies that a shear bias due to an incorrect prior P_s(ϵ_s) is negligible in this experiment.

5. Discussion

Unlike galaxy ellipticities defined in terms of unweighted moments, GLAM provide a practical definition of ellipticity with useful properties for any chosen adaptive weight f′(ρ): they (i) are identical with the ellipticity of isophotes of elliptical galaxy images; (ii) under the influence of shear they behave exactly like ϵ defined with unweighted moments; (iii) they are unbiased estimators of reduced shear; (i)−(iii) assume ideal conditions without pixellation, PSF convolution, or pixel noise. These effects fundamentally limit our ability to determine the GLAM ellipticity (see below). Under ideal conditions, see SS97, it is known that ϵ for unweighted moments already obeys (iii). However, unweighted moments are formally infinite for realistic galaxy images because of pixel noise so that weighting is a necessity which is done adaptively for GLAM . For adaptive weights, we have shown (i) and (ii) in Sects. 2.2 and 2.3. Their relevance as unbiased estimators for reduced shear, statement (iii), follows thereafter from (ii) and the conclusions in SS97 for unweighted moments. We emphasise that GLAM do not require “deweighting” in a lensing analysis (Melchior et al. 2011): the ellipticity in term of the weighted moments is actually unbiased. In particular, we do not have to devise the same weight function for all galaxies as any weight function equally obeys (ii). If the minimum of the functional (3) uniquely exists, the moments (x₀,α M_I) of the best-fitting template f(ρ) will, for a constant scalar α, be the adaptively weighted moments (x₀,M_I) of the galaxy light profile I(x) for the weight f′(ρ) (Appendix A). This means that the radial profile of the weight at separation r from the centroid is w(r) ∝ r^-1df_r(r) / dr for a template profile f_r(r): = f(r²). We exploited this relation between adaptive moments and moments of a best-fitting elliptical profile to analyse limits to adaptive moment-based methods under the loss of image information.

Namely, for realistic images subject to pixellation and PSF effects, underfitting bias can be present for GLAM if estimated from the post-seeing image. It potentially arises because of a loss of information on the pre-seeing galaxy image, fundamentally due to pixellation. To explore the limitations, we assume noise-free images and express in Sect. 2.4 the mapping between a pre-seeing and post-seeing image by the linear operation L. The pre-seeing adaptive moments are equivalent to a least-square fit with residual R_pre of an elliptical profile f(ρ) to the pre-seeing image I(x). If estimated from the post-seeing image whilst ignoring residuals, the inferred ellipticity is biased for LR_pre ≠ 0and a singular L. The latter indicates a loss of information on the pre-seeing image. For noise-free images and the estimator (36) the bias is, to linear order, given by Eq. (43). In (36) the bias is optimally reduced through the metric U = (LL^T)⁺. With this metric the estimator is unbiased for regular L, hence in principle also for an invertible convolution with an anisotropic PSF; invertible convolutions have kernels K(x) with $\begin{matrix} _{˜} \\ K \end{matrix} (ℓ) \neq 0$ $\hbox{$\tilde{K}(\vec{\ell})\ne0$}$ in the Fourier domain. Practically, however, a singular mapping is always present due to pixellation so that the quadratic estimator (36) could never fully remove underfitting bias.

The inference of GLAM ellipticity from noisy images with a likelihood that ignores fitting residuals produces underfitting bias; the underfitting bias depends on the correlation of pixel noise and its homogeneity over the image. To deal with pixel noise in images, we have put GLAM in a Bayesian setting in Sect. 3 by employing an approximate model (49) for the likelihood that ignores fit residuals and thus equals an imperfect model fit with an elliptical profile. We use this to explore the impact of a misspecified likelihood that, as a result, can be shown to be subject to underfitting bias. The bias is revealed in the limit of combining the likelihoods of n → ∞ independent exposures of the same image (see Sect. 3.2). To lowest order in R_pre, we find the bias to vanish if L^TN^-1L is proportional to the unit matrix. The underfitting bias is unchanged for a rescaled noise covariance: there hence is no noise bias from a misspecified likelihood. Additionally, the magnitude of bias depends on the details of N and can also emerge for L = 1 if N is not proportional to the unit matrix, which is the case for heterogeneous or correlated noise. Due to the close relation of GLAM ellipticity to other definitions of ellipticity we expect similar dependencies of bias on the noise covariance there, which may impact future calibration strategies in lensing applications because they apparently should not ignore cases of correlated or heterogeneous noise. We note that several steps in the reduction of lensing data can produce correlated pixel noise.

The underfitting bias from the likelihood can presumably be addressed at the expense of an increased statistical error, such as by means of a follow-up high-resolution survey that gathers statistical information on the residuals R_pre. First of all, one obvious way to reduce underfitting bias is to choose a template that provides a good fit to galaxy images to reduce the residuals. The optimal template has the profile f_r(r) ∝ S_r(r) for a pre-seeing galaxy profile S_r(r); this also optimises the signal-to-noise ratio of the ellipticity (Bernstein & Jarvis 2002). Even so, realistic galaxies can never be perfectly fit by an elliptical model; there are always fitting residuals R_pre. To deal with these residuals, we can imagine a more elaborate Bayesian scheme where R_pre are additional nuisance parameters that we marginalise over by using an empirical distribution P(R_pre | p) of R_pre given p (prior), that is $ℒ (I | p) = \int d R_{pre} ℒ (I | p, R_{pre}) P (R_{pre} | p),$ $\begin{equation} {\cal L}(\vec{I}|\vec{p})= \int\d\vec{R}_{\rm pre}\;{\cal L}(\vec{I}|\vec{p},\vec{R}_{\rm pre})\,P(\vec{R}_{\rm pre}|\vec{p}), \end{equation}$ (72)where ℒ(I | p,R_pre) is the likelihood of a post-seeing image, correctly specified by the post-seeing model m(p) = L(Af_ρ + R_pre) and the noise distribution. The marginalization would then increase the statistical uncertainty in the posterior P_ϵ(ϵ | I) in Eq. (45). It is conceivable to measure the prior in a dedicated survey by assessing the distribution of residuals in template fits to high resolution, high S/N images. This approach would be similar to that of BA14 where a (correct) prior on galaxy-specific descriptors is needed. In comparison to BA14, it is noteworthy here that the residuals R_pre in the pre-seeing frame are those of sheared images for which we infer ϵ; no assumptions on the unsheared, source-plane images have to be made for this prior. All this, however, still has to be tested, and it is not clear how this can be properly implemented and if this marginalization is not the source of another bias.

A consistent likelihood of single images is not sufficient for a lensing analysis as new inconsistencies for ϵ or g can arise in samples of intrinsically different sources through incorrect priors for intrinsic source parameters, such as sizes or centroid positions (prior bias). Take for example our TMP galaxies in Sect. 4 which have a correctly specified likelihood by definition; they are free of underfitting bias. For the two experiments presented in Figs. 3 and 5, the source sample consists of pre-seeing images with an unknown distribution of intrinsic parameters. The first experiment considers samples with constant ellipticity, the second samples with constant reduced shear. Despite the absent underfitting, noise bias now emerges in both cases if we apply our uniform prior for the nuisance parameters in the marginal posterior of either ellipticity or reduced shear; see TMP data points at ν ≲ 20. We argue in Sect. 3.3 that the emerging noise bias is related to the specific choice of priors and can only be avoided by a special family of correct priors related to the maximum of the posterior of noise-free images. According to previous work, correct priors are the distributions of nuisance parameters of the sources in the sample which have to be defined externally (e.g. BA14, Miller et al. 2013; Kitching et al. 2008). A conceivable alternative to external priors, potentially worthwhile investigating in future studies, might be to specify the priors down to a family of distributions, so-called hyperpriors, with a finite, preferably small number of hyperparameters (Gelman et al. 2013). A hierarchical approach would then jointly determine the posterior density of ϵ or g and the hyperparameters from all I_i alone in a fully Bayesian way. Marginalising over the hyperparameters accounts for the uncertainty in the priors. We note that the incorrect uniform prior for ϵ_s in Fig. 5 has no significant impact on the shear bias (compare open to filled data points). Clearly, the importance of a correct prior depends on the type of nuisance parameter.

That prior bias in Fig. 5 or Table 1 becomes relevant only at low S/N can be understood qualitatively. For a sufficiently high S/N, the mass of an (identifiable) likelihood ℒ(I_post | p) concentrates around p_post such that the prior P_p(p) details have no strong impact on the marginal posterior of one source. We also expect for high S/N the likelihood to be well approximated by a normal Gaussian with maximum p_post such that the marginalization over q = (A,t,x₀) approximately yields a Gaussian posterior P_ϵ(ϵ | I_post) with maximum near ϵ_post, which is the true pre-seeing ellipticity for a correctly specified likelihood. We note that the marginalization over q of a multivariate Gaussian with maximum at p₀ = (ϵ₀,q₀) produces another Gaussian density with maximum at ϵ₀. In addition, should the likelihood be misspecified, ϵ_post will be prone to underfitting bias which is clearly visible by the offsets in m for EXP and DEV in Fig. 5.

Finally, we point out that while marginal posteriors P_ϵ(ϵ | I_i), obtained on an image-by-image basis, are sufficient to infer a constant ellipticity (or shear) it is not obvious how to incoorporate them into a fully Bayesian analysis of sources with varying shear. For example, in a fully Bayesian inference of ellipticity correlations $ξ_{ϑ} = ⟨ ϵ_{i} ϵ_{j}^{*} ⟩$ $\hbox{$\xi_\vartheta=\ave{\epsilon_i^{}\epsilon_j^\ast}$}$ for images ij at separation ϑ, we would compute the likelihood $ℒ (\hat{I} | ξ_{ϑ})$ $\hbox{${\cal L}(\hat{\vec{I}}|\xi_\vartheta)$}$ for given values of ξ_ϑ and a range of ϑ, which principally requires the repeated shape measurements for the entire sample of images $\hat{I}$ $\hbox{$\hat{\vec{I}}$}$ for every new ξ_ϑ. This poses a practical, computational problem. Proposed solutions to this problem leave the path of a fully Bayesian analysis and commonly use unbiased point estimators of ϵ_i on a image-by-image basis as input for a statistical analysis (e.g. Miller et al. 2013). This technique has essentially been applied to all lensing analyses so far, but requires a calibration of the ellipticity estimates, especially for the noise bias. The recently proposed hierarchical Bayesian technique by Alsing et al. (2016), applied to data in Alsing et al. (2017), is closest to a fully Bayesian analysis in the above sense but also uses as input unbiased estimators of source ellipticities with a multivariate Gaussian probability density for the statistical errors of the estimates. Yet, it is surely conceivable to extend this technique by directly modelling the likelihood of source images I_i. To reverse-propagate our P_ϵ(ϵ | I_i) to a posterior of ξ_ϑ, we might be tempted here to Monte-Carlo a posterior PDF of ξ_ϑ by independently drawing ellipticities ϵ_im ~ P_ϵ(ϵ | I_i) from the marginal posteriors to produce realisations of a joint vector ϵ_m = (ϵ_1m,...,ϵ_nm), which could be used with an estimator of ξ_ϑ. Unfortunately, this falsely assumes a separable joint posterior $P_{ϵ} (ϵ | \hat{I}) = P_{ϵ} (ϵ_{1} | I_{1}) \times ... \times P_{ϵ} (ϵ_{n} | I_{n})$ $\hbox{$P_\epsilon(\vec{\epsilon}|\hat{\vec{I}})= P_\epsilon(\epsilon_1|\vec{I}_1)\times \ldots\times P_\epsilon(\epsilon_n|\vec{I}_n)$}$ with statistically independent ellipticities, potentially biasing (low) constraints on ξ_ϑ. As alternative we speculate about the following hierarchical Bayesian scheme. We assume a parametric model $P_{ϵ} (ϵ | \hat{I}, θ)$ $\hbox{$P_\epsilon(\vec{\epsilon}|\hat{\vec{I}},\vec{\theta})$}$ for the joint posterior density of ellipticities that is determined by (i) the measured marginal ellipticities P_ϵ(ϵ | I_i) and (ii) a set of unknown hyperparameters θ = (θ₁,θ₂,...) that quantify the correlations between the source ellipticities. A convenient choice in this respect seem to be copula models that are specified exactly that way (e.g. Sect. 2 in Takeuchi 2010). We further express our ignorance about θ_i by a (sufficiently uninformative) hyperprior P_θ(θ). We then produce realisations ϵ_m by first randomly drawing θ_m ~ P_θ(θ) followed by $ϵ_{m} ~ P_{ϵ} (ϵ | \hat{I}, θ_{m})$ $\hbox{$\vec{\epsilon}_{\rm m}\sim P_\epsilon(\vec{\epsilon}|\hat{\vec{I}},\vec{\theta}_m)$}$ . A sample { ϵ₁,ϵ₂,... } of joint ellipticities is then utilised to propagate uncertainties on source ellipticities and their mutual correlations through the lensing analysis, based on an estimator for ξ_ϑ given a joint sample ϵ. We note that in the limit ν → ∞ the hyperprior for θ becomes irrelevant because all marginal posteriors P_ϵ(ϵ | I_i) will be sharply peaked at the true ellipticity (if correctly specified).

¹

Incidentally, the bias δp in the likelihood and the underfitting bias in the noise-free case are equal for N^-1 ∝ (LL^T)⁺, which for regular L is equivalent to homogeneous, uncorrelated noise in the pre-seeing frame, which means we have L^TN^-1L ∝ 1.

²

This assumption would be false if shear and intrinsic ellipticity were correlated, for instance due to selection effects related to the shape of a sheared image (Hirata & Seljak 2003; Heymans et al. 2006). Thus correlations between intrinsic shapes and shear may affect Bayesian methodologies already at this level.

Acknowledgments

The authors are particularly grateful to Gary Bernstein who read and commented on an earlier manuscript of this paper. The comments lead to a substantial improvement. We would also like to thank Tim Schrabback for helpful discussions and comments, and our colleague Reiko Nakajima for organising a weekly seminar on shape measurements in our institute that triggered the research presented in this paper. We also acknowledge the comments by the referee of this paper that helped to clarify and revise certain points, in particular the section on the prior bias. This work has been supported by the Deutsche Forschungsgemeinschaft in the framework of the Collaborative Research Center TR33 “The Dark Universe”. Patrick Simon also acknowledges support from the German Federal Ministry for Economic Affairs and Energy (BMWi) provided via DLR under project No. 50QE1103.

References

Aitken, A. 1934, On least squares and linear combination of observations, 55, 42 [Google Scholar]
Alsing, J., Heavens, A., Jaffe, A. H., et al. 2016, MNRAS, 455, 4452 [NASA ADS] [CrossRef] [Google Scholar]
Alsing, J., Heavens, A., & Jaffe, A. H. 2017, MNRAS, 466, 3272 [NASA ADS] [CrossRef] [Google Scholar]
Bartelmann, M., & Schneider, P. 2001, Phys. Rep., 340, 291 [NASA ADS] [CrossRef] [Google Scholar]
Bernstein, G. M. 2010, MNRAS, 406, 2793 [NASA ADS] [CrossRef] [Google Scholar]
Bernstein, G. M., & Armstrong, R. 2014, MNRAS, 438, 1880 [NASA ADS] [CrossRef] [Google Scholar]
Bernstein, G. M., & Jarvis, M. 2002, AJ, 123, 583 [NASA ADS] [CrossRef] [Google Scholar]
Bernstein, G. M., Armstrong, R., Krawiec, C., & March, M. C. 2016, MNRAS, 459, 4467 [NASA ADS] [CrossRef] [Google Scholar]
Bertin, E., & Arnouts, S. 1996, A&AS, 117, 393 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Bezdek, J., Hathaway, R., Howard, R., Wilson, C., & Windham, M. 1987, J. Optim. Theory Appl., 54, 471 [CrossRef] [Google Scholar]
Bridle, S., Balan, S. T., Bethge, M., et al. 2010, MNRAS, 405, 2044 [NASA ADS] [Google Scholar]
Capaccioli, M. 1989, in World of Galaxies, eds. H. G. Corwin, Jr., & L. Bottinelli, 208 [Google Scholar]
Dawson, W. A., Schneider, M. D., Tyson, J. A., & Jee, M. J. 2016, ApJ, 816, 11 [NASA ADS] [CrossRef] [Google Scholar]
Fisher, R. A. 1935, J. Roy. Stat. Soc., 98, 39 [CrossRef] [Google Scholar]
Frigo, M., & Johnson, S. G. 2005, Proc. IEEE, special issue on Program Generation, Optimization, and Platform Adaptation, 93, 216 [Google Scholar]
Geiger, B., & Schneider, P. 1998, MNRAS, 295, 497 [NASA ADS] [CrossRef] [Google Scholar]
Gelman, A., Carlin, J., Stern, H., et al. 2013, Bayesian Data Analysis, third edn., Chapman & Hall/CRC Texts in Statistical Science (Taylor & Francis) [Google Scholar]
Gruen, D., Bernstein, G. M., Jarvis, M., et al. 2015, Journal of Instrumentation, 10, C05032 [CrossRef] [Google Scholar]
Hartlap, J., Hilbert, S., Schneider, P., & Hildebrandt, H. 2011, A&A, 528, A51 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Heymans, C., Van Waerbeke, L., Bacon, D., et al. 2006, MNRAS, 368, 1323 [NASA ADS] [CrossRef] [Google Scholar]
Hirata, C., & Seljak, U. 2003, MNRAS, 343, 459 [NASA ADS] [CrossRef] [Google Scholar]
Hirata, C. M., Mandelbaum, R., Seljak, U., et al. 2004, MNRAS, 353, 529 [NASA ADS] [CrossRef] [Google Scholar]
Hoekstra, H., & Jain, B. 2008, Ann. Rev. Nucl. Part. Sci., 58, 99 [Google Scholar]
Hoekstra, H., Herbonnet, R., Muzzin, A., et al. 2015, MNRAS, 449, 685 [NASA ADS] [CrossRef] [Google Scholar]
Kacprzak, T., Zuntz, J., Rowe, B., et al. 2012, MNRAS, 427, 2711 [NASA ADS] [CrossRef] [Google Scholar]
Kaiser, N., Squires, G., & Broadhurst, T. 1995, ApJ, 449, 460 [NASA ADS] [CrossRef] [Google Scholar]
Kilbinger, M. 2015, Rep. Prog. Phys., 78, 086901 [Google Scholar]
Kilbinger, M., Wraith, D., Robert, C. P., et al. 2010, MNRAS, 405, 2381 [NASA ADS] [Google Scholar]
Kitching, T. D., Miller, L., Heymans, C. E., van Waerbeke, L., & Heavens, A. F. 2008, MNRAS, 390, 149 [NASA ADS] [CrossRef] [Google Scholar]
Kitching, T. D., Balan, S. T., Bridle, S., et al. 2012, MNRAS, 423, 3163 [NASA ADS] [CrossRef] [Google Scholar]
Laureijs, R., Amiaux, J., Arduini, S., et al. 2011, ArXiv e-prints [arXiv:1110.3193] [Google Scholar]
Lewis, A. 2009, MNRAS, 398, 471 [NASA ADS] [CrossRef] [Google Scholar]
MacKay, J. D. 2003, Information Theory, Inference, and Learning Algorithms (Cambridge, UK: Cambridge University Press) [Google Scholar]
Mandelbaum, R., Hirata, C. M., Seljak, U., et al. 2005, MNRAS, 361, 1287 [NASA ADS] [CrossRef] [Google Scholar]
Marquardt, D. 1963, J. Soc. Ind. Appl. Mathem., 11, 431 [Google Scholar]
Marshall, A. 1956, in Symp. on Monte Carlo methods (Wiley), ed. M. Meyer, 123 [Google Scholar]
Massey, R., Heymans, C., Bergé, J., et al. 2007, MNRAS, 376, 13 [NASA ADS] [CrossRef] [Google Scholar]
Massey, R., Kitching, T., & Richard, J. 2010, Rep. Prog. Phys., 73, 086901 [Google Scholar]
Massey, R., Hoekstra, H., Kitching, T., et al. 2013, MNRAS, 429, 661 [Google Scholar]
Melchior, P., & Viola, M. 2012, MNRAS, 424, 2757 [NASA ADS] [CrossRef] [Google Scholar]
Melchior, P., Böhnert, A., Lombardi, M., & Bartelmann, M. 2010, A&A, 510, A75 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Melchior, P., Viola, M., Schäfer, B. M., & Bartelmann, M. 2011, MNRAS, 412, 1552 [NASA ADS] [CrossRef] [Google Scholar]
Melchior, P., Suchyta, E., Huff, E., et al. 2015, MNRAS, 449, 2219 [NASA ADS] [CrossRef] [Google Scholar]
Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., & Teller, E. 1953, J. Chem. Phys., 21, 1087 [NASA ADS] [CrossRef] [Google Scholar]
Miller, L., Kitching, T. D., Heymans, C., Heavens, A. F., & van Waerbeke, L. 2007, MNRAS, 382, 315 [NASA ADS] [CrossRef] [Google Scholar]
Miller, L., Heymans, C., Kitching, T. D., et al. 2013, MNRAS, 429, 2858 [Google Scholar]
Moffat, A. F. J. 1969, A&A, 3, 455 [NASA ADS] [Google Scholar]
Munshi, D., Valageas, P., van Waerbeke, L., & Heavens, A. 2008, Phys. Rep., 462, 67 [NASA ADS] [CrossRef] [Google Scholar]
Nakajima, R., & Bernstein, G. 2007, AJ, 133, 1763 [NASA ADS] [CrossRef] [Google Scholar]
Niemi, S.-M., Cropper, M., Szafraniec, M., & Kitching, T. 2015, Exp. Astron., 39, 207 [NASA ADS] [CrossRef] [Google Scholar]
Plazas, A. A., Bernstein, G. M., & Sheldon, E. S. 2014, Journal of Instrumentation, 9, C4001 [CrossRef] [Google Scholar]
Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. 1992, Numerical recipes in C. The art of scientific computing 2nd edn (Cambridge University Press) [Google Scholar]
Refregier, A., Kacprzak, T., Amara, A., Bridle, S., & Rowe, B. 2012, MNRAS, 425, 1951 [NASA ADS] [CrossRef] [Google Scholar]
Schneider, P. 2006, in Saas-Fee Advanced Course 33: Gravitational Lensing: Strong, Weak and Micro, eds. G. Meylan, P. Jetzer, P. North, P. Schneider, C. S. Kochanek, & J. Wambsganss, 269 [Google Scholar]
Seitz, C., & Schneider, P. 1997, A&A, 318, 687 [NASA ADS] [Google Scholar]
Semboloni, E., Hoekstra, H., Huang, Z., et al. 2013, MNRAS, 432, 2385 [NASA ADS] [CrossRef] [Google Scholar]
Sheldon, E. S. 2014, MNRAS, 444, L25 [NASA ADS] [CrossRef] [Google Scholar]
Takeuchi, T. T. 2010, MNRAS, 406, 1830 [NASA ADS] [Google Scholar]
Tegmark, M., Taylor, A. N., & Heavens, A. F. 1997, ApJ, 480, 22 [NASA ADS] [CrossRef] [Google Scholar]
van der Vaart, A. W. 1998, Asymptotic statistics, Cambridge Series in Statistical and Probabilistic Mathematics (Cambridge University Press) [Google Scholar]
Viola, M., Melchior, P., & Bartelmann, M. 2011, MNRAS, 410, 2156 [NASA ADS] [CrossRef] [Google Scholar]
Viola, M., Kitching, T. D., & Joachimi, B. 2014, MNRAS, 439, 1909 [NASA ADS] [CrossRef] [Google Scholar]
Voigt, L. M., & Bridle, S. L. 2010, MNRAS, 404, 458 [NASA ADS] [Google Scholar]
Zuntz, J., Kacprzak, T., Voigt, L., et al. 2013, MNRAS, 434, 1604 [NASA ADS] [CrossRef] [Google Scholar]

Appendix A: Least-square conditions

In this section, we derive general characteristics of GLAM parameters p = (A,x₀,M) at a local minimum of the error functional E(p | I), Eq. (3). For convenience, we introduce the bi-linear scalar product $⟨ g (x),h (x) ⟩ : = \frac{1}{2} \int d^{2} xg (x) h (x)$ $\appendix \setcounter{section}{1} \begin{equation} \label{eq:biform} \ave{g(\vec{x}),h(\vec{x})}:= \frac{1}{2}\int\d^2x\;g(\vec{x})h(\vec{x}) \end{equation}$ (A.1)to write the functional as $E (p | I) = ⟨ I (x) - Af (ρ),I (x) - Af (ρ) ⟩ .$ $\appendix \setcounter{section}{1} \begin{equation} \label{eq:chi2} E(\vec{p}|I)= \Ave{I(\vec{x})-Af(\rho),I(\vec{x})-Af(\rho)}. \end{equation}$ (A.2)Notice that ρ: = (x − x₀)^TM^-1(x − x₀) is a function of p. The scalar product has the properties $\begin{matrix} ⟨ g (x),h (x) ⟩ = ⟨ h (x),g (x) ⟩, \\ ⟨ a (p) g (x) + b (p) f (x),h (x) ⟩ = a (p) ⟨ g (x),h (x) ⟩ + b (p) ⟨ f (x),h (x) ⟩, \\ ⟨ a (p) g (x),h (x) ⟩ = ⟨ g (x),a (p) h (x) ⟩, \end{matrix}$ $\appendix \setcounter{section}{1} \begin{eqnarray} \nonumber & &\ave{g(\vec{x}),h(\vec{x})} = \ave{h(\vec{x}),g(\vec{x})} , \\ \nonumber && \ave{a(\vec{p})\,g(\vec{x})\!+\!b(\vec{p}) f(\vec{x}),h(\vec{x})} \!= \! a(\vec{p})\ave{g(\vec{x}),h(\vec{x})}\!+\!b(\vec{p})\ave{f(\vec{x}),h(\vec{x})} , \\ \nonumber && \ave{a(\vec{p})g(\vec{x}),h(\vec{x})} = \ave{g(\vec{x}),a(\vec{p})h(\vec{x})}, \end{eqnarray}$ of which we make heavily use in the following.

Starting from here, we find as necessary condition for the least-square solution of any parameter p_i: $\begin{matrix} \frac{∂E (p | I)}{\partial p_{i}} = 0 \\ = ⟨ - \frac{\partial [Af (ρ)]}{\partial p_{i}},I (x) - Af (ρ) ⟩ + ⟨ I (x) - Af (ρ), - \frac{\partial [Af (ρ)]}{\partial p_{i}} ⟩ \\ = - 2 ⟨ \frac{\partial [Af (ρ)]}{\partial p_{i}},I (x) - Af (ρ) ⟩ \\ = - 2 ⟨ \frac{\partial [Af (ρ)]}{\partial p_{i}},I (x) ⟩ + 2 A ⟨ \frac{\partial [Af (ρ)]}{\partial p_{i}},f (ρ) ⟩, \end{matrix}$ $\appendix \setcounter{section}{1} \begin{eqnarray} \lefteqn{\frac{\partial E(\vec{p}|I)}{\partial p_i}=0}\\ &&\nonumber =\Ave{-\frac{\partial[Af(\rho)]}{\partial p_i},I(\vec{x})-Af(\rho)}+ \Ave{I(\vec{x})-Af(\rho),-\frac{\partial[Af(\rho)]}{\partial p_i}}\\ &&\nonumber =-2\Ave{\frac{\partial[Af(\rho)]}{\partial p_i},I(\vec{x})-Af(\rho)}\\ &&\nonumber =-2\Ave{\frac{\partial[Af(\rho)]}{\partial p_i},I(\vec{x})}+ 2A\Ave{\frac{\partial[Af(\rho)]}{\partial p_i},f(\rho)}, \end{eqnarray}$ (A.3)or simply $⟨ \frac{\partial [Af (ρ)]}{\partial p_{i}}, A^{-1} I (x) ⟩ = ⟨ \frac{\partial [Af (ρ)]}{\partial p_{i}},f (ρ) ⟩$ $\appendix \setcounter{section}{1} \begin{equation} \label{eq:mlcondition} \Ave{\frac{\partial[Af(\rho)]}{\partial p_i},A^{-1}I(\vec{x})}= \Ave{\frac{\partial[Af(\rho)]}{\partial p_i},f(\rho)} \end{equation}$ (A.4)that is fulfilled for all p_i.

Appendix A.1: Template normalisation

From (A.4) follows for the normalisation p_i = A at the minimum of E(p | I) that $A = \frac{⟨ f (ρ),I (x) ⟩}{⟨ f (ρ),f (ρ) ⟩} \cdot$ $\appendix \setcounter{section}{1} \begin{equation} \label{eq:lsa} A= \frac{\ave{f(\rho),I(\vec{x})}} {\ave{f(\rho),f(\rho)}} \cdot \end{equation}$ (A.5)

Appendix A.2: Image centroid position

For the least-square centroid position x_0,i with i = 1,2 at the minimum of E(p | I), we find ∂ [ Af(ρ) ] /∂x_0,i = Af′(ρ) ∂ρ/∂x_0,i using f′(ρ): = df(ρ) / dρ. This can be expanded further by considering $\begin{matrix} \frac{∂ρ}{\partial x_{0,i}} & = & \frac{\partial}{\partial x_{0,i}} \sum_{k,l} (x_{k} - x_{0,k}) [M^{-1}]_{kl} (x_{l} - x_{0,l}) \\ = & - \sum_{k,l} (δ_{ki}^{K} [M^{-1}]_{il} (x_{l} - x_{0,l}) + δ_{li}^{K} [M^{-1}]_{ki} (x_{k} - x_{0,k})^{)} \\ = & - 2 \sum_{k} [M^{-1}]_{ki} (x_{k} - x_{0,k}) = - 2 {[M^{-1} (x - x_{0})^{]}}_{i} . \end{matrix}$ $\appendix \setcounter{section}{1} \begin{eqnarray} \label{eq:rhox0} \frac{\partial\rho}{\partial x_{0,i}}&=& \frac{\partial}{\partial x_{0,i}} \sum_{k,l} (x_k-x_{0,k})[\mat{M}^{-1}]_{kl}(x_l-x_{0,l})\\ \nonumber &=& -\sum_{k,l} \left( \delta^{\rm K}_{ki}[\mat{M}^{-1}]_{il}(x_l-x_{0,l})+ \delta^{\rm K}_{li}[\mat{M}^{-1}]_{ki}(x_k-x_{0,k}) \right)\\ \nonumber &=& -2\sum_k[\mat{M}^{-1}]_{ki}(x_k-x_{0,k}) =-2\left[\mat{M}^{-1}(\vec{x}-\vec{x}_0)\right]_i. \end{eqnarray}$ (A.6)Here we have exploited the symmetry [ M^-1 ] _ij = [ M^-1 ] _ji and introduced the Kronecker symbol $δ_{ij}^{K}$ $\hbox{$\delta^{\rm K}_{ij}$}$ . Therefore we find for the ith component of the centroid position $⟨ A f^{'} (ρ) [M^{-1} (x - x_{0})]_{i},f (ρ) - A^{-1} I (x)^{⟩} = 0$ $\appendix \setcounter{section}{1} \begin{equation} \Ave{Af^\prime(\rho)[\mat{M}^{-1}(\vec{x}-\vec{x}_0)]_i,f(\rho)-A^{-1}I(\vec{x})}=0 \end{equation}$ (A.7)or, equally, for both centroid components combined $A M^{-1} ⟨ f^{'} (ρ) (x - x_{0}),f (ρ) ⟩ = M^{-1} ⟨ f^{'} (ρ) (x - x_{0}),I (x) ⟩ .$ $\appendix \setcounter{section}{1} \begin{equation} A\mat{M}^{-1}\ave{f^\prime(\rho)(\vec{x}-\vec{x}_0),f(\rho)}= \mat{M}^{-1}\ave{f^\prime(\rho)(\vec{x}-\vec{x}_0),I(\vec{x})}. \end{equation}$ (A.8)The scalar product on the left-hand-side has to vanish due to symmetry reasons. Therefore using the right-hand-side, we find at the minimum $x_{0} = \frac{⟨ f^{'} (ρ) x,I (x) ⟩}{⟨ f^{'} (ρ),I (x) ⟩} \cdot$ $\appendix \setcounter{section}{1} \begin{equation} \label{eq:lsx0} \vec{x}_0= \frac{\ave{f^\prime(\rho)\vec{x},I(\vec{x})}} {\ave{f^\prime(\rho),I(\vec{x})}}\cdot \end{equation}$ (A.9)This shows that the centroid x₀ at the least-square point is the first-order moment x of the image I(x) weighted with the kernel f′(ρ). The weight is thus the derivative f′(ρ) not f(ρ).

Appendix A.3: Image second-order moments

For the components of M, we consider the least-square values of the inverse W_ij = [ M^-1 ] _ij, hence values W_ij for which ∂ [ Af(ρ) ] /∂W_ij = Af′(ρ) ∂ρ/∂W_ij vanishes. The function ρ can be expressed as $ρ = tr ((x - x_{0})^{T} W (x - x_{0})) = tr (W X^{)},$ $\appendix \setcounter{section}{1} \begin{equation} \rho= {\rm tr}\big((\vec{x}-\vec{x}_0)^{\rm T}\mat{W} (\vec{x}-\vec{x}_0)\big)= {\rm tr}\left(\mat{W}\mat{X}\right), \end{equation}$ (A.10)using X: = (x − x₀)(x − x₀)^T = X^T and the relation tr(ABC) = tr(BCA) for traces of matrices. From this follows that ∂ρ/∂W_ij = [ X ] _ij = X_ij, or $\begin{matrix} A ⟨ f^{'} (ρ) X_{ij}, A^{-1} I (x)^{⟩} & = & A ⟨ f^{'} (ρ) X_{ij},f (ρ)^{⟩} \\ \Leftarrow \Rightarrow ⟨ f^{'} (ρ) X_{ij},I (x)^{⟩} & = & ⟨ f^{'} (ρ) X_{ij},Af (ρ)^{⟩} \end{matrix}$ $\appendix \setcounter{section}{1} \begin{eqnarray} A\Ave{f^\prime(\rho)X_{ij},A^{-1}I(\vec{x})}&=& A\Ave{f^\prime(\rho)X_{ij},f(\rho)}\\ \nonumber \Longleftrightarrow~ \Ave{f^\prime(\rho)X_{ij},I(\vec{x})}&=& \Ave{f^\prime(\rho)X_{ij},Af(\rho)} \end{eqnarray}$ (A.11)by employing Eq. (A.4). This set of (four) equations can compactly be written as $⟨ f^{'} (ρ) X,Af (ρ) ⟩ = ⟨ f^{'} (ρ) X,I (x) ⟩$ $\appendix \setcounter{section}{1} \begin{equation} \Ave{f^\prime(\rho)\mat{X},Af(\rho)}= \Ave{f^\prime(\rho)\mat{X},I(\vec{x})} \end{equation}$ (A.12)or utilising Eq. (A.5): $\frac{⟨ f^{'} (ρ) X,f (ρ) ⟩}{⟨ f (ρ),f (ρ) ⟩} = \frac{⟨ f^{'} (ρ) X,I (x) ⟩}{⟨ f (ρ),I (x) ⟩} \cdot$ $\appendix \setcounter{section}{1} \begin{equation} \label{eq:lsm1} \frac{\ave{f^\prime(\rho)\mat{X},f(\rho)}} {\ave{f(\rho),f(\rho)}} = \frac{\ave{f^\prime(\rho)\mat{X},I(\vec{x})}} {\ave{f(\rho),I(\vec{x})}}\cdot \end{equation}$ (A.13)This last relation shows that the best-fitting template f(ρ) has, up to a constant scalar α, the same central second-order moments M_I as the image, $\begin{matrix} M & = & \frac{⟨ f^{'} (ρ) X,f (ρ) ⟩}{⟨ f^{'} (ρ),f (ρ) ⟩} \\ = & \frac{⟨ f (ρ),f (ρ) ⟩}{⟨ f^{'} (ρ),f (ρ) ⟩} \frac{⟨ f^{'} (ρ),I (x) ⟩}{⟨ f (ρ),I (x) ⟩} \frac{⟨ f^{'} (ρ) X,I (x) ⟩}{⟨ f^{'} (ρ),I (x) ⟩} \\ = : \end{matrix}$ $\appendix \setcounter{section}{1} \begin{eqnarray} \label{eq:lsm2} \mat{M} &=& \frac{\ave{f^\prime(\rho)\mat{X},f(\rho)}} {\ave{f^\prime(\rho),f(\rho)}}\\ &=& \frac{\ave{f(\rho),f(\rho)}} {\ave{f^\prime(\rho),f(\rho)}} \frac{\ave{f^\prime(\rho),I(\vec{x})}} {\ave{f(\rho),I(\vec{x})}} \frac{\ave{f^\prime(\rho)\mat{X},I(\vec{x})}} {\ave{f^\prime(\rho),I(\vec{x})}}\\ &=:&\label{eq:lsm2b} \alpha \frac{\ave{f^\prime(\rho)\mat{X},I(\vec{x})}} {\ave{f^\prime(\rho),I(\vec{x})}} =\alpha\mat{M}_{\rm I}. \end{eqnarray}$ In particular, the best-fitting template Af(ρ) has, for a given profile f(s), the same ellipticity ϵ as the image I(x). Moreover, we derive from (A.13) the relation $M = \frac{⟨ f (ρ),f (ρ) ⟩}{⟨ f^{'} (ρ),f (ρ) ⟩} \frac{⟨ f^{'} (ρ) X,I (x) ⟩}{⟨ f (ρ),I (x) ⟩} = : β \frac{⟨ f^{'} (ρ) X,I (x) ⟩}{⟨ f (ρ),I (x) ⟩} \cdot$ $\appendix \setcounter{section}{1} \begin{equation} \mat{M}= \frac{\ave{f(\rho),f(\rho)}}{\ave{f^\prime(\rho),f(\rho)}} \frac{\ave{f^\prime(\rho)\mat{X},I(\vec{x})}} {\ave{f(\rho),I(\vec{x})}} =: \beta\frac{\ave{f^\prime(\rho)\mat{X},I(\vec{x})}} {\ave{f(\rho),I(\vec{x})}}\cdot \end{equation}$ (A.17)The scalar prefactor equals β = − 2 for f(ρ) = e^{− ρ/ 2}. We note that we have on the right-hand-side f(ρ) in the denominator rather than f′(ρ).

We evolve the expression of β a bit further by changing the variables Δx = M^{1 / 2}y and hence $d^{2} x = \sqrt{\det M} d^{2} y$ $\hbox{$\d^2\!x=\sqrt{\det{\mat{M}}}\,\d^2\!y$}$ for ρ = Δx^TM^-1Δx in the integrals of the denominator and nominator of β, $β = \frac{\int d^{2} y f^{2} (y^{T} y)}{\int d^{2} y f^{'} (y^{T} y) f (y^{T} y)} = \frac{\int_{0}^{\infty} d yy f^{2} (y^{2})}{\int_{0}^{\infty} d yy f^{'} (y^{2}) f (y^{2})}$ $\appendix \setcounter{section}{1} \begin{equation} \beta= \frac{\int\d^2y\;f^2(\vec{y}^{\rm T}\vec{y})} {\int\d^2y\;f^\prime(\vec{y}^{\rm T}\!\vec{y})f(\vec{y}^{\rm T}\!\vec{y})} = \frac{\int_0^\infty\d y\;y\,f^2(y^2)} {\int_0^\infty\d y\;y\,f^\prime(y^2)f(y^2)} \end{equation}$ (A.18)or after another change of variables $y = \sqrt{s}$ $\hbox{$y=\sqrt{s}$}$ and $d y = d s / (2 \sqrt{s})$ $\hbox{$\d y=\d s/(2\sqrt{s})$}$ , $β = \frac{⟨ f (ρ),f (ρ) ⟩}{⟨ f^{'} (ρ),f (ρ) ⟩} = \frac{\int_{0}^{\infty} \frac{d s}{2} f^{2} (s)}{\int_{0}^{\infty} d s \frac{d}{4 d s} f^{2} (s)} = - 2 \frac{\int_{0}^{\infty} d s f^{2} (s)}{f^{2} (0)},$ $\appendix \setcounter{section}{1} \begin{equation} \beta= \frac{\ave{f(\rho),f(\rho)}}{\ave{f^\prime(\rho),f(\rho)}}= \frac{\int_0^\infty\frac{\d s}{2}\;f^2(s)} {\int_0^\infty\d s\;\frac{\d}{4\d s}f^2(s)} = -2\,\frac{\int_0^\infty\d s\;f^2(s)} {f^2(0)}, \end{equation}$ (A.19)where the last step assumes that the profile f(s) vanishes for s → ∞.

Appendix B: Importance sampling of posterior distribution

An overview of our algorithm for sampling the posterior P_p(p | I) is given by Algorithm 1, more details follow below.

For the importance sampling of the posterior GLAM variables p, we initially construct an analytical approximation Q(p) of the (full) posterior P_p(p | I): = ℒ(I | p) P_p(p) in Eq. (45), the so-called importance function. Our importance function is a local Gaussian approximation of ℒ(I | p) that is truncated for | ϵ | ≥ 1; Q(p) is hence defined by its mean and its covariance. As previously discussed, we ignore residuals and set R_pre ≡ 0 in Eq. (48) for the following.

For the mean p_ml of the Gaussian Q(p), we numerically determine the maximum-likelihood (ML) point of ℒ(I | p) by minimising (48) with respect to p. This is the most time consuming step in our algorithm, which is done in the following three steps.

1.
We quickly guess a starting point p₀ of the centroid position x₀ and the galaxy size t by measuring the unweighted first and second-order moments of the post-seeing postage stamp. The initial ellipticity ϵ is set to zero. Herein we ignore the PSF. Alternatively, one could use estimates from object detection software such as SExtractor if available (Bertin & Arnouts 1996).
2.
For an improved initial guess of the size t and the centroid position x₀, starting from p₀ we perform a series of one-dimensional (1D) searches wherein we vary one parameter p_i while all other parameters p_j with i ≠ j are fixed (coordinate descent; Bezdek et al. 1987). For each 1D search, we compute − 2lnℒ(I | p) at five points over the range of values of p_i that are allowed according to the prior P_p(p). We spline interpolate between the points to estimate the global minimum $p_{i}^{\min}$ $\hbox{$p_i^{\rm min}$}$ . Then we update p_i to the value $p_{i}^{\min}$ $\hbox{$p_i^{\rm min}$}$ and move on to the 1D search of the next parameter. After every update we replace A by its most likely value $A_{ml} = \frac{I^{T} N^{-1} L f_{ρ}}{[L f_{ρ}]^{T} N^{-1} L f_{ρ}}$ $\appendix \setcounter{section}{2} \begin{equation} A_{\rm ml}= \frac{\vec{I}^{\rm T}\mat{N}^{-1}\mat{L}\vec{f}_\rho} {[\mat{L}\vec{f}_\rho]^{\rm T}\mat{N}^{-1}\mat{L}\vec{f}_\rho} \end{equation}$ (B.1)(Eq. A.5) given the current values of (x₀,ϵ,t). We carry out the searches first for the size t and then for the two components of the centroid position x₀.
3.
Starting from p in step 2, we apply the Levenberg-Marquardt algorithm (LMA) to iteratively approach p_ml (Marquardt 1963). This algorithm devises a linear approximation of Lf_ρ at p based on the partial derivatives of Lf_ρ. Our implementation of the LMA searches only in the directions of (x₀,ϵ,t) and again updates A to A_ml after each iteration as in step 2. We find a slow convergence for large ellipticities, that is | ϵ | ≳ 0.3. This is improved, however, by adding an extra 1D search for the size t, ϵ₁ and ϵ₂ after each LMA iteration with an interval width of ± 0.1t (10 spline points) and ± 0.1 (5 points) around the latest values of t and ϵ_i, respectively. For | ϵ_n | ≥ 0.6, we use 10 instead of 5 spline points which increases the accuracy for very elliptical objects. We repeat the LMA until iteration n> 5 for which | ϵ_{n − 1} − ϵ_n | ≤ 10^-4; we change this threshold to 2 × 10^-5 once | ϵ_n | ≥ 0.6. The latter is needed because the change of ϵ_n at each iteration becomes small for high ϵ.

For the covariance of Q(p), we then numerically compute the elements F_ij of the (expected) Fisher matrix F around the ML point p_ml, $F_{ij} = {(\frac{\partial [A L f_{ρ}]}{\partial p_{i}})}^{T} \times N^{-1} \times (\frac{\partial [A L f_{ρ}]}{\partial p_{j}}),$ $\appendix \setcounter{section}{2} \begin{equation} \label{eq:fisher} F_{ij}= \left( \frac{\partial[A\mat{L}\vec{f}_\rho]} {\partial p_i} \right)^{\rm T} \times\mat{N}^{-1} \times \left( \frac{\partial[A\mat{L}\vec{f}_\rho]} {\partial p_j} \right), \end{equation}$ (B.2)by considering partial derivatives of the best-fitting GLAM templates (e.g. Tegmark et al. 1997). This gives us $Q (p) \propto \exp (- \frac{1}{2} {[p - p_{ml}]}^{T} F [p - p_{ml}]) H (P_{p} (p)^{)},$ $\appendix \setcounter{section}{2} \begin{equation} Q(\vec{p})\propto \exp{\left( -\frac{1}{2} \left[\vec{p}-\vec{p}_{\rm ml}\right]^{\rm T} \mat{F} \left[\vec{p}-\vec{p}_{\rm ml}\right] \right)}\, H\left(P_{\rm p}(\vec{p})\right), \end{equation}$ (B.3)where H(x) denotes the Heaviside step function. To quickly construct the Fisher matrix, we employ the partial derivatives of Lf_ρ in the last iteration of the previous LMA; the derivative with respect to A is simply Lf_ρ.

For the main loop of the importance sampling, we randomly draw N_real points p_i ~ Q(p). Each realisation p_i receives an importance weight w_i = P_p(p_i | I) Q(p_i)^-1 for which we have to evaluate the posterior P_p(p_i | I); neither P_p(p | I) nor Q(p) need to be normalised here. In the end, we normalise all weights to unity for convenience, that is ∑ _iw_i ≡ 1. Moreover, for later usage, we keep only the ellipticities ϵ_i of the parameter sets p_i. This Monte Carlo marginalises over the uncertainties of all other parameters.

As another technical detail, the above algorithm has to repeatedly produce pixellated and PSF convolved versions of GLAM templates Lf_ρ for a given set of parameters p. To implement a fast computation in this regard, we construct the template images on a high resolution (high-res) 256 × 256 grid in Fourier space. To reduce boundary effects, we make the angular extent of the high-res grid about 20 per cent larger than the extent of the post-seeing galaxy image. Let $\begin{matrix} _{˜} \\ c_{psf} \end{matrix} (ℓ)$ $\hbox{$\tilde{c}_{\rm psf}(\vec{\ell})$}$ be the Fourier transform of the PSF and $\begin{matrix} ˜ \\ f \end{matrix} (ℓ) = 2 π \int_{0}^{\infty} d sf (s) J_{0} (sℓ)$ $\appendix \setcounter{section}{2} \begin{equation} \tilde{f}(\ell)= 2\pi\int_0^\infty\d s\;f(s)\,J_0(s\ell) \end{equation}$ (B.4)

the transform of the radial template profile f(ρ). Then the Fourier coefficients of the PSF-convolved template are $\begin{matrix} 􏽥 \\ f_{psf} \end{matrix} (ℓ) = (\begin{matrix} \det \end{matrix} V) e^{i ℓ \cdot x_{0}} \begin{matrix} ˜ \\ c_{psf} \end{matrix} (ℓ) \begin{matrix} ˜ \\ f \end{matrix} (ℓ^{T} V^{2} ℓ) .$ $\appendix \setcounter{section}{2} \begin{equation} \label{eq:fourier} \widetilde{f}_{\rm psf}(\vec{\ell})= (\det{\mat{V}})\,\e^{{\rm i}\vec{\ell}\cdot\vec{x}_0}\tilde{c}_{\rm psf}(\vec{\ell})\tilde{f}(\vec{\ell}^{\rm T}\mat{V}^2\vec{\ell}). \end{equation}$ (B.5)After setting up the grid of values in Fourier space, we back-transform to coordinate space by means of the FFTW package (Frigo & Johnson 2005). We note that the Fourier coefficient at ℓ = 0 has to be set to $\begin{matrix} _{􏽥} \\ f_{psf} \end{matrix} (0) = \det V \begin{matrix} _{˜} \\ c_{psf} \end{matrix} (0) \begin{matrix} _{˜} \\ f \end{matrix} (0)$ $\hbox{$\widetilde{f}_{\rm psf}(0)= \det{V}\,\tilde{c}_{\rm psf}(0)\,\tilde{f}(0)$}$ to avoid a constant offset of the template. After the FFT, we apply pixellation to produce the vector Lf_ρ of post-seeing pixel values by averaging pixels of the high-res grid within the extent of the post-seeing image. We choose the angular extent of the high-res grid such that an integer number of high-res pixels is enclosed by one post-seeing pixel. For example, for our 20 × 20 post-seeing images with pixel size r_post = 0.103125 ≈ 0.1 arcsec, we use a high-res grid with extent of 2.4 arcsec along each axis, or a high-res pixel size of r_hres = 2.4 / 256 = 0.009375 arcsec (square). Hence exactly (r_post/r_hres)² = 121 high-res pixel correspond to one post-seeing pixel.

Furthermore, to quickly compute the partial derivatives of ∂(Lf_ρ) /∂p_i, needed for the LMA and the Fisher matrix, we carry out the foregoing procedure, but we insert the derivatives $\partial \begin{matrix} _{􏽥} \\ f_{psf} \end{matrix} (ℓ) / \partial p_{i}$ $\hbox{$\partial\widetilde{f}_{\rm psf}(\vec{\ell})/\partial p_i$}$ as Fourier coefficients instead. These can be computed analytically from Eq. (B.5).

All Tables

Table 1

Multiplicative bias m in per cent for three profiles TMP, EXP, DEV, and sizes r_h = 0.2′′ only.

In the text

All Figures

	Fig. 1 Examples of GLAM templates in the pre-seeing frame, f_ρ (top left), and the post-seeing frame, Lf_ρ (other panels); the templates are Gaussian radial profiles with f(ρ) = e^{− ρ/ 2}. The bottom left panel simulates only pixellation, whereas the right column also shows the impact of a PSF, indicated in the top right corner, without (top) and with pixellation (bottom).
In the text

Fig. 2

Toy-model demonstration of a maximum-likelihood estimator (red), a maximum-likelihood estimator with first-order bias correction (green), and an estimator exploiting the full posterior (blue). Data points display the estimator average (y-axis) over 10⁶ data points at varying signal-to-noise levels (x-axis). The true value to be estimated is x = 1. The panels show different signal-to-noise regimes; − nppt denotes y = 1 − n/ 10³.

In the text

Fig. 3

Prior bias in the marginal posterior P_ϵ(ϵ | I₁,...,I_n) as function of S/N ν for different galaxy sizes r_h (in arcsec). The posterior assumes a uniform prior. Shown is the error δϵ = | ϵ − ϵ_true | of the inferred ϵ for a true ϵ_true = 0.3 as obtained by combining the marginal posteriors of 5 × 10³ exposures of the same galaxy with random centroid positions. The pixel size 0.1 arcsec equals the PSF size (Moffat). Galaxy profiles and GLAM templates have a Sérsic profile with n = 2: there is no underfitting.

In the text

	Fig. 4 Examples of simulated images of galaxies with random ellipticities. Signal-to-noise ratios from left to right and top to bottom: ν = 10, 20, 40, and 200. The radial light profile of the pre-seeing images is exponential with r_h = 0.2′′; the pixel size is 0.1′′. The FWHM of the PSF is indicated by the circle in the top right corners.
In the text

Fig. 5

Plots of the multiplicative bias m for simulated images, based on Eq. (69), for two different priors P_s(ϵ_s) (filled data points: correct prior; open data points: uniform prior). Different styles for the data points indicate different image sizes r_h, see key inside figure, while colours vary with galaxy type: GLAM template (TMP; red); exponential (EXP; green); bulge-like (DEV; blue). Data points for r_h = 0.3′′, same galaxy type, and same prior are connected by dotted lines to guide the eye. The prior for galaxy sizes t, amplitudes A, and centroids x₀ is uniform giving rise to noise-dependent prior bias; the constant offset of EXP and DEV is due to underfitting. A square image pixel has the size 0.1′′, which also equals the PSF size. Results for m for a larger PSF size can be found in Table 1.

In the text


In the text

Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.

Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.

Initial download of the metrics may take a while.

[1] Aitken, A. 1934, On least squares and linear combination of observations, 55, 42 [Google Scholar]

[2] Alsing, J., Heavens, A., Jaffe, A. H., et al. 2016, MNRAS, 455, 4452 [NASA ADS] [CrossRef] [Google Scholar]

[3] Alsing, J., Heavens, A., & Jaffe, A. H. 2017, MNRAS, 466, 3272 [NASA ADS] [CrossRef] [Google Scholar]

[4] Bartelmann, M., & Schneider, P. 2001, Phys. Rep., 340, 291 [NASA ADS] [CrossRef] [Google Scholar]

[5] Bernstein, G. M. 2010, MNRAS, 406, 2793 [NASA ADS] [CrossRef] [Google Scholar]

[6] Bernstein, G. M., & Armstrong, R. 2014, MNRAS, 438, 1880 [NASA ADS] [CrossRef] [Google Scholar]

[7] Bernstein, G. M., & Jarvis, M. 2002, AJ, 123, 583 [NASA ADS] [CrossRef] [Google Scholar]

[8] Bernstein, G. M., Armstrong, R., Krawiec, C., & March, M. C. 2016, MNRAS, 459, 4467 [NASA ADS] [CrossRef] [Google Scholar]

[9] Bertin, E., & Arnouts, S. 1996, A&AS, 117, 393 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[10] Bezdek, J., Hathaway, R., Howard, R., Wilson, C., & Windham, M. 1987, J. Optim. Theory Appl., 54, 471 [CrossRef] [Google Scholar]

[11] Bridle, S., Balan, S. T., Bethge, M., et al. 2010, MNRAS, 405, 2044 [NASA ADS] [Google Scholar]

[12] Capaccioli, M. 1989, in World of Galaxies, eds. H. G. Corwin, Jr., & L. Bottinelli, 208 [Google Scholar]

[13] Dawson, W. A., Schneider, M. D., Tyson, J. A., & Jee, M. J. 2016, ApJ, 816, 11 [NASA ADS] [CrossRef] [Google Scholar]

[14] Fisher, R. A. 1935, J. Roy. Stat. Soc., 98, 39 [CrossRef] [Google Scholar]

[15] Frigo, M., & Johnson, S. G. 2005, Proc. IEEE, special issue on Program Generation, Optimization, and Platform Adaptation, 93, 216 [Google Scholar]

[16] Geiger, B., & Schneider, P. 1998, MNRAS, 295, 497 [NASA ADS] [CrossRef] [Google Scholar]

[17] Gelman, A., Carlin, J., Stern, H., et al. 2013, Bayesian Data Analysis, third edn., Chapman & Hall/CRC Texts in Statistical Science (Taylor & Francis) [Google Scholar]

[18] Gruen, D., Bernstein, G. M., Jarvis, M., et al. 2015, Journal of Instrumentation, 10, C05032 [CrossRef] [Google Scholar]

[19] Hartlap, J., Hilbert, S., Schneider, P., & Hildebrandt, H. 2011, A&A, 528, A51 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[20] Heymans, C., Van Waerbeke, L., Bacon, D., et al. 2006, MNRAS, 368, 1323 [NASA ADS] [CrossRef] [Google Scholar]

[21] Hirata, C., & Seljak, U. 2003, MNRAS, 343, 459 [NASA ADS] [CrossRef] [Google Scholar]

[22] Hirata, C. M., Mandelbaum, R., Seljak, U., et al. 2004, MNRAS, 353, 529 [NASA ADS] [CrossRef] [Google Scholar]

[23] Hoekstra, H., & Jain, B. 2008, Ann. Rev. Nucl. Part. Sci., 58, 99 [Google Scholar]

[24] Hoekstra, H., Herbonnet, R., Muzzin, A., et al. 2015, MNRAS, 449, 685 [NASA ADS] [CrossRef] [Google Scholar]

[25] Kacprzak, T., Zuntz, J., Rowe, B., et al. 2012, MNRAS, 427, 2711 [NASA ADS] [CrossRef] [Google Scholar]

[26] Kaiser, N., Squires, G., & Broadhurst, T. 1995, ApJ, 449, 460 [NASA ADS] [CrossRef] [Google Scholar]

[27] Kilbinger, M. 2015, Rep. Prog. Phys., 78, 086901 [Google Scholar]

[28] Kilbinger, M., Wraith, D., Robert, C. P., et al. 2010, MNRAS, 405, 2381 [NASA ADS] [Google Scholar]

[29] Kitching, T. D., Miller, L., Heymans, C. E., van Waerbeke, L., & Heavens, A. F. 2008, MNRAS, 390, 149 [NASA ADS] [CrossRef] [Google Scholar]

[30] Kitching, T. D., Balan, S. T., Bridle, S., et al. 2012, MNRAS, 423, 3163 [NASA ADS] [CrossRef] [Google Scholar]

[31] Laureijs, R., Amiaux, J., Arduini, S., et al. 2011, ArXiv e-prints [arXiv:1110.3193] [Google Scholar]

[32] Lewis, A. 2009, MNRAS, 398, 471 [NASA ADS] [CrossRef] [Google Scholar]

[33] MacKay, J. D. 2003, Information Theory, Inference, and Learning Algorithms (Cambridge, UK: Cambridge University Press) [Google Scholar]

[34] Mandelbaum, R., Hirata, C. M., Seljak, U., et al. 2005, MNRAS, 361, 1287 [NASA ADS] [CrossRef] [Google Scholar]

[35] Marquardt, D. 1963, J. Soc. Ind. Appl. Mathem., 11, 431 [Google Scholar]

[36] Marshall, A. 1956, in Symp. on Monte Carlo methods (Wiley), ed. M. Meyer, 123 [Google Scholar]

[37] Massey, R., Heymans, C., Bergé, J., et al. 2007, MNRAS, 376, 13 [NASA ADS] [CrossRef] [Google Scholar]

[38] Massey, R., Kitching, T., & Richard, J. 2010, Rep. Prog. Phys., 73, 086901 [Google Scholar]

[39] Massey, R., Hoekstra, H., Kitching, T., et al. 2013, MNRAS, 429, 661 [Google Scholar]

[40] Melchior, P., & Viola, M. 2012, MNRAS, 424, 2757 [NASA ADS] [CrossRef] [Google Scholar]

[41] Melchior, P., Böhnert, A., Lombardi, M., & Bartelmann, M. 2010, A&A, 510, A75 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[42] Melchior, P., Viola, M., Schäfer, B. M., & Bartelmann, M. 2011, MNRAS, 412, 1552 [NASA ADS] [CrossRef] [Google Scholar]

[43] Melchior, P., Suchyta, E., Huff, E., et al. 2015, MNRAS, 449, 2219 [NASA ADS] [CrossRef] [Google Scholar]

[44] Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., & Teller, E. 1953, J. Chem. Phys., 21, 1087 [NASA ADS] [CrossRef] [Google Scholar]

[45] Miller, L., Kitching, T. D., Heymans, C., Heavens, A. F., & van Waerbeke, L. 2007, MNRAS, 382, 315 [NASA ADS] [CrossRef] [Google Scholar]

[46] Miller, L., Heymans, C., Kitching, T. D., et al. 2013, MNRAS, 429, 2858 [Google Scholar]

[47] Moffat, A. F. J. 1969, A&A, 3, 455 [NASA ADS] [Google Scholar]

[48] Munshi, D., Valageas, P., van Waerbeke, L., & Heavens, A. 2008, Phys. Rep., 462, 67 [NASA ADS] [CrossRef] [Google Scholar]

[49] Nakajima, R., & Bernstein, G. 2007, AJ, 133, 1763 [NASA ADS] [CrossRef] [Google Scholar]

[50] Niemi, S.-M., Cropper, M., Szafraniec, M., & Kitching, T. 2015, Exp. Astron., 39, 207 [NASA ADS] [CrossRef] [Google Scholar]

[51] Plazas, A. A., Bernstein, G. M., & Sheldon, E. S. 2014, Journal of Instrumentation, 9, C4001 [CrossRef] [Google Scholar]

[52] Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. 1992, Numerical recipes in C. The art of scientific computing 2nd edn (Cambridge University Press) [Google Scholar]

[53] Refregier, A., Kacprzak, T., Amara, A., Bridle, S., & Rowe, B. 2012, MNRAS, 425, 1951 [NASA ADS] [CrossRef] [Google Scholar]

[54] Schneider, P. 2006, in Saas-Fee Advanced Course 33: Gravitational Lensing: Strong, Weak and Micro, eds. G. Meylan, P. Jetzer, P. North, P. Schneider, C. S. Kochanek, & J. Wambsganss, 269 [Google Scholar]

[55] Seitz, C., & Schneider, P. 1997, A&A, 318, 687 [NASA ADS] [Google Scholar]

[56] Semboloni, E., Hoekstra, H., Huang, Z., et al. 2013, MNRAS, 432, 2385 [NASA ADS] [CrossRef] [Google Scholar]

[57] Sheldon, E. S. 2014, MNRAS, 444, L25 [NASA ADS] [CrossRef] [Google Scholar]

[58] Takeuchi, T. T. 2010, MNRAS, 406, 1830 [NASA ADS] [Google Scholar]

[59] Tegmark, M., Taylor, A. N., & Heavens, A. F. 1997, ApJ, 480, 22 [NASA ADS] [CrossRef] [Google Scholar]

[60] van der Vaart, A. W. 1998, Asymptotic statistics, Cambridge Series in Statistical and Probabilistic Mathematics (Cambridge University Press) [Google Scholar]

[61] Viola, M., Melchior, P., & Bartelmann, M. 2011, MNRAS, 410, 2156 [NASA ADS] [CrossRef] [Google Scholar]

[62] Viola, M., Kitching, T. D., & Joachimi, B. 2014, MNRAS, 439, 1909 [NASA ADS] [CrossRef] [Google Scholar]

[63] Voigt, L. M., & Bridle, S. L. 2010, MNRAS, 404, 458 [NASA ADS] [Google Scholar]

[64] Zuntz, J., Kacprzak, T., Voigt, L., et al. 2013, MNRAS, 434, 1604 [NASA ADS] [CrossRef] [Google Scholar]