Super-resolution method using sparse regularization for point-spread function recovery

F. M. Ngolè Mboula; J.-L. Starck; S. Ronayette; K. Okumura; J. Amiaux

doi:10.1051/0004-6361/201424167

Home

All issues

Volume 575 (March 2015)

A&A, 575 (2015) A86

Full HTML

Free Access

Issue		A&A Volume 575, March 2015


Article Number		A86
Number of page(s)		11
Section		Numerical methods and codes
DOI		https://doi.org/10.1051/0004-6361/201424167
Published online		02 March 2015

A&A 575, A86 (2015)

Super-resolution method using sparse regularization for point-spread function recovery

F. M. Ngolè Mboula, J.-L. Starck, S. Ronayette, K. Okumura and J. Amiaux

Laboratoire AIM, UMR CEA-CNRS-Paris 7, Irfu, Service d’Astrophysique, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France
e-mail: fred-maurice.ngole-mboula@cea.fr

Received: 9 May 2014
Accepted: 3 September 2014

Abstract

In large-scale spatial surveys, such as the forthcoming ESA Euclid mission, images may be undersampled due to the optical sensors sizes. Therefore, one may consider using a super-resolution (SR) method to recover aliased frequencies, prior to further analysis. This is particularly relevant for point-source images, which provide direct measurements of the instrument point-spread function (PSF). We introduce SParse Recovery of InsTrumental rEsponse (SPRITE), which is an SR algorithm using a sparse analysis prior. We show that such a prior provides significant improvements over existing methods, especially on low signal-to-noise ratio PSFs.

Key words: techniques: image processing / methods: numerical

© ESO, 2015

1. Introduction

The weak gravitational lensing is one of the most promising tools to probe the dark matter distribution in the universe. The idea is to infer, from a billion images of galaxies, the shape distortions due to dark matter gravitational lensing and then estimate the dark matter mass density along different lines of sight. The Euclid mission (launch planned for 2020) provides the data for such a purpose (ESA/SRE 2011). Nevertheless, galaxy images are distorted due to the point-spread function (PSF). Therefore, it is critical to know this distortion accurately. It can be modeled in first approximation as a convolution of the desired image by the PSF of the telescope, which is typically space and time-varying. In practice, isolated stars provide PSF measurements at different locations in the field of view. Nevertheless, these stars images can be aliased as it is the case in Euclid, given the CCD sensor sizes. On the other hand, the surveys are generally designed so that different images of the same stars are available and likely to be with different subpixel offsets on the sensor grid. Moreover, one may consider that nearby star images give to some extend the same local PSF. We, thus, can consider that different low-resolution versions of the same PSF are available in practice, so that one may apply a super-resolution (SR) method to recover aliased frequencies.

This paper precisely tackles this problem. The SR is a widely studied topic in general image processing literature. Yet, some methods have been specifically proposed for astronomical data. For instance, there is the software IMCOM (Rowe et al. 2011) and PSFEx, which proposes an SR option. The IMCOM provides an oversampled output image from multiple undersampled input images, assuming that the PSF is fully specified. Since it does not deal with the PSF restoration itself, we use PSFEx as our main reference. The PSFEx performs SR by minimizing the sum of two terms. The first term is a weighted quadratic distance, relating the underlying PSF to each of the low resolution measurements. The second term consists of the square l₂ norm of the difference between the underlying PSF and a smooth first guess. This term is meant for regularization. In the proposed algorithm, we introduce a new regularization scheme based on the optimization variable sparsity in a suitable dictionary.

Fig. 1

General observation model. See Sect. 2.2 for a detailed description.

Fig. 2

Adapted observation model. This time we consider an isolated star as an input, and unlike in the general model, the output of the block “Blur” is the PSF.

Fig. 3

SR general scheme. The deblurring does not apply to our case, since we want to precisely estimate the blur.

Section 2 presents the general principle of SR along with some state-of-the-art methods in astronomical domain. In Sect. 3, we present the proposed algorithm in details, which is followed with some numerical experiments in Sect. 4. We conclude by summarizing the main results and giving some perspectives.

2. Super-resolution overview

2.1. Notations

We adopt the following notation conventions:

we use bold low case letters for vectors;
we use bold capital case letters for matrices;
the vectors are treated as column vectors unless explicitly mentioned otherwise.

The underlying high resolution (HR) image of size d₁p₁ × d₂p₂ is written in lexicographic order (for instance, lines after lines) as a vector of pixels values x = (x₁,...,x_q)^T, where q = d₁p₁d₂p₂, and d₁ and d₂ are respectively the line and column downsampling factors. We consider n LR observations. The vector of pixels values y_k = (y_k1,...,y_kp)^T denotes the kth LR observation written in lexicographical order with p = p₁p₂ and k = 1...n.

2.2. Observation model

We assume that x does not change during the acquisition of the n LR images so that we have $y_{k} = D B_{k} M_{k} x + n_{k},k = 1... n .$ $\begin{equation} \vec{y}_k = \mathbf{D}\mathbf{B}_{k}\vec{M}_{k}\vec{x} + \vec{n}_{k}, \; k=1...n. \label{obs_mod_eq} \end{equation}$ (1)The variable M_k is a warp matrix of size q × q. It represents the motions of the observations relative to each other and those in general need to be estimated. The variable B_k is a blur matrix of size q × q. It accounts for different blurs (the atmosphere blur, which is particularly considerable for ground based telescopes, the system optics blur, the imaging system shaking, etc.). The variable D is a matrix of size p × q, which simply realizes a downsampling operation. Finally, n_k is a noise vector of q elements. This model is illustrated in Fig. 1.

In our case, we are interested in estimating the PSF, or in other terms, the telescope’s contribution to the blur. We consider that the PSF varies slowly in the field so that the blocks “Warping” and “Blur” in Fig. 1 may be swapped, for slow motions between observations. Therefore, the block diagram can be adapted as in Fig. 2. This model still holds in presence of atmospheric blur (for ground-based telescopes) and jitter movements, if the LR images are extracted from the same exposure.

In the general case, the model (1) may simply be written as $y_{k} = W_{k} x + n_{k},k = 1... n,$ $\begin{equation} \vec{y}_k = \mathbf{W}_{k}\vec{x} + \vec{n}_{k}, \; k=1...n, \label{obs_mod_eq_1} \end{equation}$ (2)where W_k is a p × q matrix accounting for warping, blur, and downsampling.

2.3. SR techniques in astronomy

Generally, SR techniques involve three steps, which may be combined or performed separately. The registration step consists in evaluating the relative motions between different observations, so that their samples can be arranged on a common grid. It is critical that the precision of this registration should be smaller than the pixels dimensions over the target upsampling factors. Since these relative motions are arbitrary, this grid is non-uniform. The next step would be to interpolate this grid in such a way to get a regularly sampled HR image. This image is blurry and noisy. Therefore, the final step is a restoration procedure. This scheme is summarized in Fig. 3. In the next sections, we describe some SR techniques dedicated to astronomical images, but one may refer to Park et al. (2003) for more details on various SR frameworks in general image processing literature.

2.3.1. Shift-and-add method

The most simple SR method is certainly the shift-and-add method. It is performed in three steps. First, the images are upsampled to the target resolution. Then, they are shifted on a common grid and averaged. It has been used in astronomy for a long time, particularly for ground based telescopes. This method is simple, fast and is used for comparisons in the numerical experiments part. It has been shown that this method provides an optimal solution in the sense of the maximum likelihood (ML) with additive white Gaussian noise (WGN) and when only pure integer translation motions are considered with respect to the finer grid (Elad & hel Or 2001). The interpolation operator should be D^T (from Eq. (1)), which comes down to a simple zero-padding interpolation. It has been shown in the same work that the matrix $R = \sum_{k = 1}^{n} M_{k}^{T} D^{T} D M_{k}$ $\hbox{$R = \sum_{k=1}^n \mathbf{M}_{k}^{\rm T}\mathbf{D}^{\rm T}\mathbf{D}\mathbf{M}_{k}$}$ is diagonal in this simple case. Thus, after registration and stacking of the interpolated images, each pixel value should be divided by the corresponding R diagonal coefficient.

2.3.2. PSFEx method

The software PSFEx is an open source program, which has been used in many projects such as the Dark Energy Survey (Mohr et al. 2012) or CFHTLS¹ for PSF modeling. It takes a catalog of objects extracted from an astronomical image using SExtractor as input, which is also an open source tool for point sources (or stars) extraction. This catalog contains information about extracted point sources, such as signal-to-noise ratio (S/N), luminosity, full width at half maximum (FWHM), centroid coordinates, multiple flags related to saturation or blending, etc. Based on these measurements (performed in SExtractor) and some user provided parameters, PSFEx selects which sources are proper for PSF modeling. Afterwards, it constructs a PSF model, provides a fitting with an analytic function, and computes some of the PSF geometrical features. The PSF model construction may simply consist in optimally combining the input sources images for denoising, but it may also involve SR if these images are also undersampled. This SR functionality is our second reference for comparisons².

The desired HR image is now a matrix X of size dp × dp, where d is the downsampling factor and the kth observation Y_k is a matrix of size p × p with k = 1...n. The coordinates of the centroid of the kth observation are denoted (i_k,j_k). Assuming that the images are bandlimited, the samples of the LR images can be interpolated from the desired HR image, thanks to the Shannon sampling theorem (Shannon 1949). In theory, this interpolation should involve a 2D sinus cardinal (sinc) kernel with an infinite support, which is not convenient for practical implementation. One can instead use a support compact function, which approximates the sinus cardinal. Let h(.,.) denote such a function. The estimate of the sample (i,j) of the kth observation is given by $ŷ k,ij = \sum_{l} \sum_{m} h [l - d (i - i_{k}),m - d (j - j_{k})] x_{ij} .$ $\begin{equation} \hat{y}_{k,ij} = \sum_l\sum_m h\left[ l-d(i-i_k),m-d(j-j_k)\right]x_{ij}. \label{intp_rel} \end{equation}$ (3)Then, we can define the cost function $J_{1} (X) = \sum_{k = 1}^{n} \sum_{i = 1}^{p} \sum_{j = 1}^{p} \frac{(y_{k,ij} - f_{k} ŷ k,ij)^{2}}{σ_{k}^{2}}$ $\begin{equation} J_1(\mathbf{X}) = \sum_{k=1}^n\sum_{i=1}^p\sum_{j=1}^p\frac{(y_{k,ij}-f_k \hat{y}_{k,ij})^2}{\sigma_k^2} \label{psfex_cost} \end{equation}$ (4)where f_k accounts for possible luminosity differences. The parameter $σ_{k}^{2}$ $\hbox{$\sigma_k^2$}$ is related to the local background variance and any other uncertainty on the pixel value. These parameters need to be estimated.

The PSFEx uses a Lanczos4 interpolant, which is defined in 1D as $h_{1 D} (x) = {\begin{matrix} 1 & if x = 0 \\ sinc (x) sinc (x / 4) & if 0 < | x | < 4 \\ 0 & else, \end{matrix}$ $\begin{equation} h_{1D}(x) = \left\{ \begin{array}{ll} 1 & \mbox{if} \; x=0 \\ \sinc(x)\sinc({x}/{4}) & \mbox{if}\; 0<|x|<4 \\ 0 & \mbox{else,} \end{array} \right. \end{equation}$ (5)so that h(x,y) = h_1D(x)h_1D(y).

Finally, PSFEx minimizes the cost function defined as $J_{2} (Δ) = J_{1} (Δ + X^{(0)}) + λ ∥ Δ ∥_{2}^{2},$ $\begin{equation} \label{cost_func_reg} J_2(\bm{\Delta}) = J_1(\bm{\Delta}+\mathbf{X}^{(0)}) + \lambda \|\bm{\Delta}\|_{2}^2, \end{equation}$ (6)where Δ = X − X⁽⁰⁾ and X⁽⁰⁾ is a median image computed from the LR observations. This second term is meant for regularization purposes if the problem is ill-conditioned or undetermined (n<d²). One particularity of point-source images (see Fig. 4) is that one knows that the light comes from a single point, which is generally inferred to be the light blob’s centroid. Thus, one only needs the images centroid coordinates to perform the registration, which is implicitly done in Eq. (3).

Fig. 4

Simulated optical point-spread function.

3. Sparse penalty

3.1. Sparsity-based approaches

Using a sparsity prior to regularize a linear inverse problem has proven very effective in a large variety of domains and, in particular, in astronomical data processing (see Starck et al. 2010 and the references therein). It has recently led to impressive results for 3D weak gravitational lensing mass map reconstruction (Leonard et al. 2014). Let us consider the following general linear inverse problem: $y = A x + n,$ $\begin{equation} \vec{y} = \mathbf{A}\vec{x} + \vec{n}, \label{lin_inv} \end{equation}$ (7)where x ∈ℝ^p is the signal to recover, y ∈ℝ^m are the noisy measurements, and n is an additive noise. The variable A is a matrix of size m × p, which might be ill-conditioned, and in the general case m can be smaller than p. To regularize this problem, the signal x is assumed to be sparsely represented in an appropriate overcomplete dictionary Φ: x = Φ^Tα, where there is only a few non zero values in the vector α. We can define the l₀ pseudo-norm of α as $∥ α ∥_{0} = k,$ $\begin{equation} \|\boldsymbol{\alpha}\|_0 = k, \end{equation}$ (8)where k is the number of non-zero values in alpha. Thus, one way to tackle the problem stated in Eq. (7) under the sparsity assumption would be to solve the following $\min_{α} ∥ α ∥_{0} s . t . ∥ y - A Φ^{T} α ∥_{2}^{2} < ϵ,$ $\begin{equation} \min_{\boldsymbol{\alpha}} \|\boldsymbol{\alpha}\|_0 {\rm s.t.} \|\vec{y} - \mathbf{A}\boldsymbol{\Phi}^{\rm T}\boldsymbol{\alpha}\|_2^2 < \epsilon, \label{l0_pb} \end{equation}$ (9)where ϵ is related to the noise variance. In practice, for most problems, the signals of interest are not strictly sparse, but this assumption can be relaxed to compressible signals; when expressed in a suitable dictionary, the coefficients of a compressible signal exhibit a polynomial decrease of their sorted absolute values: $| α_{(i)} | ⩽ C i^{- \frac{1}{b}}, 1 ⩽ i ⩽ q,b \in] 0, 1]$ $\begin{equation} |\alpha_{(i)}| \leq C i^{-\frac{1}{b}},\;\;1\leq i\leq q,\;b \in ]0,1] \end{equation}$ (10)where C is a constant. This tends to be verified for a well-chosen dictionary.

Besides, the problem 9 is combinatorial and is untrackable in most practical applications. One instead uses the l₁ norm as a relaxation of the l₀ pseudo-norm and solves a convex optimization problem of the form: $\min_{α} \frac{1}{2} ∥ y - A Φ^{T} α ∥_{2}^{2} + λ ∥ α ∥_{1},$ $\begin{equation} \min_{\boldsymbol{\alpha}} \frac{1}{2}\|\vec{y} - \mathbf{A}\boldsymbol{\Phi}^{\rm T}\boldsymbol{\alpha}\|_2^2 + \lambda\|\boldsymbol{\alpha}\|_1, \label{bpdn_lag} \end{equation}$ (11)where the parameter λ balances the sparsity against the data fidelity. This formulation is known as the augmented Lagrangian form of the basis pursuit denoising (BPDN) problem (Chen et al. 2001). In solving this problem, one seeks a sparse way to synthesize the wanted signal from the atoms of the dictionary Φ. This prior is referred to as the synthesis prior in the sparse recovery literature. Another way of promoting sparsity is through the analysis prior, which consists of seeking a solution, which has a sparse representation in the transform domain, without imposing the signal to be written as a linear combination of some dictionary atoms. This is done by solving the following problem: $\min_{x} \frac{1}{2} ∥ y - A x ∥_{2}^{2} + λ ∥ Φ x ∥_{1},$ $\begin{equation} \min_{\vec{x}} \frac{1}{2}\|\vec{y} - \mathbf{A}\vec{x}\|_2^2 + \lambda\|\boldsymbol{\Phi}\vec{x}\|_1, \label{analys_prior} \end{equation}$ (12)where the matrix Φ transforms x in the new representation domain. The problems 11 and 12 are not equivalent when the matrix Φ is not unitary. Therefore, a choice has to be made between the two, which is discussed in the next section.

3.2. Method

Now, let consider Eq. (3). It can be rewritten as $ŷ k = D H_{k} x,$ $\begin{equation} \hat{\vec{y}}_k = \mathbf{D}\mathbf{H}_k\vec{x}, \label{intp_rel_mat} \end{equation}$ (13)where x and ŷ_k are, respectively, the desired matrix and the kth observation estimate written this time as column vectors in lexicographic order (we used the lines order); besides, H_k is a Toeplitz matrix (Gray 2006) of size d²p² × d²p², which contains the values of the kernel h(.,.) appearing in Eq. (3) and D is a decimation matrix of size p² × d²p² with d being the downsampling factor. We are assumming that the images are squared for convenience but they might be non-squared as well. In the same way, we can redefine the objective function of Eq. (4) as $J_{1} (x) = \frac{1}{2} \sum_{k = 1}^{n} ∥ y_{k} - f_{k} D H_{k} x ∥_{2}^{2} / σ_{k}^{2},$ $\begin{equation} J_1(\vec{x}) = \frac{1}{2}\sum_{k=1}^n{\|\vec{y}_{k}-f_k\mathbf{D}\mathbf{H}_k\vec{x}\|_2^2}/{\sigma_k^2}, \label{psfex_cost_mat} \end{equation}$ (14)where y_k is the kth observation rewritten consistently with x and ŷ_k. The function J₁(x) is nothing but the log-likelihood associated with the observation model in the case of an uncorrelated Gaussian noise that is stationary for each observation up to a scalar factor. It can be written in an even more compact way as $J_{1} (x) = \frac{1}{2} ∥ Σ^{-1} y - Σ^{-1} FW x ∥_{2}^{2},$ $\begin{equation} J_1(\vec{x}) = \frac{1}{2}\|\boldsymbol{\Sigma}^{-1}\vec{y}-\boldsymbol{\Sigma}^{-1}\mathbf{F}\mathbf{W}\vec{x}\|_2^2, \label{psfex_cost_mat_2} \end{equation}$ (15)where W is obtained by concatening vertically the matrices DH_k, F is a diagonal matrix constructed by repeating the coefficients f_kp² times for k = 1...n, and Σ is constructed the same way using the coefficients σ_k. Therefore, we can simply write $J_{1} (x) = \frac{1}{2} ∥ z - M x ∥_{2}^{2},$ $\begin{equation} J_1(\vec{x}) = \frac{1}{2} \|\vec{z}-\mathbf{M}\vec{x}\|_2^2, \label{psfex_cost_mat_3} \end{equation}$ (16)where M is a matrix of size np² × d²p². The SPRITE method constrains the minimization of this objective function using an analysis prior. Instead of using a single lagrangian multiplier as in Eq. (12), the analysis coefficients has individual weights κλ_i, where i is the coefficient index in the transform domain. This leads to the following formulation of the problem: $\underset{Δ}{\begin{matrix} \min \end{matrix}} J_{1} (Δ + x^{(0)}) + κ ∥ λ ⊙ ΦΔ ∥_{1},$ $\begin{equation} \underset{\bm{\Delta}}\min J_1(\bm{\Delta}+\vec{x}^{(0)}) +\kappa\|\bm{\lambda}\odot\bm{\Phi}\bm{\Delta}\|_1, \label{sr_sp1} \end{equation}$ (17)where Δ = x − x⁽⁰⁾ is defined as in 6, x⁽⁰⁾ is a first guess, and λ is now a vector of the same size as ΦΔ, ⊙ denoting the pointwise product.

Additionally, the PSF or equivalently the telescope optical impulse response is by definition a positive valued function (Thompson 1969). Therefore, we want the reconstruction x to have positive entries. This additional constraint is integrated as follows: $\underset{Δ}{\begin{matrix} \min \end{matrix}} J_{1} (Δ + x^{(0)}) + κ ∥ λ ⊙ ΦΔ ∥_{1} s . t . Δ ⩾ - x^{(0)},$ $\begin{equation} \underset{\bm{\Delta}}\min J_1(\bm{\Delta}+\vec{x}^{(0)}) +\kappa\|\bm{\lambda}\odot\bm{\Phi}\bm{\Delta}\|_1 \; {\rm s.t.} \bm{\Delta} \ge -\vec{x}^{(0)}, \label{sr_sp1p} \end{equation}$ (18)where $\hbox{$\ge$}$ is a pointwise inequality, or equivalently, $\underset{Δ}{\begin{matrix} \min \end{matrix}} J_{1} (Δ + x^{(0)}) + κ ∥ λ ⊙ ΦΔ ∥_{1} + 1_{𝒮_{x^{(0)}}} (Δ),$ $\begin{equation} \underset{\bm{\Delta}}\min J_1(\bm{\Delta}+\vec{x}^{(0)}) +\kappa\|\bm{\lambda}\odot\bm{\Phi}\bm{\Delta}\|_1 + \mathbb{1}_{\mathcal{S}_{\vec{x}^{(0)}}}(\bm{\Delta}), \label{sr_sp1p2} \end{equation}$ (19)where $\hbox{$\mathcal{S}_{\vec{x}^{(0)}}$}$ is the set of vectors t ∈ ℝ^{d² p²} satisfying the pointwise inequality $\hbox{$\vec{t} \ge -\vec{x}^{(0)}$}$ and _{𝒮_x⁽⁰⁾} is its indicator function (see Appendix B). The impact of this constraint is emphasized in Appendix C.

As we show in the Sect. 3.4.3, the choice of the parameters λ and κ relies on the noise expected on the analysis coefficients of the solution estimate. The choice of a vector regularization parameter rather than a single scalar is precisely motivated by the fact that this noise might be non-stationary.

As stated before, the problem 17 is not equivalent to its synthesis version, if the dictionary Φ is redundant. The synthesis prior is expected to be efficient if the desired solution can be accurately written as a sparse linear combination of the chosen dictionary atoms, which we cannot assume to be true for every PSF profiles. In contrast, the analysis prior appears to be more flexible. Moreover, in the cases where the problem would be ill-conditioned or underdetermined, the analysis prior would definitely be more suitable since it involves far less variables.

A similar l₁ penalty has already been applied for SR. An example may be found in Yamagishi et al. (2012), where the cost function is minimized using variants of the alternating direction method of multipliers (ADMM). Moreover, advantages of such approaches over quadratic regularizers have been shown in many related problems in image and signal processing.

The use of the l₁ norm as a relaxation for an l₀ penalty has a well-known drawback, which is that it tends to bias the solution. Indeed with a l₁ norm penalty, the problem resolution involves soft thresholding operations, which affect both weak and strong entries unlike a hard thresholding, which would only affect the weak and therefore unwanted entries. Formal definitions of soft and hard thresholding are given in Appendix B.

This is particularly unsuitable for scientific data analysis. The reweighting l₁ minimization proposed in Candès et al. (2008) is one way to tackle this issue, while staying in the proof of convergence sets. Indeed, it consists of solving a succession of l₁ minimization problems of the form $\underset{Δ}{\begin{matrix} \min \end{matrix}} J_{1} (Δ + x^{(0)}) + κ ∥ w^{(k)} ⊙ λ ⊙ ΦΔ ∥_{1} + 1_{𝒮_{x^{(0)}}} (Δ),$ $\begin{equation} \underset{\bm{\Delta}}\min J_1(\bm{\Delta}+\vec{x}^{(0)}) +\kappa\|\vec{w}^{(k)}\odot\bm{\lambda}\odot\bm{\Phi}\bm{\Delta}\|_1 + \mathbb{1}_{\mathcal{S}_{\vec{x}^{(0)}}}(\bm{\Delta}), \label{sr_sp_rw} \end{equation}$ (20)where w^(k) is a weighting vector for the transform coefficients at the kth minimization. Each entry of w^(k) is calculated as a decreasing function of the corresponding transform coefficient magnitude in the (k − 1)th minimization. This way, the strong transform coefficients are less penalized than the weaker ones in the new minimization. One may refer to Appendix D for quantitative study of the reweighting effect.

3.3. Algorithm

In the proposed method, the reweighting scheme is performed according to Candès et al. (2008):

1.
Set k = 0, for each entry in the weighting vector w^(k), set $w_{j}^{(k)} = 1$ $\hbox{$w_j^{(k)} = 1$}$ .
2.
Solve the problem 20 yielding a solution Δ^(k).
3.
Compute α^(k) = ΦΔ^(k).
4.
Update the weight vector according to $w_{j}^{(k + 1)} = \frac{1}{1 + | α_{j}^{(k)} | / 3 σ_{j}}$ $\hbox{$w_j^{(k+1)} = \frac{1}{1+|\alpha_j^{(k)}|/3\sigma_j}$}$ , where σ_j is the noise standard deviation expected at the jth transform coefficient, see Sect. 3.4.3.
5.
Terminated on convergence or when reaching the maximum number of iterations; otherwise, go to step 2.

The step 2 resolution is detailed in Algorithm 1. We use the generalized forward-backward splitting introduced in Raguet et al. (2011). It requires the computation of proximity operators associated with the regularization functions in 20. One may refer to Appendix B for an introduction to proximal calculus.

Since the dictionary Φ is redundant, we do not have a closed-form expression for the proximity operator. Yet, it can be calculated as ${\begin{matrix} {prox}_{\frac{μ}{ω_{1}} κ ∥ diag (w^{(k)} ⊙ λ) Φ ∥_{1}} (x) = x - Φ^{T} û \\ û = {argmin}_{| u_{j} | < \frac{μ}{ω_{1}} κ w_{j}^{(k)} λ_{j}} \frac{1}{2} ∥ x - Φ^{T} u ∥_{2}^{2} \end{matrix}$ $\begin{equation} \left\lbrace \begin{matrix}\prox_{\frac{\mu}{\omega_1}\kappa\|\diag{(\vec{w}^{(k)}\odot\bm{\lambda})}\bm{\Phi}\|_1}{(\vec{x})} = \vec{x} - \bm{\Phi}^{\rm T}\hat{\vec{u}} \\ \hat{\vec{u}} = \argmin_{|u_j| < \frac{\mu}{\omega_1}\kappa w_j^{(k)} \lambda_j} \frac{1}{2}\|\vec{x}-\bm{\Phi}^{\rm T}\vec{u}\|_2^2 \end{matrix}\right.\, \label{prox_anl} \end{equation}$ (21)where û can be estimated using a forward-backward algorithm (Bauschke et al. 2011) as follows:

1.
Set p = 0, initialize u₀ = 0.
2.
$\begin{matrix} _{˜} \\ u_{p + 1} \end{matrix} = u_{p} + μ_{prox} Φ^{(} x - Φ^{T} u_{p}^{)}$ $\hbox{$\tilde{\vec{u}}_{p+1} = \vec{u}_p +\mu_{\prox}\bm{\Phi}\left(\vec{x}-\bm{\Phi}^{\rm T}\vec{u}_{p}\right)$}$ .
3.
$u_{p + 1} = \begin{matrix} _{˜} \\ u_{p + 1} \end{matrix} - {SoftThresh}_{\frac{μ}{ω_{1}} κ w^{(k)} ⊙ λ} \begin{matrix} _{˜} \\ u_{p + 1} \end{matrix}$ $\hbox{$\vec{u}_{p+1} = \tilde{\vec{u}}_{p+1} - \ST_{\frac{\mu}{\omega_1}\kappa\vec{w}^{(k)}\odot\bm{\lambda}}\tilde{\vec{u}}_{p+1}$}$ .
4.
Terminate on convergence or when reaching the maximum number of iterations, otherwise go to step 2.

The thresholding operator SoftThresh is defined in Appendix B. The operator ${prox}_{\frac{μ}{ω_{2}} 1_{𝒮_{x^{(0)}}}}$ $\hbox{$\prox_{\frac{\mu}{\omega_2}\mathbb{1}_{\mathcal{S}_{\vec{x}^{(0)}}}}$}$ is simply the orthogonal projector onto the set $\hbox{$\mathcal{S}_{\vec{x}^{(0)}}$}$ defined in the previous section; it is given explicitly Appendix B.

A full description of SPRITE is provided in Algorithm 2.

3.4. Parameter estimation

The data fidelity term in the problem 20 is defined as ${\begin{matrix} J_{1} (x) = \frac{1}{2} \sum_{k = 1}^{n} \sum_{i = 1}^{p} \sum_{j = 1}^{p} \frac{(y_{k,ij} - f_{k} ŷ k,ij)^{2}}{σ_{k}^{2}} \\ ŷ k,ij = \sum_{l} \sum_{m} h [l - d (i - i_{k}),m - d (j - j_{k})] x_{ij} \end{matrix},$ $\begin{equation} \left\lbrace \begin{matrix}J_1(\vec{x}) = \frac{1}{2}\sum_{k=1}^n\sum_{i=1}^p\sum_{j=1}^p\frac{(y_{k,ij}-f_k \hat{y}_{k,ij})^2}{\sigma_k^2} \\ \hat{y}_{k,ij} = \sum_l\sum_m h\left[ l-d(i-i_k),m-d(j-j_k)\right]x_{ij} \end{matrix}\right., \label{sprite_cost} \end{equation}$ (22)where $\hbox{$\vec{x} = (x_{ij})_{1 \leq i,j \leq dp}$}$ is the desired image and h(.,.) is a 2D Lanczos kernel. As stated in Sect. 3.2, it can be written as $J_{1} (x) = \frac{1}{2} ∥ z - M x ∥_{2}^{2},$ $\begin{equation} J_1(\vec{x}) = \frac{1}{2}\|\vec{z}-\mathbf{M}\vec{x}\|_2^2, \label{psfex_cost_mat_3bis} \end{equation}$ (23)if we write x in lexicographic order as a vector. The following parameters are required:

for the data fidelity term parameters the noise standard deviationin the LR images σ_k, the photometric flux f_k, and the shift parameters (i_k,j_k);
a first guess x⁽⁰⁾ (see problem 20);
the sparsity constraint parameters κ, λ = (λ_i)_i and the dictionary Φ;
algorithmic parameter such as the gradient step size μ, the relaxation parameter λ in Algorithm 1, and the gradient step size μ_prox in Eq. (21) resolution.

3.4.1. Data fidelity parameters

Noise standard deviations

At the first step of the algorithm, the noise standard deviations in the low resolution images can be robustly estimated using the median absolute deviation (MAD) estimator (Starck et al. 2010).

Subpixel shifts

The subpixel shifts between the images are estimated based on the low resolution images centroids positions. Those are calculated on the low resolution images after a hard thresholding operation. For the image x_i, the threshold is chosen as $k = \begin{matrix} \min \end{matrix} (4 σ_{i}, (\frac{\max (| x_{i} |)}{σ_{i}} - 1) σ_{i}) .$ $\begin{equation} k = \min\left(4\sigma_i,\left(\frac{\max(|\vec{x}_i|)}{\sigma_i}-1\right)\sigma_i\right). \end{equation}$ (24)In this way, we only keep pixels with a high S/N for the centroid estimation. We then estimate the centroid positions using the iteratively weighted algorithm introduced in Baker & Moallem (2007). The thresholding operation undoubtedly biases the estimated centroid position, but the resulting estimated shifts are expected to be unbiased, up to the finite sampling and noise effects.

Photometric flux

The flux parameters are calculated by integrating the low resolution images on a fixed circular aperture centered on their centroids estimates. At Euclid resolution (see Sect. 4), we obtained quite accurate flux estimates in simulations using a radius of 3 pixels for the aperture. These parameters define the matrix M in Eq. (23).

All these parameters are automatically calculated without requiring any user input.

3.4.2. First guess computation

As one can see in the Algorithm 2, the final image is computed as $x̂ = Δ^{(K_{\max} - 1)} + x^{(0)} .$ $\begin{equation} \hat{\vec{x}} = \bm{\Delta}^{(K_{\max}-1)}+\vec{x}^{(0)}. \end{equation}$ (25)This implies that the noise and any artifact in x⁽⁰⁾ which does not have a sparse decomposition in Φ will be present in the final solution. Therefore, one has to be careful at this step. To do so, we compute a noisy first guess $x_{n}^{(0)}$ $\hbox{$\vec{x}_n^{(0)}$}$ using a shift-and-add, as presented in Sect. 2.3.1. Then we apply a wavelet denoising to $x_{n}^{(0)}$ $\hbox{$\vec{x}_n^{(0)}$}$ . In other terms, we transform $x_{n}^{(0)}$ $\hbox{$\vec{x}_n^{(0)}$}$ in a “sparsifying” wavelet dictionary W. We threshold each wavelet scale in such a way to keep only the coefficients above the noise level expected in the scale. Finally, we apply a reconstruction operator, which is a dictionary $\begin{matrix} 􏽢 \\ W \end{matrix}$ $\hbox{$\widehat{\mathbf{W}}$}$ verifying $\begin{matrix} _{􏽢} \\ W \end{matrix} W = I$ $\hbox{$\widehat{\mathbf{W}}\mathbf{W} = \Id$}$ to the thresholded coefficients (see Starck et al. 2010). We note β = (β_i)_i, a vector made of the denoising thresholds for each wavelet scale. We set β_i at 5σ_i, where σ_i is the noise standard deviation in the ith wavelet scale. The first guess is finally computed as $x^{(0)} = \begin{matrix} 􏽢 \\ W \end{matrix} {HardThresh}_{β} W x_{n}^{(0)},$ $\begin{equation} \label{mod_den} \vec{x}^{(0)} = \widehat{\mathbf{W}}\HT_{\bm{\beta}}\mathbf{W}\vec{x}_n^{(0)}, \end{equation}$ (26)which robustly removes the noise without breaking important features. One can refer to Appendix A for (σ_i)_i estimation.

3.4.3. The choice of dictionary and regularization parameter

Regularization parameter

The regularization parameter κ can be set, according to a desired level of significance. Indeed, it can be seen that the transform domain vector û is constrained into weighted l^∞-ball of radius μκ in Eq. (21) and can be interpreted as the non-significant part of the wanted signal current estimate. To set this radius according to the expected level of noise for each transform coefficient, we propagate the noise on the data vector z from Eq. (23) through μΦM^TM and estimate its standard deviation at each transform coefficient, which sets the parameters λ_j. In practice, this can be done in two ways. We can either run a Monte-Carlo simulation of the noise in z and take the empirical variance of the sets of realizations of each transform coefficient. On the other hand, if Φ is a wavelet dictionary and if the noise is expected to be stationary in each wavelet scale, then we only need to compute a single standard deviation per scale. This can be done by estimating the noise in each scale of the wavelet transform of the gradient at each iteration (up to the factor μ) using a MAD, for instance. Indeed, the residual z − M(d_n + x⁽⁰⁾) tend to be consistent with the noise in z, so that it can be used as a noise realization. With a stationary noise in each input image, the two approaches give very close estimates of the noise standard deviation and the second one is far less demanding in terms of complexity. As a result, coefficients below κλ_j are considered as part of the noise and one only needs to set the global parameter κ to tune the sparsity constraint according to the noise level.

Dictionary

The choice of the dictionary impacts the performance of the algorithm. We considered two transforms: a biorthogonal undecimated wavelet transform with a 7/9 filter bank and the second generation starlet transform (Starck et al. 2011). These two transforms are generic and not specifically tuned to a given PSF profile.

3.4.4. Algorithmic parameters

Gradient steps sizes

The gradient step size μ in Algorithm 1 needs to be chosen just in $] 0, 2 / ρ (M^{T} M)^{[}$ $\hbox{$\left]0,2/\rho(\mathbf{M}^{\rm T}\mathbf{M})\right[$}$ , where ρ(.) denotes the spectral radius of a square matrix. In the same way, μ_prox needs to be chosen in $] 0, 2 / ρ (Φ Φ^{T})^{[}$ $\hbox{$\left]0,2/\rho(\bm{\Phi}\bm{\Phi}^{\rm T})\right[$}$ .

Relaxation parameter

The parameter λ in Algorithm 1 needs to be chosen in $] 0, \min (\frac{3}{2}, \frac{1 + 2 / ρ (M^{T} M) μ}{2}) [$ $\hbox{$\left]0,\min\left(\frac{3}{2},\frac{1+2/{\rho(\mathbf{M}^{\rm T}\mathbf{M})\mu}}{2}\right)\right[$}$ (Raguet et al. 2011). This parameter tunes the updating speed of the auxiliary variables in Algorithm 1. In practice, we use μ = 1 /ρ(M^TM) and λ = 1.4.

3.4.5. User parameters

It is important to mention that the user only has to set the parameter κ and the dictionary with the other parameters being automatically estimated. In all our experiments, we took κ = 4, which is quite convenient, if we assume Gaussian noise in the data. The dictionary choice will be emphasized in the next section.

4. Numerical experiments

This section presents the data used, the numerical experiments realized as a mean to compare three SR techniques (shift-and-add, PSFEx method, and our method) and the results.

4.1. Data set

The PSFs provided are optical PSFs computed using a fast Fourier transform of the exit pupil. They are not a system PSF, so they do not include a jitter or detector response. A set of PSFs covering the whole field of view is provided. They are monochromatic PSFs at 800 nm and are derived from tolerance analysis. They account for manufacturing and alignments errors and thermal stability of the telescope. Manufacturing and alignment errors are partially compensated by a best focus optimization, while thermal stability effects are simulated by a small displacement of the optics that are not compensated on a short-time scale. The optical model used is dated from 2011 and is prior to the current reference model (provided by Astrium, which has been awarded the payload module contract in 2013). In particular, the 2011 model does not contain the latest definition of the pupil mask. The pupil, however, includes central obscuration and a three-vane spider. This is the model that has been used for the science feasibility studies that led to the acceptance of the Euclid mission.

Fig. 5

Critically sampled PSF.

Fig. 6

PSF sampled at Euclid resolution with different offsets and noise.

4.2. Simulation

In the Euclid mission, the actual sampling frequency is about 0.688 times the Nyquist frequency that we define as twice the telescope spatial cut-off frequency (Cropper et al. 2013). Therefore we target an upsampling factor of 2, which gives a sufficient bandpass to recover the high frequencies. The PSF is typically space-varying, and this is particularly true for wide field of view instruments as Euclid telescope (ESA/SRE 2011). Thus, the data set contains simulated PSF measurements on a regular 18 × 18 grid on the field of view. The original PSF models are downsampled to twice Euclid resolution. For each PSF, four randomly shifted “copies” are generated and downsampled to Euclid resolution (see Figs. 5 and 6 below). These LR images are of size 84 × 84. Different levels of WGN are added. We define the signal level as its empirical variance that is calculated in a 50 × 50 patch centered on the HR PSF main lobe. For each algorithm, we used these four images to reconstruct a PSF which has twice their resolutions in lines and columns.

4.3. Quality criterion

For an image X = (x_ij)_i,j, the weighted central moments are defined as $μ_{p,q} (X) = \sum_{i} \sum_{j} (i - i_{c})^{p} (j - j_{c})^{p} f_{ij} x_{ij}$ $\begin{equation} \mu_{p,q}(\mathbf{X}) = \sum_i\sum_j(i-i_{\rm c})^p(j-j_{\rm c})^p f_{ij}x_{ij} \end{equation}$ (27)with (p,q) ∈ℕ², (i_c,j_c) are the weighted image centroid coordinates, and F = (f_ij)_i,j is an appropriate weighting function (typically a Gaussian function). The ellipticity parameters are then defined as follows: The vector ϵ = [e₁,e₂] is an important tool, since if measured on a large set of galaxies, it can be statistically related to the dark matter induced geometrical distortions and finally its mass density. Furthermore, this ellipticity parameters are magnitude invariant and approximately shift invariant. The error on ellipticity is therefore an interesting criteria for quality assessment. Thus, we used the mean absolute error for each ellipticity parameter, $ℰ_{j} = \frac{1}{n} \sum_{i = 1}^{n} | e_{j} (X_{i}) - e_{j} ({X̂}_{i}) |,j = 1, 2$ $\begin{equation} \mathcal{E}_j = \frac{1}{n}\sum_{i=1}^n |e_j(\mathbf{X}_\mathbf{i})-e_j(\mathbf{\hat{X}}_\mathbf{i})|, \; j=1,2 \label{ell_err} \end{equation}$ (28)and the associated empirical standard deviations.

Moreover, the PSF size is also an important characteristic of the PSF kernel. For example, it has been shown in Paulin-Henriksson et al. (2008) that the PSF size largely contributes to the systematic error in weak gravitational lensing surveys. Therefore, we use it in quality assessment by computing the mean absolute error on the FWHM. The FWHM is estimated by fitting a modified Lorentzian function on the PSF images. We used routines from a publicly available library³.

4.4. Results and discussion

Figures 5 and 6 show a simulated PSF that is sampled at almost the Nyquist rate and the two LR shifted and noisy PSF derive, with S/N of around 30dB.

Figure 7 shows an example of super-resolved PSF at 30dB of S/N, from four LR images and the corresponding error maps that are defined as the absolute value of the difference between the original HR noise free PSF and the PSF reconstructions for each algorithm. This error map standard deviation is at least 30% lower with SPRITE.

Fig. 7

PSF reconstruction and error map at 30dB for three methods: from the left to the right, shift-and-add, PSFEx, and SPRITE. The error image standard deviation is at least 30% smaller with SPRITE.

Fig. 8

Errors in log on ellipticity parameters versus the S/N. For S/N = 10 dB, SPRITE achieves around 6 dB less than others methods, which corresponds to a factor of e⁶ on a linear scale.

Figure 8 shows smaller errors and errors dispersions are achieved with SPRITE algorithm, especially at low S/N. One can note that the dispersion is slightly smaller with biorthogonal undecimated wavelet. The error on the FWHM given in percent in Fig. 9 is smaller with SPRITE. In practice, there is more variability in the PSF (wavelength and spatial dependency, time variations...) so that the real problem will be more underdetermined. Thanks to the multiple exposures on the one hand, and that the spatial variations of the PSF are expected to be slow on the other hand, the real problem could actually be very well constrained. Moreover, these results suggest that even better results could be achieved by using more adapted dictionaries, built either from PSF model or through a dictionary learning algorithm (Beckouche et al. 2013).

5. Complexity and performances

The simulations were run on a typical desktop computer. Let us suppose that we have n LR images of sizes p₁ × p₂ and that we choose an upsampling factor d in lines and columns. As stated before, we took p₁ = p₂ = 84, n = 4, and d = 2 in our numerical experiments. Under this setting, it takes roughly 60 s and 1 GB of physical memory to compute a super-resolved PSF. More generally, the computational complexity of the algorithm is in O(np₁p₂d²log (p₁p₂d²)), which is related to the implementation of the matrices M from Eq. (16) and M^T using FFT.

Fig. 9

Mean absolute error on the full width at half maximum (FWHM) in percent. SPRITE achieves on average an error of 6% on the FWHM which is 2% less than PSFEX in average.

6. Software

Following the philosophy of reproducible research (Buckheit & Donoho 1995), the algorithm introduced in this paper and the data used are available online⁴. We used the following calls for the SPRITE executable:

run_sprite -t 2 -s 4 -r 2 -F -N data_file output_fileoutput_directory for the second genration Starlet transform;
run_sprite -t 24 -s 4 -r 2 -F -N data_file output_file output_directory for the undecimated biorthogonal wavelet transform.

The options “-s” and “-r” set the parameter κ (see Sect. 3.4.3) and the upsampling factor for both lines and columns respectively. The options “-F” and “-N” indicate that the photometric flux and the noise might have different levels in the LR images and need to be estimated.

7. Conclusion

We introduced SPRITE, which is a super-resolution algorithm based on sparse regularization. We show that adding a sparse penalty in the recovery leads to far better accuracy in terms of ellipticity error, especially at low S/N.

Quantitatively, we achieved

a 30% lower error on the reconstruction itself at 30dBof S/N;
around 6dB less than other methods on the shape parameters, which corresponds to a factor of e⁶ on a linear scale, at 10 dB of S/N;
6%of error on the FWHM in average, which 2% less than PSFEX.

However this algorithm does not handle the PSF spatial variations. Thus one natural extension of this work would be to simultaneously perform SR and dimensionality reduction assuming only one LR version of each PSF, as it is the case in practice strictly speaking, but by using a large PSF set. The PSF wavelength dependency would also be an interesting aspect to investigate.

¹

http://terapix.iap.fr/cplt/T0007/doc/T0007-doc.html

²

These codes and the associated documentation may be found on the website http://www.astromatic.net/.

³

http://www.astro.washington.edu/docs/idl/htmlhelp/slibrary21.html

⁴

http://www.cosmostat.org/sprite.html

References

Baker, K. L., & Moallem, M. M. 2007, Optics Express, 15, 5147 [NASA ADS] [CrossRef] [Google Scholar]
Bauschke, H. H., Burachik, R. S., Combettes, P. L., et al. 2011, Fixed-Point Algorithms for Inverse Problems in Science and Engineering (Springer), 185 [Google Scholar]
Beckouche, S.,Starck, J. L., &Fadili, J. 2013, A&A, 556, A132 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Blumensath, T., Yaghoovi, M., & Davies, M. E. 2007, in IEEE International Conference on Acoustics, Speech and Signal Processing, Honolulu, USA [Google Scholar]
Buckheit, J., & Donoho, D. 1995, in Wavelets and Statistics, eds. A. Antoniadis, & G. Oppenheim (New York: Springer), Lect. Notes Stat., 103, 55 [Google Scholar]
Candès, E.,Wakin, M., &Boyd, S. 2008, J. Fourier Analysis and Applications, 14, 877 [CrossRef] [Google Scholar]
Chen, S.,Donoho, D., &Saunders, M. 2001, SIAM Rev., 43, 129 [Google Scholar]
Combettes, P. L., &Wadjs, V. R. 2005, Simul., 4, 1168 [Google Scholar]
Cropper, M.,Hoekstra, H.,Kitching, T., et al. 2013, MNRAS, 431, 3103 [NASA ADS] [CrossRef] [Google Scholar]
Elad, M., & hel Or, Y. 2001, IEEE Transactions on Image Processing, 10, 1187 [Google Scholar]
ESA/SRE. 2011, EUCLID Mapping the geometry of the dark universe, Tech. Rep., ESA [Google Scholar]
Gray, R. M. 2006, Foundations and Trends^® in Communications and Information Theory, 2, 155 [Google Scholar]
Leonard, A.,Lanusse, F., &Starck, J.-L. 2014, MNRAS, 440, 1281 [NASA ADS] [CrossRef] [Google Scholar]
Mohr, J. J., Armstrong, R., Bertin, E., et al. 2012, in SPIE Conf. Ser., 8451 [Google Scholar]
Park, S. C.,Park, M. K., &Kang, M. G. 2003, IEEE Signal Processing Magazine, 3, 1053 [Google Scholar]
Paulin-Henriksson, S.,Amara, A.,Voigt, L.,Refregier, A., &Bridle, S. L. 2008, A&A, 484, 67 [Google Scholar]
Raguet, H.,Fadili, J., &Peyré, G. 2011, SIAM J. Imag. Sci., 6, 1199 [Google Scholar]
Rodgers, J. L., &Nicewander, A. W. 1988, The American Statistician, 42, 59 [Google Scholar]
Rowe, B., Hirata, C., & Rhodes, J. 2011 [arXiv:1105.2852v2] [Google Scholar]
Shannon, C. 1949, Proc. IRE, 37, 10 [NASA ADS] [CrossRef] [Google Scholar]
Starck, J.-L., Murtagh, F., & Fadili, J. 2010, Sparse Image and Signal Processing: Wavelets, Curvelets, Morphological Diversity (New York, NY, USA: Cambridge University Press) [Google Scholar]
Starck, J.-L., Murtagh, F., & Bertero, M. 2011, in Handbook of Mathematical Methods in Imaging, ed. O. Scherzer (New York: Springer), 1489 [Google Scholar]
Thompson, B. J. 1969, Science, 164, 170 [CrossRef] [Google Scholar]
Yamagishi, M.,Ono, S., &Yamada, I. 2012, CAS, 111, 465, 49 [Google Scholar]

Appendix A: First guess noise

We keep the same notations as in Sect. 3.4.2. We note x_wi as the ith scale of $x_{n}^{(0)}$ $\hbox{$\vec{x}_n^{(0)}$}$ wavelet transform in W. As the noise in $x_{n}^{(0)}$ $\hbox{$\vec{x}_n^{(0)}$}$ is correlated, the estimation of σ_i is not straightforward, so we proceed as follows:

1.
$σ_{i}^{0} = 1.4826 MAD (x_{wi})$ $\hbox{$\sigma_i^{0} = 1.4826\mad(\vec{x}_{wi})$}$
2.
$x̂_{wi} = \underset{x}{argmin} \frac{1}{2} ∥ x - x_{wi} ∥_{2}^{2} + k σ_{i}^{0} ∥ x ∥_{1}$ $\hbox{$\hat{\vec{x}}_{wi} = \underset{\vec{x}}\argmin \frac{1}{2}\|\vec{x}-\mathbf{x}_{wi}\|_2^2 + k\sigma_i^{0}\|\vec{x}\|_1$}$
3.
$σ_{i} = 1.4826 MAD (\begin{matrix} ^{􏽢} \\ x_{wi} \end{matrix} - x_{wi})$ $\hbox{$\sigma_i = 1.4826\mad(\widehat{\vec{x}}_{wi}-\vec{x}_{wi})$}$ .

The factor 1.4826 comes from the assumption that the noise is approximately Gaussian. Finally, the factor k must be sufficiently high so that all the noise will remain in the residual $\begin{matrix} ^{􏽢} \\ x_{wi} \end{matrix} - x_{wi}$ $\hbox{$\widehat{\vec{x}}_{wi}-\vec{x}_{wi}$}$ . We took k = 5. The minimization in step 2 usually takes up to five iterations to converge. For the finer scales, $σ_{i}^{0}$ $\hbox{$\sigma_i^{0}$}$ is quite close to σ_i. But for coarser scales, $σ_{i}^{0}$ $\hbox{$\sigma_i^{0}$}$ is significantly overestimated (it might be more than 10 times greater than σ_i). We implicitly assumed that the noise in the wavelet scales is stationary, which is reasonable apart from the edges effects due to the wavelet transform.

Appendix B: Proximal calculus

Let ℋ be a finite-dimensional Hilbert space (typically a real vector space) equipped with the inner product ⟨ .,. ⟩ and associated with the norm ∥ . ∥. A real-valued function ℱ defined on ℋ is

proper if its domain, as defined by domℱ = { x ∈ ℋ / ℱ(x) < + ∞ }, is non-empty;
lower semicontinuous (LSC) if liminf_{x → x₀}ℱ(x) ≥ ℱ(x₀).

We define Γ₀(ℋ) as the class of all proper LSC convex real-valued function defined on ℋ.

Fig. C.1

Errors in log scale on ellipticity parameters versus the S/N. The positivity constraint significantly improves the accuracy.

Moreau (1962) introduced the notion of proximity operator as a generalization of a convex projection operator. Let ℱ ∈ Γ₀(ℋ). Then the function $y \to \frac{1}{2} ∥ α - y ∥^{2} + ℱ (y)$ $\hbox{$\vec{y} \rightarrow \frac{1}{2}\|\bm{\alpha}-\vec{y}\|^2+\mathcal{F}(\vec{y})$}$ achieves its minimum at a unique point denoted by prox_ℱ(α), (∀α ∈ ℋ). The operator prox_ℱ is the proximity operator of ℱ. The indicator function of a closed convex subset $\hbox{$\mathcal{C}$}$ of ℋ is the function defined on ℋ by $1_{𝒞} (x) = {\begin{matrix} 0, if x \in 𝒞 \\ + \infty, otherwise . \end{matrix}$ $\appendix \setcounter{section}{2} \begin{equation} \mathbb{1}_{\mathcal{C}}(\vec{x}) = \left\lbrace \begin{matrix} 0, \; \text{if} \; \vec{x} \in \mathcal{C} \\ +\infty, \; \text{otherwise}. \end{matrix}\right.\ \end{equation}$ (B.1)It is clear from the definitions that the proximity operator of _𝒞 is the orthogonal projector onto $\hbox{$\mathcal{C}$}$ . Thus, for $\hbox{$\mathcal{C} = \mathcal{S}_{\vec{x}^{(0)}}$}$ , which is defined in Sect. 3.2, and for x ∈ℝ^{d² p²} , we have, ${prox}_{1_{𝒮_{x^{(0)}}}} (x) = (\begin{matrix} \max \end{matrix} (x_{i}, - x_{i}^{(0)}))_{1 ⩽ i ⩽ d^{2} p^{2}} .$ $\appendix \setcounter{section}{2} \begin{equation} \prox_{\mathbb{1}_{\mathcal{S}_{\vec{x}^{(0)}}}}(\vec{x}) = (\max(x_i,-x^{(0)}_i))_{1\leq i \leq d^2p^2}. \end{equation}$ (B.2)Now we suppose that ℋ =ℝ^p, and we want to solve $\underset{α \in R^{p}}{\begin{matrix} \min \end{matrix}} ℱ_{1} (α) + ℱ_{2} (α)$ $\appendix \setcounter{section}{2} \begin{equation} \underset{\bm{\alpha} \in \mathbb{R}^p }\min \mathcal{F}_1(\bm{\alpha}) + \mathcal{F}_2(\bm{\alpha}) \label{GenConv} \end{equation}$ (B.3)where ℱ₁,ℱ₂ ∈ Γ₀(ℝ^p). Many problems in signal and image processing may be formulated this way, where ℱ₁ would be the data attachment function and ℱ₂ would constrain this solution based on prior knowledges. It has been shown (Combettes & Wadjs 2005) that if ℱ₁ is differentiable with a β-Lipschitz continuous gradient, then the problem (B.3) admits at least one solution and that its solutions, for γ> 0, verify the fixed point equation, $x = {prox}_{γ ℱ_{2}} (x - γ \nabla ℱ_{1} (x)) .$ $\appendix \setcounter{section}{2} \begin{equation} \vec{x} = \prox_{\gamma\mathcal{F}_2}(\vec{x} - \gamma\nabla\mathcal{F}_1(\vec{x})). \end{equation}$ (B.4)This suggests the following iterative scheme, $x_{n + 1} = {prox}_{γ_{n} ℱ_{2}} (x_{n} - γ_{n} \nabla ℱ_{1} (x_{n})),$ $\appendix \setcounter{section}{2} \begin{equation} \vec{x}_{n+1} = \prox_{\gamma_n\mathcal{F}_2}(\vec{x}_n - \gamma_n\nabla\mathcal{F}_1(\vec{x}_n)), \end{equation}$ (B.5)for appropriate values of the parameter γ_n. This type of scheme is known as forward-backward (FB) algorithm: a forward gradient step using ℱ₁ and a backward step involving ℱ₂ through its proximity operator. Some examples of FB algorithms that have been shown to converge to a solution of (B.3) can be found in Bauschke et al. (2011). A popular example of proximity operator is the one associated with ℱ = λ ∥ . ∥ ₁ with λ ∈ℝ: $\begin{matrix} {prox}_{λ ∥ . ∥_{1}} (α) & = & {SoftThresh}_{λ} (α) \\ = & {({(1 - \frac{λ}{| α_{i} |})}_{+} α_{i})}_{1 ⩽ i ⩽ p}, \end{matrix}$ $\appendix \setcounter{section}{2} \begin{eqnarray} \prox_{\lambda\|.\|_1}(\bm{\alpha}) &=& \ST_\lambda(\bm{\alpha}) \nonumber \\ &=& \left(\left(1 - \frac{\lambda}{|\alpha_i|}\right)_+\alpha_i\right)_{1 \leq i \leq p}, \end{eqnarray}$ (B.6)where p is the dimension of ℋ, α_i the components of alpha in the basis associated with ∥ . ∥ _l1 and (.)₊ = max(.,0). One may refer to Bauschke et al. (2011) for proximity operator properties and examples. For $\hbox{$\bm{\lambda} = (\lambda_i)_{1 \leq i \leq p}$}$ , the proximity operator associated with the weighted l₁ norm ∥ diag(λ)(.) ∥ ₁ is given by $\begin{matrix} {prox}_{∥ diag (λ) (.) ∥_{1}} (α) & = & {SoftThresh}_{λ} (α) \\ = & {({(1 - \frac{λ_{i}}{| α_{i} |})}_{+} α_{i})}_{1 ⩽ i ⩽ p} . \end{matrix}$ $\appendix \setcounter{section}{2} \begin{eqnarray} \prox_{\|\diag(\bm{\lambda})(.)\|_{1}}(\bm{\alpha}) &=& \ST_{\bm{\lambda}}(\bm{\alpha}) \nonumber \\ &=& \left(\left(1 - \frac{\lambda_i}{|\alpha_i|}\right)_+\alpha_i\right)_{1 \leq i \leq p}. \end{eqnarray}$ (B.7)The hard thresholding operator defined as $\begin{matrix} {HardThresh}_{λ} (α) & = & (\begin{matrix} ˜ \\ α_{i} \end{matrix})_{1 ⩽ i ⩽ p}, \\ with \begin{matrix} ˜ \\ α_{i} \end{matrix} & = & {\begin{matrix} \end{matrix} \end{matrix}$ $\appendix \setcounter{section}{2} \begin{eqnarray} \HT_{\bm{\lambda}}(\bm{\alpha}) &=& (\tilde{\alpha}_i)_{1 \leq i \leq p}, \nonumber \\ \text{with} \; \tilde{\alpha}_i &=& \left\{ \begin{array}{ll} \alpha_i & \mbox{if} \; |\alpha_i|\geq \lambda_i \\ 0 & \mbox{else.} \end{array} \right. \end{eqnarray}$ (B.8)is often used instead in practice. See Blumensath et al. (2007) for more insight on the hard thresholding behavior in terms of convergence.

Appendix C: Positivity constraint

Fig. C.2

Map of negative values in the PSF reconstruction map (in absolute value) at 15 dB. On the left, SPRITE has a positivity constraint; on the right, SPRITE without positivity constraint.

We can drop the positivity constraint by simply solving $\underset{Δ}{\begin{matrix} \min \end{matrix}} J_{1} (Δ + x^{(0)}) + κ ∥ w^{(k)} ⊙ λ ⊙ ΦΔ ∥_{1}$ $\appendix \setcounter{section}{3} \begin{equation} \underset{\bm{\Delta}}\min J_1(\bm{\Delta}+\vec{x}^{(0)}) +\kappa\|\vec{w}^{(k)}\odot\bm{\lambda}\odot\bm{\Phi}\bm{\Delta}\|_1 \label{no_pos_eq} \end{equation}$ (C.1)at step 6 in Algorithm 2. We did a similar numerical experiment as the one presented in Sect. 4 to quantify the impact of this constraint. The comparison is given in Fig. C.1. One can see that the positivity constraint is actually important from the point view of the ellipticity parameters at low S/N. This result is illustrated in Fig. C.2. With the positivity constraint, the reconstruction is less influenced by the negative oscillations in the data, due to noise; thus it yields a better robustness. Even if the negative residual values in the final PSF are 1000 order of magnitude smaller than the peak value, it is sufficient to considerably bias the ellipticity measurements.

Appendix D: Reweighting

As stated in Sect. 3.2, the reweighting scheme used in SPRITE is meant to mitigate the bias due to the l₁ norm penalty. To verify this, we basically did the same as for the positivity in the previous section. Thus, we ran Algorithm 2 with K_max = 1 and K_max = 2 and we compute in each case the mean correlation coefficient in Pearson sense (Rodgers & Nicewander 1988)

Fig. D.1

Mean correlation coefficients (× 100) between the SPRITE PSF reconstructions and the reference images versus the S/N; the reweighting reduces the global bias.

between the reference images and the reconstructions for different S/N. The result is given in Fig. D.1. As expected, the reweighting improves the correlation and consequently, reduces the global bias on the reconstruction.

All Figures

	Fig. 1 General observation model. See Sect. 2.2 for a detailed description.
In the text

	Fig. 2 Adapted observation model. This time we consider an isolated star as an input, and unlike in the general model, the output of the block “Blur” is the PSF.
In the text

	Fig. 3 SR general scheme. The deblurring does not apply to our case, since we want to precisely estimate the blur.
In the text

	Fig. 4 Simulated optical point-spread function.
In the text

	Fig. 5 Critically sampled PSF.
In the text

	Fig. 6 PSF sampled at Euclid resolution with different offsets and noise.
In the text

	Fig. 7 PSF reconstruction and error map at 30dB for three methods: from the left to the right, shift-and-add, PSFEx, and SPRITE. The error image standard deviation is at least 30% smaller with SPRITE.
In the text

	Fig. 8 Errors in log on ellipticity parameters versus the S/N. For S/N = 10 dB, SPRITE achieves around 6 dB less than others methods, which corresponds to a factor of e⁶ on a linear scale.
In the text

	Fig. 9 Mean absolute error on the full width at half maximum (FWHM) in percent. SPRITE achieves on average an error of 6% on the FWHM which is 2% less than PSFEX in average.
In the text

	Fig. C.1 Errors in log scale on ellipticity parameters versus the S/N. The positivity constraint significantly improves the accuracy.
In the text

	Fig. C.2 Map of negative values in the PSF reconstruction map (in absolute value) at 15 dB. On the left, SPRITE has a positivity constraint; on the right, SPRITE without positivity constraint.
In the text

	Fig. D.1 Mean correlation coefficients (× 100) between the SPRITE PSF reconstructions and the reference images versus the S/N; the reweighting reduces the global bias.
In the text

Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.

Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.

Initial download of the metrics may take a while.

[1] Baker, K. L., & Moallem, M. M. 2007, Optics Express, 15, 5147 [NASA ADS] [CrossRef] [Google Scholar]

[2] Bauschke, H. H., Burachik, R. S., Combettes, P. L., et al. 2011, Fixed-Point Algorithms for Inverse Problems in Science and Engineering (Springer), 185 [Google Scholar]

[3] Beckouche, S.,Starck, J. L., &Fadili, J. 2013, A&A, 556, A132 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[4] Blumensath, T., Yaghoovi, M., & Davies, M. E. 2007, in IEEE International Conference on Acoustics, Speech and Signal Processing, Honolulu, USA [Google Scholar]

[5] Buckheit, J., & Donoho, D. 1995, in Wavelets and Statistics, eds. A. Antoniadis, & G. Oppenheim (New York: Springer), Lect. Notes Stat., 103, 55 [Google Scholar]

[6] Candès, E.,Wakin, M., &Boyd, S. 2008, J. Fourier Analysis and Applications, 14, 877 [CrossRef] [Google Scholar]

[7] Chen, S.,Donoho, D., &Saunders, M. 2001, SIAM Rev., 43, 129 [Google Scholar]

[8] Combettes, P. L., &Wadjs, V. R. 2005, Simul., 4, 1168 [Google Scholar]

[9] Cropper, M.,Hoekstra, H.,Kitching, T., et al. 2013, MNRAS, 431, 3103 [NASA ADS] [CrossRef] [Google Scholar]

[10] Elad, M., & hel Or, Y. 2001, IEEE Transactions on Image Processing, 10, 1187 [Google Scholar]

[11] ESA/SRE. 2011, EUCLID Mapping the geometry of the dark universe, Tech. Rep., ESA [Google Scholar]

[12] Gray, R. M. 2006, Foundations and Trends^® in Communications and Information Theory, 2, 155 [Google Scholar]

[13] Leonard, A.,Lanusse, F., &Starck, J.-L. 2014, MNRAS, 440, 1281 [NASA ADS] [CrossRef] [Google Scholar]

[14] Mohr, J. J., Armstrong, R., Bertin, E., et al. 2012, in SPIE Conf. Ser., 8451 [Google Scholar]

[15] Park, S. C.,Park, M. K., &Kang, M. G. 2003, IEEE Signal Processing Magazine, 3, 1053 [Google Scholar]

[16] Paulin-Henriksson, S.,Amara, A.,Voigt, L.,Refregier, A., &Bridle, S. L. 2008, A&A, 484, 67 [Google Scholar]

[17] Raguet, H.,Fadili, J., &Peyré, G. 2011, SIAM J. Imag. Sci., 6, 1199 [Google Scholar]

[18] Rodgers, J. L., &Nicewander, A. W. 1988, The American Statistician, 42, 59 [Google Scholar]

[19] Rowe, B., Hirata, C., & Rhodes, J. 2011 [arXiv:1105.2852v2] [Google Scholar]

[20] Shannon, C. 1949, Proc. IRE, 37, 10 [NASA ADS] [CrossRef] [Google Scholar]

[21] Starck, J.-L., Murtagh, F., & Fadili, J. 2010, Sparse Image and Signal Processing: Wavelets, Curvelets, Morphological Diversity (New York, NY, USA: Cambridge University Press) [Google Scholar]

[22] Starck, J.-L., Murtagh, F., & Bertero, M. 2011, in Handbook of Mathematical Methods in Imaging, ed. O. Scherzer (New York: Springer), 1489 [Google Scholar]

[23] Thompson, B. J. 1969, Science, 164, 170 [CrossRef] [Google Scholar]

[24] Yamagishi, M.,Ono, S., &Yamada, I. 2012, CAS, 111, 465, 49 [Google Scholar]