Deep Learning for space-variant deconvolution in galaxy surveys

Deconvolution of large survey images with millions of galaxies requires to develop a new generation of methods which can take into account a space variant Point Spread Function and have to be at the same time accurate and fast. We investigate in this paper how Deep Learning could be used to perform this task. We employ a U-NET Deep Neural Network architecture to learn in a supervised setting parameters adapted for galaxy image processing and study two strategies for deconvolution. The first approach is a post-processing of a mere Tikhonov deconvolution with closed form solution and the second one is an iterative deconvolution framework based on the Alternating Direction Method of Multipliers (ADMM). Our numerical results based on GREAT3 simulations with realistic galaxy images and PSFs show that our two approaches outperforms standard techniques based on convex optimization, whether assessed in galaxy image reconstruction or shape recovery. The approach based on Tikhonov deconvolution leads to the most accurate results except for ellipticity errors at high signal to noise ratio where the ADMM approach performs slightly better, is also more computation-time efficient to process a large number of galaxies, and is therefore recommended in this scenario.


Introduction
Deconvolution of large galaxy survey images requires to take into account spatial-variation of the Point Spread Function (PSF) across the field of view. The PSF field is usually estimated beforehand, via parametric models and simulations as in Krist et al. (2011) or directly estimated from the (noisy) observations of stars in the field of view (Bertin 2011;Kuijken et al. 2015;Zuntz et al. 2018;Mboula et al. 2016;Schmitz et al. 2019). Even with the "perfect" knowledge of the PSF, this ill-posed deconvolution problem is challenging, in particular due to the size of the image to process. Starck et al. (2000) proposed an Object-Oriented Deconvolution, consisting in first detecting galaxies and then deconvolving each object independently. Following this idea, Farrens et al. (2017) introduced a space-variant deconvolution approach for galaxy images, based on two regularization strategies: using either a sparse prior in a transformed domain (Starck et al. 2015a) or trying to learn unsupervisedly a low-dimensional subspace for galaxy representation using a low-rank prior on the recovered galaxy images. Provided a sufficient number of galaxies are jointly processed (more than 1000) they found that the low-rank approach provided significantly lower ellipticity errors than sparsity, which illustrates the importance of learning adequate representations for galaxies. To go one step further in learning, supervised deep learning techniques taking profit of databases of galaxy images could be employed to learn complex mappings that could regularize our deconvolution problem. Deep convolutional architectures have also proved to be computationally efficient to process large number of images once the model has been learned, and are therefore promising in the context of modern galaxy surveys.
Deep Learning and Deconvolution: In the recent years, deep learning approaches have been proposed in a large number of inverse problems with high empirical success. Some potential explanations could lie on the expressivity of the deep architectures (e.g. the theoretical works for simple architecture in (Eldan & Shamir 2015;Safran & Shamir 2017;Petersen & Voigtlaender 2018)) as well as new architectures or new optimization strategies that increased the learning performance (for instance Kingma & Ba (2014); Ioffe & Szegedy (2015); He et al. (2016); Szegedy et al. (2016)). Their success also depend on the huge datasets collected in the different applications for training the networks, as well as the increased computing power available to process them. With the progress made on simulating realistic galaxies (based for instance on real Hubble Space Telescope (HST) images as inRowe et al. (2015); Mandelbaum et al. (2015)), deep learning techniques have therefore the potential to show the same success for deconvolution of galaxy images as in the other applications. Preliminary work have indeed shown that deep neural networks (DNN) can perform well for classical deconvolution of galaxy images (Flamary 2017;Schawinski et al. 2017).
This paper: we investigates two different strategies to interface deep learning techniques with space variant deconvolution approaches inspired from convex optimization. In section 2, we review deconvolution techniques based on convex optimization and deep learning schemes. The space variant deconvolution is presented in section 3 where the two proposed methods are described, the first one using a deep neural network (DNN) for post-processing of a Tikhonov deconvolution and the second one including a DNN trained for denoising in an iterative algorithm derived from convex optimization. The neural network architecture proposed for deconvolution is also presented in this section. The experiment settings are described in section 4 and the results presented in section 5. We conclude in section 6.

Deconvolution before Deep Learning
The standard deconvolution problem consists in solving the linear inverse problem Y = HX + N, where Y is the observed noisy data, X the unknown solution, H the matrix related to the PSF and N is the noise. Images Y, X and N are represented by a column vector of n p pixels arranged in lexicographic order, with n p being the total number of pixels, and H is a n p × n p matrix. State of the art deconvolution techniques typically solve this ill-posed inverse problem (i.e. with no unique and stable solution) through a modeling of the forward problem motivated from physics, and adding regularization penalty term R (X) which can be interpreted as enforcing some constraints on the solution. It leads to minimise: where || · || F is the Frobenius norm. The most simple (and historic) regularization corresponding is the Tikhonov regularization (Tikhonov & Arsenin 1977;Hunt 1972;Twomey 1963), where R (X) is a quadratic term, R (X) = λ||LX|| 2 F . The closed-form solution of this inverse problem is given by: which involves the Tikhonov linear filter H T H + λL T L −1 H T . The simplest version is when L = Id, which penalizes solutions with high energy. When the PSF is space invariant, the matrix H is block circulant and the inverse problem can then be written as a simple convolution product. It is also easy to see that the Wiener deconvolution corresponds to a specific case of the Tikhonov filter. See Bertero & Boccacci (1998) for more details. This rather crude deconvolution is illustrated in Fig. 1 in a low signal to noise ratio (SNR) scenario, displaying both oversmoothing of the galaxy image, loss of energy in the recovered galaxy and the presence of coloured noise due to the inverse filter.
Most advanced methods are non linear and generally involves iterative algorithms. There is a vast litterature in image processing on advanced regularization techniques applied to deconvolution: adding some prior information on X in a Bayesian paradigm (Bioucas-Dias 2006;Krishnan & Fergus 2009;Orieux et al. 2010) or assuming X to belong to some classes of images to recover (e.g. using total variation regularization (Oliveira et al. 2009;Cai et al. 2010), sparsity in fixed representations (Starck et al. 2003;Pesquet et al. 2009;Pustelnik et al. 2016) or learnt via dictionary learning (Mairal et al. 2008;Lou et al. 2011;Jia & Evans 2011)), by constraining the solution to belong to some convex subsets (such as ensuring the final galaxy image to be positive).
For instance, a very efficient approach used for galaxy image deconvolution is based on sparse recovery which consists in minimizing: where Φ is a matrix related to a fixed transform (i.e. Fouriers, wavelet, curvelets, etc) or that can be learned from the data or a training data set (Starck et al. 2015b). The 1 norm in the regularisation term is known to reinforce the sparsity of the solution, see Starck Article number, page 2 of 18 et al. (2015b) for a review on sparsity. Sparsity was found extremely efficient for different inverse problems in astrophysics such as Cosmic Microwave Background (CMB) estimation , compact sources estimation in CMB missions , weak lensing map recovery (Lanusse, F. et al. 2016) or radio-interferomety image reconstruction (Garsden et al. 2015). We will compare in this work our deconvolution techniques with such sparse deconvolution approach. Iterative convex optimization techniques have been devised to solve Eq.3 (see for instance Beck & Teboulle (2009) ;Zibulevsky & Elad (2010); Combettes & Pesquet (2011);Chambolle & Pock (2011);Afonso et al. (2011);Condat (2013); Combettes & Vu (2014)), with well-studied convergence properties, but with a high computing cost when using adaptive representation for galaxies. This problem opens the way to a new generation of methods.

Toward Deep Learning
Recently deep learning techniques have been proposed to solve inverse problems by taking benefit of the dataset collected and/or the advances in simulations, including for deconvolving galaxy images. These approaches have proved to be able to learn complex mappings in the supervised setting, and to be computationally efficient once the model has been learned. We review here, without being exhaustive, some recent work on deconvolution using DNNs. We have identified three different strategies for using DNN in a deconvolution problem: -Learning the inverse: the inverse convolution filter can be directly approximated using convolutional neural networks (Xu et al. 2014;Schuler et al. 2016). In our application with space-variant deconvolution and known kernels, such complicated blind deconvolution is clearly not necessary and would require a large amount of data to try learning information already provided by the physics included in the forward model. -Post-processing of a regularized deconvolution: In the early years of using sparsity for deconvolution a two steps approach was proposed, consisting in first applying a simple linear deconvolution such as using the pseudo-inverse or the Tikhonov filter, letting noise entering in the solution, and then in the second step applying a sparse denoising (see the wavelet-vaguelette decomposition (Donoho 1995;Kalifa et al. 2003), more general regularization (Guerrero-colon & Portilla 2006), or the ForWaRD method (Neelamani et al. 2004)). Similarly, the second step have been replaced by denoising/removing artefacts using a multilayer perceptron (Schuler et al. 2013), or more recently using U-Nets . CNNs are well adapted to this tasks, since the form of a CNN mimics unrolled iterative approaches when the forward model is a convolution. In another application, convolutional networks such as deep convolutional framelets have also been applied to remove artefacts from reconstructed CT images (Ye et al. 2018a). One advantage of such decoupling approach is the ability to process quickly a large amount of data when the network has been learnt, if the deconvolution chosen has closed-form expression. -Iterative Deep Learning: the third strategy uses iterative approaches often derived from convex optimization coupled with deep learning networks. Several schemes have been devised to solve generic inverse problems. The first option, called unrolling or unfolding, is to mimic a few iterations of an iterative algorithm with DNNs so as to capture in the learning phase the impact of 1) the prior (Mardani et al. 2017), 2) the hyperparameters ( Finally, an alternative is to use iterative proximal algorithms from convex optimization (for instance in the framework of the alternating direction method of multiplier plug&play (ADMM PnP) (Venkatakrishnan et al. 2013;Sreehari et al. 2016;H. Chan et al. 2016), or regularization by denoising (Romano et al. 2017;T. Reehorst & Schniter 2018)), where the proximity operator related to the prior is replaced by a DNN (Meinhardt et al. 2017;Bigdeli et al. 2017;Gupta et al. 2018) or a series of DNN trained in different denoising settings as in Zhang et al. (2017).
The last two strategies are therefore more adapted to our targeted problem, and in the following we will investigate how they could be applied and how they perform compared to state-of-the art methods in space-variant deconvolution of galaxy images.

Discussion relative to Deep Deconvolution and Sparsity
It is interesting to notice that connections exist between sparse recovery methodology and DNN: -Learning Invariants: the first features learnt in convolutive deep neural networks correspond typically to edges at particular orientation and location in the images (LeCun et al. 2015), which is also what the wavelet transforms extract at different scales.
Similar observations were noted for features learnt with a CNN in the context of cosmological parameter estimations from weaklensing convergence maps (Ribli et al. 2019). As well, understanding mathematically how the architecture of such networks captures progressively powerful invariants can be approached via wavelets and their use in the wavelet scattering transform (Mallat 2016). -Learned proximal operator: Meinhardt et al. (2017) has shown that using a denoising neural network instead of a proximal operator (e.g. soft-thresholding in wavelet space in sparse recovery) during the minimisation iterations improves the deconvolution performance. They also claim that the noise level used to train the neural network behave like the regularisation parameter in sparse recovery. The convergence of the algorithm is not guaranteed anymore, but they observed experimentally that their algorithms stabilize and they expressed their fixed-points. expanding path and contracting path: the U-nets two parts are very similar to synthesis and analysis concepts in sparse representations. This has motivated the use of wavelets to implement in the U-net average pooling and unpooling in the expanding path (Ye et al. 2018b;Han & Ye 2018). Some other connection can be made with soft-Autoencoder in Fan et al. (2018) introducing a pair of ReLU units emulating soft-thresholding, accentuating the comparison with cascade wavelet shrinkage systems.
Therefore, we observe exchanges between the two fields, in particular for U-Net architectures, with however significant differences such as the construction of a very rich dictionary in U-nets that is possible through the use of a large training data set, as well as non-linearities at every layer essential to capture invariants in the learning phase.

Introduction
In the case of a space-variant deconvolution problem, we can write the same deconvolution equation as before, Y = HX+N, but H is not block circular anymore, and manipulating such a huge matrix is not possible in practice. As in Farrens et al. (2017), we consider instead an Object-Oriented Deconvolution, by first detecting n g galaxies with n p pixels each and then deconvolving independently each object. We use the following definitions: the observations of n g galaxies with n p pixels are collected in Y ∈ R n p ×n g (as before, each galaxy being represented by a column vector arranged in lexicographic order), the galaxy images to recover are similarly collected X ∈ R n p ×n g = [x i ] i=1..n g and the convolution operator with the different kernels is noted H. It corresponds to applying in parallel a convolution matrix H i to a galaxy x i (H i being typically a block circulant matrix with circulant block after zero padding which we perform on the images (Andrews & Hunt 1977)). Noise is noted N ∈ R n p ×n g as before and is assumed to be additive white gaussian noise. With these definitions, we now have or more precisely for block circulant {H i } i=1..n g , which illustrates that we consider multiple local space-invariant convolutions in our model (ignoring the very small variations of the PSF at the scale of the galaxy as done in practice (Kuijken et al. 2015;Mandelbaum et al. 2015;Zuntz et al. 2018)). The deconvolution problem of finding X knowing Y and H is therefore considered as a series of independent ill-posed inverse problems. To avoid having multiple solutions (due to a non trivial null space of {H i } i=1..n g ) or an unstable solution (bad conditioning of these matrices), we need to regularize the problem as in standard deconvolution approaches developed for space-invariant convolutions. This amounts to solve the following inverse problem: arg min and in general we will choose separable regularizers so that we can handle in parallel the different deconvolution problems: arg min Farrens et al. (2017) proposes two methods to perform this deconvolution: -Sparse prior: each galaxy is supposed to be sparse in the wavelet domain, leading to minimize with W (k) a weighting matrix. -Low rank prior: In the above method, each galaxy is deconvolved independently from the others. As there are many similarities between galaxy images, Farrens et al. (2017) proposes a joint restoration process, where the matrix X has a low rank. This is enforced by adding a nuclear norm penalization instead of the sparse regularization, as follows: where X * = k σ k (X), σ k (X) denoting the k th largest singular value of X.
It was shown that the second approach outperforms sparsity techniques as soon as the number of galaxies in the field is larger than 1000 (Farrens et al. 2017).
Article number, page 4 of 18

Neural Network architectures
DNN allows us to extend the previous low rank minimisation, by taking profit of existing databases and learning more features from the data in a supervised way, compared to what we could do with the simple SVD used for nuclear norm penalization. The choice of network architecture is crucial for performance. We have identified three different features we believe important for our application: 1) the forward model and the task implies that the network should be translation equivariant, 2) the model should include some multi-scale processing based on the fact that we should be able to capture distant correlations, and 3) the model should minimize the number of trainable parameters for a given performance, so as to be efficient (lower GPU memory consumption) which is also important to ease the learning. Hopefully these objectives are not contradictory: the first consideration leads to the use of convolutional layers, while the second implies a structure such as the U-Net (Ronneberger et al. 2015) already used to solve inverse problems  or the deep convolutional framelets (Ye et al. 2018a). But because such architectures allow to increase rapidly the receptive field in the layers along the network, they can compete with a smaller number of parameters against CNNs having a larger number of layers and therefore more parameters. We therefore have selected a global U-Net structure as in ), but including the following modifications: Article number, page 5 of 18 -2D separable convolutions: we replace 2D convolutions by 2D separable convolutions (Chollet 2016). The separable convolutions allow to limit the number of parameters in the model by assuming that spatial correlations and correlations across feature maps can be independently captured. Their use have already lead to outperform architectures with non-separable convolution with a larger number of parameters (Chollet 2016). -Dense blocks: we changed the convolutional layers at each "scale" by using dense blocks (Huang et al. 2017). Dense blocks also allow to reduce the number of parameters, by propagating through concatenation all prior feature maps to the input of the current layer. This was claimed to enable feature reuse, preservation of information, and to limit vanishing gradients in the learning. -Average-pooling: we change the pooling step: we have observed that max-pooling lead to over-segmentation of our final estimates, which is alleviated by the use of average pooling. -Skip connection: we removed the skip connection between the input and the output layers introduced by ) which proved to be detrimental to the performance of the network, especially at low SNR. Note that the dense blocks may have also better preserved the flow of relevant information and limited the interest of using residual learning.
The two first modification limit significantly the number of parameters per "scale" of the U-Net, and potentially allow for more scales to be used for a given budget of number of trainable parameters. Our network, we name "XDense U-Net", is displayed in Fig. 2. The following describes how to use such networks in two different ways in order to perform the space variant deconvolution.

Tikhonet: Tikhonov deconvolution post-processed by a Deep Neural Network
The Tikhonov solution for the space variance variant PSF deconvolution is: where L is similarly built as H. The closed-form solution of this linear inverse problem is given by: which involves for each galaxy a different Tikhonov filter In this work, we chose L i = Id and the regularization parameter λ i is different for each galaxy, depending on its SNR (see 3.5 for more details). The final estimate is then only: where the neural network predictions based on its parameters θ for some inputs Y are written as N θ (Y). The success of the first approach therefore lies on the supervised learning of the mapping between the Tikhonov deconvolution of Eq. (11) and the targeted galaxy.
We call this two-step approach "Tikhonet" and the rather simple training process is described in Algorithm 1.
Algorithm 1 DNN training in the Tikhonet approach 1: Initialization: Prepare noise-free training set, choose noise parameters (SNR range) and validation set. Choose architecture for network N, learning parameters (optimizer and its parameters, batch size B and number of batches n batch , number of epochs n epoch ) and cost function to minimize (here mean squared error). The second approach we investigated is using the ADMM PnP framework with a DNN.
ADMM is an augmented lagrangian technique developed to solve convex problems under linear equality constraints (see for instance (Boyd et al. 2010)). It operates by decomposing the minimization problem into sub-problems solved sequentially. One iteration consists in first solving a minimization problem typically involving the data fidelity term, then solving a second minimization problem involving the regularization term, and finishing by an update of the dual variable.
It has previously been noted (Venkatakrishnan et al. 2013;Sreehari et al. 2016;H. Chan et al. 2016) that the first two sub-steps can be interpreted as an inversion step followed by a denoising step coupled via the augmented lagrangian term and the dual variable. These authors suggested to use such ADMM structure with non-linear denoisers in the second step in an approach dubbed ADMM PnP, which recent work has proposed to implement via DNNs (Meinhardt et al. 2017).
In the following, we adopt such iterative approach based on the ADMM PnP because 1) it separates the inversion step and the use of the DNN, offering flexibility to add extra convex constraints in the cost function that can be handled with convex optimization 2) it alleviates the cost of learning by focusing essentially on learning a denoiser or a projector -less networks, less parameters to learn jointly compared to unfolding approaches where each iteration corresponds to a different network 3) by iterating between the steps, the output of the network is propagated to the forward model to be compared with the observations, avoiding large discrepancies, contrary to the Tikhonet approach where the output of the network is not used in a likelihood.
The training of the network N θ in this case is similar to Algorithm 1, except that the noise-free training set is composed of noisefree target images instead of noise-free convolved images, and the noise added has constant standard deviation. Then the algorithm for deconvolving a galaxy is presented in Algo. 2 and is derived from H. Chan et al. (2016). The application of the network is here illustrated in red. We call this approach "ADMMnet". The first step consists in solving the following regularized deconvolution problem at iteration k using the accelerated iterative convex algorithm FISTA (Beck & Teboulle 2009): arg min where ι C is the characteristic function of the non-negative orthant, to enforce the non-negativity of the solution. The DNN used in the second step is used as an analogy with a denoiser (or as a projector),as presented above. The last step controls the augmented lagrangian parameter, and ensure that this parameter is increased when the optimization parameters are not sufficiently changing. This continuation scheme is also important, as noted in H. Chan et al. (2016), as increasing progressively the influence of the augmented lagrangian parameter ensures stabilization of the algorithm. Note that of course there is no convergence guarantee of such scheme and that contrary to the convex case the augmented lagrangian parameter ρ is expected to impact the solution.
Finally, because the target galaxy is obtained after re-convolution with a target PSF to avoid aliasing (see section 4), we also re-convolve the ADMMnet solution with this target PSF to obtain our final estimate.

Implementation and choice of parameters for Network architecture
We describe here our practical choices for the implementation of the algorithms. For the Tikhonet, the hyperparameter λ i that controls the balance in between the data fidelity term and the quadratic regularization in Eq. 11 needs to be set for each galaxy. This can be done either manually by selecting an optimal value as a function of an estimate of the SNR, or by using automated procedures such as generalized cross-validation (GCV) (Golub et al. 1979), the L-curve methode (Christian Hansen & O'leary 1993), the Morozov discrepancy principle (Werner Engl et al. 1996), various Stein Unbiased Risk Estimate (SURE) minimization (C. Eldar 2009;Pesquet et al. 2009;Deledalle et al. 2014), or using a hierarchical Bayesian framework (Orieux et al. 2010;Pereyra et al. 2015). We compared these approach, and report the results obtained by the SURE prediction risk minimization which lead to the best results with the GCV approach.
For the ADMM, the parameters ρ 0 , ρ max , η, and γ have been selected manually, as a balance between stabilizing quickly the algorithm (in particular high rho) and favouring the minimization of the data fidelity term in the first steps (low rho). We investigated in particular the choice of ρ 0 which illustrate how the continuation scheme impacts the solution.
The DNNs were coded in Keras 1 with Tensorflow 2 as backend. 4 scales were selected in the XDense U-Net with an increasing number of layers for each scale (to capture distant correlations). Each separable conv was composed of a 3 × 3 spatial filters and a growth factor of 12 was selected for the dense blocks. We used batch normalization, and rectified linear units for the activation. We also tested weighted sigmoid activations (or swish in Elfwing et al. (2018)) which seems to slightly improve the results at the cost of increasing the computational burden. The total number of trainable parameters was 184301. We use 20 epochs, a batch size of 32 and the Adam optimizer was selected (we keep the default parameters) to minimize the mean squared error (MSE) cost function. After each epoch, we save the network parameters only if they improve the MSE on the validation set.

Experiments
In this section, we describe how we generated the simulations used for learning networks and testing our deconvolution schemes, as well as the criteria we will use to compare the different deconvolution techniques.

Dataset generation
We use GalSim 3  to generate realistic images of galaxies for training our networks and testing our deconvolution approaches. We essentially follow the approach used in GREAT3 (Mandelbaum et al. 2014) to generate the realistic space branch from high resolution HST images, but choosing the PSFs in a set of 600 Euclid-like PSFs (the same as in Farrens et al. (2017)). The process is illustrated in Fig. 3.
A HST galaxy is randomly selected from the set of about 58000 galaxies used in the GREAT3 challenge, deconvolved with its PSF, and random shift (taken from a uniform distribution in [−1, 1] pixel), rotation and shear are applied. The same cut in SNR is performed as in GREAT3 (Mandelbaum et al. 2014) , so as to obtain a realistic set of galaxies that would be observed in a SNR range [20,100] when the noise level is as in GREAT3. In this work we use the same definition of SNR as in this challenge: where σ is the standard deviation of the noise. This SNR corresponds to an optimistic SNR for detection when the galaxy profile X i is known. In other (experimental) definitions, the minimal SNR is indeed closer to 10, similarly to what is usually considered in weak lensing studies (Mandelbaum et al. 2014). If the cut in SNR is passed, to obtain the target image in a 96 × 96 grid with pixel size 0.05 , we first convolve the HST deconvolved galaxy image with a Gaussian PSF with FWHM = 0.07 to ensure no aliasing occurs after the subsampling. To simulate the observed galaxy without extra noise, we convolve the HST deconvolved image with a PSF randomly selected among about 600 Euclid-like PSFs (the same set as used in Farrens et al. (2017)). Note that the same galaxy rotated by 90°is also simulated as in GREAT3.
Because we use as inputs real HST galaxies, noise from HST images propagate to our target and observed images, and is coloured by the deconvolution/reconvolution process. We did not want to denoise the original galaxy images to avoid losing substructures in the target images (and making them less "realistic"), and as this noise level is lower than the noise added in our simulations we expect it to change marginally our results -and not the ranking of methods.
This process is repeated so that we end up with about 210000 simulated observed galaxies and their corresponding target. For the learning, 190000 galaxies are employed, and 10000 for the validation set. The extra 10000 are used for testing our approaches.
In the learning phase, additive white Gaussian noise is added to the galaxy batches with standard deviation chosen so as to obtain a galaxy in a prescribed SNR range. For the Tikhonet, we choose randomly for each galaxy in the batch a SNR in the range [20,100], which corresponds to selecting galaxies from the limit of detection to galaxies with observable substructures, as illustrated in Fig 4. For the ADMMnet, we learn a denoising network for a constant noise standard deviation of σ = 0.04 (same level as in GREAT3).
We then test the relative performance of the different approaches in a test set for fixed values: SNR ∈ {20, 40, 60, 80, 100} to better characterize (and discriminate) them, and for a fixed standard deviation of σ = 0.04 corresponding to what was simulated in GREAT3 for the real galaxy space branch to obtain results on a representative observed galaxy set. The corresponding distribution of SNR in the last scenario is represented in Fig. 5. All the techniques are compared on exactly the same test sets.
For the ADMMnet approach when testing at different SNRs, we need to adjust the noise level in the galaxy images to the level of noise in the learning phase. We therefore rescale the galaxy images to reach this targeted noise level, based on noise level estimation in the images. This is performed via a robust standard procedure based on computing the median absolute deviation in the wavelet domain (using orthogonal daubechies wavelets with 3 vanishing moments). In the upper branch we obtain the targeted galaxy. In the lower branch, we simulate the corresponding Euclid-like observed galaxy. Note that in these figures, a log-scale was adopted for the PSFs to illustrate its complicated structure. Fig. 4. Range of SNR used for the training and for testing in the simulations. From left to right: targeted galaxy image, then observed convolved images at increasing SNR. In our definition, SNR = 20 is barely at the galaxy detection limit, while at SNR = 100 galaxy substructures can be visualized. Fig. 5. Distribution of SNR of simulated galaxies for constant noise simulations (σ = 0.04). The peak of the distribution is at about SNR = 30, and the mean SNR is SNR = 54.

Quality criteria
The performance of the deconvolution schemes is measured according to two different criteria, related to pixel error and shape measurement errors. For pixel error we select a robust estimator: Article number, page 9 of 18 where x (t) i is the targeted value, and with MED the median over the relative mean squared error computed for each galaxy x i in the test set, in a central window of 41 × 41 pixels common to all approaches.
For shape measurement errors, we compute the ellipticity using a KSB approach implemented in shapelens 4 (Kaiser et al. 1995;Viola et al. 2011), that additionally computes an adapted circular weight function from the data.
We first apply this KSB method to the targets, taking as well into account the target isotropic gaussian PSF, to obtain reference complex ellipticities i and windows. We then compute the complex ellipticity i of the deconvolved galaxies using the same circular weight functions as their target counterpart. Finally, we compute to obtain a robust estimate of the ellipticity error in the windows set up by the target images, again in a central window of 41 × 41 pixels common to all approaches.. We also report the distribution of pixel and ellipticity errors prior to applying the median when finer assessments need to be made.

New architecture versus literature
We first compared our XDense U-Net implementation with a classical U-Net implementation for the Tikhonet approach, to test the efficiency of our proposed deep learning model. For the U-Net, we choose 3 scales with 2 layers per scale and 20 feature maps per layer, to end up with 206381 trainable parameters (to compare with the 184301 of our XDense U-Net implementation). In this case, we set the hyperparameter using as oracle the SNR for each galaxy (λ i = 1 SNR i ). Visual results are displayed in Fig. 6. When looking statistically at the results in Fig. 7, pixel and ellipticity errors are indeed very similar, but the proposed XDense U-Net led to slightly better results when looking at the ellipticity errors. The improvement in terms of median ellipticity error is about 1% at SNR = 20 to about 6.5% improvement at SNR = 100, with about 10% less trainable parameters. In the following, we therefore use the XDense-Unet architecture for our tests.

Setting the hyperparameters
For the Tikhonet, the key parameters to set are the hyperparameters λ i in Eq. 11. In Fig. 8, these hyperparameters are set to the parameters minimizing the SURE multiplied by factors ranging from 10 to 0.01 at SNR = 20. It appears that for the lowest factor, corresponding to the smallest regularization of deconvolution (i.e. more noise added in the deconvolved image), the Tikhonet is not able to perform as well as for intermediate values, in particular for exactly the SURE minimizer. This is confirmed in Fig. 9, illustrating that the best results according to the pixel and ellipticity error criteria are consistently obtained across all SNR tested for values of the multiplicative factor between 0.1 and 1. Higher multiplicative factors also lead to more extreme errors in particular at low SNR. In the following, we therefore set this parameter to the SURE minimizer.
For the ADMMnet, we set manually the hyperparameters ρ max = 200, = 0.01 to lead to ultimate stabilization of the algorithm, η = 0.5 and γ = 1.4 to explore intermediate ρ values, and we investigate the choice of parameter ρ 0 to illustrate the impact of the   continuation scheme on the solution. This is illustrated in Fig. 10 at high SNR, and Fig. 11 at low SNR. When ρ 0 is small, higher frequencies are recovered in the solution as illustrated in galaxy substructures in Fig. 10, but this could lead to artefacts at low SNR as illustrated in Fig. 11.
Quantitative results are presented in Fig. 12. When looking at pixel error at high SNR and low SNR , the lowest pixel errors in terms of median are obtained at low SNR for ρ 0 = 200 (slightly less than 2% better than for ρ 0 = 1), while at high SNR ρ 0 = 1 is the best (slightly more than 2% better than for ρ 0 = 200). In terms of ellipticity errors, ρ 0 = 1 allows to obtain consistently the best results at low and high SNR. Compared to ρ 0 = 200, it performs more than 4% better in term of median ellipticity error at both SNR = 20 and SNR = 100. Overall, this illustrates that the continuation scheme has a small impact in particular on the ellipticity errors, and that best results are obtained for different ρ 0 if pixel or ellipticity errors are considered. In practice the difference in pixel errors being small, we keep in the following ρ 0 = 1 for further comparison with other deconvolution approaches.

DNN versus sparsity and low-rank
We compare our two deep learning schemes with the sparse and the low rank approaches of Farrens et al. (2017), implemented in sf_deconvolve 5 . For the two methods, we used all parameters selected by default, and select the central 41 × 41 pixels of the observed galaxies to be processed in particular to speed up the computation of the singular value decomposition used in the low rank constraint (and therefore of the whole algorithm) as in Farrens et al. (2017). All comparisons will be made in this central region of the galaxy images.
We now illustrate the results for a variety of galaxies recovered at different SNR for the sparse, low-rank deconvolution approaches and the Tikhonet and ADMMnet.
We first display several results at low SNR (SNR = 20) in Fig. 13 to illustrate the robustness of the various deconvolution approaches. Important artefacts appear in the sparse approach, illustrating the difficulty of recovering the galaxy images in this high noise scenario: deconvolved false detections lead to these point-like artefacts.
For the low rank approach, low frequencies seems to be partially well recovered, but artefacts appears for elongated galaxies in the direction of the minor axis. Finally, both Tikhonet and ADMMnet seem to recover better the low frequency information, but the galaxy substructures are essentially lost. The ADMMnet seems to recover in this situation sharper images but with more propagated noise/artefacts than the Tikhonet, with similar features as for the sparse approach but with less point-like artefacts. Fig. 13. Deconvolved images with the various approaches for SNR = 20. Each row corresponds to a different processed galaxy. From left to right: image to recover, observation with a noise realization, sparse and low rank approaches, and finally Tikhonet and ADMMnet results.
We also display these galaxies at a much higher SNR (SNR = 100) in Fig. 14 to assess the ability of the various deconvolution schemes to recover galaxy substructures in a low noise scenario. The sparse approach leads again to point-like artefacts when the galaxy substructure is complex, whereas the low-rank approach displays less artefacts, but does not seem again to be able to adequately represent elongated galaxies or directional substructures in the galaxy. This is probably due to the fact that the low rankness approach does not adequately cope with translations, leading to over-smooth solutions. On the contrary, both Tikhonet and ADMMnet lead to recover substructures of the galaxies, with the ADMMnet displaying higher frequencies than Tikhonet. Overall the two proposed deconvolution approaches using DNNs lead to the best visual results.
The quantitative deconvolution criteria are presented in Fig. 15. Concerning median pixel error, this figure illustrates that both Tikhonet and ADMMnet perform much better than the sparse and low-rank approach to recover the galaxy intensity values, whatever the SNR, as anticipated from the previous images of galaxy. At high SNR, more than a factor of 2 improvement is obtained with the DNN approaches compared to the classical techniques, illustrating how supervised deep learning methods outperformed the classical approaches by better capturing galaxy features. Then in these noise settings the low-rank approach with 10000 galaxies performed slightly better than using sparsity, except for SNR = 40. The Tikhonet seems to perform slightly better than the ADMMnet with this criterion as well. For shape measurement errors, the best results are obtained with the Tikhonet approach at low SNR (up to SNR = 40), and then the ADMMnet outperforms the others at higher SNR. The sparse approach is about 12% (resp. 25%) worst at SNR = 20 (resp. SNR = 100), and finally the low-rank performs the worst whatever the SNR. In short, these results clearly favour the choice of the DNN approaches. This is confirmed when looking at a realistic distribution of galaxy SNR, as shown in Table 1. In terms of both median pixel and ellipticity errors, the proposed deep learning approaches perform similarly, and outperforms both sparse and low-rank approaches: median pixel error is reduced by almost 50%, and ellipticity error by about 10%.

Computing Time
Finally, we also report in Table 2 the time necessary to learn the networks and process the set of 10000 galaxies on the same GPU/CPUs, as this is a crucial aspect when potentially processing a large number of galaxies such as in modern surveys. Among DNNs, learning the parameters of the denoising network for the ADMMnet is faster than those of the post-processing network in the Tikhonet since the latter requires each batch to be deconvolved. However once the network parameters have been learnt, the Tikhonet based on a closed-form deconvolution is the fastest to process a large number of galaxies (about 0.05s per galaxy). On the other hand, learning and restoring 10000 galaxies is quite fast for the low-rank approach, while iterative algorithms such as ADMMnet or the primal-dual algorithm for sparse recovery are similar in terms of computing time (about 7 to 10s per galaxy). All these computing times could however be reduced if the restoration of different galaxy images is performed in parallel, which has not been implemented.

Conclusions
We have proposed two new space-variant deconvolution strategies for galaxy images based on deep neural networks, while keeping all knowledge of the PSF in the forward model: the Tikhonet, a post-processing approach of a simple Tikhonov deconvolution with a DNN, and the ADMMnet based on regularization by a DNN denoiser inside an iterative ADMM PnP algorithm for deconvolution. We proposed to use for galaxy processing a DNN architecture based on the U-Net particularly adapted to deconvolution problems, with small modifications implemented (dense Blocks of separable convolutions, and no skip connection) resulting in a lower number of parameters to learn with slightly improved performance in shape measurement errors. We evaluated these approaches compared to the deconvolution techniques in Farrens et al. (2017) in simulations of realistic galaxy images derived from HST observations, with realistic space-variant PSFs, processed with the GalSim simulation code. We investigated in particular how to set the hyperparameters in both approach: the Tikhonov hyperparameter for the Tikhonet and the continuation parameters for the ADMMnet. Our main findings are as follows: for both Tikhonet and ADMMnet, the hyperparameters impact the performance of the approaches, but the results are quite stable in a range of values for these hyperparameters. In particular for the Tikhonet, the SURE minimizer is within this range. For the ADMMnet, more hyperparameters needs to be set, and the initialization of the augmented lagrangian parameter impacts the performance: small parameters lead to higher frequencies in the images, while larger parameters lead to over-smooth galaxies recovered. visually both methods outperform the sparse recovery and low-rank techniques, which displays artefacts at both the low and high SNR probed this is also confirmed in all SNR ranges and for a realistic distribution of SNRS; about 50% improvement is achieved in terms of median pixel error and about 10% improvement for median shape measurement errors. among DNN approaches, Tikhonet outperforms ADMMnet in terms of median pixel errors whatever the SNR, and median ellipticity errors for low SNR (SNR < 40). At higher SNR, the ADMMnet leads to slightly lower ellipticity errors. the Tikhonet is the fastest approach once the network parameters have been learnt, with about 0.05s needed to process a galaxy, to be compared with sparse and ADMMnet iterative deconvolution approaches which takes about 7 to 10s per galaxy.
If the ADMMnet approach is still promising, as extra constraints could be added easily to the framework (while the success of the Tikhonet approach also lies on the ability to compute a closed-form solution for the deconvolution step), these results illustrate that the Tikhonet is overall the best approach in this scenario to process both with high accuracy and fastly a large number of galaxies.

Reproducible Research
In the spirit of reproducible research, the space-variant codes will be made freely available on the CosmoStat website. The testing datasets will also be provided to repeat the experiments performed in this paper.