R. Vio^{1,3} - J. Bardsley^{2} - W. Wamsteker^{3}
1 - Chip Computers Consulting s.r.l., Viale Don L. Sturzo 82,
S. Liberale di Marcon, 30020 Venice, Italy
2 -
Department of Mathematical Sciences, The University of Montana,
Missoula, MT 59812-0864, USA
3 -
ESA-VILSPA, Apartado 50727, 28080 Madrid, Spain
Received 12 September 2004 / Accepted 28 February 2005
Abstract
It is well-known that the noise associated with the
collection of an astronomical image by a CCD camera is largely
Poissonian. One would expect, therefore, that computational
approaches that incorporate this a priori information will be more
effective than those that do not. The Richardson-Lucy (RL)
algorithm, for example, can be viewed as a maximum-likelihood (ML)
method for image deblurring when the data noise is assumed to be
Poissonian. Least-squares (LS) approaches, on the other hand,
are based on the assumption that the noise is Gaussian with fixed
variance across pixels, which is rarely accurate. Given this, it
is surprising that in many cases results obtained using LS
techniques are relatively insensitive to whether the noise is
Poissonian or Gaussian. Furthermore, in the presence of Poisson
noise, results obtained using LS techniques are often comparable
with those obtained by the RL algorithm. We seek an explanation of
these phenomena via an examination of the regularization
properties of particular LS algorithms. In addition, a careful
analysis of the RL algorithm yields an explanation as to why it is
more effective than LS approaches for star-like objects, and why
it provides similar reconstructions for extended objects. Finally,
a comparative convergence analysis of the two algorithms is
carried out, with a section devoted to the convergence properties
of the RL algorithm. Numerical results are presented throughout
the paper. The subject treated in
this paper is not purely academic. In comparison with many ML
algorithms, the LS algorithms are much easier to use and to
implement, are computationally more efficient, and are more
flexible regarding the incorporation of constraints on the
solution. Consequently, if little to no improvement is gained in
the use of an ML approach over an LS algorithm, the latter will
often be the preferred approach.
Key words: methods: data analysis - methods: statistical - techniques: image processing
The restoration of images is a common problem in Astronomy. Astronomical images are blurred by several factors: the refractive effects of the atmosphere, the diffractive effects of the finite aperture of the telescope, the statistical fluctuations inherent in the collection of images by a CCD camera, and instrumental defects. A well-known example is the spherical aberration of the primary mirror of the Hubble Space Telescope (White 1991), which limited the quality of the images before the detector system was refurbished.
The widespread interest in this subject has resulted in the development of a large number of algorithms with different degrees of sophistication (for a review, see Starck et al. 2002). For example, recent wavelet-based approaches have been shown to provide excellent results (e.g., see Neelamani et al. 2004). Unfortunately, these algorithms are very expensive to implement, prohibiting their use on large-scale image restoration problems and on problems that require the restoration of a large number of images. Consequently, for many restoration problems, less sophisticated and computationally more efficient algorithms must be used. In this respect, the algorithms based on a linear Least-Squares (LS) methodology represent an interesting class. In this paper we discuss two LS approaches: direct and iterative. Direct methods, which we discuss in Sect. 3.1, are the most computationally efficient, while iterative techniques, which we discuss in Sect. 3.2, allow for the straightforward incorporation of constraints.
In spite of the advantages of the LS-based algorithms, astronomers typically use techniques based on a non-linear approach. Such algorithms are usually more difficult to implement, are less flexible and often have slow convergence rates. In particular, the original Richardson-Lucy (RL) algorithm (Lucy 1974; Shepp & Vardi 1982; Richardson 1972) and later modifications have attracted much attention. RL can be viewed as an Expectation-Maximization (EM) algorithm associated with a Poisson statistical noise model. Linear LS methods, on the other hand, can be viewed as the maximum-likelihood (ML) approach when the noise contaminating the image(s) of interest is additive Gaussian with constant variance across pixels. For CCD camera noise, the statistical assumption inherent in the RL algorithm is much more accurate than that of the LS approach (see Sect. 2). Nonetheless, it is often the case that these two methods provide results of similar quality (see Sect. 4). This is disappointing since it means that in certain instances the RL algorithm is not able to exploit the a priori statistical information. This is particularly relevant when the incorporation of the a priori information results in algorithms that are more expensive and difficult to implement.
The aims of the present paper are as follows: I) to determine the performance of the LS algorithms when the noise is predominantly Poissonian; II) to determine when the LS and RL approaches will give very similar qualitative results. Such questions are not only of academic interest. We believe that, due to certain distinct computational advantages, the LS algorithms should be used preferentially. However, the LS approach is not always the best choice.
In the next section, we present the statistical model for CCD camera image formation as well as the approximate model that we will use here. After some preliminary considerations in Sect. 2, in Sect. 3 we will explore the convergence properties of the two LS approaches. We will also discuss the performance of these algorithms on different objects. In Sect. 4 we will explore in detail the convergence properties of the RL algorithm. Throughout the paper we present numerical results. We give our conclusions in Sect. 5.
Astronomical image data is typically collected with a CCD camera.
The following statistical model
(see Snyder et al. 1993, 1995) applies to image data from a CCD detector array:
For clarity we use matrix-vector
notation, and we rewrite Eq. (1) as
We will use the following notation to denote
model (2):
A further useful approximation is possible if the
elements of
are large.
In this case model (5) can be well
approximated by
From Eqs. (6) and (7) we see that
has a flat spectrum, i.e., the expected modulus of its
Discrete Fourier Transform (DFT) is constant. However, here, unlike
a stationary Gaussian white noise process,
the various Fourier frequencies are not independent of each
other (e.g., see Fig. 1). The reason is that the
point-wise multiplication in the spatial domain corresponds to
convolution in the Fourier domain, and vice versa. Thus, from Eq. (7), we have
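The nonstationary character of Poissonian noise is easy to reproduce numerically. The following sketch (our own toy setup, not the paper's satellite experiment) draws Poisson data whose local mean is a hypothetical smooth "blurred image", and compares the resulting noise with stationary Gaussian noise of the same total variance:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "blurred image": a smooth 2-D Gaussian profile plus a background.
n = 128
y, x = np.mgrid[:n, :n]
blurred = 1000.0 * np.exp(-((x - n / 2)**2 + (y - n / 2)**2) / (2 * 15.0**2)) + 50.0

# Poissonian data: each pixel is a Poisson draw with mean given by the blurred
# image, so the noise variance equals the local mean (nonstationary).
poisson_data = rng.poisson(blurred).astype(float)
poisson_noise = poisson_data - blurred

# Stationary Gaussian noise with the same total variance, for comparison.
sigma = np.sqrt(poisson_noise.var())
gauss_noise = rng.normal(0.0, sigma, blurred.shape)

# Both noise fields have an (approximately) flat mean power spectrum, but only
# the Poissonian noise is concentrated in the brightest pixels.
bright = blurred > 500.0
print(poisson_noise[bright].var(), poisson_noise[~bright].var())
```

Comparing the sample variances inside and outside the bright region makes the nonstationarity explicit, which is exactly the dependence structure a stationary Gaussian model cannot capture.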
Figure 1: Comparison of Poissonian vs. stationary Gaussian noise and the corresponding power spectra for the satellite image shown in Fig. 4. The variance of the two noises is the same.
The above comments regarding the noise process provide some
insight into the performance of the LS deblurring algorithms in the
presence of Poissonian noise. The LS approach corresponds to the
statistical assumption that the contaminating noise is stationary
Gaussian, whereas the noise here is well approximated by a
nonstationary Gaussian random variable. A worsening of the
performance of the LS algorithms is therefore to be expected, owing
to their inability to take into account the dependence between the
Fourier frequencies that characterizes the spectrum of the noise.
The question then arises as to what happens if such a
dependence is not taken into account. In order to understand this
point, we remark that the LS estimate of the true image
in Eq. (6) given by
Although the set of frequencies most corrupted by noise is determined both by the noise level and by the spectrum of the PSF, it is the latter factor that has the most influence. To see this, we note that, for the case of the star-like source and Gaussian PSF considered above, it is possible to show that the frequency index (i, j) where the spectrum of the signal and that of have the same level is given by , where r=(i^{2}+j^{2})^{1/2}, is the amplitude of the source, and is the level of the noise spectrum. In the next two sections, these arguments will be checked in the context of the direct and the iterative LS methods.
The approximation (7) motivates the use of a weighted
least-squares (WLS) technique. In this approach, the
norm
in Eq. (9) is replaced by an energy norm
,
where
is the diagonal
covariance matrix of the
(approximate) noise random variable ,
and
(12)
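A minimal numerical sketch of this weighting (assuming a generic linear model with a diagonal noise covariance; the variable names are ours) solves the ordinary and weighted LS problems side by side. Weighting is equivalent to whitening each row by the reciprocal noise standard deviation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical small linear model y = A x + noise, where the noise variance
# differs from row to row (a diagonal covariance matrix C).
A = rng.normal(size=(100, 5))
x_true = np.arange(1.0, 6.0)
var = np.linspace(0.1, 4.0, 100)              # per-row noise variances (diag of C)
y = A @ x_true + rng.normal(0.0, np.sqrt(var))

# Ordinary LS: minimize ||A x - y||^2.
x_ls, *_ = np.linalg.lstsq(A, y, rcond=None)

# Weighted LS: minimize the residual in the C^{-1} energy norm, i.e. solve
# (A^T C^{-1} A) x = A^T C^{-1} y.  Equivalent to scaling rows by 1/sqrt(var).
w = 1.0 / np.sqrt(var)
x_wls, *_ = np.linalg.lstsq(A * w[:, None], y * w, rcond=None)
```

The WLS solution satisfies the weighted normal equations exactly; in expectation it is the minimum-variance linear unbiased estimate under this heteroscedastic noise model.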
One of the most standard approaches for handling the
ill-conditioning of problem (9) is known as Tikhonov
regularization. In the Tikhonov approach, one obtains a stable
estimate of the solution of the linear system of interest by
imposing a norm penalization, or bias, to the solution itself.
This means that we solve, instead, the penalized least-squares problem
(16)
Figures 2-5 compare the results obtained with this method when the noise is stationary Gaussian and Poissonian, respectively. The image , contaminated by Poissonian noise, has been obtained by simulating a nonstationary Poissonian process with local mean given by the values of the pixels in the blurred images, i.e. using model (5). Four peak signal-to-noise (S/N) ratios have been considered: 20, 30, 40 and . They correspond to situations ranging from very high to very low noise levels. The PSF used in the simulations is a two-dimensional Gaussian function with circular symmetry. The image , contaminated by Gaussian noise, has been obtained by adding a discrete stationary white noise process to the blurred images. Both the Gaussian and the Poissonian noises have been simulated through a classic inverse-distribution method (e.g., see Johnson 1987) by transforming the same set of random uniform deviates; they have exactly the same variance. Here, the subject of interest is superimposed on a sky whose intensity, in the blurred image, is set to of the maximum value of the image. This means that, contrary to the Gaussian case, where the noise level is constant across the image, in the Poissonian case the noise is mostly concentrated in the pixels with the highest values. In spite of this fact, these figures show that the results provided by Tikhonov coupled with GCV are quite similar regardless of whether the noise is Gaussian or Poissonian.
These results can be explained by considering Eq. (17), where it is clear that the role of is to replace the Fourier coefficients with small modulus, i.e., those coefficients that make the deblurring operation unstable. According to the two points mentioned above, the "optimal" value of should replace all the Fourier coefficients whose modulus is smaller than the expected level of the noise in the Fourier domain. Since in and the level of the noise is the same, such a replacement will be quite important and will have a similar effect for both images. This is shown by the results in Figs. 6-9. In particular, the c) panels show that GCV chooses so that the frequencies corresponding to the flat part of the spectra (i.e., those dominated by the noise) are filtered out. As a consequence, for both the Gaussian and Poissonian noises, almost the same number of coefficients are filtered. Moreover, as shown in the d) panels, the coefficients and corresponding to the coefficients with the largest modulus are very similar. From this, one can conclude that the deblurred images can be expected to be very similar regardless of the nature of the noise.
Figure 2: Comparison of the results obtained by Tikhonov coupled with GCV in the case of Poissonian and Gaussian noises (see text). The images have sizes pixels; the PSF is Gaussian with circular symmetry and dispersion set to 12 pixels. and are the rms of the true residuals calculated on the entire image and only on the pixels corresponding to the satellite, respectively.
Figure 3: As in Fig. 2 but with .
Figure 4: As in Fig. 2 but with .
Figure 5: As in Fig. 2 but with .
Figure 6: Comparison of the results obtained by Tikhonov coupled with GCV in the case of Poissonian and Gaussian noises. These figures correspond to the experiment shown in Fig. 2. Panel a): the coefficients and in decreasing order; the two horizontal lines represent the values of for the two noises. Panel b): the corresponding GCV functions. Panel c): the coefficients and corresponding to the first 2000 coefficients of shown in panel a); the vertical lines show the indices of closest to . Panel d): vs. the corresponding first 2000 coefficients of with the largest modulus.
Figure 7: As in Fig. 6 but with .
Figure 8: As in Fig. 6 but with .
Figure 9: As in Fig. 6 but with .
The reason why the two GCV curves are almost identical (see the
b) panels) is that, independently of the nature of the noise,
in Eq. (18) the quantity
can be replaced by
,
where
is
given by Eq. (8) or by a stationary white noise
process. Now, taking the expected value of the resulting
,
it is not difficult to show that
Figure 10: Relationship between the estimates and , corresponding to the Gaussian and the Poissonian noise, respectively, obtained in 100 different realizations of the experiment shown in Fig. 3. The mean value and the standard deviation are 0.128 and for , and 0.128 and for .
It is not difficult to see that similar arguments hold also when
in model (13) the norm penalization is substituted
by a smoothness one, i.e., when
An iterative approach can also be used in combination with Tikhonov regularization, i.e., to solve problem (13). Such approaches allow for the incorporation of nonnegativity constraints, while the direct approach does not. See Vogel (2002) for a thorough discussion.
To deal with the ill-conditioning of problem (9), an iterative approach can also be used. Although computationally less efficient than the direct method discussed above, iterative methods are much more flexible in that they allow for the straightforward incorporation of constraints. These algorithms provide regularization via a semiconvergence property: the iterates first reconstruct the low-frequency components of the signal, i.e. those less contaminated by the noise, and only later the high-frequency ones. In other words, the number of iterations plays the same role as the regularization parameter in the Tikhonov approach.
Semiconvergence has been rigorously proved only for a limited number of algorithms (see Lee & Nagy 2004; Vogel 2002). For the others, some theoretical results are available, but the primary evidence stems from many years of successful use in applied problems.
The prototypical iterative regularization algorithm for least-squares problems is the Landweber method (LW). This is a gradient descent algorithm for solving problem (9). In particular, it creates a sequence of iterates that, provided has full column rank, converges to the solution of Eq. (10). Because of its slow convergence, it is not frequently used in practical applications. However, because of its simplicity, it is often used in theoretical analyses of the potential performance of the LS approach.
If
is the starting image
(often
), then the iterations take the form
There are gradient descent algorithms similar to Landweber for minimizing a least-squares function subject to nonnegativity constraints (e.g., see Vogel 2002).
(23)
(24)
(25)
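The LW iteration is easy to state in code. The sketch below builds a generic ill-conditioned system from a prescribed singular spectrum (our own construction, not one of the paper's test images), runs x_{k+1} = x_k + tau * A^T (y - A x_k), and records the error history:

```python
import numpy as np

rng = np.random.default_rng(3)

# An ill-conditioned system A x = y + noise with rapidly decaying singular values.
U, _ = np.linalg.qr(rng.normal(size=(50, 50)))
V, _ = np.linalg.qr(rng.normal(size=(50, 50)))
s = np.logspace(0, -6, 50)
A = U @ np.diag(s) @ V.T
x_true = V[:, :5].sum(axis=1)            # dominated by the leading singular vectors
y = A @ x_true + rng.normal(0.0, 1e-4, 50)

# Landweber: x_{k+1} = x_k + tau * A^T (y - A x_k),  with 0 < tau < 2 / s_max^2.
tau = 1.0 / s[0]**2
x = np.zeros(50)
errors = []
for k in range(200):
    x = x + tau * A.T @ (y - A @ x)
    errors.append(np.linalg.norm(x - x_true))
```

In this small example the error decreases over the iterations shown; with a higher noise level or many more iterations the noisy small-singular-value components would eventually dominate and the error would grow again (semiconvergence).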
Figure 11: Plot of vs. r for different values of the iteration count k.
The obvious conclusion is that LW is useful only for the restoration of objects for which the low frequencies are dominant, e.g. extended objects with smooth light distributions.
Figure 12: LW deblurring of the images shown in Fig. 2, corresponding to the smallest value of the rms of the true residuals calculated on the entire image, say . is the rms calculated by using only the pixels containing the contribution of the satellite.
Figure 13: LW deblurring of the images shown in Fig. 3. and as in Fig. 12.
Figure 14: LW deblurring of the images shown in Fig. 4. and as in Fig. 12.
Figure 15: LW deblurring of the images shown in Fig. 5. and as in Fig. 12.
Figure 16: Convergence rate of the LW algorithm applied to the problem of deblurring the images shown in Fig. 2.
Figure 17: Convergence rate of the LW algorithm applied to the problem of deblurring the images shown in Fig. 3.
Figure 18: Convergence rate of the LW algorithm applied to the problem of deblurring the images shown in Fig. 4.
Figure 19: Convergence rate of the LW algorithm applied to the problem of deblurring the images shown in Fig. 5.
In the previous sections we have shown that the LS methods are relatively insensitive to the specific nature of the noise. However, this does not mean that they are optimal. In principle, methods that exploit the a priori knowledge of the statistical characteristics of the noise should be able to provide superior results.
As has been stated above, the least-squares problem (9)
corresponds to the assumption of additive Gaussian, stationary
noise statistics. In this case, the solution is
known as the maximum likelihood estimator (MLE) of the true image.
Assuming, instead, noise model (5), one can show that
the corresponding MLE is the minimizer of the negative Poisson
log-likelihood function
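The multiplicative RL update that descends this functional can be sketched as follows (a one-dimensional periodic blur with FFT-based convolution; the setup is ours). With a normalized PSF, each update conserves the total number of counts and preserves nonnegativity:

```python
import numpy as np

rng = np.random.default_rng(4)

# Richardson-Lucy on a 1-D periodic blur:  x <- x * H^T( y / (H x) ).
n = 128
t = np.arange(n)
psf = np.exp(-((t - n / 2)**2) / (2 * 3.0**2))
psf = np.roll(psf / psf.sum(), -n // 2)            # normalized, centred at index 0
H = np.fft.fft(psf)

def conv(v, K):
    return np.real(np.fft.ifft(K * np.fft.fft(v)))

x_true = np.full(n, 5.0)
x_true[60:68] = 300.0                              # star-like source on a flat sky
y = rng.poisson(conv(x_true, H)).astype(float)     # Poissonian data, model (5)

x = np.full(n, y.mean())                           # flat, positive starting image
for _ in range(100):
    # H^T is correlation, i.e. conjugation in the Fourier domain.
    x = x * conv(y / conv(x, H), np.conj(H))
print(x.sum(), y.sum())                            # total counts are conserved
```

Count conservation and positivity hold at every iterate, which is one reason RL handles star-like sources on faint backgrounds better than an unconstrained LS iteration.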
Figure 20: vs. the number of iterations. The object of interest, located in the center of the image, is a two-dimensional Gaussian with circular symmetry and dispersion (in pixels) given at the top of each panel. It is superimposed on a background whose level is of the peak value of the blurred image. The PSF is a two-dimensional Gaussian with a dispersion of 6 pixels. The size of the image is pixels. The noise is Poissonian with peak . Two deblurring algorithms are used: Richardson-Lucy (RL) and LW.
Figure 21: As in Fig. 20.
Figure 22: As in Fig. 20 but with .
Figure 23: As in Fig. 20 but with .
Figure 24: vs. the number of iterations. The object of interest, located in the center of the image, is a two-dimensional rectangular function with side length (in pixels) given at the top of each panel. It is superimposed on a background whose level is of the peak value of the blurred image. The PSF is a two-dimensional Gaussian with a dispersion of 6 pixels. The size of the image is pixels. The noise is Poissonian with peak . Two deblurring algorithms are used: Richardson-Lucy (RL) and LW.
Figure 25: As in Fig. 24.
Figure 26: As in Fig. 24 but with .
Figure 27: As in Fig. 24 but with .
Since RL exploits a priori knowledge of the statistics of photon counts, it should be expected to yield more accurate reconstructions than an approach that does not use this information. In reality, as shown by Figs. 20-27, the situation is not so clear. These figures provide the convergence rates and the performance of the RL and LW methods for objects whose size increases relative to the size of the PSF (a two-dimensional Gaussian with circular symmetry). Two different types of objects are considered: a two-dimensional Gaussian and a rectangular function. Since the first target object has an almost band-limited spectrum, whereas for the second the high-frequency Fourier components are important, their restorations represent very different problems. For both experiments, a background with an intensity of of the maximum value in the blurred image has been added. Finally, two different levels of noise have been considered, corresponding to peak S/N ratios of 30 and 40 dB, respectively. The first case yields a background with an expected number of counts approximately equal to 30, i.e., a level for which the Gaussian approximation of the Poissonian distribution is not very good.
From Figs. 20-27 it appears that, for the band-limited target, the performance of RL on objects narrower than the PSF is in general superior to that of LW. The same is not true for the other objects. Interestingly, though, for extended objects, i.e. smooth objects with high intensity profiles over large regions, the performance of RL is roughly equivalent to that of LS (to properly compare the convergence rates, one has to take into account that each RL iteration requires twice as many two-dimensional DFTs as an LW iteration). This is especially true for the images characterized by the best S/N. Motivated by these numerical results, we seek answers to the following questions: (i) why does RL perform better than LS on star-like objects; and (ii) why do the RL and LS approaches yield roughly the same results on extended objects?
In practice, when either the RL or LS approaches are used in solving image reconstruction problems, the exact computation of the maximum likelihood estimate (MLE) is not sought. For example, as was stated above, the LW iteration implements regularization via the iteration count. In fact, the objective when using LW is to stop the iteration late enough so that an accurate reconstruction is obtained, but before the reconstruction is corrupted by the noise in the high frequency components of the image. Notice, for example, that in Figs. 20-27 the relative error begins to increase at a certain point in both the RL and LW iterations.
As was stated above, one can show that the LW iterations provide
regularized solutions of
via the singular
value decomposition (SVD) of the matrix .
Unfortunately, such
an analysis of RL is impossible due to the nonlinearity in the RL
algorithm. In particular, note the Hadamard multiplication and
division in algorithm (32). Instead, we first note
that if
is an invertible matrix, RL iterations converge to the
MLE corresponding to the statistical model (5)
(see Wu 1983). The MLE is also the minimizer of the function
(28) subject to
.
When the RL
algorithm is used on image deblurring problems it exhibits a
similar convergence behavior to that of the LW iteration.
Specifically, for ill-conditioned problems, the RL iterates
provide more accurate reconstructions in early
iterations (semiconvergence property), while in later iterations
blow-up occurs (Lucy 1974; Carasso 1999). To explain why this occurs,
we note that at the positive components x_{j} of the MLE vector
,
we must have, from Eqs. (29) and (30),
that
To obtain a deeper understanding of the RL and LS
approaches and to answer the two questions posed above,
we introduce the
quantities
(36)
However, as has been shown by the simulations presented above, RL
does not outperform LW when applied to reconstructing objects
with smooth light distribution and
whose spatial extension is broader than the PSF. In order to
understand this phenomenon, consider the negative
Jacobian matrices of the quantities (34) and
(35):
Perhaps an even more important observation is that the sensitivity of the LW iteration to perturbations in is independent of both and . Consequently, the algorithm has no means of distinguishing between low and high intensity regions within the object, and hence perturbations of the same magnitude are allowed for components of corresponding to regions of both low and high light intensity. This explains why in areas of low light intensity (where the margin of error is very small) LW, and the least-squares approach in general, does poorly.
The sensitivity matrix (38) for the RL iteration is more difficult to analyze. However, some simplification is possible when one considers the problem of the restoration of a flat background or of regions of an image in which the intensity distribution varies smoothly (e.g., the interior of the rectangular function considered in the simulations). In this case it is possible to define a region where the image can be considered constant or almost constant. Because of the semiconvergence property of RL, in such regions the components of the vector converge rapidly to zero (this has been verified via numerical simulations). Thus the first term in Eq. (38) converges to zero. The same is not true for the second term. Thus, it is reasonable to expect that it will provide an accurate approximation of the sensitivity of RL within .
Provided that the spread of the PSF is small relative to the size
of ,
early in RL iterations the vector
is
approximately constant and close to ,
i.e. those pixels
values are reconstructed rapidly, within .
Hence, the
vector
is also
approximately constant within .
In addition, if we define
to be the diagonal matrix with components
As shown above, LW presents an acceptable convergence rate only in the case of the restoration of extended objects. Unfortunately, understanding the convergence properties of the RL algorithm (21) is difficult, since it is not possible to carry out an analysis similar to that done in Sect. 3.2. For this reason, we consider, again, a noise-free signal and a Gaussian PSF with circular symmetry and variance . In addition, we suppose that the object of interest is a circular Gaussian source with variance . The amplitude of the source need not be considered, since RL conserves the total number of counts. Due to the connection between the RL and LW iterations discussed in the previous section, an understanding of RL convergence may provide further understanding of the convergence of the LW iteration. For simplicity, in what follows we work in the continuous domain; the results will later be discretized.
If a Gaussian function with variance
is denoted by
,
then
If we define to be the variance of the Gaussians on the right hand side of Eqs. (44)-(46), then for Eq. (44) we have ; in Eq. (45) we have ; and in Eq. (46) we have . Consequently, only the operation (45) results in a Gaussian function with a variance that is smaller than both and .
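Writing $g_{\sigma^2}$ for a Gaussian with variance $\sigma^2$, and taking Eqs. (44)-(46) to be, respectively, the convolution, the product, and the ratio of two Gaussians (an assumption on our part, consistent with the variance statements above), the standard identities read:

```latex
g_{a} \otimes g_{b} \;\propto\; g_{a+b}, \qquad
g_{a} \cdot g_{b} \;\propto\; g_{ab/(a+b)}, \qquad
g_{a} / g_{b} \;\propto\; g_{ab/(b-a)} \quad (b > a).
```

Only the product yields a variance smaller than both inputs, since $ab/(a+b) < \min(a,b)$, in agreement with the remark that only operation (45) contracts the variance.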
If the true object
is a Gaussian with variance
,
then using Eq. (46) and the fact that
,
it is
Now, let us suppose that
.
Using Eqs. (44)-(46) one can obtain
(52)
In light of these findings, it is possible to consider some convergence properties of the RL algorithm. We begin by showing that the sequence converges to . First, note that Eqs. (49) and (50) imply that is a decreasing sequence that is bounded below by . Hence, converges to some . From inequalities (49) and (50), we have that . Furthermore, the arguments used in the proof of Eq. (50) imply that if then . Thus it must be that .
With regard to the convergence rate of the RL algorithm,
Eq. (49) shows that, almost independently of the
characteristics of the object, in the first iteration, when
,
we have
(53)
(54)
Figure 28: vs. the number of iterations for RL in the case of a Gaussian source with various values of and a Gaussian PSF with dispersion .
Figure 29: vs. the number of iterations for the cases shown in Fig. 28.
A comparison between the RL solution for star-like sources at the
kth iterate
This effect is clearly visible in the
experiment shown in Fig. 30.
Figure 30: Deblurring results obtained with RL and LW for a star-like object superimposed on an extended source given by a Gaussian with circular symmetry and dispersion set to 30 pixels. An additional background is present with a level set to the maximum of the noise-free blurred image. The PSF is a Gaussian with circular symmetry and dispersion set to 6 pixels. The size of the image is pixels. The noise is Poissonian with peak . The image presented corresponds to the result with the smallest rms of the true residuals.
In this paper we provide explanations for why, in spite of incorporating a priori information regarding the noise statistics of image formation, the RL deblurring algorithm often does not provide results superior to those obtained by techniques based on an LS approach. In particular, we have identified a possible explanation in the regularization approaches of the specific algorithms. The adoption of a penalty term in the Tikhonov approach, or the need to stop the iterations before blow-up occurs in the iterative approaches (both LW and RL), does not permit the full exploitation of the information contained in the highest Fourier frequencies, i.e., those where the specific nature of the noise has the largest influence. This has two consequences: I) the performance of the LS algorithms is almost insensitive to whether the noise is Gaussian or Poissonian; II) the RL algorithm does not fully benefit from the fact that it incorporates the specific statistical model of the noise. In other words, the regularization of the solution implies a levelling out of the possible performances. In this respect, much more than a detailed knowledge of the nature of the noise is needed. Specifically, some rough a priori information regarding the solution, e.g. whether it is an extended or a star-like object, is needed before one can know whether or not RL will provide superior results. Our numerical experiments support these conclusions. In particular, the fact that reconstructions obtained via the RL algorithm are often comparable to those of LW, i.e., an unsophisticated and very slow algorithm, indicates that resorting to advanced and often complex techniques is not always justified.
We stress that these conclusions are not only of academic interest. Compared with ML algorithms, the LS algorithms are in general much easier to implement, are more flexible concerning the incorporation of constraints, are more amenable to a theoretical analysis of their characteristics, and are computationally less costly. Consequently, unless the use of a different approach is justified, they should be considered the standard approach.