Free Access
Issue
A&A
Volume 551, March 2013
Article Number A138
Number of page(s) 14
Section Astronomical instrumentation
DOI https://doi.org/10.1051/0004-6361/201220318
Published online 08 March 2013

© ESO, 2013

1. Introduction

Ground-based instruments have now demonstrated the capability of detecting planetary mass companions (Chauvin et al. 2004; Lagrange et al. 2010; Marois et al. 2008) around bright host stars. By combining adaptive optics (AO) system and coronagraphs, some first direct detections from the ground have been possible in favorable cases, at large separations and in young systems when low-mass companions are still warm ( ≥ 1000 K) and therefore not too faint. There is a very strong astrophysical case to improve the high-contrast detection capability (105 for a young giant planet to 1010 for an earth-like planet in the near infrared) very close to stars (<0.1′′ to 1′′).

Several instruments will be capable of performing multispectral imaging and will allow characterizing the planets by measuring their spectra. This is the case of GPI (Gemini; Graham et al. 2007), Palm 3000 (Palomar; Hinkley et al. 2011), SCExAO (Subaru; Martinache & Guyon 2009), SPHERE (VLT; Beuzit et al. 2008), and several others that will follow, such as EPICS (E-ELT; Kasper et al. 2008). By combining extreme adaptive optics (Ex-AO) and more accurate coronagraphs than before, the level of star light cancellation is highly improved, leading to a better signal-to-noise ratio. Even so, the residual host star light is affected by the instrument aberrations and forms a pattern of intensity variations or “speckle noise” on the final image. Part of the speckles cannot be calibrated because they evolve on various time scales (neither fast enough to smooth down to a halo nor stable enough to remove) and for this reason, these “quasi-static speckles” are one of the main limitations for high-contrast imaging.

Several authors have discussed the challenge posed by the elimination of speckle noise in high-contrast multispectral images. It can be done by post-processing, after the best possible observations. Because images are highly spectrally correlated, one can use the wavelength dependence of the speckles to subtract them. In the particular case of coronagraphic multispectral imaging, only some empirical methods have been developed to subtract the speckle field from the image in the focal plane.

thumbnail Fig. 1

Evolution of the speckle field with the wavelength. Simulated images at 950, 1306, and 1647  nm for a 103 stellar flux over planet flux contrast. The dynamic range is adapted to the visualization. The speckle field moves with the wavelength but not with the planet position.

Open with DEXTER

We propose an alternative approach based on a parametrized imaging model for the post-processing of multispectral coronagraphic imaging corrected by an extreme AO system in the near-infrared domain. The aberrations and bright companions at small separations are estimated jointly in a Bayesian framework. In particular, it is possible to take advantage of prior information such as a knowledge on the aberration levels. This type of approach will be all the more efficient as the instruments improve with lower or more stable aberrations and more efficient coronagraphs.

In Sect. 2, we explain how previous methods used the information redundancy to suppress the speckles in high-contrast imaging. Then, we describe the advantages of a joint Bayesian estimation of the aberrations in the pupil plane and of the planet map, based on a parametrized model of coronagraphic imaging. Section 3 presents the long-exposure coronagraphic imaging model that is used to simulate the images and to restore them. We also study the case of an approximate model. Section 4 describes the proposed Bayesian joint estimation method as well as theoretical and numerical problems of an alternating restoration algorithm. In particular, we address the strong minimization difficulties associated to the aberration estimation and we propose some solutions. In Sect. 5, our method is validated by restoring images simulated with a perfect coronagraph.

2. Post-processing speckle subtraction and multispectral imaging

Several empirical post-processing methods have already been proposed to overcome the problem of detection limitation caused by the quasi-static speckles. Some of these methods used the wavelength dependence of the speckle pattern (Fig. 1) to estimate it and subtract it from the image, while preserving both the flux and spectrum of the planet.

Racine et al. (1999) suggested to subtract two images at different wavelengths to eliminate both the point-spread function (PSF) and the speckle field in non-coronagraphic images. The main limitation of this simultaneous differential imaging (SDI) method comes from the residuals caused by the evolution of the general PSF profile and of the speckle pattern with wavelength. These residuals can be reduced by increasing the number of images used for the speckle field subtraction. Marois et al. (2000) showed with their double difference method that adding another image to the SDI theoretically improves the signal-to-noise ratio in the final image of the restored companion. The case of multispectral images has been tackled by Sparks & Ford (2002), who described the so-called spectral deconvolution method in the framework of space-based observations for an instrument combining a coronagraph and an integral-field spectrometer (IFS). The method, subsequently improved and tested on ground-based non-coronagraphic data by Thatte et al. (2007), is entirely based on a speckle intensity fit by low-order polynomials as a function of wavelength in the focal plane. More recently, Crepp et al. (2011) combined this method with the LOCI algorithm, which is based on a linear combination of images (Lafrenière et al. 2007). They tested this approach to restored on-sky images from the Project 1640 IFS on the Palomar telescope. These methods are applicable to any optical system and in particular to those with coronagraphs. However, it is challenging to prevent the planet signals from being eliminated with the speckles because the planet presence is not explicitly modeled.

In addition, some information on the measurement system can be very useful to distinguish a planet from the speckle field. Burke & Devaney (2010) combined classical empirical techniques of differential imaging with a multi-wavelength phase retrieval method to estimate the aberration pattern in the pupil plane with a simple imaging model without a coronagraph. This multi-wavelength phase retrieval is nicknamed wavelength diversity (Gonsalves 1982). Information diversity is obtained here by different wavelengths whereas it is obtained by introducing a known phase, e.g. defocus, in phase diversity. But in contrast to the phase diversity, the wavelength diversity does not remove the phase sign ambiguity. In Burke & Devaney (2010), the inversion algorithm is based on a maximum-likelihood estimator, which measures the discrepancy between the data and an imaging model. The minimization of this estimator is all the more difficult as the number of unknowns to estimate is high. This problem is overcome by the sparse parametrization of the unknown phases φλ through the optical-path-errors (or aberrations) δ, assuming that the former are achromatic: . This allows one to exploit jointly the images at all wavelengths to estimate the aberrations efficiently: the map of the unknown optical-path-errors δ is common to all wavelengths. The number of unknowns is thus limited and the problem constrained. In the present case, Burke’s wavelength diversity method does not apply readily, because it assumes non-coronagraphic imaging, whereas we consider the highly non-linear case of a coronagraphic imaging model.

That is why we propose to take advantage of a combined use of wavelength diversity applied in a case of a coronagraphic imaging model, and a Bayesian inversion to jointly estimate the aberrations in the pupil plane and the planet map. The joint estimation aims at taking up the challenge of preserving the planets signal. An advantage of the Bayesian inversion is that it can potentially include an important regularization diversity to constrain the problem, using for example prior information on the noise, the planet map (position, spectrum, etc.) or the aberrations. In the Bayesian framework, the criterion to be minimized is the sum of two terms: the data fidelity term, which measures the distance between the data and the imaging model, and one or more penalty terms. An important difficulty is to define a realistic coronagraphic imaging model, that depends on parameters (e.g. aberrations) that can be either calibrated beforehand or estimated from the data.

3. Parametric model for multispectral coronagraphic imaging

To carry out the Bayesian inversion, we need a parametric direct model of coronagraphic imaging. This direct model will also be useful to simulate our test images.

thumbnail Fig. 2

Optical scheme of a coronagraphic imager. The upstream and downstream static aberrations, and the adopted notations are denoted δu and δd denote focal plane complex amplitudes, whereas Ψi(ρ) denotes pupil plane amplitudes.

Open with DEXTER

We used a non-linear analytical expression of the coronagraphic image as proposed by Sauvage et al. (2010), with an explicit role of the optical aberrations before and after the coronagraph, and turbulence residuals. This model assumes that the coronagraph is “perfect” in the sense that the coherent energy is perfectly canceled out. The presence of upstream aberration however, will result in remaining intensity from the star in the image. The aberrations, or optical-path-errors, δ, are assumed to be achromatic as an approximation. The most recent spectro-imagers take increasing care to avoid any source of chromatism, such as out-of-pupil aberrations, down to a level compatible with contrasts higher than 106. The variable represents the angular position in the focal plane in radians and the variable is the angular position in the pupil plane in radians-1. Finally, corresponds to a spatial position in the pupil plane in meters.

We recall and discuss this model below. In particular, we estimate its simplified expression in the asymptotic case of very low phase, with its second-order Taylor expansion. This simplified expression helps to understand the explicit way in which each type of aberration impacts the image. It also helps to identify some important ambiguities with different sets of phases that can produce similar images, which will guide the subsequently selected approach for phase retrieval. We also estimate the departure from this low-phase approximation when the phase grows and discuss the validity of this approximation in a SPHERE-like case.

3.1. Imaging model

We assume that for an AO-corrected coronagraphic image at the wavelength λ, the direct model is the following sum of three terms, separating the residual coronagraphic stellar halo, the circumstellar source (for which the impact of coronagraph is neglected), and noise nλ: (1)where is the data, i.e., the image to which we have access, is the star flux at wavelength lambda and , the non-coronagraphic PSF, which can be estimated separately. Solving the inverse problem is finding the unknowns, namely the object  and the speckle field , which we also call the “coronagraphic PSF”.

3.2. Long-exposure coronagraphic PSF model

A model description of directly depends on the turbulence residuals and optical wave-front errors. After previous works to model non-coronagraphic PSFs (Perrin et al. 2003) and coronagraphic PSFs (Cavarroc et al. 2006; Soummer et al. 2007), Sauvage et al. (2010) proposed an analytical expression for the coronagraphic image with a distinction between upstream and downstream aberrations (cf. Eq. (16) in Appendix B). The considered optical system is composed of a telescope, a perfect coronagraph, and a detector plane (cf. Fig. 2). Some residual turbulent aberrations δr(ρ,t) are introduced in the telescope pupil plane. δr(ρ,t) is assumed to be temporally zero-mean, stationary and ergodic. Because we only consider exposure times that are long with respect to turbulence timescales, these turbulent aberrations contribute only through their statistical spatial properties: power spectral density Sδr(α) or structure function Dφr. The static aberrations are separated into two contributions: the aberrations upstream of the coronagraph δu(ρ), in the telescope pupil plane and the aberrations downstream of the coronagraph δd(ρ) in the Lyot Stop pupil plane . The perfect coronagraph is defined as an optical device that subtracts a centered Airy pattern of maximal energy from the electromagnetic field. Finally, the coronagraphic PSF depends on three parameters that define our system: the aberration maps δu, δd and, the residual phase structure function Dφr.

3.2.1. Approximate long-exposure coronagraphic model in the low-phase regime

Because the analytical expression for is a highly non-linear function of the aberrations (Sauvage et al. 2010), we derived and studied the relevance of an approximate expression for this model (Ygouf et al. 2010). Approximate coronagraphic imaging models have been derived in several works. Cavarroc et al. (2006) have developed a short-exposure expression and showed by simulations that the main limitation comes from the static aberrations and particularly the aberrations upstream of the coronagraph. Here, we consider a long-exposure imaging model and confirm analytically the dominance of the upstream aberrations. Soummer et al. (2007) have developed a two-term expression with one static and one turbulent term. Nevertheless, these terms are not explicitly linked to the aberrations, which is what we are interested in.

thumbnail Fig. 3

Speckle non-centrosymmetry. Evolution of the level of residuals with respect to the rms value of the upstream quasi-static aberrations in an image simulated with the model of Sauvage et al. These residuals correspond to the antisymmetric part of the image.

Open with DEXTER

Assuming that all phases are small and that the spatial means of φu(ρ) and φd(ρ) are equal to zero on the aperture, we derive a second-order Taylor expansion of expression 24 of Sauvage et al. (2010): (2)where and are the Fourier transforms of the downstream pupil and upstream aberrations and P[δr(λρ,t)] denotes the piston of the aberration map δr(λρ,t). is a corrective term that compensates for the fact that δr(λρ,t) is stationary and thus non-piston-free on the aperture at every instant. is the Airy pattern formed by the pupil .

This approximate expression brings physical insight into the long-exposure coronagraphic PSF model of Sauvage et al.:

  • The speckle pattern scales radially in λ within the approximate model and evolves as 1/λ2 in intensity in the data cube. It is consistent with the analysis of Sparks & Ford (2002), who performed fits of low-order polynomials as a function of the wavelength after rescaling radially.

  • The approximate expression can be separated into one static and one turbulent term. This is consistent with the analysis of Soummer et al. (2007) with the advantage that these terms depend on the parameters of interest. The turbulent term is simply the turbulent aberration power spectral density, as seen at the resolution of the instrument, i.e., convolved by the output pupil Airy pattern. The static term is explicitly a function of the upstream aberrations.

  • The downstream aberrations do not appear in the static term. This confirms that the role of the aberrations upstream and downstream of the coronagraph is very different and that upstream aberrations are dominant in the final image.

  • Four equivalent upstream aberration sets, δu(ρ), δu(−ρ),  − δu(ρ) and  − δu( − ρ), which we call quasi-equivalent aberration maps in the following, lead to the same image (cf. Appendix A). This item is discussed in more detail in Sect. 4.4.2.

  • By using this approximate expression for in the imaging model (1), we can see that there is a degeneracy between the value of the star flux and the rms value of the aberration map, if there is no turbulence. Indeed, without turbulent aberrations, the approximate model multiplied by the star flux can be written as (3)This is discussed in Sect. 4.4.1 in greater depth.

3.2.2. Discussion

Because of the complexity of the long-exposure coronagraphic PSF model of Sauvage et al., we first considered using the approximate model in our inversion algorithm to decrease the number of unknowns to estimate and simplify the criterion to minimize. But a study of this approximate model showed that the resulting image is too different from the one simulated with the Sauvage et al. expression: computing the root mean square of the difference between the two images leads to a substantial error of typically 30% in SPHERE-like conditions (Ygouf et al. 2010).

thumbnail Fig. 4

Speckle non-centrosymmetry. In the same dynamic range: (left) simulated image, (center) 180 degree rotated version of the image and (right) residuals after combination of the image with the 180 degree rotated version of itself. The gain on the rms value after the subtraction is about 6.

Open with DEXTER

thumbnail Fig. 5

Speckle dilation. In the same dynamic range: (left) image at 950 nm rescaled at 1650 nm and multiplied by the γ coefficient, image at 1650 nm (center) before and (right) after speckle subtraction. The gain on the rms value after the subtraction is 2.5.

Open with DEXTER

Speckle non-centrosymmetry.

A substantial part of this error arises because in the approximate model, the quasi-static speckles are purely centrosymmetric. Thus, if we eliminate the centrosymmetric part of the image by combining it with a 180 degree rotated version of itself as follows: (4)there are no residuals with this model, i.e. iantisym = 0. But this is not the case with a more physically realistic image such as the one simulated with the model of Sauvage et al. The level of residuals after such a subtraction is determined by the quantity of upstream aberrations, as we can see in Fig. 3. For example, with 30 nm rms of upstream aberrations, the level of residuals is about six times lower than the rms value of the simulated image (cf. Fig. 4). Thus, for a significant quantity of upstream aberrations, using the model of Sauvage et al. rather than the approximate model for an inversion, should lead to fewer residuals on the final image.

Speckle dilation.

Another difference between the two models tips the balance toward the model of Sauvage et al.. Indeed, in the approximate model the speckle dilation is powered by the 1/λ2 factor. If we subtract an image at 950 nm from its 1650 nm-dilation, there are no residuals with this model. The same operation with the model of Sauvage et al. (cf. Fig. 5) shows some residuals, at a level 2.5 lower than the rms value of the simulated image, which attests that it is not a pure speckle dilation. To do this, we took the images at the minimum and maximum wavelengths and rescaled the image at 950 nm with respect to the image at 1650 nm. Finally, we performed the following spectral differences between the two images: idiff1650 = i1650   nm − γi950   nm, where γ is a coefficient that minimizes the squared difference |imax − γimin|2, and is given by Cornia et al. (2010): (5)A fine model of the speckle field must be able to account for deviations form centrosymmetry and for deviations from a radial scaling of the speckles proportional to the wavelength. That is why we adopted the model of Sauvage et al. rather than the approximate one in the inversion.

3.2.3. Assumptions on the long-exposure coronagraphic PSF model

The information we obtained from the approximate model study helped us define some key assumptions for the success of the speckle field estimation with the Sauvage et al. long-exposure coronagraphic PSF model.

Because they have quite a different impact on the final image, it is important to distinguish the aberrations upstream and downstream of the coronagraph. The effect of the downstream aberrations is lower than that of the upstream aberrations, and furthermore, in predicted systems such as SPHERE, they are expected to be much more stable and easier to calibrate than upstream aberrations. Additionally, because we consider long-exposure images, the residual turbulent aberrations will be averaged to form a smooth halo that is easily distinguishable from a planet. Furthermore, the statistical quantity Dφr, which characterizes this halo, will be measured through the AO system wavefront sensor (Véran et al. 1997). Therefore, we here assumed that both the static downstream aberrations and the residual turbulent aberrations are calibrated and known. This decreases the number of unknowns because the only aberration map to estimate in order to access the coronagraphic PSF is the quasi-static upstream aberrations. We therefore denote the long-exposure coronagraphic PSF by instead of to underline the fact that δd and Dφr are assumed to be known.

An advantage of our approach is that these assumptions can evolve. The formalism will allow us to refine our method if we finally decide to estimate either the downstream aberrations or the residual turbulent aberrations. Thus, we can slowly increase the complexity of the problem in anticipation of using real data from SPHERE or from another instrument.

4. Joint estimation of wavefront and object algorithm and minimization strategy

This section introduces the criterion to be minimized (Sect. 4.1) as well as the regularization elements that were used to constrain the problem for the present validations (Sect. 4.2). The minimization algorithm is then described, stressing the two stages that constitute its core (Sect. 4.3). One of these stages presents some convergence difficulties. A minimization strategy is described in Sect. 4.4. The chosen optimizer in described in Sect. 4.5.

4.1. Definition of the criterion to be minimized

Following the Bayesian inverse problem approach, solving the inverse problem consists of finding the unknowns, firstly the object characteristics , secondly the parameters of the speckle field and , which are the most likely given the data and our prior information about the unknowns. This can be reduced to minimizing the following criterion: (6)This criterion is the sum of two terms: the data fidelity term, which measures the distance between the data and the imaging model, and a non-exhaustive list of regularization terms on our unknowns Ro, Rf ∗ , Rδ. We consider that the noise is the sum of a photonic contribution and a detector contribution, that it is white and approximately Gaussian, which is a valid approximation for the flux as considered in this application. The noise variance is assumed to be known here and if it were not, it could be estimated as  (Mugnier et al. 2004), where  is the photon noise variance and is the detector noise variance previously calibrated. It is assumed that the noise is not correlated from pixel to pixel or between images.

The star flux at each wavelength can be analytically estimated from the criterion provided the regularization on flux is quadratic or absent. In the latter case, the maximum likelihood solution being given by , we obtain: (7)Thanks to this analytical expression, the criterion to be minimized is that of Eq. (6) with replaced by , which will be denoted by and depends explicitly on oλ and δu only.

Estimating both the object and the aberrations from a single image is a highly underdetermined problem. Any diffraction-sized feature in the halo can be interpreted either as a circumstellar point-source or as a part of the stellar speckle halo. This ambiguity can be decreased by using multispectral images but it is not sufficient. It is thus necessary to regularize the problem by adding more constraints. This is the role of the regularization terms. In this paper, we study the case of constraints on the object that is sufficient for the simulated images we used, but we should keep in mind that it is also possible to use constraints on the aberration map.

4.2. Regularization terms and constraints

We describe below the regularization terms that we used for the validation tests of this paper. Other regularizations could be chosen depending on the kind of images to be processed.

4.2.1. Regularization on the object Ro

A regularization on the object is fundamental to help compensating for the degeneracy that exists in the inversion between the aberrations and the object. By penalizing the energy in the object map, it favors the energy in the aberration map and prevents the speckles from being mistaken for a planet.

The regularization term Ro includes the prior spatial and spectral information we have on the object. We chose here an L1-L2 white spatial regularization, which assumes independence between the pixels (Meimon et al. 2009) because we are mainly looking for point sources. We used an L1-L2 regularization rather than a true L1 regularization to keep a differentiable criterion, which simplifies the minimization problem. The spectral prior is based on smoothness of the object spectrum. We currently assume that the object is white (constant spectrum) but as the final aim is to extract some spectra, for future validations we will use a L2 correlated spectral regularization (Thiébaut & Mugnier 2006), which will involve the differences between the spectrum at neighboring wavelength at each pixel and will enforce smoothness on the object spectrum.

The regularization on the object is necessary to obtain a sparse object. Without the regularization many residuals remain on the estimated object.

4.2.2. Positivity and support constraints on the object

The object intensity map is a set of positive values, which is an important prior information. One should therefore enforce a positivity constraint on the object. This constraint can be implemented in various ways, such as criterion minimization under the positivity constraint, reparameterization of the object, or explicit modification of the a priori probability distribution (e.g., addition of an entropic term). Mugnier et al. (2004) have found that the best way to ensure positivity, with respect both to speed and to not introducing local minima, is to directly minimize the criterion under this constraint. We proceeded similarly.

Because the star light is concentrated around the optical axis, the flux is essentially estimated on this very bright region (cf. Eq. (7)). But if there are too many residuals on the object, the flux estimation can be biased. Thus, imposing, as is physically meaningful, that the object is null very close to the star, in a region of typically 3λ/D radius, helps us estimate the star flux accurately.

4.2.3. Regularization on the star flux Rf

To make the minimization more robust, it can be useful to constrain the flux estimation to physical values. Indeed, if there are no turbulent aberrations, there is an indetermination between the star flux value and the phase rms value in the approximate model of Eq. (13). Consequently, for aberrations with very low rms values, the coronagraphic PSF is close to zero and thus the analytical flux estimate given by Eq. (6) can diverge. The presence of known turbulent aberrations naturally constrains the flux value, but to make the method more robust, we chose to prevent the flux from diverging. In practice, we regularized the flux estimation by the following quadratic metric: . This metric can be interpreted as a Gaussian prior low on the flux, but its role is not as essential for the criterion minimization as that of the L1-L2 regularization and the positivity constraint. This leads to the following expression for the analytic star flux: (8)In practice, we chose a very high standard deviation σf,λ = 100    ×  ∑ αiλ, to avoid biasing the flux. With this standard deviation, we can choose any mean flux, for example, f0 = 0. This is sufficient to avoid the division by zero in the flux computation and thus the flux divergence.

4.3. Iterative algorithm

The structure of the joint criterion of Eq. (6) prompted us to adopt an estimation of wavefront and object that alternates between estimation of the aberrations, assuming that the object is known (multispectral phase retrieval) and estimation of the object assuming that the aberrations are known (multispectral deconvolution).

Multispectral deconvolution.

For given aberrations, we define the following intermediate data where the (assumed known) stellar halo is subtracted, keeping then only the circumstellar object as seen in classical imaging: . By inserting these intermediate data into the the criterion of Eq. (6), we obtain: (9)which shows that the problem at hand is a non-myopic multispectral deconvolution of images . The chosen regularization leads to a convex criterion (Mugnier et al. 2004) and thus to a unique solution for a given set of aberrations and a given object regularization.

Phase retrieval.

If we replace the intermediate data into the the criterion of Eq. (6), we obtain: (10)which shows that the problem at hand is essentially a phase retrieval problem. In this phase retrieval stage, the combination of a high number of parameters to estimate (typically 103, see Sect. 5) and of a highly non-convex criterion complicates the problem. To avoid local minima, several numerical solutions resulting from a fine understanding of the imaging process are necessary and are described below.

4.4. Phase retrieval: dealing with local minima

4.4.1. Choice of an appropriate starting point: very small random phase

To keep the computation time reasonable, we used the local descent algorithm described in Sect. 4.5 to minimize the criterion. Because the latter is highly non-convex, the chosen starting point so that we are fully in the conditions where the Taylor expansion developed in 3.2.1 is valid and where the criterion is less non-convex. It allows the algorithm to avoid many wrong directions, and thus many local minima. As the algorithm converges, the upstream aberration rms value increases toward its true value and a gradual non-linearity of the model is gradually introduced.

We tested the phase retrieval capability of our algorithm with respect to the chosen starting point, assuming that there is no object to estimate. We give the rms value of the estimated upstream aberration map and the rms value of the difference between the simulated and the estimated maps estimated as follows: (11)The inversion is performed with one spectral channel and without turbulence. Figure 6 compares some estimated upstream aberration maps (a, b, c) to the simulated one (“true”):

  • (a)

    Using a random aberration map with the same rms value as thetrue aberrations (30 nm at950 nm) as a starting point does not help infinding the global minimum. Indeed, the algorithm convergesvery quickly toward a local minimum and the estimatedaberration map (rmsa = 307 nm at 950 nm) is completely different from the simulated one (rmsdiff,a = 103%).

  • (b)

    Using a zero-aberration map as a starting point does not work either. This is probably because the approximate model is an even function. For this particular starting point, the gradient is null, which leads to some convergence difficulties. The rms of the difference between the two maps is about 1.4 × 104%. The estimated aberration pattern (rmsb = 4069 nm at 950 nm) seems to show that the algorithm does not explore the high frequencies.

  • (c)

    The solution we propose is to use as a starting point for the minimization a non-null random aberration map with a low rms value compared to those of the “true” simulated aberration map. In practice, we chose an rms value about 108 times lower than the “true” value. This leads to a correct estimation of the aberration map (rmsc = 30.2 nm at 950 nm) with an rms of the difference between the two maps of about 0.6%.

If we plot the same results for images simulated with turbulence, the conclusions are not the same. The convergence to an aberration map that resembles the true one and has similar rms value is easier:

  • (a’)

    By using a random aberration map with the same rms value asthe true aberrations (30 nm at950 nm) as a starting point, the rms value of theestimated aberration map (30.7 nm) is close tothe true value. After a careful inspection, the estimated aberrationmap turned out to be similar to the opposite of the simulated one.This is discussed in more detail in 4.4.2.

  • (b’)

    Using a zero-aberration map as a starting point leads to a good pattern and a good rms value of aberration map (30.1 nm).

  • (c’)

    Using a non-null random aberration map with a low rms value as a starting point also leads to a good pattern and a good rms value of the aberration map (30.1 nm).

For the (b’) and (c’) cases, the difference between the two maps is bigger than before (17%) but the estimated aberration map is a sufficiently good starting point for the alternating minimization.

The choice of an appropriate starting point seems not to be as essential with turbulent aberrations as it was without turbulent aberrations. Indeed, the presence of turbulent aberrations raises the ambiguity that exists between the value of the star flux value and the rms value of the upstream aberration map. Nevertheless, even if we assume that there are turbulent aberrations in the following, we chose to use a random aberration map with a low rms value as a starting point of the phase retrieval because it allows us to avoid some local minima by linearizing the highly non-linear model used in the inversion.

thumbnail Fig. 6

Choice of an appropriate starting point. Estimated upstream aberration maps with one spectral channel for three different starting points. [i] Without turbulent aberrations in the simulated images. [ii] With turbulent aberrations in the simulated images. From left to right, with a dynamic range adapted to the visualization: “true” simulated aberration map, (a) and (a’) estimated aberrations with a random aberration map (rms value of the simulated aberrations) as starting point, (b) and (b’) estimated aberrations with a zero aberration map as starting point, and (c) and (c’) estimated aberrations with a random aberration map (rms value 108 times lower than the “true” one) as starting point. The estimation is performed with a regularization on the star flux.

Open with DEXTER

thumbnail Fig. 7

Estimated upstream aberrations for the four quasi-equivalent aberration maps as starting points. From left to right, with the same dynamic range: , , , . The image simulation is performed with one spectral channel in the presence of turbulent aberrations.

Open with DEXTER

4.4.2. Avoiding some local minima by testing quasi-equivalent starting points

In the approximate model, four different aberration maps can give the same image (cf. Eq. (15) in Appendix A). This means that, from a given starting point, the minimization algorithm can take four different but equivalent directions from the approximate model point of view. But from the point of view of the model used in the inversion, this is not the case because it depends on downstream aberrations, which break the symmetry. That is why we call them “quasi-equivalent” aberrations maps (cf. 3.2.1).

The idea is then to explore the several regions offered by the four different quasi-equivalent aberrations maps to determine which of these solutions gives the smallest criterion. To do this, we performed an initialization step where the very small random phase is taken as a starting point. A first phase retrieval stage was performed with this starting point, leading to a first estimated aberration map denoted by . Then, the three other quasi-equivalent aberration maps , and were taken as starting points for three other phase retrieval stages. This led to three more estimated aberration maps denoted by , and .

Figure 7 shows the four estimated aberration maps at the end of the initialization step. These estimated aberration maps are compared to the simulated one (Fig. 6d). The final aberration map chosen as a starting point for the alternating algorithm is the one that gives the minimum value for criterion J of Eq. (6): (12)In practice, quite often and in particular in this simulation, it turns out that the chosen set of aberrations is also the one whose rms value (rmsb’ = 30.1 nm at 950 nm) is closest to the ‘true” phase (Fig. 6i. “true” and Fig. 6ii. “true”).

thumbnail Fig. 8

Block diagram of the algorithm used for the joint estimation of the object map and the upstream aberrations.

Open with DEXTER

4.4.3. Avoiding some local minima in the multispectral inversions by taking the previously estimated aberration map as starting point

We used a local descent algorithm to minimize the criterion. For this, the gradients are computed for each explored direction and the computation is all the longer as there are spectral channels. That is why it is useful to perform several inversions by gradually increasing the number of spectral channels. This also helps us avoid some local minima. We begin by an inversion with one spectral channel. Then, we add some more spectral channels for new inversions and each time, we take the previous estimated aberration map as a starting point.

Because an inversion with only one spectral channel sometimes leads to a local minimum, it is useful to also test the four quasi-equivalent starting points with two spectral channels (cf. Sect. 4.4.2).

4.5. VMLM optimizer

To minimize the criterion, we chose the variable metric with limited memory and bounds (VMLM-B) method (Thiébaut 2002). Updated from the BFGS variable metric method (Press et al. 2007), it is usable for a problem of large dimensionality. Moreover, it offers the possibility to constrain these parameters. This makes this method a good tool for many inversion problems in high angular resolution (Meimon et al. 2009; Gratadour et al. 2005). It is available from http://www-obs.univ-lyon1.fr/labo/perso/eric.thiebaut/optimpack.html.

4.6. Summary of the developed algorithm

Figure 8 summarizes the different steps of the developed algorithm. The choice of a very small random phase as a starting point is essential because it avoids falling into some local minima (Sect. 4.4.1). An initialization phase is performed. It consists in running the algorithm for the four quasi-equivalent solutions (Sect. 4.4.2). The solution that leads to the lowest criterion value is selected. Then, the minimization core is performed, alternating between the aberration estimation, assuming that the object is known (multispectral phase retrieval), and the object estimation, assuming that the aberrations are known (non-myopic multispectral deconvolution). Regularization terms and constraints prevent the algorithm from falling into other local minima (Sect. 4.2.3). Iterations are performed (Sect. 4.3) until the stopping rule of the algorithm is verified. The chosen optimizer is the VMLM-B (Thiébaut 2002, Sect. 4.5).

5. Validation of the inversion method by simulations

In this section, we validate the exoplanet detection capabilities of our inversion method. After giving the numerical simulation conditions, we investigate the estimation quality of the aberrations and the object as a function of the number of images at different wavelengths used. We also study the algorithm robustness with respect to the simulated images and with respect to the starting point we use. Finally, we study the effect of the bandwidth on the quality of the object estimation.

5.1. Simulation hypothesis

From a data cube of six images simulated with the image formation model of Eq. (1) and the Sauvage et al. (2010) analytical expression for coronagraphic imaging (cf. Eq. (16) in Appendix B), we jointly estimated the speckle field and the object map. We chose pixel indicator functions as the basis for the phase rather than, e.g., a truncated basis of Zernike polynomials, to model and reconstruct phases with a high spatial frequency content. The hypotheses are typical of a SPHERE-like instrument: upstream δu and downstream δd aberrations simulated with standard deviation of 30 nm (cf. their power spectral densities in Fig. 10), star-planet angular separations of 0.2 and 0.4 arcsec, contrasts, i.e. ratio of star flux over planet flux of 105, 106 and 107, a [950 nm; 1647 nm] spectral bandwidth and an integrated flux of 4 × 1011 on the data cube in presence of photon noise and a transmission (throughput and quantum efficiency) of 10%, corresponding to the observation of a 6-magnitude star for 25 min with the VLT. We used 128 × 128 pixels to simulate our images, Shannon-sampled at 950 nm. This results in a number of unknowns to estimate for the aberration map of about 3 × 103. If we add the unknowns to estimate for the object map, which is 16 × 103, the total number of unknowns is about 2 × 104. Figure 9 shows the simulated objet map (9(a)) and the associated image in the focal plane (9(b)). For an easier visualization, we represent the images in the focal plane and not the object map in the following. Figure 9 shows the simulated aberration map (9(c)) and the associated image of the speckle field in the focal plane (9(d)).

thumbnail Fig. 9

Simulated images at λ = 950nm. a) Simulated object map and b) associated image in the focal plane. The following planets are simulated: one with a star-over-planet contrast of 105 at a separation of 0.2 arcsec (planet 1), two with star-over-planet contrast of 106 at separations of 0.2 (planet 2) and 0.4 (planet 3), respectively, and one with a star-over-planet contrast of 107 at a separation of 0.4 arcsec (planet 4). The image in the focal plane is obtained by convolving the object map oλ by the non-coronagraphic psf . c) Simulated aberrations and d) associated image of the speckle field in the image focal plane. The image is given by the coronagraphic PSF .

Open with DEXTER

thumbnail Fig. 10

Power spectral densities of the simulated aberrations. The upstream and downstream aberrations are randomly generated according to a f-2 spectrum and with an rms value of 30 nm. The downstream aberrations are not corrected by the adaptive optics. The upstream aberrations are corrected by the adaptive optics up to 20 cycles/pupil. The residuals are due to the non-common path aberrations. Below 4 cycles/pupil, these non-common path aberrations are corrected but there are some residuals from rotative optics.

Open with DEXTER

5.2. Algorithm robustness and performance studies

We jointly estimated the upstream quasi-static aberration map and the object map with multispectral data. To study the robustness of the method we have developed, we ran several simulations in a Monte Carlo-like manner. Both different simulated images (5.2.1) and different starting point (5.2.2) were used for the inversion. The results of the inversions with two and six spectral channels were compared to study the effect of the redundancy of information offered by the multispectral imaging.

5.2.1. Different simulated phases

We applied our method to ten images simulated with different random upstream aberration maps to assess the algorithm robustness. Figure 11 shows the estimated images of the object from these images for a two-spectral channel inversion (a) and a six-spectral channel inversion (b). These results bring several conclusions, both on the robustness of the method and the multispectral redundancy.

thumbnail Fig. 11

Robustness study on the simulated phases. With the same dynamic range, at 950 nm: estimated planet images from different images, simulated with ten different randomly generated upstream aberration maps.

Open with DEXTER

Robustness of the method.

In one out of ten cases, some residuals from the speckle field remain on the object. The level of this residual prevents one from detecting any planet. In the other nine cases, the level of the residuals is negligible, which allows one to detect at least one planet. These observations, independent of the number of spectral channels used for the inversion, shows that the method is relatively robust, given the minimization difficulties we met.

Multispectral redundancy.

We can see some significant differences between the two-spectral channel inversion and the six-spectral channel inversion:

  • The planet with a contrast of 105 is detected in eight out of ten cases with two spectral channels and in nine out of ten cases with six spectral channels.

  • Only one planet with a contrast of 106 is detected with two spectral channels in three out of ten cases. The same planet is detected in nine out of ten cases with six spectral channels.

  • The other planet with a contrast of 106 is detected in three out of ten cases with six spectral channels.

  • Furthermore, the estimation of the flux of the planets is more accurate with six spectral channels. The inversion shown in Fig. 12 is representative of a large body of simulated tests and is performed with two and six spectral channels. The simulated and estimated image of the object maps for these two different inversions are represented as the errors on the estimated planet flux when the planet is effectively detected.

These results show that multispectral redundancy helps us detect the planets. However, we note that even if the convergence problems were solved with our minimization strategy in most cases, there are still challenges to be overcome to arrive at a perfectly robust algorithm.

thumbnail Fig. 12

Multispectral redundancy. With the same dynamic range, at 950 nm, from left to right: simulated planet image and estimated ones with a two- and six-spectral channel inversion. The errors on the planet flux estimations are computed when the planet is detected.

Open with DEXTER

5.2.2. Different starting points

For each of the ten previous simulated phases, we took ten different random aberration maps with the same level of rms value as starting points to demonstrate than any random aberration map, provided it is small, allows us to find a good solution. Figure 13 shows the estimated images of the object for the ten different random aberration maps, used as starting points, for one phase.

The starting point does not have a strong impact on the planet detection. In all cases, the three brightest planets are detected. In only one out of ten cases, a false planet is detected. Despite all the attention given to the initialization of the algorithm, the choice of starting point can have, to a limited extent, an effect on detection.

thumbnail Fig. 13

Robustness study on the aberrations maps used as starting points. With the same dynamic range, at 950 nm: estimated planet images from one image with different random aberration maps used as starting points.

Open with DEXTER

5.3. Bandwidth effect

We study here the bandwidth effect on the planet image estimation quality. We selected the following spectral bandwidths: [950 nm; 1050 nm], [950 nm; 1150 nm], [950 nm; 1350 nm], and [950 nm; 1650 nm]. The two last bandwidths correspond to operational modes of the SPHERE-IFS instrument, whereas the other bandwidths are produced for the purpose of our study’s completeness. Figure 14 compares the estimated planet image maps for these different inversions. If the planet with a contrast of 105 is detected each time, only the two broader bandwidths (700 and 400 nm) allow one to detect the two planets with a contrast of 106. With a 200 and a 100 nm bandwidth, the planet with a contrast of 106 which is at a separation of 0.4 arcsec is detected, whereas the one that is at a separation of 0.2 arcsec is not.

The detection performance increases with the bandwidth because the incorporated spectral information helps the algorithm to distinguish between speckles and planets. Indeed, from one wavelength to another, the amplitude of a speckle movement is proportional to the wavelength difference and to the radial position of the speckle. For a given spectral bandwidth, it is therefore easier to detect a planet that is far away from the star than a planet than is close to the star.

thumbnail Fig. 14

Bandwidth effect. With the same dynamic range, at 950 nm, from left to right: estimated planet images with a 700, 400, 200, and 100 nm-bandwidth inversion.

Open with DEXTER

6. Conclusion

We have proposed an original method that jointly estimates the object (multispectral deconvolution) and the aberrations (multispectral phase retrieval) for the new generation of planet finders. For the first time, a fine parametric model of coronagraphic imaging, describing the instrument response, is used for the inversion of simulated multispectral images in a solid statistical framework. Even though the model remains a simplification of reality, in particular when assuming achromatic wavefront errors, it goes much further than only assuming that the spatial speckle pattern essentially scales with wavelength. We have shown that the second-order approximation of the imaging model has the same behavior as the often-used model for the problem at hand: the speckle pattern is centrosymmetric, it scales with the wavelength, and it is not dependent on the downstream aberrations. The departure from this case quickly becomes very significant as the phase grows. It is clear that the ability of a finite phase to produce non-symmetric speckles induces a strong ambiguity between the estimates of the aberrations and the object, even if the latter is a point-like companion.

To set up our method, we developed an iterative algorithm. With only one spectral channel, this joint estimation is an underdetermined problem, as we emphasized before. This underdetermination results mathematically in a degeneracy of the global minimum. A multispectral inversion raises this underdetermination but it is still possible to fall into local minima. Because of the high non-linearity of the coronagraphic imaging analytical model and the number of unknowns to estimate (about 103 in our case), the phase retrieval, even if it is multispectral, remains a difficult problem. We set-up a minimization approach that remains quite fast (without systematically exploring the whole parameter space in a search for global minimum) but that is still relatively robust: extensive tests showed the success in converging to a good solution in 90% of the cases in a systematic manner, and in the other cases, the failure appears obviously in the results with no risk of confusion and can motivate tests with alternative criterion minimization approaches. We obtained these convergence capabilities of the algorithm by bringing original solutions to the minimization difficulties of the phase retrieval, inspired by studying the imaging model. One element of the solution is to use a small random aberration map as starting point. Another element is to explore the several directions offered by the quasi-equivalent aberration maps.

A wide variety of prior information, either about the system (aberrations, flux, noise) or about the object of interest, can be used to constrain the problem. The choice of a Bayesian approach allows this flexibility. In particular, a regularization that minimizes the energy on the object map helped us in separating the aberrations from the object and in decreasing the speckle noise in the reconstructed object map.

The restoration of images simulated with a perfect coronagraph is very encouraging for the extraction of planetary signals at levels that begin to be astrophysically interesting. We demonstrated the efficiency of the method even with only two spectral channels, by achieving a contrast of 105 at 0.2 arcsec. Multispectral redundancy improves the detection, which allowed us to achieve a contrast of 106 at 0.2 arcsec with six spectral channels.

We therefore believe that this approach will be quite powerful when we are faced with experimental data. This deserves to be studied, as well as how the performance will evolve in the cases of images simulated with a non-perfect coronagraph, real images from the SPHERE instrument in the lab, or real images from an instrument on-sky. Eventually, this method could be used to improve the performance of the existing multispectral imaging instruments, providing better astrophysical exploitations. Now that we demonstrated that we can manage the difficulties linked to the criterion minimization, we can now focus on its applicability, and on adding more prior information, starting with the full set of information that can be obtained from the instrument calibrations.

The lessons learned by applying the method could also facilitate the approach for the design of future instruments such as EPICS for the European Extremely Large Telescope (Kasper et al. 2008), and the definition of their calibration procedures.

Appendix A: Indetermination on the estimated aberrations from an image simulated with our approximate model

We show here that four sets of upstream aberrations give the same image for our approximate model. We re-write the expression below as a function of φu = (2π/λ) × δu and without the variables for better readability: (13)We consider here the static term and re-write it in the form of a correlation. For this, we consider two functions and g = δu of the two variables ρx and ρy and denote .

By using the definition of the correlation Γfg(ρ) and the convolution of the two functions f(ρ) and g(ρ), and the properties we obtain This yields (14)The properties of the autocorrelation and lead to and This yields (15)which means that the upstream aberration sets δu(ρ), δu( − ρ),  − δu(ρ) and  − δu( − ρ) are equivalent with respect to the approximate model, because they give the same image. This is true even in the presence of the turbulent term .

Appendix B: Indetermination on the estimated aberrations from an image simulated with the Sauvage et al. model

In classical imaging, i.e. “non-coronagraphic imaging”, the sign of the even part of the phase cannot be deduced from only one image in the focal plane (Blanc 2002). In other words, if we denote φe and φo, the even and the odd parts of the phase, the two phases φ = φe + φo and φ′ =  −φe + φo give the same image. In this appendix, we show that this is also the case for the Sauvage et al. expression (Sauvage et al. 2010) in coronagraphic imaging, if we assume that the sign of the even part changes for all phase errors.

The expression of Sauvage et al. is (16)with , , and φtot(ρ) = φr(ρ) + φu(ρ) + φd(ρ). TF[.] denotes the Fourier transform. represents the mean Strehl ratio during observation, such as with φ(ρ,t) = φr(ρ,t) + φu(ρ).

The first term of Sauvage et al.’s expression is the classical case of non-coronagraphic PSF, which is well-known (Blanc 2002). The term stays identical whatever the sign of the even part of the phase.

The second term of Sauvage et al.’s expression is the product of two factors: and . The latter stays identical whatever the sign of the even part of the phase. We take the following phase φ′ =  −φe + φo and calculate the corresponding , assuming that ρ′′ =  −ρ: is then independent of the sign of the even part of the phase. Thus, the product is also independent of the sign of the even part of the phase.

We study now the third term . Assuming that φd = (φd)e + (φd)o and :

In the same way as the previous demonstration, we can show that . Under the effect of the transformation φ → φ′, the different terms become

In other words, When we take the conjugate of a complex number, only the sign of the imaginary part changes. Because we take the real part of this expression, changing the sign of the even part of the phase does not change the term.

To conclude, like in classical imaging, changing the sign of the even part of the phase does not change the image in the focal plane. This means that two sets of aberrations give the same image. But if we assume like in this communication that the downstream aberrations are fixed and known, this removes the degeneracy.

Acknowledgments

This research is supported by the “Groupement d’Intérêt Scientifique” PHASE.

References

All Figures

thumbnail Fig. 1

Evolution of the speckle field with the wavelength. Simulated images at 950, 1306, and 1647  nm for a 103 stellar flux over planet flux contrast. The dynamic range is adapted to the visualization. The speckle field moves with the wavelength but not with the planet position.

Open with DEXTER
In the text
thumbnail Fig. 2

Optical scheme of a coronagraphic imager. The upstream and downstream static aberrations, and the adopted notations are denoted δu and δd denote focal plane complex amplitudes, whereas Ψi(ρ) denotes pupil plane amplitudes.

Open with DEXTER
In the text
thumbnail Fig. 3

Speckle non-centrosymmetry. Evolution of the level of residuals with respect to the rms value of the upstream quasi-static aberrations in an image simulated with the model of Sauvage et al. These residuals correspond to the antisymmetric part of the image.

Open with DEXTER
In the text
thumbnail Fig. 4

Speckle non-centrosymmetry. In the same dynamic range: (left) simulated image, (center) 180 degree rotated version of the image and (right) residuals after combination of the image with the 180 degree rotated version of itself. The gain on the rms value after the subtraction is about 6.

Open with DEXTER
In the text
thumbnail Fig. 5

Speckle dilation. In the same dynamic range: (left) image at 950 nm rescaled at 1650 nm and multiplied by the γ coefficient, image at 1650 nm (center) before and (right) after speckle subtraction. The gain on the rms value after the subtraction is 2.5.

Open with DEXTER
In the text
thumbnail Fig. 6

Choice of an appropriate starting point. Estimated upstream aberration maps with one spectral channel for three different starting points. [i] Without turbulent aberrations in the simulated images. [ii] With turbulent aberrations in the simulated images. From left to right, with a dynamic range adapted to the visualization: “true” simulated aberration map, (a) and (a’) estimated aberrations with a random aberration map (rms value of the simulated aberrations) as starting point, (b) and (b’) estimated aberrations with a zero aberration map as starting point, and (c) and (c’) estimated aberrations with a random aberration map (rms value 108 times lower than the “true” one) as starting point. The estimation is performed with a regularization on the star flux.

Open with DEXTER
In the text
thumbnail Fig. 7

Estimated upstream aberrations for the four quasi-equivalent aberration maps as starting points. From left to right, with the same dynamic range: , , , . The image simulation is performed with one spectral channel in the presence of turbulent aberrations.

Open with DEXTER
In the text
thumbnail Fig. 8

Block diagram of the algorithm used for the joint estimation of the object map and the upstream aberrations.

Open with DEXTER
In the text
thumbnail Fig. 9

Simulated images at λ = 950nm. a) Simulated object map and b) associated image in the focal plane. The following planets are simulated: one with a star-over-planet contrast of 105 at a separation of 0.2 arcsec (planet 1), two with star-over-planet contrast of 106 at separations of 0.2 (planet 2) and 0.4 (planet 3), respectively, and one with a star-over-planet contrast of 107 at a separation of 0.4 arcsec (planet 4). The image in the focal plane is obtained by convolving the object map oλ by the non-coronagraphic psf . c) Simulated aberrations and d) associated image of the speckle field in the image focal plane. The image is given by the coronagraphic PSF .

Open with DEXTER
In the text
thumbnail Fig. 10

Power spectral densities of the simulated aberrations. The upstream and downstream aberrations are randomly generated according to a f-2 spectrum and with an rms value of 30 nm. The downstream aberrations are not corrected by the adaptive optics. The upstream aberrations are corrected by the adaptive optics up to 20 cycles/pupil. The residuals are due to the non-common path aberrations. Below 4 cycles/pupil, these non-common path aberrations are corrected but there are some residuals from rotative optics.

Open with DEXTER
In the text
thumbnail Fig. 11

Robustness study on the simulated phases. With the same dynamic range, at 950 nm: estimated planet images from different images, simulated with ten different randomly generated upstream aberration maps.

Open with DEXTER
In the text
thumbnail Fig. 12

Multispectral redundancy. With the same dynamic range, at 950 nm, from left to right: simulated planet image and estimated ones with a two- and six-spectral channel inversion. The errors on the planet flux estimations are computed when the planet is detected.

Open with DEXTER
In the text
thumbnail Fig. 13

Robustness study on the aberrations maps used as starting points. With the same dynamic range, at 950 nm: estimated planet images from one image with different random aberration maps used as starting points.

Open with DEXTER
In the text
thumbnail Fig. 14

Bandwidth effect. With the same dynamic range, at 950 nm, from left to right: estimated planet images with a 700, 400, 200, and 100 nm-bandwidth inversion.

Open with DEXTER
In the text

Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.

Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.

Initial download of the metrics may take a while.