Open Access
Issue: A&A, Volume 668, December 2022
Article Number: A36
Number of page(s): 7
Section: Astronomical instrumentation
DOI: https://doi.org/10.1051/0004-6361/202143001
Published online: 01 December 2022

© M. Quesnel et al. 2022

Licence: Creative Commons. Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This article is published in open access under the Subscribe-to-Open model. Subscribe to A&A to support open access publication.

1 Introduction

Because of the small angular separation and high contrast between planetary companions and their parent star, exoplanet imaging is particularly challenging. Although these constraints can be addressed with specific instruments such as coronagraphs, residual wavefront aberrations still represent an inherent obstacle for detecting the majority of exoplanets. To a large extent, these residuals originate from non-common path aberrations (NCPAs) between the scientific and wavefront sensing arms. Focal-plane wavefront sensing (FPWFS) is an approach that has the advantage of taking NCPAs into account by probing their signature in the focal-plane images (Jovanovic et al. 2018) while offering high sensitivity.

Estimating phase aberrations from the scientific images alone is not trivial since the relationship between focal-plane intensities and the pupil-plane phase is nonlinear and degenerate (Guyon 2018). Numerical methods have been developed for FPWFS, such as iterative algorithms (Fienup 1982), with the most standard one being the Gerchberg-Saxton algorithm (Gerchberg 1972). More recent techniques have been proposed for various applications (see Jovanovic et al. 2018, for a review), including the use of deep learning techniques for FPWFS (Paine & Fienup 2018; Andersen et al. 2019, 2020; Orban de Xivry et al. 2021). All of these approaches have to deal with one important hindrance: for a centrosymmetric pupil, two different phase distributions in the input pupil plane can produce the same point spread function (PSF). This ambiguity, also called the twin-image problem (e.g., Guizar-Sicairos & Fienup 2012), is typically solved with phase diversity using, for instance, an additional defocused PSF (Gonsalves 1982), or an asymmetric pupil mask (Martinache 2013). This, however, reduces the science duty cycle because some observing time, and/or part of the science beam, has to be dedicated to wavefront measurements exclusively.

Based on the properties of the vector vortex coronagraph (VVC, Mawet et al. 2005), a Nijboer-Zernike phase retrieval approach tailored to the post-VVC PSF was formulated in Riaud et al. (2012a,b). They proposed to split the two circular polarization states to exploit the phase diversity introduced by the two opposite topological charges associated with the VVC. A similar approach was more recently used by Bos et al. (2019) in the case of the grating-vector apodizing phase plate, although it also required an asymmetric pupil to lift the sign ambiguity fully.

Here, we revisit the problem of phase retrieval behind a vortex coronagraph using deep learning techniques. Unlike an analytical approach, which could show limitations regarding its formulation, deep learning models can be trained regardless of the instruments and observing conditions. First, in Sect. 2, we argue that a scalar vortex coronagraph (SVC) has the potential to yield comparable residual phase errors to the dual-polarization VVC implementation, using a single post-coronagraphic PSF instead of two. In Sect. 3, we present our deep learning approach, based on convolutional neural networks (CNNs), which have the advantage of being flexible and easy to implement, and they have already been shown to be capable of reaching fundamental noise limits in our previous works (Quesnel et al. 2020; Orban de Xivry et al. 2021). Finally, in Sect. 4, we provide quantitative results on simulated data. We compare the performance of our vortex phase diversity method to a classical approach, and assess the robustness of the models, notably in the presence of representative atmospheric turbulence residuals.

2 Vortex phase diversity

2.1 Vortex coronagraphs

The vortex coronagraph (VC), introduced by Mawet et al. (2005), is a transparent focal plane mask that diffracts on-axis light outside of the pupil area. A Lyot stop placed in a downstream pupil plane allows this diffracted light to be blocked, enabling high contrast observations. Because of wavefront aberrations, some incoming light from the star is, however, not blocked. Indeed, the VC only removes the Airy disk, and speckles still appear in the focal plane.

There are two different types of vortex coronagraphs: vectorial (VVC) and scalar (SVC). The VVC applies a geometrical phase ramp to the incoming wavefront with a transmission t = exp(±j lp θ), where lp is the topological charge and θ is the azimuthal coordinate. Conjugated phase ramps are applied to each circular polarization state, producing a different signature in the focal plane for each (Riaud et al. 2012a). Here we focus on a topological charge lp = 2, which is the most commonly used design so far (Mawet et al. 2009; Absil et al. 2016), but the following developments would also hold for any even topological charge. Unlike the VVC, the SVC uses longitudinal phase delays (Ruane et al. 2019; Desai et al. 2021), and thereby applies the same phase ramp (e.g., with +lp) to both polarization states. The focal-plane signature behind an SVC corresponds to the one obtained with a single polarization state using the VVC.

2.2 Sign ambiguity and phase diversity

In FPWFS, the Fourier relationship between the PSF and the pupil-plane phase causes a sign ambiguity for Zernike modes of an even radial order (e.g., defocus, astigmatism):

$\left| F\left( E_{\rm even}(x) \right) \right|^2 = \left| F\left( E_{\rm even}^*(-x) \right) \right|^2,$ (1)

where $E_{\rm even}(x) = \exp(j\phi_{\rm even}(x))$ is the pupil-plane electric field with phase aberrations $\phi_{\rm even}$ (containing even modes only), $E_{\rm even}^*$ is its conjugate, and F(·) is the Fourier transform operator. This sign ambiguity is a strong limitation for FPWFS using a single in-focus image. A fair number of FPWFS methods have been developed to solve the twin-image problem. The most standard one is to use an additional known defocus together with the in-focus image. An illustration of this ambiguity can be found in Fig. 1, where we generated two phase maps with opposite signs for their even Zernike modes. After propagation through a VVC, the in-focus PSFs are the same in both cases (Figs. 1d and f), showcasing the twin-image problem. The out-of-focus PSFs, however, are different (Figs. 1e and g) because the added defocus has the same sign in both cases, which allows the ambiguity to be lifted.

Now, if the two orthogonal circular polarization states are split downstream of the VVC to separate the conjugated phase ramps (−lp and +lp), or if the case of the SVC is considered, the in-focus PSFs are not identical anymore (Figs. 1h and j, or Figs. 1i and k). The resulting PSFs are actually switched between the two circular polarization states. This indicates that the sign ambiguity can potentially be lifted when using either the two PSFs obtained from the separate circular polarization states, or the single PSF behind the SVC independently of the polarization state. This illustrates the fact that the VC provides an azimuthal phase diversity, which can be used instead of the radial phase diversity provided by an additional defocus (Riaud et al. 2012a). In the case of the SVC, the sign ambiguity would then be lifted similarly to using only an out-of-focus PSF in classical phase diversity (e.g., Lamb et al. 2021).
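The twin-image argument can be reproduced numerically with a toy Fourier-optics model. The following NumPy sketch is not the HEEPS/PROPER propagation used for the paper: it is monochromatic, uses plain FFTs on an assumed 256 × 256 grid with an assumed 64-pixel pupil, and takes a single astigmatism-like term as the even phase. All function and variable names are illustrative. It checks that flipping the sign of an even phase leaves the non-coronagraphic PSF unchanged, whereas the PSF behind a single charge-2 phase ramp changes.

```python
import numpy as np

N, R = 256, 32                       # grid size and pupil radius in pixels (assumed)
c = np.arange(N) - N // 2
x, y = np.meshgrid(c, c)
r, theta = np.hypot(x, y), np.arctan2(y, x)

pupil = ((r <= R) & (r >= 0.3 * R)).astype(float)          # annular, 30% obstruction
lyot_stop = ((r <= 0.98 * R) & (r >= 0.3 * R)).astype(float)
phi_even = 0.8 * (x**2 - y**2) / R**2                       # even (astigmatism-like) phase, rad

def fft2c(a):
    """Centered 2D FFT (origin at pixel N//2)."""
    return np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(a)))

def ifft2c(a):
    """Centered 2D inverse FFT."""
    return np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(a)))

def psf(phase, charge=0):
    """Focal-plane intensity; charge != 0 inserts a vortex ramp and a Lyot stop."""
    focal = fft2c(pupil * np.exp(1j * phase))
    if charge != 0:
        focal = focal * np.exp(1j * charge * theta)   # focal-plane phase ramp
        focal = fft2c(ifft2c(focal) * lyot_stop)      # Lyot plane, then detector
    return np.abs(focal) ** 2

# Without a vortex: +phi and -phi are indistinguishable (twin images)
p_plus, p_minus = psf(phi_even), psf(-phi_even)

# Behind a single charge +2 ramp (SVC-like case): the two PSFs differ
v_plus, v_minus = psf(phi_even, 2), psf(-phi_even, 2)
rel = np.abs(v_plus - v_minus).max() / v_plus.max()

# Mismatch between PSF(-phi, +2) and PSF(+phi, -2); it should be small compared
# with `rel` if the two channels are swapped, as described in the text
swap_mismatch = np.abs(v_minus - psf(phi_even, -2)).max() / v_plus.max()
```

A purely radial phase such as defocus alone would give identical intensities for both ramp signs in this toy model, so the example uses a non-radial even mode.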

Fig. 1

Comparison of simulated PSFs for two conjugated phase maps ϕ (left) and ϕ′ (right): for ϕ′, we set opposite Zernike coefficients to those of ϕ only for the even modes, with a total of 18 modes starting from defocus. (a) Entrance annular pupil. (b, c) The conjugated phase maps. (d, e) In-focus and out-of-focus PSFs obtained from propagating (b) with both polarization states together. (f, g) The same as (d, e) but using (c) for propagation instead. (h, i) In-focus PSFs obtained from (b) with −lp and +lp used separately. (j, k) The same as (h, i) but using (c) for propagation instead.

3 Deep learning approach

3.1 Data generation

In our simulations, we considered an annular entrance pupil with a diameter of 8 m and a central obstruction of 30%. An observed bandwidth of 0.2 µm was defined around 2.2 µm (K band), by simulating a total of five wavelengths. A pixel scale of 0.25 λ/D/pix was set with a detector containing 64 × 64 pixels, giving a field-of-view of 16λ/D. The most relevant simulation parameters are listed in Table 1.
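The sampling figures quoted above can be checked with a few lines of Python. This is only an illustrative sketch: the even spacing of the five wavelengths across the band is an assumption, since the exact spectral sampling is not stated, and the variable names are hypothetical.

```python
import numpy as np

pixel_scale = 0.25             # detector sampling, in lambda/D per pixel
n_pixels = 64                  # detector width, in pixels
fov = pixel_scale * n_pixels   # field of view, in lambda/D

# Assumed: five evenly spaced wavelengths covering 0.2 um around 2.2 um
wavelengths_um = np.linspace(2.2 - 0.1, 2.2 + 0.1, 5)
```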

We generated the phase aberrations using annular Zernike polynomials, which make up an orthonormal basis on the input pupil:

$\phi(x,y) = \sum_{i=1}^{N_{\rm modes}} c_i Z_i(x,y),$ (2)

where ϕ is the complete phase map, Zi are the Zernike polynomials, ci are the corresponding coefficients, and Nmodes is the number of modes considered.

The generated datasets are composed of 18 or 88 Zernike modes, up to the fifth and 12th radial orders, respectively, excluding the piston, tip, and tilt modes. The set of Zernike coefficients for each sample was first randomly generated within the range [−1, 1] before each coefficient was divided by its corresponding radial order to approximate a $1/f^2$ power spectral density profile, typically encountered with good quality optics (Dohlen et al. 2011). Low and high aberration levels, represented by wavefront error (WFE) distributions centered at a 70 and 350 nm root mean square (RMS), respectively, are considered by normalizing the Zernike coefficients accordingly. An example of such a distribution can be seen in Orban de Xivry et al. (2021). For classical phase diversity, the additional defocus was set to λ/5, that is, 440 nm RMS. In our case, this amount of diversity is close to the optimal value in terms of phase retrieval performance. The defocus was added in the entrance pupil plane, as if done by the deformable mirror of an adaptive optics system, which means that the resulting defocused PSFs contain more flux than the in-focus PSFs as the coronagraphic performance of the VC is degraded.
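The coefficient sampling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' generation code: the modes are taken in Noll ordering starting from defocus (index 4), each sample is normalized to a fixed target RMS rather than drawn from a WFE distribution, and the total RMS is computed as the root sum of squares of the coefficients, which relies on the orthonormality of the basis. The function names are hypothetical.

```python
import numpy as np

def noll_radial_order(j):
    """Radial order n of the Zernike mode with Noll index j (j >= 1)."""
    n = 0
    while (n + 1) * (n + 2) // 2 < j:
        n += 1
    return n

def sample_coefficients(n_modes, target_rms_nm, rng):
    """Draw Zernike coefficients (in nm) for modes starting at defocus (Noll 4)."""
    orders = np.array([noll_radial_order(j) for j in range(4, 4 + n_modes)])
    coeffs = rng.uniform(-1.0, 1.0, n_modes) / orders    # damp higher radial orders
    coeffs *= target_rms_nm / np.sqrt(np.sum(coeffs**2))  # set the total RMS WFE
    return coeffs

rng = np.random.default_rng(0)
low = sample_coefficients(18, 70.0, rng)     # low aberration regime
high = sample_coefficients(88, 350.0, rng)   # high aberration regime
```

With this indexing, 18 modes end at Noll index 21 (radial order 5) and 88 modes end at Noll index 91 (radial order 12), in line with the mode counts quoted in the text.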

To increase the representativeness of our simulations and to test the robustness of our approach, we added atmospheric turbulence residuals to the phase maps. A state-of-the-art extreme adaptive optics (AO) system was simulated using the COMPASS library (Ferreira et al. 2018), assuming a loop frequency of 3.5 kHz, 2-frame delay, a 50 × 50 deformable mirror (i.e., 2040 modes/valid actuators), and a pyramid sensor with 5 λ/D of modulation (without noise). This yielded a Strehl ratio of about 98% at 2.2 µm, corresponding to a WFE of about 50 nm RMS. We sampled the AO residuals at 10 Hz and we used a sequence of ten consecutive phase screens by summing up the corresponding PSFs. We therefore simulated a 1-s exposure in the presence of a given amount of static NCPAs. The results with data containing these AO residuals are shown in Sect. 4.3.
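The correspondence between the quoted Strehl ratio and WFE can be verified with a short worked estimate. The relation used below is the extended Maréchal approximation, which is an assumption made here for illustration; the text does not state how the conversion was made. With σ the RMS wavefront error and λ the wavelength:

```latex
S \approx \exp\left[-\left(\frac{2\pi\,\sigma}{\lambda}\right)^{2}\right]
  = \exp\left[-\left(\frac{2\pi \times 50\,\mathrm{nm}}{2200\,\mathrm{nm}}\right)^{2}\right]
  \approx \exp(-0.0204) \approx 0.98
```

Under this approximation, 50 nm RMS at 2.2 µm is consistent with a Strehl ratio of about 98%.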

To simulate a PSF obtained behind a VVC, we performed two propagations, one with +lp and the other with −lp, to consider each circular polarization state. The downstream Lyot stop blocked 2% of the outer pupil area (but the central obstruction was not oversized). The resulting PSFs were then either summed up to reproduce the nonpolarized case, or they were kept separate to consider the dual-polarization case. To simulate the SVC, only one such PSF was taken. The optical propagation was handled by the HEEPS package (Carlomagno et al. 2020), which makes use of PROPER (Krist 2007). Examples of generated phase maps and PSFs can be found in Fig. 1. We then added photon noise to our PSFs, so that the signal-to-noise ratio (S/N) was defined as $S/N = \sqrt{N_{\rm ph}}$, where $N_{\rm ph}$ is the number of photons. A square-root stretching operation was applied to the PSFs to help the CNN identify the speckle patterns. Finally, we normalized the PSFs with a min-max scaling to obtain flux in the range [0,1], which ensured the CNN was fed with same-scale quantities.
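The three post-processing steps (photon noise, square-root stretch, min-max scaling) can be sketched in NumPy as below. This is an illustrative sketch rather than the authors' pipeline: the function name is hypothetical, `n_photons` is taken here as the total photon count reaching the detector, and a Gaussian blob stands in for a simulated PSF.

```python
import numpy as np

def add_noise_and_normalize(psf, n_photons, rng):
    """Apply photon noise, a square-root stretch, then min-max scaling to [0, 1]."""
    expected = psf / psf.sum() * n_photons        # expected counts per pixel
    noisy = rng.poisson(expected).astype(float)   # photon (Poisson) noise
    stretched = np.sqrt(noisy)                    # square-root stretch
    lo, hi = stretched.min(), stretched.max()
    if hi == lo:                                  # flat image: avoid dividing by zero
        return np.zeros_like(stretched)
    return (stretched - lo) / (hi - lo)

# Toy Gaussian "PSF" on a 64 x 64 grid, used only to exercise the function
yy, xx = np.mgrid[-32:32, -32:32]
toy_psf = np.exp(-(xx**2 + yy**2) / (2 * 3.0**2))
rng = np.random.default_rng(0)
image = add_noise_and_normalize(toy_psf, n_photons=1000**2, rng=rng)  # S/N = 1000
```

Since the S/N is the square root of the photon count, an S/N of 1000 corresponds to one million photons in this toy example.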

Table 1

Data generation parameters.

3.2 Model architecture

We built deep neural network models whose goal is to map the Zernike coefficients of phase aberrations ϕ from a given PSF I, that is, to approximate a nonlinear function f such that ϕ = f(I). CNNs have been proven to be very well suited for image analysis, with numerous applications for both classification and regression tasks. CNN-based architectures have been developing very quickly in recent years, with performance still improving greatly. We have therefore used a state-of-the-art deep CNN called EfficientNet (Tan & Le 2019). This type of architecture stands out from other ones by using a new scaling technique: all dimensions of the CNN (depth, width, and resolution) are scaled by the same compound coefficient Φ, and the parameters are inferred from the original model or baseline EfficientNet-B0 (Φ = 0). There are thus different models available, and we chose to use EfficientNet-B4, for which we have obtained the best trade-off between model performance and runtime. EfficientNet-B4 has a total of $1.9 \times 10^7$ parameters and $4.2 \times 10^9$ FLOPS. It has about the same number of parameters as the ResNet-50 architecture, which was used in Quesnel et al. (2020) and Orban de Xivry et al. (2021).

3.3 Model training

For a given training, a dataset composed of $10^5$ PSFs (or PSF pairs for the cases with two input channels) was randomly split into training (90%) and validation (10%) sets. Each sample also contains the true NCPA phase maps as labels, while the AO phase screens are never given. Batches composed of 64 data samples were then consecutively fed to the neural network. We define the loss function as the root-mean-square error (RMSE) of the phase residuals. Weight updates based on the loss were handled by the Adam optimizer (Kingma & Ba 2017). To improve the performance, we set a penalty on the loss (“weight decay”) of $10^{-7}$ for the low aberration regime and $10^{-6}$ for the higher aberration regime. We also set an initial learning rate of $10^{-3}$ which was decreased by a factor of two as soon as the validation loss reached a plateau over 15 epochs. This results in sudden loss drops, allowing the performance to be greatly improved. Pretrained models on ImageNet were used to initialize the weights. The training of the model was stopped if no improvement of the validation loss was observed over 25 epochs. This results in training procedures lasting between 50 and 250 epochs.
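The interplay between the learning-rate reduction and the stopping criterion can be made explicit with a framework-free sketch. The function below is hypothetical and only mimics the logic described above (halve the learning rate after 15 epochs without improvement, stop after 25); in PyTorch, which was used for this work, comparable behavior is available through `torch.optim.lr_scheduler.ReduceLROnPlateau`, although its exact plateau-counting convention may differ from this sketch.

```python
def simulate_schedule(val_losses, lr=1e-3, factor=0.5,
                      lr_patience=15, stop_patience=25):
    """Return (epochs_run, final_lr) for a sequence of validation losses."""
    best = float("inf")
    epochs_since_best = 0     # drives early stopping
    epochs_since_reset = 0    # drives learning-rate halving
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_since_best = 0
            epochs_since_reset = 0
        else:
            epochs_since_best += 1
            epochs_since_reset += 1
        if epochs_since_reset >= lr_patience:
            lr *= factor              # plateau detected: halve the learning rate
            epochs_since_reset = 0
        if epochs_since_best >= stop_patience:
            return epoch, lr          # no improvement for too long: stop
    return len(val_losses), lr

# Two improving epochs followed by a long plateau
losses = [1.0, 0.9] + [0.9] * 40
epochs_run, final_lr = simulate_schedule(losses)
```

In this toy sequence the learning rate is halved once during the plateau and training stops 25 epochs after the last improvement.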

4 Results

We compare the capacity of different configurations to lift the sign ambiguity as well as their performance. The designation of these configurations, together with some of their parameters, can be found in Table 2: we consider the cases of the VVC with or without classical phase diversity (“VVC [in, out]-focus” and “VVC in-focus,” respectively), which are compared to the new approaches presented in this paper (“VVC dual-polar” and “SVC”). The noncoronagraphic case (“no vortex [in, out]-focus”) is evaluated as well. We also investigate the possibility of working with differential PSFs obtained by subtracting the separate circular polarization states (“VVC dual-polar; diff PSFs”). In the last part of this section, we add atmospheric turbulence residuals and we assess the robustness of the models regarding variations in the S/N levels, input wavefront errors, and Zernike polynomial orders. All models are evaluated using 1000 test samples.

Fig. 2

RMSE per Zernike mode, following the Noll convention, starting from the defocus mode. Four cases were compared (see Table 2 for notations), using a single in-focus post-VVC PSF without splitting the polarization states (cyan), two post-VVC PSFs with additional defocus (dark blue), the two post-VVC PSFs associated with each polarization state (red), and a single PSF after the SVC (orange). The RMSE of the input phase maps is represented in black and the even modes are indicated by the green areas. Left: input WFE of 70 nm distributed over 18 modes. Right: input WFE of 350 nm distributed over 88 modes. In both examples, the S/N in the entrance pupil plane is equal to 100.

Table 2

Configurations considered for phase retrieval.

4.1 Phase sign determination

To determine whether the models predict the correct sign, we looked at the performance per Zernike mode. The metric used is the RMSE per mode:

$\sigma_z = \sqrt{\frac{1}{N_{\rm test}} \sum_{i}^{N_{\rm test}} \left( \hat{c}_i - c_i \right)^2},$ (3)

where Ntest is the number of test samples, while $\hat{c}$ and c are the estimated and true Zernike coefficients, respectively.
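As an illustration of this metric, the sketch below computes the per-mode RMSE over a test set with NumPy, assuming the conventional root-mean-square definition (a mean over the test samples). The array shapes and names are assumptions made for the example.

```python
import numpy as np

def rmse_per_mode(c_hat, c_true):
    """Per-mode RMSE for arrays of shape (n_test, n_modes)."""
    return np.sqrt(np.mean((c_hat - c_true) ** 2, axis=0))

# Toy data: 4 test samples, 2 modes, true coefficients all zero
c_true = np.zeros((4, 2))
c_hat = np.array([[1.0, 2.0], [1.0, -2.0], [-1.0, 2.0], [-1.0, -2.0]])
sigma_z = rmse_per_mode(c_hat, c_true)
```

By construction, a model that always predicts zero for a given mode returns a per-mode RMSE equal to the RMS of the true coefficients of that mode, which is how an uncorrected mode appears with this metric.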

In Fig. 2, we compare the performance per mode between four cases for two different aberration contents. A network using only in-focus PSFs in the nonpolarized case with the VVC yields no correction for even Zernike modes, because the model tends to predict zero for the coefficients facing the ambiguity (due to the $\ell_2$-norm training loss). For odd modes, the model is able to provide some correction, even though its quality is limited by the loss function, which does not discriminate between even and odd modes. Adding defocused PSFs as input solves the problem as expected (Quesnel et al. 2020). In the dual-polarization case, a network using either one or both circular polarization states separately as input (SVC and VVC, respectively) also yields good performance for even modes as well as for odd modes. This indicates that the sign ambiguity is properly lifted with these two approaches.

It is noteworthy that the performance marginally depends on the Zernike mode: the error tends to increase for larger azimuthal orders at a given radial order. Our interpretation is that since the phase information is of higher spatial frequency and located closer to the edge of the pupil in these cases, it is more difficult for the CNN model to identify those features.

4.2 Performance compared to classical phase diversity

We now compare our method to the classical phase diversity approach in terms of overall phase retrieval performance. The RMS WFE on the phase residuals is used as a metric and it is defined for each test sample as:

$\sigma_\phi = \sqrt{\frac{1}{N_{\rm pix}} \sum_{i}^{N_{\rm pix}} \left( \hat{\phi}_i - \phi_i \right)^2},$ (4)

where Npix is the number of pixels, while $\hat{\phi}$ and ϕ are the estimated and true pupil phases, respectively.
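A minimal NumPy sketch of this per-sample metric is given below, together with the median and 2nd to 98th percentile summary used when reporting results over a test set. Restricting the sum to pixels inside the pupil through a boolean mask is an assumption of this example, and the function names are hypothetical.

```python
import numpy as np

def rms_wfe(phi_hat, phi_true, pupil_mask):
    """RMS of the phase residuals over the pixels selected by pupil_mask."""
    diff = (phi_hat - phi_true)[pupil_mask]
    return np.sqrt(np.mean(diff ** 2))

def summarize(residuals):
    """2nd percentile, median, and 98th percentile of per-sample residuals."""
    return np.percentile(residuals, [2, 50, 98])

# Toy check: a constant 3 nm offset inside the mask gives 3 nm RMS
mask = np.ones((4, 4), dtype=bool)
mask[0, 0] = False
sigma_phi = rms_wfe(np.full((4, 4), 3.0), np.zeros((4, 4)), mask)
stats = summarize(np.arange(101.0))
```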

In our simulations, we consider the fact that the vortex coronagraphs block out most of the starlight, and that for a given stellar magnitude, the resulting flux in the detector plane is reduced. The flux is also equally split between each PSF for all the cases with two channels, while for the configurations with a single one, the PSF receives the total remaining flux behind the vortex mask. The performance of the trained models at different S/N levels defined in the entrance pupil plane is shown in Fig. 3. In our case, S/Ns between $10^1$ and $3 \times 10^3$ correspond to stars of apparent magnitudes in the range from 18.6 to 6.2. For a median input WFE of 70 nm with 18 modes (Fig. 3, left), the simulated performance is almost identical for the classical, SVC and VVC dual-polarization approaches, even though the additional defocus increases the overall S/N at the focal plane for the classical method. For a median input WFE of 350 nm with 88 modes (Fig. 3, right), the phase residuals are distinctly higher for all the configurations, and a plateau is reached for S/Ns above 1000. We can especially notice that the sole PSF behind the SVC somewhat limits the performance in this case. Our main hypothesis for this discrepancy is that, in a high aberration regime, the effects of the nonlinear nature of the problem are greater. The extra information given by having two input channels is therefore favorable and makes the models easier to train. In general, it is more difficult to train models on datasets containing strong aberrations, and this can typically be improved by using more data (e.g., $5 \times 10^5$ samples, see Orban de Xivry et al. 2021), more complex architectures (e.g., EfficientNet-B6), and/or stronger weight decay.
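The correspondence between S/N and stellar magnitude follows from the photon-noise definition S/N = √Nph: the flux scales as the square of the S/N, so a factor k in S/N corresponds to a magnitude difference of 5 log10(k). The short sketch below applies this, anchoring on the quoted magnitude of 18.6 at an S/N of 10 and assuming identical exposure time and throughput at all flux levels. The function name and default arguments are illustrative.

```python
import math

def magnitude_from_snr(snr, ref_snr=10.0, ref_mag=18.6):
    """Apparent magnitude for a photon-noise-limited S/N, relative to a reference."""
    # Flux ratio is (snr / ref_snr)**2, hence 2.5 * log10(ratio) = 5 * log10(snr / ref_snr)
    return ref_mag - 5.0 * math.log10(snr / ref_snr)

bright_end = magnitude_from_snr(3000.0)
```

The bright end of the range obtained this way agrees with the quoted magnitude of about 6.2.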

We also consider the possible presence of planetary companions in the detected images. This additional, off-axis source of light is largely unaffected by the vortex phase ramp and therefore adds the same signature in both circular polarization states. This additional light source may bias the phase retrieval process, and lead to unwanted planetary signal subtraction. A possible workaround is to subtract one polarization image from the other, in an attempt to remove the signature of any off-axis light source. We thus assessed the phase retrieval capabilities using the difference between both polarization states after the VVC. The results are shown in Fig. 3 and are compared with the other configurations. We only obtain a marginal increase in the error at high S/Ns, especially in the low aberration regime, which can be explained by the loss of information produced by subtracting one PSF from the other.

The performance of the various configurations is finally compared to the theoretical limit in Fig. 3. This limit is discussed in Orban de Xivry et al. (2021) for noncoronagraphic imaging. For both the noncoronagraphic and vortex imaging cases, the residual errors reach the fundamental limit in the low aberration regime (Fig. 3, left). In a higher aberration regime, the performance does not reach the fundamental limit, and the gap increases toward higher S/Ns (Fig. 3, right). This can be improved with more robust training as explained above. One can note that the residual errors are constrained by the WFE distribution in the data toward lower S/Ns, while the theoretical limit is independent of the input WFE distribution and continues to increase for lower S/Ns, thus yielding residual WFE below the limit.

Fig. 3

Performance in terms of RMS WFE on the phase residuals at different S/N levels. Each point corresponds to a model trained and evaluated on the indicated S/N (six S/Ns are considered, and slight horizontal shifts were applied to be able to discern each point). The same colors in Fig. 2 are used, with the addition of the performance with classical imaging (green), using differential post-VVC PSFs (violet), as well as the theoretical limit (black dashed line). The median values are represented and the error bars correspond to the 2–98th percentiles. The S/Ns indicated are the ones at the entrance pupil plane, and the flux suppression induced by the vortex mask is taken into account. Left: input WFE of 70 nm distributed over 18 modes. Right: input WFE of 350 nm distributed over 88 modes.

4.3 Model robustness

To test how the method handles more realistic ground-based observations, we added atmospheric turbulence residuals in addition to the NCPAs, as described in Sect. 3.1. This represents an additional source of noise since the AO residuals are not included in the labels for training. Examples of input PSFs at the different flux levels can be found in Fig. 5. The performance now starts to reach a plateau of a few nm RMS in the low aberration regime at high S/Ns (Fig. 4, left), due to the presence of these atmospheric turbulence residuals. In the high NCPA regime (Fig. 4, right), the AO residuals however become negligible and the performance is almost identical to the case without turbulence (Fig. 3, right).

We finally study the robustness of the models regarding a variation in the data during evaluation. First, we may encounter different flux levels than those considered during training. In Fig. 6, we illustrate how models in the VVC dual-polar configuration trained on data containing 70 nm RMS behave in such conditions. Whether the training S/N is low or high, models only show good robustness to other flux levels within a limited range, outside of which the performance is strongly degraded. If a more robust model is required, it is also possible to train with various flux levels. We investigated this by using a training dataset covering the entire test S/N range, without increasing its size. The median performance is much more consistent at every S/N, although the variation in the residual error between samples is greater, and a small degradation can naturally be seen compared to using identical training and testing S/N (as shown in Fig. 4).

We also study the change in performance when evaluating the model outside the input WFE training range. Figure 7 shows the robustness of models trained on the two aberration regimes studied in this paper. Data containing more aberrations rapidly deteriorate the reconstruction. The models perform better when evaluated at lower aberration levels, but they have limited performance when trained in the high aberration regime. To overcome these limitations, we trained two models over the entire test WFE range for each of the Zernike mode contents considered in the paper. Such models show excellent robustness, with minimal degradation compared to models with identical training and testing WFE distributions. This suggests that these models could be robustly used in closed-loop operations, even with the aberration level decreasing with time. Regarding the varying spatial power spectral density of the wavefront, the residuals are generally constant along the Zernike modes, as seen in Fig. 2. When giving the reconstructed PSFs as input to the same trained model, we have observed that most residual RMS WFE stay below 10 nm for a model trained on 70 nm RMS as input and an S/N of 1000. A thorough analysis of a closed-loop application will be the subject of future work when testing the algorithm in the lab or on-sky.

Observations can also be expected to contain higher-order NCPAs (in addition to the changing atmospheric residuals) than those considered during training. For a model trained on 18 modes at 70 nm RMS (S/N = 1000), we added 70 higher-order Zernike modes in the test data. In Fig. 8, we observe only a moderate degradation for the 18 modes when increasing the wavefront error contained in these additional modes, because the central PSF signature is mostly preserved.

Fig. 4

Phase prediction errors at different S/N levels, presented the same way as in Fig. 3, but this time also including atmospheric turbulence residuals in the PSFs during both training and testing.

Fig. 5

Examples of PSFs at different S/N levels (defined in the entrance pupil plane) for +lp. The resulting S/N in the detector plane is reduced due to the extinction factor introduced by the coronagraph and by the beam splitting between the two polarization channels. The level of NCPA is equal to 70 nm RMS distributed over 18 modes (top) and 350 nm RMS over 88 modes (bottom). AO residuals are also present: each PSF is the result of combining ten PSFs, with each containing a different AO residual phase screen.

Fig. 6

Performance with altered S/N levels during evaluation for three models trained on data with a median RMS WFE of 70 nm over 18 modes, with an S/N of 30 (purple), 1000 (blue), and with S/Ns uniformly distributed over the entire S/N range (green). Each point is obtained from a test batch composed of 1000 samples (the median value together with the 2–98th percentiles are shown).

Fig. 7

Performance with different input WFE levels defined during evaluation for models trained on data with a median RMS WFE of 70 nm over 18 modes (blue), and 350 nm RMS over 88 modes (red). Models were also trained on data following a uniform distribution covering the whole input WFE range, using both spatial frequency regimes (cyan and orange). The S/N is 1000 and each training dataset contains $10^5$ samples.

5 Conclusions

In this paper, we have investigated a new way to perform focal-plane wavefront sensing using vortex coronagraphs. Based on a deep learning approach and considering simulated data, we have leveraged the modulation introduced by the vortex coronagraph (either scalar, or vectorial after splitting the circular polarization states) to lift the sign ambiguity and perform FPWFS for various S/Ns, input WFEs, and spatial frequency contents. The dual-polarization method with the VVC offers a very similar performance to the classical phase diversity method using additional defocused PSFs, even though the level of light is largely reduced after filtering by the VVC. For instance, considering a star of magnitude 6.2 observed at a wavelength of 2200 nm, we obtain a residual of 0.73 nm RMS from an input WFE of 70 nm RMS. In the case of the SVC, which provides a single focal-plane image, a loss in performance is only observed for high aberration levels. For bright stars, and with higher order and higher levels of aberrations, the CNN training is generally challenging, and the performance reaches a plateau of approximately 20 nm RMS. In such circumstances, more training data, larger and deeper CNN architectures, and regularization techniques could further improve the phase retrieval accuracy. Atmospheric turbulence residuals that are expected in ground-based data only produce minor degradation in performance in a low NCPA regime, and they should not be a concern in practice. We have also shown that models trained on data containing particularly wide WFE and S/N distributions provide very good robustness.

Potential applications of the proposed method could rely on including a polarizing beam splitter downstream of the VVC to collect both circular polarization states separately, either on a single or on two distinct sensors. Since our simulation-based FPWFS experiments work well even with a single image obtained behind an SVC, it appears that this flavor of vortex coronagraph offers an interesting alternative, notably because it would work without any additional optical components.

Deep learning models offer a flexible framework and fast inference speeds, which are appreciable features for on-sky applications. The requirement on speed is, however, not very stringent as we expect the lifetime of NCPAs that produce quasi-static speckles to be on the order of minutes. But on-sky applications will naturally come with their own challenges and discrepancies unpredicted by simulations. To account for the difference between simulations and real data, transfer learning techniques can be used to efficiently fine-tune the models before observations. Finally, it is difficult to obtain reliable and very precise NCPA labels for model training. Employing unsupervised learning techniques, for example autoencoder-based architectures, is another interesting approach that we are considering for future developments.

Fig. 8

Robustness on higher-order aberrations. Top: performance per Zernike mode on test data following the distribution of the training data (red), adding, to the test data, 70 modes containing 35 nm RMS (purple) and 70 nm RMS (blue) of NCPAs. Bottom: example of post-VVC PSFs for each case (+lp).

Acknowledgements

This research made use of PyTorch (Paszke et al. 2019) and the following implementation of EfficientNet: https://github.com/lukemelas/EfficientNet-PyTorch. The HEEPS (Carlomagno et al. 2020) and PROPER (Krist 2007) open-source optical propagation Python packages were used for data generation. This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement no. 819155), and from the Wallonia-Brussels Federation (grant for Concerted Research Actions).

References

  1. Absil, O., Mawet, D., Karlsson, M., et al. 2016, in SPIE Conf. Ser., 9908, 99080Q
  2. Andersen, T., Owner-Petersen, M., & Enmark, A. 2019, Opt. Lett., 44, 4618
  3. Andersen, T., Owner-Petersen, M., & Enmark, A. 2020, J. Astron. Telescopes Instrum. Syst., 6, 034002
  4. Bos, S. P., Doelman, D. S., Lozi, J., et al. 2019, A&A, 632, A48
  5. Carlomagno, B., Delacroix, C., Absil, O., et al. 2020, J. Astron. Telescopes Instrum. Syst., 6, 035005
  6. Desai, N., Llop-Sayson, J., Jovanovic, N., et al. 2021, in Techniques and Instrumentation for Detection of Exoplanets X, eds. S. B. Shaklan, & G. J. Ruane, 11823, International Society for Optics and Photonics (SPIE), 238
  7. Dohlen, K., Wildi, F. P., Puget, P., Mouillet, D., & Beuzit, J.-L. 2011, in Second International Conference on Adaptive Optics for Extremely Large Telescopes, online at http://ao4elt2.lesia.obspm.fr, 75
  8. Ferreira, F., Gratadour, D., Sevin, A., & Doucet, N. 2018, 2018 International Conference on High Performance Computing & Simulation (HPCS), 180
  9. Fienup, J. 1982, Appl. Opt., 21, 2758
  10. Gerchberg, R. W. 1972, Optik, 35, 237
  11. Gonsalves, R. A. 1982, Opt. Eng., 21, 829
  12. Guizar-Sicairos, M., & Fienup, J. R. 2012, J. Opt. Soc. Am. A, 29, 2367
  13. Guyon, O. 2018, ARA&A, 56, 315
  14. Jovanovic, N., Absil, O., Baudoz, P., et al. 2018, SPIE Conf. Ser., 10703, 107031U
  15. Kingma, D. P., & Ba, J. 2017, Adam: A Method for Stochastic Optimization
  16. Krist, J. E. 2007, in Optical Modeling and Performance Predictions III, ed. M. A. Kahan, 6675, International Society for Optics and Photonics (SPIE), 250
  17. Lamb, M. P., Correia, C., Sivanandam, S., Swanson, R., & Zavyalova, P. 2021, MNRAS, 505, 3347
  18. Martinache, F. 2013, PASP, 125, 422
  19. Mawet, D., Riaud, P., Absil, O., & Surdej, J. 2005, ApJ, 633, 1191
  20. Mawet, D., Serabyn, E., Liewer, K., et al. 2009, ApJ, 709
  21. Orban de Xivry, G., Quesnel, M., Vanberg, P. O., Absil, O., & Louppe, G. 2021, MNRAS, 505, 5702
  22. Paine, S. W., & Fienup, J. R. 2018, Opt. Lett., 43, 1235
  23. Paszke, A., Gross, S., Massa, F., et al. 2019, in Advances in Neural Information Processing Systems, eds. H. Wallach, H. Larochelle, A. Beygelzimer, et al., 32 (Curran Associates, Inc.), 8026
  24. Quesnel, M., Orban de Xivry, G., Louppe, G., & Absil, O. 2020, SPIE Conf. Ser., 11448, 114481G
  25. Riaud, P., Mawet, D., & Magette, A. 2012a, A&A, 545, A151
  26. Riaud, P., Mawet, D., & Magette, A. 2012b, A&A, 545, A150
  27. Ruane, G., Mawet, D., Riggs, A. E., & Serabyn, E. 2019, in Techniques and Instrumentation for Detection of Exoplanets IX, ed. S. B. Shaklan, 11117, International Society for Optics and Photonics (SPIE), 454
  28. Tan, M., & Le, Q. 2019, in Proceedings of Machine Learning Research, 97, Proceedings of the 36th International Conference on Machine Learning, eds. K. Chaudhuri, & R. Salakhutdinov (PMLR), 6105

2

With an integration time of 1 s, a transmission and quantum efficiency equal to 50%, a telescope diameter of 8 m, and a filter bandwidth of 50 nm.

All Tables

Table 1

Data generation parameters.

Table 2

Configurations considered for phase retrieval.

All Figures

Fig. 1

Comparison of simulated PSFs for two conjugated phase maps ϕ (left) and ϕ′ (right): for ϕ′, we set opposite Zernike coefficients to those of ϕ only for the even modes, with a total of 18 modes starting from defocus. (a) Entrance annular pupil. (b, c) The conjugated phase maps. (d, e) In-focus and out-of-focus PSFs obtained from propagating (b) with both polarization states together. (f, g) The same as (d, e) but using (c) for propagation instead. (h, i) In-focus PSFs obtained from (b) with −lp and +lp used separately. (j, k) The same as (h, i) but using (c) for propagation instead.

Fig. 2

RMSE per Zernike mode, following the Noll convention, starting from the defocus mode. Four cases were compared (see Table 2 for notations), using a single in-focus post-VVC PSF without splitting the polarization states (cyan), two post-VVC PSFs with additional defocus (dark blue), the two post-VVC PSFs associated with each polarization state (red), and a single PSF after the SVC (orange). The RMSE of the input phase maps is represented in black and the even modes are indicated by the green areas. Left: input WFE of 70 nm distributed over 18 modes. Right: input WFE of 350 nm distributed over 88 modes. In both examples, the S/N in the entrance pupil plane is equal to 100.

Fig. 3

Performance in terms of RMS WFE on the phase residuals at different S/N levels. Each point corresponds to a model trained and evaluated at the indicated S/N (six S/Ns are considered, and slight horizontal shifts were applied so that each point can be discerned). The same colors as in Fig. 2 are used, with the addition of the performance with classical imaging (green), using differential post-VVC PSFs (violet), as well as the theoretical limit (black dashed line). The median values are represented and the error bars correspond to the 2–98th percentiles. The S/Ns indicated are the ones at the entrance pupil plane, and the flux suppression induced by the vortex mask is taken into account. Left: input WFE of 70 nm distributed over 18 modes. Right: input WFE of 350 nm distributed over 88 modes.

Fig. 4

Phase prediction errors at different S/N levels, presented the same way as in Fig. 3, but this time also including atmospheric turbulence residuals in the PSFs during both training and testing.

Fig. 5

Examples of PSFs at different S/N levels (defined in the entrance pupil plane) for +lp. The resulting S/N in the detector plane is reduced due to the extinction factor introduced by the coronagraph and by the beam splitting between the two polarization channels. The level of NCPA is equal to 70 nm RMS distributed over 18 modes (top) and 350 nm RMS over 88 modes (bottom). AO residuals are also present: each PSF is the result of combining ten PSFs, with each containing a different AO residual phase screen.

Fig. 6

Performance with altered S/N levels during evaluation for three models trained on data with a median RMS WFE of 70 nm over 18 modes, with an S/N of 30 (purple), 1000 (blue), and with S/Ns uniformly distributed over the entire S/N range (green). Each point is obtained from a test batch composed of 1000 samples (the median value together with the 2–98th percentiles are shown).

Fig. 7

Performance with different input WFE levels defined during evaluation for models trained on data with a median RMS WFE of 70 nm over 18 modes (blue), and 350 nm RMS over 88 modes (red). Models were also trained on data following a uniform distribution covering the whole input WFE range, using both spatial frequency regimes (cyan and orange). The S/N is 1000 and each training dataset contains 10^5 samples.

