Open Access
Issue
A&A
Volume 683, March 2024
Article Number A113
Number of page(s) 8
Section Astronomical instrumentation
DOI https://doi.org/10.1051/0004-6361/202346939
Published online 12 March 2024

© The Authors 2024

Licence Creative CommonsOpen Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This article is published in open access under the Subscribe to Open model. Subscribe to A&A to support open access publication.

1 Introduction

Adaptive optics (AO) systems, especially the extreme adaptive optics (ExAO) systems that form part of high-contrast imaging systems, require ever more sensitive and accurate wavefront sensors. The first generation of AO systems used Shack-Hartmann wavefront sensors (SHWFS). The systems with SHWFSs have been very successful; they were the very first to image substellar companions (Chauvin et al. 2004). However, it was also shown that the SHWFS has relatively poor sensitivity to low-order modes (Guyon 2005). This is intrinsic to the SHWFS. This sensor uses a micro-lens array (MLA) in the pupil to create an array of spots. A flat wavefront will create a perfect regular pattern. Local slopes in the wavefront map will displace the spots and in turn this displacement signal can be used to reconstruct the incoming wavefront. The fact that the SHWFS measures gradients means that it is more sensitive to high-order modes, because those modes have much stronger derivatives. This is not ideal for direct imaging instruments, which require the best performance at small angular separations, where low-order modes dominate the performance.

The pyramid wavefront sensor (PWFSs) partially solved this issue (Ragazzoni 1996). The PWFS uses a pyramidal prism in the focal plane to create four pupils. The PWFS has a very small dynamic range, which can be increased by applying a modulation pattern to the point spread function (PSF) at the cost of sensitivity (Ragazzoni 1996; Vérinaud 2004). The PWFS is now in use in many systems (Pinna et al. 2016; Lozi et al. 2019; Males et al. 2022a); it has been chosen as the wavefront sensor for all new high-contrast imaging systems (Kasper et al. 2021; Males et al. 2022b; Haffert et al. 2022b; Fitzgerald et al. 2022) and will be part of upgrades of current systems (Boccaletti et al. 2022; Perera et al. 2022). The PWFS is a sensor that mixes wavefront slope and phase measurements (Vérinaud 2004). This means that the PWFS still has lower sensitivity to low-order modes, when modulated, than a perfect sensor. Several modifications have been suggested to improve the sensitivity of the PWFS. Either by changing the shape of the pyramid prism itself (3-PWFS, Flattened PWFS) or the modulation pattern (Fauvarque et al. 2017; Schatz et al. 2021; Vérinaud et al. 2024).

The Zernike wavefront sensor (ZWFS) is more sensitive than either the SHWFS or the PWFS (Guyon 2005; Chambouleyron et al. 2021). The ZWFS (N’Diaye et al. 2013) is a common path interferometer, which means that there are no differential aberrations between the reference beam and the sampled beam. The ZWFS phase mask shifts the phase of the core of the PSF. This maximizes the throughput and enables us to use all available photons. The phase-shifted core creates the reference beam for the measurement. It is near optimal in terms of robustness against photon noise and read noise (Chambouleyron et al. 2021, 2022). The sensitivity and accuracy of the ZWFS are demonstrated by the picometer precision in wavefront errors (Steeves et al. 2020). However, its response is very nonlinear and not even symmetric around a flat wavefront (N’Diaye et al. 2013). A nonlinear reconstruction algorithm is required to account for this. This Letter describes a new iterative nonlinear reconstruction algorithm that extends the working range of the ZWFS.

2 Reconstruction algorithms

The basic measurement of the ZWFS is made by the interference of the incoming wavefront and a reference beam. The incoming wavefront is focused onto a focal plane mask and then collimated again to form a pupil. The reference beam is created by the electric field that passes through the phase dimple on the focal plane mask. The mathematical description is

Eo=F{ F{ Ei }M }.${E_o} = F\left\{ {F\left\{ {{E_i}} \right\}M} \right\}.$(1)

Here the input electric field, Ei, is filtered by a focal plane mask M, which creates the output electric field Eo. The effect of the focal plane mask can be rewritten in a clearer form:

Eo=F{ F{ Ei }[ (M1)+1 ] }=Ei+F{ F{ Ei }(M1) }=Ei+Er.${E_o} = F\left\{ {F\left\{ {{E_i}} \right\}\left[ {\left( {M - 1} \right) + 1} \right]} \right\} = {E_i} + F\left\{ {F\left\{ {{E_i}} \right\}\left( {M - 1} \right)} \right\} = {E_i} + {E_r}.$(2)

This equation shows that the response of any focal plane mask can actually be described by the addition of the input electric field and a (self-created) reference beam. The exact shape of the reference beam depends on the focal plane mask and on the incoming wavefront. The ZWFS uses a top-hat filter of a certain diameter and depth. The optimal size was first thought to be the 50% encircled energy (EE) diameter at about 1.06λ/D, which would result in 50% of the light being part of the test beam and the other 50% of the light being part of the reference beam. Such a diameter creates a reference beam that is larger than the pupil; this is because of diffraction, which then scatters light outside of the geometric pupil (N’Diaye et al. 2013; Chambouleyron et al. 2021). Chambouleyron et al. found that a dot size of 2.0λ/D for a clear aperture is better in terms of sensitivity.

Detectors cannot measure the electric field directly, only the intensity is accessible. The intensity pattern of the ZWFS is the classic two-wave interference pattern,

I=| Ei |2+| Er |2+2| Ei || Er |cos(ϕiϕr).$I = {\left| {{E_i}} \right|^2} + {\left| {{E_r}} \right|^2} + 2\left| {{E_i}} \right|\left| {{E_r}} \right|\cos \left( {{\phi _i} - {\phi _r}} \right).$(3)

Each electric field is parameterized by its amplitude |Ej|2 = Ij and phase ϕj. The phase difference between the input and reference beam can be found by inverting Eq. (3):

ϕi=ϕr+arccos[ IIiIr2IiIr ].${\phi _i} = {\phi _r} + \arccos \left[ {{{I - {I_i} - {I_r}} \over {2\sqrt {{I_i}{I_r}} }}} \right].$(4)

This solution is also the maximum-likelihood estimator (MLE) for the incoming phase under Poisson noise (bright sources) and Gaussian noise (read noise dominated); see Appendix A. However, this equation requires knowledge of the input intensity and the intensity of the reference beam. The incoming pupil intensity |Ei|2 can be measured by taking out the focal plane mask, or simply by offsetting the PSF by several λ/D. The electric field of the reference beam has to be modeled. The common assumption in the literature (N’Diaye et al. 2013; Doelman et al. 2019) is that the reference beam is created by an aberration-free wavefront scaled by the expected Strehl ratio:

Er=SF1{ F{ P }(M1) }.${E_r} = \sqrt S {F^{ - 1}}\left\{ {F\left\{ P \right\}\left( {M - 1} \right)} \right\}.$(5)

The aberration-free response is created by filtering the pupil function P. However, errors are still introduced due to the aberration-free assumption. The estimate can be improved by iterating the solution (Doelman et al. 2019). Each iteration creates the currently best estimate of the reference beam by using the reconstructed phase from the previous iteration:

Er,n=SF1{ F{ Pexp(iϕn1) }(M1) }.${E_{r,n}} = \sqrt S {F^{ - 1}}\left\{ {F\left\{ {P\exp \left( {i{\phi _{n - 1}}} \right)} \right\}\left( {M - 1} \right)} \right\}.$(6)

The Strehl, S, in this method is a correction factor for high-frequency modes that cannot be measured because of the finite number of pixels. The ZWFS uses light from the Airy core for the reference beam. Light from high-spatial frequencies that are not controlled is scrambled because of the atmosphere. This effectively reduces the amount of power in the Airy core, which requires compensation. Simulations in the following section show that the iterative approach always converges to the correct phase if the aberration is within the dynamic range of the sensor.

2.1 Linear reconstruction

The linear reconstructor for small wavefront aberrations can be made by Taylor expanding Eq. (3) around a zero input phase:

I=Ii+Ir2IiIr sin (ϕr)ϕi.$I = {I_i} + {I_r} - 2\sqrt {{I_i}{I_r}} \sin \left( {{\phi _r}} \right){\phi _i}.$(7)

From this, the linear reconstructor follows as

ϕ1sin (ϕr)I − Ii − Ir2 IiIr.$\phi \approx {{ - 1} \over {\sin \left( {{\phi _r}} \right)}}{{I - {I_i} - {I_r}} \over {2\sqrt {{I_i}{I_r}} }}.$(8)

The linear approximation can also be used in an iterative approach. The Taylor expansion is then taken around the new reference phase for each iteration. This is already implicitly taken care of by updating the reference electric field.

2.2 Multistep nonlinear reconstruction

The nonlinear reconstructor can account for many of the effects of the interferometric measurement. There is one big limitation. A sinusoidal signal will have the same value twice over the [0,2π] range, which reduces the reconstructed phase range from both the arceos and arcsin to [0, π]. This causes a phase degeneracy on an interval that is smaller than the optical phase wrapping interval. Phase degeneracy can be circumvented using multiple measurements with different reference phases. This approach is called phase-shifting interferometry (PSI) and is the standard approach for metrology systems based on interferometers (Groot 2011).

There are many different ways to create a different phase response. The phase shift can be applied either by changing the depth of the ZWFS dot (Wallace et al. 2011; Doelman et al. 2019) or by adding a phase diversity in the incoming wavefront. Modification of the incoming wavefront is a very attractive solution; there is no need for additional hardware because an AO system running in closed-loop automatically creates a time series of different wavefronts. The method that is proposed here is valid for both approaches of phase modulation.

Each measurement in the time series will have its own reference phase. This leads to a system of equations:

[ I1InIN ]=[ I1,i+I1,r+2 I1,iI1,r cos(ϕ1,iϕ1,r)In,i+In,r+2 In,iIn,r cos(ϕn,iϕn,r)IN,i+IN,r+2 IN,iIN,r cos(ϕN,iϕN,r) ]$\left[ {\matrix{ {{I_1}} \cr \vdots \cr {{I_n}} \cr \vdots \cr {{I_N}} \cr } } \right] = \left[ {\matrix{ {{I_{1,i}} + {I_{1,r}} + 2\sqrt {{I_{1,i}}{I_{1,r}}} \cos \left( {{\phi _{1,i}} - {\phi _{1,r}}} \right)} \cr \vdots \cr {{I_{n,i}} + {I_{n,r}} + 2\sqrt {{I_{n,i}}{I_{n,r}}} \cos \left( {{\phi _{n,i}} - {\phi _{n,r}}} \right)} \cr \vdots \cr {{I_{N,i}} + {I_{N,r}} + 2\sqrt {{I_{N,i}}{I_{N,r}}} \cos \left( {{\phi _{N,i}} - {\phi _{N,r}}} \right)} \cr } } \right]$(9)

A least-squares solution to this system can be found by expanding each cosine term in the above equation using the trigonometry identity cos(x + y) = cos(x) cos(y) − sin(x) sin(y). Each of the cosine terms can be written as cos(ϕi,oϕi,r) = cos(ϕoδϕiϕi,r) with δϕi the phase change at step i with respect to the incoming phase. This can be expanded using the trigonometry identity to

cos(ϕo) cos(δϕi+ϕi,r)sin(ϕo) sin(δϕi+ϕi,r).$\cos \left( {{\phi _o}} \right)\cos \left( {\delta {\phi _i} + {\phi _{i,r}}} \right) - \sin \left( {{\phi _o}} \right)\sin \left( {\delta {\phi _i} + {\phi _{i,r}}} \right).$(10)

The last part of the equation is an inner product between the vector [cos(δϕi + ϕi,r), − sin(δϕi + ϕi,r] and [cos(ϕo), sin(ϕo)]. The systems of equations in Eq. (9) can be rewritten in terms of a single matrix vector multiplication:

[ Z1ZnZN ]=[ cos(δϕ1+ϕ1,r)sin(δϕ1+ϕ1,r)cos(δϕn+ϕn,r)sin(δϕn+ϕn,r)cos(δϕN+ϕN,r)sin(δϕN+ϕN,r) ]  [ cosϕosinϕo ].$\left[ {\matrix{ {{Z_1}} \cr \vdots \cr {{Z_n}} \cr \vdots \cr {{Z_N}} \cr } } \right] = \left[ {\matrix{ {\cos \left( {\delta {\phi _1} + {\phi _{1,r}}} \right)} & {} & { - \sin \left( {\delta {\phi _1} + {\phi _{1,r}}} \right)} \cr {} & \vdots & {} \cr {\cos \left( {\delta {\phi _n} + {\phi _{n,r}}} \right)} & {} & { - \sin \left( {\delta {\phi _n} + {\phi _{n,r}}} \right)} \cr {} & \vdots & {} \cr {\cos \left( {\delta {\phi _N} + {\phi _{N,r}}} \right)} & {} & { - \sin \left( {\delta {\phi _N} + {\phi _{N,r}}} \right)} \cr } } \right]\left[ {\matrix{ {\cos \,{\phi _o}} \cr {\sin {\phi _o}} \cr } } \right].$(11)

Here, the variable Zn is the post-processed intensity,

Zn=In − In,i − In,r2 In,iIn,r.${Z_n} = {{{I_n} - {I_{n,i}} - {I_{n,r}}} \over {2\sqrt {{I_{n,i}}{I_{n,r}}} }}.$(12)

The post-processed intensities are all on the left hand side while the right hand side contains all the modified cosines. The system of equations can be solved by inverting the cosine matrix on the right hand side. The result is that the cosine and sine term of the incoming phase can be fitted.

This set of equations has to be solved for every pixel in the detector. Recent work has already shown that many small sets of equations can be solved at multiple kHz speeds using GPUs (Haffert et al. 2021). It is therefore not unrealistic to solve such systems of equations upon every iteration during closed-loop control. The final step combines the sine and cosine components to get the full phase,

ϕ=arctan[ sinϕ/cosϕ ].$\phi = \arctan \left[ {{{\sin \phi } \mathord{\left/ {\vphantom {{\sin \phi } {\cos \phi }}} \right. \kern-\nulldelimiterspace} {\cos \phi }}} \right].$(13)

An additional computational burden comes from the optical model propagation to determine the reference beam. The propagation through the ZWFS can be significantly sped up by using a semi-analytical propagation method (Soummer et al. 2007). This method can reduce the ZWFS propagation to two Tall-and-Skinny matrix-vector multiplications, which can be calculated extremely quickly on a GPU.

2.3 Multiwavelength phase unwrapping

Phase unwrapping is required for phase errors that are larger than 1 λ peak to valley. The only way to get around this limitation is to use either multiwavelength measurements (Cheng & Wyant 1985) or apply some knowledge about the spatial correlations in order to perform spatial phase unwrapping (Ghiglia & Pritt 1998). There is a downside to spatial phase unwrapping in that there must be spatial correlations between adjacent pixels, which might not always be the case (e.g., petal and segments modes for the Extremely Large Telescope (ELT) and the Giant Magellan Telescope (GMT; Haffert et al. 2022a; Bertrou-Cantou et al. 2022). Multiwavelength measurements can break the phase wrapping degeneracy because a particular optical path difference (OPD) error (δ) will result in a different amount of phase error for each wavelength (ϕ = 2π/λ × δ). The unwrapped phase can be estimated using the measurements at two wavelengths:

δ^=12π/λ12π/λ2arg(ei(ϕ1ϕ2)).$\hat \delta = {1 \over {{{2\pi } \mathord{\left/ {\vphantom {{2\pi } {{\lambda _1}}}} \right. \kern-\nulldelimiterspace} {{\lambda _1}}} - {{2\pi } \mathord{\left/ {\vphantom {{2\pi } {{\lambda _2}}}} \right. \kern-\nulldelimiterspace} {{\lambda _2}}}}}\arg \left( {{e^{i\left( {{\phi _1} - {\phi _2}} \right)}}} \right).$(14)

More sophisticated phase unwrapping techniques also exist that are robust against measurement noise (Guo et al. 2022).

3 Simulations

The different reconstructors can be compared with one another using numerical simulations. The same setup was used for all simulations presented here, and all of them make use of HCIPy, an open-source object-oriented framework written in Python for performing end-to-end simulations of high-contrast imaging instruments (Por et al. 2018). The standard simulation uses a clear aperture with a telescope diameter of 1 m. The exact diameter is not relevant for these sets of simulations because the aberrations are normalized with respect to the diameter of the pupil. The sampling was set to 58 pixels across the pupil. The size of the ZWFS dot was set at 1.5λ/D (this size is near optimal sensitivity; see Appendix 3.3) and the central wavelength of the simulations is 1 µm.

3.1 Required iterations

The proposed algorithm uses an iterative approach to update the reference electric field after each estimation of the phase. Upon each iteration, the phase estimate should improve from the updated reference electric field. This behavior is investigated for all three reconstructors (linear, nonlinear, and phase-shifting multistep nonlinear) and for three different aberration strengths: 0.01 rad rms, 0.10 rad rms, and 0.50 rad rms. The aberrations are created from a power-law power spectral density (PSD) with a power-law exponent of −2.5. A total of 31 different trials were performed for each combination of reconstructor and aberration strength to average over statistical effects. The phase-shifting multistep approach requires multiple measurements with a known phase diversity. This set of simulations is used to determine the open-loop reconstruction performance. This means that no wavefront control is done. Therefore, a phase diversity needs to be added to the system. The main purpose of the phase diversity is to probe the entire wavefront. The most efficient way to do this is by adding white noise to each actuator, because white noise has a uniform spectral density. Simulations have been performed using other phase probes, such as low-order Zernike modes. Overall the white-noise phase probes showed the best performance, which is why they were used for comparison with the other reconstructors. Five measurements are taken in sequence with random white noise applied with 0.2 rad rms. The results of this analysis are shown in Fig. 1.

Both the linear and nonlinear algorithm converge to input phase to within machine precision (10−15 relative error for 64 bit floats) for small aberrations (0.01 and 0.1 rad rms). The nonlinear algorithm does this in fewer iterations than the linear algorithm, even for the smallest aberration (0.01 rad rms). This shows that it is always better to use the nonlinear algorithm than the linear approximation. There is also no benefit to choosing the linear over the nonlinear reconstructor in terms of computational complexity. However, both fail when the aberration is stronger, which is shown by the results of the 0.5 rad rms aberrations. The multistep nonlinear algorithm is able to converge for the strongest aberration. The results show that a maximum of 20 steps are required to reach the machine precision limit. This is clearly much more precise than any reconstruction algorithm needs to be, because any amount of measurement noise quickly limits the accuracy.

3.2 Dynamic range increase

The iterative algorithm improves the reconstruction quality at all levels of aberration. The increase in performance is explored by stepping through a range of aberrations from 0.001 rad rms to about 1.4 rad rms. Each step has 50 trials, where each trial uses a different realization of the incoming phase. The phase aberration is created using a PSD with a −2.5 exponent in its power law. The incoming aberration is scaled to the required standard deviation after the creation of the phase pattern. The reconstruction algorithms are nonlinear. This means that the actual reconstruction quality not only depends on the total rms but also on how this power is distributed across different modes. Nonlinear coupling might lead to situations where certain combination of modes are more difficult to estimate. Therefore, the residual error at each input aberration is a distribution and not a single number. This distribution is shown in Fig. 2.

The linear and nonlinear reconstructors show similar behavior. After one iteration, the performance of the nonlinear reconstructor is slightly better. However, after 20 iterations, both reach a similar performance. The distributions after 20 iterations are flatter for all reconstructors, indicating a higher reconstruction quality. The linear and nonlinear reconstructors work perfectly until about 0.25 rad rms. After this point, both have decreased performance that roughly scales linearly with the incoming rms. The iterated linear algorithm shows the poorest scaling; its reconstruction error increases exponentially at around 0.75 rad rms. The nonlinear algorithm behaves better and remains stable across all input rms.. There does not seem to be any difference in its scaling behavior for large input aberrations. The multistep nonlinear reconstructor shows excellent performance at both 1 and 20 iterations; it is superior to the other two reconstructors in terms of performance. Adding iterations to the multistep algorithm improves its performance even more. The reconstructed error remains flat from around 0 until 0.75 rad rms, beyond which it starts to decrease in performance. The error distribution also starts to become discrete at that point. A closer look at those cases shows that the error is caused by phase wrapping effects. Therefore, there is no possible way to distinguish between those cases and this is therefore the limit of the monochromatic ZWFS.

Adding multiple wavelengths to the measurements helps to break the degeneracy. With two wavelengths, it is possible to almost double the dynamic range to an rms of 1.4 rad (Fig. 3). At that point, the performance of the reconstructor starts to break down because of two issues: The first is that the phase starts to wrap multiple times. This can be solved by using more sophisticated spectral phase unwrappers. The second issue is that the reconstructed monochromatic phase is severely underestimated. This might be an effect of the iterative nature of the reconstructor; it starts with an assumption of a diffraction-limited beam, which creates a reference electric field with a high amplitude. The amplitude becomes much lower if the aberrations are stronger because most of the light will leak around the phase dot. This can only be solved by using a different reconstruction approach, such as machine learning (Allan et al. 2020).

thumbnail Fig. 1

Root mean square of the residuals for different reconstructed as a function of the number of iterations. The different colored lines show the results for the linear (blue), nonlinear (purple), and the multistep (green) reconstructors. The top (0.01 rad rms), middle (0.1 rad rms), and bottom panels (0.5 rad rms) show the result for different aberration strengths. The shaded area represents the 16–84% percentile, which corresponds to the 1σ bounds. The linear and nonlinear reconstructors are able to reach machine precision for small aberrations (10−15 relative error for 64 bit floats). The nonlinear algorithm converges more quickly than the linear approach. The multistep nonlinear reconstructor converges for all considered aberration strengths.

3.3 Photon noise sensitivity

When using an iterative algorithm, we must consider the possible influence of noise and how it propagates through the iterations. The photon noise sensitivity of the nonlinear algorithms is estimated using

σ=1sN,$\sigma = {1 \over {s\sqrt N }},$(15)

where σ is the reconstructed phase rms, s is the sensor sensitivity, and N the number of photons that are used for the measurement. In order to average over statistical noise, 100 different noisy realizations are generated for each number of photons N. The phase reconstruction error is then determined as a function of N. From this, the sensitivity parameter s can be estimated by fitting Eq. (15) to the data. The results are shown in Fig. 4 for a variety of ZWFS parameters and telescope apertures. The nonlinear iterative algorithm almost reaches the fundamental sensitivity limit for ZWFS with dot diameters of about 1.5 λ/D. The noise behavior as a function of dot size and central obscuration is very similar to the sensitivity with linear reconstructors (Chambouleyron et al. 2021). The simulations demonstrate that the iterative nonlinear algorithm does not affect the photon noise sensitivity of the ZWFS. This behavior is also expected because the solution from Eq. (3) is the MLE solution. The updates on the reference electric field are minimally impacted by noise because the reference electric field is a low-pass-filtered version of the incoming wavefront (Steeves et al. 2020). The low-pass filter reduces the impact of noise.

thumbnail Fig. 2

Distribution of the reconstruction errors for different reconstructors (left, linear; middle, nonlinear; right, multistep nonlinear). The top and bottom rows of panels show the error distribution after 1 and after 20 iterations, respectively.

thumbnail Fig. 3

Distribution of the reconstruction errors for the multi-wavelength measurements. The distribution shows a linear trend with increasing input rms, which can be attributed to the limited number of monochromatic iterations. The reconstructor starts to break down around an rms of 1.4 rad, which is 0.23 waves. At this point the phase is starting to wrap multiples times.

3.4 Closed-loop simulations

For most AO systems, it is not possible to take multiple measurements with different phase probes. For AO systems working in closed-loop, another possibility for the PSI approach is to make use of the closed-loop telemetry. The commands that are sent at every iteration to correct for wavefront aberrations are in fact also phase probes. Therefore, it is possible to make use of the closed-loop commands for the multistep algorithm. This makes it a very efficient algorithm, as no iterations are necessary to do the probing. This is similar to the “Fast and Furious” focal plane wavefront sensing technique, which uses the control history to solve for degeneracies (Keller et al. 2012; Wilby et al. 2018; Bos et al. 2020). The closed-loop control works by keeping track of the last M measurements and control commands that are sent to the DM. The overall phase disturbance is assumed to be static during this period. This is a good assumption for AO systems running at several kHz or for ZWFSs that are used to track quasi-static speckles (e.g., ZELDA on SPHERE). The main challenge is the bookkeeping of the actual phase differences that are applied on the DM. The phase difference is the accumulation of the control signals. If the control signal at timestep i is vi, then the introduced phase difference at timestep j is δϕj = vj–1vj–2. The time lag of the AO system is assumed to be 1 frame. The full temporal response of the DM has to be taken into account if the dynamics are more complicated than integer delays. The simulations in this work assume a lag of 1 frame.

Figure 5 shows the wavefront residual rms during closed-loop operations. For small wavefront aberrations, all three algorithms converge to zero wavefront error. However, only the phase-shifting multistep algorithm converges for the wavefront aberration starting at 0.8 rad rms. The single-step solutions converge to the wrong phase correction because of the phase degeneracy in the reconstruction. The multistep approach is free from this constraint.

thumbnail Fig. 4

Photon noise sensitivity of the ZWFS for different mask diameters and central obscurations (different colored lines).

thumbnail Fig. 5

Closed-loop performance as a function of the number of iterations. The solid curve is the median performance across 15 different realizations and the shaded area depicts the 1σ boundary. The different colors correspond to the different reconstruction algorithms.

4 Discussion and conclusion

This work shows how the phase reconstruction of the Zernike wavefront sensor can be extended using iterative (non)linear reconstructors. The iterative reconstructors increase the accuracy of the ZWFS to the machine precision limit. The ZWFS reconstructor is also extended using PSI techniques. These techniques help the ZWFS to circumvent numerical phase wrapping limits and to increase its dynamic range by a factor of three. The new algorithm, the multistep nonlinear reconstructor, extends the working range of the ZWFS by a factor of four. The main simulations presented here were performed with monochromatic light to demonstrate the iterative and multistep reconstructors. The approach is still applicable for measurements with broadband light, which introduce only a very small error (<0.01 rad rms); see Appendix B. Extending the measurements by using multiple wavebands almost doubles the dynamic range again, allowing the ZWFS to work down to 15% Strehl. Such measurements are enabled by energy-resolving detectors, such as MKIDS (O’Connor et al. 2019).

The ZWFS has increased in popularity over recent years as an ideal wavefront sensor for high-contrast imaging and extreme AO systems. Until now, most instruments only used it as a final stage sensor to sense noncommon path aberrations, because of its small dynamic range (N’Diaye et al. 2016; Vigan et al. 2019; van Kooten et al. 2022; Hours et al. 2022). However, it is also possible to use the ZWFS as a second stage wavefront sensor. It will most likely still need to operate at IR wavelengths to have a sufficiently large dynamic range. Such an approach is envisioned for the Giant Magellan Adapative Optics eXtreme (GMagAO-X) instrument (Haffert et al. 2022b). A multistage AO system is planned for this instrument, where a first-stage AO system is used to control the 4 K low-order deformable mirror. This cleans up the wavefront aberrations to such a degree that the ZWFS can be used for the high-order loop that controls the 21.000 actuator deformable mirror (Males et al. 2022b; Close et al. 2022). Lab experiments are underway to verify the proposed algorithms for real-life feasibility.

Acknowledgements

Support for this work was provided by NASA through the NASA Hubble Fellowship grant #HST-HF2-51436.001-A awarded by the Space Telescope Science Institute, which is operated by the Association of Universities for Research in Astronomy, Incorporated, under NASA contract NAS5-26555.

Appendix A Maximum-likelihood derivation

The maximum-likelihood solution starts by considering the likelihood. We are considering two cases for the Zernike wavefront sensor; a Gaussian-noise model and a Poisson-noise model. A major advantage to using the Zernike WFS is that it is a direct wavefront sensor. There is no information required from neighbor pixels to reconstruct the phase of a single pixel. Therefore, we can consider all pixels as independent problems, making it easy to find the MLE solution.

Appendix A.1 Gaussian-noise maximum-likelihood estimator

The likelihood, ℒi, of a Gaussian-distributed variable xi ~ N(µi, σi) is

i=12πσi2exp((xiui)2σi2).${{\cal L}_i} = {1 \over {\sqrt {2\pi \sigma _i^2} }}\exp \,\left( { - {{{{\left( {{x_i} - {u_i}} \right)}^2}} \over {\sigma _i^2}}} \right).$(A.1)

The data point xi is the measured intensity value and σi the noise for that data point. The expectation value follows from Eq. 3 as μi=Io+Ir+2Io,Ir cos  (ϕoϕr)${\mu _i} = {I_o} + {I_r} + 2\,\sqrt {{I_o},{I_r}} \,{\rm{cos}}\,\,\left( {{\phi _o} - {\phi _r}} \right)$. The MLE of the phase is found by maximizing the likelihood with respect to the phase. It is possible to directly optimize the likelihood, but it is easier to optimize the logarithm of the likelihood. Maximizing Eq. A.1 is the same as finding the maximum of the logarithm of the equation,

ϕ^=argmax ϕi=argmaxϕlogi.$\hat \phi = \mathop {\arg \,\max }\limits_\phi {{\cal L}_i} = \mathop {\arg \,\max }\limits_\phi \,\log {{\cal L}_i}.$(A.2)

The maximum can be found by equating the derivative of the log-likelihood to zero,

ϕolog=122πϕoσi2ϕo(xiμi)2σi2=0.${\partial \over {\partial {\phi _{\rm{o}}}}}\log {\cal L} = - {1 \over 2}2\pi {\partial \over {\partial {\phi _{\rm{o}}}}}\sigma _i^2 - {\partial \over {\partial {\phi _{\rm{o}}}}}{{{{\left( {{x_i} - {\mu _i}} \right)}^2}} \over {\sigma _i^2}} = 0.$(A.3)

Here, only the second term depends on the incoming phase.

ϕolog=2(xiμi)σi2ϕoμi=0.${\partial \over {\partial {\phi _{\rm{o}}}}}\log {\cal L} = 2{{\left( {{x_i} - {\mu _i}} \right)} \over {\sigma _i^2}}{\partial \over {\partial {\phi _{\rm{o}}}}}{\mu _i} = 0.$(A.4)

The equation shows we have two possible solutions:

xiμi=0ϕoμi=0.${x_i} - {\mu _i} = 0 \vee {\partial \over {\partial {\phi _{\rm{o}}}}}{\mu _i} = 0.$(A.5)

The first equation is just the direct solution of Eq. 3. The other solution results in

sin (ϕoϕr)=0.$\sin \left( {{\phi _o} - {\phi _r}} \right) = 0.$(A.6)

This solution no longer contains the measurement xi and does not give us any information. Second, the first solution always has a smaller log-likelihood because xiµi is perfectly canceled, which is not always the case for the second equation.

Appendix A.2 Poisson-noise maximum-likelihood estimator

The Poisson-noise MLE can be derived in a straightforward manner. The Poisson distribution is parameterized by the rate parameter v. The corresponding distribution is

i=exp(v)vxixi!.${{\cal L}_i} = \exp \left( { - v} \right){{{v^{{x_i}}}} \over {{x_i}!}}.$(A.7)

The log-likelihood is

logi=vlogxi!+xilogv.$\log {{\cal L}_i} = - v - \log {x_i}! + {x_i}\log v.$(A.8)

For the ZWFS, the photon rate is set by Eq. 3. This means that v=I0+Ir+2IoIr cos (ϕoϕr)$v = {I_0} + {I_r} + 2\,\sqrt {{I_o}{I_r}} \,{\rm{cos}}\,\left( {{\phi _o} - {\phi _r}} \right)$. The derivative of the log-likelihood with respect to the incoming phase, ϕo, is

ϕologi=ϕov+xiϕologv,${\partial \over {\partial {\phi _{\rm{o}}}}}\log {{\cal L}_i} = - {\partial \over {\partial {\phi _{\rm{o}}}}}v + {x_i}{\partial \over {\partial {\phi _{\rm{o}}}}}\log v,$(A.9) ϕologi=2 IoIr sin δϕxi2 IoIr sin δϕIo+Ir+2IoIr cos δϕ.${\partial \over {\partial {\phi _{\rm{o}}}}}\log {{\cal L}_i} = 2\sqrt {{I_o}{I_r}} \sin \delta \phi - {x_i}{{2\sqrt {{I_o}{I_r}} \sin \delta \phi } \over {{I_o} + {I_r} + 2\sqrt {{I_o}{I_r}} \cos \delta \phi }}.$(A.10)

The MLE solution can be found by equating the derivative to zero and solving for ϕo. After some algebra this becomes

Io+Ir+2 IoIr cos δϕ=xi.${I_o} + {I_r} + 2\sqrt {{I_o}{I_r}} \cos \delta \phi = {x_i}.$(A.11)

The MLE solution is just the direct solution of the Eq. 3.

The derivations in this appendix show that the solution in Equation 4 is the MLE solution at all photon rates.

Appendix A.3 MLE solution for a linear combination of modes

The equivariance principle of MLE solutions can be used to find the MLE solution in mode spaces other than the direct pixel basis that was used in the derivations in the previous section. This principle states that if θ^${\hat \theta }$ is an MLE of θ, then τ^=g(θ^)$\hat \tau = g\left( {\hat \theta } \right)$ is the MLE solution of g(θ) for any function g (Casella & Berger 2021). Using this principle, we can find the MLE solution of α = o with α the mode coefficients and P the modal projection matrix. The MLE solution of ϕo is found with Eq. 4, which means that the MLE solution of the mode coefficients is α^=Pϕ^o$\hat \alpha = P{{\hat \phi }_o}\,$. The best way to estimate the modal coefficients is to start by solving in pixel space and then project that phase onto the mode basis.

Appendix B Broadband effects

Appendix B.1 Broadband reconstruction errors

The proposed reconstructors are all based on the monochromatic interference pattern. Real AO systems always have a broad spectral bandwidth in order to capture as many photons as possible. The response will be an average over all wavelengths. The spectral average operator $\left\langle \cdot \right\rangle $ is defined as

f =λ0Δλ/2λ0+Δλ/2f(λ)dλλ0Δλ/2λ0+Δλ/2dλ.$\left\langle f \right\rangle = {{\int_{{\lambda _0} - {\rm{\Delta }}{\lambda \mathord{\left/ {\vphantom {\lambda 2}} \right. \kern-\nulldelimiterspace} 2}}^{{\lambda _0} + {\rm{\Delta }}{\lambda \mathord{\left/ {\vphantom {\lambda 2}} \right. \kern-\nulldelimiterspace} 2}} {f\left( \lambda \right){\rm{d}}\lambda } } \over {\int_{{\lambda _0} - {\rm{\Delta }}{\lambda \mathord{\left/ {\vphantom {\lambda 2}} \right. \kern-\nulldelimiterspace} 2}}^{{\lambda _0} + {\rm{\Delta }}{\lambda \mathord{\left/ {\vphantom {\lambda 2}} \right. \kern-\nulldelimiterspace} 2}} {{\rm{d}}\lambda } }}.$(B.1)

The wavelength averaged ZWFS response is

I = Ii + Ir +2IiIr cos (ϕiϕr) .$\left\langle I \right\rangle = \left\langle {{I_i}} \right\rangle + \left\langle {{I_r}} \right\rangle + 2\left\langle {\sqrt {{I_i}{I_r}} \cos \left( {{\phi _i} - {\phi _r}} \right)} \right\rangle .$(B.2)

The first term is just the incoming pupil, which is nearly achromatic. The only chromatic effects that causes differential amplitude variations in the pupil are due to scintillation. However, such effects are usually very small. The pupil image can be assumed to be achromatic, Ii =Ii$\left\langle {{I_i}} \right\rangle = {I_i}$. The second term is the spectral average of the reference electric field. The changes in the reference beam across wavelengths of of low order. The reference field is created by the circular ZWFS mask. This creates an Airy pattern in the pupil. This pupil-plane Airy pattern scales with wavelength due to diffraction, which is the first chromatic effect. The second comes from the incoming beam. The amount of power that passes through the ZWFS dimple also scales with wavelength due to diffraction; lower wavelengths have a smaller PSF. In summary, shorter wavelengths have a more concentrated reference beam with higher power, while longer wavelengths have a more spread out reference beam with less power. These two effects are partially compensated by spectral averaging. Figure B.1 shows the phase error from the change in the reference field averaged over a 50% bandwidth. The phase error consists of 0.009 rad rms focus error and 0.0036 rad rms of spherical error. This is a very small error and can be safely neglected in the reconstruction.

thumbnail Fig. B.1

Phase error in radians due to the chromatic change in the reference field.

The inversion of the measurement equation is complicated by the last term. It is a weighted integral of a cosine, which cannot simply be replaced by a single cosine. The error from the monochromatic assumption can be estimated by making a Taylor expansion of the integral for small bandwidths around the center wavelength λ0,

λ0Δλ/2λ0+Δλ/2IiIr cos (ϕiϕr)+B(λλ0)+C(λλ0)2dλ.$\int_{{\lambda _0} - {\rm{\Delta }}{\lambda \mathord{\left/ {\vphantom {\lambda 2}} \right. \kern-\nulldelimiterspace} 2}}^{{\lambda _0} + {\rm{\Delta }}{\lambda \mathord{\left/ {\vphantom {\lambda 2}} \right. \kern-\nulldelimiterspace} 2}} {\sqrt {{I_i}{I_r}} \cos \left( {{\phi _i} - {\phi _r}} \right) + B\left( {\lambda - {\lambda _0}} \right) + C{{\left( {\lambda - {\lambda _0}} \right)}^2}{\rm{d}}\lambda } .$(B.3)

Here, B and C are the first- and second-order expansion coefficients. The first-order term will cancel if the integral is taken over a symmetric domain around the center wavelength. This will leave us with

2 IiIr cos (ϕiϕr)Δλ+C6Δλ3.$2\sqrt {{I_i}{I_r}} \cos \left( {{\phi _i} - {\phi _r}} \right){\rm{\Delta }}\lambda + {C \over 6}{\rm{\Delta }}{\lambda ^3}.$(B.4)

The spectral average is normalized by the total bandwidth Δλ. Therefore, the integral term has to be divided by Δλ. Substituting this back into the original broadband equation leaves us with

I =Ii+Ir+2 IiIr cos (ϕiϕr)+C6Δλ2.$\left\langle I \right\rangle = {I_i} + {I_r} + 2\sqrt {{I_i}{I_r}} \cos \left( {{\phi _i} - {\phi _r}} \right) + {C \over 6}{\rm{\Delta }}{\lambda ^2}.$(B.5)

This derivation shows that using the monochromatic solution has a second-order error in spectral bandwidth. The second-order behavior is made clear by Figure B.2, which shows the residual wavefront error rms as a function of spectral bandwidth. The typical bandwidths for wavefront sensors are at the 0.1 to 0.5 level. At such bandwidths, the reconstruction error is roughly 0.01 rad rms. This is substantially lower than any measurement noise. The reconstruction error created by a finite bandwidth will therefore be of no consequence and the monochromatic reconstructor can be used without considering bandwidth effects.

thumbnail Fig. B.2

Reconstruction error as a function of bandwidth for the different reconstructors. The top, middle, and bottom panels show the reconstruction error for different input rms values. The top and middle panels show that the reconstruction error grows as the bandwidth squared. The bottom panel shows the results for an rms value that is outside the dynamic range of the single-step algorithms.

References

  1. Allan, G., Kang, I., Douglas, E. S., et al. 2020, SPIE, 11443, 743 [Google Scholar]
  2. Bertrou-Cantou, A., Gendron, E., Rousset, G., et al. 2022, A&A, 658, A49 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  3. Boccaletti, A., Chauvin, G., Wildi, F., et al. 2022, SPIE, 12184, 622 [Google Scholar]
  4. Bos, S. P., Vievard, S., Wilby, M. J., et al. 2020, A&A, 639, A52 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  5. Casella, G., & Berger, R. L. 2021, Statistical Inference (Beijing: Cengage Learning) [Google Scholar]
  6. Chambouleyron, V., Fauvarque, O., Sauvage, J.-F., et al. 2021, A&A, 650, L8 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  7. Chambouleyron, V., Fauvarque, O., Plantet, C., et al. 2022, SPIE, 12185, 932 [Google Scholar]
  8. Chauvin, G., Lagrange, A.-M., Dumas, C., et al. 2004, A&A, 425, L29 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  9. Cheng, Y.-Y., & Wyant, J. C. 1985, Appl. Optics, 24, 804 [NASA ADS] [CrossRef] [Google Scholar]
  10. Close, L. M., Males, J. R., Durney, O., et al. 2022, SPIE, 12185, 668 [Google Scholar]
  11. Doelman, D. S., Auer, F. F., Escuti, M. J., & Snik, F. 2019, Optics Lett., 44, 17 [NASA ADS] [CrossRef] [Google Scholar]
  12. Fauvarque, O., Neichel, B., Fusco, T., Sauvage, J.-F., & Girault, O. 2017, J. Astron. Telesc. Instrum. Syst., 3, 019001 [NASA ADS] [CrossRef] [Google Scholar]
  13. Fitzgerald, M. P., Sallum, S., Millar-Blanchaer, M. A., et al. 2022, SPIE, 12184, 736 [Google Scholar]
  14. Ghiglia, D. C., & Pritt, M. D. 1998, Phase Unwrapping: Theory, Algorithms, and Software (Wiley Interscience Publication) [Google Scholar]
  15. Groot, P. D. 2011, in Optical Measurement of Surface Topography (Berlin: Springer), 167 [CrossRef] [Google Scholar]
  16. Guo, R., Lu, S., Wu, Y., Zhang, M., & Wang, F. 2022, Opt. Commun., 510, 127965 [NASA ADS] [CrossRef] [Google Scholar]
  17. Guyon, O. 2005, ApJ, 629, 592 [NASA ADS] [CrossRef] [Google Scholar]
  18. Haffert, S. Y., Males, J. R., Close, L. M., et al. 2021, SPIE, 11823, 413 [Google Scholar]
  19. Haffert, S. Y., Close, L. M., Hedglen, A. D., et al. 2022a, J. Astron. Telesc. Instrum. Syst., 8, 021513 [NASA ADS] [CrossRef] [Google Scholar]
  20. Haffert, S. Y., Males, J. R., Close, L. M., et al. 2022b, SPIE, 12185, 1071 [Google Scholar]
  21. Hours, A., Carlotti, A., Mouillet, D., et al. 2022, SPIE, 12185, 736 [Google Scholar]
  22. Kasper, M., Urra, N. C., Pathak, P., et al. 2021, The Messenger, 182, 38 [NASA ADS] [Google Scholar]
  23. Keller, C. U., Korkiakoski, V., Doelman, N., et al. 2012, SPIE, 8447, 749 [Google Scholar]
  24. Lozi, J., Jovanovic, N., Guyon, O., et al. 2019, PASP, 131, 044503 [NASA ADS] [CrossRef] [Google Scholar]
  25. Males, J. R., Close, L. M., Haffert, S., et al. 2022a, SPIE, 12185, 61 [Google Scholar]
  26. Males, J. R., Close, L. M., Haffert, S. Y., et al. 2022b, SPIE, 12185, 1381 [Google Scholar]
  27. N’Diaye, M., Dohlen, K., Fusco, T., & Paul, B. 2013, A&A, 555, A94 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  28. N’Diaye, M., Vigan, A., Dohlen, K., et al. 2016, A&A, 592, A79 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  29. O’Connor, E., Shearer, A., & O’Brien, K. 2019, New Astron. Rev., 87, 101526 [CrossRef] [Google Scholar]
  30. Perera, S., Maire, J., Do Ó. C. R., et al. 2022, SPIE, 12185, 1298 [Google Scholar]
  31. Pinna, E., Esposito, S., Hinz, P., et al. 2016, SPIE, 9909, 1183 [Google Scholar]
  32. Por, E. H., Haffert, S. Y., Radhakrishnan, V. M., et al. 2018, Proc. SPIE, 10703, 1070342 [Google Scholar]
  33. Ragazzoni, R. 1996, J. Modern Opt., 43, 289 [NASA ADS] [CrossRef] [Google Scholar]
  34. Schatz, L., Males, J. R., Correia, C. M., et al. 2021, J. Astron. Telesc. Instrum. Syst., 7, 049001 [NASA ADS] [CrossRef] [Google Scholar]
  35. Soummer, R., Pueyo, L., Sivaramakrishnan, A., & Vanderbei, R. J. 2007, Opt. Express, 15, 15935 [CrossRef] [Google Scholar]
  36. Steeves, J., Wallace, J. K., Kettenbeil, C., & Jewell, J. 2020, Optica, 7, 1267 [NASA ADS] [CrossRef] [Google Scholar]
  37. van Kooten, M. A., Ragland, S., Jensen-Clem, R., et al. 2022, ApJ, 932, 109 [NASA ADS] [CrossRef] [Google Scholar]
  38. Vérinaud, C. 2004, Opt. Commun., 233, 27 [Google Scholar]
  39. Vérinaud, C., Heritier, C., Kasper, M., & Tallon, M. 2024, 682, A27 [Google Scholar]
  40. Vigan, A., N’diaye, M., Dohlen, K., et al. 2019, A&A, 629, A11 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  41. Wallace, J. K., Rao, S., Jensen-Clem, R. M., & Serabyn, G. 2011, SPIE, 8126, 110 [Google Scholar]
  42. Wilby, M. J., Keller, C. U., Sauvage, J.-F., et al. 2018, A&A, 615, A34 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

All Figures

thumbnail Fig. 1

Root mean square of the residuals for different reconstructed as a function of the number of iterations. The different colored lines show the results for the linear (blue), nonlinear (purple), and the multistep (green) reconstructors. The top (0.01 rad rms), middle (0.1 rad rms), and bottom panels (0.5 rad rms) show the result for different aberration strengths. The shaded area represents the 16–84% percentile, which corresponds to the 1σ bounds. The linear and nonlinear reconstructors are able to reach machine precision for small aberrations (10−15 relative error for 64 bit floats). The nonlinear algorithm converges more quickly than the linear approach. The multistep nonlinear reconstructor converges for all considered aberration strengths.

In the text
thumbnail Fig. 2

Distribution of the reconstruction errors for different reconstructors (left, linear; middle, nonlinear; right, multistep nonlinear). The top and bottom rows of panels show the error distribution after 1 and after 20 iterations, respectively.

In the text
thumbnail Fig. 3

Distribution of the reconstruction errors for the multi-wavelength measurements. The distribution shows a linear trend with increasing input rms, which can be attributed to the limited number of monochromatic iterations. The reconstructor starts to break down around an rms of 1.4 rad, which is 0.23 waves. At this point the phase is starting to wrap multiples times.

In the text
thumbnail Fig. 4

Photon noise sensitivity of the ZWFS for different mask diameters and central obscurations (different colored lines).

In the text
thumbnail Fig. 5

Closed-loop performance as a function of the number of iterations. The solid curve is the median performance across 15 different realizations and the shaded area depicts the 1σ boundary. The different colors correspond to the different reconstruction algorithms.

In the text
thumbnail Fig. B.1

Phase error in radians due to the chromatic change in the reference field.

In the text
thumbnail Fig. B.2

Reconstruction error as a function of bandwidth for the different reconstructors. The top, middle, and bottom panels show the reconstruction error for different input rms values. The top and middle panels show that the reconstruction error grows as the bandwidth squared. The bottom panel shows the results for an rms value that is outside the dynamic range of the single-step algorithms.

In the text

Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.

Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.

Initial download of the metrics may take a while.