Neural networks and PCA coefficients to identify and correct aberrations in adaptive optics

Context. Static and quasi-static aberrations are a major limitation for high-contrast imaging at large telescopes. The most important of these are the aberrations that are not corrected by the adaptive optics (AO) system, which are called non-common path aberrations (NCPA). Several techniques have been proposed to mitigate them. The typical approach is to set an offset on the AO system with exactly the opposite sign of the NCPA in order to correct for the aberrations introduced by all the optical components downstream of the wave-front sensor (WFS) up to the science camera. An estimate of the NCPA can be obtained with a trial-and-error approach or by more sophisticated techniques of focal-plane wave-front sensing. Aims. In all cases, a fast procedure is desirable to limit the telescope downtime and to repeat, if needed, the correction procedure to cope with the temporal variation of the NCPA. Very recently, new approaches based on neural networks (NNs) have also been proposed as an alternative. Methods. In this work, through simulated images, we test the application of a supervised NN for the mitigation of NCPAs in high-contrast imaging at visible wavelengths and, in particular, we investigate the possibility of applying this method to fast imagers such as SHARK-VIS, the forthcoming visible-band high-contrast imager for the Large Binocular Telescope (LBT). Results. Preliminary results show a measurement accuracy of the NCPA of 2 nm root mean square (RMS) for each sensed Zernike mode in turbulence-free conditions, and 5 nm RMS per mode when the residual turbulence has a wave-front error (WFE) of approximately 42.5 nm RMS, a typical value during LBT AO system calibration. This accuracy is sufficient to guarantee that, after correction, NCPA residuals in the system are negligible compared to the typical WFE > 100 nm RMS of the best AO systems at large telescopes. Conclusions. Our simulations show this method is robust even in the presence of turbulence-induced aberrations that are not labelled in the training phase of the NN. The method could thus be used in a real-world setting by offloading a corrective static offset to the AO system of a telescope to mitigate the NCPA.


Introduction
Astronomical high-contrast imaging demands accurate wavefront control to minimise residual atmospheric aberrations that can hamper the detection of very faint objects, for example in extrasolar planet imaging (e.g. Mesa et al. 2019; Lagrange et al. 2009; Marois et al. 2008). A significant limitation to the achievement of the theoretical contrast limit is represented by static or quasi-static aberrations that can result from, for example, optical imperfections, misalignments, or thermal effects. This is the case for both space-based (e.g. the Hubble Space Telescope; Gonsalves & Nisenson 1991) and ground-based observations. In the latter case, particularly with regard to adaptive optics (AO) systems, imaging instruments operated at ground-based telescopes are affected by wave-front errors (WFEs) that result from different contributions: residual atmospheric turbulence not corrected by the AO, telescope vibrations, gravity- or temperature-induced deformations of the optics, and the performance of the optical elements themselves. Image aberrations originating downstream of the wave-front sensor (WFS) are not sensed by the WFS itself and are therefore not corrected by the AO system, thus contributing to the increase of the overall WFE, as noted in Esposito et al. (2020).
In this regard, wave-front sensing performed by the science camera would be ideal to effectively mitigate these residual aberrations (Jedrzejewski et al. 1993), which are generally referred to as non-common path aberrations (NCPA).
Achieving robust focal-plane or close-to-focal-plane wavefront sensing, based on the analysis of the point spread function (PSF), would also offer a series of advantages in different fields of astronomical instrumentation, including post-facto image deconvolution and analysis. In general, it would also be useful whenever a mitigation of the residual WFE is desired but a dedicated WFS or a secondary focal plane is not available, leading to analysis of extra-focal plane image momentum, as in Terreri et al. (2019b). The Strehl ratio (SR) is calculated following Szapiel (1983) and Ross (2009). The NCPA effect is clearly stronger at the shorter Hα wavelength.
Even if smaller than the residual amplitude of the seeing aberrations, NCPAs can still represent a severe limitation (see Fig. 1), in particular in applications such as high-contrast imaging of circumstellar discs and exoplanets where extreme wave-front control is required (Potier et al. 2021).
Preliminary on-sky tests (Esposito et al. 2012) conducted on the First Light Adaptive Optics (FLAO) instrument at the Large Binocular Telescope (LBT, Arizona, USA; Hill 2010) have shown an overall residual WFE ranging from 80 nm root mean square (RMS) for a star of R mag = 7.5 to 218 nm RMS for a star of R mag = 14.5, as also reported in Pinna et al. (2015, 2016) and Agapito & Pinna (2019) and provided by the Strehl Calculator. According to Li Causi et al. (2017); Causi et al. (2018); Hope et al. (2019); Stangalini et al. (2018); Mattioli et al. (2019), these results limit the contrast achievable at visible wavelengths to about 5 × 10⁻⁵ at 100 mas separation from the central star under typical observing conditions (seeing ranging from 0.8″ to 1.2″).
NCPAs contribute to the overall WFE budget and show typical sub-Hz frequencies and amplitudes in the range of a few tens of nanometers RMS, as in Pedichini et al. (2017); Esposito et al. (2020). NCPAs usually affect the low orders of the Zernike components of the wave front (in this article Zernike components are indexed according to the Noll convention; Noll 1976), and the effects are stronger as the wavelength becomes shorter, as confirmed by experimental and theoretical results reported in Vassallo et al. (2018); Esposito et al. (2020); Szapiel (1983); Ross (2009). In the case of point-like sources, new wave-front sensing technologies (Guyon et al. 2006; Korkiakoski et al. 2014) have been developed in the last decade based on the analysis of images taken directly from the science camera: these are referred to as focal-plane wave-front sensors (FPWFS). A few examples are phase diversity (PD; Gonsalves 1982), curvature WFSs (Roddier 1988), the Roddier test (Roddier et al. 1988), the iterative Fourier transform method (Roddier & Roddier 1991), the photonic lantern WFS (Norris et al. 2020), and approaches based on machine learning (Orban de Xivry et al. 2021; Menduiña Fernández 2021). These methods are useful to measure NCPAs and mitigate their effect. In particular, on AO-assisted telescopes this is done by adding a specific static offset to the AO real-time correction, which corresponds to the opposite value of the measured NCPA. However, one of the principal drawbacks of this approach is the degeneracy at the focal plane of the PSF with respect to the sign of some aberration components: for some aberrations, amplitudes of opposite sign produce the same PSF at the focal plane, which makes it difficult to disentangle the contribution of these aberration terms.
This effect can be seen in Fig. 2, where pupil-plane wave fronts and their corresponding PSFs are compared: for even Zernike radial orders n = 2, 4, 6, ..., a positive (+50 nm) or negative (−50 nm) optical path difference (OPD) produces exactly the same PSF, while for odd Zernike radial orders n = 3, 5, 7, ... the PSFs are different. In the extra-focal plane, however, this degeneracy is removed, so the use of extra-focal plane images allows the retrieval of information even for even radial-order Zernike components. In order to solve this ambiguity, most methods exploit images from intra- and extra-focal planes, such as the Roddier test (Roddier & Roddier 1993), which uses both, or the PD technique described in Gonsalves (1982), which uses one image taken in focus and another taken with a known amount of defocus, as done for SPHERE at the VLT by Sauvage et al. (2011). It is, however, also possible to solve this problem with a single PSF image taken at a defocused plane by using its morphology and intensity distribution; an example of this approach is the Wave-front Reconstruction Software (WRS), which is currently used at the Large Binocular Camera (LBC) at the LBT to preset the shape of the active primary mirror and was used to identify the first low Zernike order static aberration values in Terreri et al. (2019b).
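The sign degeneracy described above can be checked numerically with a simple Fourier-optics toy model. The sketch below is not the simulation used in this work: the grid size, the pupil radius, the 630 nm wavelength, and the 2 rad of added defocus are illustrative assumptions. It compares PSFs for ±50 nm RMS of defocus (Noll Z4, even radial order) and of coma (Noll Z8, odd radial order), in focus and with a known defocus added.

```python
import numpy as np

n = 128                   # grid size (assumed)
radius = 0.125            # pupil radius in grid units (assumed, leaves zero padding)
wavelength_nm = 630.0     # assumed observing wavelength
amp = 2 * np.pi * 50.0 / wavelength_nm   # 50 nm RMS expressed in radians

# fftfreq coordinates make the sampled pupil exactly symmetric about the FFT origin
x = np.fft.fftfreq(n)
X, Y = np.meshgrid(x, x)
xn, yn = X / radius, Y / radius
r2 = xn**2 + yn**2
pupil = (r2 <= 1.0).astype(float)

defocus = np.sqrt(3.0) * (2.0 * r2 - 1.0)        # Noll Z4, radial order 2
coma = np.sqrt(8.0) * (3.0 * r2 - 2.0) * xn      # Noll Z8, radial order 3


def psf(phase):
    """Peak-normalised PSF of the pupil for a phase map given in radians."""
    image = np.abs(np.fft.fft2(pupil * np.exp(1j * phase))) ** 2
    return image / image.max()


# In focus: opposite signs of an even mode give the same PSF
same_in_focus = np.allclose(psf(amp * defocus), psf(-amp * defocus), atol=1e-10)
# In focus: opposite signs of an odd mode give different (mirrored) PSFs
odd_differs = not np.allclose(psf(amp * coma), psf(-amp * coma), atol=1e-3)
# Out of focus: a known defocus offset breaks the degeneracy of the even mode
offset = 2.0 * defocus    # 2 rad RMS of extra defocus (assumed)
same_out_of_focus = np.allclose(psf(offset + amp * defocus),
                                psf(offset - amp * defocus), atol=1e-3)
print(same_in_focus, odd_differs, same_out_of_focus)
```

The comparison relies on the pupil being real and centrally symmetric; an obstructed or asymmetric pupil would need a separate check.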
Building on this knowledge base, we introduce the idea of using an artificial intelligence (AI) approach to FPWFS, using out-of-focus images encoded by principal component analysis (PCA) as input for a neural network (NN) to retrieve aberrations in optical systems (Hotelling 1933; Terreri et al. 2019a). Neural networks are extremely efficient in mapping non-linear relationships between input and output in regression problems, and they deliver almost real-time results once the model is trained. The present work used a multi-layer perceptron (MLP) NN to analyse defocused images and measure their wave-front aberrations. All images reported in this work are simulations generated following an optical prescription corresponding to the optical design of SHARK-VIS (Fig. 3), the forthcoming high-contrast and high-resolution visible-band imager for the LBT, as described in Pedichini et al. (2016) and Mattioli et al. (2018).
We generated two datasets of simulated images in different conditions and, for each of them, an NN model was trained to measure the first 21 Zernike polynomials. In the first dataset, only aberrations encoded in the first 21 Zernike polynomials were injected, while the second set also contained high-order aberrations typical of the LBT AO residual turbulence that occurs during closed-dome calibrations (Esposito et al. 2012), with an amplitude ranging from 40 to 50 nm RMS. The latter situation closely represents the conditions of an instrument such as SHARK-VIS at the LBT, when calibration procedures intended to minimise NCPAs are executed by using the telescope diffraction-limited source of ARGOS, as in Schwab et al. (2010).

Methods
Our wave-front analysis procedure takes extra-focal-plane images of optical systems, generated with physical optics propagation, and applies a two-stage process to retrieve the Zernike expansion (Noll 1976) of the original wave front. First, a PCA (Hotelling 1933) of the image is performed, and then the resulting PCA coefficients are analysed by a properly trained NN using the Python code in the scikit-learn (Ross et al. 2008) library package.
Neural networks (Sect. 2.3) are algorithms designed to recognise patterns in data by establishing correlations between input and desired output data. As described in Van Der Malsburg (1986), a simple toy model can be graphically described by a net of lines connecting a set of nodes called 'neurons', representing input numbers, with other nodes representing output, possibly including auxiliary nodes in between. These neurons are usually structured in three different blocks called 'layers': the input layer, the hidden layer(s), and the output layer. The net works by propagating the node values from the input layer to the output layer, processing the values at each node and propagating the result to the next. At each node, the separate input values are each multiplied by a weight and summed. A bias value is then added to the sum. The resulting value is then fed as an argument to an activation function; the output value of the function is the value for that neuron and is fed to the next layer. Activation functions can be of different types (sigmoid, ReLU, atan, ...) and they are the element that makes the whole process non-linear. In our case, the activation function is a fixed ReLU for all neurons and the biases are set to 0. The values in the output layer are thus dependent on the weights. Before the NN can be used, the weights must first be optimised: this is the 'learning' process of the net and is called 'training' the NN. In this phase, data consisting of known corresponding inputs and outputs are fed to the input and output layers of the NN, respectively, and used to iteratively update the connection weights. After the training process is completed, it is possible to apply the NN to new and unknown datasets in a 'testing' phase. In this phase (which can also be called the 'working' or 'prediction' phase), data are fed to the input layer and the NN outputs the result(s).
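The node-by-node propagation described above can be written in a few lines. The following is a minimal sketch with hand-picked toy weights, not the trained network of this work; the linear (identity) output layer is an assumption that is common for regression, and the function and variable names are illustrative.

```python
import numpy as np


def relu(x):
    """Rectified linear unit: negative values are clipped to zero."""
    return np.maximum(x, 0.0)


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Propagate an input vector through one hidden layer to the output."""
    hidden = relu(x @ w_hidden + b_hidden)   # weighted sum, bias, activation
    return hidden @ w_out + b_out            # linear output layer (assumed)


# Toy example: 2 inputs, 2 hidden neurons, 1 output, biases set to 0
x = np.array([1.0, 2.0])
w_hidden = np.array([[1.0, -1.0],
                     [0.5, -1.0]])
w_out = np.array([[3.0],
                  [4.0]])
y = forward(x, w_hidden, np.zeros(2), w_out, np.zeros(1))
print(y)
```

Here the hidden sums are 2 and −3; the ReLU keeps the first and zeroes the second, so only the first output weight contributes.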
In our case, the NN is trained to output a measure of the wave-front aberrations from simulated input images whose aberration is known. Once the training is completed, a different set of images with different known aberrations (the testing dataset) is fed to the NN; the output is then compared with the known input and the accuracy of the prediction is measured. For our purpose, we generated different sets of defocused PSF images for the training and testing phases (Sect. 2.1), simulating different Zernike aberrations by means of optical propagation software. We then applied a PCA (Sect. 2.2) to the simulated defocused PSF images, which encodes each image into a vector of a few hundred coefficients. These coefficients are used as values for the input layer nodes, and the values of the output layer correspond to the Zernike coefficients for all the simulated PSFs. In practice, the NN maps PCA coefficients to Zernike coefficients, allowing us to directly measure the wave-front aberrations from a defocused image of the PSF. In Sect. 2.4 we describe how atmospheric turbulence residuals were simulated and then introduced together with Zernike coefficients.

Simulations
All the simulations used in this work were made using the proper_lib Python optical propagation library developed by Krist (2007). Using this library, it is possible to inject known aberrations expressed in terms of Zernike polynomials as
\[ W = W_0 + \sum_{i=1}^{N} c_i Z_i , \tag{1} \]
where $W$ is the final wave front that generates our images, $W_0$ is the original wave front (with no NCPA), and $\sum_{i=1}^{N} c_i Z_i$ is the NCPA aberration, with $c_i$ being the RMS amplitude of the ith Zernike polynomial $Z_i$ and the index i following the Noll convention (Noll 1976).
For our simulations, we generated several series of random $c_i$ with a uniform distribution in the ±50 nm RMS range, with one series for each image of each set. These were organised into 2D arrays ('Zernike dataset'), with row indices corresponding to the image number and column indices corresponding to the ith Zernike polynomial. Hereafter, all WFE values are to be understood as RMS.
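As an illustration of the 'Zernike dataset' layout, the sketch below draws uniform coefficients in the ±50 nm range and stores them with one row per image and one column per Noll index. The seed and variable names are arbitrary assumptions. Because Noll-normalised Zernike polynomials are orthonormal over the pupil, the total RMS WFE of each image is the quadrature sum of its coefficients.

```python
import numpy as np

rng = np.random.default_rng(42)     # arbitrary seed for reproducibility
n_images, n_modes = 10_000, 21      # one row per image, one column per Noll index

# Coefficients in nm RMS, uniformly distributed in [-50, 50]
zernike_dataset = rng.uniform(-50.0, 50.0, size=(n_images, n_modes))

# Columns 3..20 hold Noll Z4..Z21 (column 0 is Z1)
sensed = zernike_dataset[:, 3:]

# Total RMS WFE per image from the quadrature sum of the sensed modes
wfe_rms = np.sqrt((sensed ** 2).sum(axis=1))
print(zernike_dataset.shape, round(float(wfe_rms.mean()), 1))
```

For 18 modes with this distribution, the root of the expected squared WFE is sqrt(18) × 50/sqrt(3) ≈ 122 nm.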
Running the optical propagation, we generated an image for each coefficient set and created a cube of images ('image dataset'). Optical simulation parameters were chosen in order to reproduce the working conditions of SHARK-VIS. Table 1 summarises the parameters used to generate the image datasets for each of the three stages of NN learning and application (i.e. training, validation, and testing; see Sect. 2.3).
Each set of Zernike coefficients is thus mapped to one image of the 'image dataset'. Out-of-focus images were initially generated on a 1024 × 1024 grid to guarantee super-Nyquist sampling of high-order components and avoid aliasing; these were then re-sampled onto a smaller 147 × 147 grid to reproduce the real sampling of the 6.5 µm pixels of the SHARK-VIS science camera detector (Andor Zyla, as described in Andor 2018). Each image was then registered to make the array centre coincide with the position of its photo-centre and cropped to a dimension of 137 × 137 pixels to remove tip and tilt ($Z_2$ and $Z_3$).
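The registration and cropping step can be sketched as follows. This is only an illustration: it assumes an integer-pixel shift with wrap-around (np.roll), whereas the actual pipeline may use sub-pixel interpolation, and the function name is hypothetical.

```python
import numpy as np


def centre_and_crop(image, size=137):
    """Move the photo-centre to the array centre, then crop the central region."""
    total = image.sum()
    rows, cols = np.indices(image.shape)
    # Intensity-weighted centroid (photo-centre)
    cy = (rows * image).sum() / total
    cx = (cols * image).sum() / total
    centre_y = (image.shape[0] - 1) / 2
    centre_x = (image.shape[1] - 1) / 2
    # Integer-pixel shift (assumption); pixels wrap around the array edges
    shift = (int(round(centre_y - cy)), int(round(centre_x - cx)))
    shifted = np.roll(image, shift, axis=(0, 1))
    y0 = (image.shape[0] - size) // 2
    x0 = (image.shape[1] - size) // 2
    return shifted[y0:y0 + size, x0:x0 + size]


# A single bright pixel away from the centre of a 147 x 147 frame
frame = np.zeros((147, 147))
frame[70, 80] = 1.0
cropped = centre_and_crop(frame)
peak = np.unravel_index(cropped.argmax(), cropped.shape)
print(cropped.shape, peak)
```

After registration the bright pixel sits at the centre of the 137 × 137 cut-out, which is what removes the tip and tilt information from the image.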
The expected NCPA typical of an imaging instrument such as SHARK-VIS, resulting from Pedichini et al. (2017), is of the order of a few tens of nanometers, with the dominant aberrations described by Zernike polynomials from $Z_4$ to $Z_{11}$. To make our simulation robust and realistic, in Eq. (1) we set N = 21 (i.e. i = 1, ..., 21) and the range of each $c_i$ to ±50 nm RMS in the Noll representation. The piston term $Z_1$ is neglected because it is a constant phase offset on the optical pupil of the system and does not affect the in-focus or out-of-focus PSF. Tip ($Z_2$) and tilt ($Z_3$) do not modify the PSF shape on the image plane, only its position; their effects are eliminated by the centring during the dataset pre-processing and are not relevant for the NCPA analysis. Consequently, only the Zernike modes from $Z_4$ to $Z_{21}$ are processed by the NN. We set the defocus of the images to 10 mm after the nominal focal plane so that it is compatible with the range of the focus stage of the SHARK-VIS science camera, which is between −10 mm and +40 mm. This value was found to work in preliminary tests; as stated in Sect. 5, we plan to further explore the effect of different defocus positions in future studies. The number of image samples for the NN training phase was set to 10 000.
The simulation tool ran on a laptop with an Intel Core i5-3320M processor. The average time for the generation (with or without AO residual turbulence) of one aberrated extra-focal image of 1024 × 1024 pixels is ∼2.5 s when using N = 21; a further ∼0.5 s is necessary for the post-processing (centring, cropping, and disc storage).

PCA
After its generation, each dataset is processed with a PCA that computes a basis of eigen images from the image dataset (Fig. 4). This technique is widely used to reduce data dimensions, as in JPEG encoding for image compression, PSF reconstruction for angular differential imaging (ADI) as in Amara & Quanz (2012), and pattern recognition. Every image of the dataset can then be expressed in terms of this basis as a sum of eigen images with a specific set of coefficients. These coefficients are used as input for the NN in place of the actual images, thus requiring only a small number of neurons in the NN input layer. We see that 500 components are enough to keep most of the information present in the image dataset, as can be verified in Fig. 5, where the maximum absolute values of each PCA coefficient over the whole dataset and their cumulative function are plotted versus the PCA component order. It is evident that after 400 modes there is little added information; the high number of components needed to properly encode our problem reveals the faintness of the information present in the NCPA signal we are trying to measure. The datasets are thus encoded on one side by the Zernike coefficients of the injected aberrations and on the other side by the PCA coefficients (Fig. 6). The NN training process creates a robust correlation between these two descriptors.
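The encoding of images into PCA coefficients can be illustrated with a plain NumPy singular value decomposition. This is a stand-in for the scikit-learn routine used in this work, not a reproduction of it; the random test images, array sizes, and function names are assumptions.

```python
import numpy as np


def fit_pca(images, n_components):
    """Return the mean image and the leading eigen images (as flat rows)."""
    flat = images.reshape(len(images), -1).astype(float)
    mean = flat.mean(axis=0)
    # Rows of vt are orthonormal eigen images sorted by decreasing variance
    _, _, vt = np.linalg.svd(flat - mean, full_matrices=False)
    return mean, vt[:n_components]


def encode(images, mean, basis):
    """Project images onto the eigen-image basis to get PCA coefficients."""
    flat = images.reshape(len(images), -1).astype(float)
    return (flat - mean) @ basis.T


rng = np.random.default_rng(0)
images = rng.normal(size=(50, 16, 16))   # placeholder for simulated PSF images
mean, basis = fit_pca(images, n_components=10)
coeffs = encode(images, mean, basis)     # one row of 10 coefficients per image
print(coeffs.shape)
```

New images, such as a testing set, would be passed to `encode` with the same `mean` and `basis`, so that all sets share one eigen-image basis.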

Artificial neural network and multi-layer perceptron
Artificial intelligence is a branch of informatics focused on the creation of machines that are able, using specific algorithms, to make decisions autonomously when stimulated by external inputs. Inside this AI family, there are 'deep-learning' methods based on artificial NNs. In the biological world, cognitive functions are provided by the presence of 'neurons', which are arranged in an interconnected network. The multi-layer perceptron (MLP) is one of the most used NN layouts and is detailed in Van Der Malsburg (1986). This layout (Fig. 7) has a series of artificial neurons organised in layers, where each neuron communicates with all the others placed on the previous and next layers; the first and last layers manage the input and output functions and are called input and output layers, while all the others are referred to as the hidden layers.
In order to tune the connection weights and biases of the neurons, the NN must be trained. In this phase, the input and output are both fed with known linked data: the process is iterative, and at each step, known as an 'epoch', the weights and biases are modified until the RMS error between the NN output and the expected output is below the required precision. There are different optimisation algorithms to train the NN weights. We used ADAM, which is described in Kingma & Ba (2017). Such training processes usually require two phases: the NN is first trained on a large dataset, called a training set; then, after convergence, the NN is applied to a smaller dataset, called a validation set, to check whether the prediction error is still low on data different from those used for training. If this condition is not met, the network parameters are changed until a validated NN model is achieved.
In our case, each set is composed of PCA data and Zernike data. During training, we fed the NN with the PCA training set as input and the Zernike training set as output. At regular epoch intervals, a PCA validation set and its corresponding Zernike validation-set elements are injected into the NN. The validation-set elements, which represent 20% of the entire training dataset, are useful to avoid the NN ending up on local minima. After typically 100 epochs, the accuracy stabilises just below 4 nm, which is an acceptable level for our kind of application (see Fig. 8). Once an NN training phase has been concluded, it is said that a specific NN model has been generated. In our NN, the input layer has $N_\mathrm{input}$ = 500 neurons, corresponding to the number of PCA components, the single hidden layer has $N_\mathrm{hidden} \gg N_\mathrm{input}$, and the output layer has 18 neurons, corresponding to the number of Zernike components. Our hidden layer is composed of $N_\mathrm{hidden}$ = 10 000 neurons with a ReLU activation function; this number was chosen by trial and error, following the principle expressed in Hornik et al. (1989) whereby an NN with one hidden layer whose number of neurons tends to infinity can approximate any functional relation.
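A network of this type can be set up with the scikit-learn `MLPRegressor` class. The sketch below is a scaled-down illustration on synthetic data, not the configuration used in this work: the toy sizes, the synthetic linear labels, the learning rate, and the use of `early_stopping` to hold out 20% of the training data for validation are all assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
# Toy sizes; the text quotes 10 000 samples, 500 PCA inputs and 18 outputs
n_samples, n_pca, n_zernike = 400, 20, 18
X = rng.normal(size=(n_samples, n_pca))            # stand-in for PCA coefficients
mixing = rng.normal(size=(n_pca, n_zernike)) / np.sqrt(n_pca)
y = X @ mixing                                     # synthetic stand-in for Zernike labels

model = MLPRegressor(
    hidden_layer_sizes=(64,),   # single hidden layer (10 000 neurons in the text)
    activation="relu",
    solver="adam",
    learning_rate_init=0.01,    # assumed value
    early_stopping=True,        # hold out part of the training data for validation
    validation_fraction=0.2,
    max_iter=500,
    random_state=0,
)
model.fit(X[:320], y[:320])

pred = model.predict(X[320:])                      # unseen samples
mae = float(np.mean(np.abs(pred - y[320:])))       # mean absolute error
baseline = float(np.mean(np.abs(y[320:])))         # error of an all-zero prediction
print(pred.shape, mae < baseline)
```

Note that `MLPRegressor` fits bias terms internally, so this sketch does not reproduce a network with biases fixed to zero.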
The capture range of the NN is defined by the training process; for instance, if we train the NN with Zernike coefficients randomly distributed in the ±50 nm range, then the NN will be able to recognise aberrations in this range. After the training stage, the testing set is used to test our concept and explore the NN performance on unknown data, which are different from those used for training. The testing set is composed of $N_\mathrm{test}$ = 100 images. These are decomposed on the same PCA eigen-image basis as the training dataset to obtain the PCA testing set. The corresponding Zernike testing set contains the aberrations encoded into each image, that is, the values against which the NN output will be compared to verify its accuracy in working conditions.
We also tested how fast the NN reached a plateau for different numbers of PCA components (200, 300, and 500) and at what mean absolute error (MAE) over all the Zernike components. In Fig. 8, we notice that with 500 components the plateau is reached faster (~100 epochs) and, more importantly, with a lower MAE (~2.4 nm). In the same figure, with 300 components the plateau is reached at 300 epochs with ~3.2 nm MAE, and with 200 components at 150 epochs with ~3.9 nm MAE.

AO residual and NCPA
As stated previously, we analysed two scenarios, one where only NCPA aberrations were present and one with both NCPA and AO residuals. AO residuals are high-order aberrations left after the correction of the AO system, with a timescale τ that corresponds to the atmospheric turbulence timescale and is of the order of a few milliseconds (Stangalini et al. 2017). These residuals combine with other present aberrations (e.g. NCPA), further reducing the SR at the science focal plane. The final wave front can then be expressed as follows:
\[ W = W_0 + \sum_{i=1}^{N} c_i Z_i + W_\mathrm{AO} , \tag{2} \]
where $W_\mathrm{AO}$ denotes the AO residual and the other terms are as in Eq. (1). In order to create an NN model capable of identifying quasi-static aberrations within images affected by turbulence, i.e. AO residuals, we had to create a realistic model of the turbulence itself. Creating a realistic model of the turbulence requires comparing the model with a real-case measurement at the telescope that is to be simulated. This in turn implies comparing long-exposure PSFs of the simulated and real cases.
Long- and short-exposure PSFs in AO systems are different. Short-exposure PSFs ($t_\mathrm{exp} < \tau$) are dominated by a bright diffraction core and high-frequency spatial structures called 'speckles'. Long-exposure PSFs ($t_\mathrm{exp} \gg \tau$) have a central core with diffraction rings that are similar to those of the Airy diffraction pattern, while the halo is smooth and resembles that of a seeing-dominated PSF (Fig. 9). The transition point between the two regions is called the control radius and is identified by a bump in the PSF luminosity profile (see Fig. 10). The control radius is a function of the number of corrected modes in the AO system according to the following equation:
\[ R_c = \sqrt{\frac{N_\mathrm{modes}}{\pi}} \, \frac{\lambda}{D} , \tag{3} \]
where $R_c$ is the control radius, $N_\mathrm{modes}$ is the total number of actuators in the AO system, λ is the observing wavelength, and D is the diameter of the telescope pupil. For the LBT forerunner experiment, $N_\mathrm{modes}$ = 550, λ = 630 nm, and D = 8.2 m, resulting in $R_c$ = 210 mas (Pedichini et al. 2017; Tyson 2011).
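The quoted value of the control radius can be checked with a short calculation. The sketch below assumes that the control radius takes the form R_c = sqrt(N_modes/π) λ/D, which reproduces the 210 mas given above; the function name is illustrative.

```python
import math


def control_radius_mas(n_modes, wavelength_m, diameter_m):
    """Control radius in milliarcseconds, assuming R_c = sqrt(N/pi) * lambda / D."""
    radians = math.sqrt(n_modes / math.pi) * wavelength_m / diameter_m
    return math.degrees(radians) * 3600.0 * 1000.0   # rad -> deg -> arcsec -> mas


# Values quoted for the LBT forerunner experiment
rc = control_radius_mas(n_modes=550, wavelength_m=630e-9, diameter_m=8.2)
print(round(rc))
```

In this form the control radius scales linearly with wavelength, so it shrinks at shorter wavelengths for a fixed number of corrected modes.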
To obtain the long-exposure PSF profiles, we added random phases to our optical simulation of the wave front at the pupil plane. The phase screen representing the random phases was sampled every 18.6 mm across the LBT aperture of 8.2 m diameter; its power spectral density (PSD) was derived from a low-pass filtered Kolmogorov spectrum similar to the response of the LBT AO system, with a WFE value of approximately 42.5 ± 2 nm RMS. This OPD phase screen, which simulates typical AO residual turbulence during closed-dome calibrations, has been verified experimentally at the LBT in the same conditions (Quirós-Pacheco et al. 2010). Data were saved in a large FITS file of 4096 × 4096 pixels (see Fig. 11).
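A phase screen of this kind, and the extraction of random sub-apertures from it, can be sketched with an FFT-based generator. The example below is a simplified illustration rather than the screen used in this work: it uses an unfiltered Kolmogorov power law (the AO filtering is not modelled), and the demonstration sizes are scaled down from the 4096 and 440 pixels quoted in the text. Function names are hypothetical.

```python
import numpy as np


def kolmogorov_screen(n, rms_nm, rng):
    """Random OPD screen with a k^(-11/3) power spectrum, scaled to rms_nm."""
    fx = np.fft.fftfreq(n)
    k = np.hypot(*np.meshgrid(fx, fx))
    amplitude = np.zeros_like(k)
    amplitude[k > 0] = k[k > 0] ** (-11.0 / 6.0)   # square root of the PSD
    noise = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    screen = np.fft.ifft2(noise * amplitude).real
    screen -= screen.mean()
    return screen * (rms_nm / screen.std())         # enforce the requested RMS


def random_cutout(screen, size, rng):
    """Extract a randomly placed size x size portion of the screen."""
    y0 = rng.integers(0, screen.shape[0] - size + 1)
    x0 = rng.integers(0, screen.shape[1] - size + 1)
    return screen[y0:y0 + size, x0:x0 + size]


rng = np.random.default_rng(0)
screen = kolmogorov_screen(512, 42.5, rng)   # scaled-down demo grid
opd = random_cutout(screen, 55, rng)         # one instantaneous AO residual
print(screen.shape, opd.shape)
```

Each cut-out would play the role of one short-exposure AO residual; summing the PSFs of many cut-outs would emulate a long exposure.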
We then took random 440 × 440 pixel portions of this phase screen and injected them into the simulations, generating a series of images equivalent to an exposure time of 1 ms. We summed 1000 of these short-exposure images in order to create the PSF of an equivalent long exposure of 1 s and normalised it to its peak to compare its luminosity profile with the real one obtained from the SHARK-VIS forerunner experiment PSF at 1 s exposure (Pedichini et al. 2017); see Figs. 9 and 10. The simulated PSF has a control radius that matches the real one well, and the core and wing profiles also match. The phase screen of AO residuals is also useful to simulate short integration times. As the refresh timescale of speckles is of the order of milliseconds, as expected according to Stangalini et al. (2017), a high-speed acquisition frame rate (>1 kHz) makes it possible to freeze the atmospheric turbulence and acquire images similar to our AO residual simulation (see Fig. 9). This condition is still valid in every image plane, including the extra-focal one; furthermore, the defocusing has a smoothing and broadening effect on high spatial frequency structures, which is helpful for the PCA encoding. We can exploit the fast imaging approach and the corresponding atmospheric frozen-flow condition to infer the NCPA as an average of the NN prediction results. The testing set with AO residuals was obtained from a set of constant Zernike coefficients, representing a fixed NCPA, and random AO residuals added as different OPDs for each of the simulated images of the dataset. This is a good representation of a short burst of 1 ms exposure images acquired during a calibration run, when the NCPA is static and the AO residual varies on millisecond scales. The AO residual can be considered as noise for the NN and thus is not expected to influence the training process.
A&A 666, A70 (2022)

NCPA mitigation concept with AO residuals
To test the NN efficiency, we first generated different sets of images: each image of one set had the same fixed NCPA.For each set, we let the NN measure the NCPA (Zernike coefficients) of every image.These values were then averaged over the whole set to estimate the total NCPA for that set.We then measured the MAE between the injected and measured values and implemented an iterative process where successive NCPA estimates are removed and the corrected images are then fed to the NN for a new estimate.In a real-world operation, this is equivalent to measuring the NCPA, sending the corresponding correction to the AO system for removal and repeating the measure until the NCPA value is below a certain threshold.
This process was simulated with proper_lib and the 42.5 nm phase screen. The NCPA measured by the NN in one set is multiplied by a gain factor and then subtracted from the initial NCPA (which had been injected in that set); the new NCPA is then injected into the simulation tool to create a new set of images, as per the following equation:
\[ \mathrm{NCPA}(i+1) = \mathrm{NCPA}(i) - g \, \mathrm{NN}(i) , \tag{4} \]
where i is the iteration step, NCPA(i) is the NCPA at step i, NN(i) is the NN output at step i, and g is the gain factor.
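The iterative update described above, in which the measured NCPA is multiplied by a gain g and subtracted from the current NCPA, i.e. NCPA(i+1) = NCPA(i) − g NN(i), can be sketched as a short loop. In this illustration the NN is replaced by a hypothetical estimator that returns the true NCPA plus 1 nm of Gaussian noise; that noise level, the seed, and the function names are assumptions.

```python
import numpy as np


def mitigate(ncpa0, estimator, gain, n_steps=15):
    """Apply NCPA(i+1) = NCPA(i) - gain * estimate(i) and record every step."""
    ncpa = np.asarray(ncpa0, dtype=float)
    history = [ncpa.copy()]
    for _ in range(n_steps):
        ncpa = ncpa - gain * estimator(ncpa)
        history.append(ncpa.copy())
    return np.array(history)


rng = np.random.default_rng(0)


def noisy_estimator(ncpa):
    """Stand-in for the burst-averaged NN output: truth plus 1 nm noise (assumed)."""
    return ncpa + rng.normal(0.0, 1.0, size=ncpa.shape)


start = rng.uniform(-50.0, 50.0, size=18)            # initial NCPA, nm RMS per mode
history = mitigate(start, noisy_estimator, gain=0.8)
residual_rms = np.sqrt((history ** 2).sum(axis=1))   # total WFE at each step
print(history.shape, residual_rms[-1] < residual_rms[0])
```

With a perfect estimator the residual shrinks by a factor (1 − g) per step, so g = 1 would remove the NCPA in one step; with a noisy estimator the residual settles at a floor set by the estimation noise.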
We let the simulation run for 15 iterations to check whether oscillations were triggered for certain gain values. Results are presented in Sect. 3.4.

Results
The values from NN models with and without AO residual turbulence are summarised in Table 2. The recovered RMS WFEs are the final residual RMS after applying the NCPA calculated by the NN as a correction to the input wave front. The results are the average over the whole set.

Ideal condition: No turbulence
The median MAE achieved from our predictions in this case is 3.67 nm, and in Fig. 12 we compare the magnitudes of injected and retrieved values versus Zernike order for a specific case, with values reported in Table 3. We further reduced the WFE and flattened the NCPA components by applying recursive predictions and corrections, as described in Sect. 2.5.
In Fig. 13, we plot the RMS WFE values before and after the prediction and correction phase. As shown, the NN reduces the RMS WFE by a factor of almost 6, taking the average on the test set from $\mathrm{WFE}_\mathrm{initial}$ ∼ 124 nm to $\mathrm{WFE}_\mathrm{final}$ ∼ 21 nm. In Fig. 14, we show the effects on the PSF after the correction of NCPA by the NN prediction in one of the simulated cases. The NN prediction, however, can then be re-applied after the first step of correction, and so on, with the aim of improving the final results, as in Fig. 15, where we find that using a process gain of 0.8 makes it possible to flatten the wave front to a final MAE of ≈10 nm RMS in a few steps despite the initial NCPA components. This result is also mathematically supported by the quadratic sum of the MAE of the 18 Zernike components (each one with a MAE of 2.36 nm RMS), giving a final value for the WFE of ≈10 nm RMS.
Table 3. Injected Zernike coefficients and the corresponding values found by the NN in two images of the test sets for the two different cases of no turbulence and 42.5 nm AO residual turbulence. Columns: Zernike n°; values from Fig. 12 [nm]; values from Fig. 16.
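The quadratic sum quoted above can be written out explicitly. Taking the per-mode MAE of 2.36 nm as equal for all 18 sensed modes (Noll indices 4 to 21) and treating the per-mode errors as independent, which is an assumption of this estimate:

```latex
\[
\mathrm{WFE} \simeq \sqrt{\sum_{i=4}^{21} \mathrm{MAE}_i^{2}}
             = \sqrt{18}\times 2.36\ \mathrm{nm}
             \approx 4.24 \times 2.36\ \mathrm{nm}
             \approx 10.0\ \mathrm{nm}
\]
```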

Addition of AO residual turbulence: NN behaviour
In this section, we cover the behaviour of the NN when typical AO residual aberrations are present and when using short-exposure images as NN input (see Sect. 2.4). In this test set, every image has both different AO residuals and different NCPA aberrations. We increased the number of training samples from 10 000 to 15 000 in order to improve the NN prediction capability, reaching a median value of MAE = 5.17 nm. Figure 16 shows a specific case to compare the magnitudes of injected and retrieved values versus Zernike order. Exact numerical values are reported in Table 3. The WFE after prediction by the NN and correction is less than half of the initial value and close to the AO correction without NCPA (Fig. 17).
This test shows that the NN is effective in identifying aberrations with sufficient accuracy even if high-order aberrations from AO residuals are present. This includes aberrations that were not labelled during the training phase, i.e. the NN has not been 'told' what they are.

Estimation of static NCPA with AO residuals
In a sequence of short exposure images, we have different WFEs and, consequently, PSFs with different sharpnesses, depending on the combination between the NCPA and the instantaneous AO residual; thus, it is not possible to estimate quasi-static aberrations and minimise the average WFE with a NN prediction on a single image. This is why several images arranged in a time series are needed for a correct evaluation of the NCPA. To verify this, we fixed a static NCPA value for a dataset of 100 images and let the NN infer the NCPA value. The predictions of the injected NCPA are shown for each Zernike component in Fig. 18. The average of the NN predictions for each Zernike is assumed as the NCPA estimate for that component. For most of the NCPA Zernike components, the typical estimate error is less than 10 nm.
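The averaging step described above can be illustrated with a minimal sketch. The trained NN is not reproduced here: `simulate_predictions` is a hypothetical stand-in that returns the injected coefficients plus zero-mean Gaussian scatter, and the 20 nm per-frame scatter and the ±50 nm coefficient range are assumptions chosen only for illustration.

```python
import random
import statistics

N_MODES = 18      # number of sensed Zernike modes (from the text)
N_FRAMES = 100    # images in the time series (from the text)


def simulate_predictions(true_ncpa, frame_sigma_nm, rng):
    """Hypothetical stand-in for the NN: truth plus Gaussian scatter per frame."""
    return [[c + rng.gauss(0.0, frame_sigma_nm) for c in true_ncpa]
            for _ in range(N_FRAMES)]


def estimate_static_ncpa(predictions):
    """Average the per-frame predictions mode by mode."""
    n_modes = len(predictions[0])
    return [statistics.fmean(frame[m] for frame in predictions)
            for m in range(n_modes)]


rng = random.Random(0)
# Injected static NCPA in nm (hypothetical values, not taken from the paper).
true_ncpa = [rng.uniform(-50.0, 50.0) for _ in range(N_MODES)]
predictions = simulate_predictions(true_ncpa, frame_sigma_nm=20.0, rng=rng)
estimate = estimate_static_ncpa(predictions)
errors = [abs(e - t) for e, t in zip(estimate, true_ncpa)]
print(f"largest per-mode error after averaging: {max(errors):.1f} nm")
```

In this toy model the per-frame errors are independent, so the scatter of the average shrinks roughly as the square root of the number of frames; correlated frames would average down more slowly.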

NCPA mitigation process
As described in Sect. 2.5, we simulated NCPA correction in a real-world system by iteratively estimating it from a set of images, correcting the same set, and feeding the corrected images to the NN for a new estimate. In Fig. 19, we present the results of 15 steps, showing the values of the single Zernike coefficients after each successive step in the iteration.
We then ran three 15-step simulations, each with a different gain setting. The results are presented in Fig. 20, which shows the resulting WFE at every step of simulated correction; a gain of g = 1 is still stable and allows the NCPA to be removed after only three simulation steps. This means that a few hundred images are sufficient to correct the NCPA of the system. In a fast imaging instrument, such as SHARK-VIS, this corresponds to a few seconds of exposure, assuming the time for NN predictions is negligible with respect to the exposure time. The RMS WFE goes from the initial value of WFE ≈ 185 nm to WFE ≈ 43 nm, which is the typical value of the residual turbulence left by AO in closed-dome conditions.
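A minimal sketch of such a gain-controlled loop is given below. It is not the simulation used in this work: the averaged NN estimate is replaced by a hypothetical noisy estimator (`averaged_estimate`), the per-frame scatter of 20 nm and the gain values 0.5 and 0.8 are assumptions, and the RMS WFE is taken as the root sum of squares of the coefficients, which presumes orthonormal Zernike modes.

```python
import math
import random


def rms_wfe(coeffs):
    """RMS WFE as root sum of squares of the coefficients (orthonormal modes)."""
    return math.sqrt(sum(c * c for c in coeffs))


def averaged_estimate(residual, n_frames, frame_sigma_nm, rng):
    """Hypothetical stand-in for the mean of n_frames NN predictions.

    The mean of n_frames independent Gaussian errors is drawn directly as a
    single Gaussian with standard deviation frame_sigma_nm / sqrt(n_frames).
    """
    sigma = frame_sigma_nm / math.sqrt(n_frames)
    return [c + rng.gauss(0.0, sigma) for c in residual]


def mitigate(ncpa, gain, n_steps, n_frames=100, frame_sigma_nm=20.0, seed=0):
    """Return the NCPA RMS WFE before correction and after each step."""
    rng = random.Random(seed)
    residual = list(ncpa)
    history = [rms_wfe(residual)]
    for _ in range(n_steps):
        estimate = averaged_estimate(residual, n_frames, frame_sigma_nm, rng)
        # Apply the opposite of the estimate, scaled by the loop gain.
        residual = [r - gain * e for r, e in zip(residual, estimate)]
        history.append(rms_wfe(residual))
    return history


# 18 coefficients of equal amplitude giving about 185 nm RMS in total.
initial_ncpa = [43.6 * (-1) ** i for i in range(18)]
histories = {g: mitigate(initial_ncpa, g, n_steps=15) for g in (0.5, 0.8, 1.0)}
for g, h in histories.items():
    print(f"g = {g}: start {h[0]:.0f} nm, step 3 {h[3]:.1f} nm, step 15 {h[-1]:.1f} nm")
```

In this toy model a lower gain converges more slowly but filters the estimation noise more strongly, while g = 1 removes the measured aberration in a single step and leaves a residual set by the noise of the last estimate.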
We also verified the NN mitigation process on 20 sets of data with different NCPA values to see if the method is robust in different conditions. We used a gain value of g = 1, 42.5 nm of residual AO turbulence, and only three iteration steps.
Figure 21 shows the results: the NCPA are almost completely removed (see Sect. 3.2). Figure 22 shows the effect of the NN NCPA mitigation process on the sharpness of the PSF in one of the simulated cases. It is worth noting that, in our specific case, there is no temporal correlation between the images of the 100-image testing set because the AO residual is sampled randomly. This allows us to use a smaller number of samples in the testing set to give a solution with sufficient accuracy.
In a real-world setting, the states of atmospheric turbulence between consecutive frames during acquisitions will not be independent, and this decoupling effect will not be valid. Thus, to achieve sufficient accuracy at each step of the mitigation process, images should be acquired with short exposures (t_exp ranging from 1 to 2 ms) and sampled more slowly than the typical turbulence de-correlation time, which is between 10 ms and 100 ms, as expected by Stangalini et al. (2017).
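To give a feeling for the acquisition time implied by this sampling constraint, the short sketch below computes the span of one mitigation step. It assumes, purely for illustration, that frames are spaced by exactly one de-correlation time (the fastest sampling the constraint allows) and that each step uses 100 frames, as in the simulated test sets; the function name is hypothetical.

```python
def acquisition_span_s(n_frames, frame_spacing_ms):
    """Time span of n_frames images taken one every frame_spacing_ms."""
    return n_frames * frame_spacing_ms / 1000.0


N_FRAMES_PER_STEP = 100   # images averaged per step (from the text)
N_STEPS = 3               # correction steps (from the text)

for spacing_ms in (10.0, 100.0):   # quoted range of de-correlation times
    per_step = acquisition_span_s(N_FRAMES_PER_STEP, spacing_ms)
    print(f"spacing {spacing_ms:5.0f} ms: {per_step:4.1f} s per step, "
          f"{N_STEPS * per_step:4.1f} s for {N_STEPS} steps")
```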

Discussion
Comparing the mitigation process without turbulence (Fig. 15) to that with AO residuals (Fig. 20), it is possible to notice that, counter-intuitively, the measurement of the NCPA is more accurate when working with turbulence than without it, passing from a value of approximately 10 nm to almost 7 nm RMS (Fig. 19). We also have to take into account that the NN trained on turbulence has almost double the MAE for each Zernike, so the expected value for the mitigation process (given by the quadratic sum of the MAE for each Zernike) is 5.17 × √18 ≈ 22 nm. The reason lies in the averaging of the NN predictions, which provides the system with a way to overcome the NN limits, making it perform better in the presence of turbulence on instruments capable of fast imaging.
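The role of the averaging can be made explicit with a short, idealised calculation. Assuming that the errors of the N = 100 single-frame predictions are independent and zero-mean (an assumption that correlated turbulence or systematic NN errors would violate), the single-prediction quadrature value and its averaged counterpart are:

```latex
\sigma_{\mathrm{single}} \simeq \sqrt{18} \times 5.17\ \mathrm{nm} \approx 21.9\ \mathrm{nm},
\qquad
\sigma_{\mathrm{avg}} \simeq \frac{\sigma_{\mathrm{single}}}{\sqrt{N}}
  = \frac{21.9\ \mathrm{nm}}{\sqrt{100}} \approx 2.2\ \mathrm{nm}
```

The value of almost 7 nm RMS quoted above lies between these two figures, as would be expected if part of the NN error does not average out.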
Preliminary results (which will be the subject of future articles) show that the method is robust even in the presence of photon and detector noise, and also in more generic cases. Such cases would be, for example, spiders at the pupil plane, wide-band filter images instead of monochromatic light, coronagraphic images, different f#, and defocused offsets.
In Fig. 23, we present just a few of these aspects: the presence of spiders, the effect of photon and read-out noise (RON), and the use of a wide-band filter. We performed these tests both with and without turbulence (WFE = 42.5 nm RMS), basing them only on a single prediction by the NN in order to quickly estimate the potential of the method even in these scenarios. Furthermore, the NN used for these preliminary tests was trained on monochromatic images at λ = 656 nm without any spider or image noise, rather than on images matching the test conditions, as should ideally be the case.
We added a mask corresponding to the LBT spiders to the optical propagation in order to have a more realistic simulation and applied the mitigation process. We followed the same procedure considering a more realistic source dominated by Poisson noise (PN). We simulated a defocused guide star with a flux of 5 × 10^4 photons s^−1 spread over a few thousand pixels, which was affected by photon noise and acquired with a detector read-out noise of RON[RMS] = 1 e−/pixel. Wide-band filter images were generated with a spectral bandwidth of FWHM = 60 nm centred at λ = 656 nm by coadding 12 monochromatic PSFs within the spectral bandwidth, equally spaced by 5 nm. As can be seen from the plots in Fig. 23, the results are almost identical to the ones obtained by using monochromatic light in ideal conditions, both in the presence and absence of turbulence. In all these described scenarios, we fixed the same amount of initial NCPA values and then performed the mitigation process as explained in Sect. 2.5.
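The wide-band and noise ingredients described above can be sketched with a simple FFT propagation. This is not the simulation code used in this work: the array size, pupil sampling, 300 nm RMS defocus, equal spectral weights, and the interpretation of 5 × 10^4 as the number of photons per frame are all assumptions made for illustration, and no spider mask is included.

```python
import numpy as np

N = 128                 # array size in pixels (assumption)
R0 = 16.0               # pupil radius in pixels at 656 nm (assumption)
LAMBDA0_NM = 656.0      # central wavelength (from the text)
N_LAMBDA, STEP_NM = 12, 5.0

# 12 wavelengths, 5 nm apart, centred on 656 nm (628.5 ... 683.5 nm).
wavelengths_nm = LAMBDA0_NM + STEP_NM * (np.arange(N_LAMBDA) - (N_LAMBDA - 1) / 2)


def mono_psf(wavelength_nm, defocus_nm_rms):
    """Monochromatic PSF of an unobstructed circular pupil with defocus."""
    # Shrinking the pupil in pixels as 1/lambda widens the PSF as lambda.
    radius = R0 * LAMBDA0_NM / wavelength_nm
    y, x = np.indices((N, N)) - N // 2
    rho = np.hypot(x, y) / radius
    pupil = rho <= 1.0
    # Defocus OPD in nm (Noll-normalised Z4), converted to phase per wavelength.
    opd_nm = defocus_nm_rms * np.sqrt(3.0) * (2.0 * rho**2 - 1.0)
    field = pupil * np.exp(2j * np.pi * opd_nm / wavelength_nm)
    psf = np.abs(np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(field)))) ** 2
    return psf / psf.sum()


def add_noise(psf, n_photons, ron_e, rng):
    """Poisson photon noise plus Gaussian read-out noise (electrons RMS)."""
    return rng.poisson(psf * n_photons) + rng.normal(0.0, ron_e, psf.shape)


# Equal-weight coadd of the monochromatic PSFs (flat spectrum assumed).
wide_psf = np.mean([mono_psf(w, defocus_nm_rms=300.0) for w in wavelengths_nm], axis=0)
rng = np.random.default_rng(0)
noisy = add_noise(wide_psf, n_photons=5e4, ron_e=1.0, rng=rng)
print(wavelengths_nm[0], wavelengths_nm[-1], noisy.shape)
```

A real filter would weight the monochromatic PSFs by its transmission curve; the flat weighting here is only the simplest choice consistent with equally spaced samples.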
We note that, once verified on real hardware (telescope or optical bench), the imaging-detector-with-NN approach may become a cheap substitute for WFSs in all cases in which these are not retro-fittable in already developed and installed instrumentation, provided a focusing system is present. Despite its imperfections and limited accuracy with respect to a dedicated WFS or an optical interferometer, simulations show the NN is sufficiently precise to mitigate NCPA in AO systems where a final WFE less than 80−100 nm RMS is not expected. In the future, the team will investigate the opportunity to increase the number of components used to map the aberrations and the use of different NNs tuned to different amplitudes of aberrations in a cascaded system where the first NN is trained on large aberration ranges (e.g. 250 nm RMS) and the successive ones are instead tuned and optimised to smaller ranges. This would act as a multiple-step process in which the larger aberrations are measured and removed first, and smaller scale aberrations are then measured and removed by increasingly finely tuned systems.

Conclusions
In this work, thanks to optical propagation simulations based on a simplified layout where we used monochromatic light, ideal optical elements without obstructions (such as spiders or occulters), and no photon noise, we demonstrated that a simple multi-layer perceptron NN is able to establish a robust relation between defocused PSF images and the corresponding pupil phase aberrations. This relation also holds in the presence of limited AO residual turbulence, such as the one present during closed-dome calibrations. This is possible thanks to a pre-processing step in which the dimensionality of the data is reduced by projecting images onto an orthonormal basis of PCA eigen images and the pupil wave-front phase onto Zernike polynomials. In the training phase, we feed the NN with the PCA components of a set of images and the Zernike coefficients from which they were generated. In the predicting phase, after being fed with the images to be analysed through their PCA components, the NN outputs the Zernike coefficients that describe the NCPA aberrations.
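The pre-processing step summarised above, i.e. projecting images onto an orthonormal basis of PCA eigen images, can be sketched as follows. This is an illustrative sketch rather than the pipeline used in this work: the function names are hypothetical, the PCA is computed with a plain singular value decomposition, random arrays stand in for the defocused PSF images, and the NN that maps the resulting coefficients to Zernike coefficients is not reproduced.

```python
import numpy as np


def fit_pca_basis(images, n_components):
    """Return the mean image and the leading eigen images (flattened rows)."""
    flat = images.reshape(len(images), -1).astype(float)
    mean = flat.mean(axis=0)
    # Rows of vt are orthonormal principal directions, sorted by singular value.
    _, _, vt = np.linalg.svd(flat - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(images, mean, basis):
    """PCA coefficients: one row per image, one column per eigen image."""
    flat = images.reshape(len(images), -1).astype(float)
    return (flat - mean) @ basis.T


rng = np.random.default_rng(0)
# Placeholder data standing in for a training set of defocused PSF images.
train_images = rng.random((200, 32, 32))
mean, basis = fit_pca_basis(train_images, n_components=50)
coeffs = project(train_images, mean, basis)   # candidate NN input matrix
print(coeffs.shape)
```

Each 32 × 32 image is reduced here to 50 numbers; new images to be analysed would be projected with the same `mean` and `basis` before being passed to the NN.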
Typical WFE measurements retrieved during our simulations showed errors of a few nanometers per Zernike mode, with a MAE ranging from 2 to 3 nm RMS, which encouraged us to apply this method to a more realistic case. Here, the AO residual turbulence typical of LBT closed-dome calibrations was added to the static pupil aberration. By using a dedicated NN and starting from an initial RMS WFE of 180 nm, the final RMS WFE was drastically reduced to less than a few nm above the AO residual limit value of 42.5 nm RMS with only three steps of measurement and correction of the NCPA component.
Further studies in this field are planned to improve the MAE so that the method works even in the case of AO full turbulence residual, photon, and detector noise. This will be done by searching for an optimal defocus position, for the optimal ratio between the number of Zernike polynomials and the number of samples in the training stage, for different kinds of bases to be used instead of PCA and Zernike, and for different NN layouts. The method presented in this work has two important features. The first is the possibility to train the NN to recognise static aberrations hidden by the larger and dynamic aberrations of atmospheric turbulence. The second is that the method is computationally fast, both in the training and application stages, because we do not use the whole image but its compact representation through PCA coefficients. In a fast imaging approach, this can be used to measure NCPA as the average of the NN predictions, taking care that the average is taken over a period at least one order of magnitude longer than the typical turbulence timescale. We expect that this method can be used directly on sky, once the NN is trained on a realistic model of the telescope optics or by injecting aberrations with its AO system, because it does not need additional hardware to be added to the instrumentation or telescope. We are currently working to confirm the results of these simulations on a dedicated optical bench in the ADONI laboratory at INAF (Rome) and at the ESO (Garching) GHOST bench (Kasper et al. 2021).

Fig. 2. Detailed view of the degeneracy problem for even radial orders (n) at focal plane. Top panel: aberrations induced by even radial-order Zernike with n = 2 (Z4, Z5, Z6). Bottom panel: aberrations induced by odd radial-order Zernike with n = 3 (Z7, Z8, Z9). Top rows: wavefront OPD at pupil. Centre rows: focal-plane images of aberrated PSFs. Bottom rows: extra-focal-plane images, obtained with 10 mm of defocus on an f/25 beam. The removal of the degeneracy is evident in each extra-focal image (the WFE amplitude was always 50 nm RMS).

Fig. 4. Example of the first and last 8 eigen images output by the PCA of one extra-focal dataset. The first PCA eigen images encode wide spatial scales in the image, while the last ones contain information on high spatial frequencies.

Fig. 5. Contribution of PCA components to image information. Blue: maximum absolute value for each PCA coefficient across all images used to create the basis. Orange: cumulative value of all previous PCA coefficients. The dashed grey line is the 0.8 limit reference reached using more than 400 PCA components.

Fig. 6. The left matrix represents the PCA-training-set input of the NN and the right matrix is the Zernike-training-set output of the NN. Each row of the two matrices represents one of the 10 000 images of the training dataset.

Fig. 8. Mean absolute error (MAE) over the Zernike components vs. training epochs of the NN. A larger number of PCA components allows a faster and more accurate training of the NN. The asymptotic value for each PCA curve is reported above the final part of the plot.

Fig. 9. Comparison between simulated and on-sky images of forerunner PSFs at LBT. Top row: simulated PSF images. Bottom row: PSF images from the forerunner at LBT at λ = 630 nm, f# = 44. Left column: short 1 ms exposures. Right column: long 1 s exposures. In the real on-sky image, it is possible to see the slow wind effect left uncorrected by AO and causing an asymmetric scintillation halo structure, as explained in Madurowicz et al. (2019).

Fig. 11. OPD layer phase screen (4096 × 4096 pixel) used to emulate LBT AO residual turbulence with WFE ≈ 42.5 nm RMS. The annulus shows a 440-pixel-diameter disc corresponding to the real pupil of the telescope.

Fig. 12. No turbulence case. Comparison of predicted (red) and injected (green) Zernike values of a single image of the testing set (exact values in Table 3).

Fig. 13. NCPA correction in the case of no turbulence. Dashed lines represent the mean, over the entire testing set, of initial and corrected values. Each sample represents a 'single iteration' of the NN prediction and simulated NCPA correction.

Fig. 14. PSF of one case from Fig. 13 in log scaling. From left to right: NCPA aberrated; corrected for NCPA with only one NN prediction; diffraction-limited PSF (for comparison).

Fig. 15. Effects of iterative application of the NCPA prediction and correction by NN on ten different cases of initial NCPA using a gain of 0.8. The MAE is always reduced to only 10 nm RMS in a few steps.

Fig. 16. Turbulence case. Comparison of predicted (red) and injected (green) Zernike values of a single short exposure image of the testing set (exact values in Table 3).

Fig. 17. NCPA correction in case of ≈42.5 nm turbulence based on our NN model predictions. Dashed lines represent the mean, over the entire testing set, of initial and corrected values. The red dashed line is the AO system limit without NCPA. In this test set, every image has both different AO residuals and different NCPA aberrations.

Fig. 18. Comparison of the injected NCPA component (red dotted line) and the averaged prediction (red line) for each Z_i in the simulation when NCPA is fixed over the whole dataset of 100 images while the turbulence is evolving. Blue dots are the single NN predictions for NCPA.

Fig. 19. Iterative mitigation process. Bar plots of the NCPA Zernike coefficients after each iteration. For each Zernike, the bars, from yellow to blue, show the value of the component decreasing almost monotonically with each iteration step.

Fig. 20. Temporal evolution of NCPA mitigation process for one AO residual condition. NCPA aberrations' RMS WFE (solid lines) and total RMS WFE (dashed lines) versus iterations for different gain values. The NCPA is well mitigated after a few iterations, independently of the loop gain value.

Fig. 21. WFE expressed in RMS [nm] for different sets of NCPA. The applied NCPA correction is estimated by the average of NN predictions over 100 images. The red gradient bar represents the ±3σ error bar for the AO residual.

Fig. 22. Long-exposure (100 ms) PSF of one case of Fig. 21 in log scaling. From left to right: NCPA aberrated with a WFE ≈ 185 nm RMS; NN corrected for NCPA; AO limit PSF (for comparison).

Fig. 23. Comparative results of the iterative mitigation process based only on a single NN prediction in the presence of spiders, a noise-dominated faint source of 5 × 10^4 photons, and a wide-band filter. Solid lines represent the case without turbulence, while dashed lines represent the case with an additional WFE = 42.5 nm RMS of turbulence.

Table 1. Settings used for the optical simulations and for the development of the NN.

Table 2. Summary of the results obtained for the WFE [nm] RMS by the NN prediction averaged for every dataset.