| Issue | A&A Volume 664, August 2022 |
|---|---|
| Article Number | A71 |
| Number of page(s) | 15 |
| Section | Astronomical instrumentation |
| DOI | https://doi.org/10.1051/0004-6361/202243311 |
| Published online | 09 August 2022 |
Toward on-sky adaptive optics control using reinforcement learning
Model-based policy optimization for adaptive optics
1 Lappeenranta–Lahti University of Technology, Lappeenranta, Finland
e-mail: jalo.nousiainen@lut.fi
2 University of Helsinki, Department of Computer Science, Helsinki, Finland
3 European Southern Observatory, Garching bei München, Germany
4 University of Arizona, Steward Observatory, Tucson, Arizona, USA
5 Wyant College of Optical Science, University of Arizona, 1630 E University Blvd, Tucson, AZ 85719, USA
6 Astrobiology Center, National Institutes of Natural Sciences, 2-21-1 Osawa, Mitaka, Tokyo, Japan
7 National Astronomical Observatory of Japan, Subaru Telescope, National Institutes of Natural Sciences, Hilo, HI 96720, USA
8 Kirtland Air Force Base, Air Force Research Laboratory, Albuquerque, NM, USA
Received: 11 February 2022 / Accepted: 25 May 2022
Context. The direct imaging of potentially habitable exoplanets is one prime science case for the next generation of high-contrast imaging instruments on ground-based extremely large telescopes. To reach this demanding science goal, the instruments are equipped with eXtreme Adaptive Optics (XAO) systems that will control thousands of actuators at frame rates from one to several kilohertz. Most habitable exoplanets are located at small angular separations from their host stars, where the current XAO control laws leave strong residuals.
Aims. Current AO control strategies such as static matrix-based wavefront reconstruction and integrator control suffer from a temporal delay error and are sensitive to mis-registration, that is, to dynamic variations of the control system geometry. We aim to produce control methods that cope with these limitations, provide a significantly improved AO correction, and, therefore, reduce the residual flux in the coronagraphic point spread function (PSF).
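To make the temporal delay error concrete, here is a minimal sketch that is not taken from the paper: a single turbulent mode modeled as a Gaussian random walk (an assumption, not a realistic atmosphere), corrected by a plain integrator whose wavefront-sensor measurement arrives `delay` frames late. The helper names `random_walk` and `integrator_residual_rms`, the gain, and all other numerical values are hypothetical.

```python
import math
import random


def random_walk(n_steps, sigma, seed=0):
    """Toy single-mode turbulence: a Gaussian random walk (an assumption, not a real atmosphere)."""
    rng = random.Random(seed)
    phi, series = 0.0, []
    for _ in range(n_steps):
        phi += rng.gauss(0.0, sigma)
        series.append(phi)
    return series


def integrator_residual_rms(disturbance, gain, delay):
    """RMS residual of the integrator c[k+1] = c[k] + gain * r[k - delay].

    r[k] = phi[k] - c[k] is the residual left on the mirror; the sensor only
    delivers it `delay` frames late, so the command always lags the turbulence.
    """
    command = 0.0
    residuals = []
    for k, phi in enumerate(disturbance):
        residuals.append(phi - command)
        # Delayed wavefront-sensor measurement (nothing is measured before the first frames arrive).
        measurement = residuals[k - delay] if k >= delay else 0.0
        command += gain * measurement  # plain (non-leaky) integrator update
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


if __name__ == "__main__":
    turbulence = random_walk(20000, sigma=0.01)
    for delay in (1, 2, 3):
        print(delay, integrator_residual_rms(turbulence, gain=0.3, delay=delay))
```

For a fixed gain, increasing the delay from one to three frames should increase the residual RMS, because the command keeps chasing turbulence that has already moved on; raising the gain to compensate eventually drives the delayed loop unstable.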
Methods. We extend previous work in reinforcement learning for AO. The improved method, called Policy Optimization for Adaptive Optics (PO4AO), learns a dynamics model and optimizes a control neural network, called a policy. We introduce the method and study it through numerical simulations of XAO with a pyramid wavefront sensor (PWFS) for the 8-m and 40-m telescope aperture cases. We also implemented PO4AO and carried out laboratory experiments with the Magellan Adaptive Optics eXtreme system (MagAO-X) at the Steward laboratory.
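The model-based idea in the Methods paragraph (learn a dynamics model from data, then optimize a policy against that model) can be illustrated with a deliberately tiny stand-in. Everything here is an assumption rather than PO4AO itself: the plant is a scalar linear system instead of a PWFS and deformable mirror, the dynamics model is a least-squares fit instead of a neural network, the policy is a single gain `theta` in `u = -theta * x`, and the gradient is taken by finite differences only to keep the example dependency-free. All function names and constants are hypothetical.

```python
import random

A_TRUE, B_TRUE, NOISE = 0.95, 0.5, 0.05  # toy plant x' = a*x + b*u + w (hypothetical values)


def collect_data(n, seed=1):
    """Excite the toy plant with random actions and record (x, u, x_next) transitions."""
    rng = random.Random(seed)
    x, data = 0.0, []
    for _ in range(n):
        u = rng.uniform(-1.0, 1.0)
        x_next = A_TRUE * x + B_TRUE * u + rng.gauss(0.0, NOISE)
        data.append((x, u, x_next))
        x = x_next
    return data


def fit_linear_model(data):
    """Dynamics-model step: least-squares fit of x' ~ a*x + b*u via the 2x2 normal equations."""
    sxx = sum(x * x for x, _, _ in data)
    sxu = sum(x * u for x, u, _ in data)
    suu = sum(u * u for _, u, _ in data)
    sxy = sum(x * y for x, _, y in data)
    suy = sum(u * y for _, u, y in data)
    det = sxx * suu - sxu * sxu
    return (suu * sxy - sxu * suy) / det, (sxx * suy - sxu * sxy) / det


def model_cost(theta, model, starts, horizon=10, action_penalty=0.01):
    """Roll the policy u = -theta*x through the LEARNED model and accumulate squared residuals."""
    a_hat, b_hat = model
    cost = 0.0
    for x in starts:
        for _ in range(horizon):
            u = -theta * x
            x = a_hat * x + b_hat * u
            cost += x * x + action_penalty * u * u
    return cost / len(starts)


def optimize_policy(model, starts, steps=300, lr=0.05, eps=1e-4):
    """Policy step: gradient descent on theta using a central finite-difference gradient."""
    theta = 0.0
    for _ in range(steps):
        grad = (model_cost(theta + eps, model, starts)
                - model_cost(theta - eps, model, starts)) / (2 * eps)
        theta -= lr * grad
    return theta


def true_cost(theta, n=2000, seed=2):
    """Mean squared residual when the policy runs on the toy TRUE plant."""
    rng = random.Random(seed)
    x, total = 0.0, 0.0
    for _ in range(n):
        x = A_TRUE * x + B_TRUE * (-theta * x) + rng.gauss(0.0, NOISE)
        total += x * x
    return total / n


if __name__ == "__main__":
    model = fit_linear_model(collect_data(2000))
    theta = optimize_policy(model, starts=[-1.0, -0.5, 0.5, 1.0])
    print(model, theta, true_cost(theta), true_cost(0.0))
```

The gain is tuned only against the learned model; evaluating it on the toy true plant with `true_cost` checks whether the model was accurate enough for the optimized policy to transfer.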
Results. PO4AO delivers the desired performance, improving the coronagraphic contrast by factors of 3–5 within the control region of the deformable mirror and PWFS, both in numerical simulations and in the laboratory. The method is also quick to train, typically within 5–10 s, and its inference time is short enough (below 1 ms) for real-time XAO control with currently available hardware, even for extremely large telescopes.
Key words: instrumentation: high angular resolution / instrumentation: adaptive optics / atmospheric effects / methods: data analysis / techniques: high angular resolution / methods: numerical
© J. Nousiainen et al. 2022
Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
This article is published in open access under the Subscribe-to-Open model.