Slitless spectrophotometry with forward modelling: Principles and application to measuring atmospheric transmission

,


Introduction
Cosmology measures and interprets the evolution of the whole Universe.To probe its dynamics and understand the nature of dark energy, observers need to compute distances at different epochs from the light they receive in telescopes.The evolution of cosmological distances with time indicates how dark energy, dark matter, and matter interact and how they can be modelled.
Optical surveys use magnitude and colour comparisons to build a relative distance scale.For instance, type Ia supernovae (SNe Ia) revealed the presence of a dark energy component because they appeared fainter in the early Universe than was thought (Riess et al. 1998;Perlmutter et al. 1999;Betoule, M. et al. 2014;Scolnic et al. 2018).More precisely, because SNIa colours shift with the expansion of the Universe, high-redshift supernovae were fainter in red bands than what can be inferred from low-redshift supernovae observed in blue bands.This case underlines that colours need to be accurately calibrated in an optical survey to display the dynamics of the Universe (see e.g.Betoule et al. 2013).Every chromatic effect that alters the astral light distorts our dynamic perception of the expansion of the https://github.com/LSSTDESC/SpectractorUniverse, such as the galactic dust, the instrumental response, or the local atmospheric conditions.
In this paper, we present a forward-modelling method to analyse and extract data gathered with a dispersing element (grating or hologram) in the filter wheel of a telescope.We label our approach forward modelling because we implement a numerical simulation of the data-taking procedure that includes as much a priori knowledge as available, and then estimate model parameters from likelihood maximisation.This method is fundamentally different from the traditional flux-weighted sum orthogonal to the dispersion axis (Horne 1986;Robertson 1986) or the algebraic method that uses multiple images (Ryan et al. 2018) because it relies on physical modelling to directly describe the footprint of the spectrum on the imaging sensor.Deconvolution techniques and point spread function (PSF) modelling have been explored for optical fibre spectrographs (Bolton & Schlegel 2010;Li et al. 2019), but in our forward model, we proceed to build a physical model for the extraction of the spectra, and in particular, the atmospheric transmission from spectra.Our approach was inspired by the forward modelling developed in Outini & Copin (2020), and we applied it for punctual sources with the ultimate goal of measuring atmospheric transmission.
The scientific context of our work is the study of the atmospheric transmission variability via the repeated observation of stable (standard) stars.Burke et al. (2010) opened the path to controlling the optical survey photometry with dedicated measurements of atmospheric components by observing standard stars.Then, Burke et al. (2017) reached a 5 per mill (i.e. per thousand) relative photometric calibration between filters that covered the full optical and the near-infrared (near-IR) range and accounted for linear temporal variations of the atmosphere transmission over the nights.Because of the scale of the new SNe Ia surveys, the number of observed objects now reaches a point where even such an exquisite calibration becomes the dominant source of systematics.One of the challenges of modern SNe Ia cosmology thus is to be able to accurately measure and estimate the chromatic variations of the atmospheric transmission at the per-thousand level that allow us to probe for systematic variations, either nightly, seasonal, or directional.
While spectrophotometry at the required precision has been hinted at in The Nearby Supernova Factory et al. (2013); Rubin et al. (2022), it relies on the dedicated use of a specifically designed integral field spectrograph.Our current approach instead focuses on exploring the spectrophotometric possibilities offered by a much simpler design: We consider a slitless spectrograph, where a disperser (either a grating or a hologram) is inserted in the converging beam of a telescope in a regular filter wheel.This implementation is used in the Rubin Auxiliary Telescope (Aux-Tel) (Ingraham et al. 2020), as well as on the Star Direct Illumination Calibration Experiment (StarDICE) telescope.StarDICE is an experiment aiming at transferring to stars the unit of optical power (watt) defined at the National Institute of Standards and Technology (NIST) with a reference cryogenic radiometer, the Primary Optical Watt Radiometer (POWR) (Houston & Rice 2006;Hazenberg 2019;Betoule, Marc et al. 2023).
This paper describes the preparatory work for these projects and presents the analysis procedure we developed and tested on a few nights of data gathered at the Cerro Tololo Inter-American Observatory (CTIO).It describes some implementation choices and demonstrates that the forward modelling approach allows us to incrementally build a detailed understanding of the data that in the end can permit the direct extraction of the parameters used to describe the atmospheric transmission variability.
The first section of this paper describes the theory of slitless spectrophotometry, the basic implementation in Spectractor, and the data and simulation sets we used to assess the quality of the algorithm.Section 3 details the different ingredients of the Spectractor software, the assumptions and the implementation choices.In particular, we detail the regularised deconvolution technique at the heart of the process to obtain a prior for the forward model and to qualify the code on simulations.The application of Spectractor to extract spectra from on-sky CTIO data is described in section 4, while section 5 focuses on the measurement of the atmospheric transmission.Discussions and summaries conclude the paper in section 6.

Forward modelling of a slitless spectrogram
There are many different possible configurations for gathering spectroscopic data without the use of a slit.The slitless spectrograph configuration that we considered (a grating or hologram in a converging beam) can be implemented in different ways that could require special care in the forward-modelling analysis.For example, the field of view can be small and may contain only one star, or it can be crowded, with many source-dispersed images (so-called spectrograms hereafter) super-imposed on each other.Different detectors might span the field of view, with responses that need to be mapped and gaps that need to be accounted for.We did not try to abstractly solve all different situations.We therefore defer all the technical issues that we did not encounter to further work and concentrate on those we did and solved.
Within these restrictions, we consider in the following only the case of point sources and do not discuss, except for a passing mention in the next section, the case of extended sources such as galaxies or resolved planetary nebulae, nor the deblending of these extended sources with point sources.A convergent beam with incident angle θ 0 is focused on a CCD at position r 0 = (x 0 , y 0 ), but also passed through a grating with N eff grooves per millimetre at a distance D CCD above the CCD.The beam is deflected at an angle θ(λ) along the mean dispersion axis u, which forms an angle α with respect to the CCD x axis, but is focused somewhere above the sensor.Atmospheric refraction adds a supplementary dispersion along and transversally to the mean dispersion axis.

Description and geometry of a slitless spectrograph
The slitless spectrograph we consider can be seen as a grating with N grooves per millimetre, placed in a telescope beam at a distance D CCD from a charge-coupled device (CCD) sensor.In the following, the positions in the sensor plane are parametrised with the coordinates r = (x, y), and the z-axis points orthogonally toward the CCD.The disperser can be more or less complex and may be used in a convergent or in a parallel beam, and its resolution can vary, but in the end, it will spread the source light in different diffraction orders superimposed on the CCD, with a sky background that is also diffracted.Depending on the choice of N, on the size N CCD (in pixels) of the sensor, and on D CCD , different diffraction orders will finally be recorded by the sensor.The special case of the zeroth order is worth mentioning.While its presence on the image is not mandatory, knowing its centroid position r 0 can significantly facilitate the setting of the zero of the wavelength calibration.
The positive and negative diffraction orders are placed on each side of the zeroth order on a line forming an angle α with the x-axis.We parametrised the position along this dispersion axis with the coordinate u and transversally with the coordinate v.The zeroth order stands at coordinates (u 0 , 0) in the (u, v) coordinate system.If the instrument wavelength coverage spans more than one octave in wavelength, the different diffraction orders are superimposed on each other.
The source spectrogram is the total 2D CCD image formed by the cummulation of the dispersed light.All the notations are illustrated in Figure 1.
For a periodic grating placed inside a convergent beam instead of a parallel beam, this optical system response is astigmatic, that is, the image of a point source such as a star is not point-like on the sensor.Usually, the image is elliptical, and the redder the wavelength, the wider the ellipse (see e.g.Moniez et al. 2021).However, the centroid of these ellipses is still given by the classical grating formula (see e.g.Murty 1962;Hall 1966;Schroeder & Inc 2000), sin θ p (λ) − sin θ 0 = pN eff λ, (1) The angles are those of the projection in the plane perpendicular to the grating lines (see Figure 1); θ 0 is the angle of the projected telescope beam axis with respect to the normal to the grating surface, p is the diffraction order, θ p (λ) is the corresponding projected diffracted angle, and N eff is the effective spatial frequency of the grating lines at the position of the central ray 1 of the light beam (hereafter called chief ray).
The observer can choose the focus of the telescope.One common choice is to set the focus using the zeroth order, but if this is done, the spectrogram can be affected by defocusing effects at increasingly redder wavelengths (for periodic gratings).To minimise the defocusing effect and increase the spectrograph resolution, the focus for a particular wavelength can be optimised, but it is then more difficult to set the zero of the wavelength calibration with a defocused zeroth-order image.In the following, we assume that the focus has been made on the zeroth order unless otherwise specified.

Theoretical model of a spectrogram
A theoretically perfect image of a light source can be modelled by its spatio-spectral flux density C(r, λ).For a point source, we can separate the spectral and spatial distributions as with S * (λ) spectral energy density (SED) of the astrophysical object, and δ the Dirac distribution.The observed spectral and spatial distribution is then where T atm (λ|P a ) is the atmospheric transmission, depending on a set of atmospheric parameters P a , and T inst,p (λ) is the instrumental transmission (including the CCD quantum efficiency) for the diffraction order p.
1 For some dispersers this number of lines per millimetre can depend on the beam position on the grating.The goal of Spectractor is to extract the colour information from this image to obtain the stellar spectrum.
In the case of a monochromatic point source, we have C 0 (r, λ) = A δ(λ−λ 0 )×δ (r − r 0 ), with A the received source flux at λ 0 .The PSF describes the optical response of the telescope and of the atmospheric turbulence on a sensitive surface such as a CCD (see Figure 2 (a)).It depends a priori on the wavelength λ and can be modelled by a function φ 0 (r, λ) whose spatial integral is normalised to one.Therefore, by definition of the PSF, the image of a monochromatic point source centred on r 0 on the CCD can be described as a convolution product, With a slitless spectrograph, the mechanism is the same, but the incoming light is dispersed in several diffraction orders p, and light from all diffraction orders of the sky background is superimposed.The position of the point-source image depends on the wavelength and on the order p.Moreover, the shape of the PSF itself can depend on the order p and on the wavelength λ.
Here and everywhere else, the dispersed-imaging PSF integrates both the seeing and the instrumental PSF.We introduce the dispersion relation ∆ p (λ) as the 2D vectorial quantity that describes the position of the PSF centroids x c,p (λ) − x 0 , y c,p (λ) − y 0 on the CCD with respect to the zeroth-order position (x 0 , y 0 ), for a diffraction order p and a wavelength λ.This quantity can be computed by applying the classical grating formula 1.With a pointlike monochromatic source, the image recorded on the CCD is modelled as with ∆ 0 (λ 0 ) = r 0 and A p the flux density at wavelength λ 0 for the order p.On the image, we expect a series of spots, one per order p, of different intensities, with different sizes, but containing the same spectral information S * (λ 0 ) about the source.The observed sources are naturally polychromatic.For pointlike sources, the theoretical description above holds at each wavelength, and the image can be described as with S p (λ) = T inst,p (λ) T atm (λ|P a ) S * (λ).( 9) Therefore, the spectrogram of a polychromatic source can be viewed as a stack of monochromatic images with different centroids, or as a dispersed image with a very chromatic PSF (see Figure 2 (b)).This description of the image and of the spectrogram formation is the base of our forward model for slitless spectrograph data.
To obtain the S p (λ) spectra, a process to un-stack the monochromatic images spread across the image by the disperser is needed (Figure 2 (c))).This in turn requires that ingredients such as the PSF φ p (r, λ) and the dispersion relation ∆ p (λ) are sufficiently well known, either a priori, through a specific data analysis, or directly fitted on data.The hardest point usually is the determination of the PSF kernel as a function of wavelength.As illustrated in Figures 1 and 2, in the general case, the PSF is blurred and defocused.A simple Moffat profile can be used to model the PSF as long as the defocus is small; we used this approximation throughout this paper and leave the general defocus case for future work.Nevertheless, we studied and discuss two cases: The use of a standard grating, and the use of an holographic disperser that limits the defocus.
In this paper, we describe our implementation of this process in the form of the python3 Spectractor code Neveu et al. (2021) (see Section 3).We also show how we tested it on simulations and data and how it could be used in order to ingest lacking modelling information from specific data analysis.

Spectractor
We call Spectractor2,3 the computer suite we wrote to analyse the future AuxTel images and the images obtained on the Cerro Tololo Inter-American Observatory (CTIO) 0.9 m telescope.It was trained on CTIO data, but with the purpose of being easily configurable for slitless spectrophotometry with other telescopes.The main steps, inputs, and outputs of the extraction part are illustrated in Figure 3 and are described in detail in Section 3. To facilitate reading of the following, we summarise the main Spectractor steps below.
Zeroth-order centroid The main inputs of Spectractor are a pre-processed image (overscan subtracted, debiased, and spatial flat fielded) obtained with a slitless spectrograph, and a configuration file setting the main geometrical and spectrographic properties of the instrument (D CCD , N, telescope diameter, pixel size, the PSF model, etc.)At the time of writing this paper, the PSF models we implemented were either a Gaussian profile, a Moffat profile, or a Moffat minus a Gaussian profile.Additional and more detailed PSF models are planned to describe the AuxTel data more accurately as they are analysed.The centroid of the zeroth order is contained in the image.We thus implemented a searching procedure to inform us on the origin of the location of the spectrum and set the zero of the wavelength scale.
Rotation Because of the geometry of the spectrograph and the dispersion properties of the grating, the spectrogram was cropped from the image and was de-rotated so that the dispersion axis was along the horizontal axis of the cropped image.The rotation angle is fitted later in the full model without explicitly rotating the image.
Geometric wavelength calibration On this image, a first fit of a 1D sliced PSF model transverse to the dispersion axis is performed.The PSF shape parameters are represented by a polynomial function4 as a function of the distance to the zeroth order.

PSF deconvolution
The procedure is continued by a deconvolution that uses a 2D PSF model and the 1D result as a prior to regularise it.
Wavelength calibration A first wavelength calibration is performed using the detection of the principal absorption or emission lines in the extracted spectrum (astrophysical or telluric lines).

Flux calibration
The spectrum flux in ADU is converted into erg/s/cm 2 /nm units using the telescope collecting area, CCD gain, and exposure time.
Full forward model Finally, given all these prior ingredients, a full forward model is initiated on the pre-processed5 but not rotated exposure using a 2D PSF model, and a model for the atmospheric differential refraction (ADR)6 .The PSF shape parameters as well as the wavelength calibration are refitted in the process.The main output is a calibrated spectrum in wavelength and amplitude, but Spectractor also returns a host of useful fitted parameters such as the PSF shape chromaticity or the D CCD distance to perform extraction quality analyses, and in the end, to improve the forward modelling or its initialisation.
All steps but the last are implemented in order to provide the required ingredients to the full forward models, and they are thus completely contingent to the availability of understanding the instrument.This is again be addressed below when we discuss how the optical transmission of the instrument can be obtained.

Data examples
While we use the natural ability of the forward-model implementation to easily provide simulations to validate the code, we also present the use of Spectractor on real data.

CTIO data
In order to test the approach of slitless spectroscopy, in particular, using a holographic disperser, we benefited of a run of 17 nights in May-June 2017 at the Cerro Tololo Inter-American Observatory (CTIO) 0.9 m Cassegrain telescope ( f /13.7, scale at focal plane 60 µm/ ).This telescope is equipped with a cooled Tek2K CCD device of 2048 × 2046 pixels, read by four ampli- fiers 7 .Two filter wheels are installed.The first wheel in the light path was used to insert broad-band filters, and the second wheel holds different dispersers.While many gratings were tried, we focused on two of them here: a Thorlabs blazed grating (300 lines/mm) ref.GT50-03, and an amplitude holographic optical element (around 350 lines/mm) especially designed for this telescope.This hologram is fully described and analysed based on the CTIO data in Moniez et al. (2021).Its main advantage is that the defocusing described in Figure 1 is very limited, which allowed us to model its chromatic PSF with simpler mathematical models.
By using these dispersers in the upstream filter wheel, we readily transformed the CTIO 0.9 telescope into a spectrophotometric instrument with a resolution of about 150 -200 (Moniez et al. 2021).
Figure 4 shows an example of the data we obtained: The dispersion axes are nearly horizontal along the x-axis of the CCD, and for an optimal focusing of the amplitude hologram, the target star was placed around pixel coordinates (750, 700).The spectrum covers two amplifiers.Field stars are present, and the sky background is also dispersed (brighter in the middle).Dome flats were taken with two different filters, blue (λ < 550 nm) and red (λ > 715 nm).We saw no difference in the extraction of the spectra using one or the other.In order to be more precise about the extraction of absorption bands in water in the red part of the spectrum, we chose to use only the red flat to flatten out the pixel-to-pixel fluctuations in our exposures.Combined bias was taken at the beginning of each night for the bias subtraction.We made sure that the meta data contained informations about the properties of the CCD and the on-site meteorological station.
We mainly analysed the performances of the holographic element we brought during this run.Fortunately, we had one very good night on 2017 May 30 with very stable conditions in terms of temperature and seeing, which we used to estimate the atmospheric transmission.During that night, we essentially mon-7 https://noirlab.edu/science/programs/ctio/instruments/Tek2K itored the CALSPEC8 (Bohlin et al. 2014;Bohlin et al. 2020) star HD111980 with an amplitude hologram.The main characteristics of these data are summarised in Table 1.

Simulations
To test the Spectractor pipeline, we used the full forward model for CTIO spectrograms (see section 3.8) to simulate observations of CALSPEC stars (in particular, HD111980).
A&A proofs: manuscript no.main The simulation used in Section 3 shares the same known characteristics as the real-data image presented in Table 1, but with variations in the unknown parameters such as the PSF model, the amount of second-order diffraction contamination, and the atmospheric parameters.
For the simulations, a 2D Moffat circular PSF kernel φ(x, y, λ) was chosen.To model the widening of the PSF due to defocusing or chromatic seeing effects, the shape parameters γ and α evolved with wavelength as a n PSF order polynomial function, The integral of this PSF kernel is exactly A, and its centre is at r c = (x c , y c ).The PSF shape parameters (γ(x), α(x)) are themselves sets of polynomial coefficients γ i and α i , respectively.The L i (x) functions are the order i Legendre polynomials.We call x min and x max the left and right pixel positions of the spectrogram edges on the x-axis.The parameter z ∈ [−1, 1] was rescaled proportionally on the desired pixel range [x min , x max ], set to approximately encompass the wavelength range [350 nm, 1100 nm] with the formula Parametrisation with Legendre polynomials has the advantage to give an equal weight to all polynomial coefficients during χ 2 minimisation, regardless of the degree of the polynomial functions.The chosen n PSF , γ i , and α i values are quoted for each simulation.The simulation suite is fully available in the Spectractor code.

Forward-model extraction of a spectrum
In this section, we follow and describe in detail the steps of the Spectractor pipeline in order to obtain the first-order spectrum S 1 (λ) of a point source, calibrated in flux and wavelength.These steps cover the orange boxes of figure 3. The process starts from a preprocessed image containing the 2D image that is formed by a star observed through a slitless spectrograph, which we call a spectrogram.

Uncertainty evaluation
When it is not given, we must start to build the uncertainty map of the exposure when the exposure is pre-processed.The uncertainties on the pixel values are estimated using the CCD gain G CCD (x, y) (in electrons/ADU) and its read-out noise σ ro (in electrons).Other sources of noise, such as those coming from the flat-fielding or the dark current, are subdominant.The correlations between pixels are also negligible.The exposure unit is considered to be in ADU at that point.The uncertainty σ(x, y) on the pixel value I(x, y) is then because we assume that the number of photoelectrons in each pixel follows a Poisson distribution.The uncertainty map can be inverted to obtain a weight map, on which we can superimpose a mask to remove bad pixels.Assuming no correlations between pixels, we then assemble the weight matrix W as the diagonal matrix of the inverted pixel variances.
The computed noise variance uses the pixel value itself, which incorporates the noise fluctuation.Using weights incorporating the fluctuations can introduce a bias on the recovered quantities because in the case of positive fluctuation, the weight is increased, while in the case of negative fluctuation, the weight is decreased.The bias is smaller with spectrograms with a high signal-to-noise ratio (S/N).For this reason, we only considered the case with a high S/N in simulations and data, and we avoid the bias due to this uncertainty evaluation (even at the spectrum edges, where flux is low).The case with a low S/N is postponed for future developments.

Zeroth-order centroid
Because the zeroth order is included in the observed images, we used its centroid to set the zero of the wavelength scale.Therefore, an error in the determination of the zeroth-order centroid (x 0 , y 0 ) causes a systematic shift of the wavelength calibration.
A subset of different situations encountered in the CTIO data is presented in Figure 5.To obtain a high S/N in the spectrogram, the exposure time was set at such a value that the zeroth order is saturated, causing bleeding spikes.If the exposure has astrometric coordinates in the World Coordinate System (WCS), then we can obtain the precise position of the star on the CCD.However, in most images we obtain from CTIO, the WCS associated with the images is incorrect because the mount calibration is not correct.
While not directly part of the extraction procedure, we present the algorithm we implemented to find the zeroth-order centroid in these difficult cases as a useful example of how to improve the starting point from a preliminary analysis of the data.
First, the image is cropped around the supposed position of the zeroth-order image, as close as needed so that the targeted star is the brightest object.Then the cropped image is projected onto the x-and y-axis: The maximum of the two projections sets a new approximation of the zeroth-order position.From there, saturated pixels are detected, and a null weight is associated with them.A 2D second-order polynomial background with a 3σ outlier removal is fitted and subtracted from the cropped image.Finally, a 2D circular Moffat profile is fitted on the weighted pixels: Only the crown of non-saturated pixels counts and locks the fit.This last step is then repeated a second time on a new cropped image for which the width and height are divided by two, centred on the last fitted centroid.This step is illustrated in Figure 6.We tested this process on many images, most of which were pathological, and we visually confirmed that the accuracy of this algorithm is finer than the pixel size on CTIO images.Another issue we solved is that when a disperser is added to the telescope beam path, the WCS associated with the image can be shifted or distorted.In the case of a crowded field, for faint objects, or for a very pathological zeroth order, we developed a method for estimating the WCS using the field stars, the library astrometry.net 9 (Lang et al. 2010), and the Gaia DR2 catalogue.The process is described in the appendix A, but it was not used by default for the CALSPEC stars we observed.However, when the centroids obtained with both methods are compared, their difference has a scatter of 0 .15 (around one-half of a pixel) on CTIO data, which confirms that both methods are accurate at the 0 .15 level at least (Figure 7).
This accuracy is converted into a prior on the zeroth-order shift δu 0 used during the wavelength-calibration process to account for a mistake in the evaluation of the zeroth-order centroid.

Rotation
Unless special care has been taken to that end when mounting the disperser in the filter wheel, the spectrogram image can be tilted (possibly intentionally) with respect to the x-axis, with an angle, which can be poorly known depending on the mounting of the disperser into the telescope beam.
This dispersion direction can either be known a priori or fitted in the full forward-model step.We needed a supplementary step to estimate a good starting point for this angle because our case is the latter case.
In addition, we found it extremely useful that the spectrogram was roughly aligned with the x-axis of the exposure in our 9 http://astrometry.net/doc/readme.htmlprocedure, with the wavelength increasing with x.To this end, the exposure must be flipped and rotated accordingly before the process of extracting more information could be continue.That was useful both for diagnosing the data quality and for refining the starting point of the full forward model.
The spectrogram of sources that are sufficiently continuous in wavelength, such as the thermal emission component of stars, displays filament shapes on the 2D image that can be detected using a Hessian analysis inspired by the one developed in Ade et al. (2016).The advantage of this technique is that it comes with an analytical expression of the angle of the detected shape with respect to the horizontal or vertical axis of the CCD grid.
The Hessian matrix H(x, y) of the image is computed for each pixel value I(x, y) as The two eigenvalues of the Hessian matrix H are calculated as The eigenvalue λ − is associated with the eigenvector directed along the spectrum dispersion axis, and λ + corresponds to the eigenvector with the largest change in intensity value, that is, transverse to the dispersion axis.The orientation angle of these eigenvectors with respect to the x-axis can be computed analytically.For instance, we have for λ with the trigonometric formula tan 2α = 2 tan α/(1 − tan 2 α).After selecting the 5% pixels with the highest λ − value above a reasonable threshold, the median α of the remaining α(x, y) values gives the orientation of the spectrum with respect to the x-axis.
A linear fit can also be performed across the selected pixels, and the slope gives an angle very close to the one estimated with the median of the angle values.This process is illustrated Figure 8.Because of the atmospheric differential refraction (ADR), the spectrogram can be sheared transversally to the dispersion axis.The ADR shear is about 2 across the visible spectrum, which is 5 pixels at CTIO, while the spectrogram is ≈ 700 pixels long, and it is thus neglected at this step of this analysis.On the other hand, it is fully accounted for in the full forward model.

Background estimation
The background of the image must be carefully subtracted to avoid bias in the estimation of the spectrum flux and its PSF.However, due to optical vignetting, it can be non-flat, and it is dispersed.It can also contain additional field stars and their corresponding spectrograms.
To estimate the spectrogram background, we first selected two lateral bands above and below the spectrogram region that had the same length and width N (bgd) y (see Figure 9).First, we masked the sources detected above a 3σ threshold.Then we divided the two lateral regions in a few square boxes of size  small enough to account for the spatial variations in the background.
To obtain a first estimate of the background, we used a Python wrapper of the SExtractor10 algorithm for background extraction, which is roughly based on the bilinear interpolation of the median value inside the boxes, after a sigma-clipping rejection of the outlier pixels (more details in Bertin & Arnouts 1996).This process is illustrated in Figure 9.
In a second step, we analysed the distribution of the background residuals normalised with their uncertainties in the two regions, with the sources being masked.When the histogram of the residuals normalised by their errors (the pull distribution) departs from a distribution of zero mean and standard deviation equal to 1, we refined the background estimate by dividing the box size by 2, and the process was continued iteratively until the mean was below 1 and the standard deviation was below 2, or until the box size decreases below a threshold of 5 pixels.
At the end of this process, the estimated background was interpolated between the two lateral bands to obtain the background below the spectrogram, and it was finally subtracted.We call B(r) the background map.The background RMS was also evaluated by SExtractor and was added quadratically as a background uncertainty to the error budget of the spectrogram.
At this stage of the pipeline, with α and r 0 given and using a first geometrical wavelength calibration to approximately define the left and right margins of the spectrogram, we cropped the exposure to extract a background-subtracted spectrogram.This is extremely useful for diagnosis purposes and can also be directly studied with an atmospheric forward model (see Section 5).

First spectrum extraction
The next step of the pipeline we implemented is devoted to extracting the first-order spectrum S 1 (λ) and estimating the wavelength-dependent PSF.The extraction of the spectral information S 1 (λ) from the spectrogram image is a delicate process.The intuitive and traditional way to extract slitless spectra at this stage of the pipeline is to sum over the cross-dispersion direction, possibly with weights, to form what we call a crossdispersion spectrum.Horne (1986) and Robertson (1986) presented an optimal method to achieve this.However, this method leads to distorted spectra in case of a wavelength-dependent PSF (neighbouring wavelengths contaminate each other on the sensor), which is not an issue for spectroscopy, but is problematic for spectrophotometry.In the following, we first describe a method for deconvolving the spectrum for the PSF (Section 3.5).The products of that process are then very useful to inform the full forward model, finalising the unbiased extraction of the spectrum (Section 3.8).

First-order model of the spectrogram
To start, we only consider the spectrogram of the first diffraction order, potentially with the superposition of a second diffraction order, as provided by the previous step of the pipeline.The cropped spectrogram has the shape (N x , N y ) pixels.
Inspired by equation ( 7), we model the spectrogram as a discrete stack of N x 2D PSF realisations of amplitude A i , separated by one pixel along the x-axis, with r = (x, y) the vector of the pixel coordinates, A the amplitude parameter vector along the dispersion axis of the spectrogram, and φ(r|r c,i , P i ) the 2D PSF kernel whose integral is normalised to one.This kernel depends non-linearly on the shape parameter vector P i and on the centroid position vector r c,i = (x c,i , y c,i ), where only the y c,i coordinate is considered unknown.The x c,i coordinate is set directly to the pixel index i.This choice of implementation can be discussed and changed, but it was found to be practical because the PSF is then well sampled by the pixel grid.In theory, we could choose another sampling for x c,i , however, to increase the speed of the spectrum extraction or to enhance the spectral resolution.
In some way, the array of vectors r c is a sampled precursor of the dispersion relation ∆ 1 (λ), and the vector A is the flux density S 1 (λ) integrated within the pixels.When we index all the N x N y spectrogram pixels as a long vector the equation 18 takes a matricial form, with The (N x N y , N x ) matrix M is called the design matrix.This model of the spectrogram is designed to deconvolve the spectrum S 1 (λ) from the PSF.In principle, we can choose to sample it with PSF kernels separated by arbitrary distances.However, if the PSF is correctly sampled by the pixel grid, it is difficult to extract the spectrum with a spatial resolution below the typical PSF width, and therefore, below the pixel size.
At this stage, we wish to extract the spectrum from a spectrogram that is potentially contaminated by higher diffraction orders, as yielded by the pipeline steps discussed above, or that is distorted by atmospheric differential refraction.The final extraction using a full forward model that takes these physical effects entirely into account is the last step of the Spectractor pipeline.This is presented in section 3.8.

Preparation for deconvolution
To initialise the deconvolution with parameters close to the final best model, a first PSF fit is performed transversally to the dispersion axis with 1D PSF kernels on the rotated spectrogram image, This procedure is done in two steps.The first step fits the 1D parameters independently for each pixel column by applying a 5σ clipping to reject field stars and other CCD defects.In the second step, a polynomial evolution of the 1D PSF parameter vector along the dispersion axis is proposed.For instance, a polynomial evolution of the y c,i (x c,i ) positions can model the transverse ADR, and a polynomial evolution with the width of the PSF can model a defocusing effect.The polynomial coefficients are fitted using a Gauss-Newton minimisation of a χ 2 (equation 23) for the non-linear parameters (see appendix B), alternating with a linear resolution for the amplitude parameters as follows.
Assuming that we gather all the spectrogram pixel values in a long vector D (as Z), we can model it as with the random noise vector.The χ 2 function to minimise is with W the weight matrix of dimension (N x N y , N x N y ), which is the inverse of the data covariance matrix (see section 3.1).In most cases, this matrix is diagonal because the pixels are all considered to be independent.The minimum of equation 23 is reached for the set of amplitude parameters Â given by The covariance matrix associated with the Â coefficients is At the end of this process, we obtain a first guess of the r c and P parameters and a first guess of the amplitudes Â(1D) with their uncertainties σ A (1D) , which form what we call a transverse cross-spectrum.
The result of this extraction is illustrated on simulated data in Figure 10 left.For this simulation, we chose to ignore the second-order diffraction, and we used a wide PSF kernel with strong chromatic variations to enhance the visibility of the residuals for the 1D extraction: (γ 0 , γ 1 , γ 2 ) = (10, 2, 5) and (α 0 , α 1 , α 2 ) = (2, 0, 0).At first glance, the model in the top panel appears to be accurate, but the residuals show that this 1D model failed to capture the true evolution of the PSF shape with wavelength.As expected, the residuals are less dramatic with a thinner PSF, but remain measurable.This shows that a 2D PSF extraction method has to be used.
At this stage, we obtained a spectrum close to a crossdispersion spectrum as in Horne (1986) and Robertson (1986), but informed with a fitted PSF model with a smooth polynomial wavelength evolution.

PSF deconvolution
Having provided ourselves with this first 1D estimate of the spectrum, we can resort to a more accurate 2D PSF modelling.When using 2D PSF kernels, the latter linear regression method enters the category of the deconvolution problems or inverse problems.Because the spectral amplitude information is mixed and diluted at a scale below the typical size of the PSF, the computation of the Â vector sampled at the pixel scale inverting equation 19 using 2D PSF kernels may lead to results that are far from the reality, while giving an apparently good fit to data (low χ 2 ).When this is done, a common symptom is the alternation of positive and negative values in Â, or at least with large variations, demonstrating that the problem is ill-posed.As we know by the physics that well-sampled spectra are rather continuous and differentiable functions, we enforce a regularisation method to smooth the resulting Â vector.
A first fit to the data, without any prior on A, using a 1D transverse PSF model fitted independently to each column of data along the dispersion axis thus yields a vector Â(1D) that contains most of the spectral flux, especially in the smooth parts of the spectrum, but lacks precision in the rapidly varying parts.The most visible effect of the PSF is to smooth the absorption lines, and more generally, to deform the spectral information where the spectral energy density evolves rapidly (e.g. in the blue part of the spectrum), while conserving the total flux.It nonetheless provides useful information that we use as a prior A 0 on A when performing a fit using a 2D PSF kernel with a Tikhonov regularisation.
The Tikhonov regularisation method proposes to add a regularisation quantity to the χ 2 and minimise a new cost function, where Q is the weight matrix.The last term favours Â to be close to prior vector A 0 , with a positive regularisation hyper-parameter r.
Minimising this E( A|r c , P) function is still a linear regression for the A parameters, whose optimal value now is The covariance matrix associated with Â directly is We tested different Q matrices on CTIO simulations, and the matrix that gives the most satisfying results when Â is compared to the true amplitudes uses the Laplacian operator L, with U T U the matrix proposed in equation C.5, The total variation regularisation is known to be able to retrieve very sharp features (e.g.steps or edges) when deconvolving an image.The Laplacian regularisation cannot do the same, but discontinuities are not expected in the physical spectra we observe.Moreover, the norm-2 regularisation offers analytical solution, while a norm-1 regularisation needs to conduct an iterative minimisation process (more details in appendix C).
For the 1D transverse estimate, the deconvolution fit is performed using a Gauss-Newton minimisation of E( A|P) for the non-linear P parameters (see appendix B) and a linear resolution for the A parameters (see section 3.5.2).The Gauss-Newton minimisation is repeated three times with a clipping rejection of bad pixels to remove field stars or CCD defects that can pull the final parameters in undesired directions.At the end of this process, we obtain the measured rc and P parameters, and a measurement of the amplitudes Â(r) with their covariance matrix C(r).The vector Â(r) forms the searched spectrum, but it might be contaminated by the second diffraction order at this stage.To accelerate the computation, regions farther away than twice the PSF FWHM from the centroids are masked and set to zero with null weights (the grey regions in Figure 10).For this process, a default value of the r regularisation parameter was chosen, without fully exploring how it could be optimised.
The PSF deconvolution problem was solved in another way in Ryan et al. (2018) for slitless spectroscopy.Instead of using a PSF model, the mathematical model was inverted using multiple exposures of the same spectrum, ideally taken at different orientations, dispersion directions, or dithered positions.In this way, the number of data leads to an invertible problem, but a damping hyper-parameter equivalent to r still needs to be introduced in equation 25.Ryan et al. (2018) tuned this hyper-parameter manually and determined the breaking point of a specific L-curve (see Hansen 1992).This method is mathematically similar to the one we used (described in Section 3.5.4)because it leads to an equilibrium between information from the prior and data.

Optimisation of the regularisation
The correct amount of information for which the data can be searched might be thought hard to determine.It may seem hopeless to find information at a spatial scale below the typical width of the convolution kernel with a unique spectrogram.Therefore, we searched for the way to determine the optimal hyperparameter r.
The regularisation parameter r is optimised via the study of the resolution operator with I the identity matrix.A fruitful interpretation of the R operator is given in Hansen (2010) with Tr I = Tr R + Tr (rCQ) ⇔ # parameters = # parameters resolved by data + # parameters resolved by prior , where the trace of the resolution operator gives the effective number of degrees of freedom that can be extracted from the data for a given amount of prior information.We set The optimal r parameter is found by minimising the following G(r) function, which behaves like a reduced χ 2 , This method, known as general cross-validation (GCV), is extensively presented for example in Golub et al. (1979); Wahba (1990); Hansen (2010).It is demonstrated that the minimum of G(r) corresponds to the minimum of the distance |M Â − M A truth | 2 , where A truth is the true amplitude vector hidden in the data.
We tested different ways to implement the optimisation of the regularisation process in Spectractor .The most efficient result (in terms of rapidity and bias of the final result) is obtained by setting a default reasonable regularisation parameter for the fitting procedure of the amplitude Â and PSF rc , P parameters, and finally to find the minimum of the G(r) function.We observed in the simulations that regardless of the r hyper-parameter that is chosen at the beginning, the process reconstructs an unbiased spectrum at the end of the E( A|r c , P) minimisation.The level of regularisation of the solution can thus be set a posteriori by finding the optimal r after minimising the G(r) function.
The result of a 2D deconvolution process is presented in the right panel of Figure 10 for a simulation with a wide PSF kernel, but without second diffraction order to stress the benefit of the deconvolution.The residual map between the best-fitting model and the simulated spectrogram shows that the 1D transverse fit cannot extract the spectrum from the spectrogram image correctly (Figure 10 left), while the 2D deconvolution ends with a quasi-unstructured residual map that respects the expected Gaussian distribution (Figure 10 right).Because our model is informed by a PSF model, we are able to extract spectra with a single exposure, which involves rather small matrices compared with Ryan et al. (2018).This allows us to tune the r hyperparameter automatically in a few seconds with a standard laptop.
If the spectrogram is not fully contained in the sensor area, the spectrum exhibits a discontinuity that causes the norm-2 regularisation to fail (the second derivative from the Laplacian operator is undefined).For a given instrumental PSF, it is therefore better to use a more dispersive grating to feed the deconvolution with more data and increase the wavelength resolution, but the spectrogram must fit inside the sensor area for regularisation techniques to be use.

The spectrophotometric uncertainty principle
The regularity of the deconvolved solution depends on the hyperparameter r.The optimal r parameter is chosen as the minimum of the G(r) function represented in the top panel of Figure 11 for a simple simulation with n PSF = 0, γ 0 = 5, α 0 = 3 (and the same characteristics as in Table 1), The second panel displays the χ 2 ( Â(r)|r c , P) function, which shows that the optimum A(r) solution does not minimise the agreement with the data (minimum χ 2 ) , but makes a compromise with a regularisation scheme (modelled by the χ 2 pen ( A| A 0 ) penalty term).The lower panel illustrates that the effective number of amplitude parameters fitted by data with the optimum regularisation hyper-parameter is approximately 180.For this simulation, about 680 amplitude parameters were fitted in a spectrogram built with a constant PSF FWHM of about 5.5 pixels.Intuitively, we can conjecture that an optimum relation must exist between the typical width of the PSF kernel and the amount of information that can be searched for in data, If too many parameters are searched for, the problem becomes ill-posed.If too few parameters are searched for, it should be compensated for with a wide PSF kernel.We thus postulate that we should have a spectrophotometric uncertainty principle of the type where the optimum is reached at equality, and h has the same units as σ PSF the width of the PSF kernel.This formula gives the minimum number of the degrees of freedom required to describe data given a PSF width.We tested the deconvolution and regularisation process on a large number of simulated spectrograms with a constant width (n PSF = 0), but without second diffraction order.For the sake of simplicity, we tested this without any loss of generality.We tried a Gaussian PSF kernel and a Moffat kernel with two different exponents (α 0 = 2 and 3) and various γ 0 values.The results are summarised in Figure 12.For the three models, σ PSF was chosen to be half of the PSF FWHM.The figure shows that for any PSF kernel, the measure of the number of degrees of freedom N dof /N x scales as the inverse of the PSF width.They show a definite trend for the product σ PSF × N dof /N x that appears to be asymptotically constant and equal to the number h close to 0.8 pixel when the PSF size is significantly greater than a few pixels.This h value varies with the S/N of the spectrogram, but for a situation, it locks the relation between σ PSF and N dof .This flatness of the relation shows that our procedure agrees when the extraction of information at the scale of the PSF kernel is considered optimal.
It is also noteworthy that this equation could be exploited to accelerate the computation of the PSF cubes: Instead of computing a PSF kernel for each pixel column i, we could compute it for each N dof /N x pixel.Because the computation times were not an issue in this paper, we left this investigation for a future project in which computation speed counts.

Wavelength calibration
Despite the astigmatism of the system, to first approximation, the slitless spectrograph obeys the usual grating formula (Eq.1; see e.g.Murty 1962;Hall 1966;Schroeder & Inc 2000).Using the notations of Figure 1, the grating formula can be inverted to find the relation between the u coordinate along the dispersion axis and λ.
First, we assume that the true zeroth-order position is at u 0 along the dispersion axis, but that a misfit of its centroid (see Section 3.2) can shift the position by a quantity δu (fit)  0 .The ADR also slightly spreads the zeroth-order image along the local constant azimuth line in a deterministic way depending on the airmass and the spectrum of the source.It also depends on the parallactic and camera angles, the atmosphere temperature, pressure, and humidity.It is incorporated in the wavelength calibration process as a wavelength-dependent shift δu (ADR) (λ) of the PSF centroid position along the dispersion axis with respect to a reference wavelength λ ref : δu (ADR) (λ ref ) = 0. Top: Global χ 2 (δu (fit) 0 , D CCD ) function and its minimum (black circle) for the planetary nebula PNG321.0+3.9 observed at CTIO.The sharp steps at the top of the plot arise when certain lines are detected.Bottom: Calibrated spectrum of PNG321.0+3.9.The vertical lines indicate emission or absorption lines that are detected, positioned at their tabulated values.The dashed blue lines show the fitted background (whose degree depends on its length), and the plain blue lines show the Gaussian profiles fitted on data.
We model this effect using the NIST metrology toolbox 11 recommendation of using a modified version of the Edlén equation (Edlén 1966) by Birch and Downs (Birch & Downs 1993;Birch & Downs 1994) (see appendix D).
The distance d(λ) between an abscissa of the spectrogram and the zeroth order then reads so that The bijection between the position on the CCD and the wavelength is thus parametrised with two unknown parameters, D CCD and δu (fit) 0 , that need to be fitted.As a starting point, we compute a first wavelength array λ 0 from the array of distances d to the zeroth order along the dispersion axis, assuming δu 0 = 0 and given a prior value of D CCD .To obtain a wavelength array given D CCD and δu 0 , equation 42 11 https://emtoolbox.nist.gov/Wavelength/Documentation.asp is inverted as To remove the ambiguity with ADR, which also depends on wavelength, we iterate this computation five times starting from λ 0 and updating λ.We verified that it is enough to converge toward a stable wavelength solution.In these steps, we associated a wavelength array λ with the amplitude array A. From this calibration, λ ref is computed as the mean wavelength weighted by the spectrum A itself, in order to associate the maximum amplitude of the zeroth order with its brighter wavelength.With the fit of δu (fit) 0 and this setting of λ ref , we ensure that we are not sensitive to the slight dispersion of the zeroth order itself.
The parameters D CCD and δu (fit) 0 are fit on data using the most prominent absorption (or emission) lines on the observed stellar spectrum (typically, the hydrogen lines Hα and Hβ, and the dioxygen lines at 762.1 nm and 686.7 nm).They are locally fitted with a polynomial background plus a Gaussian profile of unknown height, centroid, and width.A partial χ 2 quantity is computed for each spectroscopic line and added into a global χ 2 .
A penalty defined as the squared distance between the fitted Gaussian centroids and the tabulated values for the detected lines, weighted by the squared S/N, is then added to the χ 2 .
Finally, the full χ 2 and its penalty are normalised to the number of detected lines.This normalisation of the global χ 2 is necessary to avoid solutions that favour a lower number of detected lines, while the penalty gives more weights to the well-detected lines and anchors them on their tabulated values.The two parameters δu (fit)  0 and D CCD are varied to minimise the penalised global χ 2 and find the best solution for the wavelength calibration.
The result of this process is illustrated in Figure 13.At the top, the global χ 2 is represented for the wavelength calibration of the planetary nebula PNG321.0+3.9 observed at CTIO.The sharp steps at high or low D CCD values reflect situations in which some emission lines are detected or are not detected, which emphasizes the need to normalise the global χ 2 by the number of detected lines.The smoothness around the minimum is due to the penalty term on δu (fit)  0 .In the calibrated spectrum, we can observe many emission lines that have been detected by the algorithm, and a good alignment between the tabulated values (represented by the vertical lines), the extrema of the Gaussian profiles, and those of the data curve.A summary of the detected lines with high S/N is presented in Table 2, reporting their fitted wavelength and their equivalent width (EQW).Notes.The third column gives the centroid of the fitted Gaussian profile, and the fourth column lists the distance to the tabulated value.The EQW is reported in the last column.
After the full forward-model fit

Flux calibration
With the fitted wavelength solution, the spectrum amplitude Â can be converted from ADU units into flux densities in erg/s/cm 2 /nm, assuming that the telescope collecting area S T , the exposure time τ, and the CCD gain G CCD (in e − /ADU) are known, where δλ is the local variation in λ within one pixel.
The end product of this pipeline is thus a backgroundsubtracted spectrum, calibrated in wavelength and flux, that is the product of three quantities: the object SED, the instrumental transmission, and the atmospheric transmission, which might be contaminated by a second diffraction order because this latter has not yet been taken into account.

Full forward model of the spectrogram
At the end of the previous steps, we also have first estimates of the two instrumental model functions φ 1 (r, λ) and ∆ p (λ), and of geometric parameters such as the zeroth-order position r 0 and the dispersion angle α.With all these ingredients, we can implement a full forward model of the data that also takes the atmospheric differential refraction (ADR) and the superposition with the second diffraction order (last stage of Figure 3) into account.
In practice, we enrich the forward model described in the steps above with the knowledge of the ADR physics (see Sections 3.6 and D) and with the knowledge of the second-order to first-order transmission ratio r 2/1 (λ) of the spectrograph disperser.The ADR model replaces a polynomial approach to predict ∆ p (λ).In other words, it is used to predict the trace of the spectrogram on the sensor without free parameters, as long as airmass, outside pressure, outside temperature, and humidity are given.With this, two free parameters remain for the spectrogram trace on the sensor to be fully constrained: the dispersion axis angle α and δy (fit) , which compensates for a misfit of the zerothorder centroid along the y-axis.The ratio r 2/1 (λ) can be measured on an optical test bench (see Section 5.1) or on on-sky data.In the full forward model, we use a new design matrix M that is defined as where A 2 is a safety normalisation parameter, R 2/1 is the transmission ratio vector computed for a given wavelength calibration, r p=2 c are the centroid positions of the second diffraction order PSF kernels, and P p=2 are their shape parameters.The vector r p=2 c is computed using the grating formula 1 and the ADR model.P p=2 can be fitted independently of P p=1 , but we chose to assume that the PSF shape depends more on the spectrograph defocusing towards the infrared than on the atmospheric chromatic seeing.The PSF shape parameters for the second diffraction order are thus considered the same as for the first diffraction order at the same distance of the zeroth order.We chose to set the P p=2 vector accordingly12 .Therefore, the full forward model now includes both first-and second-order spectrogram models as We implemented a two-step iterative method that alternates the wavelength calibration described in Section 3.6 and the full forward model described just above (by combining a linear fit for the A spectrum amplitudes and a Gauss-Newton descent for the non-linear parameters P) with the same r hyper-parameter that was fitted (in Section 3.5.4).The A parameter vector determined with PSF 2D deconvolution previously (Section 3.5.3) is seeded in the forward-model fit as a new prior A 0 .A 20σ clipping is performed to reject the field stars and their concomitant spectrograms as well as other sensor defects.This procedure ensures that all the forward-model parameters are fitted again together on data, using the more complete model including A, P, D CCD , δu (fit) 0 , and A 2 .Their values replaced all those that were fitted previously.The residual map obtained in this way is flatter than before, with an even smaller final χ 2 , and consequently, with a better accuracy of all the fitted parameters.
The two main products of this step are a first-order diffraction spectrum S 1 (λ) separated from the second-order spectrum, and the second-order spectrum S 2 (λ) with in erg/s/cm 2 /nm (following Eq. ( 44)).At this point, we consider the second diffraction order not as a nuisance, but as a useful signal.With a strong bending due to ADR (e.g. with the dispersion axis orthogonal to the zenith direction), it can be detached from the first diffraction order on purpose to maximise the statistical power of the exposure.
In summary, a full forward model takes advantage of the higher diffraction orders as a redundant piece of data to fit all parameters, especially in the bluer part, where absorption lines are twice wider in pixels than in the first-order spectrum.This is a key advantage of the forward approach compared to the direct approach.

Validation on simulations
To test the full forward model, we simulated a spectrogram with a second diffraction order and a sharp Moffat PSF kernel to increase the spectral resolution (see the PSF parameter values in Table 3), whose shape evolves as a second-order polynomial function.Figure 14 compares the simulated data with the fitted spectrogram model and focuses on some atmospheric absorption lines: The residuals follow the expected Gaussian distribution again, even in the fast-varying regions of the spectrum.Notes.Parameters with uncertainties are fitted during the Gauss-Newton descent of the full forward model, while others are just estimated on data.The simulation was done with a spectrum made from 669 amplitude parameters, and the regularisation process recovered 303 of them.The δy (fit) centroid correction is included in the y 0 quoted value.
In Figure 15, we show that the process recovered the true spectrum injected in the simulation within the estimated uncertainties (diagonal elements from the C matrix from equation 28).The agreement is excellent at all wavelengths, even around the fast-varying absorption lines.The right panel of the figure also shows the FWHM of the true PSF.The reconstructed PSF displays the same wavelength-dependent PSF FWHM as the simulated one.It is also remarkable that while the cross-spectrum issued from the transverse 1D fit described in Section 3.5.2failed to recover the true spectrum and the true PSF profile because of the presence of the second diffraction order (orange curves), it still proved to be an important seed for the regularisation pro- (2) (0) (1) (2) Correlation matrix cess because only its regularity is used because of the Laplacian operator L.
The recovered parameters are compared with the simulation values in Table 3.They are fitted together with their uncertainties in the full forward-model minimisation, which provides their full covariance matrix (Figure 16), while the other parameters such as the star centroid are just estimated on data.
The regularisation quantities are given in Figure 17.For a PSF FWHM between 2 to 4 pixels, we again obtain N dof ≈ 300 out of ≈ 700 parameters.This confirms the rule of thumb given by the spectrophotometric uncertainty principle.For spectrograms such as those presented in the previous figures, the endto-end pipeline takes 2 minutes on a standard laptop.
The Spectractor implementation was tested on many simulations and recovered the simulation parameters within the estimated uncertainties (68% confidence interval) for sets of parameters that were not too extreme (smooth wavelength dependence on the PSF, or the PSF kernel sampled over a few pixels).We also evaluated the extraction bias b between the true spectrum given in the simulation S truth 1 (λ) and the extracted spectrum The extraction bias was evaluated with many simulations in different cases in terms of S/N, resolution, and geometry.For the first case, the variation in S/N was simulated by multiplying the simulated spectrum by an arbitrary grey factor A 1 , keeping the image background at the same level.The S/N of the simulation presented in Figure 15 and Table 3 corresponds to A 1 = 1.We found no significant bias in the A 1 > 1 regime (Figure 18), but a small bias appears for spectrograms with a lower S/N at the percent level.This is because a Gaussian model was used in the evaluation of the uncertainty map (see 3.1) while a Poisson distribution is more accurate at low S/N.The variation in spectrogram resolution was simulated by changing the PSF width γ 0 , and a small negative bias appeared for low-resolution spectra (large γ 0 ).These spectra exhibit wider and shallower absorption lines than the true spectrum, which leads to these negative b values, but only at the sub-percent level.However, except for the absorption lines, the overall spectrum shape from blue to red is recovered perfectly.Finally, geometry variations were also simulated using different dispersion axis angles α, but we found no bias.In conclusion, the most important condition for extracting unbiased spectra from slitless spectrophotometry is a sufficient S/N, closely followed by a sufficiently fine spectral resolution.The adequacy of these parameters is easy to test a priori with a forward-model simulation.
For completeness, we represent the reconstructed number of degrees of freedom for all these simulations in Figure 18.As expected, for spectrograms with a very low S/N, this number is close to zero.In that case, the extracted spectra are close to the A 0 = Â(1D) prior spectra.When the signal increases, then N dof strongly increases until it saturates because of pixelisation.Conversely, this number decreases when the PSF width increases because the spectrogram has a lower spectral resolution.

Spectrum extraction on data
The success of the spectrum extraction on data mostly depends on the model of the wavelength-dependent PSF of the telescope.If the PSF model correctly represents the reality, the residuals after the full forward-model fit of the spectrogram converges towards the expected Gaussian distribution.Otherwise, the extracted spectrum is distorted when the PSF is too far from reality.This is illustrated in Figure 19.In the left panels, a blazed Thorlabs grating with 300 lines/mm was chosen to observe CALSPEC star HD111980 at CTIO on 2017 May 30.Directly placed in the filter wheel at D CCD ≈ 55 mm from the CCD, this grating presents a rather strong defocusing that is poorly mod-elled by our default circular Moffat PSF, even with a fourth-order polynomial evolution with wavelength.
The building of an appropriate model for this highly defocused PSF is left for future work and is presented here as an illustration that the prior understanding of the telescope can be greatly worthwhile in a forward-fitting approach.These extractions used a sigma-clipping procedure with a 20σ threshold in order to reject only the field stars and not the poorly modelled spectrogram pixels.
However, the same PSF kernel as was used to treat the same star observed 5 minutes later, but with an amplitude hologram optimised to correctly focus the spectrogram at all wavelength (Moniez et al. 2021), leads to residuals between ±5σ (right fig-ures), mostly dominated by a field star that contaminates the spectrogram at about 530 nm.
The parameters of interest for these two extractions are summarised in Table 4.We realised a posteriori that the S/N was not sufficient to fit A 2 , and decided to keep it fixed at 1.The way in which the ratio r 2/1 (λ) was obtained for these exposures is explained in Section 5.The calibrated spectra given by Spectractor at the end of the extraction process are given in Figure 20.Because of the strong defocusing, the spectrum from the Thorlabs grating presents broadened absorption lines, while the amplitude hologram has sharper absorption lines.The second displays a better spectral resolution that is limited by the atmospheric seeing, which argues in favour of either controlling the spectrograph PSF at the hardware level, or of being able to accurately model a defocused PSF kernel.At this point, it seems easier to adjust the spectrograph to obtain a simple PSF model than to guess the complexity of the chromatic PSF with on-sky data.Based on these spectra or spectrograms, we show in Section 5 how to measure the atmospheric transmission or the instrumental transmission via forward modelling.

A path toward measuring atmospheric transmission
One of the main objectives for building a spectrophotometric instrument and its analysis pipeline is to be able to accurately measure on-site atmospheric transmission so as to improve the photometric calibration of other telescope on the same site.For instance, the aim of the Auxiliary Telescope at Cerro Pachón is to measure the atmospheric transmission to correct the photometry of the LSST survey.In order to discuss the capabilities of Spectractor in measuring atmospheric quantities, we first recall that its main output is a first diffraction order spectrum, To be able to obtain the atmospheric transmission T atm (λ|P a ), we need to know the stellar SED and the instrumental transmission, and to have an accurate full forward model for the spectrograph.
Because the most accurate PSF model is achieved with the amplitude hologram at CTIO because of its focusing properties, we expect better results from its analysis than from the data acquired with the Thorlabs grating.
In order to inform our forward model, we need to know the disperser on-sky first-order transmission and their r 2/1 (λ) ratio.While we show below how to use on-sky data to this end, we also secured the Thorlabs grating to bring it back on an optical bench at the Laboratoire de Physique Nucléaire et de Hautes Énergies (LPNHE) and measure its transmission.This was not possible for the prototype hologram used at CTIO, and we had to recover its transmission with on-sky data.
While in theory, the forward modelling of the atmospheric transmission could be based on a perfect a prior knowledge of the instrument and simply needs to fit each star exposure, in practice, all this is slightly more complex.We therefore decided to show that our pipeline can be used with intermediary steps in order to gain increasingly better understanding of the data, up to the point where the full forward modelling of the atmospheric parameters becomes possible.Because of the limited telescope time and observations, this first paper relies on a limited amount of data and aims at presenting the algorithms and procedures.Clearly for accurate results, many more data are required.
The different steps that we undertook and detail below are as follows: -5.1 Laboratory measurement of the blazed grating transmission as a function of λ; -5.2.1 inference of the CTIO 0.9 m telescope transmission using data taken with the blazed grating during a stable night; -5.2.2 inference of the amplitude hologram transmission during the same stable night with the CTIO 0.9 m telescope transmission; -5.2.3, 5.3 measurement of T atm (λ|P a ) with data using the amplitude hologram with T inst,1 (λ), S * (λ), and a Moffat PSF kernel.

Datasheet Optical bench
On-sky at CTIO r2/1( ) extrapolation Fig. 21.First diffraction-order transmission of the blazed Thorlabs grating with 300 lines/mm (blue and black) and the ratio of transmissions of the order 2 over the order 1 (red, blue, and orange).
Our blazed Thorlabs 300 lines/mm grating was studied two years after the CTIO campaign on the LPNHE optical test bench.The optical bench is made of parabolic off-axis mirrors to simulate a monochromatic f /D = 18 telescope beam.A monochromator is used to select an accurately known narrow wavelength interval, and the light, after passing through the grating mounted on an xyz support, is collected on a CCD device.
The transmissions for orders 0, 1, and 2 were measured using aperture photometry at many wavelengths.Because the laboratory bench had been designed to measure filter transmissions, it could unfortunately not be adjusted to allow a S/N that was high enough in the regions below 450 nm and above 1000 nm.For these wavelength ranges, we therefore used the spreadsheet of the grating manufacturer.
To measure the r 2/1 (λ) ratio, we used the optical bench measurement above 450 nm and the CTIO on-sky measurement from Moniez et al. (2021) below.
The first-order efficiency curve and the r 2/1 (λ) curve for the blazed Thorlabs 300 lines/mm grating are represented in Figure 21 and were used in Spectractor when spectra taken with this grating were measured (e.g. in Figure 19 left).

Analysis of a photometric night
To measure the CTIO 0.9 m telescope transmission, we made use of a set of spectra acquired at different airmasses during a night with stable photometric condition.The multiplicity of the airmass conditions and the hypothesis that the atmospheric transmission spectrum only varies with the quantity of atmosphere between the source and the observer allowed us to factorise the atmospheric transmission as an airmass-dependent term and the instrumental transmission term.
In order to distinguish between the average spectrum of the atmospheric transmission and the instrumental transmission spectrum, we performed the fit over the available N s spectra by simulating the atmospheric transmission with Libradtran13 (Mayer & Kylling 2005;Emde et al. 2016) jointly with N i arbitrary coefficients to sample the T inst,1 (λ) curve.
At CTIO on 2017 May 30, the night presented very stable conditions according to the in situ meteorological measurements of temperature, pressure, humidity, and also a stable seeing around 0.8".We analysed the spectra of CALPSEC star HD111980 acquired with the Thorlabs and holographic dispersers under the hypothesis that this night was photometric.The observations cover an airmass range from 1 to 2 (see Figure 22).
For each of the dispersers, N s spectra were acquired and extracted.They were averaged in 3 nm bins, for which the instrumental transmission is assumed to be very smooth at this scale.The main atmospheric and hydrogen absorption lines were masked in this process.

Analysis of a photometric night in which the instrumental transmission was extracted
We present here how we estimated the telescope transmission from a photometric night data set and the knowledge of the laboratory measurement of the Thorlabs grating.Basically, we fitted a telescope transmission model and one atmospheric model on the collection of spectra extracted from the data collected with this disperser.
From 300 nm to 1100 nm, we simultaneously fitted the N s = 20 good blazed grating spectra observed with the model from equation ( 9) for p = 1, where S * (λ) is the binned SED of the CALSPEC star, and T inst,1 (λ) is a vector of N i = 250 free linear amplitude parameters.
The Libradtran atmospheric transmission simulation T atm (λ) uses the in situ pressure, temperature, and airmass given in each exposure metadata.In addition, three common parameters P a were fitted: the precipitable water vapour (PWV; in mm); the ozone quantity (in dobson db); the vertical aerosol optical depth (VAOD).
In order to account for a possible small grey variation in the atmospheric transmission, each spectrum was weighted by a grey factor A (n)  1 , with their average constrained to one, The χ 2 we need to minimise thus reads where D n is the data vector for spectrum number n, and C n is its covariance matrix estimated by the Spectractor extraction pipeline.The A (n) 1 and P a parameters were fitted jointly via a Gauss-Newton descent and come with their covariance matrix, while the T inst,1 (λ) linear parameters were computed analytically via the usual algebra at each descent step.Because the spectrum of the star is assumed to be known, no regularisation is needed.Because the instrumental transmission is assumed to be smooth, the descent was repeated with a 5σ clipping to remove outliers.
This procedure has been tested on simulations, and we verified that it recovered the injected parameters for instrumental transmission, grey factors, and atmospheric quantities within the uncertainty ranges.
The results obtained on the CTIO data are presented in Figure 23.Approximately 15% of the 5000 spectral data points are masked, either because they are close to a spectral line, or because they are 5σ outliers.The residuals are structured below the 2σ level in the red part of the spectra, either because of an incorrect PSF model for the redder wavelengths due to defocusing, or because of PWV variations in the atmosphere, as hinted by the spectra, vertically ordered in time.
The best T inst,1 (λ) solution that we extracted is presented in Figure 24.The black points represent the raw fitted T inst,1 (λ) vector, and the red curve is smoothed with a Savitzky-Golay filter of order 1 and window size 17.The error bars result from the combination of raw uncertainties from the fit plus the difference between the smoothed curve and the scattered raw black points.This leads to larger uncertainties for the instrumental throughput where the spectra were masked, around the main absorption lines.
The transmission curve presents the expected decreases due to the loss of efficiency of the CCD.Given the measurement of the blazed Thorlabs grating at the laboratory (Figure 21), we extracted the CTIO 0.9 m instrumental throughput from the T inst,1 (λ) smoothed curve.
This fills the lack in a priori knowledge of the telescope throughput to inform our forward model, and it is used in the following analysis.We obtained the telescope throughput by dividing the fitted instrumental transmission by the first-order efficiency of the blazed Thorlabs grating.The atmospheric transmission results are detailed in Section 5.2.3.A more accurate estimate of the instrumental transmission would need more data, both to inform a better PSF model, and to constrain the atmospheric transmission variations better.Because the available data were limited, our goal was to illustrate that a forward-model approach can be adjusted to gain more information about the different components of the model.
We find it also noteworthy that the procedure is symmetric with respect to atmospheric transmission and telescope transmis-400 500 600 700 800 900 1000 1100 [nm] 0.00 0.05 0.10 0.15 0.20 0.25 0.30 CTIO transmission with blazed grating T inst, 1 T inst, 1 smoothed Fig. 24.Measured T inst,1 (λ) curve for the CTIO 0.9 m telescope equipped with a Thorlabs 300 lines/mm blazed grating (black points) from a photometric night.The red curve is a smoothing using a Savitzky-Golay filter of order 1 and a window size 17.sion: The need for the Libradtran model as a priori to constrain the atmospheric transmission shape could be replaced by the a priori measurement of the telescope transmission.

Analysis of a photometric night in which amplitude hologram transmissions were obtained
The next step needed to further inform our forward model in order to use the best-quality data to constrain the atmospheric transmission was to estimate the holographic disperser transmission.If this transmission is known, the data gathered with this disperser can be used, and we can take advantage of the fact that its PSF can be fairly well modelled with a Moffat.In order to obtain the hologram transmission, we used the same procedure as described for the Thorlabs disperser above on data that were collected during the same photometric night, but with the holographic disperser.However, the ratio r 2/1 (λ) is still a prior information needed for the full forward model.For the holographic disperser, we built r 2/1 (λ) using the interpolated on-sky data presented in Moniez et al. (2021) Figure 21 alone.
The N s = 27 spectra are presented in Figure 22, and the results are presented in Figure 25.Approximately 0.5% of the 6723 data points are masked.The residuals are below 3σ.A deep absorption feature is visible around the water-absorption band at 950 nm.
As in the previous section, from the T inst,1 (λ) best fit (see Figure 26), we deduced the transmission of the first diffraction order for the holographic disperser, using the CTIO 0.9 m telescope transmission curve determined previously.We are well aware of the systematic errors present in these results, and we stress that they are presented here to illustrate that the forward model approach we implemented can be used when a priori information about crucial components of the model is lacking.

Analysis of a photometric night in which atmospheric parameters were extracted
In addition to the instrumental transmissions of both dispersers, the procedures above also yield the parameters describing the mean atmospheric transmission of the night.Under the assump-  26.Measured T inst,1 (λ) curve for the CTIO 0.9 m telescope equipped with an amplitude holographic grating of about 350 lines/mm (black points) from a photometric night.The red curve shows a smoothing using a Savitzky-Golay filter of order 1 and a window size 11.tion that the night was photometric, these results are presented in Table 5.
The rather low value of the reduced χ 2 for the amplitude hologram illustrates the focusing properties of this disperser, which allow us to describe its PSF quite accurately with a simple 2D Moffat.Quantities obtained from the blazed Thorlabs grating data show lower statistical uncertainties than amplitude hologram data because their S/N is much higher (because its transmission is much higher).However, they certainly show higher unevaluated PSF systematics than the hologram measurements.The difference between the two estimates of the atmospheric transmission in Table 5 leads to variations in synthesised broadband magnitudes for the LSST filters of about 8 mmag in the u, g, and r filters, 3 mmag in i, 1.5 mmag in z, and 4 mmag in y filter for various standard CALSPEC SEDs and supernovae at redshift 0. The millimagnitude accuracy on atmospheric transmission can thus be reached provided that the accuracy of the atmospheric parameters reaches below the difference shown in Table 5: We found that PWV must be fitted with an accuracy better than ≈ 0.05 mm to obtain a milli-magnitude accuracy in y band.For VAOD, uncertainties of about 0.001 are required for the u, g, and r bands.For ozone, a 10 db precision is enough to obtain milli-magnitude precision in r band.Furthermore, the ozone and VAOD parameters we fitted are similar to the estimates of the global meteorological network MERRA-214 (Gelaro et al. 2017) for the CTIO site during that night.The MERRA-2 PWV value ranges from 4 to 5 mm during the night of 2017 May 30.As MERRA-2 averages atmospheric quantities in 60 km wide cells, it can be expected that quantities with large local variations such as water vapour could differ from on-site measurements.This is even more true for CTIO, which is located at the top of a Chilean mountain.On the other hand, a high-atmosphere quantity would be expected to depart less between on-site and satellite measurements.We report the MERRA-2 values and compare them to what we extract with our forward model to illustrate that with a detailed knowledge of the telescope, the challenging problem of on-site atmospheric transmission measurement can be solved.While the quoted error bars only propagate the statistical uncertainties and are probably dominated by systematics, the tentative concordance between the parameters measured by MERRA-2 and our forward-model results supports the algorithm we developed.
For completeness, we also present the evolution of the grey parameters A (n)  1 through the night in Figure 27 for the holographic disperser data, together with the final correlation matrix of the fitted parameters in Figure 28.The variation in the 27 A (n) 1 factors is lower than 1%.This supports the first-order approximation of the night as being photometric and again shows that the procedure is able to improve our understanding of the data.It also offers venues to improve the model.
Finally, we note that the correlation matrix shows that the VAOD aerosol parameter is particularly strongly correlated to the spectrum amplitudes.This is expected because this quantity is mostly determined by the spectrum slope for λ ≈ 400 nm.Any systematic on the amplitude of the spectrum therefore directly affects the estimate of aerosols.

Atmospheric forward-model approach
After illustrating that the forward-model approach can be used to measure the telescope and disperser transmissions, which yields a set of stellar and atmospheric transmission spectra, we can proceed one step further.Assuming that our measurement of these crucial components of the forward model had been done with enough data to be accurate, we might skip the part in which the spectrum is extracted and directly fit the atmospheric parameters on the raw spectrogram.
At the cost of having access to the transmissions described above, when we model the spectrum S 1 (λ) as the product of a known instrumental transmission T inst,1 (λ), a Libradtran atmospheric model T atm (λ|P a ), and the SED S * (λ) of a known CAL-SPEC star, we can describe any observed spectrogram as with A 1 a grey factor.As before, the parameters and their covariance matrix were estimated via a Gauss-Newton descent by minimising a χ 2 calculated over a single spectrogram.The fitted parameters were A 1 , A 2 , δy (fit) , P a , D CCD , α, and all the polynomial coefficients that model the wavelength dependence of the PSF kernel.Each spectrogram was fitted independently.We call this a spectrogram fit.As a comparison, and as a way to assess the quality of the stellar spectrum forward model, we also directly fit S 1 (λ) and the atmospheric transmission on the stellar spectra extracted at this step.We call this a spectrum fit.

Qualification on simulations
The direct extraction of atmospheric parameters from a spectrogram was tested on simulations.The parameters we chose to simulate were the extracted parameters found from fitting all data spectrograms of the photometric night of 2017 May 30, involving the amplitude hologram.With these parameters, we simulated spectrograms of a CALPSEC star spanning an airmass ≈ 1 to ≈ 2 with the same seeing and atmospheric conditions as that of our data.We arbitrarily fixed the unknown P a parameters to 300 db for ozone, 0.03 for VAOD, and 5 mm for PWV.
The result of the spectrogram fit is very similar to what was presented in Figure 10.For comparison, we present the result of the spectrum fit of the extracted spectrum from one of the simulated spectrograms in Figure 29.The extracted spectrum (red points), the best-fitting spectrum model (blue), and the true spectrum (green) all agree within the quoted uncertainties.
In addition, the recovered atmospheric values are compatible with the true injected values within the uncertainties, with a strong correlation between the grey parameter A 1 and the aerosols, as seen before on real data.The nightly behaviour is presented in Figure 30.All values agree with the true values for both methods and correctly account for the variable simulated conditions.
We again note the strong correlation between the VAOD and the A 1 parameter.As mentioned before, this is an expected behaviour because aerosols specifically affect the spectrum slope in the blue and the global spectrum amplitude.
In addition to validating the spectrogram fit, theses results also show that the forward-model process and all the pipeline steps presented above do not bias the measurement of the atmospheric parameters.

Data analysis
The individual fits of the spectrograms and spectra extracted from CTIO data are presented in Figures 31,32,and 33.The atmospheric forward modelling fits the data at the 5σ uncertainty level, but the PSF model imprints structured residuals similarly to what happens in the full forward-model case (Figure 19).This effect is visible throughout the spectrogram and inside the dioxygen absorption line.
The spectrum presented in Figure 32 is globally well fitted by the S 1 (λ) model.The fit residuals around the main dioxygen line for data and for the simulation are compatible with the instrumental throughput uncertainties.All spectrograms and spectra of the night show the same residual patterns.The two methods yield very similar values for the atmospheric parameters.The values of the spectrogram fit are smooth in time, with a visible correlation between VAOD and A 1 parameters, while the spectrum fit values are shifted and more scattered, probably due to the higher sensitivity of this simpler procedure to outliers such as the field-star contamination of the spectra around 530 nm.
We recall that in the spectrogram fit, the raw spectrogram data are directly fitted with a model that contains the instrumental transmission for diffraction orders 1 and 2, a Libradtran atmospheric model, and models for the dispersion relation and PSF kernel.
At this point, the smoothness of the atmospheric parameter curves and the reasonable values that we obtained (low ozone and a few millimetres of precipitable water vapour) are as close to reality as can be expected with the quality and size of the data set at hand.
We again acknowledge that these atmospheric results are affected by systematic uncertainties and choices (e.g. the circularity of the PSF model, the PSF size of the second diffraction order, or the blazed grating transmission model) that affect the absolute value of the quoted parameters.Unfortunately, we do not have enough data to estimate systematics and proceed.We leave an analysis of the atmospheric transmission for a future paper, for example, based on the high-quality data set promised by AuxTel, which is dedicated to measuring atmospheric transmission at the Rubin Observatory site.With its mirror with a diameter of 1.2 m, Article number, page 23 of 29 VAOD its high-quality CCD sensor, and its fast readout electronics, the spectra of CALSPEC standards or stars with similar magnitudes can be acquired at a rate of approximately one or two per minute.Its overall observation strategy is not yet defined, but it has to infer the atmospheric transmission for each LSST pointing, presumably observing a grid of stars (CALSPEC stars or others) to infer directional atmospheric transmissions and interpolate them for the LSST pointing.

Summary and conclusions
Slitless spectrophotometry with forward modelling opens a path towards the acquisition of spectra with imaging telescopes that can be simply transformed into spectrographs by inserting a disperser on the light path.
We demonstrated on simulations that building a forward model of a spectrogram allows for accurate spectrophotometry, with a spectral resolution that only depends on the width of the PSF along the dispersion axis.The key of the process is a regularisation algorithm, which is fed with as much as prior information as possible (regularity of the searched spectrum, PSF parametrisation, ADR, and grating efficiency).The two key functions of the model are the dispersion function ∆ p (λ) and the PSF model φ(r, λ), together with the knowledge of the r p/1 (λ) ratio of diffraction-order transmissions.We exemplified that this procedure functions on real data, with tentatively very promising results.We are aware of the limits of the data set in our possession, and we exemplified that the forward-model procedure can be used to improve our knowledge of the data, and by doing so, to inform the forward model.
We can also summarise some of the important lessons learned while implementing the Spectractor pipeline as follows.
-Forward modelling provides a modular approach in which each brick is a physical or empirical model that can be changed or improved, depending on the data particularities and S/N.The residuals indicate how the model can be improved (data rules).-When it is implemented, forward modelling can easily simulate data sets to test new algorithms.-The second diffraction order is not a contamination, but a signal that helps recover the blue part of the first-order spectrum.It should be taken advantage of whenever possible.This in particular means that we need to rethink the common wisdom of spectroscopy by increasing the efficiency of the grating in the second order, and use a field rotator (if available) in order to more easily separate the different diffraction orders on the sensor through ADR.-The accurate knowledge of the PSF is thus crucial and requires dedicated data and an analysis that need to be carefully budgeted for.The PSF width sets the spectral resolution and the number of degrees of freedom that can be extracted from the data, and thus, decreasing the width is crucial.
Because the main scientific driver for the development of Spectractor is the measurement of on-site atmospheric transmission, we pushed our analysis to that point.We showed that our procedure allows us to measure the on-sky telescope transmission to the direct extraction of atmospheric parameters from spectrograms.
While the atmospheric parameters are dominated by systematic uncertainties, in particular, by our partial knowledge of the instrumental transmission and of the PSF, the comparison with satellite data shows a promising tentative agreement.Work that goes beyond this paper, which was devoted to presenting the spectrophotometry method, is a more intensive study of on-site atmosphere transmission.This requires in particular access to many more data and a specific detailed analysis to obtain accurate instrumental transmissions and PSF model.
We would finally like to acknowledge that many elements of the forward model can and will be improved as new data become available.In particular, the background might be estimated directly in the forward model, including other diffraction orders, by modelling the contamination of the field-star spectrogram and integrating the chromatic flat-fielding in the model to account for pixel efficiencies, and so.If the flat screen in the observatory can be illuminated with a system of several LEDs or lasers, a cube of wavelength-dependent relative transmission might be obtained that could be directly included in the forward model.Instead of dividing the exposures by one of them, each layer of the PSF cube φ p (r, λ) could then be multiplied by the flat corresponding to the correct wavelength before the spectrogram model is integrated over λ.Using the forward model, we can thus solve the exact problem of deflation in spectrophotometry.
These ideas are worth implementing in the algorithm if required by data.On the other hand, many hardware solutions can also be implemented to increase the a priori knowledge of the instrument and to greatly improve the forward-model analysis.For instance we showed that holographic dispersers such as those presented in Moniez et al. (2021) improve the focusing of the spectrogram on the sensor on the whole visible and near-infrared range, which facilitates the PSF modelling.Its narrow width also allows a better spectral resolution.Another improvement would be using a collimated beam projector (CBP) (Coughlin et al. 2016;Souverin et al. 2022) to measure the telescope transmission at the per mill level, and monitor its evolution with time.
In conclusion, we have presented the theoretical tools, together with a detailed implementation example to add spectrophotometric ability to an imager by the insertion of a disperser on the light path.This comes at some computational cost, which is readily available today, but it also requires either a priori knowledge of the instrument or dedicated data and an analysis to bring the model to the required level of accuracy.
The penalisation term χ 2 pen ( A|A 0 ) = ( A − A 0 ) T Q( A − A 0 ) is also a quadratic term, and for Q = L T U T UL with the Laplacian operator L = −D T D, and U is such that we obtain χ 2 pen ( A| A 0 ) = (UL( A − A 0 )) T (UL( A − A 0 )), and it is the quadratic norm (or the squared Euclidean norm) of the vector UL( A − A 0 ), denoted UL( A − A 0 ) 2 2 .When we interpret A as the discretisation of a continuous spectrogram a by setting A (i) = a i N x , and when σ is a function such that σ , a continuous analogue of this term would be a term of the form We also note that lim This shows that regularisation using the weighted quadratic second-order derivative automatically ensures an upper bound of the weighted total variation norm.This means that while it is computationally much faster, regularising by the weighted quadratic norm of the second-order derivative ensures an upper bound on the regularisation via the weighted total variation norm.Because the usual advantages of the weighted total variation norm (no assumption of a second-order derivative or research of a sparse minimiser) are not important here, the choice of the weighted quadratic norm of the second-order derivative as a loss function is completely pertinent.

Appendix D: Atmospheric differential refraction
The ADR mostly depends on the pressure, the temperature, and the airmass and only loosely on the atmospheric humidity.
In our wavelength-calibration process for CALSPEC stars, the absorption line that weights most in the fit is the main dioxygen line at 762.1 nm.When the ADR is not correctly modelled and taken into account in the wavelength calibration, shifts of the absorption line minima towards the blue part of the spectrum can be observed throughout the night while the airmass of the star changes.This is illustrated in the left panels of Figures D.1 and D.2.In the right panels, the ADR effect is included in the wavelengthcalibration process through a wavelength-dependent shift of the zeroth-order centroid δu (ADR) 0 (λ).This procedure absorbs most of the line shifts when the dispersion axis is not orthogonal to the zenith direction.

Fig. 1 .
Fig.1.Geometry of a simple slitless spectrograph in the plane orthogonal to the CCD (top) and parallel to the CCD (bottom).A convergent beam with incident angle θ 0 is focused on a CCD at position r 0 = (x 0 , y 0 ), but also passed through a grating with N eff grooves per millimetre at a distance D CCD above the CCD.The beam is deflected at an angle θ(λ) along the mean dispersion axis u, which forms an angle α with respect to the CCD x axis, but is focused somewhere above the sensor.Atmospheric refraction adds a supplementary dispersion along and transversally to the mean dispersion axis.
J. Neveu et al.: Slitless spectrophotometry with forward modelling

Fig. 2 .
Fig. 2. General illustration of the spectrograph model and of the purpose of the spectrum extraction.(a) Illustration of the image formation for a monochromatic star observed through a slitless spectrograph.(b) Same for a polychromatic star.(c) Real acquired image: The detector generally is a black-and-white sensor, with noise, a structured and chromatic background, and field star zeroth-and first-order spectrograms.The goal of Spectractor is to extract the colour information from this image to obtain the stellar spectrum.
Fig. 3. Overview of the Spectractor pipeline.The green ellipses represent external inputs needed for the spectrum extraction, and blue ellipses stand for the Spectractor products.The bottom stage represents the full forward-modelling method for extracting the spectrum from the raw spectrogram.

Fig. 4 .
Fig. 4. Processed exposure of CALSPEC star HD111980 observed at CTIO with an amplitude-hologram grating of 350 lines/mm on 2017 May 30.The greyscale gives the exposure intensity in ADU/s.The yellow circle indicates the zeroth-order position of HD111980.

Fig. 5 .
Fig.5.Zeroth-order images from different celestial objects, observed in different seeing conditions with several dispersers.The red cross shows the fitted centroid using the Spectractor method from section 3.2.All images have saturated pixels.

Fig. 8 .Fig. 9 .
Fig. 8. Disperser rotation angle estimation.Left: Angle α on the spectrogram image for the 5% pixels with the highest λ − values found in the Hessian matrix, and the dispersion axis as the median of the angles (black line, shifted upward for clarity).The masked pixel with low λ − values are indicated (light grey).Right: Histogram of the selected angles from the left panel and its median (vertical black line).

Fig. 11 .
Fig. 11.Optimisation of the r hyper-parameter for a simple simulation with n PSF = 0, γ 0 = 5, α 0 = 3 (and the same characteristics as in Table1).Top: G(r) function (blue) and the minimum position r = 0.0467 (vertical black line).Middle: χ 2 evolution with r.Bottom: Trace of the resolution matrix Tr R. The intersection with the black line gives the effective number of parameters fitted by data, here, 180.

Fig. 12 .
Fig. 12. Product of the PSF FWHM with N dof /N x as a function of the PSF FWHM for three different PSF models: a Gaussian kernel (orange stars), a Moffat kernel with α 0 = 2 (blue points), and a Moffat kernel with α 0 = 3 (green points).
Fig.13.Wavelength-calibration process on a planetary nebula spectrum.Top: Global χ 2 (δu(fit)  0 , D CCD ) function and its minimum (black circle) for the planetary nebula PNG321.0+3.9 observed at CTIO.The sharp steps at the top of the plot arise when certain lines are detected.Bottom: Calibrated spectrum of PNG321.0+3.9.The vertical lines indicate emission or absorption lines that are detected, positioned at their tabulated values.The dashed blue lines show the fitted background (whose degree depends on its length), and the plain blue lines show the Gaussian profiles fitted on data.

Fig. 14 .
Fig. 14.Same as Figure 10, but with the addition of a second-order diffraction spectrogram and after the full forward-model fitting procedure.

Fig. 15 .
Fig.15.Results from the full forward-model fit of a simulated spectrogram of the CALSPEC star HD111980 with a Moffat PSF kernel.A second diffraction order was simulated.Top left: Spectrum output from the full forward-model process (orange) compared with the true spectrum injected in the simulation (blue) and the intermediate 1D transverse fit used as the prior vector A 0 (green).Bottom left: Residuals between the true spectrum and the 1D and 2D fits normalised by their estimated uncertainties.Top right: FWHM of the true PSF (blue, right below the green curve) compared with the fitted PSF during the full forward-model procedure (green) or the 1D transverse fit (orange).Bottom right: Difference between the true PSF FWHM injected in the spectrogram simulation and the 1D and 2D fits.All blue curves are covered by the orange curves.

Fig. 16 .Fig. 17 .
Fig. 16.Correlation matrix of the full forward-model fitting of a spectrogram simulation at the end of the Gauss-Newton descent.
Fig. 18.Extraction bias b (blue points) as a function of the S/N materialised by the amplitude A 1 factor (top), of the PSF width γ 0 (middle), and of the rotation angle α (bottom).For completeness, the effective number of degrees of freedom N dof is represented with red diamonds for each simulation.

Fig. 19 .
Fig. 19.Same as in Figure 14, but for CTIO data: CALSPEC star HD111980 observed on 2017 May 30 with a blazed Thorlabs grating 300 lines/mm (left), for which the PSF is out of focus and deviates from a Moffat model, and an amplitude hologram of 350 lines/mm (right), better focused on the CCD.

Fig. 22 .
Fig. 22. Spectra from the CALSPEC star HD111980 acquired during the night of 2017 May 30 at CTIO, which was assumed to be photometric, with the blazed Thorlabs grating with 300 lines/mm (top) and an amplitude hologram with 350 lines/mm (bottom).The curves are coloured according to the acquisition airmass z.In the amplitude hologram, a field star creates a spike around 530 nm.

Fig. 23 .
Fig. 23.Multispectra fit of a CTIO photometric night using the blazed grating with 300 lines/mm.Top: D n data spectra binned in 3 nm intervals, indexed vertically by their index n and with a coloured amplitude.Masked regions are shown in grey.Middle: Best-fitting spectrum models A (n) 1 S 1 (λ) indexed vertically by their index n.Bottom: Residual map.

Fig. 25 .
Fig. 25.Multispectra fit of a CTIO photometric night using the amplitude hologram with 350 lines/mm.Top: D n data spectra binned in 3 nm intervals, indexed vertically by their index n and with a coloured amplitude.Masked regions are shown in grey.Middle: Best-fitting spectrum models A (n) 1 S 1 (λ) indexed vertically by their index n.Bottom: Residual map.

Fig. 27 .
Fig. 27.Measured A (n) 1 grey absorption factors for the CTIO night of 2017 May 30, in which CALSPEC star HD111980 was observed, for each spectrum (ordered in time).

Fig. 29 .Fig. 30 .
Fig. 29.Results from the S 1 (λ) model fit on the spectrum of CALSPEC star HD111980, extracted from a simulated spectrogram.Left: Spectrum data (red) compared with the best-fitting model (blue) and the model uncertainties (light blue band) due to the CTIO telescope transmission uncertainties, the true injected spectrum in the simulation (green), and the residuals (bottom).Middle: Zoom around the dioxygen line at 762 nm and zoom on the H 2 O absorption band around 950 nm.Right: Correlation matrix of the fitted parameters.Simulations

Fig. 31 .Fig. 32 .
Fig. 31.Results from the atmospheric forward model of CALSPEC star HD111980 with a Moffat PSF kernel (the shape parameters evolve as a fourth-order polynomial function), observed with an amplitude hologram with 350 lines/mm.Left: Spectrogram data (top), best-fitting spectrogram model (middle), and residuals in units of σ (bottom) Middle: Zoom around the dioxygen line at 762 nm and zoom on the H 2 O absorption band around 950 nm.All colour maps are normalised by the maximum of the simulated spectrogram.Right: Correlation matrix of the fitted parameters.Data

Fig. 33 .
Fig. 33.Evolution of the atmospheric parameters as a function of the exposure time during the night of 2017 May 30, in which CALSPEC star HD111980 was observed with an amplitude hologram.The blue points show the spectrum fits, and the orange points show the spectrogram fits.
because N x D ≈ − d dx .The total variation distance is defined as the (weighted) norm-1 of the gradient operator as a functional term, (a − a 0 ) 1,σ = A 0 )| = (a − a 0 ) 1,σ .(C.7)However, by a simple argument, we can show that the 2norm of the second derivative controls the 1-norm of the first derivative, so that minimising the former means that we also minimise the latter.In order to prove it, let f (x) = a(x) − a 0 (x

Fig. D. 1 .
Fig. D.1.Absorption line Hβ at 486.3 nm of CALSPEC star HD111980 observed during the night of 2017 May 30 while it goes from an airmass of 1 to an airmass of approximately 2. Left: Without modelling the ADR effect in the wavelength-calibration process.Right: With the ADR effect in the wavelength-calibration process.The profiles are better aligned.For completeness, we show in Figure D.3 the angle conventions that are used in Spectractor to correctly compute the zenith direction in the image.

Fig. D. 2 .
Fig. D.2.Same as Figure D.1, but for the Fe absorption line at 430.8 nm.N

Table 1 .
Principal characteristics of the exposure used as an example in section 3.

Table 2 .
Emission lines detected in the spectrum of planetary nebula PNG321.0+3.9 with a S/N above 10.

Table 3 .
Parameters of a spectrogram simulation and Spectractor estimations, recovered with a full forward-model approach.

Table 4 .
Parameters of the CTIO exposures and Spectractor estimates.

Table 5 .
Atmospheric parameters P a fitted for the same photometric night of 2017 May 30, observing CALSPEC star HD111980 with two different dispersers.Notes.Last line gives the MERRA-2 value measured in a 60 km wide cell around CTIO.