Gaia Data Release 3: External calibration of BP/RP low-resolution spectroscopic data

Context. Gaia Data Release 3 contains astrometry and photometry results for about 1.8 billion sources based on observations collected by the European Space Agency (ESA) Gaia satellite during the first 34 months of its operational phase (the same period covered Gaia early Data Release 3; Gaia EDR3). Low-resolution spectra for 220 million sources are one of the important new data products included in this release. Aims. In this paper, we focus on the external calibration of low-resolution spectroscopic content, describing the input data, algorithms, data processing, and the validation of the results. Particular attention is given to the quality of the data and to a number of features that users may need to take into account to make the best use of the catalogue. Methods. We calibrated an instrument model to relate mean Gaia spectra to the corresponding spectral energy distributions using an extended set of calibrators: this includes modelling of the instrument dispersion relation, transmission, and line spread functions. Optimisation of the model is achieved through total least-squares regression, accounting for errors in Gaia and external spectra. Results. The resulting instrument model can be used for forward modelling of Gaia spectra or for inverse modelling of externally calibrated spectra in absolute flux units. Conclusions. The absolute calibration derived in this paper provides an essential ingredient for users of BP/RP spectra. It allows users to connect BP/RP spectra to absolute fluxes and physical wavelengths.

IM presented here to allow for the simulation of Gaia-like mean spectra from a given spectral energy distribution (SED); for example a synthetic stellar spectrum or an absolute-flux-calibrated measured spectrum.
In this paper, we illustrate how the BP/RP IM is derived and how internally calibrated mean BP and RP spectra are converted into ECS, discussing the performances and the limitations of the final products. After giving an overview of the external calibration approach (Sect. 2) and a description of the external calibrators (Sect. 3), we describe the implementation of the IM (Sect. 4) and the method implemented to reconstruct the ECS (Sect. 5). Section 6 is dedicated to the description of the processing scheme and the main results are shown in Sect. 7, while Sect. 8 is dedicated to the validation of the calibrations. Finally, in Sect. 9 we discuss a few known problems.

Overview of the problem
The instrument model (described in detail in Sect. 4) allows us to relate a mean BP or RP spectrum to the corresponding SED via an integral equation of the following kind: n e (u) = I(u, λ) · n p (λ) dλ, where the observed spectrum n e (u) is the internally calibrated mean spectrum in units of e − s −1 (u denotes a coordinate in data space, often referred to as a pseudo-wavelength) and the kernel I is a combination of few components: (i) the LSF, that is, the instantaneous one-dimensional intensity distribution in the spectrum of a monochromatic point source; (ii) the dispersion model, that is, the relation that links absolute wavelengths to pseudo-wavelength coordinates; and (iii) the response model, which represents the ratio between the number of detected photons for wavelength interval and the number of photons per wavelength interval entering the telescope aperture.
As the detectors are photon-counting devices, all the relations are expressed in terms of n p (λ), the spectral photon flux distribution (SPD). The SPD (hereafter expressed in units of photon m −2 s −1 nm −1 ) is related to the source SED via the equation where hc is the product of the Planck constant and the vacuum speed of light, and the SED f (λ) is expressed in units of W m −2 nm −1 (the factor 10 −8 compensates for per-nanometre flux units; all other measurements and constants are expressed in S.I. units).
The external calibration concept is based on two key assumptions: the first is that all differential effects that impact raw observations (spatial variations of the instrument across the focal plane, different observing configurations, time evolution of instrument characteristics, etc.) have been removed by the internal calibration chain ) that precedes the external calibration; the second is that the common reference system defined by the internal calibration is similar to the physical instrument in some unspecified point of the focal plane.
These assumptions underlie the design of the external calibration strategy: a) The external calibration model is unique for all sources and in particular does not depend on source luminosity or colour. The first assumption above, although plausible, cannot be a priori guaranteed and any infringement of it will manifest as systematic differences within different classes of sources. b) Also, the IM can be parametrised in terms of corrections to the nominal instrument, which is defined by our pre-launch knowledge and laboratory measurements made on the satellite components.
This second point helps to cope with degeneracies that are present in possible IM solutions: in principle the optimisation of the IM parameters could be derived from an arbitrarily large number and variety of standard stars (sources with known SED from independent ground-or space-based observations) by matching the predictions of the model with the corresponding observed BP/RP mean spectra. As discussed by Weiler et al. (2020), the traditional approach of deriving a simple response of the instrument as a function of wavelength -by computing the ratio between the observed spectrum and the SED for a limited set of (possibly featureless) calibrators-will not work for a Gaia-like instrument because of the rather large width of the LSF compared to the wavelength scale of the response variations. As a consequence, the derived response changes with the spectral type of the calibrator: the LSF must be taken into account, creating the need for a much larger set of calibrators.

Calibrators
A reliable calibrator must satisfy several stringent requirements: it must be an isolated and point-like source with stable flux and high signal-to-noise ratio (S/N), and it should not be subject to strong interstellar extinction in order to avoid polarisation that could cause variations in the measured flux with the observing geometry and so on (see Pancino et al. 2012;Pancino et al. 2021, and references therein).
The data set of spectrophotometric standard stars (SPSSs) 2 , expressly built over the years for the calibration of Gaia photometric and spectroscopic data, is composed of 111 stellar sources, calibrated to the CALSPEC 3 scale (Bohlin et al. 2014(Bohlin et al. , 2020 with flux accuracy of about 1% (Pancino et al. 2012;Altavilla et al. 2015;Pancino et al. 2021;Altavilla et al. 2021). The SPSSs have been monitored for short-term constancy at the 0.005 mag level over a few hours (Marinoni et al. 2016). The current SPSS release is based on about 25% of the spectra collected for the project; a more complete release, which will contain about 200 SPSSs, will be used to calibrate future Gaia releases. The SPSS data set was recently complemented ) by a second set of 60 stars with looser requirements on the absolute accuracy (up to 5%) and flux stability (up to variations of about 0.05 mag) but including stellar types not contained in the SPSS set (bright O, B and late M stars), the passband validation library (PVL). These were originally intended to be used for validation purposes only, but a subset of these were eventually included in some phases of the actual calibration of the IM. The consequence of the severe criteria applied to the selection of primary calibrators is that the resulting stellar spectra are not independent from a mathematical point of view: their principal components span only a subspace of all possible spectral shapes and consequently not all the necessary instru- ment components would be constrained by these, allowing for degeneracies in the solutions. Implementing the IM as a perturbation of the nominal model confers the advantage that unconstrained components have a reliable a priori estimation. However, to enforce more observational constraints to the IM, the set of primary calibrators has been extended by adding a set of secondary calibrators, including a wide variety of sources featuring strong emission lines over the entire wavelength range (mostly quasi-stellar objects (QSOs) and Wolf Rayet stars). Although these objects often show variability and require a special treatment in the processing (see Sect. 6), they are essential to provide strong constraints to the wavelength and LSF calibrations. For the Gaia DR3, a total of 211 peculiar sources have been selected from the literature, including 188 QSOs from the Sloan Digital Sky Survey (SDSS; see Lyke et al. 2020, and references therein), 17 young stellar objects (YSOs) from the X-Shooter spectral library (Verro et al. 2021), and six emission line sources (ELSs) from STELIB (Le Borgne et al. 2003). Most of these objects have SEDs only partially covering the Gaia wavelength range: this limitation had consequences on the processing strategy, as described in Sect. 6. Finally, we also used the catalogue of the Next Generation Spectral Library (NGSL, Heap & Lindler 2016) for validation purposes, which consists of 348 bright sources with magnitude ranging from G = 1.97 and G = 12.0. Figure 1 shows the whole pool of calibrators and validation sources in a colour-magnitude diagram either in terms of absolute (left) or apparent (right) G magnitudes as function of G BP − G RP colour.

Instrument model
The Gaia satellite observes the sky spinning around its axis (Gaia Collaboration et al. 2016a): the light collected by its telescopes is projected onto the focal plane where an array of chargecoupled device (CCD) detectors make measurements while operating in time-delay integration (TDI) mode. In the case of Gaia spectrophotometers, two slit-less prisms disperse the light onto two separate rows of seven CCDs each, both covering the focal plane in the across scan (AC) direction. The dispersion direction, referred as along scan (AL) being aligned with the transit direction of the projected light, is perpendicular to the AC direction. Each spectrophotometer covers part of the spectral wavelength interval, and the two ranges partially overlap: the blue photome-ter (BP) covers the nominal range [330,680] nm while the red photometer (RP) covers the range [640,1050] nm. For each observed source, to limit the telemetry of the satellite, only a small window around the position of the source is actually read and transmitted to Earth during each transit. Windows are 60 × 12 pixels wide in each of the AL and AC directions. The maximum exposure time of an observation (∼ 4.4 s) is fixed by the velocity by which the image transits on a CCD, but, to avoid saturation, it can be reduced according to the magnitude of the source by limiting the activation of the reading window to only a portion of the CCD with gates (De . Sources brighter than G 11.5 are transmitted as 2D windows (called window class 0, WC0) while for fainter sources, data are binned in the AC direction producing 1D spectra of 60 samples (WC1). Each source is observed several times during the lifetime of the mission under many different observing configurations and conditions (see Carrasco et al. 2021, for an exhaustive description). The internal calibration ) has the complex task of calibrating all these configurations to reduce spectra to a common reference system, the mean instrument.
If we assume that the dispersion of the prism is perfectly aligned with the AL direction, then we can model the dispersed image of a point-like source in the data space as where u is the continuous coordinate in data space in the AL direction, w is the continuous coordinate in data space in the AC direction, P τ is the telescope pupil area, n p (λ) is the SPD of the source, u d (λ) is the dispersion function, P λ (u, w) is the effective monochromatic point spread function (PSF) at wavelength λ, and R(λ) is the overall instrument response function. This relation assumes indirectly that non-linear effects, such as those produced by charge transfer inefficiency (CTI) effects 4 , are not important and can be neglected. As all internally calibrated spectra are binned to 1D windows, we can integrate the previous equation in the AC direction, obtaining the following model, which is suitable for describing a mean spectrum: where n e is given in units of e − s −1 and L λ (u) is the effective monochromatic LSF at wavelength λ obtained by integrating P λ in the AC direction. This is the explicit form of Eq. 1. A detailed description of each factor of the model is given in the following subsections.

Dispersion model
Airbus Defence and Space (DS), the company in charge of developing and building the Gaia satellite, provided nominal dispersion functions based on chief-ray analysis for the BP and RP prisms in units of millimetres as a function of wavelength, by fitting a sixth-degree polynomial to the unperturbed Gaia optical design. For each field of view (FoV), dispersion functions are 4 CTI, by delaying the release of the charge by a physical pixel as CCD charges scroll in the AL direction, would result in a deformation of the source spectrum that depends not only on the source SED but also on the scene of the observation, i.e. the temporal sequence of sources observed by that particular pixel immediately before the considered transit. provided for the centre of each CCD (the dispersion varies in the AC direction) in the form of the coefficients A i of the expansion where -AL(ω) denotes the AL image position in mm, -ω = 1/λ in nm −1 denotes the wavenumber, and ω re f = 1/440 nm −1 for BP and 1/800 nm −1 for RP, corresponding roughly to the central wavelength of each instrument.
The AL position in pixel units u is obtained by dividing Eq. 5 by the pixel size of a CCD in the AL direction P AL . Nominal dispersion curves for all FoV/CCD row combinations are shown in Fig. 2. The dispersion curve varies across the focal plane due to the tilt of the prisms with respect to the focal plane assembly: the comparison of the dispersion functions for different AC positions shows that the functions are related through a linear scaling to a high degree and are virtually independent from the FoV. We can therefore arbitrarily assume the coefficients for any of the CCD rows or FoV and model the generic dispersion function as: where u d (λ) denotes the AL image position in pixel units, coefficients A i are arbitrarily assumed as those of CCD row 4, and FoV 1 and d k are the IM parameters to be optimised in the calibration process. For Gaia DR3, we assume a number of parameters N u = 3. Lower order parameters can be interpreted as follows: parameter d 0 represents the zero point of the dispersion relation and by construction is the reference AL position u re f corresponding to the reference wavenumber ω re f ; and parameter d 1 is the scale of the dispersion relation. The nominal values are d 0 = 30, d 1 = 1.0, and d 2 = 0.0 for both BP and RP instruments.

The line spread function model
The LSF model is the only component of the IM that cannot be implemented as a simple perturbation of a nominal model, as in the cases of the dispersion and response models. Numerical monochromatic LSFs were provided by Airbus DS for testing purposes for each combination of FoV and CCD row; Fig. 3 shows two sets of these LSFs computed at wavelength λ = 440 nm for the BP instrument (left) and λ = 800 nm for the RP instrument (right). These models were built upon the optical PSF model of the telescope, which included optical aberrations based on laboratory measurements of the wavefront error (WFE) maps made on the telescope mirrors. The great variations in the shapes of the LSF shown in the figure are essentially due to variations in the WFE map from one FoV/CCD pair to the other. The problem is that these WFE maps are not applicable to the flying instrument because several factors (changes in physical conditions, mechanical stress of the launch, defocusing of the instrument, etc.) lead to them changing in an unpredictable way. The strategy adopted for the current model implementation, which is explained in more detail in Appendix C, is to create a large sample of theoretical PSFs based on the optical design of Gaia, including randomly generated realistic WFE maps: these optical PSFs are then converted to effective PSFs that include a number of effects (charge diffusion, smearing introduced by TDI, pixel integration) to account for the discretised nature of the data. Effective PSFs are then marginalised in the AC direction to obtain a set of numerical LSFs sampled on a twodimensional grid in spatial and wavelength coordinates. These numerical LSFs are then used to build a set of two-dimensional basis functions to allow the LSF to be modelled with a minimum number of free parameters for a given accuracy. The reduction to the 2D basis functions is achieved by means of generalised principal component analysis (GPCA, Ye et al. 2004), a fast and efficient algorithm for 2D image compression used to concentrate relevant information of a given data set in a small number of dimensions. Unlike the usual principal component analysis, GPCA is able to preserve the spatial locality of pixels in an image by projecting the images to a vector space that is the tensor product of two lower dimensional vector spaces. These two vector spaces are designed by matrices U and W whose columns represent the basis functions with which the dependencies are modelled along the spatial coordinate u and the wavelength coordinate λ, respectively. Figure 4 shows the first four bases for vector space U (left) and W (right). The U and W bases are interpolated to continuous variables (u, λ) by 1D interpolation. To ensure that the interpolation for the U bases satisfies the 'shift invariant sum' condition, which preserves the underlying function normalisation independently from the subpixel position of the sampling grid, these bases were then fitted with an S-spline model (Lindegren 2009). The interpolation for W bases is achieved by a cubic spline. The model for the LSF is finally given by where L(u, λ) is the numerical mean LSF of the theoretical set and d m,n are the IM parameters that are fitted during the external calibration processing. As explained in Appendix C, the LSF wavelength modelling requires roughly half the dimensions needed for AL dependency modelling, hence 1 2 2 . Reiterating the caveat expressed at the beginning of this section regarding the absence of a proper nominal LSF model, when all parameters d m,n are set to zero the LSF coincides with the mean numerical model L(u, λ).
Equation 7 represents an undispersed LSF centred on the origin of the U bases. The dispersed LSF in the data space can be obtained by shifting the origin by an amount given by the dispersion relation as seen in Eq. 4: L (u − u d (λ), λ). However, it is worth noting that the LSF origin does not necessarily coincide with its centroid: the centroid is an intrinsic property of the LSF, and in the case of a symmetric LSF is naturally given by the point of symmetry ξ, that is L(ξ − u) = L(ξ + u) ∀u, but in general the LSF is not symmetric and its shape can change with wavelength, introducing some degeneracy between the chromaticity and the dispersion. As sketched in Fig. 5, the dispersion model of Eq. 6 provides the location of the LSF origin: as far as the instrument model is concerned, the degeneracy between dispersion and chromaticity has little significance because the dispersion model and the LSF origin are consistently defined; however, physical interpretation of the data requires a dispersion relation that gives the centroid of the monochromatic LSF as a function of wavelength. This dispersion relation is provided as a lookup table where the centroid u 0 is computed at each wavelength λ by solving the non-linear equation: Article number, page 5 of 33 A&A proofs: manuscript no. 43880final where the weighting function w is the Tukey's bi-weight: and the scale parameter s = 2.7 is a value suitable for the Gaia case (Lindegren 2006). This dispersion function is provided for both BP and RP instruments as a single CSV file tabulated for wavelengths ranging from 320 nm to 1100 nm in steps of 0.5 nm 5 .

Response model
The response R(λ) defined in Sect. 2, as the ratio between the number of detected photons and the number of photons entering the telescope aperture per wavelength interval, is modelled as the product of the individual responses of each physical element (e.g. primary mirror, secondary mirrors) hit along the optical path. It changes across the focal plane, and depends on the observing configuration of each source and transit (gates, window class) and on time (contamination and decontamination issues). Assuming that all these dependencies have been accounted for by the internal calibration, we are left with a function of the wavelength alone. The nominal pre-launch response curve for the mean BP and RP instruments can be described (Jordi et al. 2006) as the product of the following elements: where -T 0 (λ) is the telescope (mirrors) reflectivity; -ρ att (λ) is the attenuation due to rugosity (small-scale variations in smoothness of the surface) and molecular contamination of the mirrors; -Q(λ) is the typical CCD quantum efficiency curve; -T p (λ) is the prism (fused silica) transmittance curve including the filter coating. These quantities were initially measured by Airbus DS during on-ground laboratory test campaigns and are plotted in Fig. 6. As can be seen, the steepest features of these curves are the BP and RP cut-offs produced by the prism transmittance curves, and the steep BP drop around 400 nm which is mainly due to mirror reflectivity. Laboratory measurements showed that the precise location, that is, in wavelength, of the cut-off varies across the focal plane due to the uneven thickness of the prism coating (which is a few nm in both instruments). Moreover, combining measurements taken all over the focal plane results in a further smearing of the nominal curve. Therefore, a suitable modelling of the actual response cut-offs has been achieved by assuming the nominal curves, degrading the wavelength resolution by convolution with a rectangular window of width varying with the spectral dispersion per pixel, and re-shaping the cut-off mathematically with a two-parameter Gauss error function for RP and a two-parameter complementary error function for BP to control the wavelength position λ C and the slope σ C of these features (the tabulated transmittance curve is in practice truncated just before the cut-off and multiplied by the error function to mimic the cut-off shape). Nominal values for (λ C , σ C ) are (667.9, 4.71) and (631.0, 4.0) for BP and RP, respectively. The corresponding curves are represented in Fig. 6 as blue (BP) and red (RP) thick lines. It is well known that the actual on-board overall response was heavily affected by rapid and discontinuous variations due to water vapour contamination of the satellite instrument components (Gaia Collaboration et al. 2016b) and to the various decontamination campaigns. The internal calibration initialises the internal reference system using only high-quality data collected in periods of low and slowly varying contamination , which ensures that the mean instrument response is not too different from the nominal R * N (λ; λ C , σ C ). Nevertheless, to be able to model any deviation from the nominal curve, the response is multiplied by a parametric function: Parameters r i are referred to as response shape parameters to distinguish them from the two response cut-off parameters λ C and σ C ; their nominal values are all zeros. The distortion model R d is implemented as the exponential of a linear combination of a set of basis functions R in the AL sampling space u: which is transformed to wavelength space λ through the dispersion relation: The exponential form guarantees the non-negativity of the overall model, while modelling in the sample space ensures the natural instrument spectral resolution, avoiding over-fitting where the spectral resolution is lower. The basis functions R i used for the Gaia DR3 models are spline functions of second order with an initial uniform knot spacing in u that becomes non-uniform in later processing stages (see Sect. 6 for details). The full set of IM response parameters to be optimised is therefore (r i , λ C , σ C ): when they are set to their nominal values, the response model closely resembles the nominal model.

Basis inversion and SED reconstruction
Once the instrument model is defined, Eq. 4 can be used to estimate the spectral photon flux distribution n p (λ) corresponding to an observed spectrum n e (u). Obtaining the SPD/SED allows the user to inspect Gaia spectra in a format that is more intuitive and of common usage. It may be convenient to define an effective spectral photon distribution as: so that The effective spectrum is the observed spectrum deconvolved by the LSF function and transformed to the wavelength space through the dispersion relation (and scaled by some factor). Therefore, its shape will preserve the basic features of the observed spectrum. However, Eq. 15 is a Fredholm integral equation of the first kind, which is difficult to solve for the unknown n * p (λ) because such integral equations are often ill-posed problems: large variations in the solution n * p (λ) can occur for a slightly perturbed observable n e (u) (as is the case here, as n e is affected by noise). However, as BP/RP mean spectra are modelled as a linear combination of basis functions (Carrasco et al. 2021): an interesting solution can be found by modelling the effective spectral photon distribution as a linear combination of the same spectral coefficients b n with a particular set of bases: where the φ n bases satisfy the following condition: In practice, the externally calibrated spectral photon distribution (and related SED) corresponding to each pair of observed BP and RP spectra can be reconstructed by finding a set of proper functions whose images through the dispersed LSF model are the bases of the internal representation: the great advantage of this approach with respect to solving Eq. 15 directly is that it requires inverting the integral equation for a set of analytic functions that are by definition noise-free. We refer to these functions φ n as the inverse bases hereafter.
To choose the most suitable representation for the inverse bases, it is useful to review the representation used for internally calibrated mean spectra described in Carrasco et al. (2021) andDe Angeli, F. et al. (2022) and summarised here for convenience. The basis functions implemented for Gaia DR3 are orthonormal Gauss-Hermite functions ϕ n (θ) where a linear transformation is set between the pseudo-wavelengths axis u and the argument of the Hermite functions θ as: The number of bases for Gaia DR3 has been set to N = 55 for both BP and RP spectra. An optimisation post-process is applied to mean spectra basis functions to concentrate most of the information in the lower order spectral coefficients: this optimisation takes the form of a rotation of the bases specified by a square matrix V C . This rotation has no consequence for the basis inversion algorithm described here because, whenever Eq. 18 is satisfied, the inverse bases for the optimised bases are simply obtained by applying the same rotation matrix to the inverse basis set φ n . The model chosen to represent each φ n function is a linear combination of the same bases that model their image through the instrument model, that is, a linear combination of Hermite functions: with the same mapping between axes u and θ as adopted for the bases used to represent internally calibrated spectra: The reason for choosing such a model is that, as in the case of the effective photon distribution, left and right bases should share the same basic features, as the function on the left hand side of the equation is a smeared version of that on the right hand side once mapped to the same axis u. We could solve Eq. 21 for coefficients h k,n in a least-squares sense by sampling ϕ n on a sufficiently dense and extended grid on the θ axis, but in this case the optimal number of bases K of the model would be undefined and it would not be clear whether or not a limit to this number were set by some hidden condition. A more appealing possibility is to project Eq. 21 into the coefficient space b n of mean spectra, where the n th function ϕ n is represented by definition by a vector of coefficients that are all null except the n th one equal to unity. In matrix notation, let b denote the array of coefficients, s a mean spectrum sampled on a given grid u of U points, and D ∈ R U×N the design matrix whose element D nu is the value of the n th Hermite function evaluated at the u th pixel grid point. Consequently, where D † ∈ R N×U is the pseudo-inverse of the design matrix (see Appendix D for details). The integrals of Eq. 21 can be computed numerically by trapezoidal integration over a fine regular wavelength grid with Λ points and step δλ extending over the wavelength interval where the response is not null: let L ∈ R U×Λ represent the instrument dispersed LSF model kernel sampled over that discrete grid, with the (i, j) th element being We can write an equation like Eq. 21 for each of the left N Hermite basis functions: if the left term is interpreted as a column of the design matrix D, then all the N relations can be condensed into one single equation: where D ϕ ∈ R Λ×K is the design matrix for the right ϕ k bases sampled on the wavelength integration grid, while matrix H ∈ R K×N contains in its columns the set of coefficients h k,n that define the shape of each inverse basis. By multiplying Eq. 25 by D † from the left we finally obtain: where the left hand term is the (N ×N) identity matrix. By setting it is evident that the problem can be solved if B is a square matrix with K = N, that is, the number of inverse bases is equal to the number of bases for mean spectra representation. In this case, the matrix H that defines the basis functions for the externally calibrated spectra is simply given by: The model for inverse bases consists of a matrix of coefficients H for each of the BP and RP instruments. Let λ be the wavelength grid over which we sample the externally calibrated spectrum corresponding to a pair of BP and RP observed spectra and Λ be the dimension of vector λ. We can build a design matrix D ϕ ∈ R Λ×N by sampling the N Hermite functions on the grid: If V C is the orthogonal rotation matrix that defines the optimisation of the basis functions for the internally calibrated mean spectra, we obtain that the sampled effective spectral photon distribution is defined as If we build two more design matrices D P ϕ and D E ϕ whose elements are respectively defined as and where R(λ) is the instrument response and hc the product of the Planck constant and the vacuum speed of light, then we get which is the SPD in units of photons s −1 m −2 nm −1 , and which is the SED in units of Wm −2 nm −1 .
As the design matrices of Eqs. 31-32 depend on the inverse of the instrument response function, the sampling wavelength grid must be limited to the range [330, 650] nm for BP and [635, 1050] nm for RP in order to avoid large errors in the reconstructed spectra.
The BP and RP instruments produce two partially overlapping SPDs and SEDs: these are combined into a single distribution by computing a weighted mean in the overlapping region [λ lo , λ hi ], where the weight varies linearly with wavelength: and for λ lo < λ < λ hi . BP and RP spectra coefficients are accompanied by the covariance matrix K bb : this is used to calculate the covariance matrix for the sampled spectrum, which in the case of the SED is computed as The square roots of the diagonal elements of K ff are the errors associated with the sampled SED.

Processing
The instrument model has been designed to reproduce the nominal instrument model when all parameters are set to their nominal values: these parameters are d i for the dispersion (Eq. 6), d m,n for the LSF (Eq. 7), and r i , λ C , σ C for the response model (Eqs. 11-12). The concept of nominal refers to the dispersion function and to the overall response curve, for which pre-launch laboratory measurements are available. The LSF model instead is initialised as the mean of a large number of theoretical LSF models: for this reason it corresponds to a flat WFE map and is symmetric. Model parameter optimisation can be fulfilled with a sufficient number of calibrators by minimising in a least squares sense a χ 2 -based cost function where the array of residuals r and the weight matrix W can be evaluated in two different spaces: 1. Sample space: the observed BP/RP mean spectrum is sampled according to Eq. 22: where I u,λ represents the instrument matrix sampled on the wavelength array of the source SPD and on the same AL P. Montegriffo et al.: Gaia Data Release 3: External calibration of BP/RP low-resolution spectroscopic data grid as the mean spectrum; (I u,λ n p ) represents the discretised version of Eq. 1, and the weight matrix is computed as where K bb is the covariance matrix of the spectra coefficients, the product DK bb D T is the pixel covariance matrix of the sampled BP/RP spectrum, and K pp represents the covariance matrix of the source SPD. 2. Coefficient space: the model prediction is projected in the coefficient space according to Eq. 23 and the weight matrix is the inverse of the sum between coefficients and projected SPD covariances: In all scenarios, we include the usage of the covariance matrices for both the observed BP/RP spectra and the calibrator SPDs to allow for a total least-squares regression analysis of the data (Huffel & Vandewalle 1991). Although the SPD covariance matrix would be extremely important to properly account for systematic effects of wavelength calibration errors, especially in regions where the SPD changes steeply with the wavelength for certain spectral types, only errors on fluxes are usually available in the literature. Therefore, only BP/RP covariances are full matrices while SPD covariances are simply diagonal matrices populated with the corresponding variances on the sampled flux. Moreover, total regression is generally highly demanding in terms of computational cost because it requires an evaluation and inversion of the covariance matrix at each step of the solver, and has therefore only effectively been taken into account in the final stages of the calibration, as is clarified below.
When optimisation is carried out in sample space, the AL grid used to compare sampled spectra is usually oversampled by some factor with respect to the Gaia pixel, and therefore the coefficients covariance matrix transformed to sample space (D · K bb · D T ) (hereafter samples covariance) does not have full rank and cannot generally be inverted. A first obvious solution is to take into account only the diagonal elements of the samples covariance, even if the off-diagonal elements are not negligible because mean spectra are continuous functions and hence random noise will manifest as random wiggles in the form of longrange correlations between pixels. A formal solution for a correct weighting scheme in sample space is to compute the weight matrix as: where the pseudo-inverse of the samples covariance matrix D † has been defined in Eq. D.4. This approach was tested but some occasional numerical instabilities discouraged us from using it for the present calibrations. Cost evaluation in coefficient space would solve the problem of the invertibility of the matrix, allowing for a full exploitation of spectra covariances. However, this approach could not be followed for the current release because of the incomplete wavelength coverage of many emission line calibrators used in the optimisation process (Eq. 23 implicitly requires the sampled spectrum to extend over the entire AL range). For calibrators with incomplete wavelength coverage, χ 2 computation was limited to the available section of the model spectrum, excluding a safety margin of a couple of pixels where truncation occurs because the redistribution of light produced by LSF will cause some systematic difference between the partial spectrum and the one that would have been obtained by a complete SPD. For this reason, we decided to carry out the cost evaluation in sample space. Provided that the cost function is not linear with respect to the model parameters, the optimisation process can be carried out using an implementation of the differential evolution algorithm (DEA) as described by Storn & Price (1997): although this class of algorithms has been shown to achieve global optimisation with a natural ability to escape local minima traps in the χ 2 space, we find it convenient to proceed with the bootstrapping of the model by limiting the number of free parameters at the first stages of the processing, and gradually enhancing the model complexity (i.e. increasing the number of parameters) only when convergence is progressively achieved. This led to a complex processing scheme that is sketched in Fig. 7.
Calibrators are divided into two groups, absolute flux or primary calibrators (labelled SPSS) and secondary calibrators, sources with emission line features (labelled EL calibrators). As explained in Sect. 3, the latter group contains potentially variable sources. To overcome this problem, an update process equivalent to a grey flux calibration is implemented for these calibrators: the input SPD of each EL source is scaled by a parameter that is evaluated at each calibration cycle to minimise the squared residuals between the current model prediction and the corresponding observed mean spectrum. In the first stage of the processing, the response model is initialised with a low number of shape parameters (see Eq. 11, 8 parameters for BP, 5 for RP), while cut-off parameters are set to their nominal values, the dispersion model degree is set to 1, and the LSF model is symmetric. This stage is designed to provide a reliable initialisation of the dispersion relation and the response shape: the cost evaluation is limited to the central region of the spectra, avoiding the wings and the cut-off regions; the optimisation of the dispersion parameters, based on the EL calibrators, is alternated with the optimisation of the response shape, which is achieved by using a selection of featureless SPSSs. Each optimisation cycle typically consists of approximately 3000-4000 DEA iterations involving about 50 walkers (different realisations of model parameters, initially distributed randomly around the starting set of parameters). The iterations stop when the individual costs from all the walkers converge to a common value. The set of parameters with the lowest cost is used to initialise the subsequent optimisation cycle. Once convergence is reached, the second stage begins, which entails modelling the shape of the central part of the LSF: the LSF model is initialised with ( 1 , 2 ) = (4, 1) bases, and LSF and dispersion parameters are optimised together to model any possible asymmetry of the LSF core. A second upgrade to the LSF model is made in stage 3 where the number of bases is increased to ( 1 , 2 ) = (4, 2) to allow modelling of any chromaticity effect: an LSF and dispersion parameter fit is alternated with response model adjustment and EL source update. In stage 4, the LSF and dispersion models are set to their final configurations, the number of response shape parameters are doubled while the cut-off parameters are left free to change; all model parameters are optimised together using an extended set of calibrators (EL + SPSS). The number of bases is set to ( 1 , 2 ) = (7, 3) for BP and is left unchanged for RP; in both instruments the dispersion model degree is set to 2. In stage 5, the basis inversion process is performed to allow the reconstruction of the effective SPD for the SPSS: these are divided by the corresponding source SPDs to obtain the data shown in Fig. 8. This plot was obtained using a subset of 41 SPSSs selected to be as featureless as possible and, given the definition of effective spectra in Eq. 14, it traces the overall instrument response curves. The top plot shows data plotted as a function of wavelength. The bottom plot shows the residuals between data and model represented as function of the AL coordinate (or pseudo-wavelength); given that the dispersion direction of the BP instrument is inverted with respect to that of the RP instrument, BP data have been mirrored horizontally Fig. 8. BP and RP response curves traced by the ratio between effective SPD computed from BP and RP spectra and source SPDs from groundbased observations. Top: Data are plotted against wavelength; black triangles mark the position of Balmer lines. Bottom: Residuals between data and the response models are plotted against pseudo-wavelengths. BP pseudo-wavelengths have been swapped left-to-right, while RP data are shifted by 60 samples. Blue-and red-filled triangles mark the position of BP and RP cut-of,f respectively, while blue and red open triangles show where the response drops to zero. in order to sort the displayed data according to wavelength. RP data are offset by 60 samples in pseudo-wavelength to avoid superposition with BP. It is possible to recognise the signs left in the data by the first Balmer absorption lines (H α through H ) as peaks highlighted in the plots. Interestingly, a wavy regular pattern is visible at all wavelengths with a nearly constant frequency in the pseudo-wavelength space. The origin of this pattern is not fully understood; it could be related to wiggles in the mean spectra . Using this data, we upgraded the instrument response distortion model by increasing the number of spline knots to model these wiggles, especially in the range [500, 800] nm, carefully excluding the signature left by the H α line (the response model must not be a function of any source astrophysical parameter). The total number of response parameters is 26 for BP and 23 for RP. The model curves represented in the top plot were computed after this upgrade. The last stage of the processing is dedicated to the creation of the ensemble of instrument models: the DEA solver is run on the last instrument models until parameter relaxation is achieved (typically after about 400 iterations), and then all the 50 walkers are saved into the database providing the 50 instrument models. The walker with the lowest chi-square is chosen to represent the instrument model, which enables forward modelling to simulate mean BP and RP spectra or inverse modelling to provide SEDs through the inverse basis representation. The ensemble instead is used to derive the uncertainties in the simulated spectra. The instrument model ensemble and the inverse bases are used in the GaiaXPy tool  to simulate mean spectra and to generate sampled Gaia BP and RP calibrated spectra in the absolute system.

Results
The final BP and RP instrument models were used to create Fig. 9 where the corresponding instrument matrices I u,λ are represented: this plot shows at a glance how BP and RP mean spectra are simulated for a given SPD. The instrument matrix can be used to express Eq. 1 in a discretised form as already done in Eq. 39: a mean spectrum is the row-by-column product of the instrument matrix with the SPD (sampled on the same wavelength grid of I u,λ ). Provided that: (i) columns and rows represent small intervals of wavelengths and AL sample coordinates respectively, (ii) each column represents a monochromatic dispersed LSF scaled by the response (the response curve is overplotted in white) at that wavelength and the telescope pupil area, and (iii) the loci of LSF maxima, highlighted by the dashed green lines, correspond to the dispersion functions, then we can visualise the mean spectrum formation by splitting the incoming SPD into packets of photons according to their wavelength. These are then distributed following the corresponding matrix column profile, and the final mean spectrum is given by the accumulation on the u axis of each dispersed packet. One of the elements that catches the eye is that the dispersion direction is  opposite for the two instruments, and hence the wavelength decreases from left to right in the BP spectra plots. Furthermore, in Fig. 9, the RP instrument exhibits a pattern of ripples that run roughly parallel to the dispersion relation. Intensity and distance from the LSF crest grow with wavelength: these ripples are the signature left by diffraction patterns of theoretical LSF in the U bases of the LSF model (Montegriffo 2017); the amplitude of the diffraction pattern scales linearly with λ and this is the reason why it is virtually invisible below 700 nm. Figure  10 shows the dispersed monochromatic LSF in logarithmic scale for wavelengths λ = (400, 500, 600) nm for BP (top panel) and λ = (700, 800, 900) nm for RP (bottom panel). The structure of the RP ripples is clearly visible in the bottom plot. On the other hand, we can figure out which wavelength ranges of the SPD contribute more to the photon budget of a particular sample of the mean spectrum by looking at the corresponding matrix row: this contribution for samples at u = (20, 30, 40) is visualised in Fig. 11 for both instruments as the percentage of the incoming photons that are detected in each data sample. The width of the distribution sets the level of smearing of the spectrum at that wavelength. As can be seen, there are sections of mean spectra (as the case shown for BP at u = 20) that receive photons from a very large wavelength range: this explains why it is so important to also resolve for the LSF while calibrating mean spectra. Figure 12 shows the first few inverse bases φ P n for the BP and RP instruments (in units of photons s −1 m −2 nm −1 ) as a function of wavelength. The top panels show the inverse bases of the canonical Hermite functions while the bottom panels represent the inverse bases for the optimised basis functions. The amplification of the bases below λ 400 nm and above λ 900 nm is due to the normalisation of the bases by the response function R(λ) (see Eq. 31). For the same reason, it is pointless to look at the bases outside the represented wavelength range because they diverge as the response goes to zero.  It is interesting to compare the image through the instrument model of the inverse bases ϕ † n (hereafter referred to as the reconstructed bases),

Inverse basis functions
against the original bases ϕ n . This provides a means to evaluate the accuracy of the inverse representation: in principle the reconstructed bases should be equal to the original ones. This comparison is made in the top panels of Fig. 13 for the canonical Hermite functions and in Fig. 14 In the case of non-optimised basis functions, the residuals for BP gradually increase at shorter wavelengths (from left to right) for higher order terms, becoming as high as 1%; in the RP case the residuals are much smaller, about 0.1%, over the entire wavelength range. In the case of optimised BP functions, the situation seems to worsen with residuals of ∼ 2% over the entire pseudowavelength range, while for RP the quality of the comparison remains good. To understand this behaviour, it is convenient to quantify the quality of a reconstructed basis as a function of the order of the basis itself. This is done in Fig. 15 where, for each basis, we compute the quantity which is used as a proxy for the percentage error on the pseudowavelength integral of the basis function. The reader must be aware that this is only an indirect way of assessing the accuracy of each inverse basis: we are evaluating the accuracy of external bases by forward modelling in the sampled spectra space where they are never used, and use them only in the inverse modelling of ECS. The plot in the top panel shows the reconstruction error for the non-optimised case, and one can see that the quality of the reconstruction is very high for both instruments (with numerical precision for RP outperforming the BP case by a factor of ten) for approximately the first 47 lower order bases, and worsens very quickly for the higher order terms. We understand this behaviour as an intrinsic limit of the current inverse basis model implementation (set in Eq. 20 with K = 55): an inverse basis represents by definition a given Hermite function deconvolved by the LSF function, hence its model should have the ability to reproduce features with higher spatial frequencies with respect to its smeared representation. If we consider the φ 54 case, despite the fact that it should have higher spatial frequencies than the ϕ 54 Hermite function, we are effectively modelling it as a linear combination of the ϕ 0 · · · ϕ 54 Hermite functions and therefore we are possibly missing the higher frequencies. This problem will be addressed in future releases; for now we note that this inadequacy of the model will mainly affect the accuracy in reproducing features with high spatial frequencies, such as narrow emission lines (introducing some systematic errors into the shape of the ECS), but it will not have consequences for the vast majority of the sources. As the optimisation process consists in a rotation of the bases, this determines a redistribution of the error budget of the higher terms to lower order components (and this explains the behaviour of BP residuals in Fig. 14): however, thanks to the efficiency of the optimised bases in concentrating most of the information in the lower order coefficients, this will have no effect on the overall quality of the ECS reconstruction. In Sect. 8.3, we perform a test to evaluate the impact of this reconstruction error on real data.

Spectral resolution
The spectral resolution of a Gaia spectrum changes considerably across the wavelength range. This is even more evident in ECS where the resolution changes abruptly near the boundary between the BP and RP wavelength ranges. The full width at half maximum (FWHM) of the instrument LSF model provides a direct measurement of the spectral resolution of mean BP and RP spectra. However, the resolution of ECS should be increased by the deconvolution from the LSF smearing effect obtained by the bases inversion process. To evaluate the spectral resolution of ECS, we simulate the response of the instrument model to a monochromatic signal at a well-defined wavelength as follows: 1) we produce a synthetic spectrum with a flat null continuum and a single emission line of zero intrinsic width (the Dirac delta function) centred at a given wavelength; 2) we forward model the corresponding mean spectrum into the coefficients space; 3) we reconstruct the corresponding ECS; 4) we measure the FWHM of the line: this is the measurement of the 'instrumental width' at the considered wavelength, that is the size of the spectral resolution element; 5) and then we repeat with a synthetic spectrum whose emission line is displaced by 10 nm until the whole BP and RP spectral range is covered.
As we use the same instrument model in the forward and reverse directions, we are testing the theoretical resolution that would be obtained given 'perfect' knowledge of the instrument model. The presence of undetected systematic differences between the instrument model and the actual instrument will possibly worsen the resolution of the current ECS with respect to this ideal case. Furthermore, the total regression results in Gaussian noise on the elements of the instrument matrix. In the inversion of the noisy instrument matrix, this well-behaved Gaussian random noise can become non-trivial noise and even create systematic errors due to the non-linearity of inversion. Figure 16 shows the simulated ECS for a subset of the Dirac delta spectra. The reconstructed spectra show wiggles due to the inversion process. These will create artefacts in real ECS around small-scale features such as emission lines. As can be noticed, these wiggles tend to become very large at wavelengths greater than 1000 nm due to the amplification due to the response normalisation (see Fig. 16. Response of the instrument model to a simulated monochromatic stimulus at different wavelengths. Fig. 17. Full width at half maximum measured in nm as a function of wavelength for a monochromatic signal in BP and RP mean spectra (blue crosses) and in the externally calibrated spectra (yellow triangles) compared to the width of one CCD pixel (grey squares) and the width of two pixels to represent the Nyquist sampling limit (dashed grey line).
Eqs. 31-32). We show that the presence of such large fluctuations is common in ECS at wavelengths λ 1030 nm.
The FWHM for internally calibrated BP/RP mean spectra (hereafter ICS) and for ECS are represented in Fig. 17 as a function of wavelength together with the CCD pixel scale that should be regarded as an upper limit to the resolution achievable by the basis inversion process (it is not possible to reconstruct a signal smaller than the pixel integration scale with the current approach). The plot also reproduces the scale length corresponding to 2 pixels, which is the scale on which Nyquist sampling can take place: interestingly, the BP instrument suffers from a small under-sampling at all wavelengths (the FWHM being equal to ∼ 1.7 pixel) and therefore does not satisfy the Nyquist criterion. This fact could play a relevant role in causing the rather high number of wiggles seen in BP mean spectra compared to the RP ones (see De ). Values for the FWHM of a monochromatic signal measured on ICS, ECS, and the corresponding spectral resolution are provided in Table 1 as a function of wavelength.
The spectral resolution is finally shown in Fig. 18: the RP instrument exhibits a more uniform variation with respect to wave-P. Montegriffo et al.: Gaia Data Release 3: External calibration of BP/RP low-resolution spectroscopic data Table 1. Full width at half maximum of a monochromatic signal measured on ICS, ECS, and spectral resolution for ECS. length compared to the BP case. The anomalous drop in RP resolution just below 650 nm in the ECS curve as well as small fluctuations visible in the BP curve around 600 nm are possibly due to ripples in the response profiles affecting the basis inversion process. The final resolution of a typical Gaia ECS drops from R λ = 70 to R λ ∼ 22 in the wavelength range λ ∈ [330, 640] nm, and then suddenly rises to R λ ∼ 78 before decreasing smoothly to R λ ∼ 55 at longer wavelengths.

Wavelength calibration accuracy
To assess the accuracy of the wavelength calibration, we compared the wavelength position of emission lines measured on the BP and RP mean spectra of a selected sample of QSO extracted from our calibrator set with the expected positions measured on simulated spectra computed with the instrument model and the corresponding source SEDs known from ground-based observations. We created a starting list of sources by selecting a sample of 102 SDSS QSOs showing strong emission lines in their BP and RP spectra and compiling a list of estimated wavelength positions for 263 lines based on the QSO SDSS redshift values and the wavelength at rest of each line. The differences between the AL position corresponding to the peak of each line in the mean spectra and in the simulated spectra are shown in the upper panel Fig. 19. Wavelength calibration accuracy evaluated on a sample of QSOs with emission lines in their spectra. Top panel: Difference in AL sample position of emission line peaks measured on BP and RP mean spectra and the expected position measured on simulated mean spectra (filled symbols). Open symbols refer to SEDs with incomplete wavelength coverage for which the expected position has been estimated based on the line wavelength at rest and the QSO redshift value. The vertical grey line separates BP from RP data. The black lines delimit the region within ±10% FWHM of the LSF for mean spectra. Bottom panel: Observed and expected emission line wavelengths as measured on the ECS. Dashed grey lines represents ±10% of the expected LSF FWHM for calibrated spectra, and black lines delimit the region within ±10% FWHM of the LSF for mean spectra.
of Fig. 19. Open symbols refer to spectra where the SDSS SED does not cover the full BP/RP wavelength range, and therefore we do not have a reliable simulation of the expected ICS: in these cases, the expected AL position is based on the initial estimate of the line wavelength. The red curve represents a smoothed median of the data while the black lines delimit the region within ±10% of the LSF FWHM. As can be seen, at wavelengths below 400 nm there is an indication of a lower accuracy of the disper-Article number, page 15 of 33 A&A proofs: manuscript no. 43880final Fig. 20. Examples of emission lines in mean RP spectra. Red symbols are observed spectra, the grey line is the model prediction, and the yellow curve is the source high-resolution SPD from SDSS (in arbitrary units). The asymmetries in the line profile are not accurately reproduced by the model predictions.
sion relation due to the lack of a reliable number of QSOs with emission lines in this range employed in the instrument calibration process. We hope to resolve this issue in future releases. The other interesting element in this plot is the presence of a small systematic offset in the RP spectra (roughly equal to one-tenth of a pixel), possibly due to systematic error in the LSF model, which is unable to correctly reproduce the instrument chromaticity as can be seen in Fig. 20 where three examples are shown (the LSF core is more asymmetric in RP than in BP). As the resolution of ECS is higher than that of ICS, we also checked the accuracy of wavelength calibration for the ECS case: starting from the estimated initial wavelength positions, we searched for local maxima in the ECS and in the corresponding simulated ECS (which is computed as described at beginning of Sect. 8). The wavelength positions of the peaks are then refined by computing the centroid of the lines accordingly to Eq. 8. The residuals between observed and expected wavelength positions are plotted in the lower panel of Fig. 19. Most of the points are enclosed between ±10% of the expected FWHM for ECS, represented by the dashed lines (the dark lines represent ±10% FWHM for ICS). A systematic difference of about 1 nm is confirmed in the RP wavelength range. Finally, we notice a lower precision in BP calibration that may be a consequence of LSF under-sampling seen in Sect. 7.2. We note that the number of measured lines in ECS is greater than that in ICS because several lines in ICS were lost because of the intrinsic shape of ICS (typically faint lines falling in regions with large variations of the instrument response).

Validation
Validation of the instrument models and externally calibrated spectra is mainly based on comparisons between observations (i.e. Gaia mean BP/RP spectra) and model predictions for sources with known SED. A test of this kind was also carried out by Andrae, R. et al. (2022) where observed spectra of known solar twins were compared both to model spectra of the Sun and to IM predictions based on X-Shooter SEDs (Verro et al. 2021) of the same sources, showing remarkably good agreement. Hereafter, we use the term observation to refer to Gaia spectra and simulation to refer to external data simulated via the instrument model. The comparison data always rely on external data sets. In the following sections, we use sources from the SPSS, PVL, and NGSL sets for the comparisons. Sources with magnitude G < 4 were filtered out from the NGSL sample to avoid satu- Fig. 21. Comparison between mean and simulated spectra for source Gaia DR3 1435896975388228224 (top) and source Gaia DR3 4339417394313320192 (bottom), both belonging to the PVL dataset. Blue and red points represent the Gaia BP and RP mean spectra, while the grey curves represent the model predictions for the corresponding instrument. The superimposed yellow curves represent the source SPD arbitrarily rescaled in flux. We note that the wavelength in BP decreases from left to right. ration in BP/RP spectra (see Sect. 9.1). Unless we specify otherwise, plots combine data from all three data sets. The comparison can be made in terms of sampled internally calibrated mean BP/RP spectra or in terms of externally calibrated spectra. In the first case, the observations and predictions are sampled on a common AL grid. However, results are often shown as a function of the more familiar absolute wavelength, with 640 nm being used as the boundary between BP and RP. Here, ECS are represented as SEDs: in this case, the expected distribution of a source is not the SED at its original high resolution, but rather its spectral resolution is degraded at the resolution equivalent to one Gaia pixel with the following procedure: first we compute a degraded SED f † X (λ) for each BP and RP instrument: where f (λ) is the original SED and the integration limits are computed as: where u(λ) is the BP/RP dispersion function and u −1 the inverse dispersion function. The two degraded SEDs f † BP and f † RP are then combined into a unique SED f † (λ) following the same rule described in Sect. ?? for the combination of BP/RP ECS. Finally, a further validation comes from computing synthetic photometry on ECS and comparing the resulting magnitudes to photometric standards, as described in Sect. 8.4. P. Montegriffo et al.: Gaia Data Release 3: External calibration of BP/RP low-resolution spectroscopic data Fig. 22. Difference between observed and expected mean flux as a function of wavelength for BP and RP sampled mean spectra for the whole SPSS, PVL, and NGSL sets. Yellow curves represent the P 16 and P 84 percentiles. The colour map encodes the G BP − G RP colour index. To make a more comprehensive comparison, for each source we compute the percentage difference between the observed spectrum and the model prediction for the SPSS, PVL, and NGSL sets: these curves are plotted in Fig. 22 as a function of wavelength, colour coded by G BP − G RP . The vertical grey line highlights the boundary between BP and RP data. The two yellow curves represent the P 16 and P 84 percentile distributions that are used as a proxy for the ±1σ distributions: for wavelengths higher than λ 400 nm the accuracy of the calibration is mostly enclosed in the ±2% level marked by the two horizontal dashed blue lines, with some sources showing systematic offsets. Below 400 nm there is a clear systematic error correlated to source colour: this is the most prominent systematic error known in current Gaia BP and RP spectra and is analysed in detail in Sect. 8.1.1. To eliminate sources with systematic errors in the absolute flux scale (which can be present at the 1% level), we apply the same grey calibration described in Sect. 6. Figure 23 shows residuals after such an additional step: precision is at the ±1% level except for λ 400 nm. The three curves in the plot represent the median distribution for SPSS (red), PVL (orange), and NGSL (yellow) sources. As can be seen, while the SPSS curve is almost completely flat, the median distributions for the other two control groups show significant deviations in the BP data as well as in the redder part of the RP spectra. As PVL and NGSL sources are generally brighter than SPSS, with the quality of PVL and NGSL being at least similar to SPSS, this suggests that the problem is unlikely due to a problem in PVL and NGSL data, but is rather related to magnitude-related systematic errors in Gaia mean spectra. This is confirmed by the two plots in Fig. 24 where sources with G > 12 are shown in Fig. 23. Difference between observed and expected mean flux as a function of wavelength for BP and RP sampled mean spectra. A grey calibration has been applied to spectra. The three curves represent the smoothed medians for SPSS (red), PVL (orange), and NGSL (yellow). The colourmap encodes the G BP − G RP colour index. Fig. 24. Quantities are the same as Fig. 23 but data have been divided into two luminosity classes: sources with G > 12 (top) and G < 12 (bottom). Red and orange curves represent P 50 , P 16 , and P 84 percentiles distributions, respectively. the top panel while the bottom panel includes only sources with G < 12. Residuals from the first group are very flat and the precision is well below the ±1% level, while for brighter sources the situation is more complex. Figure 1 from De  clearly illustrates the extreme complexity faced by the internal calibration when dealing with BP and RP data: sources brighter than G = 11.5 are observed as 2D windows and, depending on their magnitude, under a high number of different observing configurations (so-called gates, which effectively limit the exposure time and the area of the CCD over which integration occurs, are activated to prevent chip saturation of bright sources). These results may suggest that the internal calibration of bright sources is not fully converged on the common internal reference system, which is mostly dominated by faint sources observed mainly as 1D windows and under more uniform conditions. Bearing in mind that one of the assumptions of the external calibration is that the instrument model does not depend on the source magnitude, by definition it cannot account for internal inhomogeneities in the Article number, page 17 of 33 A&A proofs: manuscript no. 43880final  data related to different observing conditions. It is expected that the external calibration model shows a better fit to the faint end because of the magnitude distribution of the SPSSs, most of which fall in the well-behaved faint magnitude range.

Systematic effects
The most prominent systematic effect visible in previous plots is a colour term affecting residuals at wavelengths λ 400 nm. Figure 25 shows the normalised residuals as a function of G BP − G RP colour index; residuals are calculated at wavelengths 350, 375, and 400 nm. As can be seen, the colour term, which is virtually absent at λ = 400 nm, rapidly grows when wavelengths decrease, with observed fluxes for redder sources being higher than expected and fluxes of blue sources being slightly lower than expected. There is some evidence that the colour term is magnitude dependent, because it is smaller for faint sources at the same wavelength. Although there may be issues in this wavelength range related to the wavelength calibration accuracy (see Sect. 7.3), as well as possible systematic effects in the LSF model, we believe that the presence of non-linear effects left by the internal calibration chain might not be excluded.
A second magnitude-dependent effect is visible in BP spectra for wavelengths between 560 and 600 nm and is represented in Fig. 26: the plot shows residuals measured at λ = 580 nm as a function of G magnitude of the source. While residual are flat for sources fainter than G 11, a linear trend is clearly visible with fluxes becoming progressively lower than expected, up to −2% at the bright end. This systematic effect is located immediately before the BP/RP interface and could be related to internal LSF calibration issues that induce small deformations in the shape of the mean spectrum (as already seen in the residuals in the bottom panel of Fig. 24). A further example is shown in Fig. 27   with the comparison between spectra of two PVL sources whose SPD shape is extremely similar but with different luminosity levels: source Gaia DR3 1633143932573832448 at magnitude G A = 12.46 and source Gaia DR3 1435190745326633984 with magnitude G B = 6.82. In all the panels, the flux of the second source has been rescaled by a factor of 10 −0.4(G A −G B ) to put both fluxes on the same scale. As can be seen, while the expected BP mean spectra are almost indistinguishable, in the bottom panel the observed spectrum for the bright source is below the expected level.
A similar magnitude-dependent systematic effect is also visible in the red part of RP spectra, even if data are more noisy in this second case: the plot in Fig. 28 shows residuals measured at λ = 950 nm as a function of G magnitude. In this case, residuals are almost flat for G 9, but at the brighter magnitude end, a linear trend can be seen as a rise with fluxes becoming higher than expected by ∼ 3% at G 4. P. Montegriffo et al.: Gaia Data Release 3: External calibration of BP/RP low-resolution spectroscopic data Fig. 29. Difference between observed and expected mean flux normalised by the squared sum of errors as a function of wavelength for BP and RP sampled mean spectra. Yellow curves are the P 16 and P 84 quantile distributions while horizontal dashed blue lines set the ±2 level.

Errors
De  observed that mean spectra errors are generally underestimated for spectral coefficients with low index and are slightly overestimated for higher order coefficients. For most of the sources, this translates to a general underestimation of errors because most of the information is contained in the lower order coefficients. Moreover, these errors are only reliable for fainter sources. To assess the validity of error on fluxes, we normalise the residuals between observed and expected mean fluxes by the error on fluxes obtained by summing in quadrature the error from the BP and RP mean spectra, with the error on the model prediction coming from the SPD error budget and a calibration error obtained by the instrument models ensemble (see Sect. 6). These normalised residuals are shown in Fig. 29 for sources with G > 12. The yellow curves represent P 16 and P 84 percentiles used as a proxy for the ±1σ level. As can be seen, σ 2 for most of the wavelength range (dashed blue horizontal lines are set at ±2 to guide the eye): this means that differences are larger than the estimated uncertainties. It is worth pointing out that the considered error does not contain the contribution of covariances between samples: however, these covariances are not negligible because the mean spectra are represented as a continuous model, and therefore random noise takes the form of random wiggles that give rise to long-range correlations across several samples. The solution is to evaluate a χ 2 by projecting sampled spectra into coefficient space as described by Eq. 23 and to use the full coefficient covariance matrix defined by Eq. 42. However, before doing so, it is useful to look at the full distribution of normalised residuals as a function of the AL sampling coordinate. In Fig. 30, we plot these residuals in the form interpreting this quantity as the contribution of each sample to the χ 2 red computation. The top panel shows residuals for the BP case: the vertical red line at u 14.7 corresponds to the wavelength λ = 640 nm and roughly identifies the position of the filter cut-off while the vertical blue dashed line at u 51.4 corresponds to wavelength λ = 330 nm and roughly corresponds to the filter cut-off: outside this range the mean spectrum contains only the contribution coming from other parts of the spectrum due to the effect of LSF smearing. Red and yellow curves represent the P 50 and the P 16 and P 84 percentile distributions. A relevant piece of information coming from this plot is that while the χ 2 contribution coming from the spectrum is reliable, with the median confined within the 1 − 10 range, we observe unrealistic  values in the spectra wings that would make the global χ 2 useless. The situation is similar for the RP case shown in the bottom plot (the dashed blue line identifies the filter cut-off at λ = 640 nm, the red line the cut-off at λ = 1030). The discrepancies in the wings of spectra are driven by small systematic differences between observations and model predictions that are mostly due to the LSF model: the transmissivity of the instrument in that sample range is almost null and all the detected photons are due indeed to the smearing action of the LSF (they originate from different wavelengths); a small systematic effect in the wings of the LSF model can justify the observed discrepancies. To evaluate the χ 2 in the coefficient space, we then safely excluded a few pixels in the wings of sampled spectra from the computation, projecting only the sample range u = [9, 53], and obtaining the distributions visualised in Fig. 31 for both instruments. We plot the distributions for sources with G > 12 as blue and red histograms for BP (left) and RP (right), respectively, while brighter sources are represented as green distributions in both plots. The peak for the faint sources is located roughly at χ 2 6 for BP and χ 2 3 for RP: these rather large values may be in part due to an underestimation of spectra covariances, but there is evidence that the calibration error budget coming from the in-A&A proofs: manuscript no. 43880final strument models ensemble may also be underestimated because of the suboptimal procedure followed for its construction (differential evolution algorithm instead of a more appropriate Markov Chain Monte Carlo method). Values for bright sources are meaningless because of the combined effect of systematic differences in the shapes of spectra seen in Sect. 8.1 and unreliable coefficient covariances. Figure 32 shows the comparison between Gaia ECS and the corresponding reference SED from external data for the same sources as in Fig. 21. The ECS is compared to the degraded version of the SED computed as described at the beginning of Sect. 8. Although the agreement between the distribution is excellent, some wiggles are present, especially in the top plot in the blue part of the spectrum and around the Balmer absorption lines. The wiggles originate in the basis inversion process and their effect can vary a lot in ECS as can be seen by looking at the other examples in this section. Figure 33 shows the ECS SED comparison for the same sources represented in Fig. 27. These sources have reference SEDs with almost the same shape, but different magnitudes (G = 6.82 in the top panel vs. G = 12.46 in the bottom panel). The intensity of wiggles is much higher in the bright source with respect to the fainter one, especially for λ 500 nm, indicating that neither the shape of the spectrum nor the noise level necessarily determines the amount of wiggling. Another example of two sources with similar spectral type and comparable apparent magnitude but different wiggle intensity is given in Fig. 34: sources Gaia DR3 1399559249961569792 (G = 13.32) and Gaia DR3 2323394345824851584 (G = 13.62), both belonging to the SPSS catalogue. Finally, an example of    showing a better-behaved ECS (therefore wiggle intensity is not directly related to S/N).

Comparing mean spectra in wavelength space
A more global analysis is presented in Fig. 36, which shows the percentage residuals computed on the whole set of SPSS, PVL, and NGSL samples divided into two luminosity classes, sources with G > 12 in the top panel and sources with G < 12 in the bottom panel. If we compare these residuals with the equiv- Fig. 37. Normalised residuals between ECS and reference fluxes for sources with G > 12. The two red curves represent the P 16 and P 84 percentile distributions smoothed by a Gaussian with σ = 10 nm. The ±2 levels are also plotted as horizontal dashed black lines.  Fig. 38. Comparison between ECS and model SED for SPSS with colour index G BP − G RP respectively equal to (from left to right, from top to bottom) 0.06, 0.27, 0.83, 1.03, 1.07, 1.35, 1.68, 2.26, and 2.65. ECS flux at wavelengths λ < 400 nm is systematically lower than expected for bluer sources and higher than expected for the redder ones. The colour convention and symbols are the same as in Fig. 32. alent quantities measured on ICS (Fig. 24), the effect of wiggles is evident. Nevertheless, the red lines representing the median of the distributions, once smoothed by a Gaussian with σ = 10 nm, show a substantially equivalent behaviour to that seen in the ICS residuals. Finally, Fig. 37 shows the normalised residuals for the faint sample. The error is computed here as the sum in quadrature of the contribution of the ECS error and the error on the model SED computed by propagating the errors on the full-resolution SED following Eq. 47 under the hypothesis that errors are independent. Percentiles P 16 and P 84 (red curves), smoothed by a Gaussian with σ = 10 nm, are used as a proxy to the actual standard deviation of the residuals which proves to be wider than expected. This result may in part be due to the lack of covariance terms for the high-resolution SED samples, which results in underestimation of the model error; however, the result is consistent with what is seen in Fig. 29 for the ICS comparison. The systematic colour-dependent difference seen in ICS at wavelengths λ < 400 nm is still present and is illustrated in Fig. 38: a sample of SPSS ECS with colour ranging from G BP −G RP = 0.06 Fig. 39. Effect of the reconstruction error on ECS.Top: Percentage residuals between mean BP/RP fluxes and corresponding forward modelled mean spectra computed from the ECS and instrument models as a function of wavelength; the plot uses data for the full set of SPSS, PVL, and NGSL samples. Bottom: Residuals are normalised by the model error. In both cases, the colour map encodes the G BP − G RP colour index, while horizontal dashed lines placed at ±1 provide a reference. The vertical grey line at λ = 640 sets the separation between BP and RP data.
(top left panel) to G BP − G RP = 2.65 (bottom right panel) is compared against the corresponding model SED. While the ECS is fainter than expected in the bluest source, it becomes systematically brighter than expected for redder sources. Moreover, the difference is higher at smaller wavelengths, following the same behaviour shown in Fig. 25 for ICS.

Effect of the reconstruction error on ECS
In Sect. 7.1 we evaluate the numerical reconstruction error on inverse bases by forward modelling their image through the instrument model and comparing the resulting functions with the original Hermite functions. As discussed in Sect. 7.2, well-behaved Gaussian noise on the instrument model can badly propagate to inverse bases, creating systematic errors in ECS. To look for such systematic effects, we performed the following test: using measured BP/RP mean spectra coefficients of SPSS, PVL, and NGSL sources, we computed the corresponding ECS, and then we forward modelled the BP/RP mean spectra corresponding to these ECS in sample space and compared the results with the original Gaia BP/RP sampled mean spectra. This comparison is shown in Fig. 39 as percentage residuals (top panel) and as normalised residuals (bottom panel) as a function of wavelength. In the second case, the normalisation is done by dividing the residuals by the error on the model spectra 6 . In both cases, residuals are computed as observation minus model and the colour-coding indicates source colour. The percentage residuals show a strong systematic difference in RP for wavelengths λ 950 nm, while in the remaining RP wavelength range, residuals are extremely well behaved, and are generally confined to the ±0.1% region. In comparison, BP shows more important ripples confined to the range ±0.2% for wavelengths in the interval [480, 600] nm, progressively worsening outside this interval. It is interesting to notice the colour dependence of the amplitude of the ripples from the G BP − G RP colour index, with redder sources showing more extended ripples than bluer ones. The mean behaviour of the residuals at the shortest wavelength is more clear in the bottom panel, where the normalisation by the error has the effect of smoothing the ripples in the blue part of the BP spectrum, particularly for the reddest sources: residuals reveal a significant increasing systematic difference for wavelengths λ 350 nm. Residuals for RP at longer wavelengths show a trend with colour index; this is expected considering the larger flux errors at these wavelengths for blue sources than for redder ones. Finally, it is worth noting that these systematic effects have a much smaller amplitude than those seen in the ICS comparison in Fig. 24 and consequently their amplitude in wavelength space could possibly be overwhelmed by other systematic effects, such as those mentioned in relation to Fig. 36 and Fig. 37.

Synthetic photometry on ECS
The availability of calibrated low-resolution SEDs means various applications are possible, such as performing wide-and medium-band synthetic photometry in any photometric system whose passbands are fully enclosed in the 330 − 1050 nm wavelength range covered by Gaia BP and RP spectra. The science verification paper by Montegriffo et al. (2022a) is dedicated to this specific application. Here, we use this technique for the validation of our calibrations because it allows comparisons with completely independent reference data. In the following, we illustrate two different test cases: a comparison with Landolt standard photometry in the Johnson-Kron-Cousins (JKC hereafter, Bessell 2005) system and a comparison with Hipparcos photometry in the B T , V T , and H p bands (van Leeuwen et al. 1997). In both cases, we first show a comparison between synthetic photometry realised on external SED data and synthetic photometry on ECS data computed for the SPSS, PVL, and NGSL samples to verify the consistency between the two scales. We then test the ECS synthetic photometry against external photometric standards to allow for independent validation of the ECS flux scale. The acronym HRS is used to distinguish external high-resolution spectra from ECS data.

Landolt photometry
A first comparison is made by computing synthetic photometry in the JKC system (Johnson & Morgan 1953;Johnson 1963;Kron et al. 1953;Cousins 1973Cousins , 1983Cousins , 1984 for a set of nearly 32 800 sources belonging to the Landolt collection of standard stars (Landolt 1992;Landolt & Uomoto 2007;Landolt 2007Landolt , 2009Landolt , 2013Clem & Landolt 2013; for a detailed description of this reference sample and the selection criteria applied to the original collection please refer to Pancino et al. (2022) and references therein. The comparison is performed in the B, V, R, and I passbands only because U band does not fall entirely within the wavelength range covered by Gaia BP and RP spectra: passband response curves are taken from Bessell & Murphy (2012). Before comparing synthetic ECS photometry with Landolt data, it is useful to check the impact of systematic effects in ECS on the synthetic photometry itself. To do this, we compare the synthetic photometry computed on external refer-P. Montegriffo et al.: Gaia Data Release 3: External calibration of BP/RP low-resolution spectroscopic data ence SEDs for SPSS, PVL, and NGSL with the corresponding synthetic photometry computed on ECS (Fig. 40). Residuals are shown as a function of G magnitude and colour index G BP − G RP on the left and right panels, respectively. As expected, the most relevant systematic differences are seen in the G 11 bright range (covered mostly by NGSL sources, plotted as green open triangles, and PVL data, plotted as blue crosses): synthetic ECS magnitudes are brighter than their HRS counterparts by nearly 0.02 mag in B, are fainter than expected by about 0.01 mag in V and R bands, and are again brighter by 0.01 mag in I band. Plots against colour index do not show any relevant colour trend, except for the blue part of the I band where SPSS data (red open circles) create a cluster of points with a mean residual around −0.01 mag (however, we note that SPSS data are more noisy in I band compared to other filters). Figure 41 shows residuals between the standard photometry and ECS synthetic photometry for the Landolt standards as a function of G magnitude (left panels) and G BP −G RP colour index (right panels) for the B, V, R, and I bands of the JKC system. Red curves represent the median (P 50 ) of the distributions. Residuals as a function of G magnitude reveal a magnitude-dependent term at the faint end (at G 16): as discussed by Montegriffo et al. (2022a), this trend in residuals is interpreted as the result of a systematic overestimation of background in Gaia BP and RP spectra, which causes the actual BP and RP fluxes to be slightly underestimated. This effect was already revealed by Evans et al. (2018) by their comparison between Gaia DR2 photometry and external data, where the hockey stick term was first used to describe the observed feature, and by Riello et al. (2021), who compared G-band photometry and synthetic G photometry computed on ECS (see their Fig. 23). At magnitudes G 11, residuals show offsets analogous to those seen in Fig. 40. However the overall accuracy estimated through the comparison with Landolt standards is excellent and ranges from 5 mmag in B band to 18 mmag in R band, while the standard deviations for sources with G < 17 range from 27 mmag for B to 15 mmag for   Table 2). To minimise the noise due to the hockey stick effect and better trace the genuine colour terms, plots as a function of G BP −G RP colour index refer only to sources with G < 17. These residuals reveal a significant colour term in the B filter causing differences of up to 0.06 mag in the considered colour range. This colour term could be related to systematic effects impacting the blue part of ECS, considering that the B transmission curve extends to wavelengths λ < 400 nm. A smaller colour term is still visible in the V and R bands while residuals in I band are almost flat with differences within 0.01 mag.

Hipparcos photometry
The Hipparcos mission 7 , in addition to state-of-the art astrometry for the time, provided space-based photometry in three wide bands for the 10 5 stars included in the catalogue (Perryman et al. 1997;van Leeuwen et al. 1997). The Hipparcos B T and V T passbands are similar to JKC B and V, respectively, while the H p band ranges from 340 nm to 900 nm, peaking around 450 nm. Hipparcos photometry is generally recognised as a benchmark of excellent precision (see e.g. Bessell 2005), and 95% of the stars have uncertainties of < 0.01 mag in H p . The comparison with Gaia photometry is limited by the relatively bright magnitude limit of the Hipparcos catalogue, where the vast majority of stars have G < 11.0. The original passband A&A proofs: manuscript no. 43880final curves (van Leeuwen et al. 1997) were revised by Bessell & Murphy (2012) who provide improved transmissivity curves that we adopt for the comparisons shown in this section. Figure 42 shows the comparison between HRS and ECS synthetic photometry computed on SPSS, PVL, and NGSL spectra. Symbols are the same as Fig. 40. As can be seen, thanks to its large width, the H p photometry shows excellent agreement between the two sets, with no magnitude or colour trends. On the other hand, the B T filter, covering the more problematic BP wavelength region at λ < 400 nm, shows differences of up to 0.02 mag for sources brighter than G 10.5 -the magnitude range covered by Hipparcos catalogue-while the agreement is still excellent at fainter magnitudes. V T photometry shows better agreement, with synthetic ECS magnitudes being slightly fainter (∼ 0.01 mag) than their HRS counterparts at the bright end.
The original Hipparcos catalogue, once cross-correlated with the Gaia DR3 catalogue, gives a list of 99525 matches that reduce to a list of 88662 sources when the selection criteria for published BP and RP spectra are applied (Fouesneau, M. et al. 2022;De Angeli, F. et al. 2022). According to van Leeuwen et al. (1997), the magnitude scales for Hipparcos passbands were chosen such that H p = V T = V and B T = B at B − V = 0, and therefore in order to compare these magnitude scales to synthetic ECS magnitude, we need to adjust our zero points accordingly, with the following relations (see details in Appendix E): Residuals between standard Hipparcos/Tycho magnitudes and synthetic ECS magnitudes computed through Bessell passbands are shown in Fig. 43 as a function of G BP − G RP colour index. The H p panel (top) reveals a small colour term spanning an excursion of less than 2 mmag along the whole colour range. At G BP − G RP = 0.0, the median distribution (red curve) shows an offset of roughly 10 mmag. Due to the rather large width of the H p band, the collected flux is larger than that from the other two bands, resulting in a very narrow sequence of residuals with a standard deviation of ∼ 12 mmag. B T band residuals are almost flat along the whole colour range, with a standard deviation of ∼ 34 mmag. Good agreement is also found in V T band, with a small colour term spanning an excursion similar to that of H p band but opposite in sign; the standard deviation of the residuals is ∼ 26 mmag. The comparison with Hipparcos photometry also provides a unique opportunity to test the spatial homogeneity of BP and RP spectrophotometry over the entire sky with space-based data. In order to show the residuals between the Hipparcos and Gaia synthetic magnitude scales, excluding colour systematic effects, we fitted each median distribution of Fig. 43 with a fourth degree polynomial function in G BP − G RP colour and subtracted this term from ∆mag: Figure 44 shows the HEALPiX level 3 maps in Galactic coordinates (Hammer Aitoff projections) of residuals in the three bands, limiting the representation to sources with |C | < 0.05 8 , −0.5 < G BP −G RP < 3.0 and |∆mag| < 0.2 to avoid spurious signals due to outliers. The maps show that the synthetic photometry from Gaia BP and RP spectra matches the original Hipparcos photometry to within ±5.0 mmag in the vast majority of cases, in all three Hipparcos bands. Fluctuations of amplitude 10.0 mmag are quite rare. It is interesting to compare these maps with the sky density of objects with BP/RP spectral data represented in the top left panel of Figure Fig. 43. To avoid spurious signals, we limited the comparison to stars with −0.5 < G BP − G RP < 3.0, |∆mag| < 0.2, and |C | < 0.05. The 80000 sources involved are not evenly distributed but cover the entire sky at this resolution: beige pixels correspond to absolute median differences 2.5 mmag and not to empty pixels.

Saturation
Saturation can affect observations of bright sources. Even though an ad hoc gating scheme has been designed and configured to reduce the effective exposure time and avoid saturation, large uncertainties in the on-board magnitude estimates at the bright end imply that gate activation is not always optimal. Moreover, the saturation level is different between different CCDs and dispersion differences across the focal plane will also play a role. A bright source can therefore have a mixture of saturated and non-saturated BP/RP observed spectra. The size of the effect on combined mean spectra will depend on the fraction of saturated versus non-saturated epoch spectra and on the size of the effects on single observations. As a consequence, the shape of a saturated spectrum may not always be recognisable as a spectrum with a cropped, sharp, flat section at the top of it, corresponding to some fixed flux threshold, but it can exhibit a very complicated behaviour, as can be seen in Fig. 45 where a couple of examples are shown.
The NGSL sample (containing very bright sources up to G = 1.97) represents an ideal dataset for assessing the effect of saturation in Gaia spectra. We use synthetic photometry to measure the saturation level present in spectra by comparing the synthetic flux in BP and RP measured on the Gaia ECS with the synthetic flux obtained on NGSL SEDs. The difference between the corresponding synthetic magnitudes is represented in Fig. 46 as a function of the source magnitude in the same filter: results for BP and RP are shown respectively in left and right top panels. Saturation in BP spectra starts around BP 4.5 depending on the colour, with redder sources being increasingly affected. This behaviour is due to the properties of the BP dispersion relation: in the case of a red source as in the example shown in Fig. 45 (lower panel), most of the source flux is concentrated in a few pixels. These are the ones that will saturate in the case of a sufficiently bright source. At the same flux level, a bluer/hotter source has less chance of saturating because photons are distributed over a wider range of samples. In the case of the RP, saturation is found at magnitudes RP 3 and affects mostly red sources. In the bottom left panel of Fig. 46, data for both BP and RP are plotted against the G magnitude. Sources showing signs of saturation in their BP or RP spectra (differences in shape between observed ICS and model prediction) were selected manually and are visualised in the plot as filled symbols while well behaved sources are represented as open transparent ones. The bottom right panel of Fig. 46 shows the distribution with G magnitude of all NGSL objects and those that show saturation in BP or RP: from the histogram, we can conclude that saturation is not significant at G > 5 , and that most sources will have saturated BP spectra at G 4, and it is only G 3.5 that saturation will also occur in most RP spectra. A final word of Fig. 46. Saturation in BP and RP spectra. Top: Difference between synthetic magnitudes computed on NGSL SEDs and the corresponding synthetic magnitude computed on Gaia ECS, used as an indicator of saturation in Gaia spectra. Differences are computed for the BP band (left) and RP (right). Bottom: Delta synthetic magnitudes measured on both BP and RP spectra against the G magnitude of the source (left); filled symbols represent spectra with clear signs of saturation (selected manually) in their spectra shape. Right panel: distributions of BP and RP saturated spectra against the distribution for unsaturated sources. caution should be added here to say that given the very small number of sources brighter than the thresholds indicated in this section and the fact that Gaia was not designed to cover such bright magnitudes, further investigation is needed to completely understand the origin of the discrepancies observed in Fig. 46. Other effects such as CCD non-linearities and larger uncertainties in the source astrometry could play a role.

Wiggles in ECS errors
The ECS flux error distribution against wavelength shows wiggles, as seen in Fig. 47 for source Gaia DR3 1435896975388228224. This is visible for all sources. Considering that the flux error is computed as the squared root of the diagonal elements of the covariance matrix on the ECS (see Eq. 37) and that the flux error distribution of BP/RP ICS does not show such wiggles, we deduce that the effect is a numerical problem introduced by the inversion process and may be due to poor conditioning of the inverse basis design matrices.

Summary and conclusions
This paper describes the process leading to the external calibration of Gaia BP and RP low-resolution spectra. We derived an instrument model for the BP and RP spectrophotometers that allows us to forward-model observed Gaia mean spectra starting from empirical or theoretical SEDs. The model has been optimised using a large set of primary flux calibrators (SPSS) and secondary calibrators (PVL and emission lines sources).
Through inverse modelling, we generated a set of inverse bases allowing the reconstruction of externally calibrated lowresolution SEDs for the 220 million sources with published BP/RP spectra within Gaia DR3. These reconstructed SEDs are flux-and wavelength calibrated and have an increased resolution with respect to the original mean spectra due to mitigation of the LSF smearing effect. Errors associated to calibrated spectra appear to be underestimated, especially for sources brighter than magnitude G 12.

Appendix A: Acknowledgements
The Gaia mission and data processing have financially been supported by, in alphabetical order by country: where D x and D y are the pupil dimensions along x and y: Gaia telescopes have two rectangular primary mirrors with dimensions D x = 1.4510 m and D y = 0.5016 m. The WFE map depends on the FoV and field angles, but is independent from wavelength and should be relatively stable over time. However, it is important to understand that the optical PSF is never observed directly: what is observed is the effective PSF, which takes into account the discretised nature of the data as well as the additional smearing introduced by the TDI technique and the effects due to charge diffusion. The effective PSF P m can be computed as the inverse Fourier transform of the product between the modulation transfer function (MTF) M and the optical transfer function O m (by definition O m is the Fourier Transform of the optical PSF, Eq. C.2). The total MTF must combine the effect of several spatial response functions, i.e.: the AL pixel integration (a rectangle of width p u equal to the CCD AL pixel size expressed in radians); the AL four-phase TDI charge transfer (a rectangle of width p u /n u , where n p = 4 is the number of phases per pixel); the AC pixel integration (a rectangle of width p v equal to the CCD AC pixel size expressed in radians); the charge diffusion (modelled as a bivariate normal distribution with diffusion width σ u = σ v = 4µm); M( f x , f y ) = sinc(π f x p u ) sinc π f x p u n p sinc(π f y p v ) e −2π 2 (σ 2 u f 2 x +σ 2 v f 2 y ) , where f x and f y are the spatial frequencies AL and AC. The effective PSF can then finally computed as: (C.4) By definition, the PSF is normalised to unity. The effective LSF L m (u) is obtained by summing Eq. C.4 over the AC direction. Among all the effects cited here, the WFE maps modelling optical aberrations are the most uncertain and as explained in Sect. 4.2 numerical maps measured by Airbus are unusable. Instead we model optical WFE maps as a linear combination of normalised Legendre polynomials up to the 6 th order: these maps are randomly generated and scaled to have a root mean square (RMS) uniformly distributed between 40 and 60 nm. Using this setup we generated up to 5000 random numerical WFE maps to compute the corresponding numerical LSFs: in practice each LSF is sampled on a discrete wavelength grid of c points unevenly distributed between 288 nm and 1150 nm with a sampling scheme chosen to guarantee the same relative accuracy of the PSF/LSF computation over the entire wavelength range. Moreover these monochromatic LSFs are sampled over a regular grid of r points centred on u = 0 and with a spatial resolution high enough to sample the images at least to the Nyquist frequency. Each numerical LSF is included twice by reversing the AL axis in order to preserve the symmetry of the problem: LSFs are arranged into a stack of matrices L i ∈ R r×c with i = 1, · · · , 10000. From this stack we calculate the mean LSF L: and then we compute a stack of residuals that is reduced to a set of two-dimensional basis functions by means of GPCA. In practice, an optimal ( 1 , 2 )-dimensional axis system, with 1 < r, 2 < c, defined by two matrices U ∈ R r× 1 and W ∈ R c× 2 with orthonormal columns, is derived such that the projections of the data points L i onto this axis system have the maximum variance over all the possible ( 1 , 2 )dimensional axis systems, where the variance is defined as: var(U, W) = 1 n − 1 n i=1 U T · L i · W F . (C.7) The symbol . F denotes the Frobenius norm of a matrix. For given U and W matrices, the projection of L i can be computed as From D i we can reconstruct L i by setting L i ≈ U · D i · W T , (C.9) then L i ≈ U · D i · W T + L, (C.10) or, in extended notation: L(u i , λ j ) = L u i ,λ j + 1 m=1 2 n=1 d m,n · U i,m · W j,n . (C.11)  The numerical mean LSF L is represented in Fig. C.1 in logarithmic scale; the second term of Eq. C.11 is the component of the model that is fitted during the optimisation process and represents the deviation of the current model from the mean LSF.
The reconstruction error for L i is then The RMS error is defined as: (C.13) and it measures the average reconstruction error. Figure C.2 shows the RMS error that has been measured on the complete set of 10000 LSF as a function of ( 1 , 2 ): the shape of the surface suggests that the LSF wavelength modelling should require about half dimensions with respect to the AL dependency modelling. The optimal subspace dimensions for GPCA has been set to ( 1 , 2 ) = (20, 10). Columns of matrices U and W can be Appendix F: Gaia-related acronyms