Gaia Data Release 3: Surface brightness profiles of galaxies and host galaxies of quasars

Since July 2014, the Gaia space mission has been continuously scanning the sky and observing the extragalactic Universe with unprecedented spatial resolution in the optical domain ($\sim$ 180 mas by the end of the mission). Gaia provides an opportunity to study the morphology of the galaxies of the local Universe (z<0.45) with much higher resolution than has ever been attained from the ground. It also allows us to provide the first morphological all-sky space catalogue of nearby galaxies and galaxies that host quasars in the visible spectrum. We present the Data Processing and Analysis Consortium CU4-Surface Brightness Profile fitting pipeline, which aims to recover the light profile of nearby galaxies and galaxies hosting quasars. The pipeline uses a direct model based on the Radon transform to measure the two-dimensional surface brightness profile of the extended sources. It simulates a large set of 2D light profiles and iteratively looks for the one that best reproduces the 1D observations by means of a Bayesian exploration of the parameter space. We also present our method for setting up the input lists of galaxies and quasars to be processed. We successfully analysed 1\,103\,691 known quasars and detected a host galaxy around 64\,498 of them ($\sim$6\%). We publish the surface brightness profiles of the host for a subset of 15\,867 quasars with robust solutions. The distribution of the S\'ersic index describing the light profile of the host galaxies peaks at $\sim$ 0.8 with a mean value of $\sim$ 1.9, indicating that these galaxies hosting a quasar are consistent with disc-like galaxies. The pipeline also analysed 940\,887 galaxies with both a S\'ersic and a de Vaucouleurs profile and derived robust solutions for 914\,837 of them. The distribution of the S\'ersic indices confirms that Gaia mostly detects elliptical galaxies and that very few discs are measured.


Introduction
The Gaia space mission (Gaia Collaboration et al. 2016) is one of the most ambitious and spectacular projects in astronomy from the last 20 years. The observations carried out by this satellite have radically transformed the base of knowledge upon which the astronomy community relies to explore and understand the Milky Way as well as the extragalactic Universe. The primary target of Gaia is the stellar content of the Milky Way [...] onboard video processing algorithm (VPA) are accepted and observed. The VPA has a strong impact on the type of sources observed, as it filters out most disc-like galaxies (de Souza et al. 2014; de Bruijne et al. 2015). Observation windows are transmitted to the data processing centres. These windows are one dimensional for most astrometric fields (AFs 1-9) and two dimensional for the Sky Mapper (SM). In order to derive extremely accurate astrometric measurements of the objects, Gaia repeatedly scans the full sky. The scanning law of the satellite determines the frequency at which a given object is observed. By the end of the mission, approximately 140 observations should have been made for each source, with various transit angles on the sky (as illustrated in Figure 3), so that a large fraction of the selected extragalactic extended objects are completely covered by different observations at different transit angles. From these, it is possible to extract information about their morphology by running the CU4-Surface Brightness pipeline, which is designed to reproduce the observed data by means of a direct model.
(This paper is dedicated to the memory of Dimitri Pourbaix, who supported the CU4-EO group over the years and with whom we shared very nice moments.)
Quasars are key objects for the Gaia astrometric mission. One of the main scientific goals of Gaia is to provide the first ever realisation of a rotation-free celestial reference frame (CRF) at sub-mas level in the visible wavelength domain, matching the International Celestial Reference System (ICRS) specifications. The axes of the resulting Gaia inertial optical frame have been aligned on the third realisation of the ICRS (ICRF3), which is based on 4300 radio-loud quasars observed with very long baseline interferometry (VLBI). Among the objects in common between the ICRF3 and Gaia-CRF2, some tens of sources exhibit large (up to 10 mas) angular positional differences, which could be real offsets between the centres of emission at optical and radio wavelengths. These spatial offsets could be linked to various effects, such as active galactic nucleus (AGN) activity triggering star formation, dual AGN, or recoiling supermassive black holes, and are the subject of an active field of research (e.g. Skipper & Browne 2018; Suh et al. 2019).
Galaxies are not the primary targets of the Gaia mission but the compact and bulge-dominated ones are quite easily detected by Gaia, which enables us to obtain valuable information on the morphological characteristics of this population, provided a dedicated processing exists. Galaxy morphology is a fundamental tracer in observational cosmology. Indeed, it provides clues as to how galaxies form and evolve over the Hubble time by a combination of minor and major mergers, interaction with the neighbourhood, gas accretion, and secular evolution, thereby allowing a better understanding of the relations between this morphology, mass assembly, and star formation. Having a large sample of galaxies with a shape classification is therefore necessary in order to address, for example, the issue of the formation history of the Hubble sequence, to discriminate among the inside-out and outside-in scenarios (Pérez et al. 2013), to study the relative role of (major) mergers and AGN feedback in quenching star formation, to measure the time-delay between this quenching and the colour and morphological transformations, and so on. These topics motivate our methodological efforts on the subject.
The Gaia Data Release 3 (Gaia Collaboration & Prusti 2022) is based on data collected during the first 34 months of the nominal mission (between 25 July 2014 and 28 May 2017) and provides an astrometric and photometric catalogue for more than 1.8 billion sources with an apparent G magnitude down to 21 mag. It also provides, for the first time, a comprehensive set of results for the extragalactic sources observed by the satellite. Several coordination units (CU3, CU4, CU7, and CU8) within the data processing and analysis consortium (DPAC) have developed specific pipelines to analyse and classify these sources. The results of these CUs (surface brightness profiles, redshifts, variability, Gaia-CRF3, and classifications) are gathered in two separate tables, namely qso_candidates and galaxy_candidates, which are provided alongside Gaia DR3. An overview of these tables and their main properties is presented by Bailer-Jones et al. (2022).
The Gaia CU4-Surface brightness profile fitting includes several modules dedicated to different tasks: preparing, transforming and organising the observations; performing the fitting; filtering out the solutions according to their reliability; and attributing quality flags to the sources. This article presents the method used to fit the surface brightness profile of the extragalactic sources detected by Gaia and presents the results with relevant indications for their use. No scientific exploitation of these results is made in this paper. We describe the data processing behind the surface brightness profiles of extragalactic sources. Section 2 presents the construction of the lists of quasars and galaxies that we processed and Section 3 presents the filtering applied to these lists. An overview of the pipeline is given in Section 4. The specificity of the Gaia data is presented in Section 5 together with a description of how extended sources are seen by the satellite. The algorithms used to derive the surface brightness profiles are presented in Section 6. A post-processing step to eliminate and flag non-robust solutions has been applied to the results and is presented in Section 7. Results and their validation are presented in Section 8 and the data product is given in Section 9. Finally, we summarise our findings and briefly describe our plans for future improvement of the methods in Section 10.

Input lists of quasars and galaxies
In this section, we summarise the creation of the input lists of objects that were processed by our pipeline. Gaia DR3 will release a probabilistic classification of the sources into five classes: star, galaxy, quasar, binary star, or white dwarf (see Creevey et al. 2022). Nevertheless, due to the complex and long processing plan within the Gaia DPAC, we were not able to benefit from this classification before publication of the present paper. Instead, we had to set up input lists of sources using surveys and literature studies as of early 2018, which prevented the inclusion of more recent catalogues in our compilation of quasars. The list of quasars was set up by compiling major AGN and quasar catalogues. The list of galaxies was established from a previous paper (Krone-Martins et al. 2022) classifying sources from the Gaia DR2 catalogue with an entry in the AllWISE catalogue (Cutri et al. 2013) into point-like or extended sources.

Input list of quasars
We set up the input list of quasars by merging the major catalogues of candidate quasars and candidate AGN published before 2018. We considered the following catalogues: AllWISE R90 (Assef et al. 2018), HMQ (Half Million Quasars catalogue) (Flesch 2015), AllWISE (Secrest et al. 2015), LQAC3 (Souchay et al. 2015), SDSS-DR12Q (Pâris et al. 2017), and the ICRF2 (Ma et al. 2009). Most of these catalogues include stellar contaminants, except for ICRF2, which provides spectroscopically confirmed quasars. A selection of unpublished classifications of Gaia DR2 quasars based on photometric variability (designated CU7 hereafter) shared within the Gaia DPAC was also appended to the compiled list (Rimoldini et al. 2019).
In order to cross-match the seven catalogues, we first estimated their astrometric precision by cross-matching them with Gaia DR2 using a search radius of 1″. The mean distance between the catalogue positions and the Gaia positions for the matched sources is adopted as an estimate of the astrometric precision of each catalogue, assuming that Gaia is error-free.
The catalogues were then cross-matched one with another using a search radius of three times the precision before merging. This compilation contains 6 166 355 sources of which 1 996 597 have a match in Gaia DR2.
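As an illustration of the precision estimate described above, the following minimal Python sketch computes the mean angular separation between catalogue and Gaia positions for pairs that are assumed to be already matched, and derives a search radius of three times that value. The function names and the sample coordinates are hypothetical. How the precisions of two catalogues are combined when they are matched with one another is not specified here, so the sketch stops at the per-catalogue radius.

```python
import math

ARCSEC_PER_RAD = 180.0 / math.pi * 3600.0

def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2.0) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2.0) ** 2)
    return 2.0 * math.asin(min(1.0, math.sqrt(h))) * ARCSEC_PER_RAD

def catalogue_precision(matched_pairs):
    """Mean separation (arcsec) over (catalogue position, Gaia position) pairs."""
    seps = [angular_separation_arcsec(*cat, *gaia) for cat, gaia in matched_pairs]
    return sum(seps) / len(seps)

# Hypothetical matched positions, (ra, dec) in degrees.
pairs = [
    ((10.0, 0.0), (10.0, 0.0001)),  # 0.36 arcsec offset in declination
    ((20.0, 0.0), (20.0, 0.0002)),  # 0.72 arcsec offset in declination
]
precision = catalogue_precision(pairs)   # mean separation in arcsec
search_radius = 3.0 * precision          # three times the precision
```

Treating Gaia as error-free means that the whole separation is attributed to the external catalogue, which is why a simple mean of the separations is used.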
This list was cleaned of stellar contaminants by applying an astrometric filter that rejects sources with a two-parameter solution in Gaia DR2, a parallax of ≥ 7 mas, or a total proper motion of |µ| ≥ 7 mas/yr. This filter was derived from the Gaia DR2 astrometric properties of gravitationally lensed, multiply imaged quasars (Ducourant et al. 2018, Fig. 2). No additional constraint on the Galactic latitude has been added because several ICRF2 validated quasars are located at very low Galactic latitude. This filter is intentionally not severe because most quasars are in the faint luminosity regime of Gaia (see Figure 6), where the astrometry is less accurate and where the potential presence of a surrounding host galaxy can perturb the astrometry of the central nucleus.
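The three rejection criteria can be written as a short predicate. This is only a sketch: the field names (`two_parameter_solution`, `parallax`, `pmra`, `pmdec`) and the example sources are hypothetical, and the total proper motion is assumed to be the quadratic sum of the two components.

```python
import math

PARALLAX_LIMIT_MAS = 7.0   # reject parallax >= 7 mas
PM_LIMIT_MAS_YR = 7.0      # reject total proper motion >= 7 mas/yr

def is_rejected(two_parameter_solution, parallax, pmra, pmdec):
    """Return True if the source fails the astrometric filter."""
    if two_parameter_solution:
        # No parallax or proper motion is available for these solutions.
        return True
    if parallax >= PARALLAX_LIMIT_MAS:
        return True
    return math.hypot(pmra, pmdec) >= PM_LIMIT_MAS_YR

# Hypothetical sources (mas and mas/yr).
sources = [
    {"name": "A", "two_parameter_solution": False, "parallax": 0.3, "pmra": 0.5, "pmdec": -0.4},
    {"name": "B", "two_parameter_solution": False, "parallax": 9.1, "pmra": 0.1, "pmdec": 0.2},
    {"name": "C", "two_parameter_solution": False, "parallax": 0.2, "pmra": 6.0, "pmdec": 5.0},
    {"name": "D", "two_parameter_solution": True, "parallax": None, "pmra": None, "pmdec": None},
]
kept = [s["name"] for s in sources
        if not is_rejected(s["two_parameter_solution"], s["parallax"], s["pmra"], s["pmdec"])]
```

Source C is rejected because its total proper motion, about 7.8 mas/yr, exceeds the limit even though each component is below 7 mas/yr.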
The final list of quasars encompasses 1 392 788 sources with an entry in Gaia DR3. In this list, more than 1 million sources have an entry in at least two catalogues (see Table 1). We keep track of the catalogue where the sources are identified in the qso_catalogue_name table that is provided alongside Gaia DR3. The sky distribution of the quasars from the compiled list is given in Figure 1 in Galactic coordinates. The sky coverage of each of the merged catalogues is heterogeneous, as is the resulting input list. One notices a slight over-density in the region of the Large Magellanic Cloud (LMC: l = 280.4652°, b = −32.8884°), which corresponds to contamination by stars that were not filtered out by the astrometric filter described above. Similarly, some of the sources lying near the Galactic plane are probably stellar contaminants. The yellow over-density in the Northern hemisphere corresponds to the coverage of the SDSS DR12Q catalogue.

Input list of galaxies
The list of galaxies analysed by our pipeline was established by Krone-Martins et al. (2022), who compiled a catalogue of extended extragalactic sources using a fully unsupervised method based on a stochastic iterative scheme specifically tailored to the star-galaxy separation problem. At each iteration, this approach first randomly samples the data points and then randomly selects a dimension of the analysed data space. It applies a hierarchical density-based clustering method (HDBSCAN; Campello et al. 2015) and an automatically optimised supervised method (a radial basis support vector machine; Andrew 2000) trained on the initial unsupervised solution obtained at each iteration. The method analyses Gaia DR2 combined with the AllWISE survey (Cutri et al. 2013).
The resulting list of extended sources contains 1 742 933 galaxy candidates with an entry in Gaia DR3. The distribution of these sources on the sky is given in Figure 2 where one can observe a homogeneous spread of the sources, except in the Galactic plane which appears mostly empty.

On-board filtering of extragalactic sources
While Gaia is scanning the entire sky, not all of the observed sources are sent to Earth because the flow of data would exceed the capabilities of the telemetry. The VPA (Gaia Collaboration et al. 2016) is implemented on board to select the observations to be transferred. A windowing scheme is used that selects part of the CCD detectors centred on the source (hereafter windows of observation). The VPA decides whether or not to send a window in order to filter out a large number of contaminants (e.g. cosmic rays) while preserving as many real sources as possible. Schematically, the decision is based on the shape of the central light profile of the source as seen by the SM and the astrometric field 1 (AF1) detector. If the profile is steep enough to be similar to a star-like source then the observation is transmitted, but if the profile is too flat then it is rejected. This selection function of the VPA was analysed by de Souza et al. (2014) and de Bruijne et al. (2015). These authors showed that the majority of disc-like galaxies are rejected by the VPA except when they encompass a bright bulge, while most elliptical galaxies are detected within the sensitivity limit of the detectors.
This VPA filtering mostly affects the galaxies to be analysed and we therefore produce an incomplete and Hubble-type biased catalogue of galaxies.

Filtering on angular coverage
The sources are repeatedly scanned throughout the mission at various transit angles that are determined by the nominal scanning law of Gaia (Gaia Collaboration et al. 2016). To recover the morphology of the extended sources, one must have a sufficient number of transits whose angles are uniformly spread over the source, as illustrated in Figure 3.
The angular coverage of a source is estimated as the ratio of the area of the polygon formed by the union of all the observed windows at their transit angles to the area of a circle with a diameter equal to the diagonal of the largest observed window. The coverage corresponds to the fraction of the surface of the source covered by the observations (see Krone-Martins 2011, for more details).
The angular coverage of the quasars and galaxies from the input lists is presented in Figure 4 as a function of the number of transits over the sources. When the number of transits over a source is too small or the angular coverage is too low, it is impossible to properly recover the morphology of the source (Krone-Martins et al. 2013). The first action of the pipeline is therefore to filter out sources with fewer than 25 transits or with an angular coverage < 86%.
The filtering on the number of transits and on the angular coverage leaves us with 1 103 691 quasars and 940 887 galaxies to be analysed. Their distribution on the sky is presented in Figure 5, where the signature of the Gaia scanning law unambiguously appears: the depleted areas correspond to regions where the number of transits is too small or where the coverage is insufficient.
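The coverage estimate and the selection thresholds can be sketched as follows. Instead of building the polygon of the union of the windows, this illustrative version samples the reference circle on a regular grid and counts the points that fall inside at least one rotated window, which approximates the same area ratio. The along-scan window size of 707 mas is an assumed value used only for the example, and the function names are hypothetical.

```python
import math

def angular_coverage(transit_angles_deg, al_size, ac_size, grid=81):
    """Fraction of the reference circle covered by the union of the windows.

    Each window is an al_size x ac_size rectangle centred on the source and
    rotated by its transit angle. The circle diameter is the window diagonal.
    """
    half_al, half_ac = al_size / 2.0, ac_size / 2.0
    radius = math.hypot(half_al, half_ac)
    rotations = [(math.cos(math.radians(a)), math.sin(math.radians(a)))
                 for a in transit_angles_deg]
    step = 2.0 * radius / grid
    inside = covered = 0
    for i in range(grid):
        x = -radius + (i + 0.5) * step
        for j in range(grid):
            y = -radius + (j + 0.5) * step
            if x * x + y * y > radius * radius:
                continue  # grid point outside the reference circle
            inside += 1
            for c, s in rotations:
                al = x * c + y * s    # coordinate along the scan direction
                ac = -x * s + y * c   # coordinate across the scan direction
                if abs(al) <= half_al and abs(ac) <= half_ac:
                    covered += 1
                    break
    return covered / inside

def is_processed(n_transits, coverage):
    """Selection applied before fitting: >= 25 transits and >= 86% coverage."""
    return n_transits >= 25 and coverage >= 0.86

single = angular_coverage([0.0], 707.0, 2121.0)
dense = angular_coverage([5.0 * k for k in range(36)], 707.0, 2121.0)
```

With a single transit the window covers a little over a third of the circle, whereas 36 transits spaced by 5° cover almost all of it, which is the behaviour the coverage criterion is designed to capture.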
The distribution of the Gaia magnitudes of the final list of quasars and of galaxies is presented in Figure 6 along with their colour G-RP. Most galaxies appear fainter than G=20 mag, whereas quasars appear brighter. This is a consequence of the way Gaia measures magnitudes (phot_g_mean_mag), which is tuned for point-like sources (and most quasars) and is not suited for extended objects. All galaxies are redder than quasars. There is a slight overlap between the two distributions that corresponds to quasars with host galaxies, for which the host modifies the mean colour.
Figure 7 presents a flowchart of the CU4-Surface Brightness Profile fitting pipeline, illustrating the different tasks of the pipeline. It presents the three major steps and their details: preparation of auxiliary data needed by the chain, organisation of observations, and fitting of light profiles. The pipeline is operated at the Centre National d'Etudes Spatiales (CNES). The details of the steps are presented in the following sections.

Gaia data
The observations processed by the CU4 Surface Brightness Profile pipeline for quasars and galaxy candidates were acquired between 25 July 2014 and 28 May 2017 and represent a total of ∼116 million transits over the sources. Observations suspected to contain a corrupted signal were removed from these data (e.g. observation gaps, decontamination, basic angle gaps).

AF/SM observed windows
The satellite scans sources with various transit angles as imposed by the scanning law. One transit over a source is illustrated in Figure 8 where the source passes through the focal plane of Gaia in the along scan (AL) direction, entering first the SM and then the nine AFs (AF1-AF9). The transmitted windows of observation, as selected by the VPA, are represented in light blue. Figure 9 presents the distribution of the number of transits of the quasars and galaxies. Quasars have an average number of 40 transits while galaxies have 35 transits. Quasars have slightly more transits because of the presence of a bright nucleus (the quasar) while all galaxies are diffuse and therefore less frequently detected. The size of the transmitted windows depends on the magnitude of the source. There are three regimes of magnitudes: G<13 mag, 13 mag < G < 16 mag, and G>16 mag, corresponding to different window sizes. Extragalactic objects are generally fainter than G=16 mag. Table 2 presents the characteristics of these windows.
The nine AF windows are 1D with rectangular samples that are narrower in the along scan (AL) direction and larger in the across scan (AC) direction. The SM window is two-dimensional, covering a surface that is approximately seven times larger than that of the AF. Its resolution in the AL direction is much poorer than that of the AF. SM catches a wide-angle view of the objects with low resolution and AF provides an extremely accurate view of their inner part. For extended objects, the rectangular shape of the samples is responsible for a very specific light distribution in the observed windows, as illustrated in Figure 10. When an elongated object is scanned along its minor axis, the flux collected by the central samples is much larger than when it is scanned along its major axis. This property is one of the specificities of the observations that our algorithm exploits, in particular to extract the position angles of the objects.
The small overall angular size of the Gaia windows (2 121 mas for AF (AC), 4 715 mas for SM) is a limitation to the analysis of surface brightness profiles of large extended sources. Large sources with an effective radius (encompassing half of their total flux) larger than ∼2″ have less than 50% of their total flux collected in the observed windows. The surface brightness profile algorithm is then forced to extrapolate the flux outside the observation windows. Accordingly, a minimum of one effective radius should be encompassed within the AF windows for a reliable analysis of the light profiles.

Bias and background correction
The raw observations (AF and SM observed windows) have to be corrected for bias and background before being analysed in terms of surface brightness profile fitting. A dedicated chain has been developed to handle this task. The bias and background are calculated using routines provided by the CU5 DPAC group in charge of the photometric calibration of the observations (see Castañeda et al. 2021, for a detailed description of the bias and background treatment within DPAC). The science observations are obtained by subtracting the bias and the background from the raw observations. The background is dominated by scattered sunlight and the zodiacal light, to which the Milky Way makes a significant contribution.

Enhancement of the signal-to-noise
The vast majority of the sources analysed by the pipeline are faint with low signal-to-noise ratio (S/N), with most galaxies being fainter than G=20 (see Figure 6). Quasars appear generally brighter than the galaxies in this figure because of their central bright nucleus but their surrounding host galaxy -when detectable by our pipeline -appears much fainter.
In order to enhance the S/N in the AF data, the nine AF windows of each transit are combined to produce a median-AF window. The first step is to align the signal of each window on a common frame. This is performed by first applying a cubic spline interpolation in each observed AF1-9 window and then producing nine corresponding over-sampled (synthetic AF1-9) windows (with a pixel size=5 mas). The signals in the nine synthetic AF1-9 windows are then re-centred respectively using the local plane coordinates (LPCs) (Hobbs et al. 2018) that place the observed windows onto the sky. The median of the nine aligned synthetic windows is then computed after a three-sigma clipping rejection. This median-AF window is finally re-binned into the original window sampling corresponding to the standard AF2-9 dimensions. The AF1 window contains fewer samples than AF2-9 and therefore requires a specific treatment in order to incorporate its fluxes into the median AF window. For this purpose, AF1 is resized to the AF2-9 dimensions by adding empty samples at its extremities in order to handle it in the same way as other AFs.
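The combination step can be illustrated with a minimal sketch. It assumes that the nine windows have already been over-sampled and re-centred onto a common sampling, so the spline interpolation, the alignment with the local plane coordinates, and the final re-binning are all omitted. The clipping is done in a single pass about the median, which is one possible reading of the three-sigma rejection, and the function names and sample values are hypothetical.

```python
import statistics

def clipped_median(values, n_sigma=3.0):
    """Median after one pass of n-sigma rejection about the median."""
    centre = statistics.median(values)
    spread = statistics.pstdev(values)
    kept = [v for v in values if abs(v - centre) <= n_sigma * spread]
    return statistics.median(kept)

def median_af_window(aligned_windows):
    """Combine aligned AF windows sample by sample."""
    return [clipped_median(samples) for samples in zip(*aligned_windows)]

# Nine identical synthetic windows, one of them hit by a spurious spike.
windows = [[1.0, 5.0, 10.0, 5.0, 1.0] for _ in range(9)]
windows[3][2] = 1000.0
median_af = median_af_window(windows)
```

The spike in the fourth window is rejected and the combined window keeps the common profile, which is the purpose of combining the nine AF observations of a transit.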
In parallel, the SM windows are binned in the AC direction in order to obtain a 1D window (binned-SM) of (20x1) samples.

Statistical estimators of the flux in windows
There are several quantities that characterise the distribution of the flux of the sources in the median-AF and the binned-SM windows. One of them is referred to here as the mean integrated flux in the median-AF or the SM windows and is defined as
$$ flux_{AF} = \frac{1}{N_{AF}} \sum_{j=1}^{N_{AF}} \sum_{i} s_{ij}, \qquad flux_{SM} = \frac{1}{N_{SM}} \sum_{j=1}^{N_{SM}} \sum_{i} s_{ij}, \qquad (1) $$
where $s_{ij}$ is the flux value (in e-/s) of sample i of transit j, and $N_{AF}$ and $N_{SM}$ are the numbers of AF and SM transits. These quantities are calculated after the elimination of abnormal sample values using a 3σ clipping rejection criterion applied to each sample over all transits. When the central samples or too many samples of a transit are eliminated, the transit is rejected. Only sources with 20 or more remaining transits are kept for processing.
Fig. 9. Histogram of the number of valid transits for the quasars and the galaxies to be processed after the elimination of abnormal sample values using a 3σ clipping rejection criterion (bin width=5).
Figure 11 presents a comparison of the mean integrated fluxes in the AF detector, $flux_{AF}$, against the mean integrated fluxes in the SM detector, $flux_{SM}$, for the lists of quasars and of galaxies. A star has a similar mean integrated flux in both AF and SM windows because it is fully encompassed by both. Sources with a larger flux in the SM window than in the AF window have a significant spatial extension. Most quasars are point-like and lie on the diagonal of this figure. Most galaxies are extended and lie well above the diagonal. Sources with $flux_{AF}$ < 120 e-/s do not appear to be measurable because of their S/N. These sources are not further processed and are considered afterwards as point-like. When analysing the signal in AF windows (highest resolution), several other statistical indicators can be used to investigate the spatial extension of the source: the mean of the flux distribution in each AF window ($Pos_{AF}$), the standard deviation about this mean position ($\sigma_{Pos_{AF}}$), and finally the mean over all transits of these standard deviations ($\overline{\sigma_{Pos_{AF}}}$), which is a good indicator of the spatial extension of the sources in the AF detector:
$$ Pos_{AF,j} = \frac{\sum_i i\, s_{ij}}{\sum_i s_{ij}}, \qquad \sigma_{Pos_{AF},j} = \sqrt{\frac{\sum_i s_{ij}\,(i - Pos_{AF,j})^2}{\sum_i s_{ij}}}, \qquad \overline{\sigma_{Pos_{AF}}} = \frac{1}{N_{AF}} \sum_{j=1}^{N_{AF}} \sigma_{Pos_{AF},j}. \qquad (2) $$
Figure 12 presents the distribution of $\overline{\sigma_{Pos_{AF}}}$ for the quasars and the galaxies in our lists. It is clear that most quasars extend over 1-1.5 AF samples (∼ 60-90 mas) while almost all galaxies extend over 2-3 AF samples (∼ 170 mas). The overlap of both distributions corresponds to quasars for which the host galaxy is detected.
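These indicators are simple moments of the flux in each window and can be sketched in a few lines. The sketch assumes that the position and its standard deviation are flux-weighted moments expressed in units of samples, which is an interpretation of the description above. The function names and sample values are hypothetical, and no clipping of abnormal samples is applied.

```python
import math

def mean_integrated_flux(windows):
    """Mean over transits of the flux summed over the samples of each window."""
    return sum(sum(w) for w in windows) / len(windows)

def position_and_width(window):
    """Flux-weighted mean position and standard deviation, in samples."""
    total = sum(window)
    pos = sum(i * s for i, s in enumerate(window)) / total
    var = sum(s * (i - pos) ** 2 for i, s in enumerate(window)) / total
    return pos, math.sqrt(var)

def mean_width(windows):
    """Mean over transits of the per-window standard deviations."""
    return sum(position_and_width(w)[1] for w in windows) / len(windows)

# Two hypothetical transits: a peaked profile and a flat one.
transits = [[0.0, 1.0, 2.0, 1.0, 0.0], [1.0, 1.0, 1.0, 1.0, 1.0]]
flux = mean_integrated_flux(transits)
pos, width = position_and_width(transits[0])
extension = mean_width(transits)
```

The peaked window has a width of about 0.71 samples and the flat one about 1.41 samples, so the mean width grows with the spatial extension of the signal. With real background-subtracted data, negative sample values would need to be handled before computing such moments.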

Organisation of the observations in the Radon space
The way Gaia scans the objects in different transit angles is similar to the Radon transform (Radon 1986), which establishes the possibility of reconstructing a real two-variable function (similar to an image) using all of its projections along concurrent straight lines. Specific algorithms have been developed in the medical domain to recover the internal structure of patients from various profiles acquired by tomography. Although very rarely used in astronomy, this technique is perfectly adapted to the observations of Gaia, as demonstrated by Krone-Martins et al. (2013).
To analyse the surface brightness profile of a source, we first place its observations in a Radon space (so-called sinogram), which organises the fluxes in the observed windows along their transit angles. Figure 13 presents two sinograms (AF left and SM right) of the observations of a simulated galaxy (ellipticity=0.7, position angle=45°, effective radius=2500 mas, and Sérsic index=3) scanned regularly (every 5°, from 0° to 180°).
An over-brightness is clearly visible in the upper part of both sinograms, around a transit angle of 135°. This over-brightness corresponds to the transit angle at which the source was scanned along its minor axis (as explained in Figure 10), that is, transit angle = galaxy position angle (45°) + 90° = 135°. In real observations by Gaia, transit angles are not regularly spaced and the resulting sinograms are therefore less easily readable.
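The relation between the over-brightness and the position angle can be checked with a small analytic sketch. For simplicity, the Sérsic profile of the simulation is replaced here by an elliptical Gaussian, whose projection onto the scan direction is again a Gaussian. The sketch further assumes that the window is wide enough in the AC direction to collect all the flux, and that the transit angle and the position angle are measured in the same convention. All names and the Gaussian widths are illustrative.

```python
import math

def central_brightness(transit_angle_deg, pa_deg, sigma_major, sigma_minor):
    """Peak of the along-scan profile of a unit-flux elliptical Gaussian."""
    phi = math.radians(transit_angle_deg - pa_deg)
    # Width of the projected profile along the scan direction.
    sigma_al = math.sqrt((sigma_major * math.cos(phi)) ** 2
                         + (sigma_minor * math.sin(phi)) ** 2)
    return 1.0 / (math.sqrt(2.0 * math.pi) * sigma_al)

pa = 45.0                                # position angle (deg)
sigma_major = 2500.0                     # illustrative major-axis width (mas)
sigma_minor = sigma_major * (1.0 - 0.7)  # ellipticity = 0.7

angles = list(range(0, 180, 5))          # regular scans every 5 degrees
brightness = [central_brightness(a, pa, sigma_major, sigma_minor) for a in angles]
brightest_angle = angles[brightness.index(max(brightness))]
```

The projected profile is narrowest, and therefore brightest at its centre, when the scan direction is perpendicular to the major axis, so `brightest_angle` equals the position angle plus 90°.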

Fitting surface brightness profiles
Fitting surface brightness profiles is achieved by a direct model that seeks to reproduce the observed sinograms as closely as possible. The combination of parameters leading to the closest sinograms is then selected as the fitted profile. This is done via a global iterative strategy based on a direct model with a Bayesian exploration of the parameter space.

Light profiles
The global iterative algorithm can be applied to both types of objects (quasars and their host galaxies or galaxies) but selects different models to be fitted.

Quasars
In the case of quasars, the model must decompose the structure of the source into two components: the central quasar and a potential surrounding host galaxy. The central quasar is expected to be point-like, its extension being essentially due to the line spread function (LSF) of Gaia (Fabricius et al. 2016). It is modelled by a circular exponential profile with a fixed scale length ($r_s$ = 39.4 mas), which approximately corresponds to the LSF of Gaia. The shape of the host galaxy is known to be spiral for distant quasars but could be bulge-dominated for closer quasars. We adopt a free Sérsic profile (Sérsic 1963) to model the host galaxy because of its ability to represent spirals as well as bulges:
$$ \mathrm{Exponential\ (quasar):}\quad I_q(r) = I_0 \exp\left(-\frac{r}{r_s}\right), \qquad \mathrm{S\acute{e}rsic\ (host\ galaxy):}\quad I_g(r) = I_{r_e} \exp\left\{-b_n\left[\left(\frac{r}{r_e}\right)^{1/n} - 1\right]\right\}, \qquad (3) $$
where $I_q(r)$ is the intensity of the central quasar at radius r, $I_0$ is the central intensity of the quasar, and $r_s$ is the scale length of the quasar (radius where the intensity drops by a factor e). $I_g(r)$ is the intensity of the host galaxy at radius r, $r_e$ is the major-axis effective radius encompassing half of the total flux of the source, $I_{r_e}$ is the intensity of the galaxy at the effective radius, n is the Sérsic index, and $b_n$ is a function of n such that $\Gamma(2n) = 2\gamma(2n, b_n)$, with Γ being the (complete) gamma function (Ciotti 1991) and γ the incomplete gamma function. The value of $b_n$ is determined numerically.
The typical distribution of the flux of a galaxy along its radius according to the Sérsic index is illustrated in Figure 14 which presents the variation of the light profile along the Sérsic index for a fixed integrated flux and a fixed effective radius. Index value n=0.5 corresponds to a Gaussian profile, n=1 to an exponential profile, and n=4 to a de Vaucouleurs profile. The larger the index value, the steeper the central core, and the more extended the outer wing. Low Sérsic indices have a flatter core and a more sharply truncated wing. Large Sérsic indices are very sensitive to uncertainties in the sky background level determination because of the extended wings. The profiles corresponding to indices n>4 are in fact very similar to each other.
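As an illustration of how $b_n$ can be obtained numerically, the sketch below solves $\Gamma(2n) = 2\gamma(2n, b_n)$ by bisection, using a power-series evaluation of the regularised lower incomplete gamma function, and then evaluates the standard Sérsic profile. This is one possible implementation written with the Python standard library only, and the function names are hypothetical.

```python
import math

def lower_gamma_regularised(a, x, terms=500):
    """P(a, x) = gamma(a, x) / Gamma(a), evaluated with its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    for k in range(1, terms):
        term *= x / (a + k)
        total += term
        if term < 1e-16 * total:
            break
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))

def sersic_b(n):
    """Solve Gamma(2n) = 2 gamma(2n, b_n), i.e. P(2n, b_n) = 1/2, by bisection."""
    lo, hi = 1e-8, 2.0 * n + 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if lower_gamma_regularised(2.0 * n, mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def sersic(r, i_re, r_e, n):
    """Sersic intensity at radius r for intensity i_re at the effective radius."""
    return i_re * math.exp(-sersic_b(n) * ((r / r_e) ** (1.0 / n) - 1.0))

b_gaussian = sersic_b(0.5)       # Gaussian profile
b_exponential = sersic_b(1.0)    # exponential profile
b_de_vaucouleurs = sersic_b(4.0) # de Vaucouleurs profile
```

For n=0.5 the equation reduces to $1 - e^{-b} = 1/2$, so $b_n = \ln 2$, which provides a simple check of the solver. The values for n=1 and n=4 are close to the commonly quoted 1.678 and 7.669.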
Quasars are usually variable, in particular those of our input list coming from the CU7 subset, which were selected because of their photometric variability in the Gaia data. If this variability has a long period, then it has no impact on the present analysis. If the variations have a short period, then the variability acts as additional noise in the data, possibly perturbing the profile fitting. In some cases, it leads to the non-convergence of the fitting. When a host galaxy surrounds the central quasar and is clearly detected by Gaia, the impact of variability on the parameters of the host galaxy is minor.

Galaxies
Spiral and elliptical galaxies have intrinsically different shapes. Gaia filters out most spiral galaxies and mostly detects elliptical galaxies. The consensus choice in the literature for describing dwarf and ordinary elliptical galaxies is the Sérsic profile, which is a generalisation of the de Vaucouleurs $R^{1/4}$ model (de Vaucouleurs 1948, 1953). We successively adjusted these two profiles on our galaxies:
$$ \mathrm{S\acute{e}rsic:}\quad I_g(r) = I_{r_e} \exp\left\{-b_n\left[\left(\frac{r}{r_e}\right)^{1/n} - 1\right]\right\}, \qquad \mathrm{de\ Vaucouleurs:}\quad I_g(r) = I_{r_e} \exp\left\{-b_4\left[\left(\frac{r}{r_e}\right)^{1/4} - 1\right]\right\}, \qquad (4) $$
where $I_g(r)$ is the intensity of the profiles at radius r, $I_{r_e}$ their respective intensity at the effective radius, $r_e$ their major-axis effective radius, n is the Sérsic index, and $b_n$ is a function of n (as described for Equation 3), with $b_4$ its value for n=4.
An exponential profile has also been adjusted on all galaxies but is not published in Gaia DR3 (see Section 8.2.4 for the discussion).

Shape parameters
For both galaxies and galaxies hosting quasars, the shape parameters ellipticity, ε, and position angle, PA (from north to east), are also inferred. This is implemented by calculating the intensities given in Equations 3 and 4 at radius r (e.g. Athanassoula et al. 1990), where r is defined as [...] with $x_p = x\cos(PA) - y\sin(PA)$ and $y_p = x\sin(PA) + y\cos(PA)$, and the ellipticity is ε = 1-b/a, with a and b being the major and minor axes of the elliptical profile, respectively. Nevertheless, a subset of the sources from the input lists are identified as being too faint for any tentative measurement of shape parameters, in which case they are fitted with circular profiles. To identify these objects, we compare their mean integrated fluxes in the AF and the SM windows ($flux_{AF}$, $flux_{SM}$) (see Figure 11). The extended sources have a larger mean integrated flux in the SM window than in the AF window, indicating that these objects extend beyond the limit of the AF window. It has been empirically determined that the condition $flux_{SM} < 1.06\, flux_{AF} + 200$ isolates objects that are not suited for a determination of ellipticity and position angle, which are then not provided in the catalogue.
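The two empirical flux conditions quoted in the text (the 120 e-/s limit on the AF flux and the relation between the SM and AF fluxes) can be combined into a small decision rule. The sketch below is illustrative only: the function name, the returned labels, and the order in which the two tests are applied are assumptions.

```python
def fit_mode(flux_af, flux_sm):
    """Decide how a source is handled from its mean integrated fluxes (e-/s)."""
    if flux_af < 120.0:
        # Too faint to be measured: treated as point-like, not processed further.
        return "point-like"
    if flux_sm < 1.06 * flux_af + 200.0:
        # Not suited for ellipticity and position angle: circular profile.
        return "circular"
    # Shape parameters are fitted together with the light profile.
    return "elliptical"

# Hypothetical sources: (flux_af, flux_sm) in e-/s.
examples = [(80.0, 90.0), (500.0, 600.0), (500.0, 900.0)]
modes = [fit_mode(af, sm) for af, sm in examples]
```

For the second example the threshold is 1.06 × 500 + 200 = 730 e-/s, so an SM flux of 600 e-/s leads to a circular fit while 900 e-/s allows the shape parameters to be determined.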

Fitting
The fitting relies on an iterative application of a direct model that produces synthetic sinograms of the selected profile with chosen parameters and compares them to the observed ones.

Forward model
The forward model is used to produce synthetic sinograms. It produces a 2D over-sampled synthetic image (with pixel size = 58.9 mas × 58.9 mas) of the selected light profile for a given set of parameters. Using the Radon transform, AF-like and SM-like windows that match the observed transit angles are then extracted from the synthetic image. The synthetic windows are organised into two sinograms, and their integrated fluxes are compared to the integrated fluxes in the observed sinograms. The difference is characterised by a weighted sum, $L_2$, of two $\ell_2$ norms (Eq. 6), one for SM and one for AF, where SM and AF correspond to the observed integrated fluxes in the binned-SM and median-AF windows, $SM_{synth}$ and $AF_{synth}$ to the integrated fluxes in the synthetic SM and AF windows, and $N_{SM}$ and $N_{AF}$ to the number of valid windows for SM and AF corresponding to all transits over the source.
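To make the comparison between observed and synthetic sinograms concrete, here is a minimal sketch of one possible weighted sum of two $\ell_2$ norms. The weights $1/N_{SM}$ and $1/N_{AF}$ are an assumption made for illustration and are not taken from the text; the function name and the list-based inputs are likewise hypothetical.

```python
import math


def l2_distance(sm_obs, sm_synth, af_obs, af_synth):
    """Weighted sum of two l2 norms between observed and synthetic fluxes.

    Each argument is a list of integrated fluxes, one per valid window.
    ASSUMPTION: each norm is weighted by the inverse number of windows.
    """
    n_sm, n_af = len(sm_obs), len(af_obs)
    sm_norm = math.sqrt(sum((o - s) ** 2 for o, s in zip(sm_obs, sm_synth)))
    af_norm = math.sqrt(sum((o - s) ** 2 for o, s in zip(af_obs, af_synth)))
    return sm_norm / n_sm + af_norm / n_af


if __name__ == "__main__":
    # Illustrative fluxes only.
    print(l2_distance([3.0, 0.0], [0.0, 4.0], [1.0], [1.0]))
```

Whatever the exact weighting, the quantity vanishes when the synthetic sinograms reproduce the observed fluxes exactly, which is what the optimiser exploits.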

Iterative approach of the solution
The fitting consists in finding the set of parameters of the light profile that minimises the L2 norm defined above (Eq. 6). This is done iteratively through a two-step strategy that numerically runs the forward model using different sets of parameters. The algorithm first locates the region of the parameter space where the minimum L2 norm is found, using a global optimiser based on a multivariate normal distribution and maximum likelihood estimation, and then applies a local optimiser (Covariance Matrix Adaptation Evolution Strategy; CMA-ES, Hansen 2006) to accurately determine the solution, that is, the set of parameters leading to the lowest L2 norm.
To locate the region of the parameter space where the smallest L2 norm is likely to be found, the algorithm randomly tries 10 000 sets of parameters with uniform distribution within the search domain described in Table 3. The forward model is run for each set of parameters, leading to 10 000 L2 norms. The 30 best sets of parameters (those having the lowest L2 norms) are kept, and their mean and covariance matrix are computed. At the next iteration, 10 000 new sets of parameters are randomly drawn, this time using a multivariate normal distribution characterised by the mean and covariance matrix derived at the previous iteration. The forward model is evaluated with these new sets and with the 30 best solutions retained so far, and the procedure is repeated until the best solution has not improved for ten consecutive iterations or until 500 iterations are reached.
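The global search described above can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline code: `cost` stands in for the forward model plus L2 norm, all function names are hypothetical, draws falling outside the search domain are simply clipped to its bounds (an assumption), and the local constraints discussed in the text are not modelled.

```python
import math
import random


def mean_and_cov(points):
    """Sample mean and covariance matrix of a list of parameter vectors."""
    k, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / k for j in range(d)]
    cov = [[sum((p[a] - mean[a]) * (p[b] - mean[b]) for p in points) / (k - 1)
            for b in range(d)] for a in range(d)]
    return mean, cov


def cholesky(a, floor=1e-12):
    """Lower-triangular factor of a; diagonal terms are floored for stability."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(max(s, 0.0) + floor)
            else:
                low[i][j] = s / low[j][j]
    return low


def draw(mean, low, bounds, rng):
    """One multivariate normal draw, clipped to the search domain (assumed)."""
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    x = [m + sum(low[i][k] * z[k] for k in range(i + 1))
         for i, m in enumerate(mean)]
    return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]


def global_search(cost, bounds, n_draws=10000, n_keep=30,
                  patience=10, max_iter=500, seed=0):
    """Uniform start, then repeated normal draws around the n_keep best sets."""
    rng = random.Random(seed)
    trials = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_draws)]
    best = sorted(trials, key=cost)[:n_keep]
    best_cost, stalled = cost(best[0]), 0
    for _ in range(max_iter):
        mean, cov = mean_and_cov(best)
        low = cholesky(cov)
        trials = [draw(mean, low, bounds, rng) for _ in range(n_draws)]
        # The previous best sets compete with the new draws.
        best = sorted(best + trials, key=cost)[:n_keep]
        new_cost = cost(best[0])
        if new_cost < best_cost:
            best_cost, stalled = new_cost, 0
        else:
            stalled += 1
            if stalled >= patience:
                break
    mean, cov = mean_and_cov(best)
    return best[0], mean, cov
```

The returned mean and covariance matrix of the last best sets are the quantities that would be handed over to a local optimiser.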
In order to prevent the optimisation from getting stuck in local minima or on an obviously incorrect location of the parameter space and also to search for parameters in regions where real galaxies are found, we force, at each iteration, the random selection of parameters to follow some local constraints (that are part of the global boundaries presented in Table 3). For instance, we force the central intensity of the galaxy to be smaller than that of the quasar. Most of these constraints were determined empirically.
For the galaxies, considering $f_s = 2\pi n\,\Gamma(2n)/b_n^{2n}$, we can additionally compute the upper and lower boundaries of the effective intensity of the Sérsic and de Vaucouleurs profiles from two quantities, $\phi_{\min}$ and $\phi_{\max}$, which are empirically defined as

$$\phi_{\max} = \max\left(20.0\,\mathrm{flux}_{AF},\; 6.0\,\mathrm{flux}_{SM}\right),$$

$$\phi_{\min} = \min\left(1.5\,\mathrm{flux}_{AF},\; \mathrm{flux}_{SM}\right).$$
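As an illustration of how such limits could translate into bounds on the effective intensity, the sketch below uses the standard result that the integrated flux of a circular Sérsic profile is $F = f_s\,e^{b_n}\,I_{r_e}\,r_e^2$. Reading $\phi_{\min}$ and $\phi_{\max}$ as limits on this integrated flux is an assumption, as are the function names, the series used for $b_n$, and the neglect of ellipticity and unit conversions.

```python
import math


def b_n(n):
    """Approximate b_n with a truncated asymptotic series (assumed here)."""
    return 2.0 * n - 1.0 / 3.0 + 4.0 / (405.0 * n) + 46.0 / (25515.0 * n * n)


def f_s(n):
    """f_s = 2 pi n Gamma(2n) / b_n^(2n), as defined in the text."""
    return 2.0 * math.pi * n * math.gamma(2.0 * n) / b_n(n) ** (2.0 * n)


def flux_limits(flux_af, flux_sm):
    """Empirical limits (phi_min, phi_max) built from the AF and SM fluxes."""
    phi_max = max(20.0 * flux_af, 6.0 * flux_sm)
    phi_min = min(1.5 * flux_af, flux_sm)
    return phi_min, phi_max


def effective_intensity_bounds(flux_af, flux_sm, r_e, n):
    """ASSUMPTION: phi bounds the total flux F = f_s * exp(b_n) * I_re * r_e^2."""
    phi_min, phi_max = flux_limits(flux_af, flux_sm)
    flux_per_unit_intensity = f_s(n) * math.exp(b_n(n)) * r_e ** 2
    return phi_min / flux_per_unit_intensity, phi_max / flux_per_unit_intensity


if __name__ == "__main__":
    # Illustrative values only; r_e is in the same length unit as the pixels.
    print(effective_intensity_bounds(1000.0, 1500.0, 30.0, 4.0))
```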
The output of the global optimisation (means of parameters and covariance matrix over the last best solutions) is then used as input of the local optimiser Covariance Matrix Adaptation Evolution Strategy (CMA-ES, Hansen 2006), which is applied to accelerate the convergence and to locate the optimal solution of the problem more efficiently. CMA-ES is an evolutionary algorithm designed for the optimisation of problems with real-valued inputs. It is a randomised, derivative-free, and bounded optimisation method that is considered as state-of-the-art among evolutionary algorithms (Hansen et al. 2010). This method is also based on a multivariate normal distribution, but the covariance matrix of the distribution is incrementally updated such that the likelihood of the previous successful search steps is increased.
A test of convergence stops the iterative process when no further improvement to the L2 norm is obtained. A maximum of 150 iterations is set. Objects reaching this limit are probably not well fitted and their parameters should be used with due caution.

Correlation matrix
A correlation matrix and internal errors are also provided as auxiliary data products. These quantities are usually extremely small and reflect the final step of the convergence of the iterative process rather than the true uncertainties on the fitted quantities. During the fit of the parameters, only the 30 best solutions from our 10 000 random trials are kept to compute our statistics, such as the mean and covariance matrix. Therefore, the final covariance matrices are computed on the final set of 30 best parameter sets. This explains the very small values obtained: at this step, the algorithm should have converged, which leads to very small differences between the 30 best sets of parameters.
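The relation between a covariance matrix computed on the best parameter sets and the corresponding correlation matrix can be sketched as below. This is a generic illustration: the function names are hypothetical, and taking the internal errors as the square roots of the diagonal terms is an assumption.

```python
import math


def internal_errors(cov):
    """Square roots of the diagonal terms (assumed definition of the errors)."""
    return [math.sqrt(cov[i][i]) for i in range(len(cov))]


def correlation_from_covariance(cov):
    """Normalise a covariance matrix by the product of standard deviations."""
    sigma = internal_errors(cov)
    d = len(cov)
    return [[cov[i][j] / (sigma[i] * sigma[j]) for j in range(d)]
            for i in range(d)]


if __name__ == "__main__":
    # Illustrative 2x2 covariance matrix.
    print(correlation_from_covariance([[4.0, 2.0], [2.0, 9.0]]))
```

Because the normalisation removes the overall scale, the correlation coefficients remain informative even when the covariance terms themselves are very small.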

Post-processing
After running the pipeline on the two lists of sources, it is necessary to apply a post-processing step to the results to identify the sources that did not converge and to attribute quality flags to the sources.

Quasars
The model that is fitted on the quasars and their potential host galaxy is complex, as it is the combination of two separate models, one for the nucleus and one for the surrounding host galaxy. This may lead to non-converged situations as well as situations where some of the fitted parameters converged towards the limit of the search domain, which ultimately indicates that the model did not converge properly. All these cases were therefore filtered out. Another important filter applied to the quasar output is that only host galaxies with an effective radius smaller than 2.5" are published, ensuring that at least one effective radius of the galaxy is encompassed in the SM windows (4 715 mas AL), and consequently preventing the fit from being in the extrapolation regime of the pipeline.
Two flags, host_galaxy_detected and host_galaxy_flag, are attributed to each source to respectively indicate that a host galaxy is detected and to indicate the specificity and quality of the fitted profile.
The post-processing identified the quasars with a host galaxy detected and raised the flag host_galaxy_detected='true' when

Table 3. Search domain of the surface brightness profile parameters. $I_{r_e}$ is the intensity [e$^-$/s] of the profile at the effective radius $r_e$ [mas], $\epsilon$ is the ellipticity of the source calculated as $(1-b/a)$ where $(a,b)$ are the semi-major and semi-minor axes of the source, PA is the position angle of the source (from north to east), and $n$ the Sérsic index. $I_c$ is the central intensity [e$^-$/s] of the exponential profile of the central quasar. The scale length of the quasar exponential profile is fixed at $r_s = 39.4$ mas.
Columns: Profile; $I_c$ [e$^-$/s]; $I_{r_e}$ [e$^-$/s]; $r_e$ [mas]; $\epsilon$; PA; Sérsic index $n$.

When set to 'false', the host_galaxy_detected flag covers two different situations: (i) there is no host (majority of cases), or (ii) there is another source in the immediate neighbourhood of the target, in which case it is impossible to determine whether or not there is a host around the quasar.
The values taken by host_galaxy_flag, their meaning, and the reason for their attribution are the following:
We refer the reader to Appendix A.1 for an efficient combination of the host_galaxy_detected and host_galaxy_flag flags as well as for typical queries of the qso_candidates table.

Galaxies
The models fitted on the galaxies are not as complex as the model used for the quasars. This is why, in most cases, the fitting procedure converged towards a rather robust solution and the filtering applied to the output is not as severe as for the quasars.
The flags flags_sersic and flags_de_vaucouleurs are given in the output table to indicate the specificity of each of the fitted profiles. The values taken by these flags and their meaning are the following:
1: Elliptical profile fitted, an external source is detected within 2.5" of this source; doubtful solution.
2: Circular profile fitted, an external source is detected within 2.5" of this source; doubtful solution.
3: Elliptical profile fitted, PA did not converge and one parameter or more converged towards the limit of the search domain; the solution can be considered as poor.
4: Elliptical profile fitted, PA did not converge.
5: Elliptical profile fitted, one parameter or more converged towards the limit of the search domain; poor solution.
6: Elliptical profile well fitted.
7: Circular profile fitted, one parameter or more converged towards the limit of the search domain; poor solution.
8: Circular profile well fitted.
We refer the reader to Appendix A.2 for typical queries of the galaxy_candidates table.
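When working with a downloaded copy of the table, the flag values listed above can be decoded with a small lookup. The sketch below only transcribes that list; the dictionary and function names are hypothetical and not part of any Gaia software.

```python
# Flag value -> (profile shape, quality note), transcribed from the list above.
GALAXY_FLAG_MEANING = {
    1: ("elliptical", "external source within 2.5 arcsec; doubtful solution"),
    2: ("circular", "external source within 2.5 arcsec; doubtful solution"),
    3: ("elliptical", "PA not converged and parameter(s) at domain limit; poor"),
    4: ("elliptical", "PA did not converge"),
    5: ("elliptical", "parameter(s) at domain limit; poor solution"),
    6: ("elliptical", "well fitted"),
    7: ("circular", "parameter(s) at domain limit; poor solution"),
    8: ("circular", "well fitted"),
}


def is_well_fitted(flag):
    """True for the two 'well fitted' values (6: elliptical, 8: circular)."""
    return flag in (6, 8)


def has_shape_parameters(flag):
    """True when an elliptical profile was fitted."""
    return GALAXY_FLAG_MEANING[flag][0] == "elliptical"


if __name__ == "__main__":
    for value in sorted(GALAXY_FLAG_MEANING):
        print(value, GALAXY_FLAG_MEANING[value], is_well_fitted(value))
```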

Results and Validation
Our pipeline has analysed the surface brightness profile of 1 103 691 quasars and 940 887 galaxies from the input lists. The results are included in the extragalactic tables qso_candidates and galaxy_candidates that accompany Gaia DR3 and that are presented in Bailer-Jones et al. (2022). These tables are a compilation of the results from all DPAC processing modules that have classified or analysed extragalactic objects (surface brightness profiles, variability profiles, redshift measurement, source classifications). Concerning the quasars processed by the surface brightness profile module, additional information can be found in the table qso_catalogue_name, which provides the name of the external catalogues in which the quasars were found (as detailed in Section 2.1).

Quasars
Of the 1 103 691 quasars processed, the vast majority (1 031 607) were classified as point-like sources, based either on their low integrated flux in the AF data or on the result of the fit. A host galaxy has been detected by our pipeline for 64 498 sources, and for 15 867 of these the fit was satisfactory enough to provide all or part of the parameters of the profile in the output table. The distribution of the fitted parameters is given in Figure 15.
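The counts quoted above can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in this paper; the variable names are hypothetical. The 48 631 detected hosts without a published profile agree with the count quoted for the corresponding query in Appendix A.1.

```python
processed = 1_103_691        # quasars analysed by the pipeline
point_like = 1_031_607       # classified as point-like sources
host_detected = 64_498       # host galaxy detected
host_published = 15_867      # profile at least partially published

# Detected hosts for which no morphological profile is published.
host_unpublished = host_detected - host_published

# Processed sources counted neither as point-like nor as host detections.
neither = processed - point_like - host_detected

detected_percent = 100.0 * host_detected / processed
published_percent = 100.0 * host_published / host_detected

if __name__ == "__main__":
    print(host_unpublished, neither)
    print(round(detected_percent, 1), round(published_percent, 1))
```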
One can observe in this last figure that the distribution of the Sérsic index peaks at n<1, which is an indicator that the host galaxies are disc-like. The effective radii peak around 600-800 mas and are limited to 2 500 mas. The distribution of position angles is not totally homogeneous, which is unexpected. The pipeline tends to attribute a position angle close to 90° when low or no ellipticity is found. The ellipticities peak around 0.2-0.3, which is what is expected from the projection of random ellipsoids on the sky and what is also observed in an analysis (Petit et al. 2022) of the shape of the galaxies from the EAGLE Universe simulation (Schaye et al. 2015).

Fig. 15. Histograms of the surface brightness profile parameters of the 15 867 host galaxies of quasars that are published in the Gaia DR3. The bin widths are: 0.2 for Sérsic indices, 100 mas for effective radius, variable for effective intensity, 0.025 for ellipticities, and 5° for position angles.

One way to validate the results of the quasar analysis is to examine the correlation between the host_galaxy_detected='true' flag and the redshift of the sources. One would expect the resolved galaxies hosting quasars to surround nearby quasars while point-like quasars would lie further away. There are 268 229 quasars present in our catalogue that benefit from a Gaia Quasar Classifier (QSOC, Delchambre et al. 2022) redshift with flags_qsoc = 0. A host galaxy is detected for 6 488 of these objects. In Figure 16, we compare their $\mathrm{flux}_{AF}$ and $\mathrm{flux}_{SM}$, colour-coded in one panel according to the presence or absence of a host galaxy as defined by the pipeline and in the other according to the QSOC redshifts. Figure 17 presents the normalised distributions of the redshifts of the quasars with and without a host galaxy detected.
There is a clear correlation between the two plots of Figure 16: quasars with a host galaxy detected have small redshifts (mean z=0.54) and quasars for which no host galaxy could be resolved have larger redshifts (mean z=1.71), as expected. In very few cases (∼40 sources), the host is detected for larger redshifts. These sources are very faint (G> 20 mag) and suffer either from uncertainties in the light profile fit or in the redshift measurement.
To the best of our knowledge, there are no HST-based studies analysing the brightness profile of galaxies hosting quasars with objects in common with the Gaia list of quasars processed here. This is essentially due to the different ranges of magnitudes of HST and Gaia, which barely overlap. We could find comparison data in ground-based surveys adjusting a free Sérsic profile: the GAMMA survey (Robotham et al. 2011), the NASA-Sloan-Atlas (Maller et al. 2009) (hereafter NASATLAS), and the work of Simard et al. (2011), which is based on SDSS data. Nevertheless, these surveys considered the quasar with its host galaxy as a whole. In these works, the quasar drastically influences the fitted profile, preventing any comparison with our analysis. Even the radii cannot be compared because the Sérsic index is sensitive to the concentration of the light profile and the effective radius is strongly linked to the index of the profile: the smaller the index, the smaller the effective radius.
The distribution of the Sérsic indices in Figure 15 exhibits a peak around an index of ∼ 0.8 and a mean value of ∼ 1.9, which is consistent with quasars being hosted by galaxies with disc-like profiles. This result is in agreement with a recent study of the sizes of galaxies hosting quasars in the Hyper Suprime-Cam Subaru Strategic Program (Li et al. 2021).

Galaxies
From the 940 887 galaxies processed by the pipeline, we were able to derive a valid result with clear convergence for 914 837 of them. These are the sources published in the output table.
The distributions of the fitted parameters are presented in Figure 18 for the Sérsic profile and in Figure 19 for the de Vaucouleurs profile.

Sérsic profile
Most galaxies measured have a Sérsic index of between 4 and 5, which is typical for elliptical galaxies and is consistent with the theoretical predictions (de Souza et al. 2014; de Bruijne et al. 2015) that Gaia would filter out disc galaxies. A few thousand galaxies have an index below 2, indicative of disc galaxies or pseudo-bulge plus disc, and only a few hundred of them have an index below 1.5. The accumulation of indices around 8 corresponds to small sources that are not well described by a free Sérsic profile. Nevertheless, as seen in Figure 14, the light profiles with indices n=4 are very similar to profiles with higher indices, which means that the galaxies fitted with n=8 can be considered as elliptical as well, the slight remaining background light being interpreted as wings by the algorithm.

The distribution of effective radius of the Sérsic profile has a peak value of around 2000 mas. These radii are not comparable from source to source, or with other models, because the radius is linked to the Sérsic index: the larger the index, the larger the radius. There is an accumulation of radii at 8000 mas, which is our upper limit of investigation, meaning that these galaxies are possibly larger. The effective intensity exhibits a bimodal distribution which corresponds to the circular and the elliptical fitting of the sources. This is because the intensity is calculated at the major-axis effective radius for elliptical profiles.

Figure 20 presents the distribution of colours G-RP of galaxies hosting quasars and of galaxies for sources selected based on their Sérsic index: n<2 for disc galaxies and n∼4 for elliptical galaxies. We observe a dependence of the Sérsic index on the colour G-RP of the sources. Disc-like galaxies that are well represented by index<2 appear bluer than elliptical galaxies (index∼4), as expected from the ongoing star formation in discs. This effect also concerns galaxies hosting quasars.
From the 914 837 galaxies successfully processed, 388 552 benefit from a redshift measurement from the Gaia Unresolved Galaxy Classifier (UGC, Delchambre et al. 2022). Figure 21 presents the distribution of Sérsic indices fitted as a function of the Gaia redshifts. There is a clear dependence of the Sérsic index on redshift. This is not completely expected but results from the apparent size of the sources: the most distant sources appear more compact and are fitted with large indices, while the closest galaxies are better represented by various indices corresponding to the difference in light concentration in their inner part.
To perform an external validation of the fitted profiles, it would be desirable to have space-based studies for comparison. While analysing the space-based studies of galaxy morphology that fit a Sérsic profile and use HST data, only five galaxies were found in common with our list of processed sources: one in common with Trujillo & Aguerri (2004), one with van der Wel et al. There are a few ground-based surveys that make use of a free Sérsic profile (such as GAMMA, NASATLAS or the work from Simard et al. 2011). The values of Sérsic indices derived by our pipeline and those given in these surveys are not similar. One reason for this disagreement is the atmospheric seeing, which modifies the inner light profile of the sources (see Balcells et al. 2003; Trujillo & Aguerri 2004, for similar analysis), leading to smaller Sérsic indices when observed from the ground. Nonetheless, the shape parameters (ellipticity and position angle) of the galaxies derived from space and from the ground should be globally comparable. In Figure 23, we compare the Gaia shape parameters of the Sérsic profile of the galaxies with those of the SDSS DR16 (Ahumada et al. 2020) de Vaucouleurs profile (no Sérsic profile has been adjusted by SDSS).
The comparison of position angles is excellent; the sources that depart from the diagonal usually exhibit a small ellipticity, for which the position angle parameter is meaningless.
The comparison of ellipticities reveals a systematic trend: our pipeline tends to find galaxies to be rounder than SDSS. This trend is also observed as a function of the effective radius: the larger the galaxies, the larger the difference in ellipticity between Gaia and SDSS. Gaia tends to observe the inner parts of large objects. The fitted ellipticities are therefore influenced by the bulge shape, as ellipticities vary along the radius where they are measured (Ferrari et al. 2004).

Fig. 18. Distribution of the parameters fitted with a Sérsic profile on galaxies. The bin widths are: 0.25 for Sérsic indices, 250 mas for effective radius, variable for effective intensity, 0.025 for ellipticities, and 5° for position angle.

To further investigate the systematic difference in ellipticity between our fit and the SDSS DR16, we compared the SDSS values with several other surveys (Figure 24). There is a systematic shift between SDSS and DES1 (Tarsitano et al. 2018) or SPLUS (Mendes de Oliveira et al. 2019) surveys, and good agreement is observed when comparing SDSS with GAMMA and NASATLAS surveys. It is therefore very difficult to conclude which survey is at the origin of the observed shifts.

de Vaucouleurs profile
To the best of our knowledge, there is also no space-based survey that adjusts a de Vaucouleurs profile on galaxies. The SDSS DR16 ground-based survey provides a de Vaucouleurs profile for 390 615 objects in common with the list of galaxies used in this study. The problem of the seeing modifying the shape of the light profile is still present and limits the comparison that can be done. We present the comparison of the parameters of the fit with the SDSS in Figure 25. The agreement between both radii is reasonable, with no systematic effect except for the very large objects for which Gaia is in the extrapolation regime and tends to underestimate the radii. The position angles are in very good agreement with those provided by the Sérsic model. The same systematic effect as seen with the Sérsic profile affects the ellipticities of the de Vaucouleurs profile.

Internal coherence: Sérsic versus de Vaucouleurs
To some extent, the comparison of the two profiles fitted for the galaxies allows some internal assessment of the validity of the results. We compare the position angles and ellipticities derived by the Sérsic and the de Vaucouleurs models in Figure 26. The comparison of effective radii is also presented for a selection of sources. To compare the effective radii given by the two models, we selected the sources with a Sérsic index of close to 4 that have profiles similar to a de Vaucouleurs profile and for which the comparison of the two models is meaningful.
There is excellent agreement between the parameters fitted by the two models which indicates that the models are coherent.

Exponential profile
The theoretical analyses from de Souza et al. (2014) and de Bruijne et al. (2015) indicate that the on-board video processing unit would filter out almost all disc galaxies that are typically well modelled by an exponential profile. This is confirmed by the results of our Sérsic profile fitting (see Figure 18), which converges towards small values of the Sérsic index (< 1.5), that is, those typical of discs, for only a few hundred objects in the entire analysed sample of galaxies.
Despite this, we also adjusted an exponential profile on all galaxies as $I(r) = I_0 \exp(-r/r_s)$. A comparison of the results with the SDSS DR16 exponential profile shows that very few objects are well described by our exponential profile. The fitted radii are systematically smaller than those given by the SDSS. The size of the differences appears to correlate with the angular size of the objects: the larger the size, the larger the difference. These differences are mostly attributable to the following factors: (i) The real shape of most Gaia galaxies does not follow an exponential profile as the disc galaxies have been filtered out by the VPA algorithm; and (ii) Gaia observes the bulge of large galaxies that are better represented by a de Vaucouleurs profile. Our fitting tends to extrapolate the radius though it is systematically underestimated, while SDSS considers images convolved by the atmosphere and of the entire objects, thus accounting for a larger influence of the disc.

Fig. 19. Distribution of the parameters fitted with a de Vaucouleurs profile on galaxies. The bin widths are: 250 mas for effective radius, variable for effective intensity, 0.025 for ellipticities, and 5° for position angle.
In our simulations of bulge plus disc galaxies, the fitting of an exponential profile leads to radii following mostly the bulge characteristics.

Known issues
There are limitations to the efficiency of the CU4-Surface brightness profile fitting applied to extragalactic sources. The first is the angular size of the field of view of Gaia. The algorithm extrapolates the solution for sources with an effective radius larger than 2.5", working with less than half of their total flux, and the solutions are therefore less reliable. It is observed that, for these sources, the pipeline analyses the bulge properties instead of the galaxy as a whole.
Most sources with a disc-like light distribution are filtered out by the detection algorithm of Gaia unless they encompass a bright compact bulge. For this reason, the resulting table of galaxies contains a majority of elliptical galaxies and almost no disc galaxies and can therefore not be used in its current state for statistical analysis of the local environment of the Milky Way.
A systematic effect between the ellipticities determined by the pipeline and ground-based catalogues is observed. The origin of the problem may concern large sources for which the algorithm analyses the bulge ellipticity when the surveys measure the objects as a whole, but may also be related to the effect of atmosphere in the ground-based measurements.
The fitting of galaxies hosting quasars appears to be more difficult than fitting other galaxies due to the complexity of the combined model (quasar plus host galaxy). The filtering applied during the post-processing is strict, removing all host galaxies with an effective radius larger than 2.5". Several modifications to the model are being tested in order to improve its robustness in view of Gaia DR4.

Catalogue overview
The pipeline delivers the surface brightness light profile parameters of the sources analysed, including the shape parameters (position angle and ellipticity). These are stored accordingly in the qso_candidates and galaxy_candidates tables. A flag of quality and specificity of the fitting is given in both tables: host_galaxy_flag for quasars and flags_sersic,  flags_de_vaucouleurs for the galaxies that complement the light profiles. The flag host_galaxy_detected is also provided in the qso_candidates table to indicate when the host galaxy around the central quasar is detected by Gaia. The sources analysed by our pipeline are a fraction of the sources provided in the tables because we analysed external lists set up by ourselves and not the sources classified as quasars or galaxies by the groups working on classification within DPAC. The complete content of these tables is presented in Bailer-Jones et al. (2022).

Quasars
Here is the list of surface brightness profile parameters provided in the qso_candidates table.
-n_transits: number of transits used for the fit.

Two flags, namely host_galaxy_detected and host_galaxy_flag, are given in the table to indicate (i) whether a host galaxy is detected by Gaia (host_galaxy_detected='true') and (ii) the specificity of the profile fitted (host_galaxy_flag). The values taken by the flags as well as their meaning are presented in Section 7.1. Typical ADQL queries based on the combination of these flags are given in Appendix A.1.

Galaxies
The list of surface brightness profile parameters given in table galaxy_candidates is the following:
-n_transits: number of transits used for the fit.

Conclusions
We present the Gaia DPAC CU4-Surface brightness profile pipeline, which we used to analyse the light profile of galaxies  from the local Universe (z<0.45) and of quasars with their host galaxy.
The pre-defined lists of extragalactic sources that were analysed have been previously established. For quasars, several major catalogues of quasars and candidates were compiled. For galaxies, we used a machine learning analysis of Gaia DR2 combined with the WISE survey to identify extended sources. Both lists favour purity at the expense of completeness but are not completely free of contamination. Of these lists, we only retained the sources that have at least 25 Gaia observations that together cover at least 86% of the surface area of the source.
The pipeline has processed the data collected during the first three years of operations of the satellite which represents ∼116 million transits.
The surface brightness profile fitting consists in a global iterative strategy based on a direct model with a Bayesian exploration of the parameter space that tends to best reproduce the AF and SM observations of Gaia through simulations. The combination of parameters leading to the lowest difference between the observations and the simulations is then selected as the fitted profile. We analysed quasars with a two-component profile: the central quasar described by an exponential and the host galaxy represented by a Sérsic profile. All the galaxies were analysed with two separate profiles: a Sérsic and a de Vaucouleurs profile.
A post-processing step was applied to the results of the pipeline in order to flag the sources according to the outcome of the fitting process. All host galaxies with an effective radius of larger than 2.5" were removed from the catalogue in order to avoid the extrapolation regime of the pipeline which relates to the limited size of the field of view of Gaia.
The pipeline identified 64 498 host galaxies around quasars, whereas 1 031 607 quasars appear as point-like sources to our analysis. The distribution of the Sérsic indices of the host galaxies indicates that most of them are disc-like galaxies. Regarding galaxies, 914 837 were successfully analysed and two profiles are published (a Sérsic and a de Vaucouleurs profile). The distribution of their Sérsic indices indicates that most are ellipticals and confirms that the Gaia detection system is filtering out most disc galaxies unless they host a bright bulge. Most of the sources analysed in this work are compact, with effective radii smaller than 2", and have never been resolved from the ground.
The results are released in the Gaia DR3 associated tables qso_candidates, qso_catalogue_name and galaxy_candidates and offer for the first time an all-sky, space-based catalogue of the morphology of galaxies and of galaxies hosting quasars in the visible wavelengths derived from exceptional data.
For Gaia DR4, several improvements are currently foreseen for the treatment of extragalactic sources: for example, (i) a finer sampling of the simulated images, which will ease the convergence of the fitting, and (ii) a search for an offset between the central quasar and the host galaxy. Concerning the galaxies, a composite model (bulge plus disc) should be fitted in addition to the Sérsic and de Vaucouleurs models in order to better represent the true profile of the sources and to avoid the windowing effect observed with the exponential model.
The number of transits over the sources will double for Gaia DR4 compared to the data used for Gaia DR3. There will therefore be an improved angular coverage for these sources. As a consequence, fewer sources will be discarded during the filtering. We estimate that the lists of sources will be at least twice as large as the present ones. The input list will also improve, complemented by the findings of the DPAC classification together with newer published catalogues.

2-The following query returns the list of source_id of the quasars that have a host galaxy detected by Gaia and a morphological profile at least partially measured. It selects 15 867 sources.
SELECT source_id FROM gaiadr3.qso_candidates
WHERE host_galaxy_detected='true'
AND (host_galaxy_flag = 1 OR host_galaxy_flag = 2 OR host_galaxy_flag = 4)

3-The following query returns the list of source_id of the quasars that have a host galaxy detected by Gaia with no morphological profile published. It selects 48 631 sources.
SELECT source_id FROM gaiadr3.qso_candidates
WHERE host_galaxy_detected='true' AND host_galaxy_flag = 5

4-The following query returns the list of source_id of all the quasars that have no host galaxy detected by Gaia. It selects 1 031 607 sources.
SELECT source_id FROM gaiadr3.qso_candidates
WHERE host_galaxy_detected='false' AND host_galaxy_flag < 6

5-The following query returns the list of source_id of all the quasars that belong to the ICRF2 catalogue and that were processed by our pipeline with a result.
SELECT source_id FROM gaiadr3.qso_catalogue_name
WHERE catalogue_id=2

A.2. Queries on the galaxy catalogue

1-The following query returns the list of source_id of galaxies that have a measured effective radius larger than 2 000 mas regardless of the profile used. It selects 606 331 sources.
SELECT source_id FROM gaiadr3.galaxy_candidates
WHERE (radius_sersic > 2000 OR radius_de_vaucouleurs > 2000)

2-The following query returns the list of source_id of galaxies that have a measured ellipticity larger than 0.7 regardless of the profile used. It selects 2 255 sources.
SELECT source_id FROM gaiadr3.galaxy_candidates
WHERE (ellipticity_sersic > 0.7 OR ellipticity_de_vaucouleurs > 0.7)

3-The following query returns the list of source_id of galaxies with a Sérsic elliptical profile well fitted. It selects 569 382 sources.
SELECT source_id FROM gaiadr3.galaxy_candidates
WHERE flags_sersic = 6

4-The following query returns the list of source_id of galaxies fainter than magnitude 18 and having a morphological profile measured. It selects 914 837 sources.